comparison requires the identification of all possible
combinations of n objects in the panoramic view and their
comparison to the configuration of the input image. In our scene
analysis we make use of the distance and topology metrics from
equation 1.
The distance metric uses the distance ratio matrix. Considering
a scene with n objects, the distance ratio matrix for this scene is
a square matrix D of order n(n-1)/2. Its rows and columns
correspond to the baselines formed between object pairs (e.g. AB,
AC, AD, BC, etc.). The matrix elements describe distance
ratios for these baselines. Specifically, for every row of D we
pick the corresponding baseline distance to serve as the unit
distance, and populate the row with the ratios of all other distances
over this unit distance. More formally, each entry d_ij in D
corresponds to the ratio of the ith distance (S_i) over the jth distance (S_j):
$$ d_{ij} = \frac{S_i}{S_j} \qquad (2) $$
There are two properties of this matrix that become immediately
apparent. First, its diagonal elements are equal to 1, as they
correspond to the ratio of a distance to itself. Second, d_ji is
equal to 1/d_ij.
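To make the construction concrete, the following Python sketch (our own illustration; the paper's algorithms were implemented in Visual Basic 6.0, and the function and variable names here are hypothetical) builds the distance ratio matrix from a list of object centroids.

```python
import itertools
import numpy as np

def distance_ratio_matrix(centroids):
    """Build the distance ratio matrix D for a scene.

    centroids: list of (x, y) object centroids.
    Baselines are all unordered object pairs (AB, AC, AD, BC, ...), so D is
    a square matrix of order n(n-1)/2 with D[i, j] = S_i / S_j (eq. 2).
    """
    pts = np.asarray(centroids, dtype=float)
    pairs = itertools.combinations(range(len(pts)), 2)
    lengths = np.array([np.linalg.norm(pts[a] - pts[b]) for a, b in pairs])
    # Diagonal entries are 1, and D[j, i] = 1 / D[i, j], as noted above.
    return lengths[:, None] / lengths[None, :]
```

Because every entry is a ratio of two baseline lengths, the matrix is unaffected by a uniform scaling of the scene.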
For the orientation metric we use the position relation matrix.
This matrix reflects an intrinsic reference frame, and models
query/image object orientations with respect to "left-of" or "right-of"
the features themselves.
Let us assume a scene with n objects obj_1, ..., obj_n. For
each object obj_i (such that 1 ≤ i < n) an extended, imaginary
baseline is identified, connecting the centroid of obj_i to the
centroid of each object obj_j (such that i < j ≤ n). Object obj_i is
arbitrarily considered as the "top" object and obj_j the "bottom"
object, and for every other object obj_k in the scene (such that 1 ≤
k ≤ n, k ≠ i, k ≠ j) it is determined whether obj_k lies left-of or
right-of this line. Fixing the same objects in both the query and
image scenes to be either top or bottom renders any rotations in
the scenes immaterial. The calculation of left-of or right-of is a
simple matter considering we know (from the feature matching
algorithm) the pixel coordinates of each feature's MBR in the
image and query scenes. Alternatively, objects can be reduced
to a point representation, with each object substituted by its
central position, and the relation is established in a similar
manner. Reducing objects to points reduces computational costs
(eliminating the calculation of MBRs) but increases the
possibility of errors. The position relation matrix provides a
tabulation of all relations between the objects comprising a
scene. For a detailed description of the two matrices the reader
is referred to [Stefanidis et al., 2002].
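The left-of/right-of calculation mentioned above can be carried out with a signed cross-product test on the object centroids. The sketch below is again our own illustration: it uses the point representation of objects and arranges the relations as a baselines-by-objects sign matrix, which is a simplification of the position relation matrix described in [Stefanidis et al., 2002].

```python
import itertools
import numpy as np

def position_relation_matrix(centroids):
    """Tabulate left-of / right-of relations for a scene.

    Rows correspond to baselines (obj_i, obj_j) with i < j, in the same
    order as the rows of the distance ratio matrix; columns correspond to
    objects. Entry [b, k] is +1 if obj_k lies on one side of the extended
    baseline b, -1 if it lies on the other, and 0 if obj_k belongs to the baseline.
    """
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    pairs = list(itertools.combinations(range(n), 2))
    relations = np.zeros((len(pairs), n))
    for b, (i, j) in enumerate(pairs):
        top, bottom = pts[i], pts[j]   # obj_i fixed as "top", obj_j as "bottom"
        for k in range(n):
            if k in (i, j):
                continue
            # z-component of the cross product (top->bottom) x (top->obj_k);
            # its sign tells on which side of the directed baseline obj_k lies.
            cross = ((bottom[0] - top[0]) * (pts[k][1] - top[1])
                     - (bottom[1] - top[1]) * (pts[k][0] - top[0]))
            relations[b, k] = np.sign(cross)
    return relations
```

Fixing the same matched objects as "top" and "bottom" in both the query and the image scene, as described above, makes the tabulated signs directly comparable regardless of any rotation between the two scenes.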
For both metrics we establish a measure of similarity by
comparing the corresponding matrices. In both cases the
similarity of two such matrices is described by their normalized
correlation coefficient:
$$ \gamma = \frac{\sum \sum (I - \bar{I})(Q - \bar{Q})}{\sqrt{\sum \sum (I - \bar{I})^2 \; \sum \sum (Q - \bar{Q})^2}} \qquad (3) $$
where I and Q are the position relation (or distance ratio)
matrices for the image and query scenes respectively, with
$\bar{I}$ and $\bar{Q}$ being the averages of their respective elements. This
coefficient is scaled between 0 and 1 to give a total scene
position (resp. distance) matching percentage between the query
and image object configurations.
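Under the same illustrative assumptions as the sketches above, equation 3 translates almost directly into code. How the coefficient is rescaled from its natural [-1, 1] range to the 0-1 range mentioned in the text is not specified, so the linear rescaling below is only one plausible choice.

```python
import numpy as np

def normalized_correlation(image_matrix, query_matrix):
    """Normalized correlation coefficient of two equally sized matrices (eq. 3)."""
    I = np.asarray(image_matrix, dtype=float)
    Q = np.asarray(query_matrix, dtype=float)
    dI = I - I.mean()
    dQ = Q - Q.mean()
    gamma = (dI * dQ).sum() / np.sqrt((dI ** 2).sum() * (dQ ** 2).sum())
    # Assumed rescaling from [-1, 1] to [0, 1]; the paper does not give details.
    return (gamma + 1.0) / 2.0
```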
In our approach we first use the distance similarity metric to
reduce the field of candidate matches, and then we proceed with
the topology metric in order to find the best match. In order to
facilitate the implementation of our approach we segment the
panoramic image into a number of overlapping subsectors, each
approximately 1.5 times the size of the incoming image. This
eliminates the examination of impossible object configurations.
For a 600×480 pixel incoming image we proceed by
segmenting the panoramic image into subsectors with the
following pixel extents: (0-900), (300-1200), (600-1500), (900-
1800), (1200-2100), (1500-2400), (1800-2700), (2100-3000),
(2400-3300), and (2700-3600). Using this configuration we can
expect that the real image will always be a part of one
subsector, since we produce the panoramic view using a
synthetic camera model similar to the one that actually captures
the incoming imagery. For each subsector we search for the
objects that belong in this sector. If the number of objects is
smaller than the number of objects in the real image we
continue in the next sector (as we assume that no new objects
have been created). If the number of objects in this sector is
equal to or greater than the number of objects we have in the real
image we create all possible combinations of objects and
compare them with the configuration of objects in the real
image using the distance similarity metric. Configurations that
produce high distance similarity percentages (above 95% in our
experiments) are used as input for subsequent orientation
analysis. The two similarity metrics are integrated to produce a
weighted average, using a weight of 0.75 for the distance and
0.25 for the orientation metric. Using this combined metric we
rank the configurations and select as best match the one
producing the highest similarity. This allows us to position the
input image in space and thus estimate its azimuth.
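The matching procedure described in this section can then be summarized in a single loop. The sketch below is schematic and relies on the illustrative helper functions defined earlier; the subsector extents, the 95% distance threshold, and the 0.75/0.25 weights come from the text, while everything else (the names and the assumed correspondence between panorama and query objects) is our own.

```python
import itertools

# Subsector pixel extents for a 3600-pixel-wide panorama and a 600x480 input image:
# (0-900), (300-1200), ..., (2700-3600), ten overlapping subsectors in total.
SUBSECTORS = [(start, start + 900) for start in range(0, 2701, 300)]

def find_best_match(panorama_objects, query_objects,
                    dist_threshold=0.95, w_dist=0.75, w_orient=0.25):
    """Rank candidate object configurations in the panorama against the query scene."""
    n_query = len(query_objects)
    query_dist = distance_ratio_matrix(query_objects)
    query_pos = position_relation_matrix(query_objects)
    best_config, best_score = None, -1.0

    for lo, hi in SUBSECTORS:
        # Objects whose centroid falls inside this subsector.
        sector = [p for p in panorama_objects if lo <= p[0] < hi]
        if len(sector) < n_query:
            continue  # too few objects; no new objects are assumed to appear
        # The element order of each combination is assumed to correspond to the
        # query objects (e.g. via the feature matching step); otherwise all
        # permutations would have to be tried.
        for combo in itertools.combinations(sector, n_query):
            d_sim = normalized_correlation(distance_ratio_matrix(combo), query_dist)
            if d_sim < dist_threshold:
                continue  # distance metric prunes this configuration
            o_sim = normalized_correlation(position_relation_matrix(combo), query_pos)
            score = w_dist * d_sim + w_orient * o_sim
            if score > best_score:
                best_config, best_score = combo, score
    return best_config, best_score
```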
4. EXPERIMENTS
In order to examine the performance of our algorithms we
performed a variety of experiments using a VR model of the
campus of the University of Maine. All the experiments were
run on a Pentium III processor with 512 MB RAM, and the
algorithms were implemented in Visual Basic 6.0. In this
section we present an example of these experiments, to
demonstrate typical accuracy and time metrics.
Using the VR model we assumed a camera position and a
camera model. Using this information and the corresponding
VR model we created two synthetic images facing in different
directions. We then created the two panorama images shown in
figure 5. Both panoramas depict the same area, but their initial
axes differ by 180°. Thus the center of the top panorama is at
the edge of the bottom panorama image, and vice versa. In the
bottom panorama image in figure 5 we identified 25 different
objects, while in the top we identified 24. Each panorama was
segmented into 10 overlapping subsectors as described in 3.1
above. In Figure 4 we can see the number of objects in each
subsector.
In the first synthetic image (Figure 6 left), we identified seven
different objects while in the second image (Figure 6 right) we
identified five different objects. Accordingly, the matching
candidate subregions for the first image were 4,5,6,7,8, and 9
for the top panorama, and 1,2,3, and 10 for the bottom
panorama. For the second image the candidate subregions are
3, 4, …