comparison requires the identification of all possible
combinations of n objects in the panoramic view and their
comparison to the configuration of the input image. In our scene
analysis we make use of the distance and topology metrics from
equation 1.
The distance metric uses the distance ratio matrix. Considering
a scene with n objects, the distance ratio matrix for this scene is
a square matrix D of order n(n-1)/2. Its rows and columns
correspond to the baselines formed between object pairs (e.g. AB,
AC, AD, BC, etc.). The matrix elements describe distance
ratios for these baselines. Specifically, for every row of D we
pick the corresponding baseline distance to serve as the unit
distance, and populate the row with the ratios of all other distances
over this unit distance. More formally, each entry d_ij in D
corresponds to the ratio of the ith distance (S_i) over the jth distance (S_j):
$$ d_{ij} = \frac{S_i}{S_j} \qquad (2) $$
There are two properties of this matrix that become immediately
apparent. First, its diagonal elements are equal to 1, as they
correspond to the ratio of a distance to itself. Second, d_ji is
equal to 1/d_ij.
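To make the construction concrete, the following Python sketch (our own illustration; the paper's algorithms were implemented in Visual Basic 6.0, and the function and variable names here are hypothetical) builds the distance ratio matrix from a list of object centroids.

```python
import itertools
import numpy as np

def distance_ratio_matrix(centroids):
    """Build the distance ratio matrix D for a scene.

    centroids: list of (x, y) object centroids.
    Baselines are all unordered object pairs (AB, AC, AD, BC, ...), so D is
    a square matrix of order n(n-1)/2 with D[i, j] = S_i / S_j (eq. 2).
    """
    pts = np.asarray(centroids, dtype=float)
    pairs = itertools.combinations(range(len(pts)), 2)
    lengths = np.array([np.linalg.norm(pts[a] - pts[b]) for a, b in pairs])
    # Diagonal entries are 1, and D[j, i] = 1 / D[i, j], as noted above.
    return lengths[:, None] / lengths[None, :]
```

Because every entry is a ratio of two baseline lengths, the matrix is unaffected by a uniform scaling of the scene.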
For the orientation metric we use the position relation matrix.
This matrix reflects an intrinsic reference frame, and models
query/image object orientations with respect to "left-of" or "right-of"
the features themselves.
Let us assume a scene with n objects obj_1, ..., obj_n. For
each object obj_i (such that 1 ≤ i < n) an extended, imaginary
baseline is identified, connecting the centroid of obj_i to the
centroid of each object obj_j (such that i < j ≤ n). Object obj_i is
arbitrarily considered as the "top" object and obj_j the "bottom"
object, and for every other object obj_k in the scene (such that 1 ≤
k ≤ n, k ≠ i, k ≠ j) it is determined whether obj_k lies left-of or
right-of this line. Fixing the same objects in both the query and
image scenes to be either top or bottom renders any rotations in
the scenes immaterial. The calculation of left-of or right-of is a
simple matter considering we know (from the feature matching
algorithm) the pixel coordinates of each feature's MBR in the
image and query scenes. Alternatively, objects can be reduced
to a point representation, with each object substituted by its
central position, and the relation is established in a similar
manner. Reducing objects to points reduces computational costs
(eliminating the calculation of MBRs) but increases the
possibility of errors. The position relation matrix provides a
tabulation of all relations between the objects comprising a
scene. For a detailed description of the two matrices the reader
is referred to [Stefanidis et al., 2002].
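The left-of/right-of calculation mentioned above can be carried out with a signed cross-product test on the object centroids. The sketch below is again our own illustration: it uses the point representation of objects and arranges the relations as a baselines-by-objects sign matrix, which is a simplification of the position relation matrix described in [Stefanidis et al., 2002].

```python
import itertools
import numpy as np

def position_relation_matrix(centroids):
    """Tabulate left-of / right-of relations for a scene.

    Rows correspond to baselines (obj_i, obj_j) with i < j, in the same
    order as the rows of the distance ratio matrix; columns correspond to
    objects. Entry [b, k] is +1 if obj_k lies on one side of the extended
    baseline b, -1 if it lies on the other, and 0 if obj_k belongs to the baseline.
    """
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    pairs = list(itertools.combinations(range(n), 2))
    relations = np.zeros((len(pairs), n))
    for b, (i, j) in enumerate(pairs):
        top, bottom = pts[i], pts[j]   # obj_i fixed as "top", obj_j as "bottom"
        for k in range(n):
            if k in (i, j):
                continue
            # z-component of the cross product (top->bottom) x (top->obj_k);
            # its sign tells on which side of the directed baseline obj_k lies.
            cross = ((bottom[0] - top[0]) * (pts[k][1] - top[1])
                     - (bottom[1] - top[1]) * (pts[k][0] - top[0]))
            relations[b, k] = np.sign(cross)
    return relations
```

Fixing the same matched objects as "top" and "bottom" in both the query and the image scene, as described above, makes the tabulated signs directly comparable regardless of any rotation between the two scenes.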
For both metrics we establish a measure of similarity by
comparing the corresponding matrices. In both cases the
similarity of two such matrices is described by their normalized
correlation coefficient:
$$ \gamma = \frac{\sum \sum (I - \bar{I})(Q - \bar{Q})}{\sqrt{\sum \sum (I - \bar{I})^2 \; \sum \sum (Q - \bar{Q})^2}} \qquad (3) $$
where I and Q are the position relation (or distance ratio)
matrices for the image and query scenes respectively, with
$\bar{I}$ and $\bar{Q}$ being the averages of their respective elements. This
coefficient is scaled between 0 and 1 to give a total scene
position (resp. distance) matching percentage between the query
and image object configurations.
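Under the same illustrative assumptions as the sketches above, equation 3 translates almost directly into code. How the coefficient is rescaled from its natural [-1, 1] range to the 0-1 range mentioned in the text is not specified, so the linear rescaling below is only one plausible choice.

```python
import numpy as np

def normalized_correlation(image_matrix, query_matrix):
    """Normalized correlation coefficient of two equally sized matrices (eq. 3)."""
    I = np.asarray(image_matrix, dtype=float)
    Q = np.asarray(query_matrix, dtype=float)
    dI = I - I.mean()
    dQ = Q - Q.mean()
    gamma = (dI * dQ).sum() / np.sqrt((dI ** 2).sum() * (dQ ** 2).sum())
    # Assumed rescaling from [-1, 1] to [0, 1]; the paper does not give details.
    return (gamma + 1.0) / 2.0
```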
In our approach we first use the distance similarity metric to
reduce the field of candidate matches, and then we proceed with
the topology metric in order to find the best match. In order to
facilitate the implementation of our approach we segment the
panoramic image into a number of overlapping subsectors, each
approximately 1.5 times the size of the incoming image. This
eliminates the examination of impossible object configurations.
For a 600×480 pixel incoming image we proceed by
segmenting the panoramic image into subsectors with the
following pixel extents: (0-900), (300-1200), (600-1500), (900-
1800), (1200-2100), (1500-2400), (1800-2700), (2100-3000),
(2400-3300), and (2700-3600). Using this configuration we can
expect that the real image will always be a part of one
subsector, since we produce the panoramic view using a
synthetic camera model similar to the one that actually captures
the incoming imagery. For each subsector we search for the
objects that belong in this sector. If the number of objects is
smaller than the number of objects in the real image we
continue in the next sector (as we assume that no new objects
have been created). If the number of objects in this sector is
equal to or greater than the number of objects we have in the real
image we create all possible combinations of objects and
compare them with the configuration of objects in the real
image using the distance similarity metric. Configurations that
produce high distance similarity percentages (above 95% in our
experiments) are used as input for subsequent orientation
analysis. The two similarity metrics are integrated to produce a
weighted average, using a weight of 0.75 for the distance and
0.25 for the orientation metric. Using this combined metric we
rank the configurations and select as best match the one
producing the highest similarity. This allows us to position the
input image in space and thus estimate its azimuth.
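The matching procedure described in this section can then be summarized in a single loop. The sketch below is schematic and relies on the illustrative helper functions defined earlier; the subsector extents, the 95% distance threshold, and the 0.75/0.25 weights come from the text, while everything else (the names and the assumed correspondence between panorama and query objects) is our own.

```python
import itertools

# Subsector pixel extents for a 3600-pixel-wide panorama and a 600x480 input image:
# (0-900), (300-1200), ..., (2700-3600), ten overlapping subsectors in total.
SUBSECTORS = [(start, start + 900) for start in range(0, 2701, 300)]

def find_best_match(panorama_objects, query_objects,
                    dist_threshold=0.95, w_dist=0.75, w_orient=0.25):
    """Rank candidate object configurations in the panorama against the query scene."""
    n_query = len(query_objects)
    query_dist = distance_ratio_matrix(query_objects)
    query_pos = position_relation_matrix(query_objects)
    best_config, best_score = None, -1.0

    for lo, hi in SUBSECTORS:
        # Objects whose centroid falls inside this subsector.
        sector = [p for p in panorama_objects if lo <= p[0] < hi]
        if len(sector) < n_query:
            continue  # too few objects; no new objects are assumed to appear
        # The element order of each combination is assumed to correspond to the
        # query objects (e.g. via the feature matching step); otherwise all
        # permutations would have to be tried.
        for combo in itertools.combinations(sector, n_query):
            d_sim = normalized_correlation(distance_ratio_matrix(combo), query_dist)
            if d_sim < dist_threshold:
                continue  # distance metric prunes this configuration
            o_sim = normalized_correlation(position_relation_matrix(combo), query_pos)
            score = w_dist * d_sim + w_orient * o_sim
            if score > best_score:
                best_config, best_score = combo, score
    return best_config, best_score
```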
4. EXPERIMENTS
In order to examine the performance of our algorithms we
performed a variety of experiments using a VR model of the
campus of the University of Maine. All the experiments were
run on a Pentium III processor with 512 MB RAM, and the
algorithms were implemented in Visual Basic 6.0. In this
section we present an example of these experiments, to
demonstrate typical accuracy and time metrics.
Using the VR model we assumed a camera position and a
camera model. Using this information and the corresponding
VR model we created two synthetic images facing in different
directions. We then created the two panorama images shown in
figure 5. Both panoramas depict the same area, but their initial
axes differ by 180°. Thus the center of the top panorama is at
the edge of the bottom panorama image, and vice versa. In the
bottom panorama image in figure 5 we identified 25 different
objects, while in the top we identified 24. Each panorama was
segmented into 10 overlapping subsectors as described in 3.1
above. In Figure 4 we can see the number of objects in each
subsector.
In the first synthetic image (Figure 6 left), we identified seven
different objects while in the second image (Figure 6 right) we
identified five different objects. Accordingly, the matching
candidate subregions for the first image were 4,5,6,7,8, and 9
for the top panorama, and 1,2,3, and 10 for the bottom
panorama. For the second image the candidate subregions are
3, 4, …