Each keypoints of the database within the k-NN is added to
the list of matches of the image it belongs. A basic method
to retrieve images corresponding to the scene is to rank
the images by descending order of the size of the lists of
matches.
4 GEOMETRICAL CONSISTENCY
Since every point of the query is associated with many
points in the database, irrelevant points of the query will
still influence the ranking. However, we can make the as
sumption that for those matches, the relative positions of
the query and target points within their respective images
are not coherent.Thus, a geometric constraint over the en
semble of matches between two images shall be able to
remove the irrelevant matches.
We test two criteria of geometrical consistency. The first
criteria is to estimate the 2D affine transform between the
two images, and then to remove the points not coherent
with it. Although the transformation between the images is
indeed 3D, we assume that under small perspective changes,
a 2D affine transform is enough to catch the transforma
tion of a single plane (in our case, the front of the build
ing). The algorithm used to estimate the affine transform
is RANSAC, a model estimation technique which can deal
with a large fraction of outliers (Fischler and Bolles, 1981).
An example of matches after the removal of non-coherent
points is shown on Fig. 4.
Figure 4: Matching points after the non-coherent to the
estimate 2D affine transform matches have been remove.
The second criterion is to keep only the matches which
correspond to the most frequent angle difference between
matched points (Jegou et al., 2008). This is done by com
puting an histogram of the difference between the principal
direction of the query and the target point of a match. We
then keep the matches corresponding to the most frequent
value in the histogram.
A pairwise matching using a distance contrast crite
rion (named Image Matching there after).
A k-NN search plus a vote (named Brute vote).
A k-NN search plus the angle differences consistency
criterion (named Angle differences).
A k-NN search plus the 2D affine transform consis
tency criterion (name Ransac).
We set k — 10 for the k-NN Search. The parameters for
the RANSAC algorithm were empiricaly set to 15 pixels
maximum distance to fit the model and minimum 3 inliers
for the affine transformation.
The first dataset consisted of 82 images of a single street
(about 350 000 keypoints). The query set contained im
ages taken by a mobile phone in front of some of the shops
in the street. As the images (both in the query set and in the
database) are direct views of the buildings, we considered
this test as easy, since the transformation between query
and its corresponding target images is simple. The second
dataset contained 300 images of a large boulevard (about
3.5 millions of keypoints). The queries where taken with
a mobile phone from the sidewalk. As the vehicle taking
the pictures was in the middle of the street, the targeted re
gions of the images (a shop, for instance) are very small.
Thus, few keypoints of each image are describing some
thing we might be looking for. As there are many severe
transformations (scaling, viewpoint changes), we consider
this test difficult. For both sets, we have manually built
the groundtruth by annotating which images correspond to
each query.
We have used three criteria for the evaluation. The first
consisted in measuring the rank of the first relevant image
retrieved (average of the query set). The second measure
was the evolution of the number of relevant image in the
retrieved set, as the size of this set increased. The third
criterion was the precision, the number of relevant images
retrieved over the number of images retrieved.
5.2 Results on Dataset 1
An example of the first images retrieved using the Brute
vote is shown on Fig. 5. The first images retrieved with
this technique have about 2000 matching keypoints (im
ages in this set contain about 5000 keypoints). There are
several occlusions between the query image and the im
ages of iTowns (for instance the car in front of the shop).
However, a relevant image is found within the first images.
Fig. 6 presents the same result, but with the Angle dif
ferences refinement. The first images retrieved have about
200 matching keypoints. As we can see, the refinement
introduced a re-ranking of the first images profitable to
195