In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
STUDY OF SIFT DESCRIPTORS FOR IMAGE MATCHING BASED LOCALIZATION IN URBAN STREET VIEW CONTEXT
David Picard 1, Matthieu Cord 1 and Eduardo Valle 2
1 LIP6 UPMC
Paris 6
104 avenue du Président Kennedy
75016 Paris FRANCE
{david.picard, matthieu.cord}@lip6.fr
2 ETIS, CNRS, ENSEA, Univ Cergy-Pontoise,
F-95000 Cergy-Pontoise
mail@eduardovalle.com
KEY WORDS: Image, Databases, Matching, Retrieval, Urban, High resolution
ABSTRACT
In this paper we evaluate the quality of vote-based retrieval using SIFT descriptors in a database of street view photography, a challenging context where the fraction of mismatched descriptors tends to be very high. This work is part of the iTowns project, for which high-resolution street views of Paris have been taken. The goal is to retrieve the views of an urban scene given a query picture. We have carried out experiments for several techniques of image matching, including a post-processing step to check the geometric consistency of the results. We show that the efficiency of SIFT-based matching depends largely on the image database content, and that the post-processing step is essential to the retrieval performance.
1 INTRODUCTION
In this paper, we evaluate the effectiveness of a voting strategy using SIFT descriptors for near-duplicate retrieval of urban scenes. We have observed that, compared to previously reported applications of SIFT (object recognition, stereoscopy, etc.) (Lowe, 2003), this context presents the challenge of a very high rate of descriptor mismatches, due to the complexity of both the scene and the transformations it might suffer. We have thus evaluated how different strategies to filter out the false matches can improve the effectiveness of retrieval.
This study is part of the iTowns project, which aims at defining a new generation of multimedia web tools mixing a broadband 3D geographic image-based browser with an image-based search engine 1 . Fig. 1 shows an example of pictures taken for the project.
The first goal of this new type of search engine is to retrieve, in the high-resolution database, the scene corresponding to a given query image. Let us imagine the following scenario: a user is looking for information about a restaurant in front of him (feedback from patrons, for instance). He takes a picture of the restaurant with his phone and sends it to the iTowns web server. The image is matched against the database, and the desired information is retrieved and sent back to the user.
In order to accomplish this goal, there are basically three steps to perform:

1. Match the query image with the corresponding scene in the database.

2. Find information associated with the scene and related to the query.

3. Retrieve only the relevant information regarding the user's interests.

1 See http://itowns.ign.fr
In this paper, we focus on the first step, and consider the use of state-of-the-art techniques for near-duplicate image matching. Recently, techniques have been developed for the detection of copies where the transformations between images are well known (rotation, scaling, global illumination change, etc.). Those techniques involve the extraction of points of interest in the images, then the matching of the points of the query with the points in the database, and finally the aggregation of the matches per database image using a voting strategy. We try to extend these techniques to the matching of images under less constrained, and thus more realistic, transformations (change of viewpoint, local illumination, etc.).
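To fix ideas, the sketch below outlines such a vote-based pipeline with SIFT descriptors, written in Python with OpenCV. It is only a minimal illustration under assumptions of ours: the file names, the 0.8 ratio-test threshold and the per-image brute-force matcher are hypothetical, and the paper itself relies on an approximate k-NN search over the whole database (see section 3) rather than exhaustive matching.

# Minimal sketch of vote-based image retrieval with SIFT descriptors.
# File names and thresholds are hypothetical; OpenCV is used for convenience.
import cv2
import numpy as np

sift = cv2.SIFT_create()

def describe(path):
    # Extract SIFT keypoints and descriptors from a grayscale image.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return sift.detectAndCompute(img, None)

# One keypoint/descriptor set per street-view image of the database.
db_paths = ["db_0001.jpg", "db_0002.jpg", "db_0003.jpg"]
db = [describe(p) for p in db_paths]

# Query image, e.g. the picture taken by the user.
q_kp, q_desc = describe("query.jpg")

# Match every query descriptor against each database image and count votes.
matcher = cv2.BFMatcher(cv2.NORM_L2)
votes = np.zeros(len(db_paths), dtype=int)
good_matches = [[] for _ in db_paths]   # kept for the consistency check below
for i, (kp, desc) in enumerate(db):
    # 2-NN search plus Lowe's ratio test to discard ambiguous matches.
    knn = matcher.knnMatch(q_desc, desc, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
    good_matches[i] = good
    votes[i] = len(good)

# Rank the database images by their number of votes (matched descriptors).
for i in np.argsort(-votes):
    print(db_paths[i], int(votes[i]))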
The paper is organized as follows: the next section introduces keypoint-based image matching. We explain in section 3 the strategy used to perform an efficient approximate k-NN search in the database in order to associate query points with points in the database. Then, we detail in section 4 the geometric consistency check used to filter irrelevant matches. Experiments are carried out on two representative subsets of the iTowns collection, and results are shown in section 5, before we conclude.
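To make the role of this last filtering step concrete, the short sketch below (continuing the one above) re-scores one candidate image by the number of matches that are consistent with a single geometric transform estimated by RANSAC. The homography model, the OpenCV call and the 5-pixel threshold are illustrative stand-ins of ours, not necessarily the consistency criterion detailed in section 4.

# Hedged sketch of a geometric consistency filter: keep only the matches
# that agree with one transform (RANSAC inliers) and use their count as
# the final vote of the candidate image.
import cv2
import numpy as np

def consistent_votes(query_kp, db_kp, matches):
    # A homography needs at least 4 point correspondences.
    if len(matches) < 4:
        return 0
    src = np.float32([query_kp[m.queryIdx].pt for m in matches])
    dst = np.float32([db_kp[m.trainIdx].pt for m in matches])
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())

# Reusing q_kp, db and good_matches from the voting sketch:
# filtered = [consistent_votes(q_kp, db[i][0], good_matches[i]) for i in range(len(db))]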
2 KEYPOINT-BASED IMAGE MATCHING
The essential elements of keypoint-based image matching
appeared in (Schmid and Mohr, 1997): the use of points of
interest, local descriptors computed around those points, a
dissimilarity criterion based on a vote-counting algorithm,
and a step of consistency checking on the matches before
the final vote count and ranking of the results. We use
the SIFT points of interest (Lowe, 2003) to describe the