The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part Bl. Beijing 2008
Image registration consists of four steps:
• Extraction of SIFT features
• Matching of SIFT features
• Removal of outliers among the matched features
• Selection of the transformation model between the two
images and estimation of its parameters
The last two steps are often combined, in particular, if outlier
detection is part of the parameter estimation procedure.
In the following section we briefly outline the SIFT algorithm
and describe the matching strategy. The third section presents
results of some experiments carried out with data from a LiDAR
test site in Stuttgart, Germany. The paper closes with some
conclusions and recommendations.
2. IMAGE REGISTRATION BASED ON SIFT
FEATURE MATCHING
2.1 The SIFT Algorithm
The Scale Invariant Feature Transform (SIFT) was introduced
by David Lowe (Lowe, 1999). SIFT is an approach for
detecting and extracting distinct features from images which
can be used to perform matching between overlapping images.
The features are invariant to image scale and rotation, and robust
with respect to changes in illumination, noise and, to some
extent, changes in the 3D camera viewpoint.
The detection of SIFT features comprises the following stages:
• Scale-space extrema detection
• Keypoint localization
• Orientation assignment
• Generation of keypoint descriptors
Keypoints for SIFT features correspond to scale-space extrema
at different scales. Therefore, the first step towards the
detection of extrema is filtering the images with Gaussian
kernels at different scales and generating the differences of
Gaussian-filtered images at adjacent scales.
Mathematically this reads as follows:
L(x, y, σ) = G(x, y, σ) * I(x, y) (2-1)
D(x, y, σ) = (G(x, y, kσ) - G(x, y, σ)) * I(x, y) (2-2)
where I is the image, G the Gaussian kernel and L the scale-
space image generated by convolution (*). D represents the
difference of Gaussian-filtered images, also called DoG.
The filtered images are organised in a kind of image pyramid
in which the blurred images are grouped by octave. An octave
corresponds to doubling the σ-value of the Gaussian kernel.
On each image scale (octave), a fixed number of blurred images
(sub-levels) is created. In the experiments (Section 3) we use 5
octaves with 5 differently blurred images on each octave.
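This scale-space construction (Eqs. 2-1 and 2-2) can be sketched in Python with NumPy/SciPy. The base sigma of 1.6 and the simple downsampling between octaves are assumptions for illustration, not values taken from the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, n_octaves=5, n_levels=5, sigma0=1.6):
    """Build a difference-of-Gaussian pyramid (Eqs. 2-1 and 2-2).

    sigma0 = 1.6 and halving the resolution between octaves are
    assumed here; the paper does not specify these values.
    """
    k = 2 ** (1.0 / (n_levels - 1))  # sigma doubles across one octave
    octaves = []
    base = image.astype(np.float64)
    for _ in range(n_octaves):
        # L(x, y, sigma) = G(x, y, sigma) * I(x, y) for each sub-level
        blurred = [gaussian_filter(base, sigma0 * k ** i) for i in range(n_levels)]
        # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
        dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        octaves.append(np.stack(dogs))
        base = base[::2, ::2]  # downsample for the next octave
    return octaves

pyr = dog_pyramid(np.random.rand(64, 64))
```

With 5 sub-levels per octave, each octave yields 4 DoG images, and the image size halves from one octave to the next.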
In order to detect the local maxima and minima of the DoG
images across scales, each pixel in the DoG images is compared
to its eight neighbours in the current image and the nine
neighbours in each of the neighbouring (higher and lower) scales. If the
pixel is a local maximum or minimum, it is selected as a
candidate keypoint.
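This 26-neighbour comparison can be sketched as follows (interior samples of a (scales, rows, cols) DoG stack only; boundary handling is omitted):

```python
import numpy as np

def is_scale_space_extremum(dog, s, y, x):
    """Return True if dog[s, y, x] is larger or smaller than all of its
    26 neighbours: 8 in the same DoG image and 9 in each of the
    adjacent (lower and higher) scales."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
    centre = patch[13]                 # index 13 = centre of the 3x3x3 cube
    neighbours = np.delete(patch, 13)  # the 26 surrounding samples
    return bool(centre > neighbours.max() or centre < neighbours.min())

# Toy DoG stack (scales, rows, cols): one bright pixel in the middle scale.
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
```

The bright pixel at (1, 2, 2) is a candidate keypoint, while its flat neighbours are not.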
The SIFT descriptor is a weighted and interpolated histogram of
the gradient orientations and locations in a patch surrounding
the keypoint. To determine the keypoint orientation, a gradient
orientation histogram is computed employing the
neighbourhood of the keypoint. Each neighbouring pixel
contributes by its gradient magnitude to determine the keypoint
orientation. Keypoints with low contrast are removed and
responses along edges are eliminated. All the properties of the
keypoint are measured relative to the keypoint orientation. This
provides invariance to rotation. Histograms contain 8 bins each
and each description contains an array of 4 histograms around
the keypoint which leads to a SIFT vector of 128 elements. For
more details, please refer to Lowe (1999, 2004).
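A heavily simplified sketch of this 4 x 4 x 8 histogram structure is given below. It omits the Gaussian weighting, trilinear interpolation and rotation normalisation of the full algorithm, and the 16 x 16 patch and cell sizes are illustrative assumptions:

```python
import numpy as np

def simplified_descriptor(patch):
    """Simplified SIFT-style descriptor of a 16x16 patch: a 4x4 grid of
    cells, each holding an 8-bin gradient-orientation histogram weighted
    by gradient magnitude; concatenated and normalised to unit length."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)            # [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            sl = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            # accumulate gradient magnitudes into the cell's 8 bins
            np.add.at(desc[cy, cx], bins[sl].ravel(), mag[sl].ravel())
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)           # unit length

d = simplified_descriptor(np.random.rand(16, 16))
```

The result is a 128-element unit vector, matching the dimensionality of the full SIFT descriptor.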
2.2 Matching SIFT Features
In the matching stage, the descriptor matrices of a reference and
a target image have to be compared. Matching is based on the
idea of finding the nearest neighbours in the descriptor matrices.
For robustness reasons the ratio of the nearest neighbour to the
second nearest neighbour is used. The Euclidean distance is
given by

d(y, z) = sqrt((y - z) · (y - z)) (2-3)
where y and z are two descriptor vectors. The inner product of the
vector (y - z) with itself,

(y - z) · (y - z) = y · y + z · z - 2 |y| |z| cos(∠(y, z)), (2-4)
shows that with normalized vectors y and z (length 1) the angle
α = ∠(y, z) = arccos(y · z) (2-5)
can be used as a good approximation within the minimum
search, in particular for small angles. For each target descriptor,
the first and second nearest descriptors must be found. Then a
pair of nearest descriptors gives a pair of matched keypoints if
the ratio between the distances to the first and second nearest
descriptors is lower than a given threshold t:
ratio (first, second) < t (2-6)
In the experiments we studied the impact of t on the matching
result by selecting t = 0.5, 0.6, 0.7, 0.8, 0.9, 1.
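The ratio test of Eq. 2-6 can be sketched as follows, ranking neighbours by the angle of Eq. 2-5. The descriptors are assumed normalised to unit length, and the default threshold t = 0.8 is simply one of the values studied above:

```python
import numpy as np

def match_ratio_test(ref, tgt, t=0.8):
    """Match two descriptor matrices with the nearest/second-nearest
    ratio test (Eq. 2-6). ref and tgt are (n, d) arrays of unit-length
    descriptors; for unit vectors arccos(y . z) (Eq. 2-5) is monotone
    in the Euclidean distance, so it can rank the neighbours."""
    angles = np.arccos(np.clip(tgt @ ref.T, -1.0, 1.0))  # (n_tgt, n_ref)
    matches = []
    for i, row in enumerate(angles):
        order = np.argsort(row)
        first, second = order[0], order[1]
        if row[first] / row[second] < t:                 # Eq. 2-6
            matches.append((i, int(first)))
    return matches

# Toy example: two orthogonal unit "descriptors", target equals the first.
ref = np.eye(3)[:2]
tgt = ref[:1].copy()
```

Here the target descriptor matches the first reference descriptor with ratio 0, so the pair (0, 0) passes the test.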
2.3 Removal of Outliers
Several methods exist for removing the outliers. RANSAC and
Baarda’s data snooping method are the methods that we have