Image registration consists of four steps: 
• Extraction of SIFT features 
• Matching of SIFT features 
• Removal of outliers in the matched features 
• Selecting the transformation model between the two 
images and estimating its parameters 
The last two steps are often combined, in particular, if outlier 
detection is part of the parameter estimation procedure. 
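
To make these steps concrete, the following minimal sketch uses the SIFT implementation of OpenCV rather than the implementation discussed in this paper; the file names, the ratio threshold and the choice of a projective (homography) model estimated with RANSAC are illustrative assumptions only.

```python
# Illustrative sketch only: OpenCV-based SIFT registration of a target image to a
# reference image. File names, threshold and transformation model are assumptions,
# not values taken from this paper.
import cv2
import numpy as np

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
tgt = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

# 1. Extraction of SIFT features
sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_tgt, des_tgt = sift.detectAndCompute(tgt, None)

# 2. Matching of SIFT features with the nearest/second-nearest ratio test (Section 2.2)
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_tgt, des_ref, k=2)
        if m.distance < 0.8 * n.distance]

# 3. + 4. Outlier removal and parameter estimation, here combined by estimating a
#         projective transformation with RANSAC (cf. Section 2.3)
src = np.float32([kp_tgt[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
registered = cv2.warpPerspective(tgt, H, (ref.shape[1], ref.shape[0]))
```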
In the following section we briefly outline the SIFT algorithm 
and describe the matching strategy. The third section presents 
results of some experiments carried out with data from a LiDAR 
test site in Stuttgart, Germany. The paper closes with some 
conclusions and recommendations. 
2. IMAGE REGISTRATION BASED ON SIFT FEATURE MATCHING 
2.1 The SIFT Algorithm 
The Scale Invariant Feature Transform (SIFT) was introduced 
by David Lowe (Lowe, 1999). SIFT is an approach for 
detecting and extracting distinct features from images which 
can be used to perform matching between overlapping images. 
The features are invariant to image scale and rotation and robust 
with respect to changes in illumination, noise and, to some 
extent, changes in the 3D camera viewpoint. 
Detection stages for the SIFT features are the following: 
• Scale-space extrema detection 
• Keypoint localization 
• Orientation assignment 
• Generation of keypoint descriptors 
Key points for SIFT features correspond to scale-space extrema 
at different scales. Therefore, the first step towards the 
detection of extrema is filtering the images with Gaussian 
kernels at different scales, and the generation of the difference 
of Gaussian filtered images of adjacent scales. 
Mathematically this reads as follows 
L(x, y, σ) = G(x, y, σ) * I(x, y)   (2-1)

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y)   (2-2)

where I is the image, G the Gaussian kernel and L the scale-space image generated by convolution (*). D represents the difference of Gaussian filtered images, also called DoG. 
The filtered images are organised in a kind of image pyramid 
in which the blurred images are grouped by octave. An octave 
corresponds to doubling the σ-value of the Gaussian kernel. 
On each image scale (octave), a fixed number of blurred images 
(sub-levels) is created. In the experiments (Section 3) we use 5 
octaves with 5 differently blurred images on each octave. 
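
The pyramid construction can be sketched as follows with NumPy/SciPy; the base scale sigma0 = 1.6 is an assumption borrowed from Lowe's description, while the 5 octaves with 5 blurred images per octave follow the setting used in the experiments.

```python
# Sketch of the scale-space construction of Eqs. (2-1) and (2-2): n_octaves octaves
# with n_levels Gaussian-blurred images each; sigma0 and the downsampling step are
# assumptions of this sketch, not details reported in the paper.
import numpy as np
from scipy.ndimage import gaussian_filter

def build_dog_pyramid(image, n_octaves=5, n_levels=5, sigma0=1.6):
    k = 2.0 ** (1.0 / (n_levels - 1))       # sigma doubles (one octave) over n_levels-1 steps
    gaussians, dogs = [], []
    img = image.astype(np.float64)
    for _ in range(n_octaves):
        octave = [gaussian_filter(img, sigma0 * k**i) for i in range(n_levels)]
        gaussians.append(octave)
        # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), Eq. (2-2)
        dogs.append([octave[i + 1] - octave[i] for i in range(n_levels - 1)])
        img = octave[-1][::2, ::2]          # downsample the most blurred image for the next octave
    return gaussians, dogs
```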
In order to detect the local maxima and minima of the DoG 
images across scales, each pixel in the DoG images is compared 
to its eight neighbours in the current image and the nine 
neighbours in the neighbouring (higher and lower) scales. If the 
pixel is a local maximum or minimum, it is selected as a 
candidate keypoint. 
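
As a sketch, the 26-neighbour comparison over one octave of DoG images could be implemented as follows (a brute-force version that omits the sub-pixel refinement of Lowe's method):

```python
# Sketch of the 26-neighbour extremum test: a pixel is a candidate keypoint if it is
# larger (or smaller) than its 8 neighbours in the current DoG image and the 9
# neighbours in each of the adjacent scales. dog_octave is a list of DoG images
# belonging to one octave, as produced e.g. by the pyramid sketch above.
import numpy as np

def find_extrema(dog_octave):
    candidates = []
    for s in range(1, len(dog_octave) - 1):
        below, current, above = dog_octave[s - 1], dog_octave[s], dog_octave[s + 1]
        rows, cols = current.shape
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                value = current[r, c]
                cube = np.stack([below[r-1:r+2, c-1:c+2],
                                 current[r-1:r+2, c-1:c+2],
                                 above[r-1:r+2, c-1:c+2]])
                neighbours = np.delete(cube.ravel(), 13)   # drop the centre pixel itself
                if value > neighbours.max() or value < neighbours.min():
                    candidates.append((r, c, s))
    return candidates
```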
The SIFT descriptor is a weighted and interpolated histogram of 
the gradient orientations and locations in a patch surrounding 
the keypoint. To determine the keypoint orientation, a gradient 
orientation histogram is computed employing the 
neighbourhood of the keypoint. Each neighbouring pixel 
contributes by its gradient magnitude to determine the keypoint 
orientation. Keypoints with low contrast are removed and 
responses along edges are eliminated. All the properties of the 
keypoint are measured relative to the keypoint orientation. This 
provides invariance to rotation. Each histogram contains 8 bins, 
and each descriptor contains a 4x4 array of histograms around 
the keypoint, which leads to a SIFT vector of 128 elements. For 
more details, please refer to Lowe (1999, 2004). 
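
A strongly simplified sketch of the resulting descriptor layout is given below; it only illustrates the 4x4 grid of 8-bin, magnitude-weighted orientation histograms that yields the 128 elements, and omits the rotation to the keypoint orientation as well as Lowe's Gaussian and trilinear interpolation weighting.

```python
# Simplified, illustrative descriptor layout: a 16x16 patch around the keypoint is
# split into a 4x4 grid of cells, each contributing an 8-bin gradient orientation
# histogram weighted by gradient magnitude, giving 4*4*8 = 128 elements.
import numpy as np

def sift_like_descriptor(patch16):                   # patch16: 16x16 intensity patch
    gy, gx = np.gradient(patch16.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)   # gradient direction in [0, 2*pi)
    descriptor = np.zeros((4, 4, 8))
    for cell_r in range(4):
        for cell_c in range(4):
            rows = slice(4 * cell_r, 4 * cell_r + 4)
            cols = slice(4 * cell_c, 4 * cell_c + 4)
            bins = (orientation[rows, cols] / (2 * np.pi) * 8).astype(int) % 8
            np.add.at(descriptor[cell_r, cell_c], bins.ravel(),
                      magnitude[rows, cols].ravel())
    vector = descriptor.ravel()                       # 128-element SIFT-like vector
    return vector / (np.linalg.norm(vector) + 1e-12)  # normalise to unit length
```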
2.2 Matching SIFT Features 
In the matching stage, the descriptor matrices of a reference and 
a target image have to be compared. Matching is based on the 
idea of finding the nearest neighbours in the descriptor matrices. 
For robustness reasons the ratio of the distance to the nearest 
neighbour to the distance to the second nearest neighbour is 
used. The Euclidean distance is given by

d(y, z) = √((y − z) ∘ (y − z))   (2-3)

where y and z are two descriptor vectors. The inner product of the 
vector (y − z) with itself

(y − z) ∘ (y − z) = y ∘ y + z ∘ z − 2|y||z| cos(∠(y, z))   (2-4)

shows that with normalized vectors y and z (length 1) the angle

α = arccos(y ∘ z)   (2-5)
can be used as a good approximation within the minimum 
search, in particular for small angles. For each target descriptor, 
the first and second nearest descriptors must be found. Then a 
pair of nearest descriptors gives a pair of matched keypoints if 
the ratio between the distances to the first and second nearest 
descriptors is lower than a given threshold t. 
ratio (first, second) < t (2-6) 
In the experiments we studied the impact of t on the matching 
result by selecting t = 0.5, 0.6, 0.7, 0.8, 0.9, 1. 
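
The matching rule of Equations (2-3) to (2-6) can be sketched as follows, using the angle between unit-length descriptors in the minimum search; the default threshold t = 0.8 is only an example value from the range studied in the experiments.

```python
# Sketch of the ratio-test matching of Eqs. (2-3)-(2-6): descriptors are normalised
# to unit length, the angle arccos(y ∘ z) replaces the Euclidean distance in the
# minimum search, and a pair is accepted if ratio(first, second) < t.
import numpy as np

def match_ratio_test(des_target, des_reference, t=0.8):
    y = des_target / np.linalg.norm(des_target, axis=1, keepdims=True)
    z = des_reference / np.linalg.norm(des_reference, axis=1, keepdims=True)
    angles = np.arccos(np.clip(y @ z.T, -1.0, 1.0))   # angle between every pair, Eq. (2-5)
    matches = []
    for i, row in enumerate(angles):
        first, second = np.argsort(row)[:2]           # nearest and second nearest descriptor
        if row[first] < t * row[second]:              # ratio test, Eq. (2-6)
            matches.append((i, first))
    return matches
```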
2.3 Removal of Outliers 
Several methods exist for removing the outliers. RANSAC and 
Baarda’s data snooping method are the methods that we have 
investigated. 
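
For the RANSAC option named above, a minimal sketch over matched keypoint coordinates might look as follows; the affine transformation model, the residual threshold and the iteration count are illustrative assumptions of this sketch, and Baarda’s data snooping is not shown.

```python
# Hedged sketch of RANSAC-based outlier removal over matched keypoint pairs,
# assuming a 2D affine transformation between the two images; threshold and
# iteration count are illustrative, not values from the paper.
import numpy as np

def ransac_affine(src_pts, dst_pts, n_iter=1000, threshold=3.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(src_pts)
    X = np.hstack([src_pts, np.ones((n, 1))])        # homogeneous source coordinates
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(n, 3, replace=False)     # 3 correspondences fix an affine map
        A, *_ = np.linalg.lstsq(X[sample], dst_pts[sample], rcond=None)
        residuals = np.linalg.norm(X @ A - dst_pts, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # re-estimate the affine parameters from all inliers
    A, *_ = np.linalg.lstsq(X[best_inliers], dst_pts[best_inliers], rcond=None)
    return A, best_inliers
```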
Thank you.