The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part Bib. Beijing 2008
this set of points will result in a large number of outliers. The
bottom image uses the SIFT operator to extract points, almost all
of which lie on the main building. Unlike the Harris operator,
SIFT needs no pre-processing step. Points extracted by SIFT are
highly distinctive because the operator relies on scale-space
extrema detection: detected points are local extrema with
respect to both space and scale.
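
As an illustration of what "local extrema with respect to both space and scale" means, the following minimal sketch tests one sample of a difference-of-Gaussian stack against its 26 neighbours (8 in the same layer and 9 in each adjacent scale), as in (Lowe, 2004). The function name and the nested-list layout `dog[scale][row][col]` are assumptions made for this example; building the stack and the later keypoint refinement steps are not shown.

```python
def is_scale_space_extremum(dog, s, r, c):
    """Return True if dog[s][r][c] is a strict maximum or minimum
    among its 26 neighbours in space and scale.

    dog is a hypothetical stack of difference-of-Gaussian layers,
    indexed as dog[scale][row][col].
    """
    # Only interior samples have a full 3x3x3 neighbourhood.
    if not (0 < s < len(dog) - 1
            and 0 < r < len(dog[s]) - 1
            and 0 < c < len(dog[s][r]) - 1):
        raise ValueError("sample must not lie on the border of the stack")

    value = dog[s][r][c]
    neighbours = [
        dog[s + ds][r + dr][c + dc]
        for ds in (-1, 0, 1)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (ds, dr, dc) != (0, 0, 0)
    ]
    is_max = all(value > n for n in neighbours)
    is_min = all(value < n for n in neighbours)
    return is_max or is_min
```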
2.2 Computation of fundamental matrix
The fundamental matrix expresses the geometric relation
between two views. The general method needs at least 8
corresponding points, $m_i \leftrightarrow m'_i$, to solve linearly
for the matrix $F$ that satisfies the condition
$m_i'^{T} F m_i = 0$. With more than 8 pairs of points, a
least-squares approach minimizes the cost function in equation (1).
As described in (Hartley, 2000) and (Marc, 2004), the structure
of a scene and the motion of the camera can be recovered from a
single view or from multiple views. In this paper we consider the
multi-view case. The critical problem in multi-view
reconstruction is finding corresponding features in the images.
In a complex man-made scene, even an advanced point extraction
algorithm such as the Scale-invariant feature transform (SIFT)
(Lowe, 2004) still produces many wrong matches. In such cases, a
traditional least-squares approach fails to compute the
fundamental matrix, so a robust method is needed.
2.1 Feature extraction and matching
Typical point extraction and matching approaches use the Harris
operator to extract corner points in each view separately and
then compare them under an intensity constraint using a
dissimilarity measure, e.g. sum of squared differences (SSD) or
zero-mean normalized cross-correlation (ZNCC). These measures
are invariant only to image translation, and the size of the
measuring window is difficult to choose, especially in regions
with repeated or deficient texture. Therefore we need an
advanced approach like SIFT to cope with large variations in
camera pose.
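
The two window measures named above can be sketched as follows. This is a minimal illustration, assuming each window has already been cut out and flattened into a list of grey values of equal length; the function names are chosen for this example. Note that SSD is a dissimilarity (lower is better), whereas ZNCC is a similarity score in [-1, 1] (higher is better), so a matcher would typically compare 1 - ZNCC or negate the score.

```python
import math


def ssd(a, b):
    """Sum of squared differences between two equally sized windows."""
    if len(a) != len(b):
        raise ValueError("windows must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two windows.

    Returns a value in [-1, 1]. For a window with no texture the
    correlation is undefined; returning 0.0 in that case is an
    assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("windows must have the same number of pixels")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom
```

Because the mean is subtracted and the result is normalized, ZNCC is unchanged by a gain and offset applied to one window, while SSD is not.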
$$C(F) = \sum_i \left( d(m'_i, F m_i)^2 + d(m_i, F^{T} m'_i)^2 \right) \tag{1}$$
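
The linear estimate and the cost in equation (1) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, points given as (N, 2) arrays of matching rows, and d(., .) taken as the Euclidean distance from a point to the corresponding epipolar line. The function names are hypothetical, and the rank-2 enforcement step is a standard addition that the text above does not describe.

```python
import numpy as np


def eight_point(m, mp):
    """Linear least-squares estimate of F from N >= 8 pairs m[i] <-> mp[i]."""
    x, y = m[:, 0], m[:, 1]
    xp, yp = mp[:, 0], mp[:, 1]
    # Each pair contributes one row of A f = 0, obtained by expanding
    # m'^T F m = 0 with f holding the entries of F in row-major order.
    A = np.column_stack(
        [xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones(len(m))]
    )
    # The right singular vector of the smallest singular value
    # minimizes |A f| subject to |f| = 1.
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Assumed extra step: force rank 2 by zeroing the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    s[2] = 0.0
    return u @ np.diag(s) @ vt


def symmetric_epipolar_cost(F, m, mp):
    """Sum over all pairs of d(m', F m)^2 + d(m, F^T m')^2."""
    ones = np.ones((len(m), 1))
    mh = np.hstack([m, ones])      # homogeneous points, first view
    mph = np.hstack([mp, ones])    # homogeneous points, second view
    lines_2 = mh @ F.T             # row i is F m_i (line in second view)
    lines_1 = mph @ F              # row i is F^T m'_i (line in first view)
    algebraic = np.sum(mph * lines_2, axis=1)   # m'^T F m for each pair
    d2_sq = algebraic ** 2 / (lines_2[:, 0] ** 2 + lines_2[:, 1] ** 2)
    d1_sq = algebraic ** 2 / (lines_1[:, 0] ** 2 + lines_1[:, 1] ** 2)
    return float(np.sum(d2_sq + d1_sq))
```

With pixel coordinates, the points are usually translated and scaled before the linear solve (Hartley's normalized eight-point algorithm) to improve conditioning; that step is omitted here for brevity.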
When more than 50% of the matches are outliers, the
least-squares approach fails. We use a well-established
estimation method, RANdom SAmple Consensus (RANSAC)
(Fischler, 1981), to detect outliers. The results before and
after outlier removal are shown in Figure 3. Each figure
superimposes two views, and the two end points of each red line
in the figure denote a corresponding point pair.
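
The RANSAC idea can be sketched as a generic loop: fit a model to a small random sample, count the data that agree with it, and keep the largest consensus set. The code below is a minimal sketch rather than the procedure used in this paper. The names `fit` and `error`, the fixed iteration count and the threshold are all assumptions of the example. For the fundamental matrix, `fit` would be an eight-point solver applied to 8 sampled pairs and `error` the per-pair term of equation (1).

```python
import random


def ransac(data, fit, error, sample_size, threshold, n_iter=1000, seed=0):
    """Generic RANSAC loop.

    data        : list of observations (e.g. corresponding point pairs)
    fit         : callable mapping a list of observations to a model,
                  or to None if the sample is degenerate
    error       : callable (model, observation) -> non-negative residual
    sample_size : minimal number of observations needed by fit
    threshold   : residual below which an observation counts as an inlier
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # Hypothesis from a minimal random sample.
        sample = rng.sample(data, sample_size)
        model = fit(sample)
        if model is None:
            continue
        # Consensus set: all observations consistent with the hypothesis.
        inliers = [d for d in data if error(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < sample_size:
        raise ValueError("no consensus set found")
    # Final model re-estimated from the whole consensus set.
    return fit(best_inliers), best_inliers
```

Observations outside the returned consensus set are the detected outliers. In practice the number of iterations is often chosen from the expected outlier ratio instead of being fixed in advance.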