International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B5. Istanbul 2004
the required 3D geometry. If it were possible to automat-
ically find a set of features between overlapping images,
then it would be possible to run the bundle adjustment, and
compute the exterior orientation. In the past, such match-
ing has been done only in restricted situations. These re-
strictions are that the motion between the images is rela-
tively small, and that there is some a-priori process which
labels images as overlapping. For example, if the images
were taken from a video sequence the baseline between
then is small and they are also likely to have significant
overlap.
If we consider the set of images that are used in a typi-
cal photogrammetry project these two restrictions do not
hold. The input to the photogrammetric model building
process is a set of unordered images with a wide physical
separation, that is with a wide baseline. There have been a
number of recent attempts to compute reliable correspon-
dences between such images. The most complete system
(Schaffalitzky and Zisserman, 2002) is a wide baseline, un-
ordered matching system that uses affine invariants as the
feature descriptor. This system attempts to construct a min-
imal spanning tree of the adjacency graph and it does not
allow this spanning tree to form cycles. However, cycles
are very important in the bundle adjustment process be-
cause they reduce the propogation of errors. Another ap-
proach (Ferrari et al., 2003) tries to expand on the number
of correspondences between adjacent views by using the
concept of tracks. The method of (Martinec and Pajdla,
2002) also has tracks, but uses a trilinear tensor to increase
the number of tracks that have been found. There is also
a variety of different descriptors (Baumberg, 2000, Pritch-
ett and Zisserman, 1998) that have been used as features,
other than the corner based approaches.
This paper describes a way of solving the wide baseline,
unordered image matching problem which can interface
with standardized photogrammetric packages. The input
is a set of images without any intrinsic camera parame-
ters, and the output is a set of correspondences between
these images. These correspondences are then sent to a
commercial photogrammetric package, which uses them
to compute the exterior orientation of the cameras. While
the bundle adjustment process requires camera calibration,
the correspondence process does not. The pixel size, and
aspect ratio for the camera, can be obtained from the cam-
era manufacturers data. Since many modern digital cam-
eras have a zoom lens this camera parameter is not easily
obtained without a formal calibration step. The automatic
correspondence process also has the ability to automati-
cally find the focal length of the camera. Thus our system
computes the correspondences between images and auto-
matically calibrates the focal length.
Our approach computes the fundamental matrix between
image pairs, and the trilinear tensor between image triples.
It is based on the SIFT corner detector (Lowe, 1999), which
has been shown experimentally to be the most robust cor-
ner features (Mikolajczk and Schmid, 2003). The cor-
respondences that support the trilinear tensor are chained
strung together in longer tracks using a breadth first search.
This method allows cycles in the image adjacency graph,
and also uses a chained trilinear tensor, which makes it dif-
ferent from other approaches. The idea of using chained
trilinear tensors was first described in (Roth and White-
head, 2000), and has been shown to create very reliable
correspondences. If there are n input images one would
expect the overall complexity of this process to be O(n?),
However, in practice, the complexity is actually O(m?),
where m is the average number of images that have sign-
ficant overlap with any given image. We will show ex-
perimentally that m is typically about one half of n, the
total number of images. Therefore O(rm?) complexity is
not excessive because 7n is relatively small. A typical pho-
togrammetric project has thirty images (1?) and there are
at most ten to fifteen overlapping images (m) for gach of
these thirty images. The total processing time to compute
all the correspondences in such a case is about fifteen min-
utes on a 2 gigahertz machine. Finally, as a by product
we can use the calcuated fundamental matrices to perform
autocalibration of the camera focal length.
3 ALGORITHM STEPS
The input is an unordered set of images. The output is a set
of correspondences among these images. The processing
is as follows:
For every image pair compute
a fundamental matrix.
For all fundamental matrices
that have enough supporting matches
compute an associated trilinear tensor.
Chain the trilinear tensor matches
using bread first search.
Output the chained matches
as valid correspondences.
4 PROJECTIVE VISION
Structure from motion algorithms assume that a camera
calibration is available. This assumption is removed in the
projective paradigm. To explain the basic ideas behind the
projective paradigm, we must define some notation. As
in (Xu and Zhang, 1996), given a vector x = [x,y,-]”,
x defines the augmented vector created by adding one as
the last element. The projection equation for a point in 3D
space defined as X — [X, Y, Z]". is:
sm = PX
where s is an arbitrary scalar, P is a 3 by 4 projection ma-
trix, and m — [x, y]" is the projection of this 3D point onto
the 2D camera plane. If this camera is calibrated, then the
calibration matrix C, containing the information particu-
lar to this camera (focal length, pixel dimensions, etc.) is
known. If the raw pixel co-ordinates of this point in the
camera plane are u — [u, v]? , then:
ü= Cm
re P p—t "m mr
al N M RENS, jud rdc Se P Tj 7
M hant a Reyund ^ ume unm s 0 Jum IU NN