CIPA 2003 XIXth International Symposium, 30 September-04 October 2003, Antalya, Turkey
3. PHOTOGRAMMETRIC RESTITUTION
A subset of 10 Rollei images covering a part of the first
courtyard was selected to compare the 3-D reconstruction
process normally used in close-range photogrammetry with
methods preferably applied in computer vision. The digital
images were taken parallel to the object facade, with
relatively short distances between the camera positions. Image
data were obtained by manual pointwise measurement within
the AICON DPA-Pro and PhotoModeler software. Then,
interior and exterior orientation parameters as well as 3-D
coordinates of object points were determined by self-calibrating
bundle adjustment. Fig. 3 shows the resulting
visualization of this part of the castle.
Figure 3. Wartburg Castle: Part of the first courtyard
4. FULLY AUTOMATIC SEQUENCE ORIENTATION,
AUTO-CALIBRATION, AND 3D RECONSTRUCTION
The goal of this part was to investigate whether a sequence of
images, about which it is known only that they are
perspective and that they mutually overlap, is sufficient for a
metric reconstruction of the scene. Additionally, we were
interested in comparing the automatically computed camera
calibration information with the given calibration information.
4.1 Sequence Orientation Based on the Trifocal Tensor
While other approaches use image pairs as their basic building
block (Pollefeys, 2002), our solution for the fully automatic
orientation of an image sequence relies on triplets which are
linked together (Hao and Mayer, 2003). To deal with the
complexity of larger images, image pyramids are employed. By
using the whole image as the search space, the approach works
without parameter adjustment for a large number of different
types of scenes.
The basic problem for the fully automatic computation of the
orientation of images of an image sequence is the
determination of (correct) correspondences. We tackle this
problem by using point features and by separating valid from
invalid correspondences using the redundancy in image triplets.
In particular, we make use of the trifocal tensor (Hartley and
Zisserman, 2000) and RANSAC (random sample consensus;
Fischler and Bolles, 1981). Like the fundamental matrix for
image pairs, the trifocal tensor provides a linear means of
describing the relation between three perspective images.
It is only this linearity that makes it feasible to obtain a solution
when no approximate values are given. RANSAC, on the other
hand, provides a means of finding a solution when many blunders
exist.
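The RANSAC principle can be illustrated with a minimal sketch. For
brevity it fits a 2D line rather than a trifocal tensor, so the model,
the function names (fit_line, ransac_line), the threshold, and the
iteration count are all assumptions made for this illustration and are
not taken from the approach described here. Only the scheme is the
same: draw a minimal random sample, compute a model from it, count the
observations that agree with the model, and keep the best model.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Return the line with the largest consensus set and that set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Minimal sample: two points define a line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Consensus set: points within the distance threshold of the line.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus three blunders (hypothetical data).
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 30.0), (7.0, -12.0), (4.0, 50.0)]
model, inliers = ransac_line(points)
```

In the setting of this paper, the minimal sample would consist of
point correspondences, and the consensus test would measure how well
the remaining correspondences agree with the estimated tensor.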
In practice, points are first extracted with the Förstner
operator. In the first image, the number of points is reduced by
regional non-maximum suppression. The points are then
matched by (normalized) cross-correlation, and sub-pixel
precise coordinates are obtained by least-squares matching. To
cope with the computational complexity of larger images, we
employ image pyramids. On the coarsest level of the image
pyramid, with a size of approximately 100 x 100 pixels, we use
the whole image as the search space and determine
fundamental matrices for image pairs. From the fundamental
matrices, epipolar lines are computed, which reduce the search
space on the next level. There, the trifocal tensor is
determined. With it, a point given in two images can be
projected into the third image, which makes it possible to check
a triple of matches, i.e., to sort out blunders. For large images, the
trifocal tensor is also computed on the third coarsest level. To
achieve highly precise and reliable results, projection matrices
are determined after the linear solution, and with them a
robust bundle adjustment is computed for the pairs as well as
for the triplets.
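How a fundamental matrix restricts the search space can be sketched as
follows. The helper names, the band width of 1.5 pixels, and the
example matrix (a pure horizontal shift between two ideal cameras) are
assumptions chosen for illustration. The sketch computes the epipolar
line l' = F x for a point x of the first image and keeps only those
candidate points of the second image that lie close to this line.

```python
def epipolar_line(F, x):
    """Epipolar line l' = F x in image 2 for the pixel x = (u, v) in image 1."""
    xh = (x[0], x[1], 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][c] * xh[c] for c in range(3)) for r in range(3))

def distance_to_line(line, x):
    """Perpendicular distance of the pixel x = (u, v) from line (a, b, c)."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / (a * a + b * b) ** 0.5

def candidates_near_line(F, x, points2, band=1.5):
    """Candidates in image 2 within `band` pixels of the epipolar line of x."""
    line = epipolar_line(F, x)
    return [p for p in points2 if distance_to_line(line, p) <= band]

# Hypothetical fundamental matrix for a pure shift along the image x-axis:
# every epipolar line is the image row with the same v coordinate.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
x1 = (40.0, 25.0)
points2 = [(12.0, 25.4), (60.0, 24.0), (33.0, 70.0), (5.0, 26.9)]
kept = candidates_near_line(F, x1, points2)
```

Here only the first two candidates survive, because they lie within
1.5 pixels of the row v = 25. Restricting the matching to such a band
replaces a search over the whole image by a search along a line.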
To orient the whole sequence, the triplets are linked. This is
done in two steps. First, the image points in the second and
third image of the nth triplet are projected into the third image
of the (n+1)th triplet by means of the known trifocal tensor of the
(n+1)th triplet. As the (projective) 3D coordinates of the nth
triplet are known, the orientation of this third image in the
projective space of the nth triplet can be computed via inverse
projection. To obtain high precision, a robust bundle
adjustment is employed. In the second step, 3D coordinates in
the coordinate system defined by the nth triplet are determined
linearly for all points in the (n+1)th triplet that have not
been computed before. The solution is again improved by
robust bundle adjustment. Starting with the first image, this
incrementally yields the projective projection matrices for
all images as well as the 3D points. Once the sequence has been
oriented on the two or three coarsest levels of the
image pyramid, the 3D points are finally projected into all
images via the computed projection matrices. The resulting
points are then tracked over one or two levels through the
pyramid.
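The first linking step, determining the projection matrix of a new
image from known projective 3D points, can be written as a linear
problem. The following is one standard formulation (direct linear
transformation) and is given only as an illustration; the symbols
P, X_i, u_i, v_i, and p_k are notation assumed here, and the text does
not state that exactly this formulation is used.

```latex
% Projection of a known 3D point X_i (homogeneous 4-vector) into the new
% image with unknown 3x4 matrix P, whose rows are p_1^T, p_2^T, p_3^T:
\begin{align}
  \lambda_i \begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} &= P\,X_i ,
  \qquad \lambda_i \neq 0 . \\
% Eliminating the unknown scale \lambda_i gives two linear equations
% per point in the twelve entries of P:
  \begin{pmatrix}
    X_i^{T} & 0^{T} & -u_i\,X_i^{T} \\
    0^{T} & X_i^{T} & -v_i\,X_i^{T}
  \end{pmatrix}
  \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} &= 0 .
\end{align}
```

As P has eleven degrees of freedom, at least six points are needed.
With more points, the stacked system is solved in the least-squares
sense, and the result serves as the initial value for the robust
bundle adjustment.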
Figure 4 gives results for the first four images of the sequence
taken with the Rollei d30 metric 5 camera. One can see that the
points have been tracked well, even for the wall close to
the camera, where the disparities are rather large. For the
whole sequence of ten images we obtained 56 10-fold, 41
9-fold, 91 8-fold, 71 7-fold, 55 6-fold, 30 5-fold, 41 4-fold,
and 44 3-fold matches after robust adjustment with a standard
deviation of 0.08 pixels.
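The match counts above can be summarized with a few lines of
arithmetic. The sketch assumes that every n-fold match corresponds to
one distinct object point observed in n images; the variable names
are chosen for illustration.

```python
# n-fold matches reported for the ten-image sequence: {n: number of matches}
tracks = {10: 56, 9: 41, 8: 91, 7: 71, 6: 55, 5: 30, 4: 41, 3: 44}

num_points = sum(tracks.values())                                # 429
num_observations = sum(n * count for n, count in tracks.items()) # 2930
mean_track_length = num_observations / num_points                # about 6.83
```

Under this assumption the sequence yields 429 points with 2930 image
observations, i.e., each point is seen in about 6.8 of the ten images
on average.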