The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B5. Beijing 2008
reference image, the only unknown is the relative scale, which is computed as the median of the ratios of the distances to the five 3D points in the two models.
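The median-of-ratios computation can be sketched as follows (a minimal illustration, not the authors' code; the function name, point lists, and reference centres are our assumptions):

```python
from statistics import median

def relative_scale(points_a, points_b, center_a=(0.0, 0.0, 0.0),
                   center_b=(0.0, 0.0, 0.0)):
    """Relative scale between two metric reconstructions.

    points_a / points_b are corresponding 3D points (here: the five
    points mentioned in the text) in the two models; the centers are
    hypothetical reference camera centres. The scale is the median of
    the ratios of the point-to-centre distances, which is robust
    against individual outlier points.
    """
    def dist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c)) ** 0.5

    ratios = [dist(pb, center_b) / dist(pa, center_a)
              for pa, pb in zip(points_a, points_b)]
    return median(ratios)
```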
We employ image pyramids to make the procedure more efficient. Because of this, we can afford to use the whole image as the initial search space, though on the highest pyramid level, with a typical resolution of 100 × 100 pixels. On the second and third highest levels, the epipolar lines derived from the essential matrices and the trifocal constraints, respectively, are employed.
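The coarse-to-fine scheme can be illustrated by building a pyramid via repeated 2× downsampling until the coarsest level has roughly the stated resolution (a sketch under our own assumptions; the function name and the simple box filter are not from the paper):

```python
import numpy as np

def build_pyramid(image, coarsest_size=100):
    """Image pyramid by repeated 2x box-filter downsampling.

    Levels are added until the next level would be smaller than
    `coarsest_size` pixels, the resolution at which the whole image
    serves as the initial search space.
    """
    levels = [np.asarray(image, float)]
    while min(levels[-1].shape[:2]) // 2 >= coarsest_size:
        a = levels[-1]
        # crop to even dimensions, then average 2x2 blocks
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]
        down = (a[0::2, 0::2] + a[1::2, 0::2] +
                a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        levels.append(down)
    return levels
```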
After reconstructing the triplets, they are linked based on overlapping images. E.g., triplets consisting of images 1-2-3 and 2-3-4 overlap in images 2 and 3. For these two images, projection matrices can be computed from the trifocal tensors (Hartley and Zisserman, 2003), and from them in turn a Euclidean transformation mapping from the first to the second triplet. In (Mayer, 2007b) we have shown how to speed up linking by conducting it hierarchically, at the same time avoiding a bias in the estimation process due to the combination of sequences of very different lengths (e.g., when one links 3 images to 90 images). During linking we also track points by projecting them into newly linked images and determining the image coordinates via LSM, resulting in highly precise n-fold points.
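The transformation between two overlapping triplet frames can be illustrated by a closed-form Procrustes/Umeyama fit to corresponding 3D points (a sketch under the assumption that corresponding points or camera centres in both frames are given; the paper derives the transformation from the projection matrices instead):

```python
import numpy as np

def similarity_from_points(X, Y):
    """Estimate scale s, rotation R, translation t with Y ~ s*R*X + t.

    X, Y: corresponding 3D points as (N, 3) arrays, expressed in the
    coordinate frames of two overlapping triplets. Closed-form
    least-squares (Umeyama) solution.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    mx, my = X.mean(0), Y.mean(0)
    Xc, Yc = X - mx, Y - my
    U, S, Vt = np.linalg.svd(Yc.T @ Xc)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:   # guard against reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Xc ** 2).sum()
    t = my - s * R @ mx
    return s, R, t
```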
The linking of the triplets is done on the second or third highest level of the pyramid, depending on the image size. After linking, the points are projected into the original resolution images, once again producing highly accurate relative coordinates by means of LSM.

After all steps we employ robust bundle adjustment (McGlone et al., 2004). E.g., also when estimating essential matrices and trifocal tensors, we compute a bundle solution every couple of hundred iterations, as we found that only the maximum likelihood bundle solution is reliable for difficult sequences (Mayer, 2008).

The outcome of the above process consists of the relative orientations of the cameras as well as 3D points. The coordinate system is fixed to the first camera, and the scale is determined by the base from the first to the second camera, the length of which is set to one.

While this gives basic information about the 3D structure of the scene, it does not allow, e.g., the computation of visibility. Dense depth maps would be ideal for this, but no standard robust approach for their computation is available. Recent approaches such as (Strecha et al., 2004, Lhuillier and Quan, 2005, Hirschmüller, 2008) all have their shortcomings.

(Pollefeys et al., 2008) have shown dense depth maps computed in real time for extended areas, but the resulting 3D model suffers from occlusions and incorrect shapes, as no information about the imaged objects is included. (Cornelis et al., 2008) make use of the knowledge that facades or similar objects are imaged by employing ruled surfaces parallel to the vertical direction. This improves the result, but some non-vertical objects are still not reconstructed with their correct shape. Finally, we note that (Pollefeys et al., 2008) and (Cornelis et al., 2008) both employ dense video data, which considerably restricts the search space, thus allowing for real-time processing on graphics processing units (GPUs).

As we focus on urban scenes, where planes are abundant and often describe important objects such as walls, we decided to determine planes from the 3D points. In particular, we follow (Mayer, 2007a). Because the vertical direction is predominant in urban scenes, we determine it first from the image of the vanishing point, i.e., the intersection point of the projections of the vertical lines of the scene into the images, computed by means of RANSAC. Orienting the whole scene vertically helps considerably in determining the boundaries of the partially vertical planes.

The planes themselves are also obtained by RANSAC, followed by least squares adjustment. For the planes, two parameters must be given by the user: a threshold on the distance of points from the plane, and the maximum distance between points on the plane. The latter avoids large planes consisting of a dense cluster of correct points together with a few randomly distributed points which by chance lie on the plane.

For each plane, the texture is determined, also for partially occluded regions, by means of a consensus approach (Mayer, 2007a). The latter allows the correct texture to be reconstructed even if it is visible in less than 50% of the images which can see the particular region. The results of plane reconstruction have been used for facade interpretation (Mayer and Reznik, 2007, Reznik and Mayer, 2007).
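The plane extraction with the two user-given parameters (the point-to-plane distance threshold and the maximum distance between points on the plane) can be sketched with RANSAC as follows. This is a minimal illustration under our own assumptions: the connectivity test is a simple nearest-neighbour check rather than a full connected-components analysis, and the refinement is a plain eigenvector fit, not the authors' adjustment.

```python
import numpy as np

def ransac_plane(points, dist_thresh, max_gap, iters=500, seed=0):
    """RANSAC plane fit with a distance threshold and a gap limit.

    dist_thresh: maximum point-to-plane distance for inliers.
    max_gap: maximum distance of an inlier to its nearest other
             inlier; discards isolated points that lie on the plane
             only by chance.
    Returns (normal n, offset d, inlier indices) with n . x = d.
    """
    pts = np.asarray(points, float)
    rng = np.random.default_rng(seed)
    best = (None, None, np.array([], int))
    for _ in range(iters):
        a, b, c = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-12:       # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = n @ a
        idx = np.nonzero(np.abs(pts @ n - d) <= dist_thresh)[0]
        if len(idx) >= 3:
            # drop inliers farther than max_gap from every other inlier
            P = pts[idx]
            gaps = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
            np.fill_diagonal(gaps, np.inf)
            idx = idx[gaps.min(1) <= max_gap]
        if len(idx) > len(best[2]):
            best = (n, d, idx)
    n, d, idx = best
    if len(idx) >= 3:                        # least-squares refinement
        P = pts[idx]
        c = P.mean(0)
        _, _, Vt = np.linalg.svd(P - c)
        n = Vt[-1]
        d = n @ c
    return n, d, idx
```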
3. 3D RECONSTRUCTION FROM IMAGES FROM A
MICRO UAV
A Micro UAV is a very small and light UAV. It is therefore well suited to exploring built-up areas: it makes it possible to fly through streets and into courtyards and to take images of buildings and their facades there from an off-ground perspective, independently of ground conditions or obstacles on the ground.
In our first experiments we investigated whether images from a Micro UAV can be used for 3D reconstruction. We employed a quad-copter, i.e., a UAV with four rotors, with a diameter of 1 meter and a weight under 1 kg. It carried a ten megapixel consumer camera. Figure 1 shows the planned image configuration “Circle”.
Figure 1. Planned image configuration “Circle”.