AUTOMATED 3D RECONSTRUCTION OF URBAN AREAS FROM NETWORKS OF
WIDE-BASELINE IMAGE SEQUENCES
Helmut Mayer, Jan Bartelsen
Institute of Geoinformation and Computer Vision, Bundeswehr University Munich - (Helmut.Mayer,
Jan.Bartelsen)@unibw.de, www.unibw.de/ipk
KEY WORDS: Computer Vision, Virtual Landscape, Close Range Photogrammetry, Visualization, Urban Planning
ABSTRACT:
The efficient automated reconstruction of highly detailed 3D models of urban areas for visualization and analysis is an active area of
research for diverse applications ranging from surveillance to architecture. A flexible and cheap data source is wide-baseline image sequences acquired with hand-held consumer cameras with several to tens of megapixels. Image sequences are particularly suitable
for the reconstruction of 3D structures along linear objects such as roads. This paper presents an approach for 3D reconstruction from
image sequences taken with a weakly calibrated camera with no need for approximations for position and attitude, markers on the
ground, or even ground control. The generated 3D reconstruction result is relative, i.e., the scale is not known, but Euclidean, that is,
right angles are preserved. The paper shows that the approach can produce a 3D reconstruction consisting of points, camera
positions and orientations, as well as vertically oriented planes from image sequences taken with a Micro Unmanned Aerial Vehicle
(UAV) under challenging wind conditions and without navigation information. Finally, the paper discusses how sequences can be
linked into networks, or images into blocks, clarifying which image configurations exist and how they can be adequately treated
when prior knowledge about them is available.
1. INTRODUCTION
A recent special issue of the International Journal of Computer
Vision on “Modeling and Representations of Large-Scale 3D
Scenes” (Zhu and Kanade, 2008) with a special focus on urban
areas exemplifies the importance of the field with applications
in “mapping, surveillance, transportation planning, archaeology,
and architecture” (Zhu and Kanade, 2008). Of particular interest
are (Pollefeys et al., 2008, Cornelis et al., 2008) which, like us,
employ images as primary data source, yet with a focus on
video data taken from cars and using GPS and INS data.
Contrary to this, our approach for 3D reconstruction aims at wide-baseline scenarios with basically no need for approximate values for position and attitude or markers in the scene.
While our previous work was on uncalibrated cameras (Mayer,
2005), we now assume that the camera is weakly calibrated,
meaning that principal distance and point as well as shear are
known up to a couple of percent. Based on this assumption we
can use the five-point algorithm of (Nistér, 2004) which makes the
reconstruction much more stable, particularly for (nearly) planar
scenes.
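For background, the five-point algorithm estimates the essential matrix E = [t]_x R of a calibrated image pair from five point correspondences. The following is a minimal numpy sketch with synthetic geometry and names of our own choosing, not the estimator itself; it only verifies the epipolar constraint the algorithm enforces:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Synthetic calibrated two-view geometry: rotation R and translation t.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                       # essential matrix E = [t]_x R

# Random 3D points in front of both cameras, projected to
# normalized image coordinates (calibration already applied).
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(5, 3))
x1 = X / X[:, 2:]                     # first camera at the origin
X2 = (R @ X.T).T + t                  # points in the second camera frame
x2 = X2 / X2[:, 2:]

# The epipolar constraint x2^T E x1 = 0 holds for every correspondence.
residuals = [float(b @ E @ a) for a, b in zip(x1, x2)]
print(max(abs(r) for r in residuals))
```

Given five such correspondences, the actual estimator solves for E directly; the sketch only demonstrates the constraint that makes this possible.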
While no approximations for position and attitude are needed
and the images may be rotated relative to each other,
the images of the sequence still have to fulfill certain constraints
to obtain a useful result. First of all, all triplets of images in the
sequence have to overlap significantly, to allow for the reliable
propagation of 3D structure. Additionally, for reliable matching, the appearance of the visible objects should not
change too severely from image to image and there should not
be large areas with occlusions.
We introduce our approach to 3D reconstruction from
wide-baseline image sequences in Section 2. Besides camera
orientations we reconstruct 3D points and from them planes
which are a good means to describe dense 3D structure in urban
areas, e.g., to determine visibility.
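Planes can be robustly fitted to reconstructed 3D points, e.g., with RANSAC. The following is a minimal numpy sketch on synthetic data, not our actual plane extraction (which additionally enforces vertical orientation); the iteration count and distance threshold are illustrative:

```python
import numpy as np

def fit_plane_ransac(points, n_iter=500, inlier_thresh=0.05, seed=0):
    """Fit a dominant plane to 3D points with a basic RANSAC loop.

    points: (N, 3) array. Returns ((normal, d), n_inliers) where the
    plane is normal . x + d = 0.
    """
    rng = np.random.default_rng(seed)
    best = (None, 0)
    for _ in range(n_iter):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                     # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ p0
        # Count points within the distance threshold of the plane.
        n_in = int(np.sum(np.abs(points @ normal + d) < inlier_thresh))
        if n_in > best[1]:
            best = ((normal, d), n_in)
    return best

# Synthetic test: noisy points on the vertical plane x = 1, plus outliers.
rng = np.random.default_rng(1)
on_plane = np.column_stack([1.0 + 0.01 * rng.standard_normal(100),
                            rng.random(100), rng.random(100)])
outliers = rng.random((20, 3)) * 5.0
(normal, d), n_in = fit_plane_ransac(np.vstack([on_plane, outliers]))
print(n_in, normal)
```

In practice the winning plane would be refined by a least-squares fit to all its inliers before use, e.g., for visibility analysis.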
This sets the stage for the 3D reconstruction from image sequences
taken from a Micro Unmanned Aerial Vehicle (UAV) presented
in Section 3. Despite the strongly varying and unrecorded position and attitude of the camera, we could still orient the images and produce a 3D model including textured planes.
The experience with the UAV led us to analyze different imaging configurations: sequences that can be linked at their ends or also in between, in both cases leading to networks, as well as more random configurations that can result in image blocks. In Section 4 we show how the different configurations can be adequately treated. We end with conclusions.
2. 3D RECONSTRUCTION FROM WIDE-BASELINE IMAGE SEQUENCES
Our current approach for 3D reconstruction from wide-baseline
image sequences extends (Mayer, 2005) to a (weakly) calibrated
setup. It starts by extracting points (Förstner and Gülch, 1987).
The eigenvectors of the points are employed to normalize the
orientation of the image patches (Mayer, 2008) subsequently
used for cross-correlation employing color information. If the correlation score is above a low threshold of 0.5, affine least squares matching (LSM) is used. Matches are checked a second
time via the correlation score after matching, this time with a
more conservative threshold of 0.8.
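The two-threshold gating described above can be sketched as follows. Here `refine` is a hypothetical stand-in for affine LSM, and grayscale patches are used instead of color for brevity:

```python
import numpy as np

T_LOW, T_HIGH = 0.5, 0.8   # thresholds from the text

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_patches(p1, p2, refine):
    """Two-stage check: cheap NCC gate, refinement, strict NCC gate.

    `refine` stands in for affine least squares matching (LSM); here it
    is just a callable returning the (possibly warped) second patch.
    """
    if ncc(p1, p2) < T_LOW:             # reject clearly dissimilar patches
        return None
    p2_refined = refine(p1, p2)         # LSM would estimate an affine warp
    if ncc(p1, p2_refined) < T_HIGH:    # conservative post-LSM check
        return None
    return p2_refined

# Identical patches pass both gates (identity "refinement").
patch = np.arange(25.0).reshape(5, 5)
result = match_patches(patch, patch.copy(), refine=lambda a, b: b)
print(result is not None)  # → True
```

The cheap first gate avoids running the comparatively expensive LSM on hopeless candidates; the stricter second gate discards matches that LSM could not improve sufficiently.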
From corresponding points in two or three images essential
matrices or calibrated trifocal tensors (Hartley and Zisserman,
2003) are robustly computed using the five-point algorithm of (Nistér, 2004) in conjunction with Random Sample Consensus -
RANSAC (Fischler and Bolles, 1981). To obtain a more reliable
solution we employ the geometric robust information criterion - GRIC of (Torr, 1997). For three images the five-point algorithm is employed twice with the same reference image and the
same five points in the reference image. Because of the