In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
AUTOMATED SELECTION OF TERRESTRIAL IMAGES FROM SEQUENCES
FOR THE TEXTURE MAPPING OF 3D CITY MODELS
Sébastien Bénitez and Caroline Baillard
SIRADEL, 3 allée Adolphe Bobierre CS 24343, 35043 Rennes, France
sbenitez@siradel.com
KEY WORDS: Building, Texture, Image, Sequences, Terrestrial, Automation.
ABSTRACT:
The final purpose of this study is to texture map existing 3D building models using calibrated images acquired with a terrestrial
vehicle. This paper focuses on the preliminary step of automatically selecting texture images from a sequence. Although not
technically complex, this step is particularly important for large-scale facade mapping, where thousands of images may be
available. Three methods inspired by well-known computer graphics techniques are compared: one is 2D-based and relies on the
analysis of a 2D map; the two other methods use the information provided by a 3D vector database describing buildings. The 2D
approach is satisfactory in most cases, but facades located behind low buildings cannot be textured. The 3D approaches provide
more exhaustive wall textures. In particular, a wall-by-wall analysis based on 3D ray tracing is a good compromise to achieve a
relevant selection whilst limiting computation.
1. INTRODUCTION
With the development of faster computers and more accurate
sensors (cameras and lasers), the automatic and large-scale
production of a virtual 3D world very close to ground truth has
become realistic. Several research laboratories around the world
have been working on this issue for some years. Früh and
Zakhor have proposed a method for automatically producing
3D city models using a land-based mobile mapping system
equipped with lasers and cameras; the laser points are registered
with an existing Digital Elevation Model or vector map, then
merged with aerial LIDAR data (Früh and Zakhor, 2003; Früh
and Zakhor, 2004). At the French National Geographical
Institute (IGN), the mobile mapping system Stereopolis has
been designed for capturing various kinds of information in
urban areas, including laser points and texture images of
building facades (Bentrah et al., 2004). The CAOR laboratory
from ENSMP has also been working on a mobile system named
LARA-3D for the acquisition of 3D models in urban areas
(Brun et al., 2007; Goulette et al., 2007), based on laser point
clouds, a fish-eye camera, and possibly an external Digital
Elevation Model. Recently, a number of private companies
have commercialized their own mobile mapping systems for 3D
city modeling, such as StreetMapper or TITAN (Hunter, 2009;
Mrstik et al., 2009). Such systems often aim at both the 3D
modeling and the texture mapping of the resulting models.
In this study we are interested in texturing existing 3D building
models by mapping terrestrial images onto the provided façade
planes. As a part of the mapping strategy, one first needs to
determine from which images each façade can be seen. This is
particularly important for large-scale facade texture mapping
where thousands of images can be available. Every single
image can be relevant for the final texturing stage. There are
few references on this issue. In (Pénard et al., 2005), a 2D map
is used to extract the main building facades and the
corresponding images. All the images viewing at least a part of
a façade are selected. In (Haala, 2004), a panoramic camera is
used and a single image is sufficient to provide texture for
many façades. Given a façade, the best view is the one
providing the highest resolution. It is selected by analyzing the
orientations and distances of the building facades in relation to
the camera stations. In (Allène, 2008), a façade is represented
by a mesh. Each face of the mesh is associated with one input
view by minimizing an energy function combining the total
number of texels representing the mesh in the images, and the
color continuity between two neighbouring faces.
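The resolution-based criterion of (Haala, 2004) described above can be illustrated with a small sketch. The function below is our own hypothetical code, not taken from any of the cited systems: it works on a 2D ground plan and favours close, frontal views, which yield the highest texture resolution. All names and parameters are illustrative assumptions.

```python
import math

def view_score(cam_pos, cam_dir, wall_p0, wall_p1):
    """Score a camera view of a wall segment in a 2D ground plan.

    Higher is better: the score favours frontal, close-range views.
    cam_pos is the camera centre, cam_dir a unit viewing direction,
    wall_p0/wall_p1 the wall endpoints. Illustrative sketch only.
    """
    # Wall midpoint
    mx = 0.5 * (wall_p0[0] + wall_p1[0])
    my = 0.5 * (wall_p0[1] + wall_p1[1])
    # Unit vector from camera to wall midpoint
    vx, vy = mx - cam_pos[0], my - cam_pos[1]
    dist = math.hypot(vx, vy)
    if dist == 0.0:
        return 0.0
    vx, vy = vx / dist, vy / dist
    # Wall normal (perpendicular to the wall direction); the vertex
    # ordering is unknown here, so the absolute cosine is used below
    wx, wy = wall_p1[0] - wall_p0[0], wall_p1[1] - wall_p0[1]
    wlen = math.hypot(wx, wy)
    nx, ny = -wy / wlen, wx / wlen
    # Frontality: how squarely the view ray hits the wall plane
    facing = abs(vx * nx + vy * ny)
    # The camera must look roughly towards the wall
    looking = vx * cam_dir[0] + vy * cam_dir[1]
    if looking <= 0.0:
        return 0.0
    # Resolution decreases with distance; modulate by both angles
    return facing * looking / dist
```

With this score, a camera 10 m in front of a wall ranks above an identical camera 30 m away, matching the intuition that the closer, frontal view provides more texels per square metre of façade.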
In our study, only two triangles per facade are available, and a
façade texture generally consists of a mixture of 4 to 12 input
views. The following mapping strategy has been chosen for
texturing a given façade:
- Pre-selecting a set of relevant input images, from
  which the façade can be seen;
- Merging these images into a single texture image;
- Registering the texture image with the existing façade
  3D model.
This paper only focuses on the first stage. The purpose of this
operation is to select a set of potentially useful images based on
purely geometrical criteria. The generation of a seamless
texture image without occlusion artifacts will be handled within
the second stage. Three possible approaches for the image pre
selection are presented and discussed. The first approach is
similar to the one used in (Pénard et al., 2005) and relies on the
analysis of a 2D map. The two other methods use the
information provided by a 3D vector database describing
buildings. All methods are based on standard techniques
commonly used in computer graphics for visibility
computations, namely the ray-tracing and z-buffering
techniques (Strasser, 1974). These two techniques have now
been used for decades and are very well known in the computer
graphics community. They can easily be optimized and
accelerated via a hardware implementation.
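As a minimal illustration of the ray-tracing idea in the 2D (map-based) case, the sketch below tests whether the midpoint of a wall is occluded by the footprint edges of other buildings. This is our own hypothetical code under simplifying assumptions (one ray per wall, 2D only), not the paper's implementation.

```python
def _ccw(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, a, b):
    """True if segments pq and ab properly intersect (endpoints excluded)."""
    d1, d2 = _ccw(p, q, a), _ccw(p, q, b)
    d3, d4 = _ccw(a, b, p), _ccw(a, b, q)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def wall_visible(cam, wall_p0, wall_p1, occluders):
    """2D ray-tracing visibility test: can cam see the wall midpoint?

    cam is a 2D camera position; occluders is a list of (p0, p1)
    footprint edges belonging to other buildings. A single ray from
    the camera to the wall midpoint is traced against every edge.
    """
    mid = (0.5 * (wall_p0[0] + wall_p1[0]),
           0.5 * (wall_p0[1] + wall_p1[1]))
    return not any(segments_cross(cam, mid, a, b) for a, b in occluders)
```

A full implementation would trace several rays per wall (not just the midpoint) and use a spatial index over the footprint edges to avoid testing every edge for every ray; the z-buffer alternative instead rasterizes all façades into a depth buffer per camera.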
This paper is organized as follows. Section 2 presents the test
data set used for this study. Sections 3, 4 and 5 detail the three
selection methods. The results and perspectives are discussed in
section 6.
2. TEST DATASET
The test area is a part of the historical center of the city of
Rennes in France. It covers 1 km² and corresponds to the
densest part of the city. Existing 3D building models were
provided with an absolute accuracy of around 1 m. The area
contains 1475 buildings consisting of 11408 walls. The texture image database
associated with the area was simulated via a virtual path created
through the streets. A point was created every 5 meters along
this path. Each point is associated with two cameras facing the left
and the right sides of the path. The camera centers are located
at 2.3 meters above the ground in order to simulate a vehicle
height. The internal and external parameters of the cameras are
approximately known. The path is about 4.9 kilometers long,