MULTI-IMAGE FUSION FOR OCCLUSION-FREE FACADE TEXTURING
Jan Böhm
Institut für Photogrammetrie, Universität Stuttgart, Germany
Jan.Boehm@ifp.uni-stuttgart.de
Commission V, WG V/2
KEY WORDS: Texture, Fusion, Rectification, Terrestrial Imagery
ABSTRACT
Facade texturing is a key point for realistic rendering of virtual city models, since it suggests to the viewer a level of detail that is much
higher than actually provided by the geometry. Facade textures are usually derived from terrestrial imagery acquired from a position
on the ground. One problem frequently encountered is the disturbance of texture images by partial occlusions of the facade. The occluding objects can be moving objects, such as pedestrians and cars, or static objects, such as trees and street signs. This paper presents a method for detecting and removing these disturbances, enabling the generation of occlusion-free texture images by multi-image fusion.
1 INTRODUCTION
Today many approaches for the creation of three-dimensional virtual city models are reported in the literature. The applications of such models are as widespread as the approaches for their generation (Haala et al., 2002). When it comes to visualizing virtual city models, which is a key point in most applications, facade texturing is essential for realistic rendering. Because the modern media and entertainment industry employs highly sophisticated computer equipment and specialists in the field, today's audience has high expectations of the quality of computer-generated visualizations. This raises the demand for high-quality texturing in virtual city models. One problem, and a major cause of poor quality in facade textures, is the disturbance of images by partial occlusion of the facade by other objects, such as pedestrians, cars, trees, street signs and so on. Especially in inner-city areas
it is impossible to avoid these occlusions, as the choices for the
camera stations are limited. Therefore strategies for the detection
and removal of these disturbances are essential.
The creation of facade textures is usually a labor-intensive man-
ual task involving the acquisition of terrestrial images of the fa-
cade, the rectification of the images, the mapping onto the ge-
ometry of the model and various improvements to the imagery.
Automated approaches reported in the literature solve this task by projecting primitives (triangles or texels) from the images to the object geometry (El-Hakim et al., 1998). This
requires absolute orientation of the images, derived from bundle
adjustment. If more than one mapping is available for a primitive,
redundancy can be used to remove occlusions. Our approach differs in that it does not explicitly map image points to 3D geometry; rather, we warp the images in 2D using a perspective transformation. Thereby, pixel-level registered image sequences are generated, providing the redundancy to eliminate occluding objects by means of background estimation.
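As a minimal sketch of this idea, such a registered stack can be fused per pixel; the temporal median is used below as a simple stand-in for the background estimator (it is not the clustering-based method detailed in sections 3 and 4), and the file names are placeholders.

import cv2
import numpy as np

def fuse_registered_stack(paths):
    # Load the rectified, pixel-registered texture images into one array.
    stack = np.stack([cv2.imread(p) for p in paths]).astype(np.float32)
    # At each pixel the facade is visible in most frames, while an occluder
    # covers it in only a few; the median therefore recovers the background.
    return np.median(stack, axis=0).astype(np.uint8)

# Example call (hypothetical file names):
# texture = fuse_registered_stack(["rect_01.png", "rect_02.png", "rect_03.png"])

The median succeeds only where each pixel shows the facade in the majority of frames; short sequences and persistent occluders call for the more robust estimation discussed later.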
The occluding objects can roughly be categorized into two classes: moving objects and static objects. To detect and remove moving objects from terrestrial images of a facade, it is sufficient to acquire a sequence of images from a single camera station. This image sequence can be processed with algorithms from the field of video-sequence analysis known as background estimators, normally used for change detection. However, these algorithms usually require long sequences (>100 images) to converge to a satisfactory result. Acquiring such long sequences with high-resolution digital cameras would significantly increase the time and effort of texture acquisition. Therefore we explore alternative approaches suitable for short sequences (≈10 images).
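For illustration, a classic change-detection estimator of the kind referred to above maintains a recursive running average of the incoming frames; the sketch below is a generic textbook formulation, not the method of this paper, and the smoothing factor alpha is chosen arbitrarily.

import numpy as np

def running_average_background(frames, alpha=0.05):
    background = frames[0].astype(np.float32)
    for frame in frames[1:]:
        # Exponential smoothing: each new frame only nudges the model,
        # so transient occluders fade out slowly, which is why such
        # estimators need long sequences to converge.
        background = alpha * frame.astype(np.float32) + (1.0 - alpha) * background
    return background.astype(np.uint8)

With only about ten frames, such a recursive model remains dominated by its initialization, which is why alternatives suited to short stacks are explored instead.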
For the case of static objects, a single camera station is insuffi-
cient. Images from several different stations have to be acquired
and fused. We solve the fusion of these images without entering
the three-dimensional domain, avoiding disproportionate computational costs and complexity. Instead we attempt to solve the problem solely in the two-dimensional image domain by mapping the image onto the planar surface of the facade. This step of mapping,
also referred to as rectification, is always involved when using
terrestrial imagery for facade texturing. We have studied both
manual and automated approaches for rectification in the past.
When an occluding object lies in front of the facade plane at a
certain distance, its mapping to the plane varies across the image
sequence, due to the oblique angles of the different camera sta-
tions. Thereby the case of a static occluding object is transformed into the case of a moving object in the rectified sequence, and hence
the same approaches for detection and removal as in the case of
moving objects can be applied.
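The rectification referred to above can be sketched with a standard four-point perspective warp; the following OpenCV snippet is illustrative only, with made-up point coordinates, file names and output size.

import cv2
import numpy as np

def rectify(image, img_pts, facade_pts, size):
    # Warp `image` so that the four image points map onto the four
    # facade-plane points (all coordinates in pixels).
    H = cv2.getPerspectiveTransform(np.float32(img_pts), np.float32(facade_pts))
    return cv2.warpPerspective(image, H, size)

# Hypothetical correspondences: facade corners measured in the image,
# and their positions in a target texture of 1024 x 768 pixels.
img = cv2.imread("facade_station_01.jpg")                 # placeholder file
corners_img = [(412, 80), (1590, 130), (1555, 990), (450, 1020)]
corners_tex = [(0, 0), (1023, 0), (1023, 767), (0, 767)]
texture = rectify(img, corners_img, corners_tex, (1024, 768))

Applying the corresponding warp to every image of the sequence yields the pixel-registered stack used for fusion; an occluder standing in front of the facade plane then appears at a different position in each rectified image, so the background estimation described above can remove it.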
Section 2 introduces the task of facade texturing and reviews current approaches. Sections 3 and 4 detail our approach for image fusion and occlusion removal based on clustering. Examples of real facade imagery with occlusions and the
results of our processing are presented.
2 FACADE TEXTURING
A large number of systems exist for the 3D modeling of urban
environments. Based on either aerial imagery or laser scan data
they usually create polyhedral descriptions of individual build-
ings (see figure 2). Texturing these polyhedral structures greatly
enhances their visual realism and suggests a much higher level of detail to the viewer, as can be seen in figure 1. When aerial im-
agery is used, roof structures can be textured automatically. For
the texturing of the vertical facades terrestrial imagery is needed.
Facade texturing has received some attention in both the pho-
togrammetric and the computer vision community. Approaches
for manual, semi-automatic and fully automated texture extrac-
tion have been presented in the past. Some of the systems com-
bine the reconstruction process with the texture extraction.
The simplest approach is to use a single image and warp this
image onto the planar facade using the perspective transformation T_P. Four points in the planar coordinate system of the facade and the corresponding points in the image have to be determined. Assuming that all facades are bounded by a rectangular outline, the approach can be further simplified by splitting the perspective transformation into an intermediate perspective and into