In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
IMPROVING IMAGE SEGMENTATION USING MULTIPLE VIEW ANALYSIS
Martin Drauschke, Ribana Roscher, Thomas Läbe, Wolfgang Förstner
Department of Photogrammetry, Institute of Geodesy and Geoinformation, University of Bonn
martin.drauschke@uni-bonn.de, rroscher@uni-bonn.de, laebe@ipb.uni-bonn.de, wf@ipb.uni-bonn.de
KEY WORDS: Image Segmentation, Aerial Image, Urban Scene, Reconstruction, Building Detection
ABSTRACT
In our contribution, we improve image segmentation by integrating depth information from multi-view analysis. We
assume that the object surface in each region can be represented by a low order polynomial, and we estimate the best
fitting parameters of a plane using those points of the point cloud that are mapped to the specific region. We merge
adjacent image regions that cannot be distinguished geometrically. We demonstrate the approach by finding spatially
planar regions in aerial images. Furthermore, we discuss the possibilities of extending our approach towards segmenting
terrestrial facade images.
1 INTRODUCTION
The interpretation of images showing building scenes is a
challenging task, due to the complexity of the scenes and
the great variety of building structures. As far as human
perception is understood today, humans can easily group
visible patterns and use their shape to recognize objects,
cf. (Hoffman and Richards, 1984) and (Treisman, 1986).
Segmentation, understood as image partitioning, is often
the first step towards finding basic image patterns. Early
image segmentation techniques are discussed in (Pal and
Pal, 1993). Since then, many other algorithms have been
proposed within the image analysis community: data-driven
approaches often define grouping criteria based on the color
contrast between regions or on textural information.
Model-driven approaches often work well only on simple
scenes, e.g. simple building structures with a flat or gabled
roof, but they are limited when analyzing more complex scenes.
Since we are interested in identifying entities of more than
two classes, e.g. buildings, roads and vegetation objects,
we cannot perform an image division into fore- and background
as summarized in (Sahoo et al., 1988). Our segmentation
scheme partitions the image into several regions.
It is very difficult to divide an image into regions if some
regions are recognizable by a homogeneous color, others
have a significant texture, and others are separable based
on saturation or intensity, e.g. (Fischer and Buhmann, 2003)
and (Martin et al., 2004). Moreover, such boundaries are
often not consistent with geometric boundaries. According
to (Binford, 1981), there are seven classes of boundaries
depending on illumination, geometry and reflectivity.
Therefore, geometric information should be integrated into
the segmentation procedure.
Our approach is motivated by the interpretation of building
images, aerial and terrestrial, where many surface patches
can be represented by low order polynomials. We assume a
multi-view setup with one reference image and its intensity-based
segmentation, which is then improved by exploiting
the 3D information from the depth image derived from all
images. Using the determined orientation data, we are able
to map each 3D point to a unique region. Assuming that the
object surface in each region is planar, we can estimate a
plane through the selected points. Adjacent regions are
merged if they have similar planes. Finally, we
obtain an image partition with regions representing dominant
object surfaces such as building parts or the ground. We are
convinced that the derived regions are much better suited for
an object-based classification than the regions of the initial
segmentation, because many of them have simple, characteristic
shapes.
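
To make the geometric step concrete, the following is a minimal sketch, in Python with NumPy, of the kind of computation described above: 3D points are projected into the reference image with a 3x4 projection matrix to assign them to segmentation regions, a least-squares plane is fitted to the points of each region via the eigenvector of the point scatter matrix belonging to the smallest eigenvalue, and two planes are compared with simple angle and distance thresholds. All function names and threshold values are illustrative assumptions; the actual estimation and similarity tests used in sections 4 and 5 may differ.

    import numpy as np

    def assign_points_to_regions(points, P, labels):
        # Project 3D points into the reference image (3x4 matrix P) and
        # look up the segmentation label at each projected pixel.
        X = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
        x = (P @ X.T).T
        u = np.round(x[:, 0] / x[:, 2]).astype(int)
        v = np.round(x[:, 1] / x[:, 2]).astype(int)
        h, w = labels.shape
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        regions = {}
        for p, ui, vi in zip(points[ok], u[ok], v[ok]):
            regions.setdefault(labels[vi, ui], []).append(p)
        return {r: np.asarray(pts) for r, pts in regions.items()}

    def fit_plane(pts):
        # Least-squares plane: centroid plus the eigenvector of the
        # point scatter matrix with the smallest eigenvalue as normal.
        c = pts.mean(axis=0)
        d = pts - c
        _, vecs = np.linalg.eigh(d.T @ d)  # eigenvalues in ascending order
        return c, vecs[:, 0]

    def planes_similar(c1, n1, c2, n2, max_angle=5.0, max_dist=0.2):
        # Merge criterion (illustrative thresholds): nearly parallel
        # normals and small mutual point-to-plane distance.
        angle = np.degrees(np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0)))
        dist = max(abs(n1 @ (c2 - c1)), abs(n2 @ (c1 - c2)))
        return angle < max_angle and dist < max_dist

Adjacent regions whose fitted planes pass such a test would then be merged, e.g. with a union-find structure over the region adjacency graph.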
The paper is structured as follows. In sec. 2 we discuss
recent approaches of combining image and point cloud
information, mostly with a focus on building reconstruction.
Then in sec. 3 we briefly sketch our approach for
deriving a dense point cloud from three images. So far, our
approach is semi-automatic due to the manual setting of the
point cloud's scale, but we discuss the possibility of
automating all its steps. In sec. 4 we present how we estimate
the most dominant plane in the dense point cloud, restricted
to those points that are mapped to pixels of the same region.
The merging strategy is presented in sec. 5. Here we
only study the segmentation of aerial imagery and present
our results in sec. 6. Adaptations for segmenting facade
images are discussed in each step separately. We summarize
our contribution in the final section.
2 COMBINING POINT CLOUDS AND IMAGES
The fusion of imagery with LIDAR data has been done
successfully in the field of building reconstruction. In
(Rottensteiner and Jansa, 2002) regions of interest for
building extraction are detected in the set of laser points,
and planar surfaces are estimated in each region. Then the
color information of the aerial image is used to merge adjacent
coplanar point cloud parts. Conversely, in (Khoshelham,
2005) regions are extracted from image data, and the spatial
arrangement of corresponding points of a LIDAR point
cloud is used as a property for merging adjacent regions.
In (Sohn, 2004) multispectral imagery is used to classify
vegetation in the LIDAR point cloud using a vegetation index.
The advantage of using LIDAR data is that one works with
precisely positioned points and very few outliers. The
disadvantage is its expensive acquisition, especially for
aerial scenes. Hence, we prefer to derive a point cloud
from multiple image views of an object.
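
The specific vegetation index used in (Sohn, 2004) is not stated here; a common choice on multispectral aerial imagery is the normalized difference vegetation index (NDVI), sketched below as an illustrative assumption with a scene-dependent threshold.

    import numpy as np

    def ndvi(nir, red):
        # Normalized difference vegetation index per pixel; values
        # near +1 indicate vegetation, values near 0 or below do not.
        return (nir - red) / (nir + red + 1e-9)

    # Hypothetical usage with near-infrared and red bands of a
    # multispectral image; the 0.3 threshold is an assumption and
    # depends on sensor and scene:
    # vegetation_mask = ndvi(nir_band, red_band) > 0.3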