The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B4. Beijing 2008
much detail, it is impressive to see that their system achieves
reasonable results for hundreds of buildings and runs on an
operational basis. (Rottensteiner et al., 2005; Rottensteiner,
2006) extract low-level features for building reconstruction
from multispectral images and surface models. Building
parameters are estimated in a consistent way by considering
geometrical regularities employing soft constraints, while false
and conflicting hypotheses are eliminated through a robust
estimation. Recent approaches also make use of the distinct 3D
characteristics of buildings, and some of them also focus on
facades, which can be included by acquiring terrestrial data.
(Dorninger & Nothegger, 2007) present an approach for 3D
segmentation, which can make full use of high resolution 3D
points from laser scanning (up to 20 points per m²) or image
matching that also show, at least partially, the vertical facades. The
approach relies on a full 3D representation of planes in
parameter space and clustering the points in this space. Results
for points from laser and image data prove that very high quality
models can be generated. (Zebedin et al. 2006, 2007) show how
facade planes derived from dense 3D points can be refined by
sweeping the planes and projecting them into all views where
they are visible. This yields very good estimates not only for
the plane but also for the texture, as demonstrated by a
number of convincing examples. An alternative way to close the gap
between facade structures missing in classical nadir data and
roof structures missing in terrestrial data is to capture the
scene with an oblique-looking sensor. (Hebel & Stilla 2007)
show this for the case of an oblique-viewing laser scanner. Yet
this approach, too, requires a high-precision multi-aspect registration
of the point clouds before the data can be processed further.
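The idea of representing planes in parameter space and clustering the points there, as described above for (Dorninger & Nothegger, 2007), can be illustrated with a minimal sketch. It is not the authors' implementation: the local plane fit by PCA over k nearest neighbours, the (normal, offset) parameterization, the grid binning used as a stand-in for clustering, and all function names and bin sizes are assumptions made for illustration only.

```python
import numpy as np

def local_plane_parameters(points, k=6):
    """Fit a plane to the k nearest neighbours of every point (brute force).

    Returns one row (nx, ny, nz, d) per point with n . p = d.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    params = np.empty((len(points), 4))
    for i, idx in enumerate(neighbours):
        patch = points[idx]
        centroid = patch.mean(axis=0)
        # Eigenvector of the smallest eigenvalue approximates the normal.
        _, vecs = np.linalg.eigh(np.cov((patch - centroid).T))
        n = vecs[:, 0]
        # Resolve the sign ambiguity: make the dominant component positive.
        n = n * np.sign(n[np.argmax(np.abs(n))])
        params[i] = (*n, n @ centroid)
    return params

def cluster_planes(params, normal_bin=0.1, offset_bin=0.1):
    """Group points whose plane parameters fall into the same grid cell."""
    scaled = params / np.array([normal_bin] * 3 + [offset_bin])
    keys = [tuple(row) for row in np.round(scaled).astype(int)]
    labels_by_key = {}
    return np.array([labels_by_key.setdefault(key, len(labels_by_key))
                     for key in keys])

# Synthetic scene (assumed): a horizontal roof patch and a vertical facade.
g = np.arange(10) * 0.4
a, b = np.meshgrid(g, g)
roof = np.column_stack([1 + a.ravel(), b.ravel(), np.full(100, 5.0)])
facade = np.column_stack([np.zeros(100), b.ravel(), a.ravel()])
points = np.vstack([roof, facade])
labels = cluster_planes(local_plane_parameters(points))
```

Fixed-size bins split clusters that straddle a cell boundary, so a practical system would likely use a proper clustering method in parameter space; the binning here only keeps the sketch short.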
Recent approaches for road extraction also show a clear tendency
to integrate 3D information. (Clode et al. 2007),
for instance, use high resolution LIDAR height and intensity
data to delineate the road geometry in 3D. Primary road
hypotheses are generated by classification employing colour
intensities and height gradients. The result is then vectorized by
convolving a complex-valued disk (so-called Phase Coded
Disk) with the image. The Phase Coded Disk basically
represents the local features of the road model. Center line and
width of the road are obtained from the magnitude image while
the direction is determined from the corresponding phase image.
(Hinz 2004a, 2004b) employs a DSM and multiple-view
imagery to extract urban road networks. The extraction is based
on a detailed scale-dependent road and context model to deal
with the high complexity of this type of scene. The
corresponding extraction strategy is subdivided into three
levels: level 1 comprises the analysis of context, i.e., the
segmentation of the scene into the urban, rural, and forest areas
as well as the analysis of context relations, e.g., the
determination of shadow areas and the detection of vehicles;
level 2 includes the detection of homogeneous ribbons as
preliminary road segments in coarse scale, the collinear grouping
of thin bright road markings in fine scale, and the construction of
lanes and carriageways from groups of road markings and road
sides; level 3, finally, completes the road network by fusing
road segments detected in overlapping images, iteratively
closing gaps in the extraction, and exploiting the network
characteristics to generate a topologically complete road
network. A key feature of the approach is the incorporation of a
scheme for internal evaluation. Hypotheses generated during
extraction are internally evaluated so that their relevance for
further processing can be assessed.
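The vectorization step attributed above to (Clode et al. 2007) can be illustrated with a minimal sketch. It assumes a kernel of the form exp(j2θ) inside a disk, a binary road mask as input, and a circular FFT-based convolution; the function names, the radius and the synthetic road are illustrative assumptions and are not taken from the cited work.

```python
import numpy as np

def phase_coded_disk(radius):
    """Complex kernel exp(2j*theta) inside a disk, zero outside (assumed form)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = np.exp(2j * np.arctan2(y, x))
    disk[x**2 + y**2 > radius**2] = 0
    return disk

def convolve_circular(image, kernel):
    """Circular convolution via FFT with the kernel centre moved to the origin."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w), dtype=complex)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded))

def road_response(road_mask, radius):
    """Return the magnitude image and the direction image (half the phase)."""
    response = convolve_circular(road_mask.astype(float),
                                 phase_coded_disk(radius))
    return np.abs(response), np.angle(response) / 2.0

# Synthetic example (assumed): a horizontal road, 9 pixels wide, centred on row 32.
road = np.zeros((64, 64))
road[28:37, :] = 1.0
magnitude, direction = road_response(road, radius=15)
centre_row = int(np.argmax(magnitude[:, 10]))  # row of the magnitude ridge
```

In this sketch the magnitude ridge marks the center line and half the phase gives the road direction modulo π. Estimating the road width from the magnitude is omitted here.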
Typically, multispectral information can also be exploited since,
for both airborne and spaceborne sensors, the
acquisition of multispectral data in the visible domain has
reached a resolution regime in which multispectral analysis
can substantially support the extraction. For instance, (Mena &
Malpica, 2005) and similarly (Zhang & Couloigner, 2006) use
colour information to derive various statistical and textural
parameters. Classifying an image based on these features yields
potential road segments, which are cleaned and skeletonized
into road center axes. Such approaches show limitations when
applied to images of low resolution compared to the object size.
Dealing with mixed pixels is thus an important issue if roads are
to be mapped using satellite images. To cope with this, (Bacher
& Mayer, 2004, 2005) developed a two-step strategy. First,
training information for a supervised classification is obtained
from an initial step of road extraction with very strict parameter
settings. The results are fed into a multispectral classification to
generate a so-called road-class image, which can be interpreted
as an additional channel. These multi-channel data are
processed simultaneously with the line- and network-based road
extraction approach of (Wiedemann & Hinz, 1999). By means
of this strategy, the linear properties of roads in each channel
are exploited and - if available - supplemented with area-based
colour information. An alternative is shown in (Ziems et al.,
2007), where colour information is employed to better identify
false alarms when determining potential errors in existing road
databases, e.g., if GIS road axes run through fields or bushes.
To this end, a statistical analysis of the colour distribution
derived from potential road areas in comparison with trained
distributions is carried out.
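The two-step strategy of (Bacher & Mayer, 2004, 2005) described above can be sketched as follows. The text does not specify the classifier, so a single-class Gaussian model with a Mahalanobis-based score is swapped in here purely for illustration; the function names, the seed mask and the synthetic image are likewise assumptions.

```python
import numpy as np

def road_class_image(image, seed_mask):
    """Score every pixel by its similarity to the colours under seed_mask.

    image:     (H, W, bands) multispectral array
    seed_mask: (H, W) boolean mask from a strict first extraction step
    """
    h, w, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    train = pixels[seed_mask.ravel()]
    mean = train.mean(axis=0)
    # Small ridge term keeps the covariance invertible.
    cov = np.cov(train, rowvar=False) + 1e-6 * np.eye(bands)
    diff = pixels - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * d2).reshape(h, w)

def add_road_channel(image, seed_mask):
    """Append the road-class score as an additional channel."""
    return np.dstack([image, road_class_image(image, seed_mask)])

# Synthetic example (assumed): grey road on a greenish background.
rng = np.random.default_rng(0)
image = rng.normal([0.2, 0.6, 0.1], 0.02, size=(32, 32, 3))
road = np.zeros((32, 32), dtype=bool)
road[14:18, :] = True
image[road] = rng.normal([0.5, 0.5, 0.5], 0.02, size=(road.sum(), 3))
seed = road.copy()
seed[:, 16:] = False          # strict step found only the left half
stacked = add_road_channel(image, seed)
```

The extra channel can then be passed, together with the original bands, to a line-based extraction, which is the role the road-class image plays in the strategy described above.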
2.2 Integration of functional and temporal properties
With the increasing availability of airborne videos and highly
overlapping photogrammetric image sequences, the
integration of temporal features also becomes feasible. These show
great potential to add valuable information to
the geometric and radiometric properties of objects. This trend
can be seen in particular for road extraction where first
approaches for road mapping by activity analysis and car
tracking were recently developed. The work by (Pless 2006), for
instance, aims at detecting temporal changes in stabilized
airborne videos. It is based on a generic scheme to discern static
background from active foreground on the basis of eigenvalue
analysis. Especially in the case of dense traffic, active image
regions correspond to the main roads. While this approach is
mainly designed for the analysis of inner city areas, the
Bayesian car tracking system by (Koch et al. 2006) is able to
fuse multiple car tracks from different flight paths. By
employing a comprehensive Bayesian model for the sensor
characteristics as well as the detection and fusion scheme, the
inherent uncertainties in the physical and mathematical sensor
modelling and track hypothesis generation are handled in a very
consistent way. As a final step, the (potentially interrupted) car
tracks are transformed into object space and fused to
eventually delineate a precise and topologically intact road
network.
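The separation of static background from active foreground in a stabilized video can be illustrated with a minimal sketch. The eigenvalue scheme of (Pless 2006) is not detailed in the text, so a different, simpler technique is swapped in here: a rank-1 "eigenbackground" obtained by singular value decomposition, with the accumulated residual energy serving as an activity map. The function name and the synthetic sequence are assumptions.

```python
import numpy as np

def activity_map(frames):
    """Accumulate per-pixel energy not explained by a rank-1 background.

    frames: (T, H, W) stabilized image sequence
    """
    t, h, w = frames.shape
    x = frames.reshape(t, h * w).astype(float)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # The dominant singular pair models the static scene up to a gain per frame.
    background = s[0] * np.outer(u[:, 0], vt[0])
    residual = x - background
    return (residual ** 2).sum(axis=0).reshape(h, w)

# Synthetic example (assumed): static texture with varying illumination
# and one bright vehicle moving along row 5.
rng = np.random.default_rng(1)
static = rng.uniform(0.2, 0.8, size=(16, 16))
gains = rng.uniform(0.9, 1.1, size=16)
frames = gains[:, None, None] * static[None, :, :]
for step in range(16):
    frames[step, 5, step] += 1.0
activity = activity_map(frames)
```

Thresholding such an activity map would give candidate road regions in dense traffic, in the spirit of the observation above that active image regions correspond to the main roads.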
2.3 Integration of scale-space characteristics
The importance of incorporating the scale-space behaviour of
objects into automatic extraction approaches was
already recognized in the 1990s. It has been shown that
different levels of image resolution can be linked to certain