Gruen, Armin
models utilized (parametric, generic, functional, special) and models are used at different levels of complexity (e.g. a
signalized point versus a full house).
In object extraction, as in image analysis in general, there are two fundamentally different approaches: bottom-up and
top-down. Bottom-up is a data-driven strategy: it first extracts image primitives, groups them into higher-level
entities and, through a process of hypothesis generation and verification, builds up the complete objects. The
main problem here is the instability and ambiguity of segmentation at the lowest level. At the higher level of
object aggregation, techniques from artificial intelligence are used, such as constraint-based reasoning, uncertainty reasoning
with Dempster-Shafer theory, probabilistic relaxation, Bayesian reasoning, constraint satisfaction networks, semantic
networks and blackboards.
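Dempster-Shafer reasoning, one of the uncertainty-reasoning techniques listed above, fuses evidence from independent cues with Dempster's rule of combination. The following Python sketch is a minimal, generic implementation of that rule. The hypothesis labels (`building`, `tree`, `ground`) and all mass values are hypothetical and serve only to show how two cues can reinforce a building hypothesis; the sketch is not taken from any of the cited systems.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    A mass function maps frozensets of hypotheses (focal elements) to
    non-negative masses that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + ma * mb
        else:
            # Mass on contradictory pairs (empty intersection) is conflict.
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the combined masses sum to 1 again.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical frame of discernment for one image region.
THETA = frozenset({"building", "tree", "ground"})

# Cue 1 (height above terrain) supports "building or tree".
m_height = {frozenset({"building", "tree"}): 0.8, THETA: 0.2}
# Cue 2 (straight, parallel edges) supports "building".
m_edges = {frozenset({"building"}): 0.6, THETA: 0.4}

fused = combine_dempster(m_height, m_edges)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 2))
```

In this example the height cue alone cannot separate buildings from trees, but after fusion with the edge cue most of the belief moves to `building`. The division by `1 - conflict` discards mass that the two sources assign to mutually exclusive hypotheses.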
The top-down approach, which is model-driven, usually starts with hypotheses about the scene and tries to verify them
through compatibility checks against the available image data. Object models, often in explicit form, are indispensable
to this approach. In essence, the object data structure inferred from the image(s) is matched to the model data
structure (Haralick, Shapiro, 1993). While this concept has some justification in robotics and navigation, where the
environment may be of reduced complexity, it runs into serious problems in building extraction, because here the scene
knowledge is purely generic and the computational expense of hypothesis verification is prohibitively high.
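A top-down verification step can be sketched as scoring each model hypothesis by how well its predicted outline is supported by image evidence. The Python sketch below assumes a binary edge map and axis-aligned rectangular footprint hypotheses `(cx, cy, w, h)` in pixel units; the model, the `support` score and the synthetic data are simplifications chosen for illustration, not the method of any cited work. The cost of `verify` grows linearly with the number of hypotheses, and it is this number that explodes when hypotheses must be enumerated from purely generic scene knowledge.

```python
import numpy as np


def boundary_points(cx, cy, w, h, step=1.0):
    """Sample (x, y) points along the outline of an axis-aligned rectangle."""
    x0, x1 = cx - w / 2, cx + w / 2
    y0, y1 = cy - h / 2, cy + h / 2
    xs = np.arange(x0, x1 + 1e-9, step)
    ys = np.arange(y0, y1 + 1e-9, step)
    return np.vstack([
        np.column_stack([xs, np.full_like(xs, y0)]),  # top edge
        np.column_stack([xs, np.full_like(xs, y1)]),  # bottom edge
        np.column_stack([np.full_like(ys, x0), ys]),  # left edge
        np.column_stack([np.full_like(ys, x1), ys]),  # right edge
    ])


def support(edge_map, hypothesis):
    """Fraction of predicted outline samples that land on edge pixels."""
    pts = np.rint(boundary_points(*hypothesis)).astype(int)
    rows, cols = edge_map.shape
    ok = (pts[:, 0] >= 0) & (pts[:, 0] < cols) & (pts[:, 1] >= 0) & (pts[:, 1] < rows)
    hits = edge_map[pts[ok, 1], pts[ok, 0]].sum()
    # Samples falling outside the image count as unsupported.
    return hits / len(pts)


def verify(edge_map, hypotheses):
    """Return the best-supported hypothesis and its score."""
    scores = [support(edge_map, h) for h in hypotheses]
    best = int(np.argmax(scores))
    return hypotheses[best], scores[best]


# Synthetic edge map containing one rectangular building outline.
edges = np.zeros((40, 40), dtype=bool)
edges[15, 12:29] = edges[25, 12:29] = True
edges[15:26, 12] = edges[15:26, 28] = True

candidates = [(20, 20, 12, 10), (20, 20, 16, 10), (22, 18, 16, 10)]
best, score = verify(edges, candidates)
print(best, score)
```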
More recent approaches to building extraction use elements of both strategies together in an interrelated
manner. This seems to be the right way to approach the problem.
Other current trends in building extraction include the following aspects:
• multi-image approaches
• multi-cue algorithms
• fusion of various information sources
• Digital Surface Models (DSM) for detection and reconstruction
• derivation of DSMs by laser scanners
• generic roof modeling by decomposition into parts
• use of a priori knowledge from maps and GIS
• semi-automated reconstruction techniques
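DSM-based detection, listed above, is often introduced through the normalized DSM (nDSM = DSM - DTM), which keeps only heights above the terrain. The Python sketch below assumes co-registered DSM and DTM grids; the thresholds `min_height` and `min_area` are hypothetical values, and the function name is illustrative. A pure height threshold cannot by itself separate buildings from trees, which is one reason why multi-cue algorithms and fusion of information sources appear in the same list.

```python
from collections import deque

import numpy as np


def detect_building_candidates(dsm, dtm, min_height=2.5, min_area=4):
    """Return 4-connected regions of the nDSM that are high and large enough.

    Each region is a list of (row, col) pixel indices.
    """
    ndsm = dsm - dtm                 # height above terrain
    mask = ndsm > min_height         # elevated pixels only
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    regions, next_label = [], 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c]:
                continue
            # Breadth-first flood fill of one connected component.
            next_label += 1
            labels[r, c] = next_label
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            # Drop small blobs such as poles or isolated outliers.
            if len(pixels) >= min_area:
                regions.append(pixels)
    return regions


# Synthetic scene: flat terrain at 100 m with one 6 m high block,
# a single 8 m spike and a low 1 m mound.
dtm = np.full((10, 10), 100.0)
dsm = dtm.copy()
dsm[2:5, 2:6] += 6.0
dsm[7, 7] += 8.0
dsm[8:10, 0:2] += 1.0
regions = detect_building_candidates(dsm, dtm)
print(len(regions), [len(p) for p in regions])
```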
3 A STRUCTURED APPROACH TO OBJECT EXTRACTION
Figure 2 is an attempt at systematizing object extraction strategies.
We explain the individual strategies using procedures that our group has developed over the past years.
Cross-comparisons with the work of other authors are left to the reader.
Paths 1a and 1b process several images one after the other: they extract geometric primitives (points, corners, lines)
from each single image, convert them into 3-D features via matching or monoplotting, and group these into higher-level
entities; in other words, they establish the final model(s) through a topology generator. With our 2-D LSB-Snakes and Dynamic
Programming techniques we have developed semi-automated procedures for line feature extraction from space, aerial
and close-range images (Gruen, Li, 1997, Li, 1997, Li, Gruen, 1997). Our ARUBA system for building extraction
works fully automatically, but succeeds only under simplified assumptions (Henricsson, 1996, Henricsson et al., 1996).
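Monoplotting converts a single-image measurement into object space by intersecting the image ray with a known surface. The Python sketch below assumes the simplest case: a horizontal reference plane `Z = z_ref` instead of a terrain or surface model, image coordinates already reduced to the principal point, and one common omega-phi-kappa rotation convention. Rotation conventions differ between software packages, so `rotation_matrix` is illustrative only; with a real DTM or DSM the same intersection is usually solved iteratively.

```python
import numpy as np


def rotation_matrix(omega, phi, kappa):
    """Image-to-object rotation R = R_omega @ R_phi @ R_kappa (radians).

    This is one common convention, assumed here for illustration.
    """
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    r_omega = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    r_phi = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    r_kappa = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return r_omega @ r_phi @ r_kappa


def monoplot(xy, c, X0, R, z_ref):
    """Intersect the image ray of point xy with the plane Z = z_ref.

    xy: image coordinates relative to the principal point
    c: principal distance (same units as xy)
    X0: projection centre in object space
    """
    d = R @ np.array([xy[0], xy[1], -c])   # ray direction in object space
    if abs(d[2]) < 1e-12:
        raise ValueError("ray is parallel to the reference plane")
    lam = (z_ref - X0[2]) / d[2]           # scale factor along the ray
    return np.asarray(X0, dtype=float) + lam * d


R = rotation_matrix(0.0, 0.0, 0.0)  # vertical image: identity rotation
ground = monoplot((0.01, -0.02), c=0.15, X0=(1000.0, 2000.0, 1500.0), R=R, z_ref=300.0)
print(ground)
```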
Path 1b is the traditional computer vision approach. A frequently encountered basic version even omits the image
matching step and aims to generate 3-D data from single images, relying on additional image cues such as shadows.
The Bonn approach to automated house extraction, which essentially uses house corners as image primitives (Fischer et
al., 1998), can also be classified under path 1b.
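The shadow cue mentioned above yields a height from a single image through simple trigonometry. The Python sketch below assumes flat, horizontal ground, a vertical object edge, a near-nadir view and a shadow length measured from the object's base along the sun azimuth; the function name and parameters are hypothetical.

```python
import math


def height_from_shadow(shadow_length_px, gsd_m, sun_elevation_deg):
    """Height of a vertical object from its shadow on flat, horizontal ground.

    shadow_length_px: shadow length in pixels, measured along the sun azimuth
    gsd_m: ground sampling distance in metres per pixel
    sun_elevation_deg: sun elevation above the horizon, in degrees
    """
    if not 0.0 < sun_elevation_deg < 90.0:
        raise ValueError("sun elevation must lie strictly between 0 and 90 degrees")
    shadow_length_m = shadow_length_px * gsd_m
    # tan(elevation) = height / shadow length
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


print(round(height_from_shadow(40, 0.25, 45.0), 2))
```

The estimate is only as good as the delineation of the shadow boundary; sloping ground, or a shadow end that falls on another object, violates the assumptions.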
In path 2 the features are extracted simultaneously from several images, where possible under control of the camera
model(s). The result is 3-D data in the form of space curve segments, vector fields or point clouds. Again, the final
model has to be derived by a topology generator. An example of this class of procedures is our 3-D LSB-Snakes,
which semi-automatically generate 3-D line objects in object space from any number of images simultaneously
(Gruen, Li, 1997). Seed points have to be provided by an operator or taken from a GIS database. For road extraction, the task of the topology
generator would then be to produce a topologically consistent road network.
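The benefit of using all images simultaneously can be illustrated on the simplest possible feature, a single point. The Python sketch below performs a linear (DLT-style) multi-image intersection from any number of 3x4 projection matrices. It is a generic textbook technique, not the LSB-Snakes formulation, which estimates whole curves by least squares; the camera setup in the example (`camera`, `project`, focal length, centres) is synthetic and hypothetical.

```python
import numpy as np


def triangulate(projections, image_points):
    """Linear least-squares intersection of one point seen in N >= 2 images.

    projections: list of 3x4 projection matrices P_i
    image_points: list of (u, v) image coordinates, one per image
    """
    rows = []
    for P, (u, v) in zip(projections, image_points):
        # Each view contributes two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # Solution: right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def camera(center, f=1000.0):
    """Hypothetical camera with identity rotation: P = K [I | -C]."""
    K = np.diag([f, f, 1.0])
    C = np.asarray(center, dtype=float).reshape(3, 1)
    return K @ np.hstack([np.eye(3), -C])


def project(P, X):
    """Project a 3-D point with P and return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[0] / x[2], x[1] / x[2]


X_true = np.array([0.5, 0.2, 5.0])
cams = [camera(c) for c in [(0, 0, -10), (1, 0, -10), (0, 1, -10)]]
obs = [project(P, X_true) for P in cams]
print(triangulate(cams, obs))
```

Adding further images simply appends two rows per image to `A`, so redundancy increases without changing the solver.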
At a lower level of automation (path 3) we have developed two procedures for building extraction and modeling, in which
the operator manually extracts an unstructured or weakly structured point cloud from a stereomodel. The systems
TOBAGO and CC-Modeler then automatically fit planar faces to the point cloud and generate the complete building
model.
While TOBAGO (Dan, 1996, Gruen, Dan, 1997) uses a catalogue of house models for fitting, CC-Modeler
(Gruen, Wang, 1998), although also driven by model assumptions, is fully generic in the sense that objects other
than buildings can also be modeled, as long as they are bounded by planar faces.
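Fitting planar faces to a measured point cloud, as in path 3, rests on plane estimation. The Python sketch below shows a generic total-least-squares plane fit via the singular value decomposition. It assumes the points have already been assigned to one face; in TOBAGO and CC-Modeler that assignment is part of the modeling logic and is not reproduced here. The roof-plane data are synthetic.

```python
import numpy as np


def fit_plane(points):
    """Total-least-squares plane n . p + d = 0 through 3-D points.

    Returns the unit normal n, the offset d and the RMS of the orthogonal
    point-to-plane residuals.
    """
    pts = np.asarray(points, dtype=float)
    if pts.shape[0] < 3:
        raise ValueError("need at least three points")
    centroid = pts.mean(axis=0)
    # The normal is the direction of least variance of the centred points.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    d = -normal @ centroid
    residuals = (pts - centroid) @ normal
    return normal, d, float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic roof face Z = 10 + 0.5 * X, sampled on a 5 x 5 grid.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
roof = np.column_stack([xs.ravel(), ys.ravel(), 10 + 0.5 * xs.ravel()])
n, d, rms = fit_plane(roof)
print(n, d, rms)
```

The fitted normal gives the slope of a roof face, and intersecting adjacent fitted planes yields candidate ridge and eave lines; a complete modeler would additionally have to establish the topology linking such faces.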
Our system DIPAD (Digital Photogrammetry and Architectural Design) follows path 4. It uses an existing CAD model
of the object, however coarse, and refines it semi-automatically in an iterative fashion. This defines a hierarchical
approach, where at each subsequent iteration a higher level of object refinement is obtained (Gruen, Streilein, 1994). It
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000. 311