ted with real-world
used as opposed to
xact matches to the
/pothesis generation
based object model
tructural parametric
the resolution of the
exible symbolic and
us constructed. Ad-
nt relations between
dary; these are pre-
olution levels of the
to translation, rota-
. Therefore, objects
ion levels, yet more
vs fast and efficient
hly compressed data
recognition at any
rticularly with large
nages since the size
asoning operations.
are recognized at a
scene, they can be
ing local operations
ing a portable stan-
performed using a
thus making it fully
s. Experiments in-
D data: (i) natural
vering 10 x 20 km²
suburban area of
(iii) various sets of
cquisition sensor of
1e same feature de-
ees were used for all
dent generic models
ecognize. In the fol-
'imental results that
(ith various types of
RAL TERRAIN
‚JENES
isory elevation data
various other pho-
pically stored in the
)EM) format. How-
ata of such outdoor
ning and planning.
of sensory informa-
ppropriate compres-
haracteristics of the
of the topographic
the terrain. There-
oy irregular triangu-
ic mesh coarsening
ain features. Such
ene and for various
mous visual naviga-
ation systems (GIS)
re.
Using the nearly-planar patches as modeling primitives, the
detected local surface topographic features can be used to au-
tomatically segment the region into collections of nearly co-
planar triangular patches. These patches are grouped using
generic models to describe interesting global scene features
(e.g., hills, valleys, mountains, and plains), which provide a
more abstract representation of the scene suitable for various
reasoning tasks. The original DEM data consists of 1200 x 1200
elevation measurements sampled regularly at 1 m intervals around
the westernmost section of Lake Ontario. In Figure 3, the
American shore is on the left side of the figure and the
Canadian shore is on the right side.
Figure 3 (top) illustrates an initial regular sub-sampling of
the original triangular mesh representation of the
above-mentioned scene (used solely for experimental reasons).
The topographic mesh coarsening and the scene feature detection
and grouping reduce the storage requirements by several orders
of magnitude. The resulting mesh is irregular, with more points
concentrated around interesting regions of high feature density
and far fewer points in flat regions.
Adopting the nearly-planar patches described above as model-
ing primitives, this scene was subsequently reduced to approx-
imately 40 such patches. Generic models can be constructed
for various global scene features to detect important sym-
bolic entities. Figure 3 (bottom) shows the symbolic scene
features identified using generic models based on collections
of nearly-planar patches. Several topological representations
of this symbolic scene description can be formed (e.g.,
topological graphs, entity-relationship diagrams) to support
practical symbolic reasoning and planning tasks.
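As one possible reading of a topological graph over the symbolic scene description, the sketch below links two scene features whenever any of their patches are adjacent. The feature names, the patch-to-feature mapping, and the function `build_topology` are hypothetical and serve only to illustrate the data structure.

```python
def build_topology(feature_of_patch, patch_adjacency):
    """Build a feature-level adjacency graph from patch-level adjacency.

    feature_of_patch: dict mapping patch id -> feature label (e.g. "hill_1")
    patch_adjacency:  iterable of (patch_a, patch_b) pairs sharing a boundary
    Returns a dict mapping each feature label to a sorted list of neighbors.
    """
    graph = {feature: set() for feature in feature_of_patch.values()}
    for a, b in patch_adjacency:
        fa, fb = feature_of_patch[a], feature_of_patch[b]
        if fa != fb:  # adjacency inside one feature adds no edge
            graph[fa].add(fb)
            graph[fb].add(fa)
    return {feature: sorted(neighbors) for feature, neighbors in graph.items()}
```

A planning query such as "which features border this valley" then reduces to a dictionary lookup.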
Figure 3: A triangular mesh representation of the ter-
rain (top) with the detected nearly-planar patches (bottom).
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
5 RECOGNITION OF MAN-MADE STRUCTURE
IN AERIAL IMAGES
For the experimental validation of our terrain model, we use
a set of geographic elevation data captured by aerial imaging
and covering an area of roughly 240 x 240 meters at a resolution
of one meter. This sensory data covers a plain with several
buildings of similar heights. For the experiments, we select
subregions with various resolutions and sizes, thereby
generating test data covering a wide variety of scenes. For
simplicity and without loss of generality, we use rectangular
subsections of the terrain for the individual experiments.
This choice neither biases nor affects the validity of the
results. Figure 4 depicts the original range image of the entire
region. In the grey-scale format (Figure 4-top), the darker a
pixel, the higher its corresponding elevation. Figure 4-bottom
illustrates the detected houses in a triangular mesh
representing the full details of the scene.
Our building model is expressed in terms of a set of nearly-
planar patches as described earlier. Some of these (corre-
sponding to side walls) are far from horizontal and enclose
other raised patches (corresponding to roofs) which are close
to horizontal. Such a flexible model is very generic and can
also extract numerous other objects, such as buses and vans, if
they exist in the aerial scene. Therefore, we use some domain
knowledge and contextual constraints on the object's dimensions
to exclude spurious objects. If we were to recognize outdoor
vehicles using this generic model, only the parameters of the
constraints defining the model would need to be changed.
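The building model above can be sketched as a simple test on patch attributes: a raised, near-horizontal patch whose neighbors are all steep patches, filtered by dimension constraints. All thresholds below are hypothetical placeholders, not values from the paper, and the sketch treats each roof patch independently, so a roof made of several patches would need merging first.

```python
import math
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: int
    normal: tuple       # unit normal (nx, ny, nz)
    area: float         # square meters
    mean_height: float  # meters
    neighbors: tuple    # ids of adjacent patches


def slope_deg(patch):
    """Angle between the patch plane and the horizontal, in degrees."""
    return math.degrees(math.acos(min(1.0, abs(patch.normal[2]))))


def find_structures(patches, ground_height,
                    roof_max_slope=20.0, wall_min_slope=60.0,
                    min_height=2.5, min_area=30.0, max_area=5000.0):
    """Return ids of roof-like patches enclosed by wall-like patches.

    Every keyword threshold is an assumed value for illustration only.
    """
    by_id = {p.patch_id: p for p in patches}
    found = []
    for p in patches:
        if slope_deg(p) > roof_max_slope:
            continue  # not close to horizontal
        if p.mean_height - ground_height < min_height:
            continue  # not raised above the ground
        if not (min_area <= p.area <= max_area):
            continue  # dimension constraint rejects e.g. vans and buses
        if p.neighbors and all(
            slope_deg(by_id[n]) >= wall_min_slope for n in p.neighbors
        ):
            found.append(p.patch_id)
    return found
```

Retargeting the same model to vehicles would, under this sketch, only mean passing different values for `min_height`, `min_area`, and `max_area`.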
We use a single generic model covering houses, apartment
buildings, and large hangars. When such a structure is
recognized, a set of derived parameters (e.g., height, area of
floor plan, volume, area of enclosing surface) is computed.
These parameters are used to distinguish the different objects
using domain-specific knowledge (e.g., the average size of a
house compared to a high apartment building). The derived
parameters are also used to reconstruct the geometry of the
detected objects if required. Table 5 reports the results
obtained for the scene in question. The house labels are
consistent with those in Figure 4-top. The values reported here
were computed using the high resolution mesh sampled at 2-meter
intervals. The center of gravity (C.O.G.) of each detected house
is reported in meters with respect to the origin at the top left
corner in Figure 4. Surface areas are given in square meters.
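To make the derived parameters concrete, the following sketch computes them for a flat-roofed structure from its footprint outline. The paper does not give its formulas, so this is an assumed simplification: the footprint is a simple polygon, the volume is footprint area times height, and the enclosing surface is the roof plus the walls. The size thresholds in `classify` are hypothetical.

```python
import math


def derived_parameters(outline, roof_height, ground_height=0.0):
    """Derived parameters of a flat-roofed structure (assumed model).

    outline: list of (x, y) footprint vertices in order, in meters.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    n = len(outline)
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    height = roof_height - ground_height
    return {
        "height": height,
        "floor_area": area,
        "volume": area * height,
        # Roof plus side walls; the floor is not counted.
        "enclosing_surface": area + perimeter * height,
        # Polygon centroid; the sign of twice_area cancels for either winding.
        "cog": (cx / (3.0 * twice_area), cy / (3.0 * twice_area)),
    }


def classify(params, tall=12.0, large=1000.0):
    """Label a structure by size; both thresholds are illustrative guesses."""
    if params["height"] >= tall:
        return "apartment building"
    if params["floor_area"] >= large:
        return "hangar"
    return "house"
```

For a 10 m by 8 m rectangular footprint with a 6 m roof height, this yields a floor area of 80 square meters and a volume of 480 cubic meters.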
We verified the capability of our man-made structure
recognition system at various resolution levels of the available
sensory range data. To this end, we applied the topographic mesh
coarsening algorithm mentioned earlier to the original dense
mesh representing this scene. After several coarsening
iterations, the number of vertices and triangles representing
the same scene decreases significantly. The coarser mesh
was then used to identify the same houses using the same
model described above. The houses were detected with
high accuracy (within the allowable errors in mesh coarsening),
and the results were almost identical to those for the houses
detected in the original dense mesh. Figure 6 illustrates a
close-up of the coarse mesh in the neighborhood of the houses
labeled 10 and 11 in Figure 4. It is clear from the table at
the bottom of Figure 6 that the results obtained from such
a coarser mesh are nearly identical to those obtained from
the original high resolution mesh. The maximum error in the
location of a house's center of gravity is only 0.2 m. This
amounts to about 0.083%.