(Schmidt
and Brand, 2003), or to find planes in a large number of
3D points (Bauer et al., 2003).
Computer vision has come to understand many of the geometric problems of the imaging process very well over the last decade. Early results are summarized in (Faugeras, 1993), while the state of the art is given by (Hartley and Zisserman, 2000, Faugeras and Luong, 2001). In recent years a focus has been on geometric algebras. An important ingredient is the Grassmann-Cayley algebra as proposed by (Faugeras and Papadopoulo, 1997). Recently, (Rosenhahn and Sommer, 2002) have extended the scope of geometric modeling significantly, allowing, e.g., articulated objects to be dealt with linearly. (Heuel, 2001) has presented work in which traditional statistics is linked with geometric algebras, making it possible to propagate stochastic information.
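The flavour of such a combination can be illustrated with a small sketch: the join of two uncertain image points to a line in homogeneous coordinates, with first-order propagation of the point covariances to the line parameters. This is a standard construction and not the specific formulation of (Heuel, 2001); all names and numbers are purely illustrative.

    import numpy as np

    def join_line(x, y):
        """Line through two 2D points in homogeneous coordinates (cross product)."""
        return np.cross(x, y)

    def join_line_cov(x, Sxx, y, Syy):
        """First-order propagation of the point covariances to the line parameters.

        l = x x y, so the Jacobians are dl/dx = -S(y) and dl/dy = S(x),
        with S(.) the skew-symmetric matrix of a 3-vector.
        """
        def skew(v):
            return np.array([[0.0, -v[2], v[1]],
                             [v[2], 0.0, -v[0]],
                             [-v[1], v[0], 0.0]])
        Jx = -skew(y)   # dl/dx
        Jy = skew(x)    # dl/dy
        return Jx @ Sxx @ Jx.T + Jy @ Syy @ Jy.T

    # two image points with (hypothetical) covariances of their Euclidean parts
    x = np.array([100.0, 200.0, 1.0])
    y = np.array([300.0, 250.0, 1.0])
    Sxx = np.diag([0.25, 0.25, 0.0])   # homogeneous part assumed error-free
    Syy = np.diag([0.25, 0.25, 0.0])

    l = join_line(x, y)
    Sll = join_line_cov(x, Sxx, y, Syy)

The resulting covariance matrix Sll describes the uncertainty of the constructed line and can itself be propagated through further geometric constructions in the same way.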
2.5 Learning
From a practical, but also from a theoretical point of view, automatic learning, i.e., the automatic generation of models from given data or even from experience, is of great importance, as it avoids the tedious manual process of model generation. The latter is one of the most important reasons why an automated extraction of objects with a wider variety of appearances does not seem to be feasible yet. For learning, one has to distinguish between very different degrees, ranging from the mere adaptation of parameters to the fully automatic generation of models for objects such as buildings, including their parts, their structure, and their geometry, as, e.g., in (Englert, 1998).
Unfortunately, even though standard textbooks were introduced a long while ago (Michalski et al., 1984, Michalski et al., 1986), learning is still not advanced enough to deal well with real-world problems as complex as object extraction. Yet, this is not a surprise, as object extraction is a large part of the overall vision problem, which, even after a lot of research by extremely skilled humans, is still not really understood.
Statistics might also come to help for learning. Hidden Markov Models (HMM) have made a breakthrough possible in the interpretation of written and spoken text. Instead of describing words and their relations structurally (grammar) and semantically, it was found that for many applications it is enough to analyze the statistical dependencies of very few neighboring words by means of HMM (Ney, 1999). Similar ideas have also been introduced into image processing, but the much higher complexity makes progress much more difficult.
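To illustrate the idea of exploiting only the statistical dependencies of neighboring symbols, the following minimal sketch shows the decoding step (Viterbi algorithm) of a toy HMM over word classes; the states, words, and probabilities are purely illustrative and are not taken from (Ney, 1999).

    import numpy as np

    # A toy HMM: hidden states are word classes, observations are words.
    states = ["NOUN", "VERB"]
    words = ["road", "runs", "building"]
    start = np.array([0.7, 0.3])             # P(first state)
    trans = np.array([[0.4, 0.6],            # P(state_t | state_{t-1})
                      [0.8, 0.2]])
    emit = np.array([[0.5, 0.1, 0.4],        # P(word | NOUN)
                     [0.1, 0.8, 0.1]])       # P(word | VERB)

    def viterbi(obs):
        """Most probable state sequence for a list of word indices (log domain)."""
        T, N = len(obs), len(states)
        delta = np.zeros((T, N))             # best log score ending in each state
        back = np.zeros((T, N), dtype=int)   # backpointers
        delta[0] = np.log(start) + np.log(emit[:, obs[0]])
        for t in range(1, T):
            scores = delta[t - 1][:, None] + np.log(trans) + np.log(emit[:, obs[t]])
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0)
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return [states[i] for i in reversed(path)]

    print(viterbi([words.index(w) for w in ["road", "runs"]]))  # -> ['NOUN', 'VERB']

The state, transition, and emission probabilities would in practice be estimated from large annotated corpora; only the immediately preceding state is taken into account, which is exactly the restriction to very few neighboring words mentioned above.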
Finally, concerning another popular means also used for learning, namely artificial neural networks, we refer to the discussion in a recent survey on statistical pattern recognition (Jain et al., 2000). There it is stated that "many concepts in neural networks, which were inspired by biological neural networks, can be directly treated in a principled way in statistical pattern recognition." On the other
hand, it is noted that "neural networks do offer several advantages such as unified approaches for feature extraction and classification, and flexible procedures for finding good, moderately nonlinear solutions."
3 TESTING
A key factor for the practical use of a technique in many
areas is thorough testing. Yet, this is only useful after hav-
ing obtained a profound theoretical understanding of the
problem. There are different issues where testing can help significantly:
• It becomes evident what the best approaches can achieve and, therefore, what the state of the art is.

• The strengths but also the weaknesses of competing approaches become clearly visible, and the whole area can flourish by focusing on promising directions, abandoning less promising ones, and by identifying unexplored territory.

• Testing usually gives a large push to all people involved. By trying to outperform other approaches, one learns much about the possibilities but also the limits of one's own approach.
Unfortunately, it is not always easy to define what to actually test. This is most critical for practical issues, such as the effectiveness of semi-automated approaches compared to manual approaches. It depends on many factors, some of which need considerable optimization effort if the real potential of an approach is to be determined. But also for automated approaches there is a large number of factors which influence the test and thereby also which approaches perform well and which do not. For roads, e.g., the preferred characteristics of the terrain play an important role, while for buildings the situation is even worse. There, some approaches assume at least four-fold image overlap, while others rely on laser-scanner data only, and both may model different types of buildings, e.g., flat roofs versus polyhedral objects.
Our experience shows that for many applications two basic measures are suitable for testing, namely "correctness" and "completeness" (Heipke et al., 1997). Other people use different names for these concepts, but what we mean is the percentage of extracted object information which can be matched to given ground truth data (correctness), as well as the percentage of ground truth data that can be matched to the extracted information (completeness). As one can see, the matching of the object information to ground truth data is an important issue. Road axes can be seen to match as long as they are inside the actual area of the road or inside a buffer generated from specifications for the precision of the acquisition. For buildings it is more complicated, as one can match ground truth data and extracted information in 2D and in 3D. Usually, the computation is done in image space (pixels) or 3D voxel space (Shufelt, 1999). To separate orientation errors from object extraction errors, individual objects can be optimally transformed before this
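To make the two measures above concrete, a minimal sketch is given below; it assumes that the matching of extracted objects to the ground truth (e.g., via the buffer described above) has already been carried out and that only the resulting counts are given. All numbers are illustrative.

    def correctness(matched_extracted, total_extracted):
        """Percentage of extracted objects that can be matched to ground truth."""
        return 100.0 * matched_extracted / total_extracted if total_extracted else 0.0

    def completeness(matched_reference, total_reference):
        """Percentage of ground truth objects that can be matched to the extraction."""
        return 100.0 * matched_reference / total_reference if total_reference else 0.0

    # e.g., a pixel-based evaluation of extracted road area against a reference buffer
    print(correctness(matched_extracted=820, total_extracted=900))    # 91.1
    print(completeness(matched_reference=820, total_reference=1000))  # 82.0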