problems is the reliability of feature extraction, the other the
exponential complexity of the matching task. For the latter
approach, in order to have a two-dimensional representation of a
three-dimensional shape, one has to decompose the shape into
several views and store a two-dimensional representation for
each view. This is referred to as an aspect graph. For our
system, since we have an approximated exterior orientation of
the imaging device, we do not have to build the whole aspect
graph, rather we create on-the-fly a single view of the shape
corresponding to the orientation data, as we receive the image.
2.1 View Class Representation
With the exception of a few simple shapes (a sphere, for
example), every three-dimensional object has a different
appearance when seen from different viewpoints. The more
complex the shape, the stronger the differences. This is the
case for buildings. In the view class representation scheme
(Koenderink & van Doorn 1979) the space of possible
viewpoints is partitioned into view classes. The view classes are
arranged in a graph known as the aspect graph (see Figure 1).
Each node represents a single view class; each arc represents
the transition from one viewpoint to another. The methods used
to compute the view classes of an object can be very complex
and the aspect graph of a non-trivial shape is quite large. In the
subsequent matching process the input data has to be compared
to a multitude of nodes.
In our framework we make use of the knowledge of an
approximated exterior orientation. This translates to a single
point in the space of possible viewpoints. Therefore we are not
required to compute the full aspect graph. If the viewpoint is
known only to a very low accuracy it is sufficient to compute a
small part of the aspect graph in the neighborhood of the
approximated viewpoint. Our experience shows that for our
application it is usually sufficient to reduce the graph to a single
node. This single node does not need to be stored beforehand
but can be computed on-the-fly. The matching process is
restricted to a single instance, further reducing computational
costs.
Figure 1: Part of the aspect graph of a simple building.
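The reduction to a single view-class node can be illustrated with a minimal sketch (not the system's actual implementation): for a polyhedral model, back-face culling against the approximated camera position yields the set of faces visible from that viewpoint, which is what characterizes the view class. The function `visible_faces` and the cube data below are hypothetical illustrations.

```python
import numpy as np

def visible_faces(face_normals, face_points, camera_pos):
    """Back-face culling: a face is potentially visible when its
    outward normal points toward the camera. The resulting set of
    visible faces identifies the single view class needed for
    matching."""
    visible = []
    for i, (n, p) in enumerate(zip(face_normals, face_points)):
        view_dir = camera_pos - p          # vector from face to camera
        if np.dot(n, view_dir) > 0.0:      # front-facing
            visible.append(i)
    return visible

# Unit cube centred at the origin: one point and outward normal per face.
normals = [np.array(v, float) for v in
           [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]]
points = [0.5 * n for n in normals]
print(visible_faces(normals, points, np.array([3.0, 2.0, 1.5])))
# three front faces are visible from this corner viewpoint
```

Neighbouring view classes, as needed for a less accurate orientation, would correspond to camera positions for which this set of visible faces changes.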
2.2 Feature Extraction
When designing an object recognition system one has to choose
the type of features used for recognition and the algorithm used
to perform the matching. The decision on the feature type is
often guided by the available model data. In our case the
buildings are modeled as polyhedra; no in-plane facade detail
or texture information is available. The strong discrepancy in
feature detail between model and sensor data prevented us
from using edge or corner detection. Since there is no texture
information available, image correlation was also not an option.
To achieve a robust detection we chose to detect the overall
shape of the building in the image rather than extracting single
features. The intent was that the overall shape is more robust
against scene clutter, partial occlusion by trees, cars,
pedestrians, and other negative influences on the scene. The
silhouette of a building is a good representation for its overall
shape. From an existing CAD database the CAD model of the
building is rendered for a given view according to the
calibration data of the camera. The details on how to select the
specific building are given in (Klinec & Fritsch 2001). The
‘virtual view’ of the building is used to extract the silhouette of
the building (see Figure 2). This representation is then detected
in the scene.
Figure 2: CAD model of a building rendered for a given exterior
orientation and the extracted silhouette below.
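Once the virtual view has been rasterized to a binary object mask, extracting the silhouette amounts to finding the object pixels that touch the background. The following is a minimal sketch of that step; the function name and the NumPy-based neighbour test are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def silhouette(mask):
    """Return a boolean image marking the silhouette of a binary
    object mask: object pixels with at least one 4-connected
    background neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior when all four 4-neighbours belong to the object.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior

# A 5x5 filled square inside a 7x7 image: the silhouette is its 1-pixel ring.
img = np.zeros((7, 7), bool)
img[1:6, 1:6] = True
print(silhouette(img).sum())   # 16 boundary pixels
```

The resulting one-pixel-wide contour is the shape representation that is subsequently searched for in the scene.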
2.3 Generalized Hough Transform
We decided to use the Generalized Hough Transform (GHT) to
implement the detection. The GHT is a framework for both the
representation and detection of two-dimensional shapes in
images. Using the GHT we are able to detect the shape no
matter whether it is shifted, rotated or optionally even scaled in
the image. We need these degrees of freedom, since the
orientation is only known approximately. Additionally the GHT
allows for a certain tolerance in shape deviation, which is
necessary, since the CAD model of the building is only a coarse
generalization of its actual shape as it appears in the image.
The well-known conventional Hough Transform (Hough, 1962)
is used to detect analytical curves such as lines or circles in an
image. The requirement for the application of the Hough
Transform is that the model can be formulated as a function of a
set of parameters. The GHT is the generalization of that concept
for the detection of arbitrary shapes for which a simple analytic
description is not possible (Ballard, 1981).
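As a reminder of the conventional case, a minimal line-detecting Hough transform can be sketched as follows. Each edge point votes for all lines rho = x cos(theta) + y sin(theta) passing through it; the parameter ranges and rounding below are simplified for illustration.

```python
import numpy as np

def hough_lines(edge_points, n_theta=180, max_rho=100):
    """Classical Hough transform for lines: every edge point casts a
    vote for each parameter pair (rho, theta) consistent with it."""
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * max_rho, n_theta), int)   # rho offset by max_rho
    for x, y in edge_points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + max_rho, np.arange(n_theta)] += 1
    return acc, thetas

# Ten collinear points on the horizontal line y = 20.
pts = [(x, 20) for x in range(0, 50, 5)]
acc, thetas = hough_lines(pts)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
print(rho_idx - 100, np.rad2deg(thetas[theta_idx]))   # rho=20, theta=90
```

The peak in the accumulator recovers the line parameters; the GHT replaces the analytic parameterization with the table-driven scheme described next.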
The GHT uses a gradient operator to compute edge magnitude
and gradient direction. During the offline phase, which has to
be computed once for a certain view class, the so-called R-table
is built. First, a reference point O of the shape is selected;
usually the centroid of the shape is used. Then the displacement
vector r from every edge pixel P to the reference point O is
stored in the R-table, indexed by the gradient direction φ at P
(see Figure 3). When iterating over all edge pixels, there can of
course be several r vectors stored for a single entry of φ.
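The offline phase described above can be sketched as follows. This is a translation-only illustration; the function name `build_r_table` and the 36-bin quantisation of φ are assumptions for the example, not the paper's exact parameters.

```python
import numpy as np
from collections import defaultdict

def build_r_table(edge_pixels, gradient_dirs, n_bins=36):
    """Offline phase of the GHT: for each edge pixel P, store the
    displacement vector r to the reference point O (here the
    centroid), indexed by the quantised gradient direction phi."""
    pts = np.asarray(edge_pixels, float)
    ref = pts.mean(axis=0)                       # reference point O
    r_table = defaultdict(list)
    for p, phi in zip(pts, gradient_dirs):
        bin_idx = int(phi / (2 * np.pi) * n_bins) % n_bins
        r_table[bin_idx].append(tuple(ref - p))  # r vector for this phi bin
    return r_table, ref

# Toy example: four edge pixels with two distinct gradient directions.
rt, ref = build_r_table([(0, 0), (2, 0), (0, 2), (2, 2)],
                        [0.0, 0.0, np.pi, np.pi])
print(ref)   # centroid of the edge pixels
```

Note that the two pixels sharing gradient direction 0 produce two r vectors under the same table entry, exactly the multiplicity mentioned above.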
In the online phase, when the actual detection is performed, the
gradients of the search image are computed. For each edge
pixel the gradient direction points to an entry in the R-table;
for every r vector stored there, a vote is cast for the implied
position of the reference point in an accumulator array. Maxima
of the accumulator indicate likely positions of the shape in the
image.
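The online voting step can be sketched in the same simplified, translation-only setting; the toy R-table and the function name `ght_vote` below are illustrative assumptions.

```python
import numpy as np

def ght_vote(edge_pixels, gradient_dirs, r_table, acc_shape, n_bins=36):
    """Online phase of the GHT (translation only): each edge pixel
    looks up the r vectors for its quantised gradient direction and
    votes for the implied reference-point location."""
    acc = np.zeros(acc_shape, int)
    for (x, y), phi in zip(edge_pixels, gradient_dirs):
        bin_idx = int(phi / (2 * np.pi) * n_bins) % n_bins
        for rx, ry in r_table.get(bin_idx, []):
            cx, cy = int(round(x + rx)), int(round(y + ry))
            if 0 <= cx < acc_shape[0] and 0 <= cy < acc_shape[1]:
                acc[cx, cy] += 1   # vote for a candidate reference point
    return acc                     # peaks mark likely shape positions

# Toy R-table: edges with gradient bin 0 imply the reference point
# lies one pixel down-right of the edge pixel.
r_table = {0: [(1.0, 1.0)]}
acc = ght_vote([(2, 2), (4, 4)], [0.0, 0.0], r_table, (10, 10))
print(acc[3, 3], acc[5, 5])   # both cells received one vote
```

Rotation and scale, which the approximate orientation makes necessary as additional degrees of freedom, would be handled by extending the accumulator with further dimensions and rotating/scaling the r vectors before voting.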