ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
order to improve the directly measured exterior orientation of the camera.
3.2 Building Localization
When the task is to detect a three-dimensional shape in an
image, two general strategies for object representation are
available. One is mapping of the inherent three-dimensional
representation of the object, which leads to a 3D to 2D
matching problem. Alternatively, two-dimensional
representations can be applied, which leads to a 2D to 2D
matching problem. While the former is the more general and
theoretically more appealing approach, several practical
problems often prevent its use: one is the reliability of feature
extraction, the other the exponential complexity of the
matching task. For the latter approach, in order to have a
two-dimensional representation of a three-dimensional shape,
one has to decompose the shape into several views and store a
two-dimensional representation for each view. This approach is
referred to as an aspect graph.
For our system it is not necessary to build the whole aspect
graph, since an approximate exterior orientation of the
imaging device is available. In this case, a single view of the
shape can be created on the fly for each image according to the
respective orientation data.
Additionally, when designing an object recognition system one
has to choose the type of features used for recognition. This
decision is often guided by the available model data. In our
case the buildings are modelled as polyhedrons; no in-plane
facade detail or texture information is available. This strong
discrepancy in feature detail between model and sensor data
thwarts the use of edge or corner detection. Since no texture
information is available, image correlation is not an option
either.
To achieve a robust detection, our system aims at detecting the
overall shape of the building in the image rather than
extracting single features. The rationale is that the overall
shape is more robust against scene clutter, partial occlusion by
trees, cars, and pedestrians, and other negative influences
during image capture. A good representation of the overall
shape of a building is provided by its silhouette. Thus, based
on the existing CAD database, the 3D building model is rendered
according to the interior and exterior orientation of the camera
used for the collection of the actual image. This 'virtual view'
of the building, as depicted in Figure 2, is used to extract the
silhouette of the building. This representation then has to be
located within the corresponding image. For this purpose a
Generalized Hough Transformation (GHT) as described by
(Ballard & Brown 1982) was applied.
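The silhouette-extraction step can be illustrated with a minimal sketch. We assume here that the 'virtual view' has been rendered into a binary mask (True where the building projects into the image); the rendering itself and the system's actual implementation are not shown in the paper, so the function below is only an illustration of the principle: a silhouette pixel is a mask pixel with at least one background 4-neighbour.

```python
import numpy as np

def silhouette_from_mask(mask):
    """Extract the silhouette (boundary pixels) of a rendered building mask.

    `mask` is a 2D boolean array: True where the rendered 'virtual view'
    covers the building. A pixel belongs to the silhouette if it lies
    inside the mask but has at least one 4-neighbour outside it.
    """
    m = mask.astype(bool)
    # Shifted copies of the mask in the four cardinal directions
    # (edge-padded so image borders are handled gracefully).
    up    = np.pad(m, ((1, 0), (0, 0)), mode="edge")[:-1, :]
    down  = np.pad(m, ((0, 1), (0, 0)), mode="edge")[1:, :]
    left  = np.pad(m, ((0, 0), (1, 0)), mode="edge")[:, :-1]
    right = np.pad(m, ((0, 0), (0, 1)), mode="edge")[:, 1:]
    interior = m & up & down & left & right
    return m & ~interior  # boundary = mask minus its interior

# Toy example: a 5x5 square "building" inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
boundary = silhouette_from_mask(mask)
```

For the 5x5 square, the 3x3 interior is removed and the 16 border pixels remain as the silhouette.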
3.3 Generalized Hough Transformation
Generally speaking, the GHT provides a framework for both the
representation and detection of two-dimensional shapes in
images. Based on the GHT, a shape can be detected regardless of
whether it is shifted, rotated, or even scaled relative to the
image. These degrees of freedom are required since the
orientation is only known approximately in our application.
Additionally, the GHT allows for a certain tolerance in shape
deviation. This is also necessary, since the CAD model of the
building provides only a coarse generalization of its actual
shape as it appears in the image.
The Hough transform is a technique which can be used to
isolate features of a particular shape within an image. Because
it requires that the desired features be specified in some
parametric form, the classical Hough transform is most
commonly used for the detection of regular curves such as
lines, circles, ellipses, etc. In contrast, the generalized
Hough transform can be employed in applications where a
simple analytic description of the features is not possible.
In this case, instead of using a parametric equation of the curve,
a look-up table is applied to define the relationship between the
boundary positions and orientations and the Hough parameters.
In our application the prototype shape used to compute the
look-up table values during a preliminary phase is provided by
the silhouette of the building as depicted in Figure 2. First, an
arbitrary reference point (x_ref, y_ref) is defined within the feature.
The shape of the feature can then be defined with respect to this
point by the distance r and angle β of the line drawn
from each boundary point to the reference point. The resulting look-up
table (R-table) consists of these distance and direction pairs (r, β),
indexed by the orientation φ of the boundary.
The Hough transform space is now defined in terms of the
possible positions of the shape in the image, i.e. the possible
ranges of x_ref, y_ref. In other words, the transformation is
defined by:

x_ref = x + r cos β
y_ref = y + r sin β        (1)
An arbitrary edge operator provides edge pixels at position
x, y with orientation φ for the image. Based on the generated
look-up table, the values of r and β corresponding to the
orientation φ can be selected. Thus, based on equation (1),
the accumulator array can be updated for each edge pixel at the
calculated position x_ref, y_ref. If, as in our case, the
orientation and scale of the feature are unknown in addition to
its position, separate accumulator arrays have to be generated.
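The voting step of equation (1) can be sketched in a few lines. The R-table below is built inline for a toy square shape; the bin count, array sizes, and function names are illustrative assumptions and do not reflect the actual implementation used in the system. Orientation and scale are treated as known here, so a single accumulator array suffices.

```python
import math
import numpy as np

def ght_vote(edges, r_table, shape, n_bins=32):
    """Vote in the Hough accumulator according to equation (1).

    `edges`: list of (x, y, phi) edge pixels with orientation phi, as
    delivered by an arbitrary edge operator. For each pixel, every
    (r, beta) stored under the matching orientation bin casts one vote at
    the candidate reference point
        x_ref = x + r cos(beta),  y_ref = y + r sin(beta).
    """
    acc = np.zeros(shape, dtype=np.int32)
    h, w = shape
    for x, y, phi in edges:
        k = int(((phi % math.pi) / math.pi) * n_bins) % n_bins
        for r, beta in r_table.get(k, []):
            xr = int(round(x + r * math.cos(beta)))
            yr = int(round(y + r * math.sin(beta)))
            if 0 <= xr < w and 0 <= yr < h:
                acc[yr, xr] += 1
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, peak  # peak = (y_ref, x_ref) of the best-supported position

# Model: four corners of a square with the reference point at its centre.
model = [(0, 0, 0.0), (4, 0, math.pi / 2), (4, 4, 0.0), (0, 4, math.pi / 2)]
r_table = {}
for x, y, phi in model:
    k = int(((phi % math.pi) / math.pi) * 32) % 32
    r_table.setdefault(k, []).append(
        (math.hypot(2 - x, 2 - y), math.atan2(2 - y, 2 - x)))

# "Image": the same shape shifted by (+10, +10); all four edge pixels
# agree on the reference point (12, 12), which wins the vote.
edges = [(x + 10, y + 10, phi) for x, y, phi in model]
acc, peak = ght_vote(edges, r_table, shape=(20, 20))
```

Spurious votes are still cast at other positions (each bin holds entries from two boundary points), but only the true reference point accumulates support from all four edge pixels.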
For our implementation the HALCON image processing
environment was used (Ulrich et al. 2001). In order to
compensate for the computational costs of large R-tables, this
operator includes several modifications of the original GHT. As
an example, it uses a hierarchical strategy based on image
pyramids to reduce the size of the tables. By transferring
approximate values to the next pyramid level, the search space
is drastically reduced. Additionally, the expected accuracy of
the shape's location can be exploited for a further reduction of
the search space.
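The internal modifications of the HALCON operator are not described in detail here, but the coarse-to-fine idea behind such pyramid strategies can be sketched generically: score only a subsampled grid of candidate positions first, then re-score locally around the best candidate at each finer level. The function below is our own illustration of that principle, not HALCON code; `score_fn` stands in for the GHT accumulator value at a candidate reference-point position.

```python
def coarse_to_fine_search(score_fn, shape, levels=3, radius=2):
    """Generic coarse-to-fine search over candidate positions.

    `score_fn(x, y)` scores a candidate reference-point position at full
    resolution (e.g. by GHT voting). Instead of scoring every pixel, the
    search starts on a grid subsampled by 2**(levels - 1) and, at each
    finer level, only re-scores positions near the best coarse candidate
    -- the 'approximate value transferred to the next pyramid level'.
    """
    h, w = shape
    step = 2 ** (levels - 1)
    # Coarsest level: exhaustive scan on the subsampled grid.
    best = max(((x, y) for y in range(0, h, step) for x in range(0, w, step)),
               key=lambda p: score_fn(*p))
    # Finer levels: local refinement around the previous best position.
    while step > 1:
        step //= 2
        cx, cy = best
        candidates = [(x, y)
                      for y in range(max(0, cy - radius * step),
                                     min(h, cy + radius * step + 1), step)
                      for x in range(max(0, cx - radius * step),
                                     min(w, cx + radius * step + 1), step)]
        best = max(candidates, key=lambda p: score_fn(*p))
    return best

# Toy score: a single sharp peak at (13, 7); the hierarchical search
# evaluates far fewer positions than a full 20x20 scan.
target = (13, 7)
score = lambda x, y: -((x - target[0]) ** 2 + (y - target[1]) ** 2)
found = coarse_to_fine_search(score, shape=(20, 20))
```

The number of evaluated positions grows with the local window size rather than the image size, which is the source of the drastic search-space reduction mentioned above.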
Figure 3: Detected silhouette of the building
Figure 3 shows the silhouette of a building as automatically
detected by the GHT within the captured image. Based on the