The appropriate technique depends on the kind of given input, the nature of the model and the constraints
provided by the knowledge of the imaging parameters. We
will consider three cases: matching range data with a digital
terrain model (DTM), matching an image with a 3-D model
with good camera orientation knowledge and a more general
matching situation. All of these cases do share a common
characteristic: the input image can be registered with the
model by one global transformation (though the global
transformation may be space variant to accommodate
distortions of the sensor). The task of matching thus becomes
that of estimating the parameters of the transformation. In
general, we would need to estimate the interior and exterior
camera parameters, though in most cases, some of the
parameters may be known or constrained to be in a certain
range.
3.1 Matching with a DTM
In this case, the model of the scene is simply an array of
heights on a grid. The DTM itself may be constructed by
stereo matching, by direct range sensing or by other means.
Let us consider the case where the input image is also a range
image (i.e. it contains height information).
We can consider both DTM and the range image to be like
intensity images where the image value represents height
rather than radiometric reflectance. The search for
transformation parameters can be reduced to search in the
two-dimensional space of the ground plane. This search can
be conducted by using conventional area cross-correlation
methods. In such techniques, a measure of match is computed
by some metric on point to point differences of height (or
intensity) values: commonly used metrics are sum of the
squares of differences or the cross-correlation coefficient.
One array is translated relative to the other, the match metric is computed for each displacement, and the displacement with the best match is chosen. The search can be made more
efficient by utilizing a pyramid of varying resolution images:
coarse registration is achieved at the lower resolutions and the
search at higher resolutions is confined to the range given by
the lower resolution. Thus, high accuracy registration can be
achieved efficiently. An analysis of the accuracy of this
approach may be found in [7].
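The displacement search and coarse-to-fine pyramid described above can be sketched as follows. This is a minimal illustration using the sum-of-squared-differences metric, not the implementation analysed in [7]; the function names and the 2x2 block-averaging pyramid are assumptions made for the sketch.

```python
import numpy as np

def ssd_score(dtm, patch, dy, dx):
    """Sum of squared height differences for one candidate displacement."""
    h, w = patch.shape
    window = dtm[dy:dy + h, dx:dx + w]
    return float(np.sum((window - patch) ** 2))

def best_translation(dtm, patch):
    """Exhaustive 2-D search for the displacement minimizing the SSD metric."""
    H, W = dtm.shape
    h, w = patch.shape
    best, best_score = (0, 0), float("inf")
    for dy in range(H - h + 1):
        for dx in range(W - w + 1):
            s = ssd_score(dtm, patch, dy, dx)
            if s < best_score:
                best_score, best = s, (dy, dx)
    return best

def downsample(a):
    """Halve resolution by 2x2 block averaging (one pyramid level)."""
    H, W = a.shape
    return a[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def pyramid_match(dtm, patch, levels=2):
    """Coarse-to-fine search: register at low resolution, then confine the
    high-resolution search to a small range around the coarse estimate."""
    if levels == 0 or min(patch.shape) < 8:
        return best_translation(dtm, patch)
    cy, cx = pyramid_match(downsample(dtm), downsample(patch), levels - 1)
    H, W = dtm.shape
    h, w = patch.shape
    best, best_score = (2 * cy, 2 * cx), float("inf")
    for dy in range(max(0, 2 * cy - 2), min(H - h, 2 * cy + 2) + 1):
        for dx in range(max(0, 2 * cx - 2), min(W - w, 2 * cx + 2) + 1):
            s = ssd_score(dtm, patch, dy, dx)
            if s < best_score:
                best_score, best = s, (dy, dx)
    return best
```

The pyramid replaces one exhaustive search over the full image with a cheap exhaustive search at low resolution plus small local refinements, which is what makes high-accuracy registration efficient.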
This technique is not directly applicable if the image is not a
range image but a conventional intensity image as the height
and intensity do not have a simple point to point correlation
and the intensity image is also a function of additional camera
parameters which need to be estimated. The author is not
aware of systems performing such registration but believes
that some form of feature matching as in the cases outlined
below will be required.
3.2 Matching a 3-D model with known camera
orientations
We now consider the task of registering an intensity image
with a 3-D model of the scene (we will call it a site model).
The site model itself may have been constructed from earlier
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
images and other sources of information. The site model may
contain various kinds of information, such as wireframe
models of buildings in the scene, transportation networks,
terrain heights and surface properties, depending on the
application. The site model is, in general, a symbolic data
structure and no point to point correspondence between it and
an image is possible. Instead, we seek to find the
transformation that projects the objects in the site model to
corresponding objects in the image.
In many photogrammetric and remote sensing applications,
the camera parameters are known with good precision.
Internal camera parameters are known by a calibration
procedure and external parameters are known from
measurements on the sensor platform. Let us consider the
case where camera parameters are known well enough so that
the projection of the site model overlays the image to be
registered well except for a translation in the image plane (the
precision of the location of the platform may be lower than
that of orientation). The task is now to find the correct
translation as in section 3.1 above.
In this task, however, we still cannot apply the method of pixel to pixel correlation, as the projected model is not image-like: it may contain only outlines, many parts of the scene may not be modelled at all, and the projected structure does not have intensity values associated with it.¹ Instead, we need
to compute some representations from both the image and the
model that are similar and can be matched. The matching
problem would be much easier if we could compute
descriptions from the image at the high levels of abstraction
that may be expected in the site model such as descriptions of
buildings and transportation networks. However, such
descriptions are difficult to infer reliably, so lower level
features need to be considered.
We have developed a system for matching a site model to
images where the dominant structures in the site model are
polyhedral buildings [2]. In this case, straight line segments
extracted from the image can provide sufficient features to
match with line segments from projections of the models.
Note that not all extracted lines will correspond to object
boundaries and not all object boundaries will be so detected,
but enough should be so that an overall match is possible.
Figure 1 shows an example image which is to be registered
with the model shown in Figure 2 (the figure shows the
projection of the model from the expected view point).
Figure 3 shows line segments extracted from the image of
Figure 1. The model lines and the image lines can now be
matched by selecting candidates from each set that
collectively vote for the best match. Note, however, that the lines cannot be matched on a point to point basis, as even small errors will cause them not to align precisely. Instead,
we consider two line segments to match if they are within a
certain distance of each other. Further, the contribution of
1. We could consider constructing an image from the model, but faithful reconstruction for the new imaging conditions is a difficult task, requiring detailed knowledge of the reflectance properties of the elements in the scene and of the imaging conditions.
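The proximity-based line matching described above can be sketched as follows. This is an illustrative toy, not the system of [2]: the endpoint-based segment distance, the distance threshold, and the restriction to a discrete set of candidate translations are all simplifying assumptions made for the sketch.

```python
import numpy as np

# A segment is a pair of endpoints: ((x1, y1), (x2, y2)).

def seg_distance(a, b):
    """Crude segment proximity: mean endpoint-to-endpoint distance,
    taking the better of the two endpoint pairings."""
    a0, a1 = np.array(a[0], float), np.array(a[1], float)
    b0, b1 = np.array(b[0], float), np.array(b[1], float)
    d_fwd = np.linalg.norm(a0 - b0) + np.linalg.norm(a1 - b1)
    d_rev = np.linalg.norm(a0 - b1) + np.linalg.norm(a1 - b0)
    return 0.5 * min(d_fwd, d_rev)

def vote_translation(model_segs, image_segs, candidates, tol=3.0):
    """Each candidate translation receives one vote per projected model
    segment that lies within `tol` of some image segment; the candidate
    with the most votes is the best match."""
    def shift(seg, t):
        (x1, y1), (x2, y2) = seg
        return ((x1 + t[0], y1 + t[1]), (x2 + t[0], y2 + t[1]))
    best_t, best_votes = None, -1
    for t in candidates:
        votes = sum(
            1 for m in model_segs
            if any(seg_distance(shift(m, t), s) <= tol for s in image_segs)
        )
        if votes > best_votes:
            best_votes, best_t = votes, t
    return best_t, best_votes
```

Because votes are accumulated over many segments, the match tolerates both spurious image lines (they simply attract no model segment) and unmodelled or undetected boundaries (they simply fail to vote), which is the point made in the text.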