each line segment match is a function of the overlapped
distance between the two. The best match is determined from
the accumulated constributions. Figure 4 shows the
accumulator array, with the peak indicating the best match.
Figure 5 shows the result of registering the model to the
image (the model boundaries are overlaid on the image).
Details of the approach may be found in [14].
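For illustration, the following Python sketch casts the accumulation as a vote over a 2-D translation space; the segment representation and the overlap weighting here are our assumptions for the sketch, not necessarily the exact scheme of [14]:

    import numpy as np

    def register_by_accumulation(model_segs, image_segs, tmax=100, step=2,
                                 ang_tol=np.radians(5)):
        # Segments are dicts with 'mid' (2-vector), 'angle' (radians), and
        # 'length'; this representation is an assumption for the sketch.
        n = 2 * tmax // step + 1
        acc = np.zeros((n, n))
        for m in model_segs:
            for s in image_segs:
                if abs(m['angle'] - s['angle']) > ang_tol:
                    continue                       # only near-parallel pairs vote
                tx, ty = s['mid'] - m['mid']       # translation this pair implies
                if abs(tx) >= tmax or abs(ty) >= tmax:
                    continue
                w = min(m['length'], s['length'])  # crude stand-in for overlap
                acc[int((ty + tmax) // step), int((tx + tmax) // step)] += w
        iy, ix = np.unravel_index(np.argmax(acc), acc.shape)
        return ix * step - tmax, iy * step - tmax, acc  # peak = best translation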
Note that the described processing only provides a
transformation that relates the models to the image. This is
not the same as actually matching building structures in the
model with the buildings in the image. This step requires
much more detailed processing. We need to examine how
many of the model features can actually be found in the image
and whether they are sufficient to confidently predict the
presence of the building. Details of such processing are also
given in [2].
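As a simplified illustration of this verification step (the actual evidence and criteria in [2] are considerably more detailed), one might count the registered model features that find support in the image and threshold on the fraction found:

    def verify_building(model_features, image_features, find_support,
                        min_frac=0.5):
        # find_support and min_frac are illustrative assumptions; [2]
        # describes the actual criteria for confirming a building's presence.
        found = sum(1 for f in model_features if find_support(f, image_features))
        return found / len(model_features) >= min_frac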
Figure 1 Image from Fort Hood, Texas

Figure 2 Model projection from expected viewpoint
3.3 Unknown Camera Orientations
If several parameters of the camera orientation are not
known, the process of finding the best transformation as
described above becomes much more complex. Instead of
searching for two unknowns, we may need to search for five
or more unknowns. In principle, the search can still be
conducted as above, by hypothesizing different
transformation parameters and computing a match score, but
search in a five-dimensional space may become prohibitive.
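The cost of such a hypothesize-and-score search grows exponentially with the number of unknowns, as the sketch below makes explicit (match_score is assumed to project the model under the hypothesized parameters and score the overlap):

    import itertools

    def grid_search_pose(param_ranges, match_score):
        # param_ranges: one sampled range per unknown parameter. With d
        # parameters of k samples each, this costs k**d evaluations, which
        # is why a search over five or more unknowns becomes prohibitive.
        best, best_score = None, float('-inf')
        for params in itertools.product(*param_ranges):
            s = match_score(params)
            if s > best_score:
                best, best_score = params, s
        return best, best_score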
An alternative is to use alignment [11] techniques. Here, a
transformation (alignment) is computed from a small number
of feature matches; the transformation can then be used to
verify matching of the remaining features. The minimum
number of features needed to estimate the transformation
depends on the nature of features (points or lines, 2-D or 3-D)
and the complexity of the estimation method. In recent work,
several methods have been developed that provide closed-form
solutions for computing the transformation from the
matched features.
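As one concrete, simplified instance of alignment (our choice of a 2-D affine model is an assumption; the minimal solver in practice depends on the feature type and camera model), a transform can be estimated in closed form from a few point matches and then used to verify the remaining features:

    import numpy as np

    def estimate_affine(model_pts, image_pts):
        # Closed-form least-squares 2-D affine transform from >= 3 point
        # matches; model_pts and image_pts are (k, 2) arrays.
        A = np.hstack([model_pts, np.ones((len(model_pts), 1))])  # rows [x y 1]
        X, *_ = np.linalg.lstsq(A, image_pts, rcond=None)         # 3x2 parameters
        return X

    def count_verified(X, model_pts, image_pts, tol=3.0):
        # Transform the remaining model points and count those landing
        # within tol pixels of their image counterparts.
        A = np.hstack([model_pts, np.ones((len(model_pts), 1))])
        err = np.linalg.norm(A @ X - image_pts, axis=1)
        return int(np.count_nonzero(err < tol))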
The alignment approach avoids searching through the
transformation space. However, it requires correct matching
of initial features used to estimate the transformation. If only
low-level features, such as points or lines, are used, it may not
be possible to obtain unambiguous matches. One commonly
used approach is to match all subsets of features in the model
to all subsets of features derived from the image. Clearly,
such computation can be very expensive (it is O(n^m), where n
is the total number of features and m is the number required
to compute a transform). Use of higher-level features, either
groups of features or, even better, surfaces and volumes, can
greatly reduce the complexity by reducing the ambiguity.
Groups of features have been used for estimating the pose of
objects in indoor scenes (for example, see [13]); application in
outdoor scenes is likely to be much more difficult.
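The combinatorics behind the O(n^m) figure can be seen directly in a brute-force enumeration of alignment hypotheses (a sketch; the feature types are left abstract):

    from itertools import combinations, permutations

    def alignment_hypotheses(model_feats, image_feats, m):
        # Every size-m model subset against every ordered size-m image
        # subset. The number of hypotheses grows roughly as n**m on each
        # side, which is exactly what higher-level features help avoid.
        for ms in combinations(model_feats, m):
            for js in permutations(image_feats, m):
                yield ms, js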
4. MATCHING FOR DEPTH ESTIMATION
To extract 3-D structure from two or more images, we need
to compute correspondences between points or features in the
multiple images; good surveys of various approaches can be
found in ([5],[6], [8]). One fundamental difference between
this task and that of registering an image to a site model is that
a single, global transformation is not applicable. Rather, the
transformation from one image point to another is a function
of the unknown height of the point. Thus, we cannot use
global matching methods but need to match points and
features in smaller areas. Small area matches, on the other
hand, are highly ambiguous. A pixel by itself can only be
characterized by an intensity value; this value varies
somewhat with the viewpoint and many pixels in an image
will have similar intensities. Some context from the
neighborhood needs to be utilized to disambiguate.
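One standard way to bring in such neighborhood context is window-based normalized cross-correlation, as in the sketch below (which assumes rectified images, so that corresponding points lie on the same scanline, and omits boundary handling):

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation of two equal-size patches.
        a = a - a.mean()
        b = b - b.mean()
        d = np.sqrt((a * a).sum() * (b * b).sum())
        return (a * b).sum() / d if d > 0 else 0.0

    def best_disparity(left, right, y, x, half=5, max_disp=32):
        # Compare a (2*half+1)-square neighborhood, not a single intensity,
        # against candidate positions along the same row of the other image.
        # Assumes the window and the full disparity range stay inside both
        # images.
        patch = left[y - half:y + half + 1, x - half:x + half + 1]
        scores = [ncc(patch,
                      right[y - half:y + half + 1,
                            x - d - half:x - d + half + 1])
                  for d in range(max_disp)]
        return int(np.argmax(scores))  # disparity with the highest similarity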