The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3b. Beijing 2008
model. Chapter 3 presents the application of the 5-point
algorithm to infrared image sequences recorded as described in
chapter 2. It also presents the strategy for comparing the
façades of the given building model with the surfaces estimated
by the 5-point algorithm, and for comparing the measured camera
path with the camera path estimated by the algorithm. Chapter 4
gives experimental results and quality measurements obtained
with Nistér’s pose estimation, and chapter 5 concludes the paper.
2. NISTÉR’S 5-POINT ALGORITHM, THE SPECIAL BEHAVIOUR OF INFRARED LIGHT AND CAMERAS
2.1 A short introduction to Nistér’s 5-point algorithm
Nistér’s 5-point algorithm was developed as an efficient
solution to the relative pose estimation problem of a camera
between two calibrated views using 5 corresponding image
points. From images alone, only the relative orientation of the
image pair can be reconstructed, so only the relative positions
of the corresponding points and of the cameras can be
determined. Neither the scale of the scene nor, of course, the
absolute positions can be recovered. This limitation to a
relative, unscaled orientation is one of the main problems for
the integration into a given building model.
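The scale ambiguity can be illustrated numerically. The following Python/NumPy sketch uses a hypothetical scene and a hypothetical pose of the second camera (all values and the helper names `skew`, `rot_y` and `project` are illustrative assumptions, not part of the original work). Scaling all scene points and the baseline by the same factor leaves both images unchanged, and the epipolar constraint with the essential matrix E = [t]x R holds whatever the length of t.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rot_y(angle):
    """Rotation about the y-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(X, R, t):
    """Project (N, 3) world points into normalized image coordinates of camera [R | t]."""
    Xc = X @ R.T + t  # world -> camera coordinates
    return Xc[:, :2] / Xc[:, 2:3]

rng = np.random.default_rng(0)
X = rng.uniform([-5, -3, 8], [5, 3, 15], size=(20, 3))  # hypothetical points in front of both cameras
R = rot_y(0.05)                  # hypothetical rotation of the second view
t = np.array([-1.0, 0.0, 0.1])   # hypothetical baseline of the second view

x1 = project(X, np.eye(3), np.zeros(3))
x2 = project(X, R, t)

# Scaling scene and baseline by the same factor s gives identical images.
s = 7.3
x1_scaled = project(s * X, np.eye(3), np.zeros(3))
x2_scaled = project(s * X, R, s * t)
print(np.allclose(x1, x1_scaled), np.allclose(x2, x2_scaled))

# Epipolar constraint x2^T E x1 = 0 with E = [t]_x R (calibrated, normalized coordinates).
E = skew(t) @ R
h1 = np.hstack([x1, np.ones((len(X), 1))])
h2 = np.hstack([x2, np.ones((len(X), 1))])
residuals = np.einsum("ij,jk,ik->i", h2, E, h1)
print(np.max(np.abs(residuals)))
```

Because only the direction of t is observable from the images, external information such as the GPS positions or the building model is needed to fix the scale.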
The algorithm is used as a hypothesis generator within a random
sample consensus (RANSAC) scheme (Fischler and Bolles, 1981).
Requiring an intrinsically calibrated camera improves accuracy
and robustness, especially for the special case the algorithm is
used for in this paper: calibration minimizes problems with
planar scenes, and building façades normally appear planar.
Without calibration, the method fails for coplanar scene points,
as many valid solutions remain. By using image triplets instead
of only image pairs, the RANSAC scheme with the 5-point
algorithm resolves all ambiguities. One precondition is a
sufficient change in the observed scene between the images,
which is normally achieved by changing the camera position
and viewing direction. A detailed mathematical description of
recovering the rotation and translation of the second and third
views relative to the first view can be found in Nistér (2004).
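The benefit of a minimal sample size in RANSAC can be quantified with the standard stopping criterion: the number of samples N needed so that, with confidence p, at least one sample of size s is outlier-free at inlier ratio w is N = log(1 - p) / log(1 - w^s). The sketch below assumes this textbook criterion and a hypothetical inlier ratio of 50 %; the RANSAC variant actually used with the 5-point algorithm may differ. For comparison it also evaluates the sample sizes 7 and 8 of the uncalibrated 7- and 8-point methods.

```python
import math

def ransac_iterations(sample_size: int, inlier_ratio: float, confidence: float = 0.99) -> int:
    """Number of random samples needed so that, with probability `confidence`,
    at least one sample consists of inliers only."""
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

for s in (5, 7, 8):
    print(s, ransac_iterations(s, inlier_ratio=0.5))
```

With w = 0.5 and p = 0.99 this gives 146 samples for s = 5, against 588 for s = 7 and 1177 for s = 8.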
2.2 Recorded infrared image sequences
Current IR cameras do not reach the optical resolution of video
cameras, let alone that of digital cameras. As in the visible
spectrum, the sun affects infrared recordings. Images in the
mid-wave infrared are affected directly because, in addition to
the surface radiation caused by the building’s temperature,
reflected solar radiation is recorded. In the long-wave infrared,
the sun’s influence is only indirect: the sun contributes little
radiation in the long-wave spectrum, but it does affect the
surface temperature of the building.
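The difference between the two bands follows from Planck’s law. The sketch below compares reflected solar radiance with the thermally emitted radiance of a façade at one wavelength per band. It rests on simplifying assumptions that are not from the original work: the sun is a 5778 K blackbody, the façade is a 290 K Lambertian surface with reflectance 0.1 and emissivity 0.9, the illumination is at normal incidence, the atmosphere is ignored, and the bands are represented by 4 µm and 10 µm.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m/s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]

T_SUN = 5778.0                              # assumed effective solar temperature [K]
SUN_DILUTION = (6.957e8 / 1.496e11) ** 2    # (solar radius / Earth-Sun distance)^2
T_FACADE = 290.0                            # assumed façade temperature [K]

def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance B(lambda, T) in W / (m^2 sr m)."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return 2.0 * H * C**2 / wavelength_m**5 / math.expm1(x)

def reflected_to_emitted(wavelength_m: float, reflectance: float = 0.1) -> float:
    """Ratio of reflected solar radiance to emitted radiance of an opaque,
    Lambertian façade (emissivity = 1 - reflectance)."""
    # Solar irradiance E = pi * B_sun * dilution; Lambertian reflection L = rho * E / pi.
    reflected = reflectance * planck_radiance(wavelength_m, T_SUN) * SUN_DILUTION
    emitted = (1.0 - reflectance) * planck_radiance(wavelength_m, T_FACADE)
    return reflected / emitted

mwir = reflected_to_emitted(4e-6)    # mid-wave example wavelength (assumed)
lwir = reflected_to_emitted(10e-6)   # long-wave example wavelength (assumed)
print(f"4 µm: {mwir:.3f}   10 µm: {lwir:.4f}")
```

Under these assumptions, reflected sunlight is of the same order as the façade’s own emission at 4 µm, but only about a tenth of a percent of it at 10 µm.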
Because of the small field of view and the low optical
resolution, the scene had to be recorded in oblique view to
capture the complete façades of the building from the ground to
the roof at an acceptable texture resolution. The image
sequences were recorded at 50 frames per second. The viewing
angle relative to the along-track axis of the van was constant.
Figure 1 shows a set of images from the sequence. The camera
position was recorded by GPS with an accuracy of 2-5 m and,
for quality measurements, by tachymeter measurements from
ground control points.
Fig. 1: Images of one test sequence showing the oblique view
and the camera movement.
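At 50 frames per second, consecutive frames differ only slightly, while the 5-point algorithm needs a sufficient change between views. The sketch below computes how many frames apart the views of a triplet would have to be for a given along-track baseline. The vehicle speed, the minimum baseline and the helper `frame_step` are hypothetical; the paper does not state them.

```python
import math

FRAME_RATE_HZ = 50.0  # recording rate of the sequences

def frame_step(speed_kmh: float, min_baseline_m: float) -> int:
    """Smallest frame offset whose along-track baseline reaches
    `min_baseline_m`, assuming a constant vehicle speed (hypothetical helper)."""
    baseline_per_frame = speed_kmh / 3.6 / FRAME_RATE_HZ  # metres travelled between two frames
    return math.ceil(min_baseline_m / baseline_per_frame)

# Hypothetical: van at 25 km/h, views of a triplet at least 1 m apart.
print(frame_step(25.0, 1.0))  # 8
```

A larger step increases the baseline but reduces the overlap between the views, so in practice the step would be a compromise between the two.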
2.3 Description of the given 3d building model
The information extracted from the infrared image sequences
has to be assigned to the corresponding building in a GIS
database. To link the extracted façade textures to the GIS
database, the given polygonal building model stored in the
database is used. This model is given in LOD 2 and represents
each façade as a single polygonal surface with its vertices in
world coordinates.
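Comparing estimated surfaces with the model requires the plane of each façade polygon. A minimal sketch, assuming the vertices are available as an (N, 3) NumPy array: the plane normal is computed with Newell’s method, which is robust for slightly non-planar polygons, and point-to-plane distances follow directly. The function names and the example wall are hypothetical.

```python
import numpy as np

def facade_plane(vertices):
    """Unit normal n and offset d (n . x + d = 0) of a façade polygon
    given as (N, 3) world-coordinate vertices, using Newell's method."""
    v = np.asarray(vertices, dtype=float)
    w = np.roll(v, -1, axis=0)  # next vertex, wrapping around to the first
    normal = np.array([
        np.sum((v[:, 1] - w[:, 1]) * (v[:, 2] + w[:, 2])),
        np.sum((v[:, 2] - w[:, 2]) * (v[:, 0] + w[:, 0])),
        np.sum((v[:, 0] - w[:, 0]) * (v[:, 1] + w[:, 1])),
    ])
    normal /= np.linalg.norm(normal)
    d = -normal @ v.mean(axis=0)  # plane passes through the vertex centroid
    return normal, d

def point_plane_distances(points, normal, d):
    """Signed orthogonal distances of (M, 3) points to the plane."""
    return np.asarray(points, dtype=float) @ normal + d

wall = np.array([[0, 5, 0], [10, 5, 0], [10, 5, 8], [0, 5, 8]])  # hypothetical façade in the plane y = 5
normal, d = facade_plane(wall)
offsets = point_plane_distances([[3, 6, 2], [4, 5, 1]], normal, d)
```

The sign of the normal depends on the vertex order, so distances are best compared by magnitude unless the model guarantees a consistent orientation.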
3. AUTOMATED EXTRACTION OF SURFACES AND TEXTURES
3.1 Application of the 5-point algorithm on infrared image sequences
Nistér’s 5-point pose estimation can be used to extract point
clouds and a relative camera path from image triplets. These
point clouds can then be used to estimate surfaces in a scene
observed in several images. Mayer (2007) introduced an
approach for wide-baseline image sequences. In this approach,
Förstner points (Förstner and Gülch, 1987) are matched via
cross-correlation. RANSAC, in the scheme of Chum et al.
(2003), is used to estimate the fundamental matrix F and the
trifocal tensor T of the image triplet. The resulting inliers are
used for a robust bundle adjustment (Hartley and Zisserman,
2003). To orient the whole image sequence, the triplets are
linked via homographies and the 3d points already known from
the oriented part of the sequence. For the reconstruction of
planes from the point clouds, vanishing points are detected for
groups of images. Because building façades are often vertical,
the medians in x- and y-direction can be taken as the vertical
direction. Planes are searched for by defining a maximum
distance of a point to a plane; the best plane is the one with
the smallest distances of the points to the hypothesized plane.
From the plane parameters and the projection matrices,
homographies between the planes and the images are computed.
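The last two steps can be sketched as follows. This is not Mayer’s implementation: the plane search below is a generic RANSAC loop in which each hypothesis is spanned by three random points and scored by the sum of squared point-to-plane distances truncated at the maximum distance (an MSAC-style cost chosen here as one reading of "smallest distance"). The plane-to-image homography uses the standard construction H = P [u v X0] for a plane X = X0 + a u + b v and a 3x4 projection matrix P. All names and test data are hypothetical.

```python
import numpy as np

def ransac_plane(points, max_dist, iterations=500, rng=None):
    """Search a plane in an (N, 3) point cloud; return normal n, offset d, inlier mask."""
    rng = np.random.default_rng() if rng is None else rng
    best_cost, best_plane = np.inf, None
    for _ in range(iterations):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(n)
        if length < 1e-12:  # collinear sample, no unique plane
            continue
        n = n / length
        d = -n @ p0
        dist = np.abs(points @ n + d)
        cost = np.sum(np.minimum(dist, max_dist) ** 2)  # truncated squared distances
        if cost < best_cost:
            best_cost, best_plane = cost, (n, d)
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    n, d = best_plane
    return n, d, np.abs(points @ n + d) < max_dist

def plane_to_image_homography(P, origin, u, v):
    """3x3 homography mapping plane coordinates (a, b, 1) of
    X = origin + a*u + b*v to homogeneous image points."""
    M = np.column_stack([np.append(u, 0.0), np.append(v, 0.0), np.append(origin, 1.0)])  # 4x3
    return P @ M

# Hypothetical cloud: a noisy vertical façade at y = 5 plus random clutter.
rng = np.random.default_rng(1)
facade = np.column_stack([rng.uniform(0, 10, 200), 5 + rng.normal(0, 0.02, 200), rng.uniform(0, 8, 200)])
clutter = rng.uniform([0, 0, 0], [10, 10, 8], size=(60, 3))
n, d, inliers = ransac_plane(np.vstack([facade, clutter]), max_dist=0.1, rng=rng)

P = np.hstack([np.eye(3), np.zeros((3, 1))])  # hypothetical normalized camera [I | 0]
H = plane_to_image_homography(P, np.array([0.0, 0.0, 10.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

In practice the plane basis u, v would be chosen from the detected vertical direction and the estimated plane normal, so that the rectified façade texture is upright.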
All images are recorded from a moving vehicle with the same
viewing direction. This means that the camera only moves
along a path. At first glance, this seems to be a simplification.
However, although the angle between the camera and the driving
direction is constant, the viewing direction changes with the
movement of the vehicle, so it cannot be regarded as fixed.
For the reconstruction of the