NFRARED
developed to extract
; are discussed and
contrast to existing
the image sequence
parts. The first part
ages and used as tie
undle adjustment is
ding is additionally
les the extraction of
constructed exterior
is extracted. These
one facade texture.
tures and localising
ven for big building
ce for the extraction
k et al., 2011).
radiation in the
ce characteristics of
> in normal visible
little difference in
etails from distance,
mperatures with an
ed cameras are able
ideo frame rate (25
cooling technique,
ers, the expenses of
to normal video
for many different
are collected from
red images. Bigger
several images. The
photos without any
a problem, when
are combined and
f buildings, in this
texturing an entire
'esolution and small
oblems. Only small
e image. Direct line
f the model's edges
al., 2004; Avbelj et
jle facade edges in
) deal with the fact,
not always have
d vice versa. This
problem gets even worse with a moving camera and
inaccurate orientation parameters caused by the GPS system.
Different strategies for matching given 3d models and images
are well known in computer vision. Single-image processing
works with three or more correspondences between image and
model. An overview of 3-point algorithms is given in
Haralick et al. (1994). Triggs (1999) introduces a generalisation
of the 6-point Direct Linear Transformation (DLT) for camera
pose estimation and calibration from a single image with 4 or 5
known 3D points. Iterative methods are also proposed in
Haralick et al. (1989). Longuet-Higgins (1981) introduces the
8-point algorithm to projectively reconstruct a relatively oriented
scene from two different views without previously known 3D
coordinates of the points. The eight point correspondences are
used to calculate the fundamental matrix (Hartley and
Zisserman, 2003), which describes the relative orientation of the
two views. Two images showing the same planar surface are
related by a homography. This homography is used to find the
relative orientation of the two images via their corresponding
plane in the scene (Hartley and Zisserman, 2003). When using
image sequences, multiple images can be used for pose
estimation.
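As an illustration of the eight-point approach described above, the following minimal sketch estimates the fundamental matrix from eight or more point correspondences. It is not the implementation used in this paper. The coordinate normalisation step and the function names `normalize` and `eight_point` are assumptions added for this example, and the convention x2^T F x1 = 0 is chosen here for illustration.

```python
import numpy as np

def normalize(pts):
    """Shift points to zero mean and scale them to a mean distance of sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(x1, x2):
    """Estimate F with x2^T F x1 = 0 from N >= 8 correspondences (N x 2 arrays)."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each row holds the products p2_i * p1_j, matching F flattened row by row.
    A = np.einsum('ni,nj->nij', p2, p1).reshape(len(p1), 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                 # undo the normalisation
    return F / np.linalg.norm(F)
```

With noise-free correspondences the residuals x2^T F x1 vanish up to numerical precision. With real feature matches the estimate would normally be wrapped in a robust scheme to reject mismatches.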
Another approach to relatively orient two images is introduced
by Nistér (2004). The algorithm uses corresponding SIFT
feature points (Lowe, 2004) in two calibrated views of a scene
and calculates the essential matrix of these two images from five
corresponding points to find the relative camera motion between
them. Robustness is improved by sampling sets of five points within
a random sample consensus scheme (RANSAC) (Fischler and
Bolles, 1981). A hypothesis test deals with mismatched points
(Torr and Murray, 1997; Zhang, 1998) and allows the
combination of hundreds of views using trifocal tensors (Nistér,
2000). Further extensions of this algorithm towards the
handling of possible wide-baseline image sequences taken with
digital still and video cameras have been achieved by
Mayer (2007), Pollefeys et al. (2008) and Heinrichs et al.
(2008). Mayer (2007) adopts Nistér's algorithm for facade
extraction and texturing from multiple views. In this paper,
these strategies are extended to deal with a given building
model and camera path in a global coordinate system to match
the image sequence to an existing building model.
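The RANSAC scheme mentioned above can be sketched as a generic hypothesise-and-verify loop. The sketch below is an illustration only: the five-point solver itself is not implemented, and a two-point line fit is used as a stand-in minimal solver. All function names, the threshold and the iteration count are assumptions made for this example.

```python
import numpy as np

def ransac(data, fit, residuals, sample_size, threshold, iterations=200, seed=0):
    """Return the model refitted on the largest inlier set and the inlier mask."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(data), dtype=bool)
    for _ in range(iterations):
        # Hypothesise: fit a model to a random minimal sample.
        idx = rng.choice(len(data), size=sample_size, replace=False)
        model = fit(data[idx])
        # Verify: count the points that agree with the hypothesis.
        inliers = residuals(model, data) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < sample_size:
        raise ValueError("no consensus set found")
    return fit(data[best]), best

def fit_line(pts):
    """Stand-in minimal solver: line n . x = d through two or more 2d points."""
    c = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - c)
    n = Vt[-1]                        # direction of least variance = line normal
    return n, n @ c

def line_residuals(model, pts):
    """Perpendicular distance of every point to the line."""
    n, d = model
    return np.abs(pts @ n - d)
```

For relative orientation, `fit` would be replaced by a five-point essential matrix solver and `residuals` by an epipolar distance, with `sample_size=5`.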
2. METHODOLOGY
The matching between the images and the building
model is done in two steps. In the first step, the continuous image
sequences taken from a moving car are relatively oriented
to extract estimated
facade planes and a relative camera path (Mayer, 2007;
Heinrichs et al., 2008; Lo & Quattrochi, 2003; Pollefeys et al.,
2008). In this paper the observed exterior camera orientation is
added as an additional observation in the bundle adjustment. This
allows recovering the scale factor of the image sequence,
which is unknown in a purely relative orientation. The transfer of
the image sequence to the global coordinate system allows the
matching of edges detected in the images and the given building
model. As mentioned, matching image edges and model
lines alone is not successful in most of the images, because only a
small part of the building model is visible in one image. However, possible
corresponding parts of edges in the image and lines of the
model can be introduced into the bundle adjustment. They are of
special interest for the correct borders of facade planes.
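To illustrate how observed camera positions fix the unknown scale, the following sketch aligns a relatively oriented camera path to observed global positions with a closed-form least-squares similarity transform. This is a simplified stand-in and not the bundle adjustment described in this paper. The function name `similarity_align` and the inputs `src` (camera centres from the relative orientation) and `dst` (observed global positions) are assumptions for this example.

```python
import numpy as np

def similarity_align(src, dst):
    """Find s, R, t minimising sum ||dst_i - (s * R @ src_i + t)||^2 (N x 3 arrays)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # cross-covariance of the two point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s       # recovered scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t
```

For a nearly straight camera path the scale is still determined, but the rotation about the driving direction is not, which is one reason why additional observations are useful.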
The second step uses the images and their global coordinates
from the first step to perform a coregistration of images and
building in 2d and 3d. In the 2d image space a matching of
extracted image edges and projected lines of the 3d model can
be performed (Avbelj et al., 2010). In the 3d space 3d points are
generated from the homologous points of the image sequence
orientation process. These points are grouped in planes and
matched with facade planes of the 3d model. Both 2d and 3d
matching are combined in a bundle adjustment.
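The 3d part of this step can be sketched as follows: a plane is fitted to a group of reconstructed points, and the group is assigned to the model facade with the smallest root-mean-square point-to-plane distance. This is a minimal sketch under assumptions. The plane representation (unit normal n and offset d with n . x = d) and the function names are chosen for this example, and the grouping of points into planes is taken as given.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through N x 3 points, returned as (unit normal, offset)."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]                   # direction of least variance
    return normal, normal @ centroid

def match_to_facade(points, facades):
    """Return the index of the facade plane closest to the points, and its RMS distance."""
    rms = [np.sqrt(np.mean((points @ n - d) ** 2)) for n, d in facades]
    best = int(np.argmin(rms))
    return best, rms[best]
```

The sign of the fitted normal is arbitrary, so comparisons between fitted and model normals should use the absolute value of their dot product.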
The quality of the bundle adjustment of the orientation of the
image sequence as well as the matching of the estimated 3d
point cloud and the building facades depends on the number
and accuracy of homologous point features in the images of the
sequence. Two representatives of different point feature classes
have been used in this paper. Gradient-based features such as
Foerstner points (Foerstner & Guelch, 1987) are compared to
blob detectors such as SIFT (Lowe, 2004). The advantage
of gradient detectors is their stability and accuracy under small
changes of the camera orientation and scene, which is the case for
adjacent images of the image sequence. The advantage of blob
detectors is their tolerance to changing viewing directions and
feature scales, which is needed when searching for two images
of the sequence with a large stereo base and many homologous
points. Both bundle adjustments, the image sequence
orientation and the matching of the sequence and the building
model, are performed with both feature classes to compare their
quality.
The quality of the matching of the image sequence and the
building model is assessed by analyzing the extracted textures.
Textures from different sequences at different times and with
different orientation parameters are compared through
correlation. The textures are extracted by projecting the voxels
of a predefined grid on a surface into the image space and
interpolating the intensity values. This is done for every image
in which the surface is at least partially visible. The final texture has to be
combined from these partial textures by choosing the best
texture for every pixel. In general, different aspects have to be
taken into account for this, and especially if no texture covers
the whole surface, finding the best-quality solution is quite difficult. If we
concentrate on image sequences with a constant oblique
viewing direction, this problem can be simplified. Every
following image has a higher resolution of all visible parts of all
surfaces than all images before. This means that we can
overwrite partial texture 1 with all parts of partial texture 2
where partial texture 2 is defined, if image 2 follows
image 1 and the camera is forward looking.
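The texture extraction and combination described above can be sketched as follows. Grid points on a surface are projected into an image, intensities are interpolated bilinearly, later partial textures overwrite earlier ones, and two textures are compared by normalised cross-correlation. This is a sketch under assumptions: the 3x4 projection matrix `P`, the use of NaN for invisible grid points and all function names are chosen for this example, and occlusion handling is omitted.

```python
import numpy as np

def bilinear(image, x, y):
    """Sample an H x W image at float pixel positions; NaN outside the image."""
    h, w = image.shape
    out = np.full(x.shape, np.nan)
    ok = (x >= 0) & (x <= w - 1) & (y >= 0) & (y <= h - 1)
    x0 = np.clip(np.floor(x[ok]).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y[ok]).astype(int), 0, h - 2)
    fx, fy = x[ok] - x0, y[ok] - y0
    out[ok] = (image[y0, x0] * (1 - fx) * (1 - fy)
               + image[y0, x0 + 1] * fx * (1 - fy)
               + image[y0 + 1, x0] * (1 - fx) * fy
               + image[y0 + 1, x0 + 1] * fx * fy)
    return out

def partial_texture(image, P, grid_xyz):
    """Project a rows x cols x 3 grid of surface points with P and sample the image."""
    pts = grid_xyz.reshape(-1, 3)
    proj = np.column_stack([pts, np.ones(len(pts))]) @ P.T
    x = np.full(len(pts), -1.0)       # -1 marks points behind the camera
    y = np.full(len(pts), -1.0)
    front = proj[:, 2] > 0
    x[front] = proj[front, 0] / proj[front, 2]
    y[front] = proj[front, 1] / proj[front, 2]
    return bilinear(image, x, y).reshape(grid_xyz.shape[:2])

def merge_forward(partials):
    """Combine partial textures in sequence order; later images overwrite earlier ones."""
    texture = np.full(partials[0].shape, np.nan)
    for part in partials:
        seen = ~np.isnan(part)
        texture[seen] = part[seen]
    return texture

def ncc(a, b):
    """Normalised cross-correlation over the pixels defined in both textures."""
    valid = ~np.isnan(a) & ~np.isnan(b)
    a0 = a[valid] - a[valid].mean()
    b0 = b[valid] - b[valid].mean()
    return (a0 @ b0) / np.sqrt((a0 @ a0) * (b0 @ b0))
```

The simple overwrite in `merge_forward` relies on the forward-looking, constant oblique viewing direction. For other configurations a per-pixel quality measure would be needed.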
3. EXPERIMENTS
Current IR cameras cannot reach the optical resolution of
video cameras or even digital cameras. The camera used for the
acquisition of the test sequences offers an optical resolution of
320x240 pixels with a field of view (FOV) of only 20°. The
FLIR SC3000 camera records in the thermal infrared (8-12 µm).
The camera was mounted on top of a van on a
platform which can be rotated and shifted. As in the visible
spectrum, the sun affects infrared recordings. In the long-wave infrared
the sun's influence is only indirect, as the sun is not
emitting in the long-wave spectrum, but it does affect
the surface temperature of the building.
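To give a feeling for these camera parameters, the following sketch converts the field of view into a focal length in pixels and into the scene width covered at a given distance, using a simple pinhole model. It assumes that the 20° FOV refers to the 320 pixel image axis. The 10 m distance and the function names are hypothetical values chosen for this example.

```python
import math

def focal_length_px(size_px, fov_deg):
    """Pinhole focal length in pixels for an image axis of size_px pixels."""
    return (size_px / 2) / math.tan(math.radians(fov_deg) / 2)

def footprint_m(distance_m, fov_deg):
    """Scene extent covered along one image axis when viewing a wall head-on."""
    return 2 * distance_m * math.tan(math.radians(fov_deg) / 2)

f = focal_length_px(320, 20.0)        # roughly 907 pixels
width = footprint_m(10.0, 20.0)       # roughly 3.5 m at the assumed 10 m distance
```

Under these assumptions a head-on view from 10 m covers only about 3.5 m of a facade along the 320 pixel axis.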
Because of the small field of view and the low optical resolution,
it was necessary to record the scene in oblique view to be able
to record the complete facades of the building from the ground to
the roof with an acceptable texture resolution. The image
sequences were recorded with a frequency of 50 frames per
second. Small changes between two images reduce the number
of mismatches on regular, repetitive structures like windows in
facades but reduce the accuracy of the 3d coordinate estimation. To
nevertheless guarantee a good 3d reconstruction, the features are
tracked through the whole sequence and images are taken for the