XXII ISPRS Congress 2012: Technical Commission III

    
NFRARED 
developed to extract 
; are discussed and 
contrast to existing 
the image sequence 
parts. The first part 
ages and used as tie 
undle adjustment is 
ding is additionally 
les the extraction of 
constructed exterior 
is extracted. These 
one facade texture. 
tures and localising 
ven for big building 
ce for the extraction 
k et al., 2011). 
radiation in the 
ce characteristics of 
> in normal visible 
little difference in 
etails from distance, 
mperatures with an 
ed cameras are able 
ideo frame rate (25 
cooling technique, 
ers, the expenses of 
] to normal video 
| for many different 
: are collected from 
red images. Bigger 
several images. The 
photos without any 
|a problem, when 
are combined and 
f buildings, in this 
texturing an entire 
'esolution and small 
oblems. Only small 
e image. Direct line 
f the model's edges 
al., 2004; Avbelj et 
jle facade edges in 
) deal with the fact, 
not always have 
d vice versa. This 
    
  
problem gets even worse with a moving camera and
inaccurate orientation parameters caused by the GPS system.
Different strategies for matching given 3d models and images
are well known in computer vision. Single-image processing
works with three or more correspondences between image and
model. An overview of 3-point algorithms is given in
Haralick et al. (1994). Triggs (1999) introduces a generalisation
of the 6-point Direct Linear Transformation (DLT) for camera
pose estimation and calibration from a single image with 4 or 5
known 3D points. Iterative methods are proposed in
Haralick et al. (1989). Longuet-Higgins (1981) introduces the
8-point algorithm to projectively reconstruct a relatively oriented
scene from two different views without previously known 3D
coordinates of the points. The eight point correspondences are
used to calculate the fundamental matrix (Hartley and
Zisserman 2003), which describes the relative orientation of the
two views. Two images showing the same planar surface are
related by a homography. This homography is used to find the
relative orientation of the two images via their corresponding
plane in the scene (Hartley and Zisserman 2003). When using
image sequences, multiple images can be used for pose
estimation.
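The statement that a single 3x3 matrix describes the relative orientation of two views can be illustrated with a minimal sketch. It assumes calibrated (normalised) image coordinates, in which case the fundamental matrix reduces to the essential matrix E = [t]x R, and every correct correspondence satisfies x2^T E x1 = 0. The rotation, translation and 3D points below are hypothetical values chosen for illustration, not data from the paper.

```python
import math

def cross_matrix(t):
    """Skew-symmetric matrix [t]x, so that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation_y(angle_rad):
    """Rotation about the y axis (hypothetical camera yaw)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def essential_matrix(R, t):
    """E = [t]x R for the convention X2 = R X1 + t."""
    return matmul(cross_matrix(t), R)

def project(X):
    """Normalised homogeneous image coordinates of a 3D point (pinhole, f = 1)."""
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(E, x1, x2):
    """x2^T E x1; zero for a geometrically consistent correspondence."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))

# Hypothetical relative orientation between two views.
R = rotation_y(math.radians(5.0))
t = [1.0, 0.0, 0.1]
E = essential_matrix(R, t)

# Hypothetical scene points in the coordinate frame of the first camera.
points = [[0.5, -0.2, 8.0], [-1.0, 1.5, 12.0], [2.0, 0.3, 6.0]]
residuals = []
for X1 in points:
    X2 = [a + b for a, b in zip(matvec(R, X1), t)]  # same point in camera 2
    residuals.append(epipolar_residual(E, project(X1), project(X2)))
```

The residuals vanish up to rounding for true correspondences, while a mismatched point pair generally yields a clearly non-zero value. Robust estimators use this property to reject outliers.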
Another approach to relatively orient two images is introduced
by Nistér (2004). The algorithm uses corresponding SIFT
feature points (Lowe, 2004) in two calibrated views of a scene
and calculates the essential matrix of these two images from five
corresponding points to find the relative camera motion between
them. Robustness is improved by sampling sets of five points within
a random sample consensus (RANSAC) scheme (Fischler and
Bolles 1981). A hypothesis test deals with mismatched points
(Torr and Murray, 1997; Zhang, 1998) and allows the
combination of hundreds of views using trifocal tensors (Nistér
2000). Further extensions of this algorithm towards the
handling of wide-baseline image sequences taken with
digital still and video cameras have been achieved by
Mayer (2007), Pollefeys et al. (2008) and Heinrichs et al.
(2008). Mayer (2007) adopts Nistér's algorithm for facade
extraction and texturing from multiple views. In this paper,
these strategies are extended to deal with a given building
model and camera path in a global coordinate system to match
the image sequence to an existing building model.
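Why a minimal five-point sample is attractive inside RANSAC can be made concrete with the standard estimate of the number of random samples needed to draw at least one outlier-free sample with a given confidence, N = log(1 - p) / log(1 - w^s). The sketch below evaluates this formula. The inlier ratio of 0.5 and the confidence of 0.99 are assumed values for illustration and are not taken from the paper.

```python
import math

def ransac_iterations(sample_size, inlier_ratio, confidence=0.99):
    """Number of random samples needed so that, with probability `confidence`,
    at least one sample of `sample_size` points contains only inliers."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    p_clean_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean_sample))

# Assumed scenario: half of the tentative point matches are correct.
five_point = ransac_iterations(sample_size=5, inlier_ratio=0.5)
eight_point = ransac_iterations(sample_size=8, inlier_ratio=0.5)
```

Under these assumptions the five-point sample needs roughly an order of magnitude fewer iterations than an eight-point sample, because the probability of a clean sample falls exponentially with the sample size.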
2. METHODOLOGY 
The matching process between the images and the building
model is done in two steps. The use of continuous image
sequences taken from a moving car allows a relative
orientation of the images of a sequence, from which estimated
facade planes and a relative camera path are extracted (Mayer, 2007;
Heinrichs et al., 2008; Lo & Quattrochi, 2003; Pollefeys et al.,
2008). In this paper the observed exterior camera orientation is
added as an additional observation in the bundle adjustment. This
allows recovering the scale factor of the image sequence,
which is unknown in a purely relative orientation. The transfer of
the image sequence to the global coordinate system allows the
matching of edges detected in the images with the given building
model. As mentioned, matching image edges and model
lines alone is not successful in most of the images because only a
small part of the building model is visible in one image. However,
possible corresponding parts of edges in the image and lines of the
model can be introduced into the bundle adjustment. They are of
special interest for the correct borders of facade planes.
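How observed camera positions make the scale factor recoverable can be sketched in a much simpler setting than the bundle adjustment used in the paper. The sketch assumes that the relative camera path and the GPS path differ only by a similarity transformation. The scale then follows from the ratio of the spreads of the two point sets about their centroids, a quantity that is invariant to rotation and translation. The function names and the sample path are hypothetical, and noise in the GPS positions would bias this simple ratio.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]

def rms_spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    total = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(total / len(points))

def estimate_scale(relative_centres, gps_positions):
    """Scale factor that maps the relative camera path onto the GPS path."""
    if len(relative_centres) != len(gps_positions) or len(relative_centres) < 2:
        raise ValueError("need at least two corresponding camera positions")
    spread_rel = rms_spread(relative_centres)
    if spread_rel == 0.0:
        raise ValueError("relative camera centres must not coincide")
    return rms_spread(gps_positions) / spread_rel

# Hypothetical relative camera path in arbitrary units.
relative_path = [[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [2.0, 0.3, 0.0], [3.0, 0.2, 0.1]]

# Simulated GPS path: scaled by 2.5, rotated by 30 degrees about z, then shifted.
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
gps_path = [[2.5 * (c * x - s * y) + 100.0,
             2.5 * (s * x + c * y) + 200.0,
             2.5 * z + 50.0] for x, y, z in relative_path]

scale = estimate_scale(relative_path, gps_path)
```

In a bundle adjustment the same information enters as weighted observations of the exterior orientation, so that scale, rotation and translation are estimated together with the image measurements instead of in a separate step.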
The second step uses the images and their global coordinates
from the first step to perform a coregistration of images and
building model in 2d and 3d. In the 2d image space, extracted
image edges can be matched with projected lines of the 3d model
(Avbelj et al., 2010). In 3d space, 3d points are
generated from the homologous points of the image sequence
orientation process. These points are grouped into planes and
matched with facade planes of the 3d model. Both 2d and 3d
matching are combined in a bundle adjustment.
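The 3d part of this step can be illustrated by a simplified sketch that assigns reconstructed points to the nearest facade plane of the model. It assumes that each facade is given by three of its vertices and is treated as an infinite plane, and that a distance tolerance separates facade points from other points. The facade names, coordinates and tolerance are hypothetical, and the grouping used in the paper may differ.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def plane_from_vertices(p0, p1, p2):
    """Plane (unit normal n, offset d) with n . x + d = 0 through three vertices."""
    n = cross(sub(p1, p0), sub(p2, p0))
    norm = math.sqrt(dot(n, n))
    if norm == 0.0:
        raise ValueError("vertices are collinear")
    n = [component / norm for component in n]
    return n, -dot(n, p0)

def distance_to_plane(plane, point):
    """Unsigned distance of a point from the plane."""
    n, d = plane
    return abs(dot(n, point) + d)

def assign_points(points, facades, tolerance):
    """Assign each point to its nearest facade plane if it lies within tolerance."""
    assignment = {name: [] for name in facades}
    for point in points:
        name, dist = min(((name, distance_to_plane(plane, point))
                          for name, plane in facades.items()),
                         key=lambda item: item[1])
        if dist <= tolerance:
            assignment[name].append(point)
    return assignment

# Hypothetical model with two facades (coordinates in metres).
facades = {
    "south": plane_from_vertices([0, 0, 0], [10, 0, 0], [0, 0, 5]),  # plane y = 0
    "east": plane_from_vertices([10, 0, 0], [10, 8, 0], [10, 0, 5]),  # plane x = 10
}
# Hypothetical reconstructed points; the last one lies on neither facade.
cloud = [[2.0, 0.1, 1.0], [10.05, 3.0, 2.0], [5.0, 4.0, 2.0]]
matched = assign_points(cloud, facades, tolerance=0.3)
```

A complete implementation would also have to check that a point falls inside the facade polygon, since an infinite plane extends beyond the building.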
The quality of the bundle adjustment for the orientation of the
image sequence, as well as of the matching of the estimated 3d
point cloud with the building facades, depends on the number
and accuracy of homologous point features in the images of the
sequence. Two representatives of different point feature classes
have been used in this paper. Gradient-based features such as
Foerstner points (Foerstner & Guelch, 1987) are compared to
blob detectors such as SIFT features (Lowe, 2004). The advantage
of gradient detectors is their stability and accuracy under small
changes of camera orientation and scene, which is the case for
adjacent images of the image sequence. The advantage of blob
detectors is their tolerance to changing viewing directions and
feature scales, which is needed when searching for two images
of the sequence with a large stereo base and many homologous
points. Both bundle adjustments, the image sequence
orientation and the matching of the sequence with the building
model, are performed with both feature classes to compare their
quality.
The quality of the matching of the image sequence and the
building model is assessed by analyzing the extracted textures.
Textures from different sequences at different times and with
different orientation parameters are compared through
correlation. The textures are extracted by projecting the voxels
of a predefined grid from a surface into the image space and
interpolating the intensity values. This is done for every image
in which a surface is partially visible. The final texture has to be
combined from these partial textures by choosing the best
texture for every pixel. In general, different aspects have to be
taken into account for this, and especially if no texture covers
the whole surface, finding the best-quality solution is quite
difficult. If we concentrate on image sequences with a constant
oblique viewing direction, this problem can be simplified. Every
following image has a higher resolution of all visible parts of all
surfaces than all images before. This means that we can
overwrite partial texture 1 with all parts of partial texture 2
in which the surface was visible, if image 2 follows
image 1 and the camera is forward looking.
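The overwrite rule for forward-looking sequences and the comparison of textures through correlation can be sketched as follows. The sketch assumes that each partial texture is stored as a grid of intensity values with None marking cells in which the surface was not visible, and it uses the Pearson correlation coefficient over the cells that are filled in both textures. The data layout and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def combine_forward_looking(partial_textures):
    """Merge partial textures given in acquisition order; a later texture
    overwrites earlier values wherever it has a visible (non-None) cell."""
    if not partial_textures:
        raise ValueError("at least one partial texture is required")
    rows, cols = len(partial_textures[0]), len(partial_textures[0][0])
    combined = [[None] * cols for _ in range(rows)]
    for texture in partial_textures:
        for r in range(rows):
            for c in range(cols):
                if texture[r][c] is not None:
                    combined[r][c] = texture[r][c]
    return combined

def correlation(texture_a, texture_b):
    """Pearson correlation over the cells that are filled in both textures."""
    pairs = [(a, b)
             for row_a, row_b in zip(texture_a, texture_b)
             for a, b in zip(row_a, row_b)
             if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two cells filled in both textures")
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("a constant texture has no defined correlation")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical partial textures of one facade from two consecutive images.
texture_1 = [[10, 11, None],
             [12, 13, None]]
texture_2 = [[None, 21, 22],
             [None, 23, 24]]
facade = combine_forward_looking([texture_1, texture_2])

# A second texture of the same facade that differs only by gain and offset.
other = [[2 * v + 5 for v in row] for row in facade]
similarity = correlation(facade, other)
```

Because the correlation coefficient is invariant to gain and offset, two textures of the same facade recorded at different surface temperature levels can still reach a high score if their geometry matches.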
3. EXPERIMENTS 
Current IR cameras cannot reach the optical resolution of
video cameras or even digital cameras. The camera used for the
acquisition of the test sequences offers an optical resolution of
320x240 pixels with a field of view (FOV) of only 20°. The
FLIR SC3000 camera records in the thermal infrared (8 -
12 µm). The camera was mounted on top of a van on a
platform which can be rotated and shifted. As in the visible
spectrum, the sun affects infrared recordings. In the long-wave
infrared the sun's influence appears only indirectly, as the sun is
not emitting in the long-wave spectrum but does affect
the surface temperature of the building.
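The consequences of the small field of view and the low resolution stated above can be estimated with simple pinhole geometry. The sketch assumes that the 20° FOV refers to the 320 pixel image width, that the camera looks perpendicularly at a facade, and that the facade is 10 m away. The viewing distance and the fronto-parallel view are hypothetical simplifications and do not describe the oblique recording setup of the experiments.

```python
import math

def focal_length_pixels(image_width_px, fov_deg):
    """Focal length in pixel units for a pinhole camera with the given FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def footprint_width(distance_m, fov_deg):
    """Width of the facade strip covered at a perpendicular distance."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)

def pixel_size_on_facade(distance_m, image_width_px, fov_deg):
    """Approximate facade extent covered by one pixel in a fronto-parallel view."""
    return distance_m / focal_length_pixels(image_width_px, fov_deg)

# Values from the text: 320 pixels image width and 20 degree FOV.
focal_px = focal_length_pixels(320, 20.0)

# Hypothetical perpendicular distance of 10 m between camera and facade.
strip_m = footprint_width(10.0, 20.0)
pixel_m = pixel_size_on_facade(10.0, 320, 20.0)
```

Under these assumptions a single perpendicular image covers a strip only a few metres wide, which is consistent with the need to record the facades in oblique view and to combine textures from several images.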
Because of the small field of view and the low optical resolution,
it was necessary to record the scene in oblique view in order
to capture the complete facades of the building from the ground to
the roof at an acceptable texture resolution. The image
sequences were recorded at a rate of 50 frames per
second. Small changes between two images reduce the number
of mismatches on regular, repetitive structures like windows in
facades but reduce the accuracy of the 3d coordinate estimation. To
nevertheless guarantee a good 3d reconstruction, the features are
tracked through the whole sequence and images are taken for the