In recent years, the matching of multiple views of an object has enabled the reconstruction of 3D object points with high accuracy and high density. Earlier approaches such as (Kanade and Okutomi, 1994) are based on a low-level preprocessing of the image to extract points of interest. The correspondences of such points are then used to estimate the 3D positions of the object points. In many applications, Förstner features (Förstner and Gülch, 1987) or SIFT features (Lowe, 2004) are used, but the derived point clouds are either sparse or have been extracted from many images or video, e.g. (Mayer and Reznik, 2005) and (Gallup et al., 2007). In (Tuytelaars and Van Gool, 2000), the correspondences are determined over local affinely invariant regions, which are extracted from local extrema in intensity images. This procedure is liable to matching mistakes if the image noise is relatively high.
Dense point clouds from only a few images are obtained by adjusting the correspondence between pixels by correlation based on (semi-)global methods, e.g. (Hirschmüller, 2005). Assuming the observed objects have a smooth surface, the accuracy of the obtained point clouds can be increased by including information on the relations between the pixels via a Markov random field, e.g. (Yang et al., 2009), or from image segmentation, e.g. (Tao and Sawhney, 2000).
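For orientation only: semi-global matching in the spirit of (Hirschmüller, 2005) is available off the shelf in OpenCV's StereoSGBM. The following Python sketch is purely illustrative and not part of the described system; the file names and parameter values are our assumptions, and the input pair must already be rectified:

    import cv2

    # Load a rectified stereo pair (file names are placeholders).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Semi-global block matching; numDisparities bounds the disparity
    # search range, depends on the scene depth and must be a multiple of 16.
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,
        blockSize=5,
        P1=8 * 5 * 5,    # penalty for disparity changes of +/- 1
        P2=32 * 5 * 5,   # larger penalty for bigger disparity jumps
    )

    # OpenCV returns disparities as fixed-point values scaled by 16.
    disparity = sgbm.compute(left, right).astype("float32") / 16.0

Each pixel with a valid disparity can then be triangulated into a 3D point, which is what makes the resulting point cloud dense.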
In our approach, we take up the idea of (Khoshelham, 2005) to improve an initial image segmentation using additional 3D information. From multi-view analysis, we derive a point cloud, which is used for deriving additional features for the segmented image regions. We focus on building scenes, whose objects mostly consist of planar surfaces. It is therefore reasonable to look for dominant planes in the point cloud, where the search is guided by the image segmentation.
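The text does not fix a particular estimator for this plane search; a common choice for finding a dominant plane among noisy 3D points is RANSAC. The following Python sketch, with function names and thresholds of our own choosing, fits one dominant plane to the points associated with a segmented region:

    import numpy as np

    def dominant_plane(points, threshold=0.05, iterations=500, seed=0):
        """RANSAC search for the dominant plane in an (N, 3) point array.
        Returns the plane as (n, d) with n . x = d, plus the inlier mask."""
        rng = np.random.default_rng(seed)
        best_model, best_inliers = None, None
        for _ in range(iterations):
            # Hypothesize a plane from three randomly drawn points.
            p = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:          # degenerate (collinear) sample
                continue
            n /= norm
            d = n @ p[0]
            # Score the hypothesis by counting points close to the plane.
            inliers = np.abs(points @ n - d) < threshold
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_model, best_inliers = (n, d), inliers
        return best_model, best_inliers

Guiding the search by the segmentation then simply means running such an estimator per image region, on the 3D points whose projections fall into that region.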
For us, it is important to realize an approach that has the potential to be automated, since there are many applications with thousands of images. A completely automatic procedure is needed if additional features are derived from a reconstructed point cloud to improve the segmentation or interpretation of the images. Our input is only two or more images of the object, taken by a calibrated camera. An example is shown in fig. 1.
3 RECONSTRUCTION OF THE 3D SCENE 
In this section, we describe the generation of the point cloud C from the given images. For this generation, two conditions should be fulfilled: (a) the observed objects should be sufficiently textured, and (b) the views must overlap; otherwise we have problems determining the relative orientation between the images. So far, the implemented algorithms need some human interaction for setting the point cloud scale and the disparity range parameters, but under certain conditions, the whole approach could be designed to perform completely automatically.
We describe the procedure with two or three given images I1, I2 and I3. Two views are necessary to reconstruct the observed 3D data, but if the matching is performed over three images, the point cloud is still dense, see fig. 2, and it contains more reliable points, thus fewer outliers. The reconstruction process can be improved if even more images are considered. If all used images were taken by a calibrated camera, we are able to reconstruct the 3D scene by performing the following steps.

Figure 1: Three aerial views of a building scene consisting of a flat-roofed part and a gable-roofed part. The initial segmentation of the upper view is shown on its right side. The ground consists of several weirdly shaped regions, and the flat roof is also not well segmented.

Figure 2: Reconstructed 3D points projected back into the 2D image (white). Left: all pairs of matches are shown. The point cloud is very dense, with approximately 75% of the pixels having a 3D point, but these points are very imprecise. Right: only matches present in all three images are shown. The point cloud is still dense, with approximately 30% of the pixels having a 3D point, with higher precision.
In the first step, we determine the relative orientations between the given images. Of course, this step can be skipped if the projection matrices have been estimated during image acquisition. Otherwise, owing to the calibration of the camera, we automatically eliminate the non-linear distortions using the approach of (Abraham and Hau, 1997). The matching of extracted key-points using the approach of (Lowe, 2004) leads to the determination of the relative orientations of all images, i.e. their projection matrices Pn, cf. (Läbe and Förstner, 2006). The success of the relative orientation can
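This first step, together with the triangulation that yields the point cloud C, can be illustrated in Python with OpenCV. The sketch below is our own reconstruction under simplifying assumptions, not the authors' implementation: it uses two views only, assumes the images are already undistorted, uses a placeholder calibration matrix K, and substitutes OpenCV's estimators for the cited approaches of (Abraham and Hau, 1997) and (Läbe and Förstner, 2006):

    import cv2
    import numpy as np

    # Calibration matrix of the calibrated camera (placeholder values).
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 480.0],
                  [0.0, 0.0, 1.0]])

    img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

    # Extract SIFT key-points and descriptors (Lowe, 2004).
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors and keep unambiguous matches (Lowe's ratio test).
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.8 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Relative orientation: essential matrix with RANSAC, then the pose.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Projection matrices P1 = K[I | 0] and P2 = K[R | t];
    # the overall scale of the scene remains undetermined.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Triangulate the matches into the point cloud C.
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # homogeneous, 4 x N
    C = (X[:3] / X[3]).T                               # Euclidean, N x 3

This is the two-view case; extending the matching over a third view, as described above, retains only points consistent in all three images and so reduces the outlier rate.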