CMRT09

stilla, uwe
be evaluated according to the statistics of the performed
bundle adjustment. This step is usually robust enough for
typical building scenes, because the facades are often
sufficiently textured and we do not have to deal with total
occlusions. Problems may still occur if the mirroring parts
of a facade are too large.
The images are oriented relatively, not absolutely, i.e. the
positions of the projection centers are not yet correctly
scaled. Since a transformation from 3D to 2D cannot be
inverted, a reasonable assumption about the scale always has
to be introduced additionally. The easiest way to set the
scale parameter is to measure GPS positions during image
acquisition. Another strategy is to measure one or more
distances on the object and to identify the corresponding
points in the images or, later, in the extracted point cloud.
While the first approach can easily be automated, the second
requires human interaction.
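The GPS-based variant can be illustrated with a minimal sketch. It assumes that the relatively oriented projection centers and the GPS positions are given in the same order; the function name and the use of summed pairwise distances are illustrative choices, not taken from the paper.

```python
import math


def estimate_scale(model_centers, gps_centers):
    """Estimate the scale factor between a relative orientation and GPS.

    Both arguments are lists of (x, y, z) tuples in corresponding order.
    The scale is the ratio of the summed pairwise distances, which is
    independent of any rotation or translation between the two frames.
    """
    if len(model_centers) != len(gps_centers) or len(model_centers) < 2:
        raise ValueError("need at least two corresponding projection centers")
    model_sum = 0.0
    gps_sum = 0.0
    n = len(model_centers)
    for i in range(n):
        for j in range(i + 1, n):
            model_sum += math.dist(model_centers[i], model_centers[j])
            gps_sum += math.dist(gps_centers[i], gps_centers[j])
    if model_sum == 0.0:
        raise ValueError("projection centers coincide, scale is undefined")
    return gps_sum / model_sum
```

Multiplying the model coordinates by the returned factor brings the projection centers to metric scale; noisy GPS measurements would call for a proper similarity adjustment instead.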
From the second step on, we use only three images for a
dense trinocular matching and accept only those 3D points
that were matched in all three images. Thus, we reduce many
matching errors close to the image borders and avoid points
corresponding to occluded surfaces. We use the semi-global
matching by (Hirschmüller, 2005) in an implementation by
(Heinrichs et al., 2007). It is efficient, does not produce
too many outliers, and returns a dense point cloud with
sufficiently precise points. This approach requires that the
images are arranged in an L-shaped configuration with a base
image, a second image shifted approximately only
horizontally, and a third shifted approximately only
vertically. Due to this special relation between the three
images, the search space for the matching and the 3D
estimation of a point is reduced to a horizontal or a
vertical line, respectively. So far, the two parameters
bounding the one-dimensional search space for the depth have
to be set manually before the program is started. Usually,
this range is narrow, assuming that the flying height or the
distance of a facade to the camera is restricted and does not
vary much.
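How a depth interval bounds the one-dimensional search can be sketched as follows. The sketch assumes the rectified normal case, in which disparity d, focal length f (in pixels), baseline B and depth Z are related by d = f * B / Z. This relation and all names below are assumptions for illustration, since the paper does not state how its two parameters are defined.

```python
def disparity_range(focal_px, baseline, depth_min, depth_max):
    """Convert a depth interval into a disparity search interval.

    Assumes a rectified image pair (normal case), where d = f * B / Z.
    Near points yield large disparities, far points small ones.
    """
    if not 0.0 < depth_min <= depth_max:
        raise ValueError("require 0 < depth_min <= depth_max")
    d_min = focal_px * baseline / depth_max
    d_max = focal_px * baseline / depth_min
    return d_min, d_max
```

A narrow depth interval, as for a roughly constant flying height, translates directly into a short search interval along the epipolar line.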
The semi-global matching returns a disparity map, which is
used to estimate the 3D point cloud by forward intersection.
The resulting point cloud contains a few hundred to about a
thousand gross errors, which can be removed under the
assumption that all points lie within a certain bounding box.
Apart from the remaining outliers, most of the extracted 3D
points form spatial clusters with clearly visible ground and
roof planes, cf. fig. 3. Compared with other point clouds
derived from stereo aerial imagery, e.g. Match-T¹, the
precision of our reconstructed points is significantly lower,
but we compensate for this by a higher point density.
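The bounding-box test for gross errors can be sketched in a few lines. The representation of points as (x, y, z) tuples and the function name are assumptions for illustration.

```python
def filter_by_bounding_box(points, lower, upper):
    """Keep only the points inside the axis-aligned box [lower, upper].

    points: list of (x, y, z) tuples; lower and upper: (x, y, z) corners.
    Points on the box boundary are kept.
    """
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))
    ]
```

The box limits would have to come from prior knowledge of the scene, for example the expected ground height and the maximum building height.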
¹ Automated DTM Generation Environment by inpho, cf.
www.inpho.de

Figure 3: Side and front view of a point cloud derived from
scene extracts of the three aerial images from fig.
Besides the widely spread points on vegetation objects and
some outliers, one can clearly recognize up to four major
clusters showing the ground, a flat roof and a gabled roof.

REGION-WISE PLANE ESTIMATION
In this section, we describe the estimation of the most
dominant plane for each detected image region of minimum
size. Any image partitioning algorithm can be chosen for
this purpose. In an earlier experiment, we obtained good
results segmenting aerial images with the watershed
algorithm based on the color gradient, cf. (Drauschke et al.,
2006). This segmentation approach is also applicable to
facade images, cf. (Drauschke, 2009). To avoid
oversegmentation in nearly all image parts, we smooth the
image with a Gaussian filter with σ = 2 before determining
the watershed regions. After smoothing, oversegmented image
parts are highly correlated with vegetation objects, which
are not in our focus yet. Such an initial segmentation is
shown in fig. 1. For further calculations, we consider only
those regions R_k that have a minimum size of 250 pixels.
This parameter should depend on the image size; we have
chosen a relatively high value for efficiency reasons.
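The minimum-size criterion can be sketched independently of the segmentation method. The sketch assumes that the segmentation is available as a label image, i.e. a list of rows holding one integer region label per pixel; this representation and the function name are illustrative assumptions.

```python
from collections import Counter


def large_regions(labels, min_size=250):
    """Return the set of region labels covering at least min_size pixels.

    labels: list of rows, each row a list of integer region labels.
    """
    counts = Counter(label for row in labels for label in row)
    return {label for label, size in counts.items() if size >= min_size}
```

Only the regions returned here would be passed on to the plane estimation; all smaller regions are skipped.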
In the further process, we want to estimate a low-order
polynomial through the 3D points of each region, i.e. its
most dominant plane. To this end, we determine for each
region the set of points {X_j} from the point cloud that are
projected into the region.
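Selecting the points of one region can be sketched as follows. The sketch assumes a 3x4 projection matrix P, normalized so that the homogeneous coordinate w is positive in front of the camera, and a label image as produced by the segmentation. Both the data layout and the rounding to the nearest pixel are assumptions for illustration, not the paper's formulation.

```python
def points_in_region(points, P, labels, region_label):
    """Return the 3D points whose image projection falls into a region.

    points: list of (x, y, z) tuples.
    P: 3x4 projection matrix as nested lists (assumed: w > 0 in front).
    labels: label image as list of rows; labels[row][col] is a region id.
    """
    height, width = len(labels), len(labels[0])
    selected = []
    for X in points:
        Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
        u, v, w = (sum(p * x for p, x in zip(p_row, Xh)) for p_row in P)
        if w <= 0:
            continue  # point is not in front of the camera
        col, row = int(round(u / w)), int(round(v / w))
        inside = 0 <= row < height and 0 <= col < width
        if inside and labels[row][col] == region_label:
            selected.append(X)
    return selected
```

Points that project outside the image or behind the camera are dropped silently, so each region receives only the points that are actually visible in it.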
We assume that the most dominant building surfaces and the
ground are planar. Hence, we estimate the best-fitting plane
through the 3D points of a region. A similar procedure can be
found in (Tao and Sawhney, 2000). For efficiency reasons, we
choose a RANSAC-based approach for our plane search, cf.
(Fischler and Bolles, 1981). To this end, we determine the
parameters of the plane’s normal form from three randomly
chosen points X_j1, X_j2 and X_j3:
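One common way to obtain such a plane, given here only as a sketch and not necessarily the authors' formulation, takes the normalized cross product of two difference vectors as the normal n and computes the offset d so that n · X = d. The surrounding RANSAC loop, the inlier threshold, the iteration count and the fixed random seed are assumptions for illustration.

```python
import math
import random


def plane_from_points(a, b, c):
    """Return (n, d) with unit normal n and offset d such that n . X = d.

    Returns None if the three points are (nearly) collinear.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]  # cross product u x v
    norm = math.sqrt(sum(x * x for x in n))
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = sum(n[i] * a[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.1, iterations=200, seed=0):
    """Search for the plane supported by the most points.

    points: list of at least three (x, y, z) tuples.
    threshold: maximum point-to-plane distance of an inlier (assumed).
    Returns (plane, inliers); plane is None if no valid sample was drawn.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, draw again
        n, d = plane
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) - d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

The sign of the normal depends on the order of the sampled points, so n and d are determined only up to a common sign. A final least-squares fit through the inliers would typically refine the result.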