be evaluated according to the statistics of the performed
bundle adjustment. This step is usually robust enough for
typical building scenes, because the facades are often
sufficiently textured, and we do not have to deal with total
occlusions. Otherwise, problems may occur due to overly large
reflective facade parts.
The images are oriented relatively, not absolutely, i.e. the
positions of the projection centers are not correctly scaled
yet. Since a projection from 3D to 2D cannot be inverted,
a reasonable assumption about the scale always has to
be introduced additionally. The easiest way to set the scale
parameter is to measure GPS positions during image
acquisition. Another strategy is to measure one or
more distances on the object and to identify the corresponding
points in the images or in the extracted point cloud later.
While the first approach can easily be automated, the second
requires human interaction.
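
The second strategy can be illustrated with a minimal sketch. It assumes that two points have already been identified in the unscaled point cloud and that their true distance was measured on the object. The function names and the example values are hypothetical and only serve the illustration.

```python
import math


def scale_factor(p_model, q_model, true_distance):
    """Ratio between a distance measured on the object and the same
    distance in the unscaled (relatively oriented) reconstruction."""
    model_distance = math.dist(p_model, q_model)
    if model_distance == 0.0:
        raise ValueError("the two model points coincide")
    return true_distance / model_distance


def apply_scale(points, s):
    """Scale 3D points; projection centers would be scaled the same way."""
    return [tuple(s * c for c in p) for p in points]


# Hypothetical values: two points 2.0 model units apart, measured as 10 m.
s = scale_factor((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 10.0)
scaled = apply_scale([(1.0, 1.0, 1.0)], s)
```

With several measured distances, one could average the individual ratios; this choice is not prescribed by the text.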
From the second step on, we only use three images for a
dense trinocular matching and only accept those 3D points
that were matched in all three images. Thus, we reduce
many matching errors close to the image borders and
avoid points corresponding to occluded surfaces. We use
semi-global matching (Hirschmüller, 2005) in the
implementation by (Heinrichs et al., 2007). It is efficient, does
not produce too many outliers, and returns a dense point
cloud with sufficiently precise points. This approach
requires the images to be arranged in an L-shaped
configuration with a base image, a second one shifted
approximately only horizontally, and a third shifted approximately
only vertically. Due to the special relation between the
three given images, the search space of the matching and
3D estimation of a point is reduced to a horizontal or
vertical line, respectively. So far, the two parameters of the
one-dimensional search space for the depth have to be set
manually before the program is started. Usually, this range
is narrow, assuming that the flying height or the
distance of a facade to the camera is restricted and does not
vary much.
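
To illustrate how a restricted depth range limits the one-dimensional search, the following sketch converts a depth interval into a disparity interval. It assumes the standard relation d = f * B / Z for a rectified image pair; the matching software referred to above may parametrize its search space differently, and all names and values are hypothetical.

```python
def disparity_range(focal_px, baseline_m, z_min_m, z_max_m):
    """Disparity interval for depths in [z_min_m, z_max_m], using the
    rectified-pair relation d = f * B / Z (an assumption of this sketch)."""
    if not 0.0 < z_min_m <= z_max_m:
        raise ValueError("need 0 < z_min_m <= z_max_m")
    d_max = focal_px * baseline_m / z_min_m  # nearest surface, largest disparity
    d_min = focal_px * baseline_m / z_max_m  # farthest surface, smallest disparity
    return d_min, d_max


# Hypothetical facade scene: camera 20 m to 25 m away, 2 m baseline.
d_min, d_max = disparity_range(
    focal_px=1000.0, baseline_m=2.0, z_min_m=20.0, z_max_m=25.0
)
```

The narrower the assumed depth interval, the fewer disparity candidates have to be evaluated per pixel.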
The semi-global matching returns a disparity map, which
is used to estimate the 3D point cloud by forward
intersection. The resulting point cloud contains several
hundred up to a thousand gross errors, which can be removed
under the assumption that all points lie in a certain
bounding box. Apart from the remaining outliers, most
extracted 3D points form spatial clusters with clearly visible
ground and roof planes, cf. fig. 3. Compared with other
point clouds derived from stereo aerial imagery, e.g.
Match-T¹, the precision of our reconstructed points is
significantly lower, but we compensate for this with a higher
point density.
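
The removal of gross errors by a bounding box can be sketched as follows. The box limits are assumed to be known for the scene; the function names and the example coordinates are hypothetical.

```python
def inside_box(point, box_min, box_max):
    """True if every coordinate lies within the axis-aligned box."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, box_min, box_max))


def remove_gross_errors(points, box_min, box_max):
    """Keep only the 3D points that fall inside the bounding box."""
    return [p for p in points if inside_box(p, box_min, box_max)]


# Hypothetical point cloud with two gross errors.
points = [(1.0, 2.0, 3.0), (500.0, 2.0, 3.0), (4.0, 5.0, -80.0), (9.0, 9.0, 9.0)]
kept = remove_gross_errors(
    points, box_min=(0.0, 0.0, 0.0), box_max=(10.0, 10.0, 10.0)
)
```

Such a filter only removes points far outside the scene; outliers inside the box remain and have to be handled by the later robust estimation.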
Figure 3: Side and front view of a point cloud, derived
from scene extracts of the three aerial images from fig.
Besides the widely spread points on vegetation objects and
some outliers, one can clearly recognize up to four major
clusters showing the ground, a flat roof and a gabled roof.
¹ Automated DTM Generation Environment by inpho, cf.
www.inpho.de
REGION-WISE PLANE ESTIMATION
In this section, we describe the estimation of the most
dominant plane for each detected image region of minimum
size. Any image partitioning algorithm can be chosen for
this purpose. In an earlier experiment, we obtained good
results by segmenting aerial images using the watershed
algorithm based on the color gradient, cf. (Drauschke
et al., 2006). This segmentation approach is also
applicable to facade images, cf. (Drauschke, 2009). To avoid
oversegmentation in nearly all image parts, we smooth the
image with a Gaussian filter with σ = 2 before determining
the watershed regions. After smoothing, the remaining
oversegmented image parts are highly correlated with
vegetation objects, which are not in our focus yet. Such an
initial segmentation is shown in fig. 1. For further
calculations, we only consider those regions R_k that have a
minimum size of 250 pixels. This parameter should depend on
the image size. We have chosen a relatively high value for
efficiency reasons.
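
The restriction to regions of minimum size can be sketched as follows, assuming the segmentation is available as a label image in which every pixel stores the id of its region. The representation as a nested list and the function name are assumptions made for this illustration.

```python
from collections import Counter


def large_regions(labels, min_size=250):
    """Return the ids of all regions covering at least min_size pixels.

    labels is a 2D list (rows of pixels) holding one region id per pixel.
    """
    sizes = Counter(label for row in labels for label in row)
    return {label for label, size in sizes.items() if size >= min_size}


# Tiny hypothetical label image; min_size is lowered to fit the example.
labels = [
    [0, 0, 1],
    [0, 2, 1],
    [0, 0, 1],
]
kept = large_regions(labels, min_size=3)
```

Since the threshold should depend on the image size, it could also be passed as a fraction of the total pixel count rather than as a fixed value.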
In the further process, we want to estimate a low-order
polynomial through the 3D points of each region, i.e. its
most dominant plane. To this end, we determine for each region
R_k the set of points {X_j} from the point cloud that are
projected into the region:
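
A minimal sketch of this selection is given here. It assumes that the orientation of the base image is available as a 3x4 projection matrix P mapping homogeneous 3D points to (column, row) pixel coordinates, and that the segmentation is stored as a label image. These representations and all names are hypothetical.

```python
import math


def project(P, X):
    """Project the 3D point X with the 3x4 matrix P.

    Returns (column, row) pixel coordinates, or None if the
    homogeneous scale is zero.
    """
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    if w == 0.0:
        return None
    return u / w, v / w


def points_in_region(points, P, labels, region_id):
    """Collect all 3D points whose projection falls into the given region."""
    height, width = len(labels), len(labels[0])
    selected = []
    for X in points:
        pixel = project(P, X)
        if pixel is None:
            continue
        col = math.floor(pixel[0] + 0.5)  # nearest pixel column
        row = math.floor(pixel[1] + 0.5)  # nearest pixel row
        if 0 <= row < height and 0 <= col < width and labels[row][col] == region_id:
            selected.append(X)
    return selected


# Hypothetical setup: idealized camera and a 2x3 label image.
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
labels = [[0, 0, 1], [0, 0, 1]]
cloud = [(0.0, 0.0, 1.0), (2.0, 1.0, 1.0), (4.0, 2.0, 2.0), (50.0, 0.0, 1.0)]
region_points = points_in_region(cloud, P, labels, region_id=1)
```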
We assume that most dominant building surfaces and the
ground are planar. Hence, we estimate the best fitting plane
through the 3D points of a region. A similar procedure can
be found in (Tao and Sawhney, 2000). For efficiency
reasons, we choose a RANSAC-based approach for our plane
search, cf. (Fischler and Bolles, 1981). To this end, we
determine the parameters of the plane’s normal form from
three randomly chosen points X_{j_1}, X_{j_2} and X_{j_3}:
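
The following sketch shows one common way to carry out such a RANSAC plane search. It assumes the Hesse normal form n . X = d with a unit normal n, obtained from the cross product of two difference vectors. The inlier threshold, the number of iterations and all function names are hypothetical choices and are not taken from the text.

```python
import random


def plane_from_points(a, b, c):
    """Plane through three points as (n, d) with n . X = d and |n| = 1.

    Returns None if the points are (nearly) collinear.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = sum(x * x for x in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = sum(n[i] * a[i] for i in range(3))
    return n, d


def point_plane_distance(plane, X):
    """Orthogonal distance of the point X to the plane."""
    n, d = plane
    return abs(sum(n[i] * X[i] for i in range(3)) - d)


def ransac_plane(points, threshold, iterations=200, seed=0):
    """Return the sampled plane supported by the most points, with its inliers."""
    if len(points) < 3:
        raise ValueError("at least three points are required")
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, draw again
        inliers = [X for X in points if point_plane_distance(plane, X) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Hypothetical region: nine points on the plane z = 1 and two outliers.
region = [(float(x), float(y), 1.0) for x in range(3) for y in range(3)]
region += [(0.0, 0.0, 5.0), (1.0, 2.0, 9.0)]
plane, inliers = ransac_plane(region, threshold=0.01)
```

The returned plane is defined by three sample points only; a subsequent least-squares fit to the inliers would be a natural refinement.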