CMRT09: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation
and check how many object points support the determined
plane, i.e. how many points are near the plane. This
depends on the choice of a threshold. For aerial images,
we allowed a maximal distance of 20 cm to the plane. If we
want to guarantee, with a minimum probability
$P_{\min} = 0.999$, finding a plane that is constructed
from 3 points and supported by at least half of the points
($e = 0.5$), we have to perform $m = 52$ trials, because
$$ m \geq \frac{\log(1 - P_{\min})}{\log(1 - (1 - e)^3)} = \frac{\log 0.001}{\log 0.875} \approx 51.7. \qquad (4) $$
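The arithmetic in Eq. (4) can be checked with a short sketch. The function name `ransac_trials` and its parameter names are illustrative assumptions, not taken from the paper; `outlier_ratio` plays the role of e and `sample_size` is the 3 points needed to construct a plane.

```python
import math


def ransac_trials(p_min: float, outlier_ratio: float, sample_size: int = 3) -> int:
    """Smallest number of trials m such that, with probability >= p_min,
    at least one random sample of `sample_size` points is outlier-free."""
    # Probability that a single sample consists of supporting points only.
    p_good_sample = (1.0 - outlier_ratio) ** sample_size
    # Eq. (4): log(1 - Pmin) / log(1 - (1 - e)^3), rounded up to an integer.
    m = math.log(1.0 - p_min) / math.log(1.0 - p_good_sample)
    return math.ceil(m)


if __name__ == "__main__":
    print(ransac_trials(0.999, 0.5))
```

With the values from the text (Pmin = 0.999, e = 0.5) the ratio is about 51.7, so the rounded-up trial count is 52.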
If no sufficiently high number of supporting points can be
found within m trials, the region is not analyzed any
further. In our empirical investigation, segmented regions
representing roof parts always had a dominant plane.
Such a plane cannot be found if, for instance, the ground is
not planar but forms a small hill or valley, e.g. at and
around trees and shrubs. Furthermore, we accepted only those
3D points that are visible in all three images. Therefore,
occluded building parts are not processed further either.
We estimate the best-fitting plane using a least-squares
adjustment on those points that support the best plane
proposed during the RANSAC iterations. The statistical
reasoning² is taken from (Heuel, 2004), p. 145.
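The plane search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, a plain SVD-based least-squares fit instead of the SUGR statistical reasoning, and the hypothetical names `fit_plane` and `ransac_plane`. The defaults mirror the values given in the text (20 cm distance threshold, 52 trials, support by at least half of the points).

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through an (N, 3) array.
    Returns a unit normal n and offset d such that n . x = d on the plane."""
    centroid = points.mean(axis=0)
    # The right singular vector of the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, float(normal @ centroid)


def ransac_plane(points, max_dist=0.20, trials=52, min_support=0.5, seed=0):
    """Search for a dominant plane; return (normal, d, inlier_mask) or None."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(trials):
        # Propose a plane from 3 randomly chosen points.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # (nearly) collinear sample, skip it
            continue
        normal /= norm
        # Supporting points lie within max_dist of the proposed plane.
        inliers = np.abs((points - sample[0]) @ normal) <= max_dist
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Give up on the region if too few points support the best proposal.
    if best_inliers is None or best_inliers.sum() < min_support * len(points):
        return None
    # Refine with a least-squares fit on the supporting points only.
    normal, d = fit_plane(points[best_inliers])
    return normal, d, best_inliers
```

A caller would pass the 3D points of one segmented region as an (N, 3) array in metres and treat a `None` result as "no dominant plane found".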
5 MERGING OF IMAGE REGIONS
So far, our approach can only handle the merging of regions.
If the image is undersegmented in some parts, i.e. a region
covers two or more objects, a splitting criterion must be
defined to separate these region parts again. We suggest
searching for several dominant planes and splitting the
regions according to the intersections of these planes.
We have not implemented the splitting yet, so we only
present our merging strategy.
We determine a region adjacency graph and check for each
adjacent pair of regions $R_1$ and $R_2$ whether merging the
regions can be accepted. The first test is on equality of
the two corresponding estimated planes. We realized that our
derived point cloud is too noisy for such statistical
reasoning. Therefore, we consider a second test, where we
determine the best-fitting plane through the set of 3D
points from both regions and then check whether the new
plane has a normal vector $n_{12}$ that is similar to the
normal vectors $n_1$ and $n_2$ of the two previous planes:
$$ \angle(n_{12}, n_1) < \theta \;\wedge\; \angle(n_{12}, n_2) < \theta. \qquad (5) $$
In our experiments, we used $\theta = 30^\circ$, which leads
to reasonable results with respect to buildings. If one is
interested in each individual roof plane, $\theta$ should
not be more than $10^\circ$. If other applications cannot
depend on such a heuristically chosen parameter, we suggest
replacing this condition with an MDL-based approach,
cf. (Rissanen, 1989). Then two regions should be merged if
the encoding length of the data would decrease when merging.
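The angle test of Eq. (5) can be illustrated with a small sketch. The function names `angle_deg` and `accept_merge` are hypothetical. Treating a normal and its negation as the same plane orientation (by taking the absolute value of the dot product) is an assumption of this sketch, since the text does not say how normals are oriented.

```python
import math


def angle_deg(n_a, n_b):
    """Angle in degrees between two plane normals.
    Assumption: the sign of a normal is ignored, so n and -n are equivalent."""
    dot = sum(a * b for a, b in zip(n_a, n_b))
    norm_a = math.sqrt(sum(a * a for a in n_a))
    norm_b = math.sqrt(sum(b * b for b in n_b))
    # Clamp to 1.0 to guard acos against rounding slightly above 1.
    cos_angle = min(1.0, abs(dot) / (norm_a * norm_b))
    return math.degrees(math.acos(cos_angle))


def accept_merge(n_1, n_2, n_12, theta_deg=30.0):
    """Eq. (5): the normal of the joint plane must be within theta of both
    normals of the individual region planes."""
    return angle_deg(n_12, n_1) < theta_deg and angle_deg(n_12, n_2) < theta_deg


if __name__ == "__main__":
    # Two halves of a gabled roof, each tilted by 45 degrees, and a flat joint plane.
    left = (0.0, math.sin(math.radians(45)), math.cos(math.radians(45)))
    right = (0.0, -math.sin(math.radians(45)), math.cos(math.radians(45)))
    joint = (0.0, 0.0, 1.0)
    print(accept_merge(left, right, joint))
```

In this example each roof half deviates by 45° from the joint plane, so the pair is rejected for θ = 30°. The joint normal is assumed to come from a separate plane fit through the points of both regions.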
² SUGR: Statistically Uncertain Geometric Reasoning,
www.ipb.uni-bonn.de/projects/SUGR
Figure 4: Steps of improving image segmentation. In the
upper row, we show the reference image and its initial
segmentation. In the bottom row, we show on the left all big
regions from the initial partition (in white), and the final
segmentation including the MDL-based and the geometry-based
grouping of regions. There, the gray-shaded regions have
been merged on the basis of geometric properties.
Up to this point, we did not consider small regions whose
dominant planes cannot be estimated reliably. Now we merge
them as well. Small holes can easily be merged with their
surrounding region, but all others are merged according to
an intensity-based criterion. We implemented an MDL-based
strategy according to (Pan, 1994), where we additionally
stop the merging as soon as the minimum size of a region
has been reached. As alternatives, we could also use
strategies for irregular pyramid structures, e.g.
(Guigues et al., 2003), which is based on similarity of
color intensities, or (Drauschke, 2009), which is based on
scale-space analysis. The resulting image segmentation is
shown in fig. 4.
6 EXPERIMENTS
We have tested our segmentation scheme on 28 extracts of
aerial images with known projection matrices showing urban
scenes in Germany and Japan. The images from Germany were
taken in early spring, when many trees are in blossom but
not yet covered by leaves. The 3D points matched at such
vegetation objects are widely spread, cf. fig. 3. In most
cases, the corresponding image parts are oversegmented, so
that no dominant planes have to be estimated. There is
almost no vegetation in the Japanese images, but the ground
is often dark from shadows. As mentioned earlier, we have
problems with finding precise 3D points in lawn and shadow
regions, but with respect to building extraction
(i.e. segmenting the major roof parts), our approach
achieves satisfying results, cf. fig. 5. We are convinced
that matching in dark image parts would give better results
if a local enhancement is used to brighten these parts