The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3b. Beijing 2008
2. SYSTEM OVERVIEW
Figure 1 shows the main components of our system. The
epipolar images are generated from the aerial images by an
epipolar resampling process. We obtain the disparity map
between the epipolar pairs by stereo matching, using area-based
matching with a non-parametric technique. From the disparity
map, we generate the DEM as a 3D terrain model. The building
location information extracted from the disparity map is used to
remove the unnecessary line segments extracted in the low-level
process. After the 2D lines are generated, perceptual grouping is
applied to the filtered line segments in order to obtain
structural relationship features such as parallel line segment
pairs and U-shapes. These can be used to generate rooftop
hypotheses. Among the generated hypotheses, the candidate
rooftops are selected by searching for closed cycles in an undirected
graph. Finally, we retrieve 3D buildings by applying 3D
triangulation to each line segment of the detected rooftops.
Figure 1. System Overview
3. BUILDING REGION EXTRACTION
3.1 Stereo Matching
To find an accurate disparity map, we employ a multi-resolution
scheme, also referred to as hierarchical or pyramid processing. At
each resolution level, the correspondence problem is solved
by first computing the census-transformed images and then applying
Hamming distance correlation to the transformed images. The
census transform maps the local region surrounding a pixel
to a bit string that records which pixels have lower intensities. For
example, in a window surrounding a pixel, if a particular pixel's
value is less than that of the centre pixel, the corresponding position in
the bit string is set to 1; otherwise it is set to 0. The
two census-transformed images are then compared using a
similarity metric based on the Hamming distance, which is the
number of bits that differ between the two bit
strings. The Hamming distance (Banks, 1997) is summed over
the correlation window:
$$\sum_{(u,v) \in W} \mathrm{Hamming}\bigl(I'_1(u, v),\; I'_2(x + u,\, y + v)\bigr) \tag{1}$$
where $I'_1$ and $I'_2$ represent the census transforms of $I_1$ and $I_2$, and
$W$ is the correlation window.
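The census transform and the windowed Hamming cost of Equation (1) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 3x3 census neighbourhood, the rule that out-of-image neighbours contribute a 0 bit, and the purely horizontal search over a disparity d are all assumptions made for illustration.

```python
def census_transform(img, radius=1):
    """Return one census code (an int used as a bit string) per pixel.

    A bit is 1 where the neighbour is darker than the centre pixel, else 0.
    Neighbours outside the image contribute a 0 bit (assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre pixel is not compared with itself
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < rows and 0 <= xx < cols
                    bit = 1 if inside and img[yy][xx] < img[y][x] else 0
                    code = (code << 1) | bit
            out[y][x] = code
    return out


def hamming(a, b):
    """Number of bit positions in which two census codes differ."""
    return bin(a ^ b).count("1")


def window_cost(c1, c2, x, y, d, half=1):
    """Sum of Hamming distances over a (2*half+1)^2 window.

    Compares c1 around (x, y) with c2 around (x + d, y). The caller must
    keep both windows inside the images.
    """
    total = 0
    for v in range(-half, half + 1):
        for u in range(-half, half + 1):
            total += hamming(c1[y + v][x + u], c2[y + v][x + u + d])
    return total


def best_disparity(c1, c2, x, y, max_d, half=1):
    """Disparity in [0, max_d] with the lowest summed Hamming distance."""
    cols = len(c1[0])
    # Skip disparities whose window would leave the second image.
    candidates = [d for d in range(max_d + 1) if x + half + d < cols]
    return min(candidates, key=lambda d: window_cost(c1, c2, x, y, d, half))
```

Because the census code depends only on the intensity ordering inside the neighbourhood, the cost is unaffected by gain or offset differences between the two images, which is one reason the transform is attractive for matching.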
3.2 Suspected Building Region Extraction
It is usually difficult to separate the objects of interest from the
collection of 2D line segments obtained in low-level feature extraction.
The boundaries of the objects of interest, the buildings, can be partly
occluded by vegetation, shadows, and other objects. In the rooftop
hypothesis process, these fragmented boundaries and the
presence of roads, vehicles, etc. can produce false hypotheses,
including unwanted rooftops and rooftops of the wrong shape. This
causes not only significant computational effort in processing
but also wrong final results. To solve this problem, the system
should be able to detect line segments that are within or near
buildings in the image. Here, we use suspected building regions,
which are extracted from the disparity map. The suspected building
regions are areas whose pixel values differ from those of the
surrounding area. The difference in pixel values between a
suspected building region and its surroundings indicates a
difference in elevation, and hence the presence of higher
objects such as buildings and trees in those regions. In other
words, these regions tell us where the
buildings are located.
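One simple way to obtain such regions is to threshold the disparity map and group the remaining pixels into connected regions. The sketch below is illustrative only: the function name, the 4-connectivity, the threshold value, and the assumption that higher objects have larger disparity values are not taken from the paper.

```python
from collections import deque


def suspected_regions(disparity, threshold):
    """Label 4-connected regions whose disparity exceeds `threshold`.

    Returns (labels, count): labels[y][x] is 0 for background and
    1..count for the region the pixel belongs to.
    """
    rows, cols = len(disparity), len(disparity[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if disparity[y][x] <= threshold or labels[y][x]:
                continue  # background pixel, or already labelled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and disparity[ny][nx] > threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Line segments can then be kept or discarded depending on whether they fall inside or close to a labelled region.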
The goal of the stereo matching process is to find a match between
the pixels in the first (reference) image R and the second (wrap) image W
such that the pixel located at (i, j) in the R image and the pixel
located at (i+I(i, j), j+J(i, j)) in the W image view the same
point in object space, i.e., W(i+I(i, j), j+J(i, j)) -> R(i, j), where
I(i, j) is the horizontal disparity map and J(i, j) is the vertical disparity
map. The index i (column index) is measured along scan lines
and the index j (row index) is measured across scan lines. In
this paper, we use epipolar resampled images, so J(i, j) = 0
for all i and j, and the relation reduces to W(i+I(i, j), j) ->
R(i, j).
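The reduced relation can be made concrete with a small sketch that resamples W into the geometry of R using only the horizontal disparity. The function name and the use of None for pixels whose match falls outside W are assumptions made for illustration; images are indexed as image[j][i] (row j, column i).

```python
def warp_to_reference(W, I):
    """Build an estimate of R with R(i, j) = W(i + I(i, j), j).

    W and I are 2D lists indexed [j][i]; I holds integer horizontal
    disparities. Pixels whose match lies outside W are set to None.
    """
    rows, cols = len(W), len(W[0])
    out = [[None] * cols for _ in range(rows)]
    for j in range(rows):          # j: row index, across scan lines
        for i in range(cols):      # i: column index, along scan lines
            src = i + I[j][i]
            if 0 <= src < cols:
                out[j][i] = W[j][src]
    return out
```

With a correct disparity map, the warped image should agree with R wherever the scene is visible in both views, which gives a simple consistency check on the matching result.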
Considering the correspondence problem, there are two popular
approaches. The first is Normalized Cross Correlation
(NCC), a typical area-based matching metric, and
the second is the non-parametric technique based on the census
transform (Zabih, 1998). We employ the census transform because
it preserves edges and is computationally simple.
The suspected building regions can be extracted using a simple height
threshold technique. Their boundaries are extracted by
convolving the disparity map with a Laplacian-of-Gaussian (LoG)
filter and then applying connected component analysis to obtain the
coordinates of the zero-crossing pixels in the convolution output. The
LoG operator, or convolution kernel, is defined as:
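$$\mathrm{LoG}(x, y) = \Delta G_\sigma(x, y) = \frac{\partial^2}{\partial x^2} G_\sigma(x, y) + \frac{\partial^2}{\partial y^2} G_\sigma(x, y) = -\frac{1}{\pi\sigma^4}\left[1 - \frac{x^2 + y^2}{2\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2}$$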
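As an illustration of Equation (2), the following pure-Python sketch samples the LoG kernel, filters a disparity map with it, and marks zero crossings of the response. Several choices are assumptions of this sketch and not details from the paper: the function names, the kernel half-width, subtracting the kernel mean so that flat areas give a zero response, replicating border pixels, and detecting zero crossings by a sign change between neighbouring pixels (the paper uses connected component analysis for this step).

```python
import math


def log_kernel(sigma, half):
    """Sample the LoG of Equation (2) on a (2*half+1)^2 grid, shifted to zero mean."""
    two_s2 = 2.0 * sigma * sigma
    k = [[-1.0 / (math.pi * sigma ** 4)
          * (1.0 - (x * x + y * y) / two_s2)
          * math.exp(-(x * x + y * y) / two_s2)
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    mean = sum(map(sum, k)) / (2 * half + 1) ** 2
    return [[v - mean for v in row] for row in k]


def filter_same(img, kernel):
    """Filter `img` with `kernel`, replicating border pixels.

    The LoG kernel is symmetric, so correlation and convolution coincide.
    """
    rows, cols = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                yy = min(max(y + dy, 0), rows - 1)
                for dx in range(-half, half + 1):
                    xx = min(max(x + dx, 0), cols - 1)
                    acc += kernel[dy + half][dx + half] * img[yy][xx]
            out[y][x] = acc
    return out


def zero_crossings(resp, eps=1e-9):
    """Return (x, y) of pixels whose right or lower neighbour has the opposite sign.

    Responses smaller than `eps` in magnitude are treated as zero to
    avoid spurious crossings caused by rounding error.
    """
    rows, cols = len(resp), len(resp[0])
    points = []
    for y in range(rows):
        for x in range(cols):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < rows and nx < cols:
                    a, b = resp[y][x], resp[ny][nx]
                    if a * b < 0 and abs(a) > eps and abs(b) > eps:
                        points.append((x, y))
                        break
    return points
```

On a disparity map containing a vertical step, the response changes sign across the step, so the detected zero crossings trace the boundary between the lower and the higher area.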