In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
buildings. The proposed building detection technique falls into
this group.
There are two groups of performance evaluation systems: those
using overlapping thresholds (Rottensteiner et al., 2005, Rutzinger
et al., 2009, Lee et al., 2008) and those not using any thresholds
(Shan and Lee, 2005, Shufelt, 1999). In (Rottensteiner et al.,
2005) and (Rutzinger et al., 2009), a correspondence was established
between a detected building and a reference building if
they overlapped each other either strongly (more than 80% overlap)
or partially (50% to 80% overlap). Neither of the above evaluation
systems reflects the actual detection scenario. Firstly,
the presence of false positive and false negative detections is not
considered at all. Secondly, there may be many-to-many relationships
between the detected and reference sets and such relationships
are considered as errors (Shan and Lee, 2005). Finally,
merging and splitting of the detected buildings (Rutzinger et al.,
2009) does not necessarily correspond to the actual performance.
Without using a particular overlapping threshold, (Shufelt, 1999)
showed the detection performance graphically as the overlapped
area varied from 0-100%. (Shan and Lee, 2005) presented results
by histograms showing the frequency of buildings as functions of
underlap, overlap, extralap, crosslap, and fitness. The number of
false negative buildings was indicated by the frequency at 100%
underlap and the number of false positive buildings was indicated
by the frequency both at crosslap 0 and 0% fitness.
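As a rough illustration of the threshold-based correspondence described above, the following Python sketch labels a detected/reference building pair from its overlap percentage. The function name `classify_overlap`, the assumption that the overlap percentage has already been computed, and the handling of the exact 50% and 80% boundaries are illustrative choices only; the cited papers may define these details differently.

```python
def classify_overlap(overlap_percent: float) -> str:
    """Label a detected/reference building pair by its overlap percentage.

    Assumed convention (not taken from the cited papers):
      > 80%        -> "strong"
      50% to 80%   -> "partial" (both boundaries inclusive)
      < 50%        -> "none"
    """
    if not 0.0 <= overlap_percent <= 100.0:
        raise ValueError("overlap_percent must lie in [0, 100]")
    if overlap_percent > 80.0:
        return "strong"
    if overlap_percent >= 50.0:
        return "partial"
    return "none"
```

Such a rule assigns a single label to each pair, which is why false positives, false negatives and many-to-many relationships need separate treatment.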
The evaluation systems can also be categorized into pixel-based
(Rottensteiner et al., 2005, Rutzinger et al., 2009, Lee et al., 2008)
and object-based systems (Rutzinger et al., 2009). While the latter
counts the number of buildings and offers a quick assessment,
the former is based on the number of pixels and provides a more
rigorous evaluation (Song and Haithcoat, 2005). The pixel-based
evaluation indirectly corresponds to the horizontal accuracy of
the detected building footprints.
3 PROPOSED DETECTION TECHNIQUE
Fig. 1 shows the flow diagram of the proposed building detection
technique. The input information consists of a LIDAR
point cloud and multispectral orthoimagery. The primary and secondary
masks are first derived from the LIDAR data, along with
NDVI values from the orthoimagery. The initial building positions
are derived from the primary building mask. The colour
information in the multispectral images is usually in the RGB
system and therefore is converted into the YIQ system. The final
buildings are obtained by extending their initial positions using
the two masks and the YIQ colour information.
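The two per-pixel quantities mentioned above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that a near-infrared band is available for the NDVI, uses the standard definition NDVI = (NIR - Red) / (NIR + Red), and relies on the `colorsys.rgb_to_yiq` function from the Python standard library, whose coefficients are one common approximation of the RGB-to-YIQ transform. The function names `ndvi` and `rgb_to_yiq_pixel` and the 8-bit scaling are hypothetical.

```python
import colorsys


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption: return 0 for pixels with no signal in either band.
        return 0.0
    return (nir - red) / total


def rgb_to_yiq_pixel(r: float, g: float, b: float, max_value: float = 255.0):
    """Convert one RGB pixel (assumed 0..max_value) to a (Y, I, Q) tuple."""
    return colorsys.rgb_to_yiq(r / max_value, g / max_value, b / max_value)
```

For a neutral grey pixel the I and Q components are approximately zero, so only the luminance Y carries information.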
3.1 Generation of Masks
The raw LIDAR data is divided into groups where each group
corresponds to a tile of 450 x 450 image pixels; i.e., all laser
points corresponding to an image tile go into the same group. A
histogram of the height data for each LIDAR group is obtained,
where bins of low heights correspond to ground areas and those
of large heights correspond to trees and buildings. The distance
between successive bins is 2 m and the bin having the maximum
frequency indicates the ground height H_g for the corresponding
tile. This is based on the assumption that the majority of the LIDAR
points have heights similar to the ground height. Alternatively,
the average DEM (Digital Elevation Model) value in each
tile can be used as H_g. Figs. 2(a)-(b) show the tiles of masks on
an orthoimage and the groups of LIDAR data.
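The histogram-based estimate of H_g for one tile can be sketched as below. This is only an illustration under stated assumptions: the text says the most frequent 2 m bin indicates the ground height but does not say which value within the bin is used, so the sketch returns the bin centre; ties between bins are resolved in favour of the lowest bin. The function names `estimate_ground_height` and `mean_ground_height` are hypothetical.

```python
import math
from collections import Counter


def estimate_ground_height(heights, bin_width=2.0):
    """Estimate the ground height of one tile from its LIDAR heights.

    Heights are grouped into bins of `bin_width` metres and the centre of
    the most populated bin is returned (the use of the centre is an assumption).
    """
    if not heights:
        raise ValueError("at least one height is required")
    counts = Counter(math.floor(h / bin_width) for h in heights)
    # Most frequent bin; on ties prefer the lowest bin (assumption).
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width


def mean_ground_height(dem_values):
    """Alternative mentioned in the text: average DEM value of the tile."""
    if not dem_values:
        raise ValueError("at least one DEM value is required")
    return sum(dem_values) / len(dem_values)
```

The histogram variant is only reliable when most points in the tile lie on the ground, as the text assumes.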
Figure 1: Flow diagram of the proposed building detection technique.
Figure 2: (a) A test scene, (b) LIDAR data (shown in gray-scale),
(c) primary building mask and (d) secondary building mask.
heights. The first set marks white for each of its points in the
primary building mask M_p, which is initially a completely black
mask. The second set marks black for each of its points in the secondary
building mask M_s, which is initially a completely white
mask. Consequently, the black areas in the primary building mask
indicate void areas where there are no laser returns below the height threshold, and
those in the secondary building mask indicate filled areas from
where returns indicate an elevated object above the same height
threshold. Figs. 2(c)-(d) show the two generated masks for a test
scene.
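The marking of the two masks can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the LIDAR points have already been mapped to integer pixel coordinates, that the height threshold is supplied by the caller, and that a point lying exactly at the threshold counts as elevated. The function name `build_masks` and the encoding 0 = black, 1 = white are hypothetical.

```python
def build_masks(points, width, height, threshold):
    """Build the primary and secondary building masks for one scene.

    points    : iterable of (col, row, z), with (col, row) in pixel
                coordinates (assumed to be precomputed) and z the LIDAR height
    threshold : height threshold separating the two point sets (caller-supplied)
    Returns (primary, secondary) as lists of rows; 0 = black, 1 = white.
    """
    primary = [[0] * width for _ in range(height)]    # starts completely black
    secondary = [[1] * width for _ in range(height)]  # starts completely white
    for col, row, z in points:
        if not (0 <= col < width and 0 <= row < height):
            continue  # ignore points falling outside the mask
        if z < threshold:
            primary[row][col] = 1    # low point: mark white in primary mask
        else:
            secondary[row][col] = 0  # elevated point: mark black in secondary mask
    return primary, secondary
```

A pixel with no laser return at all stays black in the primary mask and white in the secondary mask, which is the difference between the two masks.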
3.2 Initial Buildings
Initial buildings are the black areas in the primary building mask
as shown in Fig. 2(c). Three steps are followed to obtain these
black regions. Firstly, lines around the black shapes in M_p are
formed. Secondly, the lines are adjusted and extended. Finally,
rectangular shapes are obtained using these lines.
Edges are first extracted from M_p using an edge detector and
short edges are removed assuming that the minimum building
length or width is 3 m. Corners are then detected on each curve
using the fast corner detector in (Awrangjeb et al., 2009). On each
edge, all the pixels between two corners or a corner and an endpoint,
or two endpoints when enough corners are not available,
are considered as separate line segments. In order to properly
align the detected line segments with the building edges, a least-squares
straight-line fitting technique is applied. With each line
segment a point P_in is recorded. This ‘inside-point’ indicates on
which side of the line the building is recorded. In order to avoid