In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4, Paris, France, 3-4 September, 2009
attributes can be used, e.g. mean values, standard deviations,
texture and shape of the segments. The method automatically
selects the most useful ones for classification. In the Marseille
area, the criteria selected in the tree included only the NDVI. In
the Lyngby area, NDVI and a shape attribute were selected. The
third stage consists of a post-processing step that analyses the
size and neighborhood of building segments and corrects their
class accordingly. Building detection results in a building label
image which is used for the comparison in our test.
Olsen and Knudsen, 2005: The input of the method is given by
a DSM, CIR orthophotos and a raster version of the outdated
database. The method starts with the generation of a DTM,
estimated from the DSM through appropriate morphological
procedures, an nDSM and an Object Above Terrain (OAT) mask.
This is followed by a two-step classification that aims at
distinguishing buildings from non-building objects. This
classification is based on criteria that best characterise buildings
(especially in terms of size and form) and results in the building
label image that is used for the evaluation in this study. The last
stage is the actual change detection step, in which the
classification outcome is compared to the initial database in
order to extract a preliminary set of potential changes (on a per-
pixel basis) that is then post-processed in order to keep only the
objects that are assumed to have changed.
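The nDSM generation and the per-pixel comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the grid layout and the 2.5 m height threshold are hypothetical, the DTM is taken as given rather than derived by morphological filtering, and the above-terrain mask stands in for the building label image that the real method obtains from its two-step classification.

```python
# Illustrative sketch only: function names, the grid layout and the
# 2.5 m height threshold are assumptions, not values from the paper.

def normalised_dsm(dsm, dtm):
    """nDSM = DSM - DTM, computed cell by cell (heights in metres)."""
    return [[s - t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(dsm, dtm)]


def above_terrain_mask(ndsm, min_height=2.5):
    """Mark cells standing at least `min_height` above the terrain."""
    return [[h >= min_height for h in row] for row in ndsm]


def pixel_changes(detected, database):
    """Compare a detected building mask with the rasterised database."""
    labels = {
        (True, True): "unchanged",
        (True, False): "new",
        (False, True): "demolished",
        (False, False): "background",
    }
    return [[labels[(d, b)] for d, b in zip(d_row, b_row)]
            for d_row, b_row in zip(detected, database)]


# Toy 2 x 3 grids; the DTM is assumed to be already available.
dsm = [[10.0, 10.2, 15.0], [10.1, 14.8, 15.1]]
dtm = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
database = [[False, True, True], [False, False, True]]

# The above-terrain mask is used here as a stand-in for the building labels.
mask = above_terrain_mask(normalised_dsm(dsm, dtm))
changes = pixel_changes(mask, database)
```

In this toy grid, a cell that is in the database but no longer stands above the terrain is flagged as demolished, and an elevated cell absent from the database is flagged as new. The post-processing that keeps only objects assumed to have changed is not shown.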
Rottensteiner, 2008: This method requires a DSM as the
minimum input. Additionally it can use an NDVI image, height
differences between the first and the last laser pulse, and the
existing database, available either in raster or vector format. The
workflow of the method starts with the generation of a coarse
DTM by hierarchical morphological filtering, which is used to
obtain an nDSM. Along with the other input data, the nDSM is
used in a Dempster-Shafer fusion process carried out on a per-
pixel basis to distinguish four object classes: buildings, trees,
grass land, and bare soil. Connected components of building
pixels are then grouped to constitute initial building regions and
a second Dempster-Shafer fusion process is performed on a per-
region basis to eliminate remaining trees. Finally, there is the
actual change detection step, in which the detected buildings are
compared to the existing map, which produces a change map
that describes the change status of buildings, both on a per-pixel
and a per-building level. Additionally, a label image
corresponding to the new state of the database is generated. In
spite of the thematic accuracy of the change map produced by
this method, it was decided to use this building label image for
the evaluation in this test.
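To make the per-pixel fusion step more concrete, the following sketch applies Dempster's rule of combination to two cues over the four classes named above. It is only an illustration under stated assumptions: the class codes (B, T, G, S), the two cues and all mass values are hypothetical and are not taken from (Rottensteiner, 2008).

```python
from itertools import product

# Hypothetical class codes: B = building, T = tree, G = grass land, S = bare soil.
THETA = frozenset("BTGS")  # frame of discernment (complete ignorance)


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting sets, then
    renormalise by the mass that did not fall on empty intersections."""
    combined = {}
    conflict = 0.0
    for (set1, mass1), (set2, mass2) in product(m1.items(), m2.items()):
        common = set1 & set2
        if common:
            combined[common] = combined.get(common, 0.0) + mass1 * mass2
        else:
            conflict += mass1 * mass2
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Assumed masses for one pixel: a high nDSM value supports "building or tree",
# a low NDVI value supports "building or bare soil".
height_cue = {frozenset("BT"): 0.8, THETA: 0.2}
ndvi_cue = {frozenset("BS"): 0.7, THETA: 0.3}

fused = combine(height_cue, ndvi_cue)
best = max(fused, key=fused.get)
```

With these assumed masses, the largest fused mass falls on the singleton set containing only the building class, because it is the only class compatible with both cues. A real system would derive the masses from the input data and could use more cues, such as the height difference between first and last laser pulse.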
4. EVALUATION AND DISCUSSION
In our opinion, the effectiveness of a change detection system is
related to its capacity to guide the operator’s attention only to
objects that have changed so that unchanged buildings do not
need to be investigated unnecessarily. These considerations
result in the evaluation criteria used in this paper to analyze the
change detection performance. On the one hand, to support the
generation of a map that is really up-to-date, i.e. to be effective
qualitatively, the completeness of the system for buildings
classified as demolished and the correctness for unchanged
buildings are required to be high. The completeness of new
buildings also has to be high if the operator is assumed not to
look for any new building except for those which are suggested
by the system. (Note that this also holds true for modified
buildings, a case not considered in this study because the
simulated changes only consisted in new and demolished
buildings). On the other hand, to reduce the amount of manual
work required by the operator, i.e. to be effective economically,
the correctness of the changes highlighted by the system and the
completeness of unchanged buildings must be high. However, a
low completeness of unchanged buildings, although it implies that
many buildings are checked needlessly, is not necessarily critical
for the application itself, because the updated database is still
correct. Moreover, an economic efficiency that appears to be low
has to be put into perspective according to the size of the
building database to be updated. For instance, if a change
detection system reports 60% of a national database as changed,
we cannot necessarily conclude that the system is inefficient,
because it still means that 40% of the buildings need not
be checked, which amounts to millions of buildings.
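The two quality measures and the database-size argument can be illustrated with a short sketch. It assumes the usual definitions, completeness = TP / (TP + FN) and correctness = TP / (TP + FP); the exact definitions used in this study are given in Section 2, which is not reproduced here. The counts and the database size of 10 million buildings are hypothetical.

```python
def completeness(tp, fn):
    """Share of reference objects that were found: TP / (TP + FN).
    Assumes at least one reference object (tp + fn > 0)."""
    return tp / (tp + fn)


def correctness(tp, fp):
    """Share of reported objects that are right: TP / (TP + FP).
    Assumes at least one reported object (tp + fp > 0)."""
    return tp / (tp + fp)


def buildings_not_checked(db_size, reported_changed_fraction):
    """Number of buildings the operator can skip."""
    return round(db_size * (1.0 - reported_changed_fraction))


# Hypothetical counts: 8 changes found, 2 missed, 8 false alarms.
change_completeness = completeness(tp=8, fn=2)   # 0.8
change_correctness = correctness(tp=8, fp=8)     # 0.5

# Hypothetical national database of 10 million buildings, 60% reported as changed.
skipped = buildings_not_checked(10_000_000, 0.60)
```

Under these assumptions, a system that flags 60% of the database still spares the operator 4 million buildings, which is why a low correctness does not by itself make the system useless.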
4.1 Overall Analysis
Figure 1 presents the evaluation of the results achieved by the
methods that processed the Lyngby test area (LIDAR context).
Table 1 gives the per-building completeness and correctness,
obtained for each test area and each approach. The 7), parameter
(cf. Section 2.) was set to 0.20 for the Marseille and Lyngby test
areas and 0.26 for the Toulouse test area. In Table 1, the values
in bold indicate for which methods the best results are achieved.
The completeness of detected changes is high for all the
methods, especially in the aerial (Marseille) and LIDAR
(Lyngby) contexts. By contrast, the correctness observed in our
experiments is relatively poor, which indicates that there are
many FP changes reported by the systems. In this respect, only
the results obtained in the Lyngby test area with (Rottensteiner,
2008) seem to achieve a relatively acceptable standard.
Approach                      Completeness   Correctness

Marseille (Imagery - Aerial context)
(Champion, 2007)              94.1%          45.1%
(Matikainen et al., 2007)     98.8%          54.3%
(Rottensteiner, 2008)         95.1%          59.1%

Toulouse (Imagery - Satellite context)
(Champion, 2007)              78.9%          54.5%
(Rottensteiner, 2008)         84.2%          47.1%

Lyngby (LIDAR context)
(Matikainen et al., 2007)     94.3%          48.8%
(Olsen and Knudsen, 2005)     95.7%          53.6%
(Rottensteiner, 2008)         91.4%          76.1%

Table 1. Completeness and Correctness achieved by the four
algorithms for the three datasets.
To take the analysis further, we also determined the quality
measures separately for unchanged, demolished and new
buildings. They are presented in Tables 2 (Marseille), 3
(Lyngby) and 4 (Toulouse), respectively. Focusing on the
Marseille test area first, it can be seen in Table 2 that all
algorithms are effective in detecting the actual changes. Both
(Matikainen et al., 2007) and (Rottensteiner, 2008) achieve a
completeness of 100% for demolished buildings. The
correctness for unchanged buildings is also 100%. The few
(11.1%) demolished buildings missed by (Champion, 2007) are
due to extracted primitives that are erroneously used in the
verification procedure. All three methods also feature a high
completeness for new buildings. Here, (Matikainen et al., 2007)
performs best, with only 2.4% of the new buildings missed. The
main limitation of this context appears to be the poor
correctness rate achieved for demolished buildings, which