2. MARKOV RANDOM FIELDS

Markov Random Fields (MRF) are probabilistic models of the image labelling problem (Geman & Geman, 1984). We consider a set S of N image sites i with observed data y_i, collected in a vector y = (y_1, y_2, ..., y_N)^T, where S is the set of all sites; each site i is to be assigned a discrete class label x_i from a set of classes C. In this context, an image site may correspond to an individual pixel or to an image segment. We use models that assume the data y_i observed at a site to depend only on the class label x_i at that site. In an MRF, the class label of each site is additionally assumed to be statistically dependent on the class labels of its neighbouring image sites.
As a consequence, the individual sites can no longer be labelled
independently from each other. Collecting the class labels x_i in a vector x = (x_1, x_2, ..., x_N)^T, we want to find the label configuration x* that maximises the posterior probability of the labels given the data p(x | y), thus x* = arg max_x p(x | y). The
posterior probability p(x|y) can be modelled by a Gibbs
distribution (Geman & Geman, 1984):
$$p(\mathbf{x} \mid \mathbf{y}) = \frac{1}{Z}\,\exp\!\left( \sum_{i \in S} \varphi_i(x_i, \mathbf{y}_i) + \sum_{i \in S} \sum_{j \in N_i} \psi_{ij}(x_i, x_j) \right) \qquad (1)$$
In Eq. 1, Z is a normalisation constant called the partition function, and N_i is the neighbourhood of data site i (thus, j is a neighbouring data site of i). The association potential φ_i links the class label x_i of image site i to the data y_i observed at that site, whereas the pairwise interaction potential ψ_ij models the dependencies between the labels x_i and x_j of neighbouring sites i and j. The model is very general in terms of the definition of the functional model for both φ_i and ψ_ij. Our definitions of the image sites and the neighbourhood N_i (thus, the structure of the graphical model) and the potential functions φ_i and ψ_ij used in our application are described in Section 3.
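To make Eq. 1 concrete, the following Python sketch evaluates its exponent (the log of the unnormalised posterior) for a given labelling of a 4-connected grid of image sites. The array layout and the Potts interaction are illustrative stand-ins, not the potentials actually used in our method (those are defined in Section 3).

```python
import numpy as np

def log_unnormalised_posterior(labels, unary, pairwise):
    """Exponent of Eq. 1 (log of the unnormalised posterior) for one
    label configuration on a 4-connected grid of image sites.

    labels   : (H, W) integer array, class label x_i of each site
    unary    : (H, W, C) array, unary[r, c, l] = phi_i(x_i = l, y_i)
    pairwise : vectorised callable (l_i, l_j) -> psi_ij(x_i, x_j)
    """
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    assoc = unary[rows, cols, labels].sum()          # sum of phi_i over all sites

    # Interaction term over the 4-neighbourhood; each undirected edge is
    # visited once here (the double sum in Eq. 1 counts each symmetric
    # pair twice, which only rescales this term by a constant factor).
    inter = pairwise(labels[:, :-1], labels[:, 1:]).sum()   # horizontal edges
    inter += pairwise(labels[:-1, :], labels[1:, :]).sum()  # vertical edges
    return assoc + inter

# Example: a simple Potts interaction favouring equal neighbouring labels.
potts = lambda a, b: np.where(a == b, 0.0, -1.0)
```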
3. METHOD
The goal of our method is the classification of scenes containing
crossroads. The primary input consists of multiple aerial images
and their orientation data. We require at least fourfold overlap
of each crossroads from two different image strips in order to
avoid occlusions as far as possible. In this paper, the images are
assumed to be colour infrared (CIR) images, though the
methodology can be transferred to other spectral configurations
by adapting the definition of the features to be used for
classification. In a preprocessing stage, these multiple images
are used to derive a DSM by dense matching. After that, the
DSM is used to generate a true orthophoto from each input
image. As each of these orthophotos will contain void areas due
to occlusions, they are all combined into a joint true orthophoto with only a few occluded areas left. In this process, we take
advantage of the multiple views to also eliminate moving cars.
The DSM and the combined orthophoto are the input to the
MRF-based classifier. In the classification process, we choose
the image sites and, thus, the nodes of the graphical model, to
correspond to small squares of n x n pixels of the joint true
orthophoto. The neighbourhood N_i of an image site i in Eq. 1
(which defines the edges of the graphical model) is chosen to
consist of the four direct neighbours of site i in the image grid.
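As an illustration of this graph structure, the sketch below enumerates the edges of the graphical model whose nodes are the n x n pixel blocks of the joint true orthophoto; the function name and the linear node indexing are illustrative choices, not part of the paper.

```python
def grid_edges(height_px, width_px, n):
    """Edges of the 4-connected graphical model whose nodes are the
    n x n pixel blocks (image sites) of the joint true orthophoto."""
    rows, cols = height_px // n, width_px // n
    node = lambda r, c: r * cols + c                  # linear node index of block (r, c)
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((node(r, c), node(r, c + 1)))   # right neighbour
            if r + 1 < rows:
                edges.append((node(r, c), node(r + 1, c)))   # lower neighbour
    return edges                                      # each undirected edge listed once
```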
We defined 14 classes that are characteristic of scenes containing crossroads in both urban and rural settings,
including road, building, grass, tree, car, but also sidewalk,
traffic island, and sealed, the latter corresponding to off-road
areas covered by asphalt, e.g. parking lots. Some of these
classes have a very similar appearance in the data and are
characterised by their relative spatial arrangement; however, it
is possible to generate a new set of classes by combining some
of the original ones, e.g. by merging all classes covered by
asphalt (road, sidewalk, traffic island, sealed).
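For example, the merging of the asphalt-covered classes mentioned above can be written as a simple relabelling; the mapping below contains only the class names given in the text and is an illustrative fragment, not the complete set of 14 classes.

```python
# Illustrative fragment: merge all asphalt-covered classes into one.
ASPHALT_MERGE = {
    "road": "asphalt",
    "sidewalk": "asphalt",
    "traffic island": "asphalt",
    "sealed": "asphalt",
}

def merged_class(label):
    """Return the merged class name; classes not listed here (building,
    grass, tree, car, ...) keep their original label."""
    return ASPHALT_MERGE.get(label, label)
```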
From the orthophoto and the DSM we extract the feature
vectors. We use three groups of features, namely image-based
features, DSM features, and a specific feature that is used to
characterize cars; the use of the latter feature is optional. In a
training phase we use images that were labelled manually to
determine the parameters of the association and interaction
potentials in Eq. 1. Training the parameters of the interaction
potentials requires fully labelled images. Once the parameters
have been determined, the classification of new test images can
be carried out by maximising the posterior probability in Eq. 1
using the trained model.
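The training and inference procedures actually used are described in the subsequent sections. Purely as an illustration of how a labelling that (locally) maximises the posterior in Eq. 1 can be obtained from trained potentials, the sketch below applies Iterated Conditional Modes (ICM) with hypothetical unary and pairwise inputs; ICM is only one possible local optimiser, with graph cuts or message passing being common alternatives for models of this form.

```python
import numpy as np

def icm(unary, pairwise, n_iter=10):
    """Iterated Conditional Modes: greedy local maximisation of Eq. 1.

    unary    : (H, W, C) array of association potentials phi_i(x_i = l, y_i)
    pairwise : (C, C) array of interaction potentials psi_ij(l, l')
    """
    h, w, _ = unary.shape
    labels = unary.argmax(axis=2)                     # site-wise initialisation
    for _ in range(n_iter):
        for r in range(h):
            for c in range(w):
                score = unary[r, c].copy()            # phi_i for every candidate label
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        score += pairwise[:, labels[rr, cc]]  # psi with current neighbour labels
                labels[r, c] = score.argmax()         # best label given the neighbours
    return labels
```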
The individual components of our method, in particular pre-
processing, the definition of the potentials, the definition of the
features and the methods used for training and inference are
described in more detail in the subsequent sections.
3.1 Preprocessing
The first step of preprocessing is the generation of a DSM from
the input images. We use the OpenCV implementation
(OpenCV, 2012) of semiglobal matching (Hirschmüller, 2008)
with the cost function of (Birchfield & Tomasi, 1998) to
generate a disparity image for each possible pair of images. For
each disparity image thus created, a DSM grid is generated in
object space. Due to occlusions and matching errors, these raw
DSMs will contain void areas, and there will also be height
discrepancies, e.g. at roof overhangs. These raw DSMs are
combined to a joint DSM by taking the median of the valid raw
DSM heights at each position. Remaining void areas (e.g.
caused by problems of the dense matcher in homogeneous
image regions) are filled by an in-painting algorithm based on
non-linear diffusion that is sensitive to height changes. In this
process, we distinguish between void areas where the heights
are to be interpolated from their surroundings (largely caused by
matching errors) and areas where the heights are to be
determined from the lowest surrounding areas (largely caused
by occlusion) in a way similar to (Hirschmüller, 2008).
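The following sketch illustrates the two numerical steps just described using the current OpenCV interface: computing a disparity map for one rectified image pair by semiglobal matching, and merging raw DSM grids by a per-cell median. The parameter values, the NaN encoding of void cells, and the omission of the resampling from disparity space into the object-space DSM grid are simplifications for the sketch, not the settings used in the paper.

```python
import cv2
import numpy as np

def disparity_map(left_gray, right_gray):
    """Semiglobal matching for one rectified image pair
    (parameters are placeholders, not those used in the paper)."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0,
                                 numDisparities=128,   # must be a multiple of 16
                                 blockSize=5)
    disp = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disp[disp <= 0] = np.nan                           # invalid matches become void
    return disp

def fuse_dsms(raw_dsms):
    """Combine the raw DSM grids into a joint DSM by taking the median of
    the valid heights at each cell (void cells are encoded as NaN)."""
    stack = np.stack(raw_dsms, axis=0)                 # (num_dsms, rows, cols)
    joint = np.nanmedian(stack, axis=0)                # ignores void heights
    return joint                                       # remaining NaNs still need in-painting
```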
The DSM is the basis for the generation of a true orthophoto
from each of the original input images. Ray tracing is used to
determine visibility in this process. The resulting raw
orthophotos will have void areas caused by occlusion. Finally,
these raw orthophotos are merged to a combined orthophoto.
For each pixel of the combined orthophoto, the median of all
valid colour vectors (i.e. the colour vectors from all raw
orthophotos where the respective pixel is not marked as being
void) is chosen. Due to the fact that we require at least four-fold
overlap, this will result in an elimination of moving cars on the
streets, which improves the prospects of automatic classification
of road surfaces (Fig. 1).
Figure 1: Detail of a test site. Left: DSM; centre: raw true
orthophoto with void areas in black; right: combined
true orthophoto.
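A minimal sketch of this fusion step is given below; it uses a per-band median over the valid observations as a simplification (the paper chooses the median of the valid colour vectors), and the array layout is an assumption made for the sketch.

```python
import numpy as np

def combine_orthophotos(orthos, void_masks):
    """Merge raw true orthophotos into the combined true orthophoto.

    orthos     : list of (H, W, 3) colour arrays, one per input image
    void_masks : list of (H, W) boolean arrays, True where a pixel is void
    """
    stack = np.stack(orthos, axis=0).astype(np.float32)   # (K, H, W, 3)
    void = np.stack(void_masks, axis=0)                   # (K, H, W)
    stack[void] = np.nan                                  # exclude occluded pixels
    # Per-band median over all valid observations; with at least four-fold
    # overlap this also suppresses moving cars on the streets.
    return np.nanmedian(stack, axis=0)
```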
3.2 Association potential
The association potential φ_i(x_i, y_i) in Eq. 1 is related to the probability of observing the image data y_i at data site i ∈ S given that the label x_i takes a value c ∈ C by