2. MARKOV RANDOM FIELDS

Markov Random Fields (MRF) are probabilistic models of the image labelling problem (Geman & Geman, 1984). We consider a set S of N image sites i with observed data y_i, collected in a vector y = (y_1, y_2, ..., y_N)^T, where S is the set of all sites; each site i is to be assigned a discrete class label x_i from a set of classes C. In this context, an image site may correspond to an individual pixel or to an image segment. We use models that assume the data y_i observed at a site to depend only on the class label x_i at that site. In an MRF, the class label of each site is additionally assumed to be statistically dependent on the class labels of its neighbouring image sites.
As a consequence, the individual sites can no longer be labelled
independently from each other. Collecting the class labels x_i in a vector x = (x_1, x_2, ..., x_N)^T, we want to find the label configuration x* that maximises the posterior probability of the labels given the data p(x | y), thus x* = arg max_x p(x | y). The
posterior probability p(x|y) can be modelled by a Gibbs
distribution (Geman & Geman, 1984):
$$p(\mathbf{x} \mid \mathbf{y}) = \frac{1}{Z}\,\exp\!\left( \sum_{i \in S} \varphi_i(x_i, \mathbf{y}_i) + \sum_{i \in S} \sum_{j \in N_i} \psi_{ij}(x_i, x_j) \right) \qquad (1)$$
In Eq. 1, Z is a normalisation constant called the partition function, and N_i is the neighbourhood of data site i (thus, j is a neighbouring data site of i). The association potential φ_i links the class label x_i of image site i to the data y_i observed at that site, whereas the pairwise interaction potential ψ_ij models the dependencies between the labels x_i and x_j of neighbouring sites i and j. The model is very general in terms of the definition of the functional model for both φ_i and ψ_ij. Our definitions of the image sites and the neighbourhood N_i (thus, the structure of the graphical model) and the potential functions φ_i and ψ_ij used in our application are described in Section 3.
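To make Eq. 1 concrete, the following Python sketch evaluates its exponent (the log of the unnormalised posterior) for a given labelling of a 4-connected grid of image sites. The array layout and the Potts interaction are illustrative stand-ins, not the potentials actually used in our method (those are defined in Section 3).

```python
import numpy as np

def log_unnormalised_posterior(labels, unary, pairwise):
    """Exponent of Eq. 1 (log of the unnormalised posterior) for one
    label configuration on a 4-connected grid of image sites.

    labels   : (H, W) integer array, class label x_i of each site
    unary    : (H, W, C) array, unary[r, c, l] = phi_i(x_i = l, y_i)
    pairwise : vectorised callable (l_i, l_j) -> psi_ij(x_i, x_j)
    """
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    assoc = unary[rows, cols, labels].sum()          # sum of phi_i over all sites

    # Interaction term over the 4-neighbourhood; each undirected edge is
    # visited once here (the double sum in Eq. 1 counts each symmetric
    # pair twice, which only rescales this term by a constant factor).
    inter = pairwise(labels[:, :-1], labels[:, 1:]).sum()   # horizontal edges
    inter += pairwise(labels[:-1, :], labels[1:, :]).sum()  # vertical edges
    return assoc + inter

# Example: a simple Potts interaction favouring equal neighbouring labels.
potts = lambda a, b: np.where(a == b, 0.0, -1.0)
```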
3. METHOD
The goal of our method is the classification of scenes containing
crossroads. The primary input consists of multiple aerial images
and their orientation data. We require at least fourfold overlap
of each crossroads from two different image strips in order to
avoid occlusions as far as possible. In this paper, the images are
assumed to be colour infrared (CIR) images, though the
methodology can be transferred to other spectral configurations
by adapting the definition of the features to be used for
classification. In a preprocessing stage, these multiple images
are used to derive a DSM by dense matching. After that, the
DSM is used to generate a true orthophoto from each input
image. As each of these orthophotos will contain void areas due
to occlusions, they are all combined into a joint true orthophoto with only a few occluded areas left. In this process, we take
advantage of the multiple views to also eliminate moving cars.
The DSM and the combined orthophoto are the input to the
MRF-based classifier. In the classification process, we choose
the image sites and, thus, the nodes of the graphical model, to
correspond to small squares of n x n pixels of the joint true
orthophoto. The neighbourhood N_i of an image site i in Eq. 1
(which defines the edges of the graphical model) is chosen to
consist of the four direct neighbours of site i in the image grid.
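As an illustration of this graph structure, the sketch below enumerates the edges of the graphical model whose nodes are the n x n pixel blocks of the joint true orthophoto; the function name and the linear node indexing are illustrative choices, not part of the paper.

```python
def grid_edges(height_px, width_px, n):
    """Edges of the 4-connected graphical model whose nodes are the
    n x n pixel blocks (image sites) of the joint true orthophoto."""
    rows, cols = height_px // n, width_px // n
    node = lambda r, c: r * cols + c                  # linear node index of block (r, c)
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((node(r, c), node(r, c + 1)))   # right neighbour
            if r + 1 < rows:
                edges.append((node(r, c), node(r + 1, c)))   # lower neighbour
    return edges                                      # each undirected edge listed once
```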
We defined 14 classes that are characteristic of scenes containing crossroads in both urban and rural settings,
including road, building, grass, tree, car, but also sidewalk,
traffic island, and sealed, the latter corresponding to off-road
areas covered by asphalt, e.g. parking lots. Some of these
classes have a very similar appearance in the data and are
characterised by their relative spatial arrangement; however, it
is possible to generate a new set of classes by combining some
of the original ones, e.g. by merging all classes covered by
asphalt (road, sidewalk, traffic island, sealed).
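For example, the merging of the asphalt-covered classes mentioned above can be written as a simple relabelling; the mapping below contains only the class names given in the text and is an illustrative fragment, not the complete set of 14 classes.

```python
# Illustrative fragment: merge all asphalt-covered classes into one.
ASPHALT_MERGE = {
    "road": "asphalt",
    "sidewalk": "asphalt",
    "traffic island": "asphalt",
    "sealed": "asphalt",
}

def merged_class(label):
    """Return the merged class name; classes not listed here (building,
    grass, tree, car, ...) keep their original label."""
    return ASPHALT_MERGE.get(label, label)
```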
From the orthophoto and the DSM we extract the feature
vectors. We use three groups of features, namely image-based
features, DSM features, and a specific feature that is used to
characterize cars; the use of the latter feature is optional. In a
training phase we use images that were labelled manually to
determine the parameters of the association and interaction
potentials in Eq. 1. Training the parameters of the interaction
potentials requires fully labelled images. Once the parameters
have been determined, the classification of new test images can
be carried out by maximising the posterior probability in Eq. 1
using the trained model.
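The training and inference procedures actually used are described in the subsequent sections. Purely as an illustration of how a labelling that (locally) maximises the posterior in Eq. 1 can be obtained from trained potentials, the sketch below applies Iterated Conditional Modes (ICM) with hypothetical unary and pairwise inputs; ICM is only one possible local optimiser, with graph cuts or message passing being common alternatives for models of this form.

```python
import numpy as np

def icm(unary, pairwise, n_iter=10):
    """Iterated Conditional Modes: greedy local maximisation of Eq. 1.

    unary    : (H, W, C) array of association potentials phi_i(x_i = l, y_i)
    pairwise : (C, C) array of interaction potentials psi_ij(l, l')
    """
    h, w, _ = unary.shape
    labels = unary.argmax(axis=2)                     # site-wise initialisation
    for _ in range(n_iter):
        for r in range(h):
            for c in range(w):
                score = unary[r, c].copy()            # phi_i for every candidate label
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        score += pairwise[:, labels[rr, cc]]  # psi with current neighbour labels
                labels[r, c] = score.argmax()         # best label given the neighbours
    return labels
```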
The individual components of our method, in particular pre-
processing, the definition of the potentials, the definition of the
features and the methods used for training and inference are
described in more detail in the subsequent sections.
3.1 Preprocessing
The first step of preprocessing is the generation of a DSM from
the input images. We use the OpenCV implementation
(OpenCV, 2012) of semiglobal matching (Hirschmüller, 2008)
with the cost function of (Birchfield & Tomasi, 1998) to
generate a disparity image for each possible pair of images. For
each disparity image thus created, a DSM grid is generated in
object space. Due to occlusions and matching errors, these raw
DSMs will contain void areas, and there will also be height
discrepancies, e.g. at roof overhangs. These raw DSMs are
combined to a joint DSM by taking the median of the valid raw
DSM heights at each position. Remaining void areas (e.g.
caused by problems of the dense matcher in homogeneous
image regions) are filled by an in-painting algorithm based on
non-linear diffusion that is sensitive to height changes. In this
process, we distinguish between void areas where the heights
are to be interpolated from their surroundings (largely caused by
matching errors) and areas where the heights are to be
determined from the lowest surrounding areas (largely caused
by occlusion) in a way similar to (Hirschmüller, 2008).
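The following sketch illustrates the two numerical steps just described using the current OpenCV interface: computing a disparity map for one rectified image pair by semiglobal matching, and merging raw DSM grids by a per-cell median. The parameter values, the NaN encoding of void cells, and the omission of the resampling from disparity space into the object-space DSM grid are simplifications for the sketch, not the settings used in the paper.

```python
import cv2
import numpy as np

def disparity_map(left_gray, right_gray):
    """Semiglobal matching for one rectified image pair
    (parameters are placeholders, not those used in the paper)."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0,
                                 numDisparities=128,   # must be a multiple of 16
                                 blockSize=5)
    disp = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disp[disp <= 0] = np.nan                           # invalid matches become void
    return disp

def fuse_dsms(raw_dsms):
    """Combine the raw DSM grids into a joint DSM by taking the median of
    the valid heights at each cell (void cells are encoded as NaN)."""
    stack = np.stack(raw_dsms, axis=0)                 # (num_dsms, rows, cols)
    joint = np.nanmedian(stack, axis=0)                # ignores void heights
    return joint                                       # remaining NaNs still need in-painting
```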
The DSM is the basis for the generation of a true orthophoto
from each of the original input images. Ray tracing is used to
determine visibility in this process. The resulting raw
orthophotos will have void areas caused by occlusion. Finally,
these raw orthophotos are merged to a combined orthophoto.
For each pixel of the combined orthophoto, the median of all
valid colour vectors (i.e. the colour vectors from all raw
orthophotos where the respective pixel is not marked as being
void) is chosen. Due to the fact that we require at least four-fold
overlap, this will result in an elimination of moving cars on the
streets, which improves the prospects of automatic classification
of road surfaces (Fig. 1).
Figure 1: Detail of a test site. Left: DSM; centre: raw true
orthophoto with void areas in black; right: combined
true orthophoto.
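A minimal sketch of this fusion step is given below; it uses a per-band median over the valid observations as a simplification (the paper chooses the median of the valid colour vectors), and the array layout is an assumption made for the sketch.

```python
import numpy as np

def combine_orthophotos(orthos, void_masks):
    """Merge raw true orthophotos into the combined true orthophoto.

    orthos     : list of (H, W, 3) colour arrays, one per input image
    void_masks : list of (H, W) boolean arrays, True where a pixel is void
    """
    stack = np.stack(orthos, axis=0).astype(np.float32)   # (K, H, W, 3)
    void = np.stack(void_masks, axis=0)                   # (K, H, W)
    stack[void] = np.nan                                  # exclude occluded pixels
    # Per-band median over all valid observations; with at least four-fold
    # overlap this also suppresses moving cars on the streets.
    return np.nanmedian(stack, axis=0)
```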
3.2 Association potential
The association potential φ_i(x_i, y_i) in Eq. 1 is related to the probability of observing the image data y_i at data site i ∈ S given that the label x_i takes a value c ∈ C by