Proceedings, XXth congress

altan, m. orhan
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B3. Istanbul 2004 
SVMs work by finding a separating hyperplane between two 
classes. In a binary classification problem, there could be many 
hyperplanes that separate the data. As shown in Figure 3, the 
optimal hyperplane occurs when the margin between the 
classes is maximised. In addition, only a subset of the data 
points will be critical in defining the hyperplane. These points 
are the support vectors. 
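
The margin idea can be illustrated with a minimal sketch. The toy points and the two candidate hyperplanes below are hypothetical and are not taken from the paper; the code only measures the margin of a given hyperplane and does not train an SVM.

```python
import math


def signed_distance(w, b, x, label):
    """Distance from x to the hyperplane w.x + b = 0, positive when x is on its own class side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def geometric_margin(w, b, points):
    """Smallest signed distance over all labelled points (negative if any point is misclassified)."""
    return min(signed_distance(w, b, x, label) for x, label in points)


def support_vectors(w, b, points, tol=1e-9):
    """Points that lie exactly on the margin, i.e. the ones that constrain the hyperplane."""
    margin = geometric_margin(w, b, points)
    return [x for x, label in points
            if abs(signed_distance(w, b, x, label) - margin) <= tol]


# Hypothetical toy data: (feature vector, class label in {+1, -1}).
points = [((2.0, 2.0), +1), ((3.0, 3.0), +1),
          ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Two hyperplanes (w, b) that both separate the classes.
candidate_a = ((1.0, 0.0), -1.0)   # the line x1 = 1
candidate_b = ((1.0, 1.0), -2.0)   # the line x1 + x2 = 2

margin_a = geometric_margin(*candidate_a, points)
margin_b = geometric_margin(*candidate_b, points)
critical_points = support_vectors(*candidate_b, points)
```

Both candidates classify every point correctly, but the second leaves a wider margin, and only two of the four points touch it. Moving the other two points further from the boundary would not change that hyperplane.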
[Figure 3 labels: Separating Hyperplanes; Support Vectors; Margin]
Figure 3. (a) Possible separating hyperplanes; (b) Optimal separating hyperplane 
Another attractive property of the SVM is that its decision 
surface depends only on the inner product of the feature 
vectors. As a result, the inner product can be replaced by any 
symmetric positive-definite kernel (Cristianini & Shawe- 
Taylor, 2000). The use of a kernel function means that the 
mapping of the data into a higher dimensional feature space 
does not need to be determined as part of the solution, enabling 
the use of high dimensional space for the learning task without 
needing to address the mathematical complexity of such spaces. 
This offers the prospect of being able to separate data in high 
dimensional feature space and find classifications that were not 
possible in simple, lower dimensional spaces (Figure 4). 
  
Figure 4. Mapping data into a higher feature space can 
make the data easier to separate 
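
As a small illustration of replacing the inner product with a kernel, the sketch below uses a homogeneous degree-2 polynomial kernel on two-dimensional inputs. The choice of kernel and the sample points are assumptions made for the example; the paper does not state which kernel was used.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in the input space."""
    return dot(x, z) ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map for 2-D input whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# The kernel gives the feature-space inner product without building the features.
x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)
explicit = dot(poly2_feature_map(x), poly2_feature_map(z))

# Hypothetical data that no straight line separates in 2-D: an inner and an outer ring.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, 2.0), (-1.5, -1.5)]


def linear_in_feature_space(x):
    """A linear function of the mapped features: phi1 + phi3 - 1 (equals x1^2 + x2^2 - 1)."""
    phi = poly2_feature_map(x)
    return phi[0] + phi[2] - 1.0


inner_side = [linear_in_feature_space(p) < 0 for p in inner]
outer_side = [linear_in_feature_space(p) > 0 for p in outer]
```

The two rings fall on opposite sides of a plane in the mapped space, which is the situation sketched in Figure 4. The SVM itself only ever needs the value returned by `poly2_kernel`.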
3. TEST DATA 
Several datasets exist in the public domain for use in building 
extraction. Two of the most commonly used are the Avenches 
dataset (Henricsson, 1996) and the Fort Hood dataset (1999). 
Another dataset from the town of Zurich Hoengg was added to 
the public domain for the Ascona 2001 workshop (Hoengg 
dataset, 2001). The small number of buildings in these images 
makes these datasets unsuitable as the basis for research using a 
learning machine like SVM. As the learning machine is trained 
by example, a large number of examples of each object class 
must be presented to the learning machine to ensure valid 
learning. These public domain datasets simply do not contain 
enough data for this purpose. 
To generate sufficient training data, a new database of images 
was created for the purposes of this research. Several large- 
scale aerial photographs of the city of Ballarat, in central 
Victoria, were available for this task. The images were acquired 
at a scale of 1:4000, originally for the purpose of asset 
mapping within the Ballarat city centre. As such, they were 
taken in a standard stereo-mapping configuration, with a near 
vertical orientation and a 60% forward overlap. 
Three images from this set were scanned from colour 
diapositives on a Zeiss Photoscan™ at a resolution of 15 
microns. The resultant ground sample distance for the images 
was 6 cm. This compares well to a ground sample distance of 
7.5 cm for the Avenches dataset and 7 cm for the Zurich 
Hoengg data. The Zeiss scanner produces a separate file for 
each colour band of the image (red-green-blue (RGB)). These 
files are produced in a proprietary format and were converted 
into uncompressed three-colour 24-bit Tagged Image File 
Format (TIFF) files for ease of use with other systems. 
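
The quoted ground sample distance follows directly from the scan resolution and the photo scale. The sketch below reproduces the arithmetic, assuming a nominal scale over flat terrain and a truly vertical photograph; the function name is illustrative only.

```python
def ground_sample_distance_cm(pixel_size_um, scale_denominator):
    """Ground footprint of one scanned pixel in centimetres.

    pixel_size_um     -- scanning resolution in microns (15 in the text)
    scale_denominator -- photo scale number, e.g. 4000 for 1:4000
    """
    microns_per_cm = 10_000
    return pixel_size_um * scale_denominator / microns_per_cm


gsd_cm = ground_sample_distance_cm(15, 4000)  # 15 um x 4000 = 60 000 um = 6.0 cm
```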
3.1 Image patches 
In order to train the classifier and test whether effective class 
discrimination was possible, the classification problem was 
simplified by producing discrete image patches of a regular 
size. Each patch was 256 pixels by 256 pixels and contained 
either a single building or non-building piece of the image. The 
recognition problem was simplified further by limiting the 
building patches to those containing single, detached residential 
houses, where the extent of the house fitted completely within 
the 256 x 256 pixel area. This may seem extremely restrictive 
but the problem of building extraction has proven to be very 
difficult and a generalised solution appears unlikely at this 
stage. In a classification approach, it is likely that there will be 
a class for each category or type of building, i.e. residential 
detached, residential semi-detached, commercial, industrial and 
so on. As this area appears largely unexplored, the scope of the 
classification was limited to a very specific case to increase the 
chances of success. 
The aerial image TIFF files were used to create a collection of 
image patches, where each patch was stored in a separate TIFF 
file. As the area is predominantly urban residential in character, 
many of the non-building image patches contained residential 
street detail, usually kerb and channel bitumen roadways 
(Figure 5). 
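
Cutting a fixed-size patch from a larger image can be sketched as follows. This assumes the image is held in memory as a list of pixel rows and that the top-left corner of each patch has already been chosen; how the patch locations were actually selected is not described in the text, and the function name is hypothetical.

```python
def extract_patch(image, top, left, size=256):
    """Return a size x size window of `image`, given as a list of pixel rows.

    Raises ValueError if the window would extend beyond the image, so a
    patch never contains a partial building cut off by the image edge.
    """
    height, width = len(image), len(image[0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("patch extends beyond the image")
    return [row[left:left + size] for row in image[top:top + size]]


# Hypothetical 300 x 300 test image whose pixel values record their own position.
image = [[(r, c) for c in range(300)] for r in range(300)]
patch = extract_patch(image, top=10, left=20)
```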
4. INITIAL TESTS 
Initial classification tests were based on a balanced test set of 
100 building images and 100 non-building images. Image 
coefficients were extracted using the quadruple sampled 
wavelet process described earlier. A public domain support 
vector machine, SVMlight (Joachims, 1998), was used to classify 
the image patches into building or non-building categories. 
Results of these tests have been reported previously (Bellman 
& Shortis, 2002) and showed that although the classification 
had a predicted success rate of 73%, the actual success on a 
small independent test set was only 40%. As the success rate is 
strongly dependent on the sample size, the low rate of detection 
is most probably due to the small size of the training set. 
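
Assuming the support vector machine package cited above is Joachims' SVMlight, each training example is supplied as one text line holding a target value followed by sparse index:value pairs. The sketch below formats a coefficient vector in that layout. The function name and the sample coefficients are hypothetical, and the paper does not describe how its wavelet coefficients were actually written out.

```python
def to_svmlight_line(label, coefficients):
    """Format one example as '<target> <index>:<value> ...'.

    Feature indices start at 1 and appear in increasing order. Zero-valued
    features are omitted, which the sparse format permits.
    """
    target = "+1" if label > 0 else "-1"
    pairs = [f"{index}:{value!r}"
             for index, value in enumerate(coefficients, start=1)
             if value != 0]
    return " ".join([target] + pairs)


# Hypothetical coefficient vectors: +1 for a building patch, -1 for non-building.
building_line = to_svmlight_line(+1, [0.5, 0.0, 0.25])
non_building_line = to_svmlight_line(-1, [0.0, 2.0])
```

With a balanced set such as the one described, the training file would hold 100 lines beginning with `+1` and 100 beginning with `-1`.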
Although preprocessing is done using the wavelet transform, 
there are many variables that can influence the preprocessing 
stage. In studies by others, some attempt had been made to 
identify the optimum set of parameters but it was found that