Objects of interest are detected using a combination of features. A model of such objects is learned using a training set of labelled features (positive and negative examples). Low-level features that are not specific to any single object class are evaluated to determine the best subset that can be combined to satisfy the requirements of viewpoint-invariant detection.
2.1 Intensity histogram features

The intensity histogram of an image is a plot of the number of pixels at each of the possible intensity values. The shape of the histogram reveals information about the image (or a sub-region of the image): a narrow histogram indicates low contrast, a histogram skewed toward one end indicates predominantly dark or bright content, and a bimodal histogram (or one with several distinct peaks) may imply the presence of regions that are statistically distinct. Statistical moments of the intensity distribution of the image therefore encode compact features for the image. A homogeneous image will have low variance and low entropy. Skew is positive when the tail of the distribution extends to the right (the positive side), and negative when the tail is out to the left. Energy is high when a small number of different intensity levels dominate, i.e. when the distribution is concentrated at few intensity levels. Entropy can be used to encode the randomness of an image: it is highest when the intensity levels in the image are equally likely. Complex images have high entropy, which tends to vary inversely with energy.
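For concreteness, the four histogram statistics used as features later in the paper (standard deviation, skew, energy, and entropy) can be computed as in the following sketch (a minimal NumPy example for 8-bit images, not the authors' implementation):

```python
import numpy as np

def histogram_features(gray):
    """Statistics of the intensity histogram of an 8-bit greyscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                    # normalise to a probability distribution
    levels = np.arange(256)
    mean = (levels * p).sum()
    var = ((levels - mean) ** 2 * p).sum()
    std = np.sqrt(var)
    # Skew is positive when the tail extends to the right of the mean,
    # negative when it extends to the left.
    skew = ((levels - mean) ** 3 * p).sum() / std ** 3 if std > 0 else 0.0
    # Energy is large when the distribution is concentrated at few levels.
    energy = (p ** 2).sum()
    # Entropy encodes randomness and tends to vary inversely with energy.
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()
    return std, skew, energy, entropy
```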
2.2 Edge features

Edge density can be used as a measure of the structure in an image region, since busy regions contain many edges. Edge features are computed with simple differential operators such as the Sobel (Duda et al., 1973) and Roberts Cross (Roberts, 1963) operators, or with more sophisticated detectors such as the Canny (Canny, 1986) and Marr-Hildreth (Marr and Hildreth, 1980) edge detectors, allowing edge maps to be extracted (see figure 1(a)), with the raw responses thresholded and thinned. Corner detectors such as Harris (see figure 1(b)) are popular because of their degree of viewpoint invariance; however, the features detected are numerous and local, and so cannot by themselves identify objects. Keypoint detectors that detect fewer, more distinctive features include SIFT (Lowe, 2004) and FAST (Rosten and Drummond, 2006).

Figure 1: Edge and corner keypoint detectors. (a) Canny. (b) Harris.
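Edge density and keypoint density over an image sub-region can be measured as in the following sketch (using OpenCV's Canny and FAST implementations; the hysteresis thresholds are illustrative, not values from the paper):

```python
import cv2
import numpy as np

def edge_density(gray):
    """Fraction of pixels in a greyscale region marked as edges by Canny."""
    edges = cv2.Canny(gray, 100, 200)  # hysteresis thresholds are illustrative
    return np.count_nonzero(edges) / edges.size

def fast_keypoint_density(gray):
    """FAST keypoints detected per pixel in a greyscale region."""
    detector = cv2.FastFeatureDetector_create()
    return len(detector.detect(gray, None)) / gray.size
    # Harris keypoint density can be measured analogously via cv2.cornerHarris.
```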
2.3 Saliency

Saliency detection models aspects of the human visual system, a process that analyses an image for regions containing more “interesting” pixels. The images in figure 2 show high responses for regions with many edges, representing busyness in the images, or for changes in the intensity or frequency components of the image.
Figure 2: Examples of saliency detectors. (a) Frequency-tuned (Achanta et al., 2009). (b) Maximal Symmetric Surround (Achanta and Süsstrunk, 2010).
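As an illustration of one such detector, the frequency-tuned method of Achanta et al. (2009) scores each pixel by its distance from the mean image colour in Lab space after slight blurring. A minimal sketch (parameter choices are approximate, not taken from the paper):

```python
import cv2
import numpy as np

def frequency_tuned_saliency(bgr):
    """Per-pixel distance from the mean Lab colour (Achanta et al., 2009)."""
    blurred = cv2.GaussianBlur(bgr, (5, 5), 0)            # suppress high-frequency noise
    lab = cv2.cvtColor(blurred, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)                # mean Lab vector of the image
    sal = np.linalg.norm(lab - mean, axis=2)              # Euclidean distance per pixel
    return sal / sal.max()                                # normalise to [0, 1]
```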
Segmentation is the process of partitioning the image into multiple segments. Edge-based saliency maps are used to segment the images into interesting and non-interesting regions by simple thresholding. Figure 3 demonstrates how this procedure drastically reduces the area of the image expected to contain meaningful information about the objects of interest.
Figure 3: Saliency-based segmentation using simple thresholding on the saliency map. (a) Edge-based saliency map (Rosin, 2009). (b) Mask applied to image showing the interesting region.
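The thresholding step itself is straightforward; a minimal sketch (the threshold of 0.5 on a normalised map is illustrative, not a value from the paper):

```python
import numpy as np

def saliency_mask(saliency_map, threshold=0.5):
    """Binary mask of interesting pixels from a saliency map normalised to [0, 1]."""
    return saliency_map > threshold

def apply_mask(image, mask):
    """Keep only the region expected to contain meaningful information."""
    return image * mask[..., np.newaxis]
```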
The methods described so far require no offline pre-processing; however, they are weak at detecting salient image regions while maintaining a low false alarm rate for higher-level features of interest. Learning a model of saliency offline is a more promising method for detecting salient image regions.
2.4 Learning based features
In order to detect salient regions of an image, a model of saliency can be learned from training data and compared against new images. The Support Vector Machine (SVM) is a method of
supervised machine learning based on the theory of statistical learning (Cortes and Vapnik, 1995). The theory behind the SVM guarantees that any N-dimensional feature space is linearly separable in N + M dimensions (where M may be infinite). The SVM finds a separating hyperplane (in the N + M dimensional space) between two classes of training data (the positive and the negative examples). The hyperplane is placed so that the distances between it and the closest training instances on either side (the support vectors) are maximised. Since noise in the training data cannot be avoided, the SVM is extended with a “soft-margin” around the hyperplane that allows outlying training instances to sit on the wrong side of it. The complete learning algorithm therefore maximises the distance of the support vectors from the hyperplane while minimising the distances of training instances found on the wrong side of it. Finally, since training data is not always linearly separable in the provided N-dimensional feature space, a kernel function can be used to map the data into a higher-dimensional feature space, increasing the likelihood that a separating hyperplane with a good fit to the data can be found. The kernel function may be a high-degree polynomial (or worse) of the training data, but this incurs no extra processing overhead, since the training data only ever appears as a dot product of vectors inside the kernel function. SVMs have demonstrated excellent performance in a number of similar object detection studies (Felzenszwalb et al., 2010; Dalal and Triggs, 2005; Lin et al., 2011).
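In standard form, the soft-margin optimisation described above can be written as (a textbook formulation, included for reference rather than quoted from the paper):

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_{i} \quad\text{subject to}\quad y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}+b) \ge 1-\xi_{i},\;\; \xi_{i}\ge 0,$$

where the slack variables $\xi_i$ allow training instances to sit on the wrong side of the margin and $C$ controls the trade-off between margin width and training error; the kernel trick replaces each dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ with a kernel evaluation $k(\mathbf{x}_i, \mathbf{x}_j)$.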
Bounding boxes are positioned around examples of the objects to be identified in a set of training images (see figure 4). Features for these bounded regions are calculated and concatenated into N-dimensional feature vectors (where N is the number of features used) to generate a set of positive training examples. The same number of negative feature vectors are generated from randomly selected image regions that do not contain the objects of interest. The positive and negative examples are passed to an SVM for training, and five-fold cross-validation is performed, varying the parameters of the SVM's kernel functions to identify an optimal model without overfitting to the training data (linear and radial basis function kernels are evaluated for their performance during cross-validation). The generated model represents a weight vector whose scalar product with a feature vector calculated from a new (previously unseen) image region determines whether that region is salient or not. The feature measurements explored in our approach consist of: Histogram of Orientations (over whole image sub-regions), edge density, Harris keypoint density, FAST keypoint density, mean depth of the range image (in the image sub-region), and the standard deviation, skew, energy, and entropy of the intensity histogram.
Figure 4: Training images displaying positive training instances 
(yellow) and negative instances (red). 
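A minimal sketch of this training and scoring pipeline, using scikit-learn in place of whatever SVM implementation the authors used (the synthetic feature vectors merely stand in for the measurements listed above):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real feature vectors: in the pipeline above,
# each row would hold the measurements listed in the text (histogram of
# orientations, edge density, keypoint densities, mean depth, histogram
# statistics) for one positive or negative training region.
num_features = 9  # illustrative; N depends on the features used
X = np.vstack([
    rng.normal(+1.0, 1.0, size=(100, num_features)),  # positive regions
    rng.normal(-1.0, 1.0, size=(100, num_features)),  # negative regions
])
y = np.array([+1] * 100 + [-1] * 100)

# Five-fold cross-validation over linear and RBF kernels, varying the
# kernel parameters to select a model without overfitting.
search = GridSearchCV(
    SVC(),
    param_grid=[
        {"kernel": ["linear"], "C": [0.1, 1, 10]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
    ],
    cv=5,
)
search.fit(X, y)

# A new, previously unseen region is classified as salient when the
# decision function (the scalar product of the learned weight vector
# with its feature vector, plus the bias) is positive.
new_region = rng.normal(size=(1, num_features))
print(search.decision_function(new_region) > 0)
```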
Initial results are promising, with an 85% correct identification rate.