Technical Commission III (B3)

and image operators such as the Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005), which encapsulates changes in the magnitude and orientation of contrast over a grid of small image patches. HOG features have shown excellent performance in recognising a range of different object types, including natural objects as well as more artificial objects (Dalal and Triggs, 2005; Felzenszwalb et al., 2010; Schroff et al., 2008).
The success of such appearance based feature detection methods for intensity images led to the development of similar appearance based features for range images. Spin images (Johnson and Hebert, 1999) use 2-D histograms rotated around a reference point in space. Splash features (Stein and Medioni, 1992) are similar to HOG features in that they collect a distribution of surface normal orientations around a reference point. NARF (Normal Aligned Radial Feature) features (Steder et al., 2010) detect stable surface regions combined with large depth changes at object borders; the feature is designed to be stable across different viewpoints. Tripod operators (Pipitone and Adams, 1993) compactly encode surface shape information of objects by taking surface range measurements at the three corners of an equilateral triangle. Other range based descriptors include surface patch representations (Chen and Bhanu, 2007), surface normal based signatures (Li and Guskov, 2007), and tensor-based descriptors (Mian et al., 2006). However, for the most part, there is still little evidence that any of these range image based features are significantly better than the others for specific object detection tasks. Recent work has combined intensity based features with range, first segmenting images into planar range regions and then using this information to guide the object detection process with intensity based features (Rapus et al., 2008; Wei et al., 2011).
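The spin image descriptor mentioned above can be sketched as a 2-D histogram of cylindrical coordinates measured about a reference point and its surface normal. The sketch below is illustrative only; the function name, bin count and support radius are our assumptions, not values from Johnson and Hebert (1999).

```python
import numpy as np

def spin_image(points, ref_point, ref_normal, bins=8, radius=1.0):
    """Sketch of a spin image: a 2-D histogram of the cylindrical
    coordinates (alpha, beta) of neighbouring points, measured relative
    to a reference point and its surface normal."""
    n = ref_normal / np.linalg.norm(ref_normal)
    d = points - ref_point                       # offsets from the reference point
    beta = d @ n                                 # signed distance along the normal
    # radial distance from the normal axis (clamped against rounding error)
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    hist, _, _ = np.histogram2d(alpha, beta, bins=bins,
                                range=[[0.0, radius], [-radius, radius]])
    return hist / max(hist.sum(), 1.0)           # normalised 2-D descriptor
```

Because the coordinates are measured in the cylindrical frame of the normal, the descriptor is invariant to rotations about that normal, which is the property that makes spin images attractive for viewpoint-independent matching.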
This paper reports on a number of low-level feature extraction methods and their usefulness in describing salient image regions containing higher-level features/objects. The features of interest include those based on the "bag of words" concept, in which many low-level features are used together to model the characteristics of an image region in order to measure its saliency relative to the whole image (section 2). These generate response maps indicating regions of interest in the image. Feature extraction methods that encode a greater amount of spatial and geometric information from range and intensity image regions are discussed in the context of their use in parts-based models for higher-level object detection (section 3). Extraction of rudimentary line segment information from the 3-D images for use in detecting and modelling/matching object geometry is also discussed. The paper concludes with a summary of future work and known issues to be addressed (section 4).
2 OBJECT SALIENCY 
Given the large amount of data to be processed, it is necessary to first extract candidate regions with a greater likelihood of containing higher-level features of interest. For a given object detection task, such as finding all bus shelters, a saliency detection method is required to return all approximate locations of bus shelters in the data; a consequence of this completeness requirement is a high false alarm rate. Subsequent processing is nevertheless more efficient because the total remaining amount of data is substantially reduced.
Some low-level features are more suited than others for discrimination between high-level features of interest. It is recognised that it is not possible to find a combination of one or more features that will detect all high-level features of interest. The selection of low-level features must be task driven; the objects to
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August – 01 September 2012, Melbourne, Australia
be detected must first be specified so that the combination of features most appropriate for matching such objects can be used. Machine learning approaches, using a training set of manually chosen instances of the high-level features (positive examples) as well as instances of other high-level features not of the required class (negative examples), can then determine the best low-level features to use and how they should be combined to satisfy the task.
2.1 Statistical based features 
The statistically based features capture the scale invariant covariance of object structure. The histogram of an image is a plot of the number of pixels at each grey level value (or at each intensity value of a colour channel for colour images). The shape of the histogram provides information about the nature of the image (or a sub-region of the image). For example, a very narrow histogram implies a low contrast image, a histogram skewed toward the high end implies a bright image, and a bi-modal histogram (or a histogram with multiple strong peaks) can imply the presence of one or more objects.
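The histogram readings above can be sketched directly in code. In the contrast test below, "narrow" is interpreted as the occupied grey levels spanning less than a fixed fraction of the full range; that fraction is an assumption chosen purely for illustration.

```python
import numpy as np

def grey_histogram(image, levels=256):
    """Grey-level histogram of an 8-bit image region."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    return hist

def is_low_contrast(image, spread_fraction=0.1, levels=256):
    """A very narrow histogram implies a low contrast image: here,
    'narrow' means the occupied grey levels span less than
    spread_fraction of the full grey-level range (illustrative threshold)."""
    hist = grey_histogram(image, levels)
    occupied = np.nonzero(hist)[0]           # grey levels actually present
    return (occupied[-1] - occupied[0]) < spread_fraction * levels
```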
The histogram features considered in this paper are statistically based in that the histogram models the probability distribution of intensity levels in the image. These statistical features encode characteristics of the intensity level distribution for the image. A bright image will have a high mean and a dark image will have a low mean. High contrast regions have high variance, and low contrast regions have low variance. The skew is positive when the tail of the histogram spreads out to the right (positive side), and negative when the tail spreads out to the left (negative side). High energy means that the number of different intensity levels in the region is low, i.e. the distribution is concentrated over only a small number of intensity levels. Entropy is a measure of the number of bits required to encode the region data; it increases as the pixel values are distributed among a larger number of intensity levels. Complex regions have higher entropy, and entropy tends to vary inversely with energy.
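A minimal sketch of these statistical features, computed from the normalised grey-level histogram p(g). The feature set follows the text above; the implementation details (bin count, log base for entropy) are our assumptions.

```python
import numpy as np

def histogram_features(image, levels=256):
    """Mean, variance, skew, energy and entropy of the grey-level
    distribution, as described in the text."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()                    # probability distribution p(g)
    g = np.arange(levels)
    mean = np.sum(g * p)
    var = np.sum((g - mean) ** 2 * p)
    sd = np.sqrt(var)
    skew = np.sum((g - mean) ** 3 * p) / sd ** 3 if sd > 0 else 0.0
    energy = np.sum(p ** 2)                  # high when few levels dominate
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))      # bits needed to encode the region
    return {"mean": mean, "variance": var, "skew": skew,
            "energy": energy, "entropy": entropy}
```

Note the inverse relationship mentioned in the text: a constant region has energy 1 and entropy 0, while spreading the pixels over more levels lowers the energy and raises the entropy.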
2.2 Localised keypoint, edge and corner features 
Rosin (2009) argues for the density of edges as a measure of salience, because interesting objects have more edges. Edge features have been very popular, ranging from simple differential measures of adjacent pixel contrasts, such as the Sobel (Duda et al., 1973), Prewitt (Prewitt, 1970), and Roberts Cross (Roberts, 1963) operators, to more complex operators such as the Canny (Canny, 1986) and Marr-Hildreth (Marr and Hildreth, 1980) edge detectors. Canny produces single pixel wide edges, allowing edge linking, and exhibits good robustness to noise (see figure 1(a)). The simpler operators such as Sobel require a threshold and thinning to obtain single pixel wide edges. Corner detectors such as Harris (Harris and Stephens, 1988) have been popular because they produce features exhibiting a high degree of viewpoint invariance (see figure 1(b)). However, many of the features detected by the Harris operator are false corners and so cannot be semantically interpreted. More recently, keypoint detectors with greater viewpoint invariance that detect fewer false features have been proposed, such as SIFT (Lowe, 2004), SURF (Bay et al., 2006) and FAST (Rosten, 2006).
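Rosin's edge-density cue can be sketched with a Sobel gradient magnitude, computed here with pure NumPy over interior pixels only. The gradient threshold is an illustrative assumption; in practice it would be learned or tuned per task.

```python
import numpy as np

def sobel_magnitude(image):
    """Sobel gradient magnitude from the two 3x3 kernels, implemented
    with array slicing (interior pixels only; a sketch, not an
    optimised operator)."""
    img = image.astype(float)
    # horizontal derivative: right column minus left column, centre weighted 2
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    # vertical derivative: bottom row minus top row, centre weighted 2
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)

def edge_density(image, threshold=100.0):
    """Edge-density saliency cue in the spirit of Rosin (2009): the
    fraction of pixels whose gradient magnitude exceeds a threshold
    (threshold value chosen for illustration)."""
    return np.mean(sobel_magnitude(image) > threshold)
```

Regions with a high response under such a cue would be flagged as candidate regions for the subsequent, more expensive detection stages described in this section.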
2.3 Saliency based features 
Saliency based features take inspiration from aspects of the human visual system. This is a task driven process that analyses global image features to identify image regions containing more interesting content.
