Technical Commission III (B3)
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia 
Figure 7: Example of deformable parts model using HOG features (Felzenszwalb et al., 2010)
The intensity image under test is first converted into a feature
pyramid, that is, HOG feature maps computed at several different
resolutions. At each scale of the feature pyramid, the root filter of
the model for the object of interest is cross-correlated with the
feature map, generating a response map for the root filter. This is
repeated for each of the child parts, using the feature map in the
pyramid computed at twice the resolution of the root filter. Detection
is performed independently for each part, and the response map for
each part is transformed according to the best detection(s) of the
root. Groupings of detected parts that match the learned anchor part
positions in the car model are favoured over part configurations
further from the learned anchors by a deformation cost function, whose
parameters are learned from the observed variability of the parts in
the training data. This produces an overall response map for complete
root and part detections. The largest responses are thresholded, and a
bounding box is calculated as the convex hull of a car's individual
part detections. Finally, the bounding boxes for each detected object
are rescaled and translated to match the original image dimensions,
which places the part and root bounding boxes in the correct locations
in the original image.
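The scoring step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of Felzenszwalb et al. (2010): the function names are hypothetical, the feature maps are plain arrays rather than real HOG cells, and the deformation cost is simplified to a quadratic penalty `wx*dx*dx + wy*dy*dy` searched by brute force over a small window. The original work learns a cost with both linear and quadratic displacement terms and uses a distance transform rather than an explicit search.

```python
import numpy as np


def cross_correlate(feature_map, filt):
    """Valid-mode cross-correlation of an (H, W, C) feature map with an (h, w, C) filter."""
    H, W, _ = feature_map.shape
    h, w, _ = filt.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Dot product of the filter with the feature-map window at (y, x).
            out[y, x] = np.sum(feature_map[y:y + h, x:x + w] * filt)
    return out


def part_contribution(part_response, anchor, def_weights, radius):
    """Best part response within `radius` cells of `anchor`, minus a quadratic deformation cost."""
    ay, ax = anchor
    wx, wy = def_weights
    best = -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < part_response.shape[0] and 0 <= x < part_response.shape[1]:
                cost = wx * dx * dx + wy * dy * dy  # simplified (hypothetical) cost
                best = max(best, part_response[y, x] - cost)
    return best


def score_root_location(root_resp, part_resps, anchors, def_weights, ry, rx, radius=2):
    """Root response at (ry, rx) plus the best deformed score of every part."""
    total = root_resp[ry, rx]
    for resp, (oy, ox), weights in zip(part_resps, anchors, def_weights):
        # Parts use the pyramid level at twice the root resolution, so root cell
        # (ry, rx) maps to (2*ry, 2*rx); (oy, ox) is the part's learned anchor offset.
        total += part_contribution(resp, (2 * ry + oy, 2 * rx + ox), weights, radius)
    return total
```

Running `score_root_location` at every cell of the root response map yields the combined root-and-part response map, which is then thresholded.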
3.4 Results
Figure 8 shows results of car detection in the Earthmine intensity
images. Detection thresholds were chosen manually for these results to
retain only the few best detections. The bounding box algorithm used
here simply computes the convex hull of the object's parts. Better
methods that take into account how much each part has deformed when
positioning the bounding box should result in more accurate object
localisation.
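The thresholding, bounding box and rescaling steps can be illustrated with the short sketch below. It is an assumption-laden illustration rather than the authors' code: `Box`, `enclosing_box`, `to_original_image` and `keep_best` are hypothetical names, the box is taken as the axis-aligned bounds of the parts' convex hull, and the mapping back to the original image assumes a single per-level scale factor plus optional feature-map padding.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float


def enclosing_box(part_boxes):
    """Smallest axis-aligned box containing every part box (the bounds of their convex hull)."""
    return Box(min(b.x0 for b in part_boxes), min(b.y0 for b in part_boxes),
               max(b.x1 for b in part_boxes), max(b.y1 for b in part_boxes))


def to_original_image(box, scale, pad=0.0):
    """Translate out feature-map padding, then rescale to original image pixels.

    `scale` is the (hypothetical) number of original-image pixels per pixel at
    the pyramid level of the detection; `pad` is the padding at that level.
    """
    return Box((box.x0 - pad) * scale, (box.y0 - pad) * scale,
               (box.x1 - pad) * scale, (box.y1 - pad) * scale)


def keep_best(detections, threshold):
    """Keep (score, box) pairs scoring at least `threshold`, best first."""
    kept = [d for d in detections if d[0] >= threshold]
    return sorted(kept, key=lambda d: d[0], reverse=True)
```

For example, part boxes `Box(2, 3, 5, 6)` and `Box(4, 1, 8, 4)` give the object box `Box(2, 1, 8, 6)`, which maps to `Box(2.0, 0.0, 14.0, 10.0)` with `scale=2.0` and `pad=1.0`.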
4 CONCLUSIONS AND FUTURE WORK 
This study has addressed two different stages in the feature detection
process. In the initial stage, when the relative proportion of data is
high, accuracy is traded for speed in a coarse-grained, task-driven
approach to saliency detection and image segmentation. In the second
stage, when the relative proportion of the data is lower, speed is
traded for the more detailed processing required to detect particular
objects. The proposed saliency detection method uses simple feature
detectors that operate over the whole image in a sliding-window
approach to identify image sub-regions that are more likely to contain
high-level features of interest. The output of the first stage is a
response map, which can be thresholded to identify candidate image
sub-regions for the second stage of processing. The second stage uses
more complex feature vectors incorporating HOG-style features derived
from both the intensity and range imagery, and cross-correlates these
feature vectors with the candidate sub-regions provided by the first
stage to identify promising object locations. A final stage (not
discussed in this paper) will determine the pose of the detected
objects by comparing the parts of each object with detected geometric
features in the 3-D image.
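The hand-off from the first stage to the second can be sketched as below. The specific feature detectors are not given in this section, so mean gradient magnitude over a sliding window is used here purely as a hypothetical stand-in; `saliency_response` and `candidate_regions` are illustrative names. An integral image keeps the cost of each window sum constant, in keeping with the speed-first aim of the first stage.

```python
import numpy as np


def saliency_response(image, win):
    """Stage 1: mean gradient magnitude inside every win x win sliding window.

    Gradient magnitude is a hypothetical stand-in for the simple feature
    detectors. Output element (y, x) covers image[y:y + win, x:x + win].
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Integral image with a leading row/column of zeros: each window sum is O(1).
    ii = np.pad(mag.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    sums = ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]
    return sums / (win * win)


def candidate_regions(response, threshold, win):
    """Threshold the response map into (x0, y0, x1, y1) candidate windows for stage 2."""
    ys, xs = np.nonzero(response >= threshold)
    return [(int(x), int(y), int(x) + win, int(y) + win) for y, x in zip(ys, xs)]
```

Only the windows returned by `candidate_regions` would then be passed to the more expensive second-stage cross-correlation.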
  
Figure 8: Sample car detections in the Earthmine intensity images.
Blue boxes denote individual part detections, while yellow boxes
denote detection of whole object instances. Note the erroneous double
detection of the car on the left of the image in the bottom right
example.
4.1 Future Work
One of the biggest factors determining the speed of detection is the
requirement to evaluate all possible scales of an object in the
intensity image. The addition of range information removes this need,
as the object's real-world size can be learned along with the model
parameters.
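Why range removes the scale search can be shown with a small sketch. It assumes a pinhole camera with a known focal length in pixels, a learned nominal object height in metres, and a pyramid in which level `k` is the image downsampled by `2 ** (k / levels_per_octave)`; all of these, and the function names, are hypothetical and not taken from the Earthmine data. Given the range to a candidate, the expected image size, and therefore a single pyramid level, follows directly.

```python
import math


def expected_pixel_height(object_height_m, range_m, focal_px):
    """Pinhole projection: image height in pixels of an object at a given range."""
    return focal_px * object_height_m / range_m


def pyramid_level(pixel_height, template_height_px, levels_per_octave=10):
    """Pyramid level at which the object appears at the template's size.

    Level k is assumed to be the image downsampled by 2 ** (k / levels_per_octave).
    Objects smaller than the template would need upsampling, so the result is
    clamped at level 0 in this sketch.
    """
    scale = pixel_height / template_height_px
    return max(0, round(levels_per_octave * math.log2(scale)))
```

For instance, a 1.5 m object at 10 m range with a 1000-pixel focal length projects to 150 pixels, so a 75-pixel template only needs to be evaluated at level 10 instead of across the whole pyramid.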
Prior knowledge of how the data were collected, together with the
frequency and occurrence of low-level features extracted from the
images in an offline processing step, can be incorporated within a
probabilistic framework to help guide the search for higher-level
(more complex) objects of interest. This can be considered a
context-dependent extension of our existing approach to detecting
object saliency.
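One possible form of such a framework, offered only as a hedged sketch, is a per-location Bayesian update in which a prior map (for example from knowledge of the acquisition geometry) multiplies a normalised saliency response treated as a likelihood. Treating the saliency response as a likelihood, assuming independence between locations, and the street-level `row_prior` are all illustrative assumptions, not part of the method described here.

```python
import numpy as np


def posterior_map(prior, likelihood, eps=1e-12):
    """Per-location posterior proportional to prior * likelihood, normalised to sum to 1."""
    post = prior * likelihood
    total = post.sum()
    if total <= eps:
        # No evidence anywhere: fall back to a uniform map.
        return np.full_like(post, 1.0 / post.size)
    return post / total


def row_prior(height, width, start_row):
    """Hypothetical prior: objects only expected at or below `start_row` (e.g. street level)."""
    prior = np.zeros((height, width))
    prior[start_row:, :] = 1.0
    return prior / prior.sum()
```

Ranking locations by the posterior rather than by raw saliency would then concentrate the expensive second-stage search where both the context and the image evidence agree.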
Finally, in future work, the effectiveness of the extended saliency 
and high-level feature detection methods will be tested against a 
larger set of 3-D images in order to assess the broader viability of 
the methods for object detection. 
ACKNOWLEDGEMENTS 
This work is supported by the Cooperative Research Centre for 
Spatial Information, whose activities are funded by the Australian 
Commonwealth’s Cooperative Research Centres Programme. It 
provides PhD scholarships for Michael Borck and Richard Palmer 
and partially funds Prof. Geoff West's position. The authors 
would like to thank John Ristevski and Anthony Fassero from 
Earthmine for making available the dataset used in this work. 
REFERENCES 
Achanta, R. and Süsstrunk, S., 2010. Saliency detection using
maximum symmetric surround. In: Proceedings of the 17th IEEE
International Conference on Image Processing, IEEE, pp. 2653-2656.