he Hough trans-
r in non-random
form perceptual
n detected, such
> problem of ob-
of line informa-
are first detected
c version of the
oint along a de-
ered range data,
> range along its
ise in the range
Xf allowed range
ge points are fit
n and end-points
iminate between
ying the linearity
sing the detected
1 of scene geom-
its projected into
egression in range
nsform projected
ng the geometric
o better approxi-
ighly articulated
; must be able to
sic to these types
ndividually mov-
in how adjacent
e another.
walb and Hutten-
]s (Felzenszwalb
r ability to detect
'Wpoint, or when
other scene ele-
e. A model is a
ild of a root part
of the parts. The
conditionally in-
igure 7 shows an
le and front view
IOG features has
ty images using à
/alb et al, (2010).
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
Figure 7: Example of deformable parts model using HOG features (Felzenszwalb et al., 2010)
The test intensity image is first converted into a pyramid of HOG feature maps computed at several different resolutions. At each scale of the feature pyramid, the root filter of the model for the object of interest is cross-correlated with the feature map, producing a response map for the root filter. This is repeated for each of the child parts, using the feature map in the pyramid computed at twice the resolution of the root filter. Detection is performed independently for each part, and the response map of each part is transformed according to the best detection(s) of the root. A deformation cost function, whose parameters are learned from the observed variability of the parts in the training data, favours groupings of detected parts that match the learned anchor part positions in the car model over part configurations more distant from those anchors. This produces an overall response map for complete root and part detections. The largest responses are retained by thresholding, and a bounding box is calculated as the convex hull of a car's individual part detections. Finally, the bounding boxes for each detected object are rescaled and translated to match the original image dimensions, which places the part and root bounding boxes in the correct locations in the original image.
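The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Felzenszwalb et al. (2010): it uses single-channel feature maps in place of multi-dimensional HOG cells, and it finds each part's best placement by brute-force search over a window of radius `radius` rather than by the generalized distance transform used in that work. The deformation cost takes the quadratic form d · (dx, dy, dx², dy²). The function names and all numeric values are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
# (a, b, c, d): weights on dx, dy, dx^2, dy^2 (hypothetical "learned" values)
Deformation = Tuple[float, float, float, float]


def cross_correlate(feat: Grid, filt: Grid) -> Grid:
    """Valid-mode 2-D cross-correlation of a single-channel feature map with a filter."""
    fh, fw = len(filt), len(filt[0])
    h, w = len(feat), len(feat[0])
    return [
        [
            sum(feat[y + i][x + j] * filt[i][j] for i in range(fh) for j in range(fw))
            for x in range(w - fw + 1)
        ]
        for y in range(h - fh + 1)
    ]


def best_part_score(part_resp: Grid, anchor: Tuple[int, int],
                    defo: Deformation, radius: int) -> float:
    """Best part response near its anchor, minus the quadratic deformation cost.

    Brute-force search over displacements in [-radius, radius]; returns -inf
    if no displaced position falls inside the response map.
    """
    ax, ay = anchor
    a, b, c, d = defo
    best = float("-inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = ax + dx, ay + dy
            if 0 <= y < len(part_resp) and 0 <= x < len(part_resp[0]):
                cost = a * dx + b * dy + c * dx * dx + d * dy * dy
                best = max(best, part_resp[y][x] - cost)
    return best


def score_root_location(root_resp: Grid, x: int, y: int,
                        parts: List[Tuple[Grid, Tuple[int, int], Deformation]],
                        radius: int = 2) -> float:
    """Root response at (x, y) plus the best placement of every part.

    Part response maps come from the pyramid level at twice the root
    resolution, so each part's anchor is its learned offset from (2x, 2y).
    """
    total = root_resp[y][x]
    for part_resp, (vx, vy), defo in parts:
        total += best_part_score(part_resp, (2 * x + vx, 2 * y + vy), defo, radius)
    return total


# Toy example: a 3x3 feature map, a 2x2 root filter and one part.
feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
root = cross_correlate(feat, [[1.0, 0.0], [0.0, 1.0]])  # [[6.0, 8.0], [12.0, 14.0]]
part = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
print(score_root_location(root, 0, 0, [(part, (0, 0), (0.0, 0.0, 1.0, 1.0))], radius=1))  # 9.0
```

In the toy example the part's strongest response lies one cell diagonally from its anchor, so it contributes 5.0 minus a deformation cost of 2.0, giving a total of 6.0 + 3.0.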
3.4 Results
Figure 8 shows car detections in the Earthmine intensity images. In these results, thresholds were chosen manually to retain only the few best detections. The bounding box algorithm used here simply computes the convex hull of the object's parts. Better methods, which use the amount of deformation of each part to adjust the position of the bounding box, should give more accurate object localisation.
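The convex-hull bounding box, followed by the mapping back to the original image, might look like the sketch below. It assumes every part detection is an axis-aligned box `(x0, y0, x1, y1)` already expressed in the coordinates of a single pyramid level; the axis-aligned rectangle enclosing the convex hull of such boxes is then given by the minimum and maximum corner coordinates. The names `hull_box` and `to_original`, and the parameters `level_scale` (original image size divided by level size) and `offset` (for example, to undo padding added to the feature maps), are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned


def hull_box(part_boxes: Sequence[Box]) -> Box:
    """Axis-aligned rectangle enclosing the convex hull of all part boxes.

    The hull of a set of axis-aligned boxes has the same extreme x and y
    coordinates as the boxes themselves, so min/max of the corners suffices.
    """
    if not part_boxes:
        raise ValueError("need at least one part box")
    return (
        min(b[0] for b in part_boxes),
        min(b[1] for b in part_boxes),
        max(b[2] for b in part_boxes),
        max(b[3] for b in part_boxes),
    )


def to_original(box: Box, level_scale: float,
                offset: Tuple[float, float] = (0.0, 0.0)) -> Box:
    """Rescale a box from pyramid-level coordinates, then translate it by `offset`."""
    ox, oy = offset
    x0, y0, x1, y1 = box
    return (x0 * level_scale + ox, y0 * level_scale + oy,
            x1 * level_scale + ox, y1 * level_scale + oy)


parts = [(1.0, 2.0, 3.0, 4.0), (0.0, 3.0, 5.0, 6.0)]
print(to_original(hull_box(parts), level_scale=2.0))  # (0.0, 4.0, 10.0, 12.0)
```

Because every part box contributes equally, a single strongly displaced part stretches the whole box. This is the weakness that a deformation-aware method would address.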
4 CONCLUSIONS AND FUTURE WORK
This study has addressed two different stages in the feature detection process. In the initial stage, when the relative amount of data to be processed is high, accuracy is traded for speed in a coarse-grained, task-driven approach to saliency detection and image segmentation. In the second stage, when the relative amount of data is lower, speed is traded for the more detailed processing required to detect particular objects. The proposed saliency detection method applies simple feature detectors over the whole image in a sliding-window approach to identify image sub-regions that are more likely to contain high-level features of interest. The output of the first stage is a response map, which can be thresholded to identify candidate image sub-regions for the second stage of processing. The second stage uses more complex feature vectors incorporating HOG-style features derived from both the intensity and range imagery, and cross-correlates these feature vectors with the candidate sub-regions provided by the first stage to identify promising object locations. A final stage (not discussed in this paper) will detect the pose of the detected objects by comparing the parts of each object with geometric features detected in the 3-D image.
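The coarse-to-fine structure summarised above can be illustrated with a small cascade. This is a sketch under assumptions, not the authors' code: stage one scores every sliding window by the mean of a precomputed low-level feature map (a hypothetical stand-in for the simple feature detectors), thresholds that response, and only the surviving windows are passed to a stage-two `detector`. The detector is an arbitrary callable standing in for the HOG-based cross-correlation; all names and thresholds are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

Grid = List[List[float]]
Detector = Callable[[Grid, int, int, int], float]  # (map, x, y, window) -> score


def windows(h: int, w: int, win: int, step: int) -> Iterator[Tuple[int, int]]:
    """Top-left corners of square sliding windows that fit inside an h x w map."""
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            yield x, y


def window_mean(fmap: Grid, x: int, y: int, win: int) -> float:
    """Cheap stage-1 saliency: mean of a precomputed low-level feature map."""
    return sum(fmap[y + i][x + j] for i in range(win) for j in range(win)) / (win * win)


def two_stage_detect(fmap: Grid, win: int, step: int, saliency_thresh: float,
                     detector: Detector, detect_thresh: float
                     ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int, float]]]:
    # Stage 1: fast, coarse saliency over every window, then threshold the response.
    candidates = [(x, y) for x, y in windows(len(fmap), len(fmap[0]), win, step)
                  if window_mean(fmap, x, y, win) >= saliency_thresh]
    # Stage 2: the expensive detector runs only on the surviving candidates.
    detections = []
    for x, y in candidates:
        score = detector(fmap, x, y, win)
        if score >= detect_thresh:
            detections.append((x, y, score))
    return candidates, detections


# Usage with a hypothetical detector that records how often it is called.
fmap = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
calls = []


def dummy_detector(img: Grid, x: int, y: int, win: int) -> float:
    calls.append((x, y))
    return 1.0


cands, dets = two_stage_detect(fmap, win=2, step=2, saliency_thresh=0.5,
                               detector=dummy_detector, detect_thresh=0.5)
print(cands, dets, len(calls))  # [(0, 0)] [(0, 0, 1.0)] 1
```

Of the four windows, only one passes the saliency threshold, so the expensive detector is evaluated once rather than four times; this is the speed-for-accuracy trade made in the first stage.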
Figure 8: Sample car detections in the Earthmine intensity images. Blue boxes denote individual part detections, while yellow boxes denote detection of whole object instances. Note the erroneous double detection of the car on the left of the image in the bottom-right example.
4.1 Future Work
One of the biggest factors determining the speed of detection is the requirement to evaluate all possible scales of an object in the intensity image. The addition of range information removes this need: because the distance to the object is known, its physical size can be learned along with the model parameters and its expected scale in the image predicted directly.
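Under a pinhole camera model, an object of metric height H at range Z appears approximately f·H/Z pixels tall, where f is the focal length in pixels. The sketch below uses this to pick a single pyramid level instead of searching all of them. It assumes a calibrated focal length, a range value at the candidate location, and a pyramid in which level l downsamples the image by 2^(l/λ) for λ levels per octave. The function names, the 1.5 m object height and the 40-pixel model height are hypothetical.

```python
import math


def expected_pixel_height(focal_px: float, object_height_m: float, range_m: float) -> float:
    """Pinhole projection: image height in pixels of an object at a given range."""
    return focal_px * object_height_m / range_m


def pyramid_level(pixel_height: float, model_height_px: float, levels_per_octave: int) -> int:
    """Pyramid level at which the object appears at the model's trained height.

    Level l is assumed to shrink the image by 2 ** (l / levels_per_octave), so
    pixel_height * 2 ** (-l / levels_per_octave) == model_height_px gives l.
    Objects smaller than the model are clamped to level 0 (no upsampling here).
    """
    level = levels_per_octave * math.log2(pixel_height / model_height_px)
    return max(0, round(level))


# Hypothetical values: f = 1000 px, a 1.5 m tall object (learned), 10 m away.
h = expected_pixel_height(focal_px=1000.0, object_height_m=1.5, range_m=10.0)
print(h, pyramid_level(h, model_height_px=40.0, levels_per_octave=10))  # 150.0 19
```

With range available, the detector would evaluate the root filter at this one level (or a small neighbourhood of it) rather than at every scale.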
Prior knowledge of how the data were collected, together with the frequency and occurrence of low-level features extracted from the images in an offline processing step, can be incorporated within a probabilistic framework to help guide the search for higher-level (more complex) objects of interest. This can be considered a context-dependent extension of our existing approach to detecting object saliency.
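One simple way such a framework could combine context with the existing saliency output is a per-pixel Bayesian product, treating the saliency response as proportional to a likelihood and multiplying it by a prior map. This is an assumption for illustration only; the paper does not specify the framework. The function name and the example prior (objects assumed more likely in the lower image rows, as might hold for a vehicle-mounted camera) are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def combine_with_prior(likelihood: Grid, prior: Grid) -> Grid:
    """Per-pixel posterior proportional to likelihood * prior, normalised to sum to 1."""
    product = [[l * p for l, p in zip(lrow, prow)] for lrow, prow in zip(likelihood, prior)]
    total = sum(sum(row) for row in product)
    if total == 0:
        raise ValueError("likelihood and prior have no overlapping support")
    return [[v / total for v in row] for row in product]


saliency = [[1.0, 1.0], [1.0, 1.0]]  # uniform saliency response
prior = [[0.0, 0.0], [1.0, 3.0]]     # hypothetical context prior
print(combine_with_prior(saliency, prior))  # [[0.0, 0.0], [0.25, 0.75]]
```

Thresholding the resulting posterior instead of the raw saliency map would restrict the second-stage search to regions that are both salient and plausible given the context.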
Finally, in future work, the effectiveness of the extended saliency
and high-level feature detection methods will be tested against a
larger set of 3-D images in order to assess the broader viability of
the methods for object detection.
ACKNOWLEDGEMENTS
This work is supported by the Cooperative Research Centre for
Spatial Information, whose activities are funded by the Australian
Commonwealth’s Cooperative Research Centres Programme. It
provides PhD scholarships for Michael Borck and Richard Palmer
and partially funds Prof. Geoff West's position. The authors
would like to thank John Ristevski and Anthony Fassero from
Earthmine for making available the dataset used in this work.
REFERENCES
Achanta, R. and Süsstrunk, S., 2010. Saliency detection using maximum symmetric surround. In: Proceedings of the 17th IEEE International Conference on Image Processing, IEEE, pp. 2653–2656.