and a 20% false alarm rate using HOG features. Future work will explore different combinations of features, SVM parameters and kernel functions to maximise the correct detection rate while minimising false alarms.
2.5 Range image based features
Each pixel of the high-resolution intensity images has a co-registered range value. This allows a range map of the intensity image to be computed, which is used as an additional feature in the learning process described in Section 2.4. The range map is also used to segment the intensity image, since objects of interest (e.g. street furniture) are expected to lie within a certain distance of the camera, whose position is known a priori. The range map is thresholded to create a mask, which is combined with the saliency response map produced by any of the other methods to further reduce the area of the image passed to the next stage of processing. Figure 5 shows how the range map is combined with an edge-based saliency map to produce a final segmentation of the image.
Figure 5: Range-based segmentation using simple thresholding of the range map. (a) Range map; (b) mask applied to the original image.
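The masking step can be sketched as follows. This is a minimal illustration, assuming the range map is a NumPy array in metres with NaN where no range return exists, and that the saliency map has already been produced by one of the other methods. The function names, the distance band and the saliency threshold are hypothetical, not values taken from this paper.

```python
import numpy as np

def range_mask(range_map, min_range, max_range):
    """True where range lies in [min_range, max_range]; pixels with no return (NaN) are False."""
    valid = np.isfinite(range_map)
    mask = np.zeros(range_map.shape, dtype=bool)
    r = range_map[valid]
    mask[valid] = (r >= min_range) & (r <= max_range)
    return mask

def segment(saliency, range_map, min_range, max_range, saliency_thresh):
    """Keep only pixels that are both salient and within the expected distance band."""
    salient = saliency >= saliency_thresh
    return salient & range_mask(range_map, min_range, max_range)

# Toy 2x3 example (hypothetical values): range in metres, saliency in [0, 1]
rng = np.array([[2.0, 5.0, np.nan],
                [12.0, 8.0, 3.0]])
sal = np.array([[0.9, 0.2, 0.8],
                [0.7, 0.6, 0.1]])
mask = segment(sal, rng, min_range=1.0, max_range=10.0, saliency_thresh=0.5)

# Zero out everything outside the mask before passing the image to the next stage
image = np.arange(6, dtype=float).reshape(2, 3)
masked = np.where(mask, image, 0.0)
```

Setting `min_range=0` reduces the band to a single maximum-distance cut-off, which is the simplest reading of "within a certain distance from the camera".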
3 OBJECT DETECTION
The aim of an object detection system is to identify the category (class) and image location of one or more objects of interest. This requires the system to hold internal models of the object categories to be identified, so that it can compare these models with locations in previously unseen images and determine whether, and where, an instance of a model (an object) is present. Ideally, one model should enable recognition of all objects in a category and be robust both to the wide variation of objects within that category and to the wide variation in how they may appear in an image (different viewpoints, scales and lighting conditions). In other words, the system must minimise the false negative detection rate. In addition, each model must be distinctive enough to avoid confusing an instance of one class with another (or with the null class representing no object). This is equivalent to minimising the false positive detection rate.
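As a worked illustration of the two error rates, the sketch below computes them from confusion-matrix counts. It assumes the standard definitions over candidate windows; the paper does not state how its false alarm rate is normalised (per window, per image or per detection), so these definitions are an assumption.

```python
def detection_rates(tp, fp, fn, tn):
    """Return (false negative rate, false positive rate) from confusion-matrix counts.

    FNR = FN / (TP + FN): fraction of true objects that were missed.
    FPR = FP / (FP + TN): fraction of object-free windows wrongly accepted.
    Empty denominators give a rate of 0.0.
    """
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr

# Hypothetical counts: 100 true objects and 100 background windows
fnr, fpr = detection_rates(tp=80, fp=10, fn=20, tn=90)
```

Lowering a detector's acceptance threshold typically lowers the false negative rate at the cost of a higher false positive rate, which is why the two requirements above pull against each other.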
The availability of range data alongside mobile mapping imagery should, in principle, allow more accurate object identification because of the extra information it provides. In this paper, range information has been used to help segment the image into regions more likely to contain high-level features of interest. It can also be used to calculate geometric properties of the images and their content.
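One such geometric property is the 3-D position of a pixel. A minimal sketch, assuming an ideal pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`, in pixels) and that the stored range is the Euclidean distance along the viewing ray rather than the z-depth; if the data set already stores 3-D coordinates per pixel, this step is unnecessary.

```python
import numpy as np

def pixel_to_point(u, v, rng, fx, fy, cx, cy):
    """Back-project pixel (u, v) with range `rng` into camera coordinates.

    Assumes no lens distortion and that `rng` is measured along the ray.
    """
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return rng * ray / np.linalg.norm(ray)  # unit ray scaled by range

# Hypothetical intrinsics and two pixels on the same object (top and base)
K = dict(fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
top = pixel_to_point(520, 300, 12.0, **K)
base = pixel_to_point(520, 700, 12.3, **K)
height = np.linalg.norm(top - base)  # approximate object height, in range units
```

If the range channel instead stores z-depth, the point is simply `depth * ray` without normalising the ray.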
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia

3.1 Object geometry
The identification of object edges, lines and corners can be used to infer the presence of straight lines or other geometric shapes in the image using feature extractors such as the Hough transform (Duda and Hart, 1972). If found together in non-random configurations, line features may be combined to form perceptual groups (Lowe, 1985). Once an object has been detected, such perceptual groups become doubly useful, since they also aid object pose estimation. Figure 6 shows line information extracted from a 3-D Earthmine image. Line segments are first detected in the 2-D intensity image using a probabilistic version of the Hough transform. Each point along a detected segment is then queried against the co-registered range data, and the segment is rejected if the range does not vary linearly along its length. To allow for noise in the range information, a parameter specifies the degree of range variation allowed along the line. The range values of accepted segments are fitted with standard linear regression, from which the end-points of each line are determined. Edge-type lines can be distinguished from intensity-based lines by testing the linearity (in range) of short lines orthogonal to, and crossing, the detected line. Although they provide only a coarse estimate of scene geometry, these lines can later be used, when comparing the geometric model of a learned class with detected objects, to better approximate the objects' locations in space.

Figure 6: Line segments detected via the Hough transform and projected into 3-D space using linear regression in range. (a) Hough lines in the original intensity image; (b) line segments projected into space via linear regression in range.
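The range checks described above can be sketched as follows, assuming line segments have already been found (for example by a probabilistic Hough transform) and are given as (row, col) end-points in a NumPy range map aligned with the intensity image. Like the text, it treats range as approximately linear along a segment; `tol` plays the role of the allowed-variation parameter. The probe rule (a line is edge-type if most orthogonal probes are non-linear in range) is an illustrative interpretation, not necessarily the paper's exact criterion, and all names and defaults are hypothetical.

```python
import numpy as np

def _samples(range_map, p0, p1, n):
    """Range values at n points evenly spaced from p0 to p1, with their parameter t in [0, 1]."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    t = np.linspace(0.0, 1.0, n)
    pts = np.rint(p0 + t[:, None] * (p1 - p0)).astype(int)
    inside = np.all((pts >= 0) & (pts < range_map.shape), axis=1)  # drop off-image samples
    t, pts = t[inside], pts[inside]
    r = range_map[pts[:, 0], pts[:, 1]]
    ok = np.isfinite(r)  # drop pixels with no range return
    return t[ok], r[ok]

def _linear_fit(range_map, p0, p1, n):
    """Least-squares fit range = a + b*t; returns (a, b, max_abs_residual) or None."""
    t, r = _samples(range_map, p0, p1, n)
    if len(t) < 2:
        return None
    b, a = np.polyfit(t, r, 1)  # polyfit returns [slope, intercept]
    return a, b, float(np.max(np.abs(r - (a + b * t))))

def fit_range_line(range_map, p0, p1, tol, n=50):
    """Accept a 2-D segment if range is linear along it within `tol`.

    Returns the fitted ranges at (p0, p1), or None if the segment is rejected.
    """
    fit = _linear_fit(range_map, p0, p1, n)
    if fit is None or fit[2] > tol:
        return None
    a, b, _ = fit
    return float(a), float(a + b)

def is_edge_line(range_map, p0, p1, half_len=5, tol=0.5, n_probes=5):
    """Edge-type if most short probes orthogonal to the segment are non-linear in range."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    d = p1 - p0
    normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal in (row, col)
    votes, used = 0, 0
    for s in np.linspace(0.2, 0.8, n_probes):
        c = p0 + s * d  # probe centre on the detected line
        fit = _linear_fit(range_map, c - half_len * normal, c + half_len * normal,
                          2 * half_len + 1)
        if fit is not None:
            used += 1
            votes += fit[2] > tol
    return used > 0 and votes > used / 2

# Toy 100x100 range maps (hypothetical): a planar ramp and a depth step at column 50
cols = np.tile(np.arange(100.0), (100, 1))
ramp = 10.0 + 0.01 * cols
step = np.where(cols < 50, 10.0, 20.0)
print(fit_range_line(ramp, (50, 10), (50, 90), tol=0.05))  # roughly (10.1, 10.9)
print(is_edge_line(step, (10, 50), (90, 50)))              # True: range jumps across the line
print(is_edge_line(ramp, (10, 50), (90, 50)))              # False: range is linear across it
```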
3.2 Modelling Schema
Many objects (such as people or animals) are highly articulated, and any model of their appearance or geometry must cater for the wide range of pose variation intrinsic to these types of object. Man-made objects often have fewer independently movable components, and there is far less variation in how adjacent parts of the same object appear in relation to one another.
Methods based on pictorial structures (Felzenszwalb and Huttenlocher, 2005) and deformable parts-based models (Felzenszwalb et al., 2010) have been shown to detect objects even when they are viewed from an unusual viewpoint, or when some of their parts are hidden, either by occluding scene elements or because the object lies at the edge of the image. A model is a hierarchy of parts in which each part is a child of a root part, whose features are computed at half the resolution of the parts. Given the root, the placement of each part is conditionally independent of its sibling parts. Figure 7 shows an example of a deformable parts model for the side and front views of a car using HOG features.
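To make the part-based scoring concrete, the sketch below scores one root location of a star-structured deformable parts model in the style of Felzenszwalb et al. (2010): the root filter response, plus for each part the best part response minus a quadratic deformation cost, plus a bias. Because parts are conditionally independent given the root, each part's placement is maximised separately. It assumes the filter response maps have already been computed (e.g. learned filters applied to a HOG feature pyramid) and uses a brute-force search over a small displacement window, whereas Felzenszwalb et al. use generalised distance transforms for efficiency. Function names, the search radius and the toy values are hypothetical.

```python
import numpy as np

def part_contribution(resp, anchor, d, radius):
    """Best (response - deformation cost) for one part searched around `anchor`.

    resp:   2-D response map of the part filter at the fine (2x) level.
    anchor: (y, x) ideal part position at the fine level.
    d:      deformation weights for the features (dx, dy, dx^2, dy^2).
    Returns -inf if no candidate position lies inside the map.
    """
    best = -np.inf
    ay, ax = anchor
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < resp.shape[0] and 0 <= x < resp.shape[1]:
                cost = d[0] * dx + d[1] * dy + d[2] * dx**2 + d[3] * dy**2
                best = max(best, resp[y, x] - cost)
    return best

def star_model_score(root_resp, part_resps, anchors, deforms, bias, y0, x0, radius=4):
    """Score of the model with its root placed at cell (y0, x0) of the coarse level."""
    score = root_resp[y0, x0] + bias
    for resp, (vy, vx), d in zip(part_resps, anchors, deforms):
        # Parts live at twice the root resolution, so root cell (y0, x0) maps to (2*y0, 2*x0)
        score += part_contribution(resp, (2 * y0 + vy, 2 * x0 + vx), d, radius)
    return score

# Toy example: one part whose strongest response is one cell right of its anchor
root = np.zeros((4, 4)); root[1, 1] = 2.0
part = np.zeros((8, 8)); part[3, 4] = 5.0
s = star_model_score(root, [part], [(1, 1)], [(0.0, 0.0, 1.0, 1.0)],
                     bias=-1.0, y0=1, x0=1, radius=2)  # 2 - 1 + (5 - 1) = 5.0
```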
3.3 Detection Method
In this paper, independent object parts modelled with HOG features are used to detect cars in intensity images, using a variant of the approach described by Felzenszwalb et al. (2010).
Figure 7