International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
INTENSITY AND RANGE IMAGE BASED FEATURES FOR OBJECT DETECTION IN
MOBILE MAPPING DATA
Richard Palmer¹, Michael Borck¹, Geoff West¹ and Tele Tan²
¹ Department of Spatial Sciences, ² Department of Computing
Curtin University, GPO Box U1987, Perth 6845, Western Australia
{r.palmer, michael.borck}@postgrad.curtin.edu.au, {g.west, t.tan}@curtin.edu.au
Cooperative Research Centre for Spatial Information
Commission III/4
KEY WORDS: low-level features, image processing, point clouds, mobile and terrestrial mapping, 3-D features, 2-D features
ABSTRACT:
Mobile mapping is used for asset management, change detection, surveying and dimensional analysis. There is a great desire to
automate these processes given the very large amounts of data, especially when 3-D point cloud data is combined with co-registered
imagery - termed “3-D images”. One approach requires low-level feature extraction from the images and point cloud data followed
by pattern recognition and machine learning techniques to recognise the various high level features (or objects) in the images. This
paper covers low-level feature analysis and investigates a number of different feature extraction methods for their usefulness. The
features of interest include those based on the “bag of words” concept in which many low-level features are used e.g. histograms of
gradients, as well as those describing the saliency (how unusual a region of the image is). These mainly image based features have
been adapted to deal with 3-D images. The performance of the various features is discussed for typical mobile mapping scenarios and
recommendations made as to the best features to use.
1 INTRODUCTION
Laser scanning is currently the preferred method for the collec-
tion of surveying/mapping data, but increasingly this is being aug-
mented by 2-D imaging cameras. Co-registration of 2-D colour
intensity maps collected from standard cameras with range mea-
surements collected by laser scanners results in the creation of
3-D images; 2-D images with every pixel having an associated
range value. Recently, mapping systems based on stereoscopic
imaging techniques have been used to produce similar 3-D im-
ages at the expense of reduced accuracy in range. The increasing
use of mobile mapping systems based around such technology
is resulting in the creation of very large amounts of data; mobile
mapping systems operating along roads in urban centres typically
collect full 360 degree panoramas every five or ten metres along
the vehicle track. These datasets are very useful for a range of
content analysis applications, but the speed of analysis is severely
limited by the amount of costly and impractical manual process-
ing needed to identify interesting features. There is a great need
to improve upon the automated detection of content that is of in-
terest to the user, so that a large proportion of time is not wasted
looking through irrelevant data.
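As a point of reference, the following is a minimal sketch of the "3-D image" representation described above: a colour image in which every pixel carries a co-registered range value. The class, array names and shapes are illustrative only and do not reflect the data format of any particular mobile mapping system.

    import numpy as np

    # Minimal sketch of a "3-D image": a colour image in which every pixel
    # carries a co-registered range value. Names and shapes are illustrative.
    class ThreeDImage:
        def __init__(self, rgb: np.ndarray, range_m: np.ndarray):
            # rgb:     (H, W, 3) uint8 colour intensities
            # range_m: (H, W)    float32 range in metres (NaN where no return)
            assert rgb.shape[:2] == range_m.shape
            self.rgb = rgb
            self.range_m = range_m

        def valid_mask(self) -> np.ndarray:
            # Pixels for which a usable range measurement exists.
            return ~np.isnan(self.range_m)

    # Example: a 4x4 image with uniform grey colour and a constant 10 m range.
    img = ThreeDImage(np.full((4, 4, 3), 128, dtype=np.uint8),
                      np.full((4, 4), 10.0, dtype=np.float32))
    print(img.valid_mask().all())  # True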
Processing data for the automatic identification of features or ob-
jects of interest is a core focus of computer vision research. Re-
search has focussed on the analysis of very large cohorts of im-
ages because many people and organisations produce and share
images and these must often be indexed and organised accord-
ing to content. Websites such as Flickr (http://www.flickr.com)
and Picasa (http://picasa.google.com), and the need to search the
Web for images having specific content means many millions of
images must be processed. Mobile mapping imagery requires
similar processing to discover content for use in application ar-
eas such as asset management, change detection, surveying and
dimensional analysis. Mobile mapping data is distinct from
regular 2-D imagery because of the availability of co-registered
range information. This extra modality presents an interesting av-
enue for research because it offers the possibility of significantly
increasing the speed and accuracy of existing 2-D image based
feature detection methods.
Research into object detection has produced a large number of
novel approaches to feature detection. The performance of fea-
tures extracted from imagery is evaluated for a particular object
detection task. This requires a task-driven approach to the evalu-
ation of features: first identify the type of object to be detected in
the imagery, then determine how accurately an object detection
system that uses these features detects those objects. Typically
this requires much imagery with ground-truthed
bounds defined around the objects to be detected. While there
exists much intensity imagery (e.g. the PASCAL Visual Object
Classes Challenge (Everingham et al., 2010)), there are no similar
commonly available 3-D or range image datasets.
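For concreteness, a hedged sketch of the overlap criterion commonly applied to such ground-truthed bounds follows; the PASCAL VOC challenge cited above scores a detection as correct when its intersection-over-union with a ground-truth box reaches 0.5. The box format in this sketch is an assumption for illustration.

    def iou(box_a, box_b):
        # Boxes as (x_min, y_min, x_max, y_max). Intersection-over-union is
        # the overlap measure used by the PASCAL VOC challenge; a detection
        # is usually counted as correct when IoU >= 0.5.
        ix_min = max(box_a[0], box_b[0])
        iy_min = max(box_a[1], box_b[1])
        ix_max = min(box_a[2], box_b[2])
        iy_max = min(box_a[3], box_b[3])
        iw = max(0.0, ix_max - ix_min)
        ih = max(0.0, iy_max - iy_min)
        inter = iw * ih
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter) if inter > 0 else 0.0

    # Example: a detection overlapping two thirds of a ground-truth box.
    print(iou((0, 0, 10, 10), (2, 0, 12, 10)))  # 0.666...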
For the purposes of this research, a dataset from Earthmine was
used consisting of a sequence of panoramas taken approximately
every ten metres along the road within the Perth CBD, Western
Australia. Each panorama consists of eight images projected onto
the inside of a cube centred on the imaging camera array mounted
on the mapping vehicle. Within the high resolution images, each
colour pixel is co-registered with the real-world latitude, longi-
tude and elevation at that point. 3-D images can then be gener-
ated by specifying the colour and range of each pixel in the image.
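The sketch below illustrates one way a per-pixel range could be derived from such latitude, longitude and elevation values relative to the camera position. The local equirectangular approximation, the variable names and the example coordinates (near the Perth CBD) are assumptions for illustration; they are not the projection or format actually used by the Earthmine data.

    import numpy as np

    EARTH_RADIUS_M = 6_371_000.0

    def range_from_geodetic(lat, lon, elev, cam_lat, cam_lon, cam_elev):
        # lat, lon in degrees and elev in metres, as arrays sharing the image
        # shape (H, W); cam_* is the camera position in the same datum.
        dlat = np.radians(lat - cam_lat)
        dlon = np.radians(lon - cam_lon)
        north = dlat * EARTH_RADIUS_M
        east = dlon * EARTH_RADIUS_M * np.cos(np.radians(cam_lat))
        up = elev - cam_elev
        return np.sqrt(north**2 + east**2 + up**2)

    # Example: a single pixel roughly 14 m from the camera (illustrative
    # coordinates near the Perth CBD).
    print(range_from_geodetic(np.array([-31.95286]), np.array([115.85720]),
                              np.array([12.0]),
                              -31.95295, 115.85710, 10.0))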
Range image data has been used for object representation and to
establish correspondences between an object’s geometric model
(e.g. derived from a generic CAD model of the object) and the
object’s representation in the range imagery (Arman et al., 1993),
(Lavva et al., 2008), (Steder et al., 2009). However, due to the
complexity and slowness of matching spatial models in range
imagery, and the wide availability of intensity imagery, research
has favoured extracting the appearance of an object to encode
its discriminative qualities. In intensity images, keypoints or in-
terest points have been proposed such as Harris keypoints (Har-
ris and Stephens, 1988), SIFT (Lowe, 2004), SURF (Bay et al.,
2006), and FAST (Rosten, 2006); blob detectors such as Max-
imally Stable Extremal Regions (MSER) (Matas et al., 2002),