International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
RANDOM FORESTS-BASED FEATURE SELECTION
FOR LAND-USE CLASSIFICATION USING LIDAR DATA AND ORTHOIMAGERY
Haiyan Guan*, Jun Yu, Jonathan Li*** , Lun Luo*
a GeoSTARS Lab, Department of Geography and Environmental Management, University of Waterloo, 200 University
Ave. West, Waterloo, ON, Canada N2L 3G1
b GeoSTARS Group, School of Information Science and Engineering, Xiamen University, 422 Siming Road South,
Xiamen, Fujian, China 361005
c China Transport Telecommunication & Information Center, Beijing, China
KEY WORDS: Lidar, imagery, Random Forests, Classification, Feature selection
ABSTRACT:
The development of lidar systems, especially those incorporating high-resolution camera components, has shown great potential for
urban classification. However, automatically selecting the best features for land-use classification remains challenging. Random
Forests, a recently developed machine learning algorithm, is receiving considerable attention in the field of image classification and
pattern recognition. In particular, it provides a measure of variable importance. In this study, we therefore explore the performance of
Random Forests-based feature selection for urban areas. First, we extract features from the lidar data, including height-based and
intensity-based gray-level co-occurrence matrix (GLCM) measures; spectral features are obtained from the imagery, namely the Red,
Green and Blue bands and GLCM-based measures. Then, Random Forests is used to automatically select the optimal and uncorrelated
features for land-use classification. Lidar data and aerial imagery at 0.5-meter resolution are used to assess the feature selection
performance of Random Forests in a study area located in Mannheim, Germany. The results demonstrate that classification with the
features selected by Random Forests improves classification performance.
1. INTRODUCTION
Urban land cover classification has always been critical because
land cover links many elements of the human and physical
environments. Timely, accurate, and detailed knowledge of
urban land cover derived from remote sensing data
is increasingly required by a wide variety of communities.
This surge of interest has been driven predominantly by
recent innovations in data, technologies, and theories in urban
remote sensing. Over the past decades, advances in
lidar technologies have provided highly accurate, dense
3-dimensional point clouds that can be used for land-use
classification in combination with imagery. Because lidar data
consist of unstructured, irregular 3-D points and lack spectral
information, man-made and natural objects are often confused
during classification. On the other hand, it is difficult to obtain
land-use information directly from remotely sensed imagery alone,
owing to the complexity of landscapes, spectrally identical
objects, and the abundance of spatial and spectral information.
Therefore, integrating lidar point clouds with imagery is becoming a
preferred means of land-use classification.
Although a plethora of features can be extracted from both
lidar point clouds and optical imagery, there is no rule or model
for automatically and objectively selecting the proper features
for the desired classification results. The majority of existing
research focuses on the development of
classification methods; little attention has been paid to feature
selection using lidar data and imagery. Subjective selection
of classification features makes the classification results
unstable. To this end, Random Forests-based feature selection is
proposed in this study.
*junli@uwaterloo.ca, phone 1 519-888-4567, ext. 34504
Random Forests, a member of the family of ensemble classifiers in
which multiple classifiers are trained and their results combined
through a voting process, can be considered an improved version
of bagging, a widely used ensemble classifier (Breiman, 1996).
Random Forests are well known for their notable computational
efficiency. In the field of remote sensing, Random Forests has
achieved promising classification accuracy for hyperspectral
(Wang et al., 2009), multispectral (Stumpf and Kerle,
2011), and multisource data (Gislason et al., 2006). Owing to the
classification complexity of multisource data, commonly used
parametric classification methods are inappropriate. Random
Forests, as a nonparametric classification algorithm, should be of
great interest for multisource data because it provides an estimate
of the importance of each individual variable. Moreover, several studies
have shown the advantages of Random Forests in land cover
classification; the results indicate that the selected features agree
with existing physiological knowledge. However, few studies focus on
urban areas using the fusion of lidar data and aerial images. To this end,
Random Forests (RF) is applied to feature selection in this study.
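To make the ideas of bootstrap aggregation, majority voting, and variable importance concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes single-split trees (stumps) instead of fully grown trees, synthetic data in which only feature 0 carries class information, and Breiman's permutation importance computed on out-of-bag samples; whether the paper uses exactly this importance variant is an assumption. All function names and parameter values (`fit_forest`, `mtry`, the seeds) are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Fit a one-split tree: best (feature, threshold) among the candidates."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    if best is None:  # degenerate sample: no valid split exists
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def accuracy(stump, X, y):
    return sum(predict_stump(stump, r) == lab for r, lab in zip(X, y)) / len(y)

def fit_forest(X, y, n_trees=25, mtry=2, seed=0):
    """Bagging with random feature subsets; returns trees and OOB importance."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees, importance = [], [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))       # out-of-bag rows
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          rng.sample(range(p), mtry))
        trees.append(stump)
        if not oob:
            continue
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = accuracy(stump, X_oob, y_oob)
        for j in range(p):
            # Permute feature j among the OOB rows and measure the accuracy drop.
            col = [r[j] for r in X_oob]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X_oob, col)]
            importance[j] += (base - accuracy(stump, X_perm, y_oob)) / n_trees
    return trees, importance

def predict_forest(trees, row):
    """Combine the individual trees by majority vote."""
    return majority([predict_stump(t, row) for t in trees])

# Synthetic example (assumption): feature 0 determines the class, 1 and 2 are noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(150)]
y = [int(row[0] > 0.5) for row in X]

trees, importance = fit_forest(X, y)
train_acc = sum(predict_forest(trees, r) == lab for r, lab in zip(X, y)) / len(y)
print("importance:", [round(v, 3) for v in importance])
print("training accuracy:", round(train_acc, 3))
```

In this toy setting the informative feature receives a clearly larger importance score than the noise features, which is the property that feature selection relies on.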
This paper is organized as follows. Section 2 describes the
basic principles of Random Forests, the lidar data and calibrated
imagery used in the paper, and the features extracted from the lidar
data and the imagery, respectively. Section 3 then discusses variable
importance, one of the measures provided by Random Forests, for all
features, Random Forests-based feature selection, and the
corresponding classification results obtained with the Maximum
Likelihood Classifier (MLC). Finally, Section 4 concludes the
paper.
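As a rough illustration of what selecting important and uncorrelated features could look like once importance scores are available, the sketch below ranks features by importance and greedily skips any feature that is strongly correlated with one already chosen. This is an assumed procedure for illustration only, not the selection rule of this paper; the feature names, importance values, sample values, and the correlation threshold `max_corr` are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def select_features(columns, importance, max_corr=0.9, top_k=None):
    """Walk features by decreasing importance; skip highly correlated ones."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    chosen = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[c])) <= max_corr for c in chosen):
            chosen.append(name)
        if top_k is not None and len(chosen) == top_k:
            break
    return chosen

# Toy values (assumption): "height_copy" is almost a rescaled copy of "height".
columns = {
    "height": [1.0, 2.0, 3.0, 4.0, 5.0],
    "height_copy": [2.0, 4.0, 6.0, 8.0, 10.1],
    "intensity": [5.0, 1.0, 4.0, 2.0, 3.0],
}
importance = {"height": 0.40, "height_copy": 0.35, "intensity": 0.20}

selected = select_features(columns, importance)
print(selected)
```

Here the redundant feature is dropped even though its importance is high, because it adds little information beyond the feature it duplicates.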