2. CLASSIFICATION AND REGRESSION TREES (CART)
As discussed in Bittencourt and Clarke (2003b), binary decision 
trees for classification can be viewed as a non-parametric 
approach to pattern recognition. A decision tree provides a 
hierarchical representation of the feature space in which 
patterns x_i are allocated to classes w_j (j = 1, 2, ..., k) according to 
the result obtained by following decisions made at a sequence 
of nodes at which branches of the tree diverge. The type of 
decision tree used in this paper is discussed in detail by 
Breiman et al. (1984), whose contributions have been 
summarized by the letters CART (Classification And 
Regression Trees). These letters indicate that trees may be used 
not only to classify entities into a discrete number of groups, 
but also as an alternative approach to regression analysis in 
which the value of a response (dependent) variable is to be 
estimated, given the value of each variable in a set of 
explanatory (independent) variables. 
Binary decision trees consist of repeated divisions of a feature 
space into two sub-spaces, with the terminal nodes associated 
with the classes w_j. A desirable decision tree is one having a 
relatively small number of branches, a relatively small number 
of intermediate nodes from which these branches diverge, and 
high predictive power, in which entities are correctly classified 
at the terminal nodes. 
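To make the idea of allocating a pattern by following decisions at a sequence of nodes concrete, the short Python sketch below implements a toy binary decision tree. The node structure, band indices, thresholds and class labels are illustrative assumptions and do not come from the paper.

# Toy binary decision tree: each intermediate node tests one spectral band
# against a threshold; terminal nodes carry a class label w_j.
# All values below are invented for illustration.

class Node:
    def __init__(self, band=None, threshold=None, left=None, right=None, label=None):
        self.band = band            # index of the band tested at this node
        self.threshold = threshold  # go left if x[band] < threshold
        self.left = left            # sub-space where x[band] < threshold
        self.right = right          # sub-space where x[band] >= threshold
        self.label = label          # class w_j, set only at terminal nodes

def classify(node, x):
    """Allocate pattern x to a class by following decisions from the root."""
    while node.label is None:       # descend until a terminal node is reached
        node = node.left if x[node.band] < node.threshold else node.right
    return node.label

# A tree with two intermediate nodes and three terminal nodes.
tree = Node(band=0, threshold=50,
            left=Node(label="w1"),
            right=Node(band=4, threshold=80,
                       left=Node(label="w2"),
                       right=Node(label="w3")))

print(classify(tree, [42, 0, 0, 0, 90, 0]))   # -> w1
print(classify(tree, [70, 0, 0, 0, 95, 0]))   # -> w3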
2.1 How Does CART Operate? 
CART involves the identification and construction of a binary 
decision tree using a sample of training data for which the 
correct classification is known. The numbers of entities in the 
two sub-groups defined at each binary split, corresponding to 
the two branches emerging from each intermediate node, 
become successively smaller, so that a reasonably large training 
sample is required if good results are to be obtained 
(McLachlan, 1992). 
The decision tree begins with a root node t, derived from whichever variable in the feature space minimizes a measure of the impurity of the two sibling nodes. The measure of impurity, or entropy, at node t, denoted by i(t), is given by the following equation:

i(t) = -\sum_{j=1}^{k} p(w_j \mid t)\,\log p(w_j \mid t)        (1)
where p(w_j | t) is the proportion of patterns x_i allocated to class w_j at node t. Each non-terminal node t is then divided into two further nodes, t_L and t_R, such that p_L and p_R are the proportions of entities passed to the new nodes t_L and t_R respectively. The best division is that which maximizes the difference given in:

\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)        (2)
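As a concrete illustration of Eqs. (1) and (2), the Python sketch below computes the entropy impurity of a node and the decrease in impurity produced by one candidate split. The helper names and toy data are assumptions made for this example (natural logarithms are assumed in Eq. (1)).

import numpy as np

def impurity(labels):
    """Entropy impurity i(t) of Eq. (1): -sum_j p(w_j|t) log p(w_j|t)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def impurity_decrease(labels, split_mask):
    """Decrease Delta i(s,t) of Eq. (2) for a candidate split s, where
    split_mask sends each pattern of node t to t_L (True) or t_R (False)."""
    left, right = labels[split_mask], labels[~split_mask]
    p_l = len(left) / len(labels)
    p_r = len(right) / len(labels)
    return impurity(labels) - p_l * impurity(left) - p_r * impurity(right)

# Toy node: six training patterns of one band with known classes.
band = np.array([10.0, 20.0, 30.0, 60.0, 70.0, 80.0])
y = np.array(["w1", "w1", "w1", "w2", "w2", "w2"])
print(impurity_decrease(y, band < 50))   # perfect split: gain equals i(t) = log 2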
The decision tree grows by means of successive sub-divisions until a stage is reached at which there is no significant decrease in the measure of impurity when a further division s is implemented. When this stage is reached, the node t is not sub-divided further and automatically becomes a terminal node. The class w_j associated with the terminal node t is the one that maximizes the conditional probability p(w_j | t). 
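The growing and stopping behaviour described above can be sketched as a recursive procedure. The impurity-decrease threshold min_gain and all helper names below are assumptions chosen for illustration; Breiman et al. (1984) also describe more elaborate pruning strategies not shown here.

import numpy as np
from collections import Counter

def entropy(labels):
    """i(t) = -sum_j p(w_j|t) log p(w_j|t)."""
    p = np.array(list(Counter(labels).values())) / len(labels)
    return -np.sum(p * np.log(p))

def grow(X, y, min_gain=1e-3):
    """Split a node recursively; stop when no split gives a significant
    decrease in impurity, and label the terminal node with the class w_j
    that maximizes p(w_j | t)."""
    best = None
    for band in range(X.shape[1]):
        for thr in np.unique(X[:, band]):
            left = X[:, band] < thr
            if 0 < left.sum() < len(y):
                p_l = left.mean()
                gain = entropy(y) - p_l * entropy(y[left]) - (1 - p_l) * entropy(y[~left])
                if best is None or gain > best[0]:
                    best = (gain, band, thr, left)
    if best is None or best[0] < min_gain:                 # stopping rule
        return {"class": Counter(y).most_common(1)[0][0]}  # terminal node
    gain, band, thr, left = best
    return {"band": band, "threshold": thr,
            "left": grow(X[left], y[left], min_gain),
            "right": grow(X[~left], y[~left], min_gain)}

Searching every observed value of every band for the best threshold is the simplest exhaustive strategy; it is adequate for small training sets but is not meant to reflect the implementation used by the authors.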
2.2 CART Applied to Feature Selection 
The decision tree generated by CART uses only the bands that 
help to separate the classes, while the others are not considered. 
In this paper we use the tree as a feature selection method to 
reduce dimensionality. As an illustration, an image with only 
six spectral bands was classified using a decision tree; only two 
of the six bands were identified by the CART procedure as 
necessary to separate three classes, as shown in Figure 1. 
  
[Figure 1 shows a decision tree built from the synthetic data: the root node splits on Band 1 < 50 and a second node splits on Band 5 < 80, separating the three classes.]

Figure 1. Example of decision tree using synthetic data
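To illustrate the feature-selection use of the tree described above, the sketch below fits a CART-style classifier to synthetic six-band data in which only two bands carry class information, and then lists the bands that actually appear in the fitted splits. The use of scikit-learn, the setting criterion="entropy", and all data values are assumptions made for this example; the paper does not name a particular implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(300, 6))          # six "spectral bands" per pattern
# The class depends only on bands 1 and 5 (indices 0 and 4), as in Figure 1.
y = np.where(X[:, 0] < 50, "w1",
             np.where(X[:, 4] < 80, "w2", "w3"))

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)

# Bands used at the internal nodes of the fitted tree; negative entries in
# tree_.feature mark terminal nodes and are discarded.
used = sorted({int(f) for f in clf.tree_.feature if f >= 0})
print("bands selected by the tree:", [b + 1 for b in used])   # expected: [1, 5]

Bands that never appear in a split can then be dropped, so the tree itself provides the reduced feature set used for dimensionality reduction.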
Figure 2 shows the subdivision of the feature space determined by the decision tree, together with the points representing the training patterns. As the tree used only two spectral bands, it is possible to present the subdivision of the feature space in the plane. 
[Figure 2 shows a scatter plot of the training patterns in the plane of the two selected bands (Band 1 on the horizontal axis), with the three classes w1, w2 and w3.]

Figure 2. Subdivisions of the feature space determined by the decision tree (synthetic data)
This example is over-simplified because the solution can be presented as a two-dimensional scatter plot, whereas hyperspectral image classification generally requires solutions in a space of many dimensions. Another way to present the decision tree's results is the following: 
 
	        