Figure 1. The overlay of TM’s false color composite imagery
(RGB543) and DEM
3. THE PRINCIPLE OF CART
CART is a data exploration and prediction algorithm (Breiman L,
1984). It can handle highly skewed numerical data with many
states, as well as categorical attribute data, whether ordered
or unordered. The CART algorithm adopts binary recursive
partitioning: it always divides the current sample into two
sub-samples, so that each non-leaf node has exactly two
branches. A benefit of this algorithm is that it can use one
portion of the data for training and the other portion for
validation. It introduces an "adjusted error rate" into the
process, which means that a penalty factor is added for the
leaf nodes of a branch. If the branch can still keep a low
error rate, it is kept; otherwise, it is pruned. The final
result is an optimal binary decision tree that takes both
complexity and error rate into account, and each path defined
by the split points corresponds to its most probable class.
Thus, the decision tree that CART creates is a concise binary
decision tree.
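The pruning rule described above can be illustrated with a minimal sketch. The text does not give the exact form of the penalty, so this example assumes the cost-complexity form commonly associated with CART (error rate plus a penalty proportional to the number of leaves). The function names, the `penalty` parameter and the sample counts are hypothetical and chosen only for illustration.

```python
def adjusted_error(misclassified, n_samples, n_leaves, penalty):
    """Error rate on the validation data plus a penalty per leaf node (assumed form)."""
    return misclassified / n_samples + penalty * n_leaves


def keep_branch(branch_errors, branch_leaves, collapsed_errors, n_samples, penalty):
    """Keep a branch only if its adjusted error beats collapsing it into one leaf."""
    as_branch = adjusted_error(branch_errors, n_samples, branch_leaves, penalty)
    as_leaf = adjusted_error(collapsed_errors, n_samples, 1, penalty)
    return as_branch < as_leaf


# Hypothetical counts: 100 validation samples reach the node; the branch
# (4 leaves) misclassifies 5 of them, a single leaf would misclassify 12.
small_penalty = keep_branch(5, 4, 12, 100, penalty=0.01)
large_penalty = keep_branch(5, 4, 12, 100, penalty=0.03)
```

With the small penalty the branch is kept, because its lower error rate outweighs the cost of three extra leaves. With the larger penalty the same branch is pruned.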
The detailed description of CART is as follows:
/* T represents the current sample collection,
T_attributelist represents the current candidate attribute
collection */
Function cartformtree (T)
{
establish root node N ;
assign a class to N ;
If all samples in T belong to the same class OR only one
sample is left in T
Then return N as a leaf node and assign a class to it;
For each attribute in T_attributelist
carry out a division on that attribute and calculate the
Gini index of that division ;
set the testing attribute of N to the attribute which has
the minimal Gini index among T_attributelist;
divide T into two sub-collections T1 and T2 ;
call cartformtree (T1) ;
call cartformtree (T2) ;
}
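The procedure above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the study. It assumes numeric attributes only, takes the midpoints between consecutive distinct values as candidate split points, and omits pruning. All names (`gini`, `best_split`, `cart_form_tree`, `predict`) and the toy two-band samples are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_j p(j)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted Gini, attribute index, threshold) of the best binary
    division, or None if no attribute has two distinct values."""
    best = None
    n = len(rows)
    for a in range(len(rows[0])):
        values = sorted({row[a] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[a] <= t]
            right = [y for row, y in zip(rows, labels) if row[a] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, a, t)
    return best


def cart_form_tree(rows, labels):
    """Recursively build a binary tree stored as nested dicts."""
    node = {"class": Counter(labels).most_common(1)[0][0]}
    if len(set(labels)) == 1 or len(labels) == 1:
        return node  # pure node or single sample: leaf
    split = best_split(rows, labels)
    if split is None:
        return node  # identical attribute values: cannot divide further
    _, a, t = split
    left = [i for i, row in enumerate(rows) if row[a] <= t]
    right = [i for i, row in enumerate(rows) if row[a] > t]
    node["attribute"], node["threshold"] = a, t
    node["left"] = cart_form_tree([rows[i] for i in left], [labels[i] for i in left])
    node["right"] = cart_form_tree([rows[i] for i in right], [labels[i] for i in right])
    return node


def predict(node, row):
    """Follow the split points from the root down to a leaf."""
    while "attribute" in node:
        go_left = row[node["attribute"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["class"]


# Toy samples with two hypothetical band values per pixel.
rows = [(10, 80), (12, 75), (60, 20), (65, 25), (40, 90)]
labels = ["water", "water", "building", "building", "vegetation"]
tree = cart_form_tree(rows, labels)
```

Calling `predict(tree, (11, 78))` on this toy tree returns `"water"`, since the pixel falls on the same side of the first split point as the two water samples.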
CART has the following merits: a clear structure that is easy to
understand, simple implementation, fast speed and high accuracy;
it can deal with a large amount of data and with nonlinear
relationships; the input data can be continuous variables or
discrete values; it tolerates missing values and errors in the
data; and it can report the significance of the testing
variables [5, 6]. In the process of CART decision tree growth,
the Gini index, which is widely used in economics, is adopted
as the criterion for choosing the testing variable and the
segmentation rule. The mathematical definition of the Gini
index is as follows:
$$\mathrm{GiniIndex} = 1 - \sum_{j=1}^{J} p^{2}(j \mid h), \qquad p(j \mid h) = \frac{n_j(h)}{n(h)}$$
where p(j|h) is the probability that a sample with a given
value h of the testing variable belongs to class j, n_j(h) is
the number of samples with testing variable value h that belong
to class j, n(h) is the total number of samples with testing
variable value h, and J is the number of classes in the
training sample collection.
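As a worked illustration of this definition, consider a hypothetical node (the counts are invented for the example, not taken from the study) with n(h) = 10 samples, of which 6 belong to class 1, 3 to class 2 and 1 to class 3:

```latex
\begin{aligned}
p(1 \mid h) &= \tfrac{6}{10} = 0.6, \qquad
p(2 \mid h) = \tfrac{3}{10} = 0.3, \qquad
p(3 \mid h) = \tfrac{1}{10} = 0.1 \\
\mathrm{GiniIndex} &= 1 - \left(0.6^{2} + 0.3^{2} + 0.1^{2}\right)
                   = 1 - 0.46 = 0.54
\end{aligned}
```

A node whose samples all belong to one class gives 1 - 1 = 0, which is why the division with the minimal Gini index is preferred.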
4. CLASSIFYING BASED ON CART
4.1 The selection of training sample
The selection of training samples is an essential step in the
study, because it directly affects the quality of the rules
obtained. Since there is still no standard classification system
for mining areas based on remote sensing imagery, this article
refers to the land use and land cover classification system.
In order to study the overall situation of the mining land
resources, and considering the actual situation, the land in
the trial area was classified into seven classes according to
the interpretability of the image. The land types are Water
body, Paddy field, Arid land, Building area, Road, Vegetation
and Subsidence land.
In order for the training samples to reflect the spatial
distribution characteristics of each land type, this article
used a stratified random sampling method based on spatial
coordinates. Sampling was carried out over the trial area with
reference to the TM image and the 1:50,000 scale topographic map.
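A minimal sketch of stratified random sampling by spatial coordinates is given below. The text does not say how the strata were defined, so this example assumes square grid cells of a fixed size and draws the same number of points at random from each cell. The function name, the `cell_size`, `per_cell` and `seed` parameters, and the synthetic point grid are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_sample(points, cell_size, per_cell, seed=0):
    """Group (x, y) points into square grid cells (the assumed strata) and
    draw up to `per_cell` points at random from each cell."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for point in points:
        x, y = point[0], point[1]
        strata[(int(x // cell_size), int(y // cell_size))].append(point)
    sample = []
    for cell in sorted(strata):
        members = strata[cell]
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample


# Synthetic candidate locations on a 100 x 100 area, split into four 50 x 50 cells.
points = [(x, y) for x in range(0, 100, 5) for y in range(0, 100, 5)]
sample = stratified_random_sample(points, cell_size=50, per_cell=10)
```

Sampling per cell rather than over the whole area keeps every part of the trial area represented, which is the purpose the paragraph above gives for the method.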
4.2 Determining the testing variable
The spectral response characteristic most directly affects the
ability to identify ground features in multispectral remote
sensing images, and it is also the most important interpretation
element. Each ground feature has unique spectral reflection
and radiation characteristics as a result of its material
composition and structure; on the image, this appears as
different grey levels for each ground feature in the various
wave bands. However, because of the complexity of the
composition and structure of ground features, as well as the
influence of the remote sensing sensor and the atmospheric
environment, the spectral features of ground features vary in
multiple, complex ways. Therefore, in order to
make full use of the TM data to carry out the information