of data. Because the triplet tree is based on binary trees, the number of splitting patterns of the categories at each node is the same as in binary tree design. Since only small additional costs are required compared with a binary tree, the computations for both design and classification are more efficient than those of general multibranch trees.
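For example, for M categories reaching a node there are 2^(M-1) - 1 distinct ways to group them into two non-empty subsets, the same count in both the binary and the triplet design.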
(S3) As for the data segmentation boundary in the feature space, an 'undetermined node' containing the samples near the boundary is introduced in the triplet tree. The label of an undetermined node is the same as that of its parent node. The category decision is suspended for an undetermined node, and further divisions are tried in the following steps.
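As one way to picture this structure, a minimal sketch of a triplet tree node is given below. The class, field, and method names (TripletNode, child_for, the lower/upper boundary fields) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TripletNode:
    """One node of a triplet tree: a split on one variable yields up to
    two determined children and one undetermined child."""
    categories: List[str]                   # categories reaching this node
    label: Optional[str] = None             # assigned category; an undetermined
                                            # node inherits its parent's label
    variable: Optional[int] = None          # index of the splitting variable
    lower: Optional[float] = None           # boundary of the left determined child
    upper: Optional[float] = None           # boundary of the right determined child
    left: Optional["TripletNode"] = None    # determined child (x < lower)
    right: Optional["TripletNode"] = None   # determined child (x > upper)
    middle: Optional["TripletNode"] = None  # undetermined child (lower <= x <= upper)

    def child_for(self, x) -> Optional["TripletNode"]:
        """Route a sample to the child whose region contains it."""
        v = x[self.variable]
        if self.lower is not None and v < self.lower:
            return self.left
        if self.upper is not None and v > self.upper:
            return self.right
        return self.middle  # samples near the boundary stay undetermined
```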
As a decision tree approach that takes the ambiguity of categories into account, Wang proposed a binary tree design using a rejection strategy (Wang, 1986); a sketch of the rejection test in step 2 follows the list.
1. The categories are divided into two subgroups using a group distance.

2. Using the Bhattacharyya distance, the distance between each category and the two subgroups is calculated. If this distance is smaller than a threshold value D0 decided in advance, the category is 'rejected' at this node.

3. Rejected categories are inherited by both child nodes.
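Under a Gaussian model for each category, the distance in step 2 can be computed as sketched below. This is only an illustration: the exact rejection criterion should be taken from Wang (1986); here a category assigned to one subgroup is marked 'rejected' when its distance to the other subgroup also falls below D0, and all function and parameter names are assumptions.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian class models."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = 0.125 * diff @ np.linalg.solve(cov, diff)
    logdet = 0.5 * np.log(np.linalg.det(cov) /
                          np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return maha + logdet

def rejected_categories(category_stats, assignment, group_stats, D0):
    """Step 2 sketch: category_stats maps a category name to its (mean,
    covariance); assignment maps it to the subgroup (0 or 1) chosen in
    step 1; group_stats holds the (mean, covariance) of each subgroup.
    A category that is also close to the other subgroup (distance < D0)
    is rejected and, per step 3, inherited by both child nodes."""
    rejected = []
    for name, (mu, cov) in category_stats.items():
        other = 1 - assignment[name]
        if bhattacharyya(mu, cov, *group_stats[other]) < D0:
            rejected.append(name)
    return rejected
```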
The decision tree obtained by Wang's method is a binary tree, so its computing efficiency is excellent. One defect of the method is the selection of the threshold value D0, for which a heuristic is used in the design procedure.
A general Bayesian classifier with a rejection area in its discrimination rule can likewise be extended with a 'determined area' and an 'undetermined area'. Compared with these approaches, the proposed triplet tree classifier retains the advantages of a tree classifier, such as computational efficiency through its multistep operation and the ability to obtain a hierarchical view of the characteristics of both the categories and the uncertainly classified part of the data. Moreover, the proposed design method behaves well even when normality of the data distribution is rejected.
4. ALGORITHMS
4.1 Node division constraints
An important point of the design algorithm of the triplet tree is how to control the creation of undetermined nodes. Four constraints are applied in the proposed method, chosen so that the tree size stays proper and the nodes remain informative with respect to the training samples; a sketch of the corresponding stopping test is given after Fig. 4. The effect of constraints (a), (b), and (d) is illustrated in Fig. 4.
(a) An undetermined node containing M categories is divided at most M - 1 times.
(b) At the division of an undetermined node, if one or two determined nodes with a single category are created, the child undetermined node becomes a terminal node.
(c) An undetermined node with fewer samples than 1% of the whole training set becomes a terminal node.
(d) If the training samples can be partitioned by one boundary, only determined child nodes are created.
Fig.4 Example of the effect of constraints (a), (b), and (d)
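A minimal sketch of how constraints (a)-(c) might be checked when an undetermined node is revisited is given below; constraint (d) is enforced at division time. The function and parameter names are assumptions for illustration, not the authors' code.

```python
def undetermined_is_terminal(n_categories, n_divisions_done,
                             n_node_samples, n_total_samples,
                             single_category_child_created):
    """Stopping rules for an undetermined node:
    (a) at most M - 1 divisions for a node holding M categories,
    (b) stop once a division has produced a single-category determined node,
    (c) stop when the node holds less than 1% of all training samples."""
    if n_divisions_done >= n_categories - 1:        # constraint (a)
        return True
    if single_category_child_created:               # constraint (b)
        return True
    if n_node_samples < 0.01 * n_total_samples:     # constraint (c)
        return True
    return False
```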
With this method, some undetermined nodes become terminal nodes. These nodes represent the indistinct parts of the multidimensional data, and the character of each such ambiguity is shown by its location in the decision tree.
4.2 Design procedure
STEP1: All types of binary division of the categories are compared using a group distance, and the categories assigned to the two determined nodes and the splitting variable are selected (see the sketch below).
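The enumeration in STEP1 might be organized as follows. The group_distance function is left as a plug-in because the text does not fix a particular measure here, and all names are illustrative assumptions.

```python
from itertools import combinations

def best_binary_division(categories, variables, group_distance):
    """Enumerate every two-way partition of the categories and every
    candidate variable, score each with a user-supplied
    group_distance(group_a, group_b, variable), and keep the best.
    (Symmetric partitions may be visited twice; harmless for a max search.)"""
    best = None
    cats = list(categories)
    for r in range(1, len(cats) // 2 + 1):
        for group_a in combinations(cats, r):
            group_b = tuple(c for c in cats if c not in group_a)
            for var in variables:
                score = group_distance(group_a, group_b, var)
                if best is None or score > best[0]:
                    best = (score, group_a, group_b, var)
    return best  # (score, categories of node 1, categories of node 2, variable)
```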
STEP2: Two boundaries on the selected variable are chosen, and the node is segmented into two determined nodes and one undetermined node (a sketch of this boundary search follows the sub-steps).

1. An error tolerance parameter p (%) for the determined nodes is chosen at the beginning. When the sample numbers of the two groups of categories are N1 and N2 respectively, the misclassified samples allowed in the two determined nodes are limited to ⌊N1*p/100⌋ and ⌊N2*p/100⌋ respectively.

2. If two determined nodes can be made by one boundary, no undetermined node is created.

3. If two determined nodes cannot be created, the creation of one determined node and one undetermined node is tried.

4. If no determined node can be created, this node is terminated as an undetermined node.
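One way to realize this boundary search on a single variable is sketched below, assuming that group 1 lies at smaller values of the selected variable than group 2 and ignoring ties; the function name and interface are illustrative assumptions.

```python
import numpy as np

def triplet_boundaries(x1, x2, p):
    """x1, x2: 1-D sample values of the two category groups on the selected
    variable; p: error tolerance in percent (p < 100).  Returns either a
    single boundary (second value None) or the pair (lower, upper) whose
    interval becomes the undetermined region."""
    x1, x2 = np.sort(np.asarray(x1)), np.sort(np.asarray(x2))
    tol1 = int(len(x1) * p / 100)    # group-1 samples allowed on the wrong side
    tol2 = int(len(x2) * p / 100)    # group-2 samples allowed on the wrong side
    lower = x2[tol2]                 # left node (x < lower): at most tol2 group-2 errors
    upper = x1[len(x1) - 1 - tol1]   # right node (x > upper): at most tol1 group-1 errors
    if lower >= upper:
        return 0.5 * (lower + upper), None  # one boundary separates within tolerance
    return lower, upper              # (lower, upper) is the undetermined band
```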