[Figure: tree diagram with ROOT-LEVEL, CONCEPT-LEVEL, and INSTANCE-LEVEL tiers]
Figure 1: The three-level tree structure of SX-WEB.
representing SHALLOW WATER (SW). The largest
mean value for BLUE is 149 and is found in the
concept class BARREN 2 (B2). To determine
whether BLUE is an attribute predictive of
class membership, the standard deviation of the
attribute BLUE in the root node (sd = 20.63) is
used to normalize the mean scores of all
concept-level children. That is, the mean
values for the chosen attribute are subtracted
from one another pairwise. The absolute value
of each subtraction is then divided by the
standard deviation of the attribute found
within the root node. These standardized
difference values are summed and divided by the
number of paired attribute computations that
have been made. This gives an average
standardized mean difference value for the
attribute relative to the root node. If this
average difference is larger than a
user-specified threshold value, the attribute is
considered to be predictive. Making this
computation for the attribute BLUE results in
a value of approximately 1.075. If the
predictiveness threshold is set at 1.0, BLUE is
then determined to be a predictive attribute.
Figure 3 shows the predictiveness scores for
the attributes found within the 155-instance
training set.
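The predictiveness computation described above can be sketched in a few lines of Python. This is only an illustration of the verbal description, not SX-WEB's actual code: the function names `predictiveness` and `is_predictive` are hypothetical, and the sample means are made up rather than taken from Figure 2.

```python
from itertools import combinations


def predictiveness(class_means, root_sd):
    """Average standardized difference between all pairs of class means.

    class_means: mean of one attribute in each concept-level node.
    root_sd: standard deviation of the same attribute in the root node.
    """
    pairs = list(combinations(class_means, 2))
    if not pairs:
        raise ValueError("need at least two concept-level means")
    if root_sd <= 0:
        # Assumption: the source does not say how a zero deviation is handled.
        raise ValueError("root standard deviation must be positive")
    total = sum(abs(a - b) / root_sd for a, b in pairs)
    return total / len(pairs)


def is_predictive(class_means, root_sd, threshold=1.0):
    """An attribute is predictive if its score exceeds the threshold."""
    return predictiveness(class_means, root_sd) > threshold


# Made-up values for illustration only (not the data behind Figure 2).
score = predictiveness([10.0, 20.0, 40.0], root_sd=10.0)
print(score)  # 2.0
```

With a threshold of 1.0, the made-up attribute above would be judged predictive, in the same way that BLUE is with its score of approximately 1.075.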
Finally, family resemblance scores are stored
within the root node and each concept-level
node. Family resemblance scores computed from
the 155-instance training set are shown in
Figure 2 in the final column, labeled FR.
Family resemblance scores form the basis of
SX-WEB's evaluation function by giving a
measure of the overall similarity of the
exemplars making up individual concept classes.
The concept class with the lowest family
resemblance score is Shrub Swamp (SS). As we
will see in Section 2.3, those concept classes
containing highly similar instances will have
lower family resemblance scores.
2.2 Computing the similarity of two exemplars
SX-WEB uses two formulas to compute similarity.
One formula is used when attributes are nominal
or mixed and a second similarity measure is
used when attributes are strictly real-valued.
Once again, we will limit our discussion to
real-valued exemplar similarity.
To compute the similarity between exemplars E1
and E2, the absolute value of the difference
between each attribute value in E1 and its
corresponding attribute value in E2 is divided
by the standard deviation of the attribute
found in the concept-level node being
considered for instance classification. These
standardized differences are summed over all
attributes. Finally, the sum of the
standardized differences is divided by the
number of attributes, giving an average
standardized difference value among the
attributes of E1 and E2. Notice that similarity
scores closer to zero mean greater similarity
between two exemplars.
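The real-valued similarity measure just described might be written as follows. This is a minimal sketch under stated assumptions: the function name `similarity` and the list-based exemplar representation are hypothetical, and the sample values are invented for illustration.

```python
def similarity(e1, e2, class_sds):
    """Average standardized absolute difference between two exemplars.

    e1, e2: attribute values of the two exemplars, in the same order.
    class_sds: per-attribute standard deviations taken from the
        concept-level node being considered for classification.
    Scores closer to zero mean the exemplars are more alike.
    """
    if not e1 or not (len(e1) == len(e2) == len(class_sds)):
        raise ValueError("exemplars and deviations must have equal, nonzero length")
    if any(sd <= 0 for sd in class_sds):
        # Assumption: the source does not say how a zero deviation is handled.
        raise ValueError("standard deviations must be positive")
    total = sum(abs(x - y) / sd for x, y, sd in zip(e1, e2, class_sds))
    return total / len(e1)


# Invented two-attribute exemplars, for illustration only.
print(similarity([1.0, 4.0], [3.0, 4.0], [2.0, 1.0]))  # 0.5
```

Because the divisor is the standard deviation of the candidate concept-level node, the same pair of exemplars can receive different scores depending on which class is being considered.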
Figure 2: Standard deviations, means, and family
resemblance scores for the 155-instance training
set.
2.3 Classification and the family resemblance
principle
When presented with a set of training
instances, SX-WEB builds a three-level tree
structure. SX-WEB uses this tree structure
together with its evaluation function to
classify newly presented instances into one of
the concept-level classes. When learning is not
incremental, once an unknown instance is
classified, it is discarded. In an incremental
learning mode, the new instance becomes part of
the classification tree. We now examine
SX-WEB's evaluation function.
SX-WEB's evaluation function is based on the
family resemblance principle (Cantor, 1979)
which states that:
Most prototypical members of a
concept class share many features in
common with members of their own
class and few features in common with
members of other closely related
categories.
From a classification point of view, this
principle implies that new instances to be
classified should be placed in the category
class that will result in the best overall family
resemblance value as a result of instance
inclusion. Based on this, we used a method
proposed in (Tversky, 1977) for computing class
family resemblance. Specifically:
FR(C) = \frac{2}{N(N-1)} \sum Sim(a,b)
where C is the concept class whose family
resemblance score is being computed, N is the
total number of exemplars contained in concept
class C, and \sum Sim(a,b) represents the sum
total of all computed similarity scores between
the class exemplars. In other words, to find
the family resemblance score for concept class
C, the similarity of each exemplar to all other
exemplars in the class is summed. This sum is
then divided by the total number of similarity
computations made, giving an average similarity
value for the class. Along these same lines,
typicality is defined as the average similarity
of one class exemplar to all other members of
the class, or:
Typicality(a) = \frac{1}{N-1} \sum_{b \neq a} Sim(a,b)
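The family resemblance and typicality definitions above can be sketched together in Python. This is an illustrative reading of the verbal definitions, not the authors' implementation: the function names, the list-based exemplar representation, and the choice to pass one set of standard deviations per class are all assumptions, and the sample class is invented.

```python
from itertools import combinations


def similarity(e1, e2, sds):
    """Average standardized absolute difference (closer to 0 = more alike)."""
    return sum(abs(x - y) / sd for x, y, sd in zip(e1, e2, sds)) / len(e1)


def family_resemblance(exemplars, sds):
    """FR(C): average similarity over all pairs of exemplars in a class."""
    n = len(exemplars)
    if n < 2:
        raise ValueError("a class needs at least two exemplars")
    total = sum(similarity(a, b, sds) for a, b in combinations(exemplars, 2))
    # N*(N-1)/2 pairs were compared, hence the factor 2/(N*(N-1)).
    return 2.0 / (n * (n - 1)) * total


def typicality(index, exemplars, sds):
    """Average similarity of one exemplar to every other class member."""
    n = len(exemplars)
    if n < 2:
        raise ValueError("a class needs at least two exemplars")
    target = exemplars[index]
    others = (e for i, e in enumerate(exemplars) if i != index)
    return sum(similarity(target, e, sds) for e in others) / (n - 1)


# Invented one-attribute class, for illustration only.
cls = [[0.0], [2.0], [4.0]]
sds = [2.0]
print(round(family_resemblance(cls, sds), 3))  # 1.333
print(typicality(1, cls, sds))                 # 1.0
```

In this toy class the middle exemplar has the typicality score closest to zero, which matches the intuition that it is the most representative member.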