a class description. Clustering techniques range from optimization methods with a given number of classes, through hierarchical methods resulting in binary classification trees called dendrograms, to clumping techniques which yield clusterings in which object classes may overlap.
All learning systems try to generate possible solutions by grouping object features (possibly using additional background knowledge). In general, this generation of groupings is a search process which calls for suitable heuristics.
3.1 Numerical learning
This approach is chosen when the numeric data is given in terms of feature or attribute vectors. These features span an n-dimensional space. The task of a classification process is to divide this feature space into several regions, the pattern classes [Niemann 1981]. Learning is reduced to the estimation of the unknown parameters which link the observable features. For this task, a broad range of techniques is available.
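As an illustration, a minimal sketch of such a parametric classifier in Python (a nearest-class-mean rule; the feature vectors are invented, and this is only one of the many available techniques):

import numpy as np

def fit_means(X, y):
    # one mean vector per class: the "unknown parameters" to estimate
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, means):
    # the nearest mean determines which region of feature space x falls into
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

X = np.array([[0.9, 1.1], [1.0, 0.8], [4.1, 3.9], [3.8, 4.2]])
y = np.array([0, 0, 1, 1])
print(classify(np.array([1.0, 1.0]), fit_means(X, y)))  # -> 0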
In unsupervised numerical approaches (numerical taxonomy), grouping of the objects is performed on the basis of similarity. This similarity measure is the value of a numeric function applied to two objects (e.g. Euclidean distance). Thus objects which are most similar are grouped together, while objects which are least similar are kept apart and form different object classes. Similarity measures can be either context-free (depending only on the attribute values) or context-sensitive, where the similarity also depends on the value range of the attributes.
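The distinction can be sketched as follows (attribute values invented for illustration): the context-free measure compares raw attribute values, while the context-sensitive variant first scales each attribute by its observed value range.

import numpy as np

# Context-free similarity: Euclidean distance on raw attribute values.
def dist_context_free(a, b):
    return np.linalg.norm(a - b)

# Context-sensitive variant: each attribute is scaled by its value
# range over the whole data set before the distance is taken.
def dist_context_sensitive(a, b, data):
    span = data.max(axis=0) - data.min(axis=0)
    span[span == 0] = 1.0            # guard against constant attributes
    return np.linalg.norm((a - b) / span)

data = np.array([[1.0, 100.0], [2.0, 300.0], [1.5, 250.0]])
print(dist_context_free(data[0], data[1]))        # dominated by attribute 2
print(dist_context_sensitive(data[0], data[1], data))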
Neural networks also deal with numerical data. Basically, there are input and output patterns which are linked through one or more layers by sets of weights. The learning task is to adjust these weights [Rumelhart and McClelland 1986].
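A minimal sketch of such weight adjustment, assuming a single-layer network trained by a gradient (delta) rule on invented patterns; network size and learning rate are illustrative, not prescribed by the cited work:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                      # 20 input patterns
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)  # target outputs

w = np.zeros(3)                                   # the weights to learn
for _ in range(100):
    out = 1.0 / (1.0 + np.exp(-(X @ w)))          # sigmoid network output
    w += 0.1 * X.T @ (y - out)                    # adjust weights by the error
out = 1.0 / (1.0 + np.exp(-(X @ w)))
print(((out > 0.5) == y).mean())                  # training accuracy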
3.2 Structural approaches
Conceptual clustering is an extension of numerical taxonomy to symbolic data. Conceptual clustering techniques typically consider not only the objects' features and the context, but also possible concepts or restrictions on the objects involved, so-called background knowledge, which may be used to describe the objects. Thus the similarity of two objects strongly depends on the quality of the concepts that describe the objects, their relations, and possible limitations.
A classical approach to acquiring models from examples was presented by Winston [1975]. His ARCH program first derives a structural representation of the example objects in terms of a semantic network. In the following step the class descriptions of the different examples are constructed: the first positive example is hypothesized as the model, while the following positive and negative examples serve to correct or restrict the current model. In contrast to a complete object description, a model is generated which is just sufficient to distinguish an object from the other objects in the knowledge base. Winston relies on noise-free data; the resulting model is also represented as a semantic network.
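The correction step can be sketched roughly as follows, with object descriptions simplified to plain sets of relational facts; this is a strong simplification of Winston's semantic-network machinery, and the facts below are invented:

# Sketch of Winston-style model refinement on sets of relational facts.
def refine(required, forbidden, example, positive):
    if positive:
        required = required & example              # generalize: keep common facts
    else:
        # a near-miss: facts it has beyond the model become forbidden
        forbidden = forbidden | (example - required)
    return required, forbidden

arch1 = {("supports", "a", "c"), ("supports", "b", "c")}
arch2 = {("supports", "a", "c"), ("supports", "b", "c"), ("left_of", "a", "b")}
near_miss = {("supports", "a", "c"), ("touches", "a", "b")}

required = set(arch1)                              # first positive example = model
required, forbidden = refine(required, set(), arch2, positive=True)
required, forbidden = refine(required, forbidden, near_miss, positive=False)
print(required, forbidden)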
A successor of Winston's program was developed by Connell and Brady [1985]. Their system is capable of handling real-world data, namely real images. Their model generation strategy states that if two objects are the same, then the differences between them should be irrelevant and can be deleted. The program expects only positive examples for an object class. Although it can handle noise, it cannot treat outliers: these form a new object class.
Wong and You [1985] present a program starting with examples given as attributed graphs. In an estimation process, the relevant attributes and the relevant relations, along with the corresponding probabilities, are obtained. This structure, which represents the model, is called a random graph.
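A rough sketch of such an estimation, assuming the model simply records the occurrence probability of each attribute and relation over the example graphs; the facts are invented and the actual random-graph formalism is richer:

from collections import Counter

# Count how often each attribute/relation occurs across the example
# graphs and turn the counts into occurrence probabilities.
examples = [
    {("shape", "round"), ("above", "n1", "n2")},
    {("shape", "round"), ("above", "n1", "n2"), ("texture", "rough")},
    {("shape", "round"), ("texture", "rough")},
]
counts = Counter(fact for g in examples for fact in g)
probs = {fact: n / len(examples) for fact, n in counts.items()}
print(probs)   # e.g. ("shape", "round") -> 1.0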
Segen [1988] developed a program to learn descriptions of 2D objects from examples. In contrast to most other learning techniques, his program is not based on a fixed set of attributes and relations, but is able to generate its own descriptors. The program starts with the objects (given as object contours) and calculates points of maximum curvature, which are denoted descriptors of the first level. Then it generates new descriptors by successively grouping the features of the previous level. The resulting hierarchical graph is analyzed statistically. The system can treat noisy data. The drawback is that the resulting representation is based on the descriptors chosen, which are not interpretable by humans.
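The first-level step can be sketched as follows, assuming curvature is approximated by the turning angle at each contour point; the contour is synthetic and the details differ from Segen's implementation:

import numpy as np

# Estimate the turning angle at every vertex of a closed contour and
# keep local maxima of its magnitude as curvature-maximum points.
def curvature_maxima(pts):
    prev = np.roll(pts, 1, axis=0)
    nxt = np.roll(pts, -1, axis=0)
    v1, v2 = pts - prev, nxt - pts
    ang = np.arctan2(v2[:, 1], v2[:, 0]) - np.arctan2(v1[:, 1], v1[:, 0])
    ang = np.abs((ang + np.pi) % (2 * np.pi) - np.pi)   # wrap to [0, pi]
    left, right = np.roll(ang, 1), np.roll(ang, -1)
    return np.nonzero((ang >= left) & (ang > right))[0]

square = np.array([[0, 0], [1, 0], [2, 0], [2, 1], [2, 2],
                   [1, 2], [0, 2], [0, 1]], dtype=float)
print(curvature_maxima(square))   # indices of the four corners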
Stier [1991] also starts from the idea that object representations relying on a predefined set of descriptors can only be as expressive as the descriptors themselves. Therefore he argues for a system that is capable of deriving its own appropriate descriptors. He presents a learning technique which starts from elementary, general knowledge about objects, represented as logical assertions. In the evaluation phase, a general object is hypothesized and the rules from the knowledge base are applied to it. The examples serve to verify these hypothesized new rules, which are then accepted or rejected. The evaluation is performed by exhaustive search, where the matching criterion is an exact fit with the examples. He demonstrates his strategy on simple examples: starting with elementary knowledge about polygons (parallelism, straightness, ...), he learns the concepts of special types of polygons (triangles, squares, rectangles).
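The exhaustive evaluation can be sketched as a generate-and-test loop, assuming rules are conjunctions of elementary predicates; the predicates and examples below are invented stand-ins for the polygon knowledge:

from itertools import combinations

# Hypothesize conjunctions of elementary predicates and accept only
# those matching all positive and no negative examples (exact fit).
predicates = {
    "four_sides": lambda p: len(p) == 4,
    "equal_sides": lambda p: len(set(p)) == 1,
}
positives = [[2, 2, 2, 2], [5, 5, 5, 5]]          # squares (side lengths)
negatives = [[2, 3, 2, 3], [4, 4, 4]]             # rectangle, triangle

for r in range(1, len(predicates) + 1):
    for rule in combinations(predicates, r):
        fits = all(all(predicates[q](p) for q in rule) for p in positives) \
           and not any(all(predicates[q](n) for q in rule) for n in negatives)
        if fits:
            print("accepted rule:", " & ".join(rule))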
All structural learning techniques presented have in common that they rely on a hierarchical representation of the objects. In a first step, the structural description of the examples is derived. Then this description is generalized to a model. For both steps various strategies may be used. These, however, depend strongly on the data and the task, and no general technique is available so far.
4 DATA REPRESENTATION WITH FORMAL GRAMMARS
Thinking of patterns in terms of sentences makes it possible to apply techniques from formal language theory to pattern recognition [Fu 1982]. With the help of formal grammars, the knowledge about the structure of observations is represented in symbolic form. A grammar consists of the tuple

G = (S, VN, VT, P)
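To make the notation concrete, a toy grammar in this form, with invented production rules whose terminals stand for stroke primitives:

import random

# A grammar as a tuple (S, VN, VT, P): start symbol, nonterminals,
# terminals, and productions.  The rules here are illustrative only.
S = "Shape"
VN = {"Shape", "Side"}
VT = {"h", "v"}                      # horizontal / vertical stroke
P = {
    "Shape": [["Side", "Side"]],
    "Side": [["h"], ["v"], ["h", "Side"]],
}

def derive(symbol, rng):
    # expand nonterminals recursively until only terminals remain
    if symbol in VT:
        return [symbol]
    rhs = rng.choice(P[symbol])
    return [t for s in rhs for t in derive(s, rng)]

print("".join(derive(S, random.Random(1))))   # one sentence of L(G)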