International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B3. Istanbul 2004
A number of pattern recognition applications shows some
distortion (due to noise). Also, patterns of a certain class are
more frequent than others inside the same class. Quite the same
as for natural languages, ambiguity should be kept into account:
a sentence may actually bear different meanings, due to possible
differences at recognition or interpretative level.
2. PARSING STRATEGIES FOR PATTERN
RECOGNITION
2.1 The Theory of Formal Languages
A statistic approach to language started as the scholars realized
the value of statistical methods in the recognition of speech.
Hidden Markov's model was first used by the Author for
modelling the language of *Evgenij Onegin" (Markov, 1913).
In the theory of formal languages, a language is defined as a set
of strings: a string is a finite sequence of symbols chosen from
some finite vocabulary. In natural languages, a string is a
sentence, and the sentences are sequences of words chosen from
some vocabulary of possible words. A grammar is defined as a
four-tuple:
G-(N,T, P, S)
where N and T are the non terminal and terminal vocabularies
of G, P is the set of production rules, and S is the start symbol.
A formal language is indeed defined by:
= A terminal vocabulary of symbols (the words of the
natural language)
* A non terminal vocabulary of symbols (the syntactic
categories, e.g. noun, verb)
= A set of productions (the phrase structure rules of the
language)
=» The so called start symbol
We start from a special non terminal S, and S is replaced by the
string on the right side of a chosen production rule. The process
of rewriting a substring according to one of the rewriting
production rules continues until the string consists only of
terminal symbols:
S —aS|bS|e
where the symbol | indicates “or” and € is the null string. The
succession of strings that result from the process is a derivation
from the grammar: to find a derivation (a parse) for a given
sentence (sequence) is called parsing.
As to the latter approach, let's remind that in the Theory of
Formal languages, Chomsky divides languages in classes, thus
forming a hierarchy of languages, based upon different
grammars:
=» Unrestricted grammars (0-type grammars)
" Context-sensitive grammars (1-type grammars)
= Context-free grammars (2-type grammars)
» Regular grammars or finite state grammars (3-type
grammars)
The most general grammar is obviously the 0-type, which bears
no limits for rewriting rules: for the other types, such
restrictions are regularly increasing. Types 0 and 1 are able to
describe natural languages as the other two, much simpler to
manage from a computational viewpoint, are more suited into
limited backgrounds and have been hitherto used for artificial
languages.
684
In the 1-type grammar, rewriting rules restraints bear that the
right side of a rule should have at least as many symbols as the
left side.
For the 2-type grammar, all rules should have just one non-
terminal symbol on the left side
For 3-type grammar, the right side has only one terminal
symbol, or one terminal symbol and a non-terminal one for
every production rule.
An attraction of the use of a syntactic method for Pattern
Recognition is the possibility of description and recognition of
an object, a scene, a texture: but noise may complicate the
process of computing the string structure: extraneous primitives
may be generated by spurious edges, or actual primitives of
some shape may be undetected due to the poor quality of the
image.
Another problem is the ambiguity of the primitives that are
extracted to represent the patterns, how this ambiguity can be
accounted for in the model or in the analysis (in the
construction of the grammar or in the parsing).
A grammar normally divides strings in just two classes: the
grammatically correct ones, and the others. In any case,
ambiguity is very frequent, and it is even deliberately pursued
as sometimes happens in poetry.
The following different approaches have been proposed:
" approximation
= transformational grammars
" stochastic grammars
= similarity and error-correcting parsing
The use of approximation reduces the effect of noise and
distortion at the preprocessing and primitive extraction stage.
The second approach defines the relations between the noisy
pattern and the corresponding noise-free pattern by a
transformational grammar.
Many efforts have been devoted to construct more sophisticated
grammars, like stochastic and fuzzy grammars. Although
context free grammars or transformational grammars can
represent the phrase structure of a language, they tell nothing
about the relative frequency or likelihood of a given sentence. It
is usual in context free grammar to use recursive productions to
represent repetition, however one can generate sentences which
are technically grammatical, but not always acceptable.
Stochastic approach supplies a solution to this problem: each
production in a stochastic grammar is assigned a probability of
selection, a number between zero and one. During the
derivation process, rules are selected for rewriting according to
their assigned probabilities. Each string has a probability of
occurrence computed as the product of the probabilities of the
rules in its derivation.
When a string has two or more parses, we can use the more
probable parse as a description of the string (pattern): the most
probable parse is the one according to which the generated
string has the highest probability.
However, what we already know about probable occurrence
plays a meaningful role. The parsers are made up according to a
likelihood criterion. However, parsers may also be built
according to a further criterion, i.e. the Bayes’s theorem.
In this case, some utter a priori information is required about
the starting probability to deal with one class of patterns or
another one.
When a distorted pattern cannot be accepted by any grammar,
an error-correcting parsing, based upon a similarity criterion,
can be used: a way to handle noisy patterns is the use of
similarity measures instead of stochastic grammars: a similarity
or distance measure is defined between a sentence representing
a known pattern and sentence representing an unknown pattern.
Interna
The dis
of error
2.2 Pa
In acco
number
Parsing
starts fi
termina
starts fr
Besides
related :
0-type |
1-type |
2-type |
3-type 1
Chart al
althougi
In the
gramma
algorith:
between
Models
Kasami’
are trans
To get (
all non
terminal
A > €
A-—E
Be w- a
to be tes
The alg
element:
a value |
inside t;
A — a,
The tabl
The strir
be found
Stochast
Models.
A Hidde
series oi
inference
A Hidde
paramete
One can «