The distance between two strings is defined in terms of the number of errors (insertions, deletions, substitutions).
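A minimal sketch of such an edit (Levenshtein) distance computation follows; the dynamic-programming formulation and the function name are ours, given only as an illustration:

def edit_distance(a, b):
    # Levenshtein distance: minimum number of insertions, deletions and
    # substitutions needed to turn string a into string b.
    n, m = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # i deletions
    for j in range(m + 1):
        d[0][j] = j          # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # substitution / match
    return d[n][m]

# e.g. edit_distance("abab", "abb") == 1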
2.2 Parsing algorithms
In accordance with the needs of different grammars, a number of parsing algorithms are in use.
Parsing procedures may be of top-down or bottom-up type: either one starts from the initial symbol S and applies substitutions until only terminal symbols are present, so as to fit the string, or one starts from the string and works backwards until the start symbol S is reached.
Besides, the language classes as arranged by Chomsky are related to a number of recognition devices (automata):
- type-0 languages: Turing machines
- type-1 languages: linear bounded automata
- type-2 languages: pushdown automata
- type-3 languages: finite state automata.
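As a small illustration of the simplest of these devices, the following sketch (our own toy example, not from the text) runs a deterministic finite-state automaton accepting the type-3 language of binary strings containing an even number of 1s:

def accepts(dfa, start, accepting, string):
    # run a deterministic finite-state automaton on the input string
    state = start
    for symbol in string:
        state = dfa[(state, symbol)]
    return state in accepting

# transition table of a two-state automaton over {0, 1}
dfa = {("even", "0"): "even", ("even", "1"): "odd",
       ("odd", "0"): "odd",   ("odd", "1"): "even"}
print(accepts(dfa, "even", {"even"}, "1101"))   # False: three 1s
print(accepts(dfa, "even", {"even"}, "1100"))   # True: two 1s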
Chart algorithms are of special interest because they are simple, although they are restricted to context-free grammars.
In the following, context-free as well as stochastic regular grammars have been used. We therefore recall a typical chart algorithm, Kasami's algorithm, and also recall the equivalence between stochastic regular grammars and Hidden Markov Models (HMMs).
Kasami's algorithm is suited to context-free grammars once they have been transformed into the so-called Chomsky normal form (CNF).
To get the CNF, we convert all rules into production rules in which every non-terminal symbol yields either two other non-terminals or one terminal:
A → BC
A → a
Let w = a_1 a_2 ... a_n be a string whose membership in a given grammar is to be tested, the grammar being already reduced to the CNF.
The algorithm basically builds a triangular parsing table, whose elements are t_ij for 1 ≤ i ≤ n and 1 ≤ j ≤ n−i+1. Every t_ij has as its value a subset of N. The non-terminal symbol A shall be inside t_ij if, and only if:
A ⇒* a_i a_i+1 ... a_i+j−1
The table is assembled as follows:
- one puts A into t_i1 if A → a_i is in P
- one puts A into t_ij if, for at least one k such that 1 ≤ k < j, a rule A → BC is to be found in P, having B present in t_ik and C in t_i+k,j−k.
The string belongs to the language if, and only if, S is found in t_1n.
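A minimal Python sketch of this tabular test follows; the dictionary encoding of the CNF grammar and the toy grammar itself are our own illustration, not part of the original text:

def cyk(word, terminal_rules, binary_rules, start="S"):
    # Kasami (CYK) membership test for a grammar in Chomsky normal form.
    # terminal_rules: dict  terminal -> set of non-terminals A with A -> terminal
    # binary_rules:   dict  (B, C)   -> set of non-terminals A with A -> B C
    n = len(word)
    # t[i][j] = set of non-terminals deriving word[i : i + j]  (0-based indices)
    t = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(word):
        t[i][1] = set(terminal_rules.get(a, ()))
    for j in range(2, n + 1):             # length of the span
        for i in range(n - j + 1):        # start of the span
            for k in range(1, j):         # split point
                for B in t[i][k]:
                    for C in t[i + k][j - k]:
                        t[i][j] |= binary_rules.get((B, C), set())
    return start in t[0][n]

# toy CNF grammar: S -> AB, A -> a, B -> b
terminal_rules = {"a": {"A"}, "b": {"B"}}
binary_rules = {("A", "B"): {"S"}}
print(cyk("ab", terminal_rules, binary_rules))   # True
print(cyk("ba", terminal_rules, binary_rules))   # False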
Stochastic regular grammars are equivalent to Hidden Markov
Models.
A Hidden Markov Model has, properly, hidden states: only a series of emitted symbols is observed, which allows a probabilistic inference about the underlying sequence of states.
A Hidden Markov Model (HMM) is specified by a set of parameters:
- the set of states S = (s_1, s_2, ..., s_N)
- the state sequence Q = (q_1, q_2, ..., q_K)
- the prior probabilities (π_i) are the probability distribution of the first state of a state sequence
- the transition probabilities (a_ij) are the probabilities of going from a state i to a state j, i.e. P(q_j | q_i)
- the emission probabilities (e_i(x)) are the probabilities of observing x when the system is in the state q_i, i.e. p(x | q_i)
One can calculate:
" the likelihood of a sequence of observations with
respect to a HMM by means of the so called “forward
algorithm"
= the most probable sequence of corresponding states
(most probable path) by Viterbi algorithm
= the HMM parameter estimation by forward-backward
algorithm
Let f_k(i) = Pr(x_1 ... x_i, q_i = k) be the probability of observing the sequence x_1 x_2 ... x_i and, at the same time, being in the state k. The forward algorithm allows the probability of x to be computed recursively. The steps of the algorithm are as follows:
- initialisation: f_k(1) = π_k e_k(x_1)
- recursive step: f_k(i) = e_k(x_i) Σ_l f_l(i−1) a_lk
- termination: Pr(x) = Σ_k f_k(n) a_k0
where a_k0 denotes the transition towards a terminal, non-emitting end state.
Viterbi's algorithm, in its turn, associates to a sequence the related most probable assignment of the otherwise hidden states: the computation is quite analogous, just substituting, in the forward algorithm table, the sum with a maximum search. In other words, the probability at each cell of the forward algorithm comes from the sum of some terms; for Viterbi's algorithm, only the maximum of the abovesaid terms is used.
Viterbi's algorithm keeps, for every state and for every position, a pointer to the preceding state, so as to trace backwards the most probable path.
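A matching Viterbi sketch (same hypothetical model representation as in the forward sketch above) replaces the sum by a maximum and keeps the back-pointers:

def viterbi(observations, states, prior, trans, emit):
    # initialisation: v_k(1) = pi_k * e_k(x_1), no predecessor yet
    v = {k: prior[k] * emit[k][observations[0]] for k in states}
    back = []                                   # one dict of back-pointers per position
    # recursion: v_k(i) = e_k(x_i) * max_l v_l(i-1) * a_lk
    for x in observations[1:]:
        ptr, new_v = {}, {}
        for k in states:
            best = max(states, key=lambda l: v[l] * trans[l][k])
            ptr[k] = best
            new_v[k] = emit[k][x] * v[best] * trans[best][k]
        back.append(ptr)
        v = new_v
    # termination: pick the best final state, then trace the pointers backwards
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[last]

# with the same toy two-state HMM as in the forward sketch:
# path, prob = viterbi("0110", states, prior, trans, emit)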
The following HMM has the alphabet {0, 1}; a_11, a_12, a_13, a_21, a_22, a_23, a_31, a_32, a_33 are the transition probabilities, and the parameters e_1(0), e_2(0), e_3(0), e_1(1), e_2(1), e_3(1) represent the emission probabilities.
The corresponding stochastic regular grammar is:
S → X_1
X_1 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_11    p_12   p_13   p_14   p_15   p_16
X_2 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_21    p_22   p_23   p_24   p_25   p_26
X_3 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_31    p_32   p_33   p_34   p_35   p_36
p_11 + p_12 = a_11
p_13 + p_14 = a_12
p_15 + p_16 = a_13
p_11 / (p_11 + p_12) = e_11(0)
p_13 / (p_13 + p_14) = e_12(0)
p_15 / (p_15 + p_16) = e_13(0)
e_1(0) = e_11(0) + e_12(0) + e_13(0)
We repeat for the p_2j and p_3j probabilities.
Given the set of emission probabilities and the transition matrix of the HMM, we can obtain the set of probabilities of the production rules of the stochastic regular grammar, and vice versa.
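As a rough sketch of the HMM-to-grammar direction, the following assumes (this is our reading, not stated explicitly above) the convention in which the rule X_i → x X_j receives probability a_ij · e_j(x), i.e. the symbol is emitted on entering the destination state:

def grammar_from_hmm(states, alphabet, trans, emit):
    # trans[i][j] : transition probability a_ij
    # emit[j][x]  : emission probability e_j(x)
    # the key (i, x, j) stands for the production X_i -> x X_j
    rules = {}
    for i in states:
        for x in alphabet:
            for j in states:
                # assumed convention: p(X_i -> x X_j) = a_ij * e_j(x)
                rules[(i, x, j)] = trans[i][j] * emit[j][x]
    return rules

# toy two-state model
states = ["X1", "X2"]
trans = {"X1": {"X1": 0.7, "X2": 0.3}, "X2": {"X1": 0.4, "X2": 0.6}}
emit = {"X1": {"0": 0.9, "1": 0.1}, "X2": {"0": 0.2, "1": 0.8}}
rules = grammar_from_hmm(states, ["0", "1"], trans, emit)
print(rules[("X1", "0", "X2")])   # a_12 * e_2(0) = 0.3 * 0.2 = 0.06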
Parsing is finding an optimal derivation for a given sequence (string): it can be thought of as an alignment of the non-terminal symbols (hidden states) of the grammar with the terminal symbols of the sequence (string), just as Viterbi's algorithm aligns the positions of a sequence to HMM states.
Moreover, the theory of stochastic automata defines the class of languages accepted by stochastic automata.