The distance between two strings is defined in terms of the number of errors (insertions, deletions, substitutions).
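A minimal sketch of such an edit (Levenshtein) distance computation follows; the dynamic-programming formulation and the function name are ours, given only as an illustration:

def edit_distance(a, b):
    # Levenshtein distance: minimum number of insertions, deletions and
    # substitutions needed to turn string a into string b.
    n, m = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # i deletions
    for j in range(m + 1):
        d[0][j] = j          # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # substitution / match
    return d[n][m]

# e.g. edit_distance("abab", "abb") == 1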
2.2 Parsing algorithms
In accordance with the needs of different grammars, a number of parsing algorithms are in use.
Parsing procedures may be of top-down or bottom-up type: either one starts from the initial symbol S and applies substitutions until only terminal symbols are present, so as to fit the string, or one starts from the string and works backwards until the start symbol S is reached.
Besides, the language classes as arranged by Chomsky are related to a number of recognition devices (automata):
- type-0 languages: Turing machines
- type-1 languages: linear bounded automata
- type-2 languages: pushdown automata
- type-3 languages: finite state automata.
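As a small illustration of the simplest of these devices, the following sketch (our own toy example, not from the text) runs a deterministic finite-state automaton accepting the type-3 language of binary strings containing an even number of 1s:

def accepts(dfa, start, accepting, string):
    # run a deterministic finite-state automaton on the input string
    state = start
    for symbol in string:
        state = dfa[(state, symbol)]
    return state in accepting

# transition table of a two-state automaton over {0, 1}
dfa = {("even", "0"): "even", ("even", "1"): "odd",
       ("odd", "0"): "odd",   ("odd", "1"): "even"}
print(accepts(dfa, "even", {"even"}, "1101"))   # False: three 1s
print(accepts(dfa, "even", {"even"}, "1100"))   # True: two 1s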
Chart algorithms are of special interest because they are simple, although they are restricted to context-free grammars.
In the following, context-free as well as stochastic regular grammars have been used. We therefore recall a typical chart algorithm, Kasami's algorithm, and also recall the equivalence between stochastic regular grammars and Hidden Markov Models (HMMs).
Kasami's algorithm is suited to context-free grammars once they have been transformed into the so-called Chomsky normal form (CNF).
To get the CNF, we convert all rules into production rules in which every non-terminal symbol yields either two other non-terminals or one terminal:
A → BC
A → a
Let w = a_1 a_2 ... a_n be a string whose membership in a given grammar is to be tested, the grammar being already reduced to the CNF.
The algorithm basically builds a triangular parsing table, whose elements are t_ij for 1 ≤ i ≤ n and 1 ≤ j ≤ n−i+1. Every t_ij has as its value a subset of N. The non-terminal symbol A shall be inside t_ij if, and only if:
A ⇒* a_i a_i+1 ... a_i+j−1
The table is assembled as follows:
- one puts A into t_i1 if A → a_i is in P
- one puts A into t_ij if, for at least one k such that 1 ≤ k < j, a rule A → BC is to be found in P, having B present in t_ik and C in t_i+k,j−k.
The string belongs to the language if, and only if, S is found in t_1n.
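A minimal Python sketch of this tabular test follows; the dictionary encoding of the CNF grammar and the toy grammar itself are our own illustration, not part of the original text:

def cyk(word, terminal_rules, binary_rules, start="S"):
    # Kasami (CYK) membership test for a grammar in Chomsky normal form.
    # terminal_rules: dict  terminal -> set of non-terminals A with A -> terminal
    # binary_rules:   dict  (B, C)   -> set of non-terminals A with A -> B C
    n = len(word)
    # t[i][j] = set of non-terminals deriving word[i : i + j]  (0-based indices)
    t = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(word):
        t[i][1] = set(terminal_rules.get(a, ()))
    for j in range(2, n + 1):             # length of the span
        for i in range(n - j + 1):        # start of the span
            for k in range(1, j):         # split point
                for B in t[i][k]:
                    for C in t[i + k][j - k]:
                        t[i][j] |= binary_rules.get((B, C), set())
    return start in t[0][n]

# toy CNF grammar: S -> AB, A -> a, B -> b
terminal_rules = {"a": {"A"}, "b": {"B"}}
binary_rules = {("A", "B"): {"S"}}
print(cyk("ab", terminal_rules, binary_rules))   # True
print(cyk("ba", terminal_rules, binary_rules))   # False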
Stochastic regular grammars are equivalent to Hidden Markov
Models.
A Hidden Markov Model has, properly, hidden states: only a series of emitted symbols is observed, which allows a probabilistic inference about the underlying sequence of states.
A Hidden Markov Model (HMM) is specified by a set of parameters:
- the set of states S = (s_1, s_2, ..., s_N)
- the state sequence Q = (q_1, q_2, ..., q_K)
- the prior probabilities (π_i) are the probability distribution of the first state of a state sequence
- the transition probabilities (a_ij) are the probabilities of going from a state i to a state j, i.e. P(q_j | q_i)
- the emission probabilities (e_i(x)) are the probabilities of observing x when the system is in the state q_i, i.e. p(x | q_i)
One can calculate:
" the likelihood of a sequence of observations with
respect to a HMM by means of the so called “forward
algorithm"
= the most probable sequence of corresponding states
(most probable path) by Viterbi algorithm
= the HMM parameter estimation by forward-backward
algorithm
Let f_k(i) = Pr(x_1 ... x_i, q_i = k) be the probability of observing the sequence x_1 x_2 ... x_i and, at the same time, being in the state k. The forward algorithm allows the probability of x to be computed recursively. The steps of the algorithm are as follows:
- initialisation: f_k(1) = π_k e_k(x_1)
- recursive step: f_k(i) = e_k(x_i) Σ_l f_l(i−1) a_lk
- termination: Pr(x) = Σ_k f_k(n) a_k0
where a_k0 denotes the transition towards a terminal, non-emitting end state.
Viterbi's algorithm, in its turn, associates to a sequence the related most probable assignment of the otherwise hidden states: the computation is quite analogous, just substituting, in the forward algorithm table, the sum with a maximum search. In other words, the probability at each cell of the forward algorithm comes from the sum of some terms; for Viterbi's algorithm, only the maximum of the abovesaid terms is used.
Viterbi's algorithm keeps, for every state and for every position, a pointer to the preceding state, so as to trace backwards the most probable path.
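A matching Viterbi sketch (same hypothetical model representation as in the forward sketch above) replaces the sum by a maximum and keeps the back-pointers:

def viterbi(observations, states, prior, trans, emit):
    # initialisation: v_k(1) = pi_k * e_k(x_1), no predecessor yet
    v = {k: prior[k] * emit[k][observations[0]] for k in states}
    back = []                                   # one dict of back-pointers per position
    # recursion: v_k(i) = e_k(x_i) * max_l v_l(i-1) * a_lk
    for x in observations[1:]:
        ptr, new_v = {}, {}
        for k in states:
            best = max(states, key=lambda l: v[l] * trans[l][k])
            ptr[k] = best
            new_v[k] = emit[k][x] * v[best] * trans[best][k]
        back.append(ptr)
        v = new_v
    # termination: pick the best final state, then trace the pointers backwards
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[last]

# with the same toy two-state HMM as in the forward sketch:
# path, prob = viterbi("0110", states, prior, trans, emit)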
The following HMM has the alphabet {0, 1}; a_11, a_12, a_13, a_21, a_22, a_23, a_31, a_32, a_33 are the transition probabilities, and the parameters e_1(0), e_2(0), e_3(0), e_1(1), e_2(1), e_3(1) represent the emission probabilities.
The corresponding stochastic regular grammar is:
S → X_1
X_1 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_11    p_12   p_13   p_14   p_15   p_16
X_2 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_21    p_22   p_23   p_24   p_25   p_26
X_3 → 0X_1 | 1X_1 | 0X_2 | 1X_2 | 0X_3 | 1X_3
      p_31    p_32   p_33   p_34   p_35   p_36
p_11 + p_12 = a_11
p_13 + p_14 = a_12
p_15 + p_16 = a_13
p_11 / (p_11 + p_12) = e_11(0)
p_13 / (p_13 + p_14) = e_12(0)
p_15 / (p_15 + p_16) = e_13(0)
e_1(0) = e_11(0) + e_12(0) + e_13(0)
We repeat for the p_2j and p_3j probabilities.
Given the set of emission probabilities and the transition matrix of the HMM, we can obtain the set of probabilities of the production rules of the stochastic regular grammar, and vice versa.
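As a rough sketch of the HMM-to-grammar direction, the following assumes (this is our reading, not stated explicitly above) the convention in which the rule X_i → x X_j receives probability a_ij · e_j(x), i.e. the symbol is emitted on entering the destination state:

def grammar_from_hmm(states, alphabet, trans, emit):
    # trans[i][j] : transition probability a_ij
    # emit[j][x]  : emission probability e_j(x)
    # the key (i, x, j) stands for the production X_i -> x X_j
    rules = {}
    for i in states:
        for x in alphabet:
            for j in states:
                # assumed convention: p(X_i -> x X_j) = a_ij * e_j(x)
                rules[(i, x, j)] = trans[i][j] * emit[j][x]
    return rules

# toy two-state model
states = ["X1", "X2"]
trans = {"X1": {"X1": 0.7, "X2": 0.3}, "X2": {"X1": 0.4, "X2": 0.6}}
emit = {"X1": {"0": 0.9, "1": 0.1}, "X2": {"0": 0.2, "1": 0.8}}
rules = grammar_from_hmm(states, ["0", "1"], trans, emit)
print(rules[("X1", "0", "X2")])   # a_12 * e_2(0) = 0.3 * 0.2 = 0.06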
Parsing is finding an optimal derivation for a given sequence (string): it can be thought of as an alignment of the non-terminal symbols (hidden states) of the grammar with the terminal symbols of the sequence (string), just as Viterbi's algorithm aligns the positions of a sequence to HMM states.
Moreover, the theory of stochastic automata defines the class of languages accepted by stochastic automata.