Geoinformation for practice

  
In this paper a preliminary investigation of this problem
is carried out: several different approaches are shown, and a brief
example of the recognition of a simple geometrical shape is
given.
2. The Theory of Formal Languages 
In the theory of formal languages, a language is defined 
as a set of strings: a string is a finite sequence of symbols 
chosen from some finite vocabulary. In natural languages, 
a string is a sentence, and the sentences are sequences of 
words chosen from some vocabulary of possible words. 
A grammar is defined as a four-tuple:

G = (N, T, P, S)

where N and T are the non-terminal and terminal
vocabularies of G, P is the set of production rules, and S
is the start symbol.
A formal language is thus defined by:
- a terminal vocabulary of symbols (the words of the natural language);
- a non-terminal vocabulary of symbols (the syntactic categories, e.g. noun, verb);
- a set of productions (the phrase structure rules of the language);
- the so-called start symbol.
We start from the special non-terminal S, and S is replaced
by the string on the right side of a chosen production rule.
The process of rewriting a substring according to one of
the production rules continues until the string
consists only of terminal symbols:

S → aS | bS | ε

where the symbol | indicates "or" and ε is the null string.
The succession of strings that results from the process is a
derivation in the grammar: finding a derivation (a
parse) for a given sentence (sequence) is called parsing.
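As an illustration (our own sketch, not part of the paper), the rewriting process for this small grammar can be traced in Python; the rule table and function names are assumptions made for the example:

```python
import random

# Productions of the example grammar S -> aS | bS | e
# (the empty string "" stands for the null string epsilon).
RULES = {"S": ["aS", "bS", ""]}

def derivation(max_len=8, seed=None):
    """Rewrite the non-terminal S repeatedly, recording every
    sentential form until only terminal symbols remain."""
    rng = random.Random(seed)
    sentential = "S"
    steps = [sentential]
    while "S" in sentential:
        # Force termination once the string is long enough.
        rhs = "" if len(sentential) > max_len else rng.choice(RULES["S"])
        sentential = sentential.replace("S", rhs, 1)
        steps.append(sentential)
    return steps

print(" => ".join(s if s else "e" for s in derivation(seed=3)))
```

Each printed step is one sentential form of the derivation; the final form contains only the terminals a and b.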
As to the latter approach, let us recall that in the theory
of formal languages Chomsky divides languages into
classes, thus forming a hierarchy of languages based
upon different grammars:
- unrestricted grammars (0-type grammars);
- context-sensitive grammars (1-type grammars);
- context-free grammars (2-type grammars);
- regular grammars, or finite state grammars (3-type grammars).
The most general grammar is the 0-type, which
places no limits on the rewriting rules; for the other types,
the restrictions become progressively stricter. Types 0 and 1
are able to describe natural languages, while the other two,
much simpler to manage from a computational viewpoint,
are better suited to limited domains and have been
hitherto used for artificial languages.
In the 1-type grammar, the restriction on the rewriting rules is that
the right side of a rule must have at least as many
symbols as the left side.
For the 2-type grammar, all rules must have just one
non-terminal symbol on the left side.
For the 3-type grammar, the right side of every production rule
consists of a single terminal symbol, or of one terminal
symbol followed by one non-terminal symbol.
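These restrictions can be checked mechanically. As a sketch (our own illustration, not from the paper), here is a Python test of the type-3 (regular) restriction, with productions given as (left, right) pairs:

```python
def is_type3(productions, nonterminals):
    """Check the type-3 restriction: each left side is a single
    non-terminal, and each right side is a single terminal or a
    terminal followed by exactly one non-terminal."""
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 1 and rhs not in nonterminals:
            continue            # A -> a
        if (len(rhs) == 2 and rhs[0] not in nonterminals
                and rhs[1] in nonterminals):
            continue            # A -> aB
        return False
    return True

# The example grammar from above, with the e-production replaced
# by S -> a to keep the illustration strictly right-linear.
prods = [("S", "aS"), ("S", "bS"), ("S", "a")]
print(is_type3(prods, {"S"}))   # True
```

A grammar failing this test may still be 2-type (context-free), since that only constrains the left side of each rule.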
The language classes arranged by Chomsky require
corresponding recognition devices (automata):
- 0-type languages: Turing machines;
- 1-type languages: linear bounded automata;
- 2-type languages: pushdown automata;
- 3-type languages: finite state automata.
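For instance, a finite state automaton recognizing a 3-type language can be sketched as a transition table; the states and the language (strings over {a, b} ending in "ab") are chosen here purely for illustration:

```python
# Deterministic finite state automaton: transition table maps
# (state, input symbol) to the next state.
TRANSITIONS = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
ACCEPTING = {"q2"}   # reached exactly after reading "...ab"

def accepts(string, start="q0"):
    """Run the automaton over the string; accept if it halts
    in an accepting state."""
    state = start
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:        # undefined transition: reject
            return False
    return state in ACCEPTING

print(accepts("aab"), accepts("abb"))   # True False
```

The automaton needs no memory beyond its current state, which is precisely why finite state devices suffice for 3-type languages.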
Although many classes of patterns appear context-sensitive,
context-sensitive grammars have rarely been
used, because of their complexity. Context-free
programmed grammars have been used instead, probably due to
their effectiveness in describing natural languages: they
are able to capture much of natural and artificial
languages, but many problems have required extensions.
Context-free grammars cannot model all the
characteristics of natural languages. One example is the
conversion of sentences from active to passive voice.
Chomsky (1957) developed the theory of
transformational grammar, in which a sentence is derived
as a deep structure, then modified by transformational
rules and finally converted into surface form by
phonological rules. The deep structure is derived by a
context-free grammar, which generates "proto" phrases:
strings like "the bridge crosses the river" and "the river is
crossed by the bridge" have the same deep structure, but a
different surface structure.
In syntactic pattern recognition problems, it is often
important to represent the two- or three-dimensional
structure of the sentences in a language. Traditional
context-free grammars generate only one-dimensional
strings. Context-free graph grammars have been developed
in order to construct a graph of terminal nodes instead of
a string of terminal symbols.
A statistical approach to language started as scholars
realized the value of statistical methods in the recognition
of speech. A Markov model was first used by Markov
himself for modelling the language of "Evgenij Onegin"
(Markov, 1913). A grammar normally divides strings into
just two classes: the grammatically correct ones, and the
others. In practice, however, ambiguity is very frequent, and it is
even deliberately pursued, as sometimes happens in
poetry. The simplest way of accounting for ambiguity is
the use of a stochastic grammar.
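A stochastic grammar attaches a probability to each production, so the probability of a derivation is the product of the probabilities of the rules applied. A minimal sketch for the earlier grammar S → aS | bS | ε, with probabilities invented for the example:

```python
# Illustrative rule probabilities for S -> aS (0.4) | bS (0.4) | e (0.2);
# they must sum to 1 for the grammar to be properly stochastic.
RULE_PROBS = {"a": 0.4, "b": 0.4, "": 0.2}

def string_probability(string):
    """Probability that the grammar generates exactly this string.
    Here every string has a unique derivation: one rule choice per
    symbol, then the production S -> e to stop."""
    p = 1.0
    for symbol in string:
        if symbol not in ("a", "b"):
            return 0.0       # symbol outside the terminal vocabulary
        p *= RULE_PROBS[symbol]
    return p * RULE_PROBS[""]

print(string_probability("ab"))   # 0.4 * 0.4 * 0.2
```

With an ambiguous grammar the same string can have several derivations, and the stochastic view lets the parser prefer the most probable one.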
3. Parsing Algorithms 
In accordance with the needs of different grammars, a
number of parsing algorithms are in use.
Parsing procedures may be of the top-down or bottom-up type:
one either starts from the initial symbol S, operating
substitutions until only terminal symbols are present,
so as to fit the sentence, or starts from the string and
works backward to the start symbol S.
Tabular procedures are time-saving; however, they are not
suitable for all grammar types.
As an example, in the following the tabular Kasami
method is used, which is remarkable for its simplicity: it
works with context-free grammars, previously reduced to
Chomsky Normal Form.
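The Kasami (Cocke-Younger-Kasami) table can be sketched as follows; the CNF grammar below is a small illustration of our own, not the example grammar used in the paper:

```python
# CYK (Cocke-Younger-Kasami) recognition for a grammar in Chomsky
# Normal Form: every rule is A -> BC or A -> a.
CNF_RULES = {
    "S": [("A", "B")],
    "A": [("a",)],
    "B": [("C", "A"), ("b",)],
    "C": [("b",)],
}

def cyk_accepts(string, start="S"):
    n = len(string)
    if n == 0:
        return False
    # table[i][j] = set of non-terminals deriving string[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(string):           # length-1 spans: A -> a
        for lhs, rhss in CNF_RULES.items():
            if (ch,) in rhss:
                table[i][0].add(lhs)
    for length in range(2, n + 1):            # longer spans: A -> BC
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, rhss in CNF_RULES.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and rhs[0] in table[i][split - 1]
                                and rhs[1] in table[i + split][length - split - 1]):
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

print(cyk_accepts("aba"), cyk_accepts("ba"))   # True False
```

Each table cell records which non-terminals derive the corresponding substring, so the string belongs to the language exactly when the start symbol appears in the cell covering the whole string.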
[The remainder of this page is garbled in the source scan. The passage explains that any context-free grammar can be reduced to Chomsky Normal Form (CNF), in which every production has the form A → BC or A → a; it illustrates the reduction on a small example grammar and introduces the corresponding Kasami parsing table.]