Proceedings, XXth congress (Part 4)

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004 
  
4.2 Automatic Derivation of Implicit Data 
As mentioned above, spatial data contain implicit rules. There are two different ways of approaching the goal of extracting this implicit knowledge. On the one hand, the rules can be defined a priori (association rules) and then applied to the data; on the other hand, the computer can be left to find the rules by itself by exploring the data. Both ways lead to more knowledge, but in the first case it is knowledge we were specifically searching for, such as the concepts of chapter 3.1. The second case brings up unknown knowledge or inherent information, which may help us learn more about the data set but may equally turn out to be useless. Both methods are commonly known as data mining (Witten and Frank, 2000) and are described and examined below. They are divided into supervised and unsupervised classification.
4.2.1 Supervised Classification: This approach implies knowledge discovery on the basis of predetermined models or spatial association rules. Supervised classification starts from a set of classified examples of a concept to be learnt. From this set, classification schemes for the concepts are derived, e.g. using machine learning approaches (Michalski et al., 1998) or Maximum Likelihood classification (Lillesand and Kiefer, 1994). In principle, every kind of knowledge representation can be used to form a classification scheme, especially rule-based systems or semantic networks. We illustrate the process with decision trees. Each branch represents the presence of a distinctive classification feature; depending on the result of the test, the appropriate branch is followed further. In the end, the model assigns the object to one of several categories. However, the scheme has some essential problems. The order in which the distinctive classification features are tested is one determining factor: a step-by-step algorithm without the possibility of going back risks discarding important elements, or even the correct solution, at an early stage. Determining thresholds or stopping criteria can also lead to problems. Therefore, a large quantity of high-quality data is needed to calibrate the model.
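The decision-tree process described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the binary threshold tests, the feature names `mean_building_area` and `industrial_share`, and their thresholds are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    category: str  # class assigned when this leaf is reached


@dataclass
class Node:
    feature: str      # classification feature tested at this branch point
    threshold: float  # hypothetical threshold; calibrating it needs good data
    below: Tree       # branch followed if the value is below the threshold
    above: Tree       # branch followed otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: dict[str, float]) -> str:
    """Follow one branch per test, never going back, until a leaf is reached."""
    while isinstance(tree, Node):
        value = sample[tree.feature]
        tree = tree.above if value >= tree.threshold else tree.below
    return tree.category


# Hypothetical features and thresholds, for illustration only.
tree = Node("mean_building_area", 500.0,
            below=Leaf("outskirts"),
            above=Node("industrial_share", 0.1,
                       below=Leaf("city centre"),
                       above=Leaf("industrial zone")))

print(classify(tree, {"mean_building_area": 800.0, "industrial_share": 0.0}))  # city centre
```

A sample with a small mean building area is sent to "outskirts" after the first test, and its industrial share is never examined. This is exactly the early, irreversible decision the text warns about: the order of the tests and the chosen thresholds determine the outcome.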
The concept of "the centre of a city" can be implemented using such supervised methods. For example, we could define a point as a city centre if it fulfils the following conditions:
- major streets meet in the centre
- the buildings in the centre are larger than those in the areas outside
- there are no industrial areas
- etc.
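Encoded literally, the conditions above become a crisp conjunction. The sketch below is a hypothetical encoding: the attribute names, the minimum of three meeting major streets, and the use of mean building footprint as the measure of "larger" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    major_streets_meeting: int     # major streets meeting at the point
    building_area_inside: float    # mean building footprint near the point
    building_area_outside: float   # mean building footprint in the surroundings
    industrial_area: float         # industrial land use near the point


def is_city_centre(c: Candidate, min_major_streets: int = 3) -> bool:
    """Crisp conjunction of the listed conditions (hypothetical encoding)."""
    return (
        c.major_streets_meeting >= min_major_streets
        and c.building_area_inside > c.building_area_outside
        and c.industrial_area == 0.0
    )


print(is_city_centre(Candidate(4, 900.0, 300.0, 0.0)))   # True
print(is_city_centre(Candidate(4, 900.0, 300.0, 50.0)))  # False
```

Every condition is mandatory and equally important here, and a single small industrial plot rejects an otherwise convincing candidate.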
The weak points of such specifications are easily recognised:
- the descriptions are given in natural language, which is not directly usable by a computer
- the specifications are vague
- not all conditions may be needed in all cases
- some conditions may be more important than others
- there is no guarantee that a model composed by humans is accurate, appropriate and, above all, complete
- there may be many more criteria which we have ignored and not taken into account. Conversely, we may have included distinctive features which do not correspond to reality and were valid only for a small test data set.
Basically, we expect to retrieve specific information as the result of predefined inputs. However, the classification model will fail if our perceptions do not agree with reality. The difficulty mentioned above, of combining the criteria and their values, is already inherent in the scheme: if the characteristics are combined inadequately or provided insufficiently, misinformation will be generated. Moreover, the quality of deliberately formed models depends highly on human creativity and the ability to reason. Spatial phenomena and relationships have to be recognised by humans a priori in order to implement them in a supervised classification algorithm. This implies that such models have to be set up very carefully, possibly using large test data sets from which to derive the information and on which to verify the derived rules. Furthermore, a specific inference scheme has to be designed to apply the rules to the data, one that takes into account the probability or the importance of each condition for a rule.
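One simple inference scheme of this kind is a weighted mean of condition degrees, sketched below under stated assumptions: each condition is scored as a degree in [0, 1], unevaluable conditions are passed as `None` and skipped, and the weights and the acceptance threshold of 0.7 are hypothetical values that would have to be calibrated on test data.

```python
from __future__ import annotations

from typing import Optional


def rule_support(degrees: dict[str, Optional[float]],
                 weights: dict[str, float]) -> float:
    """Weighted mean of condition degrees in [0, 1].

    A degree of None means the condition could not be evaluated and is skipped,
    so not every condition is required in every case.
    """
    num = den = 0.0
    for name, degree in degrees.items():
        if degree is None:
            continue
        w = weights.get(name, 1.0)  # conditions without a weight count as 1
        num += w * degree
        den += w
    return num / den if den > 0 else 0.0


def accept(degrees, weights, threshold=0.7):
    """Apply the rule if the weighted support reaches a (hypothetical) threshold."""
    return rule_support(degrees, weights) >= threshold


# Hypothetical importance weights for the city-centre conditions.
weights = {"major_streets": 3.0, "large_buildings": 2.0, "no_industry": 1.0}
print(rule_support({"major_streets": 1.0, "large_buildings": 0.5,
                    "no_industry": None}, weights))  # 0.8
```

Unlike a crisp conjunction, this scheme lets important conditions dominate and tolerates missing ones; the price is that weights and threshold become further parameters to calibrate.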
4.2.2 Unsupervised Classification: This method aims at leaving the process of knowledge discovery to the computer itself. That means the computer has to discover rules, separations into categories and similarities in data sets without any predefined restrictions. Koperski and Han (1995) describe an approach in which spatial associations between objects were analysed automatically, leading to the derivation of a rule stating that “all large cities lie close to a river”. Since such rules are induced from a finite set of examples, they cannot be verified, only falsified. Thus, the utility of the detected information has to be validated. Some of the rules found may be obvious and give no further knowledge; learning to distinguish useful from non-useful rules is a further process in its own right.
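A rule such as "all large cities lie close to a river" is commonly judged by its support and confidence, the usual measures in association-rule mining. The sketch below is not the procedure of Koperski and Han; the thresholds defining "large" (100,000 inhabitants) and "close" (2 km) are hypothetical, and the city records are made-up toy data.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    river_distance_km: float


def rule_stats(cities, min_pop=100_000, max_dist_km=2.0):
    """Support and confidence of 'large(c) -> close_to_river(c)'.

    Also returns the counterexamples: large cities that are not close to a
    river. Any single one of them falsifies the strict 'all' version of the rule.
    """
    large = [c for c in cities if c.population >= min_pop]
    both = [c for c in large if c.river_distance_km <= max_dist_km]
    support = len(both) / len(cities) if cities else 0.0
    confidence = len(both) / len(large) if large else 0.0
    counterexamples = [c.name for c in large if c.river_distance_km > max_dist_km]
    return support, confidence, counterexamples


# Made-up toy records for illustration only.
cities = [City("A", 500_000, 0.5), City("B", 250_000, 1.0),
          City("C", 20_000, 8.0), City("D", 150_000, 6.0)]
support, confidence, counter = rule_stats(cities)
# support = 0.5, confidence = 2/3, counterexamples = ['D']
```

High confidence on a finite sample never proves the rule, while a single counterexample refutes its strict form; whether the rule is useful at all is a separate judgement.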
One form of data mining is clustering, which aims to find regularities or similarities in data sets. We used it for the following investigation:
One way to analyse geometric objects is to determine their characteristics and try to find regularities among them. Such regularities, in turn, can be considered representative of a certain class of objects, or of a class of objects in a certain context or environment. For linear objects, and even for networks of linear objects, the nodes are such a characteristic, including the node degree, i.e. the number of lines leaving the node. The angles of the outgoing lines can also be important. Different types of nodes can be distinguished and classified, as shown in figure 4:
Figure 4. Different node types (ELL, TEE, ARW, FRK, CRS, KAY, PSI)
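Node degree and, for degree-3 nodes, the angular layout of the outgoing lines can be computed as in the following sketch. The classification rule is an assumption borrowed from line-drawing junction labelling: sort the line directions, measure the gaps between neighbouring directions, and call the node TEE if the widest gap is about 180°, ARW if it exceeds 180° (all lines in one half-plane) and FRK otherwise. The 10° tolerance is hypothetical, and distinguishing CRS, KAY and PSI among degree-4 nodes is left out.

```python
import math
from collections import defaultdict


def node_degrees(edges):
    """Number of lines leaving each node; edges are (node_a, node_b) pairs."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def junction_type(node, neighbours, tol_deg=10.0):
    """Classify a node from the directions of its outgoing lines.

    node and neighbours are (x, y) tuples; tol_deg is a hypothetical tolerance.
    Only degrees 2 and 3 are resolved; other degrees are reported generically.
    """
    angles = sorted(
        math.degrees(math.atan2(y - node[1], x - node[0])) % 360.0
        for x, y in neighbours
    )
    n = len(angles)
    if n == 2:
        return "ELL"  # a bend; a straight pass-through is usually not stored as a node
    if n == 3:
        # gaps between neighbouring directions, including the wrap-around gap
        gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
        widest = max(gaps)
        if abs(widest - 180.0) <= tol_deg:
            return "TEE"
        return "ARW" if widest > 180.0 else "FRK"
    return f"degree-{n}"


print(junction_type((0, 0), [(1, 0), (-1, 0), (0, 1)]))    # TEE
print(junction_type((0, 0), [(0, 1), (-1, -1), (1, -1)]))  # FRK
print(junction_type((0, 0), [(1, 1), (1, 0), (1, -1)]))    # ARW
```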
We carried out several investigations analysing the node types of linear networks.
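The source does not say which clustering algorithm was used. As one common choice, the sketch below applies plain k-means (Lloyd's algorithm) to per-network node-type profiles. The profile features (share of TEE-junctions, share of crossroads), the initialisation with the first k points, and the toy values are all assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm), initialised with the first k points."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centre (Euclidean distance)
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        # update step: each centre moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centres


# Hypothetical profiles: (share of TEE-junctions, share of crossroads) per network.
profiles = [(0.8, 0.1), (0.75, 0.15), (0.2, 0.6), (0.25, 0.55)]
labels, centres = kmeans(profiles, k=2)
print(labels)  # [0, 0, 1, 1]
```

The clusters themselves carry no meaning; interpreting them (for instance, relating them to town size) remains a human step.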
Three examples illustrate the process:
I. While investigating the concept of the city centre with supervised models, we introduced the criterion of crossroads in the centre. A crossroad is a node with at least four outgoing lines; crossroads were expected primarily in the city centre, since many roads come together there. The tests produced an unexpected result: whether crossroads are found in the city centre depends on the size of the town. There seems to be a rule relating the structure of the centre, the spatial arrangement of streets and the size of the city.
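To compare towns of different size, one can count junctions by degree around an assumed centre point, as in this sketch. The centre coordinates, the search radius and the toy node list are hypothetical; only the definition of a crossroad (at least four outgoing lines) comes from the text.

```python
import math


def centre_junction_profile(nodes, centre, radius):
    """Count junctions by degree within `radius` of `centre`.

    nodes: iterable of ((x, y), degree) pairs. Returns
    (crossroads, three_way, share_of_crossroads), where a crossroad is a
    node with at least four outgoing lines.
    """
    crossroads = three_way = 0
    for (x, y), degree in nodes:
        if math.hypot(x - centre[0], y - centre[1]) > radius:
            continue  # node lies outside the assumed centre area
        if degree >= 4:
            crossroads += 1
        elif degree == 3:
            three_way += 1
    junctions = crossroads + three_way
    share = crossroads / junctions if junctions else 0.0
    return crossroads, three_way, share


# Toy node list: one crossroad and two three-way junctions near the centre.
nodes = [((0, 0), 4), ((10, 0), 3), ((0, 10), 3), ((500, 500), 4)]
print(centre_junction_profile(nodes, centre=(0, 0), radius=100))
```

Plotting the crossroad share against town size over many towns would be one way to test whether the dependency observed above holds generally.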
Figure 5 shows typical structures in the city centre, depending on the size of the town. In small towns, a main street often leads through the centre, and mainly TEE-junctions can be found, [...]

Figure 5. [...]

2. [...]