International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
The keywords already allow important information to be
anticipated. However, they provide no knowledge about the
distribution and location of the geometric elements, their
connections to each other, their accumulation in particular
places and so on. These characteristics make the information of
a data set complete and allow humans to interpret the data. The
aim of the next step is therefore to extract implicit information
from data sets and to make it visible on the Internet,
especially for search engines.
4. EXTRACTION OF IMPLICIT KNOWLEDGE WITH
DATA MINING
As mentioned in the previous chapter, keywords are a first
approach to obtaining some semantic information. However,
these keywords have a major drawback: they can still be
interpreted only by human beings. Expressions like
"Autobahn", "Aérogare" or "Hospital" remain meaningless to the
computer. We would need a translation in two respects: first a
language translation, and beyond that a semantic translation.
Catalogues which describe the meaning of a word and
determine its sense depending on the context are called
ontologies.
To enrich the ontology, our aim is to teach the computer to
learn spatial concepts and to combine knowledge into
higher-level concepts automatically. These concepts are hidden
in the spatial data; they are found less at the level of pure
geometry than in the combination and interaction of the spatial
elements. Spatial data mining is the approach used to extract
this implicit information.
Needless to say, even after finding these implicit spatial
structures the computer still does not know the meaning of
"Autobahn". However, it has learnt the concept that an
"Autobahn" is a major road (which has concepts of its own), has
fewer junction points and is rarely situated inside settlement
areas, but rather in peripheral areas.
Next we introduce the implicit structures and concepts which
could be useful for a search engine. Afterwards we describe
procedures and algorithms to discover inherent information
with data mining and document first approaches and results.
4.1 Implicit Data
As Aristotle put it, the whole is more than the sum of its
parts: the content of a spatial data set is more than just the
pure geometry. The cognitive structures of human beings fit the
world because they were formed by adaptation to the world. Up
to now computers have not had this semantic knowledge of the
world. The challenge is to reproduce such an adaptation process
by automatic learning.
Considering typical queries to a search engine and user
scenarios with a spatial background, a lot of helpful
information is stored in data sets. For example, if a user would
like to search for a hotel in the centre of the city, the search
engine at least has to know where the city centre is located.
This knowledge can be discovered in vector data, but it is
usually not stored explicitly in an item.
Figure 3 shows topographic elements of a small village, such as
roads and houses. However, this is already a human
interpretation. One has to be aware that what can actually be
seen are just some lines and polygons in different colours.
That is the initial information the computer is able to get out
of the data.
Figure 3. Where is the city centre located?
Indeed, we recognise streets and houses and are able to infer
further facts. Humans can locate the church by the special
shape of its building. The interaction of the streets and houses
and their concentration at least conveys the information that
this is a village. We can also identify larger buildings in the
upper part and distinguish them from smaller ones in the south.
A computer can calculate these facts too. The big challenge is
the subsequent reasoning process. Humans interpret the larger
buildings as the inner part of the village, because they know
about old farmyards and the typical layout of a village (in
Germany). The smaller buildings represent an estate of
one-family houses. We are also able to locate the main street
leading through the village, because of the structure of the
settlement. Humans can therefore locate the approximate city
centre without difficulty.
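
The part of this reasoning that a computer can calculate directly can be
sketched in code. The following Python fragment is only an illustration,
not the procedure used in SPIRIT. It assumes that building footprints are
available as simple polygons (lists of (x, y) vertices); the function
names and the rule "larger than the median footprint" are hypothetical
choices. The size criterion is relative to the data set rather than an
absolute number.

```python
from statistics import median


def polygon_area_and_centroid(ring):
    """Return (area, (cx, cy)) of a simple polygon via the shoelace formula."""
    twice_area = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    return abs(twice_area) / 2.0, (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def estimate_centre(buildings):
    """Area-weighted mean centroid of the buildings above the median footprint.

    Assumption for illustration: larger buildings hint at the old core.
    """
    if not buildings:
        raise ValueError("no buildings given")
    stats = [polygon_area_and_centroid(b) for b in buildings]
    cutoff = median(area for area, _ in stats)
    large = [(a, c) for a, c in stats if a > cutoff]
    if not large:  # all footprints equally large: use every building
        large = stats
    total = sum(a for a, _ in large)
    x = sum(a * c[0] for a, c in large) / total
    y = sum(a * c[1] for a, c in large) / total
    return (x, y)


def rect(x, y, w, h):
    """Hypothetical helper: axis-parallel rectangle as a vertex list."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


# Two large footprints in the north, three small houses in the south.
village = [rect(0, 10, 4, 4), rect(6, 10, 4, 4),
           rect(0, 0, 1, 1), rect(3, 0, 1, 1), rect(6, 0, 1, 1)]
centre = estimate_centre(village)
```

For this toy village the estimate falls between the two large northern
footprints. Such a figure is only a candidate location; the interpretation
of why large buildings indicate the centre remains the hard part.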
There are plenty of examples and ideas which would be useful
in SPIRIT. For now we would like to concentrate on the
concepts listed below:
- classification of more or less important cities
- sphere of influence of cities
- detection of the centre of a city
- determination of tourist areas and attractive
  destinations
- possibilities of suburban or industrial settlement,
  urban development, quality of housing
The information available in the data set which we intend to
exploit for these concepts, together with the operations needed
to extract and combine it, is described in Heinzle et al. (2002).
Some characteristics of the elements can be determined with
simple GIS functionality, such as calculating an area or size or
counting the occurrences of particular objects. The evaluation
of other properties, like density, distribution or neighbourhood,
is more complicated. The analysis of distances is essential for
gaining knowledge of these aspects. However, working with
threshold values or absolute numbers is less helpful, because it
depends on the context whether an attribute or characteristic is
really specific and outstanding. Most of the time the values of
interest are those which stand out from the rest of the data with
respect to particular properties. Clustering algorithms can be
used to identify groups of elements or their neighbourhoods.
Among clustering algorithms, those that do not need threshold
values are preferable (Anders, 2003).
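
To illustrate what grouping without a threshold value can look like, the
following Python sketch links every point to its nearest neighbour and
takes the connected components of the resulting graph as groups. This is
a deliberately simple stand-in and not the algorithm of Anders (2003);
the function name and the use of point coordinates (for example building
centroids) are assumptions made for the example.

```python
from math import dist


def nearest_neighbour_groups(points):
    """Group points by the connected components of the nearest-neighbour graph.

    No distance threshold is supplied: each point is linked to whichever
    other point is closest to it, however near or far that is.
    Returns a sorted list of groups, each a list of point indices.
    """
    n = len(points)
    if n < 2:
        return [[i] for i in range(n)]

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Index of the closest other point (ties go to the lowest index).
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(points[i], points[k]))
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two spatially separated accumulations of three points each.
sample = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
groups = nearest_neighbour_groups(sample)
```

Because the grouping depends only on relative distances, scaling all
coordinates by a constant factor leaves the groups unchanged. On real
data a single nearest-neighbour pass tends to produce many small groups,
so it would serve at most as a first level of a coarser grouping.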
Moreover, the combination of properties and their calculated
values poses a problem. Logic operations have to be extended by
weighting and quantifiers which depend on the importance,
relevance and quality of the attribute values and on the
significance of the elements.
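
One possible reading of such an extension is sketched below in Python.
A strict logical AND is replaced by a weighted mean of degrees between 0
and 1, and a quantifier such as "mostly" is approximated by the share of
elements that satisfy a condition. The function names, the property
names and all weights are hypothetical and serve only to make the idea
concrete; they are not values from the project.

```python
def share(flags):
    """Quantifier as a degree in [0, 1]: fraction of elements that are True."""
    flags = list(flags)
    if not flags:
        raise ValueError("no elements given")
    return sum(1 for f in flags if f) / len(flags)


def weighted_and(degrees, weights):
    """Soft conjunction: weighted mean of the degrees of all properties.

    degrees -- mapping property name -> degree of fulfilment in [0, 1]
    weights -- mapping property name -> non-negative importance
    """
    total = sum(weights[name] for name in degrees)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(degrees[name] * weights[name] for name in degrees) / total


# Hypothetical evidence for the concept "Autobahn" for one road object:
# nine of its ten segments lie outside settlement areas.
segments_outside = [True] * 9 + [False]
degrees = {
    "is_major_road": 1.0,
    "few_junctions": 0.8,
    "mostly_outside_settlement": share(segments_outside),
}
weights = {
    "is_major_road": 0.5,
    "few_junctions": 0.3,
    "mostly_outside_settlement": 0.2,
}
score = weighted_and(degrees, weights)
```

Unlike a Boolean AND, a single weakly fulfilled property lowers the score
in proportion to its weight instead of rejecting the object outright.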