Full text: Proceedings, XXth congress (Part 4)

DERIVATION OF IMPLICIT INFORMATION FROM SPATIAL DATA SETS WITH DATA MINING
F. Heinzle, M. Sester

Institute of Cartography and Geoinformatics, University of Hannover, Appelstr. 9a, 30167 Hannover, Germany -
(frauke.heinzle, monika.sester)@ikg.uni-hannover.de
Working Group: TS WG IV/5 

KEY WORDS: Spatial, Information, Retrieval, Metadata, Data Mining, GIS, Databases, Internet/Web

ABSTRACT: 
Geographical data sets contain a huge amount of information about spatial phenomena. The exploitation of this knowledge with the 
aim to make it usable in an internet search engine is one of the goals of the EU-funded project SPIRIT. This project deals with 
spatially related information retrieval in the internet and the development of a search engine, which includes the spatial aspect of 
queries. 
Existing metadata as provided by the standard ISO/DIS 19115 give only fractional information about the substantial content of a
data set. Most of the time, the enrichment with metadata has to be done manually, so this information is rarely present.
Further, the given metadata do not contain implicit information. This implicit information does not exist on the level of
pure geographical features, but on the level of the relationships between the features, their extent, density, frequency,
neighbourhood, uniqueness and more. This knowledge is often well known to humans through their background information; however,
it has to be made explicit for the computer.
The first part of the paper describes the automatic extraction of classical metadata from data sets. The second part describes concepts 
of information retrieval from geographical data sets. This part deals with the setup of rules to derive useful implicit information. We 
describe possible implementations of data mining algorithms. 
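
The distinction between classical metadata and implicit information can be illustrated with a minimal sketch. It assumes a data set reduced to a list of 2D point features; the function names and sample coordinates are hypothetical and are not taken from SPIRIT. The bounding box stands for a classical metadata element (spatial extent), while density and nearest-neighbour distance stand for simple implicit properties.

```python
import math

def bounding_box(points):
    """Spatial extent as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def density(points):
    """Number of features per unit area of the bounding box."""
    min_x, min_y, max_x, max_y = bounding_box(points)
    area = (max_x - min_x) * (max_y - min_y)
    if area == 0:
        raise ValueError("degenerate extent: area is zero")
    return len(points) / area

def nearest_neighbour_distances(points):
    """For each point, the distance to its closest other point (brute force)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

# Hypothetical sample features (e.g. building centroids in map units).
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (1.0, 0.0)]

extent = bounding_box(points)              # classical metadata: extent
feature_density = density(points)          # implicit: how densely features occur
nn = nearest_neighbour_distances(points)   # implicit: neighbourhood structure
```

The brute-force neighbour search is quadratic in the number of features; for real data sets a spatial index would presumably be preferable.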

1. INTRODUCTION

There is an imagination, a dream, that some day our computers will communicate with us in a meaningful way. Tim Berners-Lee
(2001) concretised this dream for the Internet with the formulation of the Semantic Web. The idea is to let the computer
understand not only the words used by humans, but also the context of the expressions and their use in different situations.
Especially when using an Internet search engine, we are often confronted with the stupidity of the computer. Today most
search engines conduct a query by looking up keywords and comparing them to a precompiled catalogue of all existing web
sites. However, there is no analysis of the sense of a query or an interpretation of the combination of words used in web
sites. Building a Semantic Web aims to address those questions.
Linked to the idea of the Semantic Web is the EU-funded project SPIRIT (Jones et al., 2002). SPIRIT (Spatially-aware
Information Retrieval on the Internet) is engaged in improving the concept of search engines by evaluating the spatial context
of queries and web sites. The inclusion of the context and consideration of the semantic background improves the quality
of the results. Often we use spatial concepts to describe something or we keep a spatial situation in mind when we
search for something. In SPIRIT we want to include those structures to define a spatial ontology.
A huge amount of information is stored in spatial data sets. However, usually these data sets are not accessible in the
Internet. Most of the time there are neither metadata describing the data sets nor specifications of the intrinsic geometries
and attributes. Furthermore these data sets contain a lot of implicit information.
The aim is to make spatial data sets visible in the Internet, especially to enable search engines to get knowledge about the
data and publish it or use it in search queries. This requires the definition of metadata that are sufficient to describe the
significant aspects of the data, but moreover it requires the development of algorithms which determine these metadata
automatically. The second and more ambitious aim is to make even the contents usable for a search engine. This means
identifying spatial phenomena in the data sets and building a semantic network from implicit information in the data.
Both attempts are described in the following chapters. In section 3 we discuss the first issue, namely the automatic
annotation of spatial data sets with a set of important metadata tags. Subsequently we present ideas for the extraction of
implicit information to use it for spatial concepts and concentrate on data mining algorithms to derive this information.

2. RELATED WORK

The extraction of information from spatial data sets has been investigated in the domain of interpreting digital images.
There, the need for interpretation is obvious, as the task is to automatically determine individual pixels or collections of
pixels representing an object. Basic techniques for image interpretation are either pixel-based classification methods (e.g.
Lillesand and Kiefer, 1994) or structure-based matching techniques (e.g. Schenk, 1999). The major applications in
photogrammetry lie in the automatic extraction of topographic features like roads (Gerke et al., 2003), buildings (Brenner,
2000) or trees (Straub, 2003). The main challenge is to provide appropriate models for the objects to be found in the images.