International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
matching tool for our research that gives us the possibility to
match the data comfortably by hand and use these matchings as
an input for the integration of data schemas.
This work is part of the Nexus project (Nexus 2004). In the Nexus
project, we are developing an open platform for all possible
types of mobile, location-based information systems. In order to
realize a generic approach in Nexus, different data providers
have to be able to integrate their data into the Nexus world
model. For this reason, a schema integration takes place that
maps the object classes of existing data schemas from data
providers onto the classes of the Nexus schema. At the moment
this process is done manually. In this paper, we show how it
can be done automatically. The paper first gives an overview of
related work. Section 3 discusses how spatial databases can be
related, and Section 4 explains the realization of our approach
in detail.
2. RELATED WORK
The topic of spatial data integration is closely related to several
research areas, some aspects of which are briefly presented in
the following subsections.
The idea behind the research presented in this paper has already
been formulated by (Uitermark 1996): “Geographic Data set
integration (or map integration) is the process of establishing
relationships between corresponding object instances in
different, autonomously produced, geographic data sets of a
certain region. The purpose of geographic data set integration is
to share information between different geographic information
sources”.
2.1 Matching and conflation
Concerning the matching of spatial objects, the basic idea is to
express and to evaluate the similarity of spatial features. If a
certain degree of similarity can be detected, two features can be
assigned to each other. (Bruns and Egenhofer 1996) have adopted
this basic assumption and count the steps that have to be taken
to transform one representation into another representation. The
number of steps can then be interpreted as a similarity measure.
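To make the step-counting idea concrete, the sketch below assumes that a representation is one of the eight topological relations between two regions and that one step moves to an adjacent relation in a conceptual neighborhood graph. The graph edges and the mapping 1 / (1 + steps) from step count to similarity are illustrative assumptions, not taken from (Bruns and Egenhofer 1996).

```python
from collections import deque

# Assumed (illustrative) neighborhood graph: one "step" turns the topological
# relation between two regions into an adjacent relation. Edges are symmetric.
NEIGHBORS = {
    "disjoint": {"meet"},
    "meet": {"disjoint", "overlap"},
    "overlap": {"meet", "coveredBy", "covers", "equal"},
    "coveredBy": {"overlap", "inside", "equal"},
    "covers": {"overlap", "contains", "equal"},
    "inside": {"coveredBy"},
    "contains": {"covers"},
    "equal": {"overlap", "coveredBy", "covers"},
}


def transformation_steps(start: str, goal: str, graph=NEIGHBORS) -> int:
    """Minimum number of steps from start to goal, found by breadth-first search."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


def similarity(start: str, goal: str) -> float:
    """Hypothetical mapping of the step count to (0, 1]; fewer steps = more similar."""
    return 1.0 / (1.0 + transformation_steps(start, goal))
```

For example, `transformation_steps("disjoint", "inside")` passes through meet, overlap and coveredBy, so it needs four steps.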
A fundamental, line-based matching approach for street network
data has been presented by (Walter and Fritsch 1999). In a first
step, the algorithm finds all potential correspondences of
topologically connected line elements in two source data sets by
performing a buffer operation. The matching candidates are
stored in a list. This list is ambiguous and typically contains a
large number of n:m matching pairs. Then, unlikely matching
pairs are identified and eliminated using relational parameters
like topological information and feature-based parameters like
line angles. The result is a smaller but still ambiguous list with
potential matching pairs. These matching pairs are evaluated
with a merit function in order to compute a unique combination
of matching pairs which represents the solution of the matching
problem. This is a combinatorial problem which is solved with
an A* algorithm.
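The three stages described above (buffer test, pruning, and A* search over combinations) can be sketched for straight line segments as follows. This is a simplified illustration under stated assumptions, not the implementation of (Walter and Fritsch 1999): it only produces 1:1 pairs rather than n:m pairs, the merit function is hypothetical, and leaving a line unmatched is assumed to cost 1.

```python
import heapq
import math
from itertools import count

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point of segment seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, seg[0])
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def direction_difference(a: Segment, b: Segment) -> float:
    """Angle between the undirected segments a and b in degrees, in [0, 90]."""
    ang_a = math.atan2(a[1][1] - a[0][1], a[1][0] - a[0][0])
    ang_b = math.atan2(b[1][1] - b[0][1], b[1][0] - b[0][0])
    diff = abs(ang_a - ang_b) % math.pi
    return math.degrees(min(diff, math.pi - diff))


def find_candidates(src, tgt, width: float, max_angle: float) -> dict:
    """Buffer test plus angle pruning; returns {(i, j): merit} (width must be > 0)."""
    merit = {}
    for i, a in enumerate(src):
        for j, b in enumerate(tgt):
            dists = [point_segment_distance(p, b) for p in a]
            # The buffer of a segment is convex, so a lies inside it
            # exactly when both endpoints of a do.
            if max(dists) > width:
                continue
            angle = direction_difference(a, b)
            if angle > max_angle:  # feature-based pruning on line angles
                continue
            # Hypothetical merit in [0, 1]: close, parallel pairs score high.
            mean_dist = sum(dists) / len(dists)
            merit[(i, j)] = (1 - mean_dist / width) * math.cos(math.radians(angle))
    return merit


def best_combination(merit: dict, n_src: int):
    """A* search for the 1:1 assignment with minimum total cost.

    Matching source i to target j costs 1 - merit; leaving i unmatched costs 1.
    """
    options = {i: [] for i in range(n_src)}
    for (i, j), m in merit.items():
        options[i].append((j, 1.0 - m))
    # Admissible and consistent heuristic: cheapest cost per remaining
    # source line, ignoring conflicts over target lines.
    h = [0.0] * (n_src + 1)
    for i in range(n_src - 1, -1, -1):
        h[i] = h[i + 1] + min([1.0] + [c for _, c in options[i]])
    tie = count()
    heap = [(h[0], next(tie), 0.0, 0, frozenset(), ())]
    closed = set()
    while heap:
        _, _, g, i, used, pairs = heapq.heappop(heap)
        if i == n_src:
            return list(pairs), g
        if (i, used) in closed:
            continue
        closed.add((i, used))
        # Option 1: leave source line i unmatched.
        heapq.heappush(heap, (g + 1.0 + h[i + 1], next(tie), g + 1.0, i + 1, used, pairs))
        # Option 2: match it to any still unused candidate target.
        for j, c in options[i]:
            if j not in used:
                heapq.heappush(heap, (g + c + h[i + 1], next(tie), g + c,
                                      i + 1, used | {j}, pairs + ((i, j),)))
    return [], math.inf
```

With two source lines and three target lines, where target 2 is a weaker alternative for source 0, the search keeps the closer, parallel pair:

```python
src = [((0, 0), (10, 0)), ((10, 0), (10, 10))]
tgt = [((0, 0.5), (10, 0.5)), ((10.4, 0), (10.4, 10)), ((0, 0.8), (10, 0.8))]
merit = find_candidates(src, tgt, width=1.0, max_angle=30.0)
pairs, cost = best_combination(merit, len(src))  # pairs == [(0, 0), (1, 1)]
```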
A point-based matching method was proposed, for example, in
(Bofinger 2001). The algorithm developed here is based on the
idea of describing intersections of streets, i.e. nodes of a street
network, by an explicitly defined code. The code consists of
point coordinates, abbreviations and names of incident streets
and the number of linked edges. For each intersection, such a
code is created. By comparing the codes of the intersections
within different data sets and by assigning the intersections with
the most similar codes to each other, references can be derived.
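A minimal sketch of the code comparison, assuming a code made of coordinates, the set of incident street names or abbreviations, and the node degree. The equal weighting of the three components, the 50-unit distance cutoff and the acceptance threshold are hypothetical choices, not values from (Bofinger 2001).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntersectionCode:
    x: float
    y: float
    streets: frozenset[str]  # names or abbreviations of incident streets
    degree: int              # number of linked edges


def code_similarity(a: IntersectionCode, b: IntersectionCode, max_dist: float = 50.0) -> float:
    """Hypothetical similarity in [0, 1] combining position, names and degree."""
    d = math.hypot(a.x - b.x, a.y - b.y)
    if d > max_dist:
        return 0.0
    position = 1.0 - d / max_dist
    union = a.streets | b.streets
    names = len(a.streets & b.streets) / len(union) if union else 0.0  # Jaccard index
    degree = 1.0 if a.degree == b.degree else 0.0
    return (position + names + degree) / 3.0


def match_intersections(set_a, set_b, threshold: float = 0.5) -> dict[int, int]:
    """Assign each intersection in set_a to the most similar one in set_b."""
    refs = {}
    for i, a in enumerate(set_a):
        scores = [(code_similarity(a, b), j) for j, b in enumerate(set_b)]
        if not scores:
            continue
        best, j = max(scores)
        if best >= threshold:
            refs[i] = j
    return refs
```

Intersections without any sufficiently similar partner simply receive no reference.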
The problem of conflation is tackled, for example, by
(Cobb et al. 1998). There, the merging process is defined as
"feature deconfliction", where all parts of a matched feature pair
are unified into a single "better" feature. The conflation
algorithm has to decide which properties are preserved in the
resulting instance. In their approach, the authors also take
into account the data quality information of the corresponding
instances.
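Quality-driven deconfliction of a matched pair can be sketched as an attribute-wise merge. The sketch assumes that each source supplies a quality score in [0, 1] per attribute (a hypothetical encoding) and keeps, for every attribute, the value from the better source; it is not the algorithm of (Cobb et al. 1998).

```python
def deconflict(feature_a: dict, feature_b: dict,
               quality_a: dict[str, float], quality_b: dict[str, float]) -> dict:
    """Merge a matched feature pair into one feature, attribute by attribute."""
    merged = {}
    for key in feature_a.keys() | feature_b.keys():
        if key not in feature_b:
            merged[key] = feature_a[key]      # only A knows this attribute
        elif key not in feature_a:
            merged[key] = feature_b[key]      # only B knows this attribute
        else:
            # Both sources carry the attribute: keep the higher-quality value
            # (ties go to A; missing quality scores count as 0).
            qa = quality_a.get(key, 0.0)
            qb = quality_b.get(key, 0.0)
            merged[key] = feature_a[key] if qa >= qb else feature_b[key]
    return merged
```

Treating geometry as just another attribute keeps the sketch short; a real conflation step would usually also adjust geometries rather than only pick one.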
2.2 Semantic data integration and ontologies
According to (Uitermark et al. 1999), semantic integration can
be understood as a communication process, since two partners
who want to communicate must share the same understanding
of the objects they are talking about.
In the database domain, (Do and Rahm 2002) have addressed
schema matching, comparing schemas by criteria such as
element names, data types, or further structural information. In
the field of GIS, many different approaches based on ontologies
have been developed.
Ontologies can be defined as formalized specifications of
concepts about objects of the real world from a certain
application perspective (Gruber 1993). Whereas database schemas
require a digital representation, ontologies are just abstract
views on the semantics of things. There is only one ontology for
an object in a certain application domain, but there can be
multiple database representations (Fonseca et al. 2002).
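The comparison of schema elements by name and data type, as mentioned for (Do and Rahm 2002), can be illustrated with a small combined matcher. This is a hedged sketch, not their system: the type-compatibility groups, the weights and the threshold are hypothetical, and name similarity uses Python's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical groups of mutually compatible data types.
TYPE_GROUPS = [
    {"int", "integer", "smallint", "bigint"},
    {"float", "double", "real", "decimal"},
    {"varchar", "char", "text", "string"},
]


def type_similarity(t1: str, t2: str) -> float:
    t1, t2 = t1.lower(), t2.lower()
    if t1 == t2:
        return 1.0
    if any(t1 in group and t2 in group for group in TYPE_GROUPS):
        return 0.5  # different but compatible types
    return 0.0


def name_similarity(n1: str, n2: str) -> float:
    return SequenceMatcher(None, n1.lower(), n2.lower()).ratio()


def match_schemas(schema_a: dict[str, str], schema_b: dict[str, str],
                  w_name: float = 0.7, w_type: float = 0.3,
                  threshold: float = 0.6) -> dict[str, str]:
    """Each schema maps element name -> data type; returns A element -> best B element."""
    result = {}
    for elem_a, type_a in schema_a.items():
        best = max(
            ((w_name * name_similarity(elem_a, elem_b)
              + w_type * type_similarity(type_a, type_b), elem_b)
             for elem_b, type_b in schema_b.items()),
            default=None,
        )
        if best is not None and best[0] >= threshold:
            result[elem_a] = best[1]
    return result
```

Elements whose best combined score stays below the threshold are left unmatched instead of being forced onto a partner.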
Consequently, concerning schema integration, two cases have to
be considered (Hakimpour and Timpf 2001):
1. Database schemas are based on the same ontology:
only synonyms and homonyms have to be detected to
perform an integration.
2. Database schemas are based on different ontologies
(from different application domains): a common
ontology has to be created by detecting the
similarities of the source ontologies.
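For the first case, a minimal sketch assumes that every class name in a schema is annotated with the identifier of the concept it denotes in the shared ontology (a hypothetical encoding such as `"ont:Road"`). Synonyms are then different names for the same concept, and homonyms are identical names for different concepts.

```python
def synonyms_and_homonyms(schema_a: dict[str, str], schema_b: dict[str, str]):
    """Each schema maps a class name to a concept id of the shared ontology."""
    # Synonyms: different class names that denote the same concept.
    synonyms = sorted(
        (name_a, name_b)
        for name_a, concept_a in schema_a.items()
        for name_b, concept_b in schema_b.items()
        if concept_a == concept_b and name_a != name_b
    )
    # Homonyms: identical class names that denote different concepts.
    homonyms = sorted(
        name for name in schema_a.keys() & schema_b.keys()
        if schema_a[name] != schema_b[name]
    )
    return synonyms, homonyms
```

For example, "Street" and "Road" mapped to the same road concept are reported as synonyms, while "Bank" meaning a financial institution in one schema and a riverbank in the other is reported as a homonym.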
The authors present a formalism for the representation of
ontologies, the so-called Description Logic (DL). Each user
community can define its perception of an object using DL, and
the different ontologies can then be merged.
Another example of how to integrate different semantics of
spatial data is provided by (Bishr et al. 1999). The approach
consists of two components, the Semantic Wrapper and the
Semantic Mapper. Objects of different spatial databases are
wrapped by the Semantic Wrapper and have to conform to a
predefined interface so that they can be recognised by the
Semantic Mapper. This interface is specific for a certain
application domain like transportation, topography, etc. On the
level of the Semantic Mapper, the semantics of two objects can
be compared and the schematic and semantic differences
between them can be resolved.
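The wrapper and mapper roles described above can be sketched with an abstract domain interface. Everything here is hypothetical (the interface methods, the record fields and the category codes of the two sources); the point is only that the mapper compares objects exclusively through the interface that each wrapper implements, as in the approach of (Bishr et al. 1999).

```python
from abc import ABC, abstractmethod


class TransportationObject(ABC):
    """Hypothetical interface for the 'transportation' application domain."""

    @abstractmethod
    def category(self) -> str: ...

    @abstractmethod
    def name(self) -> str: ...


class WrapperA(TransportationObject):
    """Wraps records of a hypothetical source A that encodes categories as numbers."""

    CODES = {1: "road", 2: "railway"}

    def __init__(self, record: dict):
        self.record = record

    def category(self) -> str:
        return self.CODES.get(self.record["cls"], "unknown")

    def name(self) -> str:
        return self.record["nm"]


class WrapperB(TransportationObject):
    """Wraps records of a hypothetical source B with free-text type labels."""

    def __init__(self, record: dict):
        self.record = record

    def category(self) -> str:
        return self.record["type"].strip().lower()

    def name(self) -> str:
        return self.record["label"]


def semantic_mapper(a: TransportationObject, b: TransportationObject) -> bool:
    """Compare two wrapped objects only through the shared domain interface."""
    return a.category() == b.category() and a.name().casefold() == b.name().casefold()
```

Adding a third source then only requires a new wrapper; the mapper itself stays unchanged.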
2.3 Standardization
The question of interoperability of GIS is mainly addressed by
the OpenGIS Consortium (OGC 2004) and the Technical
Committee 211 of the International Organization for Standardization
(ISO-TC211 2004). Both institutions are closely linked.