International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
4. DATA INTEGRATION
4.1 Overview
Data integration is a topical research subject covering many
aspects from a variety of domains. In this part of the
GEOTECHNOLOGIEN project the main focus is the integration of
heterogeneous vector data sets. Data integration, or map
conflation, can be divided into horizontal and vertical
integration. Horizontal conflation refers to the edge-matching
of adjacent maps with the objective of eliminating spatial and
thematic discrepancies in the common area of the maps, whereas
vertical conflation describes the integration of two (or more)
maps covering the same area with differences in data
modelling, thematic content and accuracy (Yuan & Tao, 1999).
The integration of ATKIS and the geoscientific maps differs
slightly from the common definition of map conflation, because
the aim of the project is not to develop a new master data set
(Beller et al., 1997) but to enhance the geometric accuracy of
the geoscientific data sets. The creation of a master set is
not recommended in this project because ATKIS has been chosen
as the reference data set on account of its higher geometric
accuracy and currency. The topographic content of the
geoscientific data sets is therefore adjusted to this
reference data set.
The integration process involves several mandatory tasks. The
geometric accuracy of ATKIS, which results from its higher
acquisition accuracy and more frequent updates, should be used
to correct and enhance the geometric content of the
geoscientific data sets and to avoid parallel updating.
4.2 Semantic Differences
At the beginning of the integration process the semantic models
of all data sets are compared; at this stage of the project
this means their thematic contents. Topographic elements which
are represented in all three data sets are selected and used as
candidates for the matching process. This selection is
mandatory to avoid comparing “apples and oranges” and has to
be the first step to ensure a successful integration.
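The selection step can be illustrated with a minimal sketch. It
assumes that each data set's thematic content is available as a
set of feature class names. Apart from "water areas", the class
names are hypothetical and chosen only for illustration, as is
the function name common_feature_classes.

```python
# Hypothetical thematic contents of the three data sets.
# Only "water areas" is taken from the text; the rest is illustrative.
thematic_content = {
    "ATKIS": {"water areas", "roads", "forest"},
    "geological map": {"water areas", "roads", "faults"},
    "soil-science map": {"water areas", "soil units"},
}


def common_feature_classes(contents):
    """Return the feature classes that occur in every data set."""
    sets = list(contents.values())
    return set.intersection(*sets) if sets else set()


# Only classes present in all data sets become matching candidates.
candidates = common_feature_classes(thematic_content)
```

Classes missing from any one data set are dropped before any
geometric comparison takes place.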
Four different types of data integration are defined in (Walter &
Fritsch, 1999).
• I: stemming from the same data source with unequal
updating periods,
• II: represented in the same data model, but acquired
by different operators,
• III: stored in similar, but not identical data models,
• IV: from heterogeneous sources which differ in data
modelling, scale, thematic content.
The integration to be performed in this project can be
categorized as type IV.
In the first phase of this project, the topographic feature class
"water areas" has been chosen as a candidate for developing
and testing, because of the presence of this topographic element
in all data sets.
5. INTEGRATION WORKFLOW
One aim of the project is that the research results can be
applied in real applications. All research is therefore pursued
in close partnership with external partners from geology and
soil science.
5.1 Application framework
At this point of the project the first research results and
selected algorithms have been implemented in a software
prototype. Vivid Solutions has developed an open-source GIS
application written in the Java programming language. The
Unified Mapping Platform JUMP is a GUI-based application for
viewing and processing spatial data. It includes many spatial
and GIS functions and is designed as a highly extensible
framework for developing and running customized spatial data
processing applications. JUMP is based on the Java Topology
Suite JTS, a Java programming library which offers various
modules for the development of highly adapted software
applications for data integration (JUMP 2004).
Using this system, which represents data according to the OGC
standard, a software prototype is being developed which serves
as a testbed for different matching algorithms and is used for
visualization of the original data sets and the matching
results. The concept of the federated database provides that
all the original data sets are kept; the links between
corresponding objects in the different data sets, however, are
stored explicitly.
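How such explicit links might be held alongside untouched
original data can be sketched as follows. This is not the
project's database schema: the class names ObjectRef and
LinkStore, their fields and the object identifiers are all
assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical reference to an object in an original data set."""
    dataset: str
    object_id: str


@dataclass
class LinkStore:
    """Holds links between corresponding objects; originals stay untouched."""
    links: set = field(default_factory=set)

    def add(self, a: ObjectRef, b: ObjectRef) -> None:
        # A link is undirected, so an unordered pair is stored.
        self.links.add(frozenset((a, b)))

    def partners(self, ref: ObjectRef) -> set:
        """Return all objects linked directly to `ref`."""
        return {other for link in self.links if ref in link
                for other in link if other != ref}


store = LinkStore()
atkis_lake = ObjectRef("ATKIS", "w1")          # illustrative identifiers
geo_lake = ObjectRef("geological map", "g7")
soil_lake = ObjectRef("soil-science map", "s3")
store.add(atkis_lake, geo_lake)
store.add(atkis_lake, soil_lake)
```

Keeping the links in a separate structure means that a matching
result can be revised without modifying any source data set.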
5.2 Data preparation
Before the integration process can be started, all data sets
to be used in the integration workflow have to be pre-processed
into a common data format.
In this project a federated database is being developed which
is capable of importing the data sets in their original format,
converting them to a common standard and storing them in a
single data management system (Tiedge et al., 2004).
5.2.1 Harmonisation
Water objects in ATKIS are represented in two different ways:
water areas and rivers exceeding a certain width are
represented as polygons. Narrower rivers are digitised as lines
and are assigned additional attributes referring to classified
ranges of widths. In the geoscientific maps, water objects are
always represented as polygons.
These differences have to be adjusted before integration starts.
For the first implementation a simple buffer algorithm has been
chosen, which uses the line representation from ATKIS as the
centre line together with the width attribute. This enables the
operator to compare the resulting ATKIS polygon and the water
object from the geoscientific maps using a simple intersection.
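The buffer-and-intersect idea can be sketched in a few lines.
This is a simplified illustration, not the prototype's code: it
buffers a single straight segment with flat ends, assumes that
one representative width has already been derived from the
ATKIS width class, and clips with the Sutherland-Hodgman method,
which is valid because such a buffer is convex. All function
names and coordinates are hypothetical.

```python
import math


def buffer_segment(p, q, width):
    """Flat-ended buffer of segment p-q as a counter-clockwise rectangle."""
    (x1, y1), (x2, y2) = p, q
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("degenerate segment")
    # Unit normal pointing to the left of the direction p -> q.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    h = width / 2.0
    return [(x1 - nx * h, y1 - ny * h), (x2 - nx * h, y2 - ny * h),
            (x2 + nx * h, y2 + ny * h), (x1 + nx * h, y1 + ny * h)]


def area(poly):
    """Polygon area by the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip(subject, clipper):
    """Clip `subject` against a convex, counter-clockwise `clipper`."""
    def inside(pt, a, b):
        return (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0]) >= 0

    def intersect(s, e, a, b):
        dx1, dy1 = e[0] - s[0], e[1] - s[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - s[0]) * dy2 - (a[1] - s[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (s[0] + t * dx1, s[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inp, output = output, []
        s = inp[-1]
        for e in inp:
            if inside(e, a, b):
                if not inside(s, a, b):
                    output.append(intersect(s, e, a, b))
                output.append(e)
            elif inside(s, a, b):
                output.append(intersect(s, e, a, b))
            s = e
    return output


# Illustrative data: a river centre line with an assumed width of 2 units
# and a water polygon from a geoscientific map.
river_polygon = buffer_segment((0.0, 0.0), (10.0, 0.0), 2.0)
geo_water = [(2.0, -0.5), (12.0, -0.5), (12.0, 1.5), (2.0, 1.5)]
overlap = clip(geo_water, river_polygon)
overlap_ratio = area(overlap) / area(river_polygon)
```

A real river consists of many segments and bends, so a full
implementation would rely on a general buffer and overlay
routine such as those provided by a geometry library.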
Another problem is the representation of grouped objects in
different maps. A group of water objects, e.g. a group of
ponds, may be represented in the different data sets as a
group with the same or a different number of objects, or even
as a single generalised object. Finally, objects may be present
in one data set but not represented in the other.
All these considerations lead to the following relation
cardinalities that have to be integrated: 1:0, 1:1, 1:n, and n:m.
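The four cardinalities can be derived from a list of pairwise
correspondences, as in the following sketch. It assumes that
candidate links between two data sets A and B are already known,
and it treats every connected group of linked objects as one
relation. The identifiers and function names are hypothetical.
"1:0" covers an unmatched object in either data set and "1:n"
covers both directions.

```python
from collections import defaultdict


def cardinality(n_a, n_b):
    """Label a group by the number of objects it holds from A and from B."""
    lo, hi = sorted((n_a, n_b))
    if lo == 0:
        return "1:0"
    if hi == 1:
        return "1:1"
    if lo == 1:
        return "1:n"
    return "n:m"


def classify_links(ids_a, ids_b, links):
    """Group linked objects and label each group with its cardinality."""
    adj = defaultdict(set)
    for a, b in links:
        adj[("A", a)].add(("B", b))
        adj[("B", b)].add(("A", a))
    nodes = [("A", i) for i in ids_a] + [("B", i) for i in ids_b]
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Collect the connected component that contains `start`.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        a = sorted(i for side, i in comp if side == "A")
        b = sorted(i for side, i in comp if side == "B")
        result.append((cardinality(len(a), len(b)), a, b))
    return result


# Illustrative identifiers, e.g. ponds in two maps.
ids_a = ["a1", "a2", "a3", "a4", "a5"]
ids_b = ["b1", "b2", "b3", "b4", "b5"]
links = [("a1", "b1"), ("a2", "b2"), ("a2", "b3"),
         ("a3", "b4"), ("a4", "b4"), ("a4", "b5")]
groups = classify_links(ids_a, ids_b, links)
```

In this example a5 has no partner and is therefore reported as
1:0, while a3, a4, b4 and b5 form a single n:m group.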
5.3 Geometry based matching
5.3.1 Selection Sets
As mentioned in 4.2, the data delivered by the data management
system are selected using specified feature attributes,
resulting in three selection groups (ATKIS, geological map and
soil-science map).
Due to the fact that the objects from all three data sets are
representations of the same real world objects, they show