International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
These models are either given by hand or can also be acquired
using machine learning approaches (Sester, 2000). The
interpretation of vector data sets is a fairly new application. It
has mainly been investigated in the context of spatial data
mining (Koperski & Han, 1995).
3. METADATA DESCRIPTIONS OF SPATIAL DATA SETS
3.1 Metadata in SPIRIT
Metadata can store information about spatial data sets. They
are structured data that describe resources and enable users or
agents to select and assess the data. However, there are two
major problems:
First, the expressiveness of metadata depends heavily on the
scheme used. Many existing schemes define the content more or
less strictly. The ISO 19115 standard (ISO/TC-211, 2003) is
designed especially for geographical data sets. The metadata
used in SPIRIT conform closely to this existing international
standard. However, we identified a set of metatags that are of
essential importance for SPIRIT.
Secondly, enrichment with metadata is still a process that has
to be done manually for the most part. Although some tools
support data entry through interfaces and predefined lists of
terms, the cost in manpower and time of entering the data is
still an almost insurmountable obstacle. As a result, only a few
web sites and information resources are enriched with metadata.
For this reason, tools that generate metadata automatically
would be preferable. We illustrate this aim using the example of
ArcView projects and shape files.
3.2 Automatic Extraction of metadata
For SPIRIT, we considered the following metatags to be of high
importance: name, spatial extent, keywords, contact and
resolution. In this section we illustrate the automatic
extraction of metadata from ArcView shape files. Of special
relevance here is the discovery of keywords describing the
stored spatial elements.
The following information can be extracted easily from the ESRI
shape format:
- minimum bounding box
- number of geometrical elements
- type of geometrical elements, like point, line, polygon
- information about the attributes and their structure,
like name, type
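The items listed above can be read directly from the file headers. The following is a minimal sketch, not the SPIRIT implementation: it assumes the published ESRI shapefile layout (a 100-byte header in the .shp and .shx files) and the dBASE field descriptors of the .dbf file. The function names are hypothetical and chosen only for illustration.

```python
import struct

# Subset of the shape type codes defined for ESRI shapefiles.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Parse the fixed 100-byte header of a .shp or .shx file."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    (file_code,) = struct.unpack(">i", header[0:4])        # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    (length_words,) = struct.unpack(">i", header[24:28])   # length in 16-bit words
    _version, shape_type = struct.unpack("<2i", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "shape_type": SHAPE_TYPES.get(shape_type, f"type {shape_type}"),
        "bbox": (xmin, ymin, xmax, ymax),                   # minimum bounding box
        "file_bytes": length_words * 2,
    }


def count_records_from_shx(shx_header: bytes) -> int:
    """Number of geometrical elements: the .shx file holds one
    8-byte index record per element after its 100-byte header."""
    return (parse_shp_header(shx_header)["file_bytes"] - 100) // 8


def parse_dbf_fields(dbf: bytes) -> list:
    """Return (name, type, length) for each attribute in a .dbf file.
    Field descriptors are 32 bytes each, start at byte 32 and are
    terminated by the byte 0x0D."""
    fields = []
    pos = 32
    while pos + 32 <= len(dbf) and dbf[pos] != 0x0D:
        desc = dbf[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii", "replace")
        fields.append((name, chr(desc[11]), desc[16]))
        pos += 32
    return fields
```

In practice the first 100 bytes of the .shp and .shx files and the complete .dbf file would be read in binary mode and passed to these functions.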
That information is important for interpreting the geometrical
aspect of a data set. However, it does not tell us much about the
semantics of the data. Particularly if the names of the
predicates are coded by numbers or abbreviated, as in the
example given in table 1, the primary information of the shape
files is insufficient.
SHAPE     AOBJID   TEIL  OBJART  OART  ATYP
PolyLine  N01CZ70  001   3102    3102
PolyLine  NOICZIS  002   3105    3105  1301
PolyLine  N20LHCN  001   3106    3106
Table 1. ATKIS record, excerpt of the corresponding dbf file
From this it is not apparent that the data represent a road
network, which is displayed in figure 1.
At the very least, we need to know which data are coded in the
set in order to provide an internet user with the right
information. So far we only know the type of the elements, for
example that there are lines, but we do not know whether the
lines are streets, pipelines, administrative boundaries or
contour lines. To detect this information, we analyse the shape
files and, if a legend is available, extract further information
from the ArcView project file to derive adequate keywords
automatically. The following example documents the process.
Figure 1. Road network data set
Figure 2 shows the automatically extracted metadata.
<Metadata>
<Name> Straßenverkehr (104) - Objektteil-Linien </Name>
<Path> u:/atkis/jade_weser_port/arcview/F104_It.shp </Path>
<CreationDate> 15. April 2004 16:56:52 </CreationDate>
<Keywords>
220 Straße
20 Bundesautobahn
56 Landesstraße, Staatsstraße
49 Forststraße
331 Gemeindestraße
sonst. Straße
223 Weg
51 Fahrbahn
</Keywords>
<Number of Entities> 1611 </Number of Entities>
<ShapeType> Polylines </ShapeType>
<Min X> 3437212.720 </Min X>
<Min Y> 5934313.340 </Min Y>
<Max X> 3445062.260 </Max X>
<Max Y> 5944527.390 </Max Y>
</Metadata>
Figure 2. Metadata for the displayed road network data set,
distinguishing different types of road (in German)
All available data are analysed to acquire the keywords. Text
files are checked to identify street names and designations of
regions. Captions often give a glimpse of the character of the
stored geographical elements, as well as the names of the
attributes in the dbf files.
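The keyword step can be pictured as counting the object-type codes stored in the dbf records and translating them through a legend. The sketch below is only an illustration under stated assumptions: the function `derive_keywords`, the `OBJART` default and the legend dictionary are hypothetical, and the code-to-label pairs in the usage note are illustrative rather than taken from the ATKIS catalogue.

```python
from collections import Counter


def derive_keywords(records, legend, code_field="OBJART"):
    """Count the object-type codes in the attribute records and map
    them to legend labels. Returns (count, label) pairs, most
    frequent first; codes without a legend entry are skipped."""
    counts = Counter(rec[code_field] for rec in records if rec.get(code_field))
    keywords = []
    for code, n in counts.most_common():
        label = legend.get(code)
        if label is not None:
            keywords.append((n, label))
    return keywords
```

For example, with a hypothetical legend `{"3102": "Weg", "3106": "Fahrbahn"}` and records whose `OBJART` values are `3102`, `3106` and `3102`, the function returns the pair for "Weg" with count 2 followed by the pair for "Fahrbahn" with count 1, which matches the count-plus-label layout of the keywords in figure 2.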
The spatial extent of the data set is determined by the minimum
bounding box. Moreover, there are also some indicators from
which the scale or the level of detail of the data set can be
inferred. Analysing only the geometry of features, a simple
measure for the scale of a data set is the distance between the
individual points of which a line or a polygon is composed.
Furthermore, the existence and type of certain geographic
elements also indicate a certain resolution: buildings are
typically present only at large scales, and roads are typically
represented as areal objects at large scales, whereas at small
scales they are given as polylines.
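The geometric indicator mentioned above, the distance between the individual points of a line, could be computed as follows. This is a minimal sketch that assumes each line is given as a list of (x, y) tuples in a projected coordinate system. The function name is hypothetical, and no mapping from the resulting value to a concrete map scale is implied.

```python
import math


def mean_vertex_spacing(lines):
    """Average distance between consecutive points over all lines.
    A smaller value suggests a more detailed (larger-scale) data set."""
    total = 0.0
    segments = 0
    for line in lines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            total += math.hypot(x1 - x0, y1 - y0)
            segments += 1
    if segments == 0:
        raise ValueError("no line segments found")
    return total / segments
```

The same function works for polygon outlines if each ring is passed as a list of points.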