INDEXING TREE METHODS AND SPATIAL ORDERING FOR MAPS AND GEOGRAPHIC DATA: AN OVERVIEW
AND APPLICATION TO THE GEODETIC GIS PROJECT
L. Biagi, M. A. Brovelli, M. Negretti and C. Saldarmi
Politecnico di Milano, Facoltà di Ingegneria di Como
[ludo® gemini,como.polimi.it:maria@ipmtf4.topo.polimi.it: marco@gemini.como.polimi.it; cristian@qemini.como.polimi.it ]
KEY WORD: Indexing trees methods, spatial ordering, RDBMS access, Geodetic GIS.
PURPOSE:
The problem we want to deal with in this presentation is related to the extension of the data files or of the tables in a
RDBMS interfaced with spatial applications, like GIS. Usually in this case, either in case we are managing images or
gridded DTMs or very large sets of sparse objects (points, lines, areas or complex features), the goal is to find methods
allowing to reduce the high number of disk accesses that may be required to retrieve the attributes associated to a given
area.
Even if in the last years the technology has reduced the access time both to the disk and to central physical memory, the
ratio disk/memory remains about 10 4 -10 5 , therefore the number of disk accesses is untill now the most relevant
performance parameter of data structures. In this frame the challenge is to use an amount of central memory as small as
possible to describe the current allocation of data on disk in order to accelerate the retrieval of any object in the data
structure. And, because of the spatial content of the object, which is fundamental information in computer-aided design,
GIS and image recognition, the retrieval should be based on its geographical location.
In this paper we present an overview of the more important existing spatial data structures and their related space
partitions. In the last paragraph we describe the method used in the case of the ITALGEO-GIS.
1.THE PROBLEM OF SPATIAL DATA ACCESS
In order to understand the problem, we suppose to have a
set of sparse points (which is the easiest case) in which
we have some interesting information. For example we
can consider a set of points at which GPS measurements
have been done, so that the ellipsoidal height at these
locations are available. If we take into account (for the
sake of simplicity) only four of these points(see Fig.1), we
may describe them as belonging to a point layer in a GIS
structure and we may associate to their geometrical
description the RDBMS point relation, given in Table 1.
Fig. 1 - Locations of the considered points
Pointjd
Latitude
Longitude
1
44°50'
12°20'
2
43°60'
13°60'
3
42°50'
11°30'
4
41°90'
16°10'
Table 1 - Relation “Point"
Table 1 may be ordered by an index referred to one of its
columns; if, for example, we sort the table with respect to
the longitude the point sequence is 3, 1, 2, 4 and we have
no information about the actual distance of the points. The
same obviously happens if we sort with respect to the
other column. So, independently from the choice of the
attribute, it is impossible in this table to order the points
with respect to their distance, which is the spatially
meaningful information.
In case of bidimensional query on a region (Fig. 2) the
inefficiency of the relational model becomes evident: if we
want all the points in the dark rectangular region, we have
to examine the entire latitude and longitude intervals
corresponding to the light rectangular regions. The
availability of only one index to order the data avoids
these problems.
So let us summarize what we have seen by observing that
the spatial data structure is characterized by the fact that
the embedding space plays an important role.
In traditional RDBMS a comparative search (binary
search), based on a comparison of the relative values of
the elements stored, regardless their position in space,
and statistical queries were used.
If we want to introduce general location queries (e.g.:
which point is closest to a given query point), it is
impossible to efficiently generalize the monodimensional
indexing methods of the standard structures. In particular
if we generalize the concept of balanced binary search
tree to two or more dimensions (k-D-tree), we generate