and
/ or
\ain-
ures
data.
and
CESS
ss of
d by
es of
mity
and
hods
ries,
e for
tally
dex.
ever,
one-
They
r this
Ccess
s: and
One
hods,
ional
y are
nded
non-
S of
g of
ith a
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
2. SPATIAL ACCESS METHODS
2.1 Indexing in Database Systems
The main task of a database system (DBS) is to store large sets
of data persistently. The database management system (DBMS)
must support the insertion, modification and deletion of
arbitrary data in arbitrary sequences. For this reason, the DBMS
organizes the data in database blocks. The access to secondary
storage (i.e. typically to hard disks) is performed blockwise,
i.c., the access to a data record requires the transmission of (at
least) one complete block that may store also non-required
records.
An index dynamically organizes the database blocks in order to
accelerate the access to blocks containing records that fulfill
some query condition(s) (e.g., all persons born in Istanbul). The
data structures that are used for building and maintaining an
index are called index structures. In current database systems,
two types of index structures are most often used: B-trees and
hashing.
A B-tree is a dynamic balanced tree. Each of its nodes
corresponds to a database block. B-trees store the data sorted
according to a selected attribute. For processing a query, the
tree is traversed starting at the most upper node (= the root):
only subtrees are accessed that potentially refer to queried data.
Figure 1 illustrates a B-tree.
leaf node
Figure 1. Example for a B-tree.
Hashing computes the location of a block on secondary storage
(i.e. the block address) using one or more selected attribute(s)
of a record. This computation is done by a hash function. Figure
2 depicts the hashing approach. Hashing supports efficiently
exact match queries, i.e. the search for records with attribute
value(s) that exactly match to the query condition (like in the
above Istanbul example). However, hashing has efficiency
problems either with handling uneven data distributions or with
range queries (like finding all persons born in a city whose
name starts with I).
block addresses: 0 1 2 3
32 9 14 11
hash function: 18 15
h : S(x) MOD 4
Y Y
26 27
Figure 2. Example for hashing.
99
2.2 Indexing Spatial Data and Point Data
B-trees require a linear ordering of the data and hashing has —
as mentioned before — problems with uneven data distributions
or with range queries. Because of this reasons, conventional
cannot be used — without extensions — for organizing spatial
data. Therefore, special spatial access methods have been
developed for spatial database systems and Geographical
Information Systems. Point access methods allow organizing
multidimensional points and rectangle access methods — in
addition — the storage of extended multidimensional objects like
rectangles, cuboids, and (in approximation) of polygons, arcs
and solids.
The grid file (Nievergelt et al., 1984) is an example for a
multidimensional point access method. It is based on hashing.
However, the hash function is replaced by a grid directory. This
directory stores block addresses in its cells (see Figure 3). Grid
files have performance deficits storing uneven or correlated
distributed points.
sr —""—— ^| grid directory
M
{
data block 3 m. ries diy
1 1 ST
Ur
U
; i
if oo o
26 bd s : E
! . |
| 2 4 A data block 1
data block 2 Li ir
i
0
umm te Fil
+
-10 7 25
Figure 3. Example for a grid file.
The partitioning of the data space by grid files has following
properties:
e The region described by a database block (the so-called
block region) is rectangular.
e The data space is completely covered by the block regions.
e The block regions do not overlap.
However, for achieving efficient spatial access methods, at least
one of these three properties must not hold (Seeger, 1989).
2.3 Hash Trees
Hash trees are multidimensional point trees that combine
hashing with data structures derived from trees. A typical
example of a hash tree is the BANG file (Balanced and Nested
Grid File) developed by M. Freeston (1987). The BANG file is
a hierarchical tree. The upper part of the tree is the directory
and the leaf nodes store the real data (“data nodes”). The block
regions of the directory nodes are based on a grid structure and
represented by a (multidimensional) rectangle. In contrast to
conventional grid files, a block region does not represent the
complete area of this rectangle. Instead, the included rectangles
of the smaller block regions in the same node are removed from
the rectangle. In consequence, the shape of block regions is
irregular and may consist of several, unconnected areas.
Figure 4 shows a set of points organized by a BANG file. The
points are distributed on an area having the shape of a sinus
curve. The figure depicts the partitioning of the BANG file of
all nodes having the same height in the tree.
"ODE
€