4. SPACE DRIVEN PARTITION BY TREES
(THE EXAMPLE OF THE HHCODE)
This data organization type allows to store the spatial data
using a monodimensional code, named Helical
Hyperspatial Code (HHCODE).
The record structure is shown in Fig. 18:
Fig. 18 - Structure of a HHCODE record
The method implemented by this tree structure uses an
iterative decomposition of the space. The whole region is
divided into four quadrants, to each of which a code from
0 to 3 is assigned. The data belonging to each of these
quadrants are assigned a CODE field which equals the
relative code of the region of affiliation. When the number
of data inside a subregion becomes greater than a fixed
threshold, a new uniform subdivision into four quadrants
occurs; to each subregions a new code from 0 to 3 is
again assigned.
The operation of continuous decomposition till the wanted
resolution level is reached by adding a further digit to the
CODE field.
By using such a storing procedure, an efficient search
ordering method can be simply built on the HHCODE. An
interrogation on a region defined by the two couples of
coordinates (x1, y1), (x2, y2) reduces to a query on those
data with a CODE value corresponding to this region.
5. THE GEODETIC-GIS EXPERIMENT
The first step in the construction of the geodetic archive
for the ITALGEO-GIS project (L. Biagi et al., 1998; L.
Biagi et al, 1999) consists in the comparison between the
performances obtained by using the Grass archive or a
data structure, like an RDBMS, suitable for geographic
applications (M. A. Brovelli et al., 1999). To this purpose
there are several commercial databases that deal with
techniques for storing geographic data. The best-known
products are Oracle Universal Server, Informix Spatial
Datablade, VISION Sybase and ESRI Spatial Data
Engine, which is not a DBMS, but is an engine working
with data stored in Oracle, Informix, Sybase, DB2,...
In the following paragraph we consider the object-
relational DBMS Postgres, originally developed within the
University of California at the Berkeley Computer Science
Department. This DBMS has the advantage to be free and
source code available.
The fundamental notion in Postgres is that of class
(collection of object instances). Each instance has the
same collection of attributes and each attribute is of a
specific type. Each instance has a permanent object
identifier (OID) that is unique.
The last Postgres version, PostgreSQL, allows to index
the database tables by using the B-tree, in the case of
monodimensional data, and by using the R-tree for the
spatial data. In the second case the discriminating
element is the type (size and position) of rectangles used;
the methods supplied by this type of indexing are: " it is
contained in “it contains...", “it is to the left/ to the
right/ above/ under/ as regards to ...", etc.
An indexing structure of this type is therefore particularly
efficient in the case of vectorial data, like arcs and
polygons. Unfortunately in the construction of the
Geodetic GIS the data prevalently used are distributed on
sparse points or on grids.
The use of an R-tree indexing method is not the optimal
choice in these cases because each operation and
comparison involves two couples of coordinates: each
point is described as a rectangle with null area and with
the vertices coinciding. By the way, Postgres is the only
DBMS suitable for spatial data which is freeware;
therefore we decided to try the efficiency of Postgres
versus Grass for managing these two data types.
The archive organization in Grass uses the hierarchical
directory structure of the UNIX system. The main directory
(GISDBASE) contains subdirectories (LOCATIONS),
which are independent databases and contain in their turn
subdirectories (MAPSETS) with files and subdirectories
(ELEMENTS), describing the data and map setting. The
database elements in the mapset are not fixed, but they
are dynamically created by the Grass applications.
Regarding the raster data, they are stored within Grass in
a bidimensional matrix N x M. The corresponding file
contains N lines of M columns of integer values. These
values, representing the attributes associated to each
raster cell, are called categories. In the case of a Digital
Terrain Model (DTM), for instance, each different height
may correspond to a different Grass category.
The information describing raster data are distributed
within more subdirectories (Cell, Cellhd, Cat, Coir, Hist,...)
in the MAPSET according to their contents.
The directory Cellhd, for instance, contains the raster file
header, that is the description of the file characteristics
(see Fig. 19):
• proj: the map projection;
• zone: zone of the corresponding UTM projection;
• north, south, east, west: boundaries of the raster file;
• e-w resol, n-s resol: resolution of the raster file;
• rows, cols: number of lines and columns of the grid;
• format: number of bytes used to represent the
attribute;
• compressed: the field points out if the data are in
compressed (1) format or not (0).
The sparse data are stored in Grass in files within the
sites_list directory. These files (Figure 20) have a header
containing:
• the name of the points contained in the file;
• the description of the attribute associated to the
points.
The data follow, each point entity being stored in a record,
with three attributes: east, north, attribute value.
Note that with this kind of organization it is possible to
associate to each point only one attribute. So, if we have
more attributes, we must have in the Grass archive more
site-points files identical with regard to the point locations,
but with different attributes. Vice versa if we use a DBMS
like Postgres we do not have limits about the number of
attributes to associate to the points, and the data structure
allows us to obtain them in a query procedure with a
single extraction.
typedef struct
{
information_type
INFO;
real
LAT;
real
LON;
hhcode
COD
} node;