1. INTRODUCTION
In the past few years, substantial development
has taken place in the field of managing large
spatial objects such as digital imagery, digital
terrain models and scanned maps, mainly due to the
interest in building multi-media spatial
information systems and global environmental
information systems. The development can be
roughly classified into two categories. The first
category is system integration, which uses
two or more different systems, such as an image
processing system to handle image objects and a
Database Management System (DBMS) to handle
non-image objects such as text and graphics. Examples
can be found in Chang [1990], Zhou Q. [1989],
Wegener [1989], and Zhou [1991]. The second
category is next generation database
systems that use an Abstract Data Type (ADT) or
object-oriented data model to handle large objects. These
systems include Lohman [1989], Orenstein [1989],
Deux [1990], Gupta [1992], and Stonebraker [1993].
However, there are no generally accepted solutions
in GISs at this time. With the first method, the two
systems are loosely integrated. Large objects in the
image processing system are processed
independently and the results are converted into
the DBMS to perform GIS operations. This not only
limits the use of the DBMS for large object
management, but also makes data processing
unnecessarily complicated and time consuming
because of multiple data conversions. With the second
method, all large objects are treated as long binary
data strings with little semantics or data
abstraction associated with them. This is not only
inefficient for data processing, because the whole
data set may need to be read, written and processed
together, but it also makes many kinds of
interactive data processing impossible.
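
The cost of treating a large object as an opaque byte string can be
illustrated with a minimal sketch. It assumes a row-major layout with
one byte per pixel; the functions read_whole and read_window are
hypothetical names introduced here for illustration and do not belong
to any of the systems cited above.

```python
import io


def read_whole(blob: io.BytesIO) -> int:
    """Opaque byte string: with no layout knowledge, every byte is read."""
    blob.seek(0)
    return len(blob.read())


def read_window(blob: io.BytesIO, width: int,
                row0: int, row1: int, col0: int, col1: int) -> list:
    """Read only rows row0..row1-1 and columns col0..col1-1.

    This is possible only because the layout is known
    (assumed here: row-major, one byte per pixel).
    """
    rows = []
    for r in range(row0, row1):
        blob.seek(r * width + col0)          # jump to the start of the window row
        rows.append(blob.read(col1 - col0))  # read just the requested columns
    return rows


# A hypothetical 16 x 16 image whose pixel values are 0..255 in storage order.
width, height = 16, 16
blob = io.BytesIO(bytes(range(width * height)))

whole_bytes = read_whole(blob)                # reads all 256 bytes
window = read_window(blob, width, 1, 3, 2, 5)
window_bytes = sum(len(r) for r in window)    # reads 6 bytes
```

For a small window query, the amount of data touched depends on the
window size rather than on the size of the whole object, which is what
makes interactive processing feasible.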
In this paper we analyze large object contents,
highlight their special features, and develop an
object-oriented model to support them in GISs, using
digital images as examples. We also
investigate their query patterns and present
several methods that can be used to reduce the
amount of data, to improve data retrieval
efficiency, to speed up data queries and to better
support browsing. We then use several
practical GIS query examples to show the
performance improvement obtained with different
techniques. We conclude the paper with some
discussion of system performance and future
research issues.
2. LARGE SPATIAL OBJECTS
Large spatial objects are often represented by
multi-dimensional matrices using long unstructured
byte strings that are often stored and transmitted
as a whole. More precisely, a large object consists of a
list of small items and a long data string. The list of
small items is used to interpret the data
format and meaning of the unformatted long data
string that follows it. For image data, these small
items may be the image header; for Digital
Terrain Model (DTM) data, they may be the name
of the region, the coordinates of the origin,
resolution, precision, etc. These small items are
often mandatory and are used to interpret the long data
string for display and processing, and/or to identify
and distinguish one data string from others. We
call these small items direct related attribute
data (DRAD). While DRAD is indispensable,
other formatted attribute data describing the
contents and features of large spatial objects is
generally not mandatory. For digital imagery, this
data may be the histogram, the color map, and
interpretation results from the original image
data; for a DTM, it may be the contour lines, the
slope and visibility data. This data is often the
result of data calibration, interpretation,
processing and analysis. We call this data derived
attribute data (DAD). DAD is per se
redundant because it is just another form of the
information present in the source data. Usually
DAD is very difficult and/or time consuming to
derive. In GIS, it is desirable to store DAD in the
database because DAD is high level information
and can be used to answer most GIS queries. In
addition, we prefer to integrate DAD into GIS
databases because the DBMS can then be used to
manage DAD.
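
The three components described above (DRAD, the long data string, and
DAD) can be sketched as a simple data structure. This is only an
illustration under stated assumptions, not the model developed in this
paper: the class and field names are hypothetical, and the image is
assumed to be row-major with one byte per pixel.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DRAD:
    """Direct related attribute data: small mandatory items (here a
    hypothetical image header) needed to interpret the data string."""
    name: str
    width: int    # pixels per row
    height: int   # number of rows


@dataclass
class LargeSpatialObject:
    drad: DRAD
    data: bytes                              # long unformatted data string
    dad: dict = field(default_factory=dict)  # derived attribute data (optional)

    def pixel(self, row: int, col: int) -> int:
        # The byte string cannot be interpreted without DRAD: the width
        # tells us where each row starts (assumed row-major, 1 byte/pixel).
        return self.data[row * self.drad.width + col]

    def derive_histogram(self) -> dict:
        # DAD is redundant with the source data but costly to recompute,
        # so it is derived once and kept alongside the object.
        self.dad["histogram"] = dict(Counter(self.data))
        return self.dad["histogram"]


# A hypothetical 3 x 2 image.
image = LargeSpatialObject(DRAD("tile_a", width=3, height=2),
                           bytes([0, 7, 7, 255, 0, 7]))
image.derive_histogram()
```

Once the histogram is stored as DAD, a query such as "how many pixels
have value 7" can be answered from image.dad without touching the long
data string again.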
Because DAD and DRAD are much simpler than the
raw data and can sometimes be modeled well
by the relational data model, many researchers use
this technique to handle large objects. They store
DAD and DRAD in an RDBMS, while manipulating
the long string data through a link between the
relational table and operating system files [Zhou,
1991]. This approach may work for simple
applications, but it has several serious drawbacks.
First, the DAD and DRAD can be semantically rich and
complicated (for example geometry and topology
data) because of the amount of information
embodied in a large object. Second, because the two
databases used are independent, it is very difficult
to maintain data integrity and perform transaction
management. Third, the relational data model
lacks the power to define the semantics inherent
in the large objects and the methods needed to
process large objects. It is indeed not too much