The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B4. Beijing 2008
Traditional commercial RDBMS or ORDBMS systems and the
SQL standards (such as SQL2 and SQL-99) typically support
only simple data types and BLOBs (Binary Large Objects). None
of those standard data types meets the management
requirements of geoimagery and raster gridded data. Even
though geoimage data can be stored directly as BLOBs in
the database, no standard operations have been developed to
manipulate them inside the database system, and the standard
SQL query language does not work on them the way it does on
simple data types. SQL/MM defines a data type, SI_StillImage,
to store and manage still 2D images (ISO, 2001). However, it
stores images in one of the standard image file formats, which
are best used to store only smaller images and are not
specifically designed to support geoimagery and geospatial
raster data types. Scalability and performance are concerns for
both the simple BLOB approach and the SI_StillImage data
type.
Geospatial imagery and raster data are typically huge in size
and have many special metadata associated with them, such as
coordinate system and georeferencing information. The
operations on them are also different from other standard data
types. In order to meet those special requirements, we propose
an enterprise database-centric approach. This approach has
a distinctly database-centric focus and uses server-based image
processing concepts. By database-centric we mean that the raster
data are stored and managed natively inside the database, and
the management and processing functions are implemented
and embedded inside the database and closely and securely
associated with the raster data itself. It is essentially an
enhancement of the RDBMS from the inside.
More specifically, we think it should consist of three major
components: a new native database data type for storing raster
datasets, a server-side image processing and raster operation
engine, and a standard user-friendly interface. It’s designed to
work in a client-server environment as well as in any multi-tier
architecture.
The native object data type in this approach is specifically
designed so that it can be used in the same way as other standard
database data types. The data model is generic for most raster
data types, including geoimagery, so that each image can be
stored as an object in any relational table. The specific format of
the object type fits well into the enterprise RDBMS so that it’s
truly scalable and performant. For example, it allows flexible
user-specified blocking, which means each image stored can be
unlimited in size and adaptable to various applications. One
database table can contain a virtually unlimited number of images,
and various internal spatial indexing mechanisms enable fast
metadata query and raster data retrieval.
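
To make the idea of user-specified blocking concrete, the following is a minimal sketch in Python and not GeoRaster's actual implementation. It assumes a hypothetical block size of 512 x 512 cells and shows how an 8000 x 8000 image divides into blocks, and which blocks a window query has to touch. The function names `block_grid` and `blocks_for_window` are illustrative only.

```python
import math
from typing import Iterator, Tuple


def block_grid(rows: int, cols: int, block_rows: int, block_cols: int) -> Tuple[int, int]:
    """Return the number of blocks along the row and column axes.

    Edge blocks may be only partially filled, hence the ceiling.
    """
    return math.ceil(rows / block_rows), math.ceil(cols / block_cols)


def blocks_for_window(row0: int, col0: int, row1: int, col1: int,
                      block_rows: int, block_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (block_row, block_col) for every block overlapping the
    inclusive cell window [row0..row1] x [col0..col1]."""
    for br in range(row0 // block_rows, row1 // block_rows + 1):
        for bc in range(col0 // block_cols, col1 // block_cols + 1):
            yield br, bc


# Hypothetical block size of 512 x 512 cells (an assumption for illustration).
grid = block_grid(8000, 8000, 512, 512)
needed = list(blocks_for_window(1000, 1000, 1999, 1999, 512, 512))

print("blocks per axis:", grid)
print("blocks touched by the window:", len(needed), "of", grid[0] * grid[1])
```

Because only the blocks that overlap the requested window have to be read, the cost of a small query depends on the window size and not on the size of the whole image, which is the scalability argument made above.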
This approach emphasizes a server-side image processing and
raster operation engine. This improves data security
because the data no longer need to be retrieved and
loaded into a middleware or client through an insecure network
and processed in unmanaged computer memory. The
processing engine is also closest to the data and so runs faster
by avoiding data transfer costs. The processes can be run
concurrently and deployed onto many powerful servers to
reduce the burden on the desktop image processing systems.
The processing engine can be coupled with middleware and
client-side processing systems to fully leverage the power of
enterprise distributed computing systems.
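
The data transfer argument can be illustrated with a rough calculation. The sketch below assumes an 8000 x 8000, 3-band image with one byte per cell (the size of the test images used in this paper) and a hypothetical request for a 1000 x 1000 window. It compares the bytes that would cross the network if the whole image were shipped to a client for cropping with the bytes shipped when the crop is done on the server. It ignores compression, protocol overhead and block alignment.

```python
def raw_bytes(rows: int, cols: int, bands: int, bytes_per_cell: int = 1) -> int:
    """Uncompressed size of a raster with the given shape."""
    return rows * cols * bands * bytes_per_cell


# Client-side processing: the full image is transferred, then cropped.
client_side_transfer = raw_bytes(8000, 8000, 3)

# Server-side processing: only the cropped window is transferred.
# The 1000 x 1000 window is a hypothetical request, chosen for illustration.
server_side_transfer = raw_bytes(1000, 1000, 3)

reduction = client_side_transfer / server_side_transfer
print(f"client-side: {client_side_transfer:,} bytes")
print(f"server-side: {server_side_transfer:,} bytes")
print(f"reduction factor: {reduction:.0f}x")
```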
The approach offers a single data format and a SQL or PL/SQL
API, which dramatically improves usability and simplifies data
access. Usability is one of the key drivers behind this
database-centric approach. SQL is the standard for enterprises, and
enterprise application developers are most familiar with it. By
storing and managing the data inside the database, offering
various indexing and query capabilities, and providing many
basic processing operations through an easy-to-use and standard
interface, this approach allows non-geoimaging experts to easily
integrate geospatial data with enterprise data, quickly leverage
geospatial technologies, and deploy powerful IT resources so
that the geoimagery and related information can be quickly
delivered, distributed and used by different enterprises and mass
consumers.
Oracle GeoRaster, an enterprise database management system
for geospatial raster datasets, was designed based on this
approach. To prove the concept of such a native
database-centric approach, part of the design and some key benefits of
Oracle GeoRaster are further described in the following sections
of this paper. Some tests and research using the Oracle
GeoRaster technology were conducted and are partially
presented as well.
In the tests we used Oracle Database 10g Release 1, which was
installed on Asianux 1.0 Service Pack 1. The Linux server has
4 x 1 GB RAM, 4 x 2.4 GHz CPUs, and 1 x 72 GB internal hard disk.
A Network Appliance NearStore R200 system was used for
database storage. It is a disk-based nearline storage system and
provides near-primary storage performance at near-tape storage
costs. The NetApp Storage consists of 16 disks (14 data disks +
2 parity disks, each disk is 292GB) combined into one global
disk by RAID4. The test dataset includes 50 digital Color Ortho
Images, courtesy of the Office of MassGIS, Commonwealth of
Massachusetts Executive Office of Environmental Affairs.
These 50 images cover the greater Boston area and can be
seamlessly mosaicked into one large image. Each image has
8000 rows, 8000 columns and 3 bands and has a size of 183MB
stored in TIFF format.
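
As a quick consistency check on these figures, the sketch below recomputes the raw data volume. It assumes one byte (8 bits) per cell per band, which is not stated explicitly above but matches the reported 183 MB per image, and it treats the 14 data disks as the usable capacity of the RAID4 group.

```python
ROWS, COLS, BANDS = 8000, 8000, 3
BYTES_PER_CELL = 1   # assumption: 8-bit cells, consistent with 183 MB per image
IMAGE_COUNT = 50

bytes_per_image = ROWS * COLS * BANDS * BYTES_PER_CELL
mib_per_image = bytes_per_image / 2**20
total_gib = bytes_per_image * IMAGE_COUNT / 2**30

# Usable storage: 14 data disks of 292 GB each (the 2 parity disks hold no data).
usable_storage_gb = 14 * 292

print(f"per image: {mib_per_image:.1f} MiB")
print(f"all {IMAGE_COUNT} images: {total_gib:.2f} GiB uncompressed")
print(f"usable storage: {usable_storage_gb} GB")
```

The uncompressed test dataset is therefore roughly 9 GiB, a small fraction of the available storage.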
3. THE NATIVE RASTER DATA TYPE AND THE
SCALABILITY
As described above, the first key component of this
database-centric approach is a new native raster data type, which is called
the GeoRaster data type in Oracle 10g and 11g databases
(Oracle, 2004; Xie, 2008). Oracle GeoRaster defines a
component-based logically layered multidimensional raster data
model. A raster data object consists of raster cell data and
associated metadata. The raster cell data is a multidimensional
matrix of raster cells. Each cell stores a value, referred to as the
cell value. The number of bits used to store the cell value is
called the cell depth. The matrix has a number of dimensions, a
cell depth, and a size for each dimension. As a
multidimensional matrix, the core data can be blocked and
compressed for optimal storage, retrieval and processing. In the
GeoRaster data model, all associated information (other than
the raster cell matrix) for the raster object is stored as
“metadata”, which includes, among other items, raster information,
spatial reference system information, date and time information,
layer information, and spatial extent (footprint).
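
The data model described above can be sketched as a small data structure. This is an illustrative Python stand-in, not the GeoRaster type itself: the class name `RasterObject`, its field names, the row-major cell layout and the restriction to whole-byte cell depths are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RasterObject:
    """Illustrative raster object: a multidimensional cell matrix plus metadata."""
    dimension_sizes: Tuple[int, ...]   # size of each dimension, e.g. (rows, cols, bands)
    cell_depth_bits: int               # number of bits used to store one cell value
    metadata: Dict[str, object] = field(default_factory=dict)
    cells: bytearray = field(init=False, default_factory=bytearray)

    def __post_init__(self) -> None:
        if self.cell_depth_bits <= 0 or self.cell_depth_bits % 8 != 0:
            raise ValueError("this sketch only handles whole-byte cell depths")
        total_cells = 1
        for size in self.dimension_sizes:
            total_cells *= size
        self.cells = bytearray(total_cells * self.cell_depth_bits // 8)

    @property
    def dimension_count(self) -> int:
        return len(self.dimension_sizes)

    def _byte_offset(self, index: Tuple[int, ...]) -> int:
        """Row-major offset of a cell (layout chosen for the sketch)."""
        if len(index) != len(self.dimension_sizes):
            raise IndexError("index must have one entry per dimension")
        linear = 0
        for i, size in zip(index, self.dimension_sizes):
            if not 0 <= i < size:
                raise IndexError("cell index out of range")
            linear = linear * size + i
        return linear * (self.cell_depth_bits // 8)

    def get_cell(self, index: Tuple[int, ...]) -> int:
        start = self._byte_offset(index)
        width = self.cell_depth_bits // 8
        return int.from_bytes(self.cells[start:start + width], "big")

    def set_cell(self, index: Tuple[int, ...], value: int) -> None:
        start = self._byte_offset(index)
        width = self.cell_depth_bits // 8
        self.cells[start:start + width] = value.to_bytes(width, "big")


# A tiny 4 x 5 raster with 3 layers and 8-bit cells; metadata values are placeholders.
example = RasterObject(
    dimension_sizes=(4, 5, 3),
    cell_depth_bits=8,
    metadata={"spatial_reference": "placeholder", "layers": ["red", "green", "blue"]},
)
example.set_cell((1, 2, 0), 200)
print(example.dimension_count, example.get_cell((1, 2, 0)), len(example.cells))
```

Keeping everything other than the cell matrix in a single metadata container mirrors the separation described above: the matrix can be blocked and compressed independently, while queries on spatial reference, layers or extent only need the metadata.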
More specifically, a raster dataset (an image or a grid) is stored in
Oracle as an object of the SDO_GEORASTER data type, called
the GeoRaster object. This object type is the core data type for
users, and it stores all metadata and necessary information for
indexing and raster data query. The type is defined as follows: