International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
IN-DATABASE RASTER ANALYTICS: MAP ALGEBRA AND PARALLEL
PROCESSING IN ORACLE SPATIAL GEORASTER
Qingyun (Jeffrey) Xie, Zhihai Zhang, Siva Ravada
Oracle Corporation, One Oracle Drive, Nashua, NH 03062, USA -
(qingyun.xie, zhihai.zhang, siva.ravada)@oracle.com
KEYWORDS: raster, image, database, analytical, processing, query, management, software
ABSTRACT:
Over the past decade, several products have used enterprise database technology to store and manage geospatial imagery and
raster data inside an RDBMS, which in turn provides the best manageability and security. With data volumes growing exponentially,
real-time or near real-time processing and analysis of such big data becomes more challenging. Oracle Spatial GeoRaster, unlike
most other products, takes an enterprise database-centric approach to both data management and data processing. This paper
describes one of the central components of this database-centric approach: the processing engine built completely inside the
database. Part of this processing engine is raster algebra, which we call In-database Raster Analytics. This paper discusses the
three key characteristics of this in-database analytics engine and their benefits. First, it moves the data processing closer to the data
instead of moving the data to the processing, which helps achieve greater performance by overcoming the bottleneck of computer
networks. Second, we designed and implemented a new raster algebra expression language. This language is based on PL/SQL and
is currently focused on the “local” function type of map algebra. It includes general arithmetic, logical and relational
operators and any combination of them, which dramatically improves the analytical capability of the GeoRaster database. The third
feature is the implementation of parallel processing of such operations to further improve performance. This paper also presents
some sample use cases. The test results demonstrate that this in-database approach to raster analytics can effectively help solve
the biggest performance challenges we face today with big raster and image data.
1. INTRODUCTION

There are some prominent characteristics of geospatial imagery and raster data. First, they are special and complex data types in
comparison with structured and simple data types such as numbers and strings. Second, they require specialized indexing,
querying, processing and analyzing algorithms. Thirdly, they are generally huge in size, thus they are “big data” in nature.
These mean that we have to build special processing and analysis engines for the geospatial image and raster database.
And scalability and performance of such systems are keys to success.

For geospatial image and raster data archiving and management, enterprise RDBMS technologies have been widely used as the
foundation. Over the past decade, several products including GeoRaster, RasDaMan, and ArcSDE have demonstrated this database
technology (Baumann, 2001. ESRI, 2005. Oracle, 2004). The common feature of these products is to store image and raster data
inside RDBMS databases, which in turn provide the best manageability and security. GeoRaster is unique because it takes the
database-centric approach (Xie, 2008a. Xie, 2011). This approach not only builds spatial indices but also provides all data
management and query operations inside the database itself. It is truly scalable and provides greater performance by removing
the need of constantly moving the datasets in and out of the database.

For geospatial image and raster data processing and analyzing, many advanced and highly efficient desktop systems such as
ERDAS Imagine and PCI Geomatica and server-based engines such as ArcGIS are readily available. When a large-scale enterprise
RDBMS based spatial database is built, such desktop and server-based systems generally can connect to it and then retrieve the
imagery and raster data out of the database and process them in the client or another server. However, moving the data between
the database and the processing engine is costly given the speed and bandwidth limitations of the computer networks.

With the data volume growing exponentially, real-time and near real-time processing and analysis of such big data becomes
more important and urgent. So, building a fast processing and analysis solution for the image and raster databases is critical.
Oracle Spatial GeoRaster takes the enterprise database-centric approach for both data management and data processing. This
paper presents one of the central components of this database-centric approach: the processing engine built completely inside
the database. Part of this processing engine is raster algebra, which we call the In-database Raster Analytics. There are three
key features of this in-database raster analytics engine. First, it moves the data processing closer to the data instead of moving
the data to the processing. Second, to implement this we designed a new raster algebra language. The third feature is the
implementation of parallel processing of such raster operations inside the database. This paper discusses these key
characteristics of this in-database analytics engine and the advantages.

2. IN-DATABASE PROCESSING

In-database processing, also known as in-database analytics, refers to the integration of data processing and analytical
functionalities into the databases or data warehouses. The basic idea is to eliminate the overhead of moving large data sets from
the enterprise databases to separate processing and analytical software applications.
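To give a rough sense of the data-movement overhead discussed here, the following back-of-envelope sketch estimates how long it takes just to ship a raster dataset from a database server to an external processing engine. The 1 TB dataset size and the 1 Gbit/s link are assumptions chosen purely for illustration, not figures from this paper, and the helper name `transfer_seconds` is hypothetical. The estimate ignores latency, protocol overhead, and disk I/O, so it is a lower bound rather than a measurement.

```python
def transfer_seconds(size_bytes: float,
                     bandwidth_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a network link.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (1.0 means a perfect, uncontended link).
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # 8 bits per byte; divide by the effective bit rate.
    return size_bytes * 8 / (bandwidth_bits_per_s * efficiency)


TB = 10**12    # decimal terabyte, in bytes
GBPS = 10**9   # gigabit per second, in bits per second

if __name__ == "__main__":
    # Hypothetical scenario: 1 TB of imagery over a 1 Gbit/s link.
    seconds = transfer_seconds(1 * TB, 1 * GBPS)
    print(f"{seconds:.0f} s (about {seconds / 3600:.1f} h) spent on transfer alone")
```

Under these assumed numbers the transfer alone takes on the order of hours before any analysis starts, which is the overhead that processing inside the database is meant to avoid.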