International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B2. Istanbul 2004
Currently, a typical geospatial knowledge discovery process has
the following steps:
1. Find a real-world problem to solve;
2. Develop/modify a hypothesis/model based on the problem;
3. Implement the model or develop an analysis procedure on
local computer systems and determine the data
requirements;
4. Search, find, and order the data from data providers
(Geoquery);
5. Preprocess the data into a ready-to-analyze form. The
preprocessing typically includes reprojection, reformatting,
subsetting, subsampling, geometric/radiometric correction,
etc. (Geo-assembly), so that multi-source data can be
co-registered;
6. Execute the model/analysis procedure to obtain the results
(Geocomputation);
7. Analyze and validate the results;
8. Repeat steps 2-7 until the problem is solved.
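The iterative structure of the steps above can be sketched in a few lines of Python. This is only an illustration: the function `discover` and the toy stand-ins `propose_model`, `run_steps_3_to_6`, and `is_valid` are hypothetical names introduced here, not part of any system described in this paper, and the "model" is reduced to a single made-up threshold.

```python
from typing import Callable, Optional, Tuple


def discover(
    propose: Callable[[Optional[dict]], dict],
    execute: Callable[[dict], dict],
    validate: Callable[[dict], bool],
    max_rounds: int = 10,
) -> Tuple[Optional[dict], int]:
    """Repeat steps 2-7 until a result is accepted or the rounds run out."""
    previous = None
    for round_no in range(1, max_rounds + 1):
        model = propose(previous)      # step 2: develop/modify the model
        result = execute(model)        # steps 3-6: implement, query, assemble, compute
        if validate(result):           # step 7: analyze and validate
            return result, round_no
        previous = result              # step 8: feed the result into the next round
    return None, max_rounds


# Toy stand-ins (assumptions for illustration only).
def propose_model(previous: Optional[dict]) -> dict:
    return {"threshold": 0 if previous is None else previous["threshold"] + 1}


def run_steps_3_to_6(model: dict) -> dict:
    return {"threshold": model["threshold"], "error": abs(3 - model["threshold"])}


def is_valid(result: dict) -> bool:
    return result["error"] == 0


if __name__ == "__main__":
    print(discover(propose_model, run_steps_3_to_6, is_valid))
```

In practice each pass through `execute` hides the costly geoquery and geo-assembly work, which is why every extra iteration is expensive for the user.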
Because of their multidisciplinary nature, geospatial data from
data centers are very diverse. In many cases, the temporal and
spatial coverage and resolution, origin, format, and map
projection of the datasets are incompatible. Data users spend
considerable time assembling the data and information into a
ready-to-analyze form for the geocomputation step, even when the
analysis is very simple. If the datasets a user requests are not
readily available at data centers, the geospatial information
system cannot produce them on demand, even if the process for
making them is very simple. Users have to spend a considerable
amount of time ordering and processing the raw data to produce
the data products they need for the analysis. It is estimated
that more than 50% of users’ time is spent on the geoquery and
geo-assembly steps of geospatial knowledge discovery (Di and
McDonald, 1999).
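To make the geo-assembly burden concrete, the following minimal sketch shows two of the preprocessing operations named above, subsetting and subsampling, on a small regular grid. It assumes a north-up grid stored as a list of rows with square cells; the function names and the sample values are hypothetical, and reprojection, reformatting, and geometric/radiometric correction are deliberately left out.

```python
from typing import List, Tuple

Grid = List[List[float]]


def subset(grid: Grid, west: float, north: float, cell: float,
           bbox: Tuple[float, float, float, float]) -> Grid:
    """Keep the cells whose centres fall inside bbox = (min_x, min_y, max_x, max_y).

    The grid is assumed north-up: row 0 is the northernmost row, and
    (west, north) is the outer corner of the upper-left cell.
    """
    min_x, min_y, max_x, max_y = bbox
    out: Grid = []
    for r, row in enumerate(grid):
        y = north - (r + 0.5) * cell              # y coordinate of the row's cell centres
        if not (min_y <= y <= max_y):
            continue
        kept = [v for c, v in enumerate(row)
                if min_x <= west + (c + 0.5) * cell <= max_x]
        if kept:
            out.append(kept)
    return out


def subsample(grid: Grid, factor: int) -> Grid:
    """Keep every factor-th row and column (simple decimation, no averaging)."""
    return [row[::factor] for row in grid[::factor]]


if __name__ == "__main__":
    # Made-up 4 x 4 grid with 1-unit cells covering x in [0, 4], y in [0, 4].
    grid = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(subset(grid, west=0.0, north=4.0, cell=1.0, bbox=(1.0, 1.0, 3.0, 3.0)))
    print(subsample(grid, 2))
```

Even these two trivial operations require the user to know the grid origin, orientation, and cell size of every dataset, which hints at why co-registering multi-source data consumes so much time.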
The above mode of operation in geospatial knowledge
discovery assumes that the data will be acquired and loaded into
local computer systems for analysis. The user must have
local analysis hardware, software, and expertise in order to use
multi-source geospatial data for knowledge discovery and
applications. The mode also requires significant human
involvement in handling the data transactions because the
analysis systems at the users' sites are normally standalone
systems that cannot interoperate with the data systems at
data centers. We call this mode of operation the
"everything-locally-owned-and-operated (ELOO)" mode. Over the
past several decades, geospatial research and applications
have all been based on the ELOO mode. But this mode has
significant problems:
1. Difficulty in accessing the huge volume of multi-source
geospatial data. For a general user, the process from
ordering to actually obtaining the data usually takes weeks.
Therefore, many applications requiring real-time or
near-real-time data can be conducted only by the very few
users who have access to real data sources.
2. Difficulty in integrating multi-source data from
multiple data providers. Because users cannot get the data
in user-specified form, they need to spend a lot of time and
resources preprocessing the data into a ready-to-analyze
form.
3. Lack of enough knowledge to deal with geospatial data.
Because of the diversity of geospatial data, expert
knowledge in the data manipulation and information
technology is needed to handle such data. Not all users
have such knowledge. In fact, many geospatial research
and application projects have to hire geospatial experts to
manipulate the data and operate the analysis systems.
However, many potential geospatial users do not have that
luxury.
4. Lack of enough resources to analyze the data. Many
current geospatial research and application projects require
handling multiple terabytes of data. To conduct such
projects, users have to buy expensive high-performance
hardware and specialized software. In many cases, those
resources are purchased only for a specific project, and
when the project is finished, the resources sit idle.
Because of the above problems, applying geospatial data to
solve scientific and social problems is a very expensive
business, and only a few users can afford it. This is the
major reason that, although geospatial information and
knowledge have vital scientific and social value, they are not
used as widely as they could be in our society.
4. MAKING THE GEOSPATIAL INFORMATION THE
MAINSTREAM INFORMATION
In reality, what most users want is the geospatial information
and knowledge that are ready to be used in their specific
applications, rather than the raw data. However, current
geospatial information systems are incapable of providing
ready-to-use user-specific geospatial information and
knowledge to broad user communities.
In order for geospatial information to become the mainstream
information that everyone can use at will, geospatial
information systems have to be able to provide the ready-to-use
information that fits the individual users’ needs. That means an
ideal geospatial information system must be able to deal
automatically with the distributed nature of geospatial data
repositories and fully automate steps 2-6 of the geospatial
knowledge discovery. The system has to be intelligent enough
to understand the description of the geospatial
problem provided by general users, ideally in natural
language, form the problem-solving procedure/model
automatically, figure out where the data are located and how to
access them online, run the procedure/model against the data
without human intervention, and present the results back to
users in human-understandable form. If such a system can be
built, users only need to describe the geospatial problem
accurately and examine the results. A problem that at present
requires several months of experts’ time may need only
minutes or seconds to solve within such a system. Even if
we cannot make such a system a reality in the next few years,
recent developments in the service oriented architecture (SOA)
and geospatial interoperability standards, as well as advances
in computer hardware and networks, make it possible to
construct, within the next few years, geospatial information
systems that are much more capable than today’s. Such systems
can fully automate steps 3-6 of geospatial knowledge discovery.
Even with such a system, scientists and engineers can focus
more on the creative process of hypothesis generation and
knowledge synthesis rather than spending a huge amount of time
on tedious data preparation tasks. The system will also
greatly facilitate the construction of complex geocomputation
services and modeling.
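One way to picture the automation of the geoquery, geo-assembly, and geocomputation steps is as a chain of services that a system looks up by name and runs in sequence without human involvement. The sketch below is a toy, in-memory illustration under that assumption: the `Registry` class, `run_chain`, the three stand-in services, and the sample values are all hypothetical and do not represent any real catalog, standard interface, or dataset.

```python
from typing import Any, Callable, Dict, List

Service = Callable[[Any], Any]


class Registry:
    """A toy in-memory registry where providers publish named services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, name: str, service: Service) -> None:
        self._services[name] = service

    def find(self, name: str) -> Service:
        if name not in self._services:
            raise KeyError(f"no service published under {name!r}")
        return self._services[name]


def run_chain(registry: Registry, names: List[str], request: Any) -> Any:
    """Feed the output of each named service into the next one."""
    data = request
    for name in names:
        data = registry.find(name)(data)
    return data


# Stand-in services with made-up values (assumptions for illustration only).
def geoquery(request: str) -> List[Any]:
    return [4.0, None, 8.0]                       # pretend these came from a data provider


def geo_assembly(values: List[Any]) -> List[float]:
    return [v for v in values if v is not None]   # drop missing samples


def geocomputation(values: List[float]) -> float:
    return sum(values) / len(values)              # a trivial "analysis": the mean


if __name__ == "__main__":
    registry = Registry()
    registry.publish("geoquery", geoquery)
    registry.publish("geo-assembly", geo_assembly)
    registry.publish("geocomputation", geocomputation)
    print(run_chain(registry, ["geoquery", "geo-assembly", "geocomputation"], "demo"))
```

The user supplies only the request and the order of the steps; everything between the request and the result runs without manual data handling, which is the property the envisioned systems aim for at full scale.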
5. THE SERVICE ORIENTED ARCHITECTURE AND
DISTRIBUTED SERVICES
One of the hot research topics in the e-business world is enabling
interoperable business services in the network environment.
Currently, there are many individual standalone business
services ava
for individ
requirement:
requests co
services pro
service-orier
construct a
service syste
information
The key cc
services. À
contained, s
services. Ste
with a servi
service invo
There ds. n
associated w
the descripti
messages th
service. Sta
together to <
include publ
2.
Fig
There are t
providers w|
requesters (i
brokers who
à service prc
users to us
descriptions
clearinghous
requestors ai
right service
they're look
results that 1
right service
negotiate as
services of t
Services mu:
results (chai