Full text: Proceedings of the Symposium on Global and Environmental Monitoring (Part 1)

DEVELOPMENT OF A DATA SET INDEX FOR THE GLOBAL CLIMATE RESEARCH PROGRAM 
Donald R. Block and Edward H. Barrows 
NSI Technology Services Corporation, Research Triangle Park, N.C. 27709 
ABSTRACT 
The U.S. Environmental Protection Agency Global Climate Research Program (GCRP) is being set up 
to conduct multiple research efforts at many different laboratories distributed across the United States. 
Each of these research efforts will require very large quantities of data for analysis. The distributed 
nature of the research effort and the dynamic nature of the data itself argue against a centralized 
depository for all data. Distributed data systems, however, will cause redundancies when the same data 
set must be used at more than a single location. The establishment of independent data centers at 
each research location can be prohibitively expensive. At the same time, program management needs 
to be able to identify and act on a variety of data problems. 
A system design combining the elements of a centralized data set index and distributed processing 
proves to be both cost-effective and responsive to the research requirements. Based on a text-retrieval 
software platform, the centralized index not only provides information on the data sets being used in the 
research but also facilitates the movement of data sets to other locations and tracks their uses and users. 
The evaluation process and the products examined in the analysis of alternatives point to a clear 
choice: an innovative combined hardware/software platform. 
KEY WORDS: Data Set Index, Indexing, Data Management, Text Retrieval, Text DBMS 
INTRODUCTION 
The data set index (DSI) for the GCRP is a data 
base containing information about all the GCRP 
data sets. It is the mechanism that ties the 
multiple, geographically distributed physical data 
bases together into a single logical data system. 
The DSI does not function as a master data 
dictionary like IBM’s Repository or DEC’s 
Common Data Dictionary. The DSI stores 
general descriptive metadata about GCRP data 
sets that can be used by scientists for reviewing 
and locating these data sets. The DSI for the 
GCRP contains descriptive information about a 
data set’s spatial and temporal density and 
extent, parameters, usefulness, usage, and 
quality. It identifies the sources of data sets and 
monitors their movement throughout the GCRP 
information system. This paper outlines the 
design and development approach used to 
identify and implement a practical and effective 
method of storing and retrieving metadata about 
data sets for the United States Environmental 
Protection Agency (EPA) GCRP. 
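As a rough illustration of the kind of metadata described above, a single DSI entry might be sketched as follows. This is only a minimal sketch under stated assumptions: the paper does not specify a record layout, so the class name `DataSetEntry`, every field name, and the helper `find_by_parameter` are hypothetical and chosen here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataSetEntry:
    """One hypothetical DSI record; all field names are illustrative assumptions."""
    name: str
    source: str                        # organization the data set came from
    spatial_extent: str                # free-text description of coverage
    temporal_extent: Tuple[int, int]   # (first year, last year)
    parameters: List[str]              # measured variables
    quality_note: str = ""             # free-text remarks on data quality
    locations: List[str] = field(default_factory=list)  # sites holding a copy
    users: List[str] = field(default_factory=list)      # who has used the data set

    def record_transfer(self, site: str) -> None:
        """Note that a copy of the data set now resides at another site."""
        if site not in self.locations:
            self.locations.append(site)


def find_by_parameter(index: List[DataSetEntry], parameter: str) -> List[DataSetEntry]:
    """Return entries whose parameter list contains the term (case-insensitive)."""
    term = parameter.lower()
    return [e for e in index if term in (p.lower() for p in e.parameters)]
```

In such a sketch, the index holds only descriptions of data sets; the data themselves stay at the research locations, and `record_transfer` stands in for the monitoring of data set movement mentioned above.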
BACKGROUND 
The organization of the EPA GCRP consists of 
multiple research efforts located at EPA 
laboratories throughout the United States. Each 
research effort often requires information from 
many separate sources, such as universities, 
foreign governments, local governments, other 
federal agencies, and sources within EPA. Early in 
the development of the data management plan 
for the GCRP it was determined that, since the 
research effort would be widely distributed, the 
pertinent data should be located and managed 
close to the research. 
However, it was evident that managing 
this distributed data would require some degree 
of centralization. The need for a centralized 
method of monitoring and reporting the available 
GCRP data sets, one that would facilitate project 
research, reduce redundancy, and promote data 
sharing, led to the concept of a data set index (DSI). 
DESIGN 
The initial question that arose during the design 
phase was "Who would use the DSI, and for what 
purposes?" Answering this question would help 
identify the kind of data to be stored in the DSI. 
Since the concept for the DSI arose to promote 
data sharing and to facilitate cross-project 
research, it was evident that the primary users of 
the DSI would be scientists looking for data sets 
that supported their research objectives. To 
make this kind of determination, scientists 
required that the DSI store more than just simple 