Full text: Proceedings of the Symposium on Global and Environmental Monitoring (Part 1)

facts about a data set. They required the 
storage of detailed information that would 
provide insight into the usefulness of a data set 
relative to their projects. This kind of 
information was only available from other 
scientists who had knowledge of the data sets. 
Realizing that getting a scientist to enter detailed 
information about a data set into the DSI would 
be difficult, we identified ease of data entry as a 
top priority in the system design. 
To make data entry convenient, we decided to 
require only a minimum amount of information 
based on a data set’s usefulness to the GCRP. 
Primarily we wanted to give the scientists the 
freedom to add information and categories of 
information whenever they determined it would 
be useful in describing a data set. When 
attempting to identify the minimum amount of 
required information, we recognized that in many 
cases a scientist’s knowledge of a data set 
would be directly related to the data set’s 
usefulness in the GCRP. With this in mind, we 
developed a classification scheme for data sets 
based on their level of use within the GCRP, 
resulting in four levels of usage: investigated, 
accessed, acquired, and generated (Figure 1). 
The scientist’s knowledge of a data set would 
most likely grow as the data set proceeded 
through these levels of usage. Therefore, the 
DSI was designed to require more information as 
the usage of the data set increased. 
In addition to identifying the quantity of 
information required in the DSI based on data 
set usage, we identified the categories of 
information that would be required to describe 
a data set. To do this, we identified the users 
of the DSI from a data retrieval perspective. 
Primarily, we wanted the DSI to serve scientists 
as a type of library system for data sets. If 
scientists had a data need, we wanted them to 
first search the DSI to see if the data they 
required had already been located by other 
scientists in the GCRP. This would 
facilitate the acquisition of the data for the 
scientist while also reducing the cost of data 
duplication for the program. Secondarily, we 
wanted the DSI to serve as a data management 
tool for the GCRP data manager. Using 
information from the DSI, the data manager 
would be able to report more holistically on the 
status of GCRP data to program managers while 
also keeping track of data redundancy, data 
quality, data quantity, data usage, and data 
distribution. 
To satisfy these user needs, we identified five 
categories of information to store in the DSI. 
1. Data Set Information 
Information about the data set, such as the data 
set name, a description of the data set, the 
source of the data set, the primary scientific 
purpose for the data set’s existence, whether a 
data dictionary for the data set exists, the quality 
of the data dictionary, and the major parameters 
stored in the data set. 
2. Data Set Retrieval Information 
Information to assist in retrieving this metadata 
such as acronyms for the data set name and 
parameters, keywords used to reference the data 
set, spatial features recorded in the data set for 
locating sampling locations, and temporal 
descriptors used to define the frequency and 
duration of sampling events. 
3. Data Set Recorder Information 
Information about the person or group making 
the DSI entry such as their name, level of 
understanding of the data set, location, phone 
number, and their affiliation within the GCRP. 
4. Data Set Usage Information 
Information about the uses and/or usefulness of 
the data to the GCRP such as the potential 
usefulness of investigated data, uses of acquired 
data, and the name of the person or research 
group using the data set. 
5. Data Set Acquisition Information 
Information about the acquisition and storage of 
the data such as the location of the data, name 
of the data analyst in charge of the data, and 
the formats in which the data is stored. 
The metadata within the DSI will be stored in 
these categories and the data retrieval software 
will use the categories to provide more options 
for selective data retrieval. The relationship 
between the DSI required information and the 
usage level of data sets is shown in Figure 2. 
For investigated data sets, only minimal 
information for categories 1, 2, and 3 will be 
required. As the level of usage increases, 
information from all 5 categories will be required 
and the amount of detail will increase. 
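
The rule just described, where the required categories depend on the usage level, can be sketched in a few lines of Python. This is only an illustration and not the DSI implementation: the names `UsageLevel`, `REQUIRED_CATEGORIES`, and `missing_categories` are hypothetical. The text states the requirement for investigated data sets only, so the sketch assumes that every higher level requires all five categories. The increasing amount of detail within each category (Figure 2) is not modeled.

```python
from enum import IntEnum


class UsageLevel(IntEnum):
    """The four usage levels, ordered from least to most use (Figure 1)."""
    INVESTIGATED = 1
    ACCESSED = 2
    ACQUIRED = 3
    GENERATED = 4


# Category numbers follow the numbered list in the text.
CATEGORIES = {
    1: "Data Set Information",
    2: "Data Set Retrieval Information",
    3: "Data Set Recorder Information",
    4: "Data Set Usage Information",
    5: "Data Set Acquisition Information",
}

# Only the INVESTIGATED row is stated explicitly in the text.
# Assumption: every higher level requires all five categories.
REQUIRED_CATEGORIES = {
    UsageLevel.INVESTIGATED: {1, 2, 3},
    UsageLevel.ACCESSED: {1, 2, 3, 4, 5},
    UsageLevel.ACQUIRED: {1, 2, 3, 4, 5},
    UsageLevel.GENERATED: {1, 2, 3, 4, 5},
}


def missing_categories(level, supplied):
    """Return the sorted category numbers still needed for an entry."""
    return sorted(REQUIRED_CATEGORIES[level] - set(supplied))


if __name__ == "__main__":
    # An investigated data set with categories 1 and 2 filled in.
    for number in missing_categories(UsageLevel.INVESTIGATED, {1, 2}):
        print("Still required:", CATEGORIES[number])
```

Using an ordered enumeration makes it easy to express that a data set moves upward through the levels, and keeping the requirements in a lookup table means the rule can be changed without touching the checking code.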
Another consideration concerning the DSI design
	        