once.
3. For new text data to be searchable by words
that are not in the index, the index must be
updated. This means the system must be taken
offline to update the index and build the links
between the new index term and the textual
data, so new entries may not be immediately
retrievable from the data base.
4. A data base administrator is usually needed
to schedule and run the routines that update
the index.
5. For searches with compound logic statements,
the index must be searched multiple times, which
slows retrieval. The more complex the search
logic, the longer the retrieval takes.
6. Searches using words that are not indexed
produce incorrect results.
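Items 5 and 6 follow from how an inverted index works. The Python sketch below is a toy model, not how BASIS or any other product named here is implemented: it assumes a fixed indexing vocabulary, and the class and method names are hypothetical. It shows that a word outside the vocabulary finds nothing even when it appears in a record, and that an AND query costs one index probe per term before the results can be combined.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: each indexed word maps to the ids of records containing it."""

    def __init__(self, vocabulary):
        # Assumption: only words in this fixed vocabulary are indexed.
        self.vocabulary = {w.lower() for w in vocabulary}
        self.postings = defaultdict(set)
        self.lookups = 0  # number of index probes, to expose compound-query cost

    def add(self, record_id, text):
        # Build the links between index terms and the new record.
        for word in tokenize(text):
            if word in self.vocabulary:
                self.postings[word].add(record_id)

    def lookup(self, word):
        self.lookups += 1
        # .get() avoids creating empty entries for unindexed words.
        return self.postings.get(word.lower(), set())

    def search_all(self, *words):
        """AND query: one index probe per term, then intersect the results."""
        result = None
        for word in words:
            hits = self.lookup(word)
            result = set(hits) if result is None else result & hits
        return result or set()


idx = InvertedIndex(vocabulary=["ozone", "temperature", "precipitation"])
idx.add(1, "Ozone and temperature readings")
idx.add(2, "Precipitation totals with temperature")
print(idx.search_all("temperature", "ozone"))  # {1}
print(idx.search_all("readings"))  # set(): "readings" was never indexed
print(idx.lookups)  # 3: one probe per query term
```

Adding a term such as "readings" to the searchable set would require extending the vocabulary and re-indexing existing records, which is the offline update described in item 3.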
Products that take this indexed approach to text
retrieval and run on VAX equipment under VMS
include BASIS by Information Dimensions Inc.,
TextDBMS by Data Retrieval Corp., INFO-DB+ by
Henco Software, Inc., and TextWare by Unibase
Systems, Inc.
Another approach to increasing text retrieval
speed has been to speed up the traditional
string search method. GESCAN, by GESCAN Inc.,
uses its own VAX hardware board to accelerate
its string searches. Because this specialized
board is designed to perform only text search
and retrieval, it can perform such tasks faster
than general-purpose VAX hardware. This string
searching approach eliminates the management
problems associated with the indexed text DBMS
software described above and provides several
additional advantages:
1. The CPU impact on the rest of a time-shared
system is reduced, since most of the resources
needed for the search are provided by the
additional hardware.
2. The specialized hardware allows the first
query "hit" to be returned to the user while the
search continues processing. In other words,
the user does not need to wait until the search
is complete to start viewing the results.
3. The response time does not depend on the
query complexity. The data is scanned only once.
4. There is no index storage overhead in the
data base: the disk space required equals the
amount of data in the data base.
5. New entries are always immediately available
for retrieval and can always be retrieved using
the same terminology as when they were input.
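The string searching advantages can be illustrated with a software-only sketch of a sequential scan. GESCAN performs its scan in dedicated hardware; this Python generator, the name `scan`, and the (record id, text) record layout are hypothetical stand-ins. The sketch reads each record once, tests every query term against it while it is in hand, yields each hit as soon as it is found, and needs no index, so newly appended records are searchable immediately. Two simplifications apply: matching is case-insensitive substring matching (so "ozone" would also match "ozonesonde"), and in software the per-record work grows with the number of terms even though the number of passes over the data does not.

```python
from typing import Iterable, Iterator, Tuple


def scan(records: Iterable[Tuple[int, str]], *terms: str) -> Iterator[int]:
    """Sequential scan: yield the id of each record whose text contains every term."""
    needles = [t.lower() for t in terms]
    for record_id, text in records:  # each record is read exactly once
        haystack = text.lower()
        # All terms are checked against the record already in hand, so adding
        # terms adds comparison work but never another pass over the data.
        if all(n in haystack for n in needles):
            yield record_id  # a hit is available before the scan finishes


data = [(1, "Ozone and temperature readings"),
        (2, "Precipitation totals with temperature")]
hits = scan(data, "temperature", "ozone")
print(next(hits))  # 1 (record 2 has not been examined yet)
data.append((3, "New temperature readings"))  # no index to rebuild
print(list(scan(data, "readings")))  # [1, 3]
```

Because `scan` is a generator, the caller can display the first hit while the rest of the data is still being examined, which mirrors advantage 2.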
An evaluation of the indexed text DBMS products
indicated that BASIS, by Information Dimensions
Inc., was the best indexed text DBMS for the
DSI application. GESCAN, by GESCAN Inc., was
the only string searching option that satisfied
the DSI design objectives. For a VAX 8600 series
system, GESCAN cost approximately 60% less than
BASIS with modules supporting equivalent
functions. In addition, GESCAN Inc. offered
purchasers a 60-day trial period costing less
than 10% of the purchase price. If the software
proves satisfactory, the trial cost can be
applied toward the purchase. This trial period
allows the purchaser to develop and test
applications software at a reasonable cost.
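The cost figures above can be combined in a short worked calculation. The sketch assumes the quoted ratios hold exactly (GESCAN at 40% of the BASIS price, trial fee at most 10% of the GESCAN price); the function name and the example BASIS price are hypothetical, not figures from the evaluation. Under those assumptions, the most a purchaser risks by trying GESCAN and then rejecting it is about 4% of the BASIS price.

```python
def trial_exposure(basis_price: float,
                   gescan_fraction: float = 0.40,
                   trial_fraction: float = 0.10) -> dict:
    """Worked cost comparison using the approximate ratios quoted in the text.

    gescan_fraction: GESCAN price as a share of the BASIS price ("approximately 60% less").
    trial_fraction: upper bound on the trial fee as a share of the GESCAN price ("less than 10%").
    """
    gescan_price = basis_price * gescan_fraction
    trial_fee_max = gescan_price * trial_fraction
    return {
        "gescan_price": gescan_price,
        # Kept: the trial fee is credited, so the total outlay is just the purchase price.
        "outlay_if_kept": gescan_price,
        # Rejected: at most the trial fee is spent.
        "max_outlay_if_rejected": trial_fee_max,
        # The same bound expressed as a share of the BASIS price.
        "rejected_share_of_basis": trial_fee_max / basis_price,
    }


# Hypothetical BASIS price, for illustration only.
costs = trial_exposure(100_000)
print(round(costs["rejected_share_of_basis"], 2))  # 0.04
```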
To take a closer look at the features of the
GESCAN software, we provided GESCAN Inc. with
a sample set of data that would be "typical"
for the DSI. Based on the published maximum
search speed of 90 million characters per
second on a system (Schild, 1989) and the
demonstration using our data, the response time
of the GESCAN software was clearly acceptable.
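The published peak rate gives a simple lower bound on response time: one full pass over the data takes at least the number of characters divided by the scan rate. The sketch below assumes a full sequential scan at that peak rate; actual times also depend on disk throughput and system load, and the data base sizes in the example are hypothetical, not those of the DSI.

```python
# Published maximum scan rate (Schild, 1989); real throughput may be lower.
PEAK_RATE_CHARS_PER_SEC = 90_000_000


def min_scan_seconds(total_chars: int, rate: float = PEAK_RATE_CHARS_PER_SEC) -> float:
    """Best-case time for one sequential pass over total_chars characters."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return total_chars / rate


# Hypothetical data base sizes, for illustration only.
print(min_scan_seconds(450_000_000))  # 5.0
print(round(min_scan_seconds(1_000_000_000), 1))  # 11.1
```

Since the data is scanned only once per query, this bound is the same for a one-word query and a complex compound query.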
Based on its low system maintenance
requirements, low CPU and I/O overhead, low
cost, and the features it offered, we determined
that GESCAN was the software platform that best
supported the design objectives for the EPA
GCRP DSI.
CONCLUSIONS
Requiring information based on data set usage
tailors the data entry query to the knowledge
level of the scientist entering the data.
Providing scientists with an easy-to-use,
free-form data entry method promotes system use
and encourages more detailed descriptions of the
data sets in the DSI. Free-form data entry
indicated the need for a text DBMS rather than
a traditional DBMS.
text DBMS using a string-searching method has