Title
The 3rd ISPRS Workshop on Dynamic and Multi-Dimensional GIS & the 10th Annual Conference of CPGIS on Geoinformatics
Author
Chen, Jun

ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001
AN INTELLIGENT GIS SEARCH ENGINE TO RETRIEVE
INFORMATION FROM INTERNET
Zhe LIU, Yong GAO
The University of Calgary
Geomatics Engineering, 2500 University Drive NW., Calgary, AB, CANADA, T2N 1N4
FAX: 403-2841980; TEL: 403-2208230, 403-2206174
EMAIL: zheliu@ucalgary.ca, gao@geomatics.ucalgary.ca
Keywords: Intelligent GIS Search Engine, Internet Information Retrieval, Fuzzy Logic, GIS Thesauri
ABSTRACT
Many search engines can retrieve GIS-related information from the Internet, including general-purpose search engines such as
Yahoo and AlltheWeb and search engines designed specifically for the GIS community, such as GeoCommunity. These search engines
commonly return irrelevant results and miss relevant ones when used to retrieve GIS-related information. No-Name, an intelligent
search engine designed by the authors, was developed to overcome these shortcomings in serving the GIS community. No-Name builds a
keyword database of manageable size by carefully choosing about 2000 GIS terms to construct a GIS thesaurus. Terms are usually
related to one another: for example, a document represented by "Oracle" and "SQL Server" has an uncertain relevance to a query
represented by "Database". The term connection value, which represents the degree of relevance between terms, can therefore be
determined by fuzzy logic. No-Name ranks websites and documents according to both the frequency with which keywords occur in the
text (heads, titles and bodies, each with a different weight) and the relevance between keywords, as given by the term connection
strength. No-Name supports two independent layers, the term layer and the location layer. The term layer has a tree-like structure
containing the roughly 2000 GIS terms of the thesaurus. The location layer has a tree-like structure with about 6 continents and
100 countries that are considered active in the GIS field. Moreover, No-Name has limited support for natural language input and
linguistic analysis. Several experiments were carried out to compare the performance of No-Name with that of some well-known
current search engines. The test results show that No-Name outperforms current search engines in three respects: retrieving
relevant sites, avoiding irrelevant sites, and identifying natural language. The test results also show that No-Name can revise
the term connection values by learning from users' feedback.
INTRODUCTION
Any search engine falls into one of two categories: general
purpose or designed for a special community. A general-purpose
search engine is designed for the general population. Its
service covers a wide variety of subjects, which is why most
current search engines are general purpose. Yahoo and
AlltheWeb are good examples of general-purpose search
engines. However, what they gain in breadth they lose in
depth. In contrast, a GIS search engine is a search engine
designed specifically for one community. It focuses on
providing exhaustive GIS information, and its users come
mainly from the GIS community. An example of a GIS search
engine is GeoCommunity, founded by GeoComm International
Corporation in 1995.
Generally speaking, a GIS search engine can serve the GIS
community better than a general-purpose search engine.
Because its structure is more compact than that of a
general-purpose search engine, a GIS search engine can
retrieve more complete, relevant and balanced GIS information
at much lower cost. It is also easier for a GIS search engine
than for a general-purpose one to improve its performance by
retrieving relevant information and avoiding irrelevant
information.
However, current GIS search engines such as GeoCommunity
can improve their performance further by applying fuzzy logic
and text analysis technology. Fuzzy logic can be used to
determine the degree of relevance between terms. Current
information retrieval methods such as the Bayesian network
model [1] tend to assume that terms are atom-like and
independent, but the independence assumption is not
realistic: a document cannot be represented by a set of
independent terms. Terms are interdependent in most
application areas [2]. For example, a document represented by
"Oracle" and "SQL Server" is usually relevant, to some extent,
to a query represented by "Database". The thesaurus WordNet
[3] simplifies term relevance relations by dividing them into
several types, which are shown in Table 1. In fact, the degree
of relevance differs from one term pair to another, and this
simplification weakens the modelled relationship between the
query and the documents. For a small thesaurus, such as a GIS
thesaurus containing about 2000 keywords, it is possible to
construct a term relevance factor table that quantifies the
degree of relevance of every term pair. In practice, the term
relevance factor table can first be set by experts in the GIS
field and then adjusted using feedback from users.
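The idea of a term relevance factor table can be illustrated with a
minimal sketch. It is not the implementation used in No-Name: the class
name TermRelevanceTable, the function document_relevance, the numeric
degrees (0.8 and 0.7), the use of the maximum as a fuzzy OR over
document terms, and the feedback update rule are all assumptions chosen
only to make the concept concrete.

```python
from typing import Dict, FrozenSet, Iterable


class TermRelevanceTable:
    """Symmetric table of relevance degrees in [0, 1] for term pairs.

    Hypothetical structure: the paper only states that such a table
    is set by experts and then adjusted by user feedback.
    """

    def __init__(self) -> None:
        self._table: Dict[FrozenSet[str], float] = {}

    def set(self, a: str, b: str, degree: float) -> None:
        """Store an expert-assigned relevance degree for the pair (a, b)."""
        if not 0.0 <= degree <= 1.0:
            raise ValueError("relevance degree must lie in [0, 1]")
        self._table[frozenset((a.lower(), b.lower()))] = degree

    def get(self, a: str, b: str) -> float:
        """Return the relevance degree; unknown pairs default to 0."""
        a, b = a.lower(), b.lower()
        if a == b:
            return 1.0  # a term is fully relevant to itself
        return self._table.get(frozenset((a, b)), 0.0)

    def feedback(self, a: str, b: str, relevant: bool, rate: float = 0.1) -> None:
        """Move the stored degree a fraction `rate` toward 1 or 0.

        Assumed update rule, not taken from the paper.
        """
        target = 1.0 if relevant else 0.0
        current = self.get(a, b)
        self.set(a, b, current + rate * (target - current))


def document_relevance(
    table: TermRelevanceTable, query_term: str, doc_terms: Iterable[str]
) -> float:
    """Fuzzy OR (maximum) of the query term's relevance to each document term."""
    return max((table.get(query_term, t) for t in doc_terms), default=0.0)


# Illustrative values only; real degrees would come from GIS experts.
table = TermRelevanceTable()
table.set("Database", "Oracle", 0.8)
table.set("Database", "SQL Server", 0.7)
score = document_relevance(table, "Database", ["Oracle", "SQL Server"])
```

With these illustrative values, a document indexed by "Oracle" and
"SQL Server" is partially relevant to the query "Database" even though
the query term never occurs in it, and a call such as
table.feedback("Database", "Oracle", relevant=False) would lower the
stored degree for that pair.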
Relation                 Example
synonymy                 computer - data processor
antonymy                 big - small
hyponymy (is a)          tree - maple
hypernymy (a kind of)    maple - tree
meronymy (is part of)    computer - processor
holonymy (has a)         processor - computer
Table 1: Some relations offered by WordNet
Most current search engines classify a website by analyzing
its heads and titles. Text analysis technology can be used to
analyze the whole text of a web page. The concept of text
analysis technology was first introduced by IBM in 1999 and
was applied in the IBM Text Search Engine. By recording the
frequency with which keywords occur in the text and giving
extra weight to keywords that occur in the heads and titles, a
search engine applying text analysis technology can rank
websites according to their level of correlation with the keywords