The 3rd ISPRS Workshop on Dynamic and Multi-Dimensional GIS & the 10th Annual Conference of CPGIS on Geoinformatics

ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001 
AN INTELLIGENT GIS SEARCH ENGINE TO RETRIEVE 
INFORMATION FROM INTERNET 
Zhe LIU, Yong GAO
The University of Calgary 
Geomatics Engineering, 2500 University Drive NW., Calgary, AB, CANADA, T2N 1N4 
FAX: 403-2841980; TEL: 403-2208230, 403-2206174 
EMAIL: zheliu@ucalgary.ca, qao@geomatics.ucalgary.ca
Keywords: Intelligent GIS Search Engine, Internet Information Retrieval, Fuzzy Logic, GIS Thesauri 
ABSTRACT 
Many search engines can retrieve GIS-related information from the Internet, including general search engines such as
Yahoo and AlltheWeb and search engines designed specifically for the GIS community, such as GeoCommunity. These
search engines commonly return irrelevant results and miss relevant ones when used to retrieve GIS-related information. No-Name,
an intelligent search engine designed by the authors, was developed to overcome these shortcomings of current search
engines in serving the GIS community. No-Name builds a keyword database of reasonable size by carefully choosing about 2000 GIS
terms to construct a GIS thesaurus. Terms are usually related to one another: for example, a document represented by "Oracle" and
"SQL Server" has an uncertain relevance to a query represented by "Database". The term connection value, which represents the
degree of relevance between terms, can therefore be determined by fuzzy logic. No-Name ranks websites and documents according to
both the frequency with which keywords occur in the text (heads, titles and bodies, each with a different weight) and the relevance
between keywords as given by the term connection strength. No-Name supports two independent layers, the term layer and the
location layer. The term layer has a tree-like structure of about 2000 GIS terms that forms the GIS thesaurus. The location layer
has a tree-like structure of about 6 continents and 100 countries that are considered active in the GIS field. Moreover, No-Name has
limited support for natural language input and linguistic analysis. Several experiments were carried out to compare the performance
of No-Name with that of some well-known current search engines. The test results show that No-Name is better than current search
engines in three respects: Relevant Sites Retrieval, Irrelevant Sites Dodge, and Natural Language Identification. The test results
also show that No-Name can revise the term connection values by learning from users' feedback.
INTRODUCTION 
Any search engine falls into one of two categories: general
purpose, or designed for a special community. A general
purpose search engine is designed for the general population.
Its service covers a wide variety of subjects, which is why
most current search engines are general purpose. Yahoo and
AlltheWeb are good examples of general purpose search
engines. However, what they gain in breadth they lose in
depth. In contrast, a GIS search engine is one application of
search engines designed specially for one community. It
focuses on providing exhaustive GIS information, and its
users come mainly from the GIS community. An example of a
GIS search engine is Geo-Community, founded by GeoComm
International Corporation in 1995.
Generally speaking, a GIS search engine can serve the GIS
community better than a general purpose search engine. By
constructing a more compact structure than a general purpose
search engine, a GIS search engine can retrieve more
complete, relevant and balanced GIS information at much
lower cost. It is also easier for a GIS search engine than for
a general purpose one to improve its performance by
retrieving relevant information and avoiding irrelevant
information.
However, current GIS search engines such as Geo-Community
can improve their performance further by applying fuzzy logic
and text analysis technology. Fuzzy logic can be used to
determine the degree of relevance between terms. Current
information retrieval methods such as the Bayesian network
model [1] tend to assume that terms are atomic and
independent, but the independence assumption is not
realistic: a document cannot be represented by a set of
independent terms. Terms are interdependent in most
application areas [2]. For example, a document represented by
"Oracle" and "SQL Server" is usually relevant, to some extent,
to a query represented by "Database". The thesaurus WordNet
[3] simplifies term relevance relations by dividing them into
several types. Table 1 shows the types of relations. In fact,
the degree of relevance differs from one term pair to another,
and this simplification impairs the relationship between the
query and the documents. For a small thesaurus, such as a GIS
thesaurus containing about 2000 keywords, it is possible to
construct a term relevance factor table that quantifies the
degree of relevance of every term pair. In practice, the term
relevance factor table can first be set by experts in the GIS
field and then adjusted using feedback from users.
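
The term relevance factor table described above can be pictured as a symmetric lookup over term pairs. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the numeric factors, the max/min fuzzy matching rule, the feedback update rule with its rate parameter, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical term relevance factors in [0, 1]; the values are illustrative only.
RelevanceTable = Dict[FrozenSet[str], float]


def make_table() -> RelevanceTable:
    """Build a tiny example table such as experts might initialise."""
    return {
        frozenset(("database", "oracle")): 0.8,
        frozenset(("database", "sql server")): 0.8,
        frozenset(("oracle", "sql server")): 0.6,
    }


def relevance(table: RelevanceTable, a: str, b: str) -> float:
    """Return the relevance factor of a term pair (symmetric, 1.0 for identical terms)."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def query_match(table: RelevanceTable,
                query_terms: Iterable[str],
                doc_terms: Iterable[str]) -> float:
    """Assumed fuzzy rule: best document term per query term (max),
    then the weakest query term decides the overall match (min)."""
    doc_terms = list(doc_terms)
    per_query = [
        max((relevance(table, q, d) for d in doc_terms), default=0.0)
        for q in query_terms
    ]
    return min(per_query, default=0.0)


def apply_feedback(table: RelevanceTable, a: str, b: str,
                   relevant: bool, rate: float = 0.1) -> float:
    """Assumed update rule: move the factor a small step towards 1.0
    (user found the pair relevant) or 0.0 (user found it irrelevant)."""
    key = frozenset((a.lower(), b.lower()))
    old = table.get(key, 0.0)
    target = 1.0 if relevant else 0.0
    table[key] = old + rate * (target - old)
    return table[key]


if __name__ == "__main__":
    table = make_table()
    print(query_match(table, ["Database"], ["Oracle", "SQL Server"]))
    print(apply_feedback(table, "Database", "Oracle", relevant=True))
```

With about 2000 terms, a full pair table has roughly two million entries, so storing only the non-zero pairs in a dictionary, as done here, keeps the structure small.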
relation                 example
synonymy                 computer - data processor
antonymy                 big - small
hyponymy (is a)          tree - maple
hypernymy (a kind of)    maple - tree
meronymy (is part of)    computer - processor
holonymy (has a)         processor - computer
Table 1: Some relations offered by WordNet
Most current search engines classify a website by analyzing
its heads and titles. Text analysis technology can be used to
analyze the whole text of a web page. The concept of text
analysis technology was first introduced by IBM in 1999 and
was applied in the IBM Text Search Engine. By recording the
frequency with which keywords occur in the text, and giving
extra weight to keywords that occur in the heads and titles, a
search engine applying text analysis technology can rank
websites according to their degree of correlation with the
keywords.
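
The weighted keyword counting described above can be sketched as follows. This is a minimal illustration rather than the IBM or No-Name implementation: the field names, the weight values, the whole-word matching rule and the function names are all assumptions chosen for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical field weights: hits in heads and titles count more than body hits.
FIELD_WEIGHTS = {"head": 3.0, "title": 2.0, "body": 1.0}


def count_keyword(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score_page(page: Dict[str, str], keywords: List[str]) -> float:
    """Sum keyword frequencies over all fields, scaled by each field's weight."""
    return sum(
        FIELD_WEIGHTS[field] * count_keyword(page.get(field, ""), kw)
        for field in FIELD_WEIGHTS
        for kw in keywords
    )


def rank_pages(pages: Dict[str, Dict[str, str]],
               keywords: List[str]) -> List[Tuple[str, float]]:
    """Return (page id, score) pairs sorted from most to least correlated."""
    scored = [(page_id, score_page(page, keywords)) for page_id, page in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Two made-up pages, used only to exercise the functions.
    pages = {
        "page_a": {"head": "GIS", "title": "GIS data", "body": "gis gis maps"},
        "page_b": {"head": "", "title": "Maps", "body": "A GIS overview"},
    }
    for page_id, score in rank_pages(pages, ["GIS"]):
        print(page_id, score)
```

A page that mentions a keyword only in its body still receives a score here, which is the difference from classifying on heads and titles alone.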
	        