The 3rd ISPRS Workshop on Dynamic and Multi-Dimensional GIS & the 10th Annual Conference of CPGIS on Geoinformatics

Chen, Jun
ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001 
great deal of effort has been spent in defining IR-suited semantic networks based on manually constructed thesauri [20], [21], [22], [23]. The metric over such a semantic network is determined by measuring the similarity between two terms, mainly according to the topology of the thesaurus (the number and length of links). Two problems may occur in these systems. First, estimating the strength of term connections heavily (if not solely) from the thesaurus topology may fail to reflect the real strength of the connections. This strength also depends on the nature of the relations between the terms, which affects their relevance to a given application area. Second, the metrics used to measure term connections are often symmetric: for a metric $m$, $m(a,b) = m(b,a)$ for any pair of terms $a$ and $b$. This property is counterintuitive. For example, a document about object-oriented languages should be more relevant to a query on programming languages than a document about programming languages is to a query on object-oriented languages.
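To see why a purely topological measure ends up symmetric, here is a minimal sketch assuming a similarity of the form m(a,b) = 1 / (1 + number of links on the shortest path between a and b). The formula and the toy terms are illustrative assumptions, not the measures defined in [20]-[23]. Because thesaurus links are treated as undirected, the shortest path from a to b is also the shortest path from b to a, so m(a,b) = m(b,a) holds by construction.

```python
from collections import deque

# Hypothetical toy thesaurus: each pair is an undirected broader/narrower link.
LINKS = [
    ("programming languages", "object-oriented languages"),
    ("object-oriented languages", "Java"),
    ("programming languages", "functional languages"),
]


def build_graph(links):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def link_distance(graph, a, b):
    """Number of links on the shortest path from a to b (BFS), or None if unreachable."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def m(graph, a, b):
    """Topology-only similarity 1 / (1 + distance); 0.0 when no path exists."""
    d = link_distance(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


g = build_graph(LINKS)
print(m(g, "object-oriented languages", "programming languages"))  # 0.5
print(m(g, "programming languages", "object-oriented languages"))  # 0.5
```

Nothing in this measure can express that the first direction should score higher than the second, which is the second problem described above.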
A new method was recently proposed to revise the strength of term connections according to users' feedback using fuzzy logic [2], [24]. Its core idea is as follows. One term is assumed to be relevant to another if a document represented by the first term is relevant to a query represented by the second term alone.
This term relevance can be represented with a fuzzy implication relation such as $a \Rightarrow_{\beta} b$, where $\beta \in [0,1]$. In this way, the entire thesaurus may be represented as a set of fuzzy term relevance relations:
$\{\, a \Rightarrow_{\beta} b,\ \ldots \,\}$
The key problem lies in estimating the term relevance strength $\beta$ of a thesaurus relation between two terms, according to the users' judgment of whether the documents represented by $a$ are relevant to a query represented by $b$ alone.
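One way to hold the set of relations $a \Rightarrow_{\beta} b$ in memory is a map keyed by the ordered pair of terms, so that $(a,b)$ and $(b,a)$ carry independent strengths. The sketch below assumes this representation, which is not prescribed by [2] or [24]. The function names are hypothetical, and treating a missing pair as strength 0 is an assumption of this sketch.

```python
from typing import Dict, Tuple

# Relations[(a, b)] holds beta for the fuzzy relation a =>_beta b:
# how relevant a document represented only by term a is to a query
# represented only by term b. (a, b) and (b, a) are separate keys.
Relations = Dict[Tuple[str, str], float]


def add_relation(rel: Relations, a: str, b: str, beta: float) -> None:
    """Store a =>_beta b, rejecting strengths outside [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"beta must lie in [0, 1], got {beta}")
    rel[(a, b)] = beta


def strength(rel: Relations, a: str, b: str) -> float:
    """Return beta for a => b; pairs with no stored relation default to 0.0 (assumption)."""
    return rel.get((a, b), 0.0)


thesaurus: Relations = {}
# Hypothetical strengths, chosen only to show that the relation is directional.
add_relation(thesaurus, "object-oriented languages", "programming languages", 0.8)
add_relation(thesaurus, "programming languages", "object-oriented languages", 0.3)
```

Unlike the symmetric metric $m$, the two directions here can differ. This allows a document on object-oriented languages to rank higher for a programming-languages query than the reverse.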
The principle is as follows. The system gives a tentative query evaluation and provides an answer (a ranked set of documents). The user is then asked to give his or her own relevance evaluation of the retrieved documents. The system uses this evaluation to revise the strengths of the term relevance relations so that they better fit the user's evaluation.
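The feedback loop can be sketched as follows. This is a minimal illustration only. It assumes a hypothetical update rule, β ← β + η(r − β), which moves each strength a fraction η toward the user's judged relevance r ∈ [0,1]. This simple rule is swapped in for readability and is not the fuzzy-logic revision procedure of [2] and [24]. The names `revise` and `rate` are likewise hypothetical.

```python
from typing import Dict, Iterable, Tuple

Relations = Dict[Tuple[str, str], float]


def revise(rel: Relations,
           feedback: Iterable[Tuple[str, str, float]],
           rate: float = 0.2) -> None:
    """Nudge beta(a, b) toward the user's judgment for each feedback item.

    Each item is (doc_term, query_term, judged) with judged in [0, 1].
    Hypothetical rule: beta <- beta + rate * (judged - beta).
    """
    for a, b, judged in feedback:
        old = rel.get((a, b), 0.0)
        new = old + rate * (judged - old)
        # Clamp defensively so beta stays a valid fuzzy strength.
        rel[(a, b)] = min(1.0, max(0.0, new))


beta = {("GPS", "Data Collection"): 0.5, ("Vector", "Data Organization"): 0.5}
# One retrieval round: the user judges a GPS document highly relevant
# to a query on "Data Collection".
revise(beta, [("GPS", "Data Collection", 1.0)])
print(beta[("GPS", "Data Collection")])    # 0.6
print(beta[("Vector", "Data Organization")])  # 0.5 (no feedback, unchanged)
```

Only the pairs that receive feedback move. On a large thesaurus, sparse or biased feedback therefore leaves most strengths at their initial values, which is the difficulty discussed next.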
This new method is sound in concept but hard to realize on a large thesaurus such as WordNet, which contains about … words and phrases. Determining the strength of each term connection would require an enormous amount of user feedback, which would rapidly increase cost and time. In other words, a strength must be found for each link in a semantic network with innumerable nodes. Moreover, biased feedback will degrade the final results. For example, feedback may concentrate heavily on some nodes while remaining scarce on others. In practice, the full thesaurus is divided into a few groups, and groups of thesaurus relations are used instead of individual relations between terms [2]. However, this inevitably degrades the results, since the group classification is very coarse and cannot represent the true relations among terms.
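The grouping workaround can be pictured as keeping one strength per group of relations instead of one per pair of terms. The sketch below assumes hypothetical relation groups and strength values; the actual grouping used in [2] may differ. It shows why the approach is coarse: every pair in a group receives the same strength, however different the real relations are.

```python
from typing import Dict, Tuple

# Hypothetical assignment of thesaurus links to relation groups.
RELATION_GROUP: Dict[Tuple[str, str], str] = {
    ("GPS", "Data Collection"): "narrower-to-broader",
    ("Digitizing", "Data Collection"): "narrower-to-broader",
    ("Data Collection", "GPS"): "broader-to-narrower",
}

# One strength per group instead of one per pair (made-up values).
GROUP_BETA: Dict[str, float] = {
    "narrower-to-broader": 0.7,
    "broader-to-narrower": 0.2,
}


def grouped_strength(a: str, b: str) -> float:
    """Strength of a => b looked up through its group; 0.0 if the pair is unlinked."""
    group = RELATION_GROUP.get((a, b))
    return GROUP_BETA[group] if group is not None else 0.0


# Both narrower terms get exactly the same strength toward their category.
print(grouped_strength("GPS", "Data Collection"))         # 0.7
print(grouped_strength("Digitizing", "Data Collection"))  # 0.7
```

Feedback then only needs to estimate a handful of group strengths. However, any revision to one group strength changes every pair in that group at once.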
This new method revises the strength of term connections according to users' feedback using fuzzy logic. Although it is hard or impossible to realize on a large thesaurus, it is quite feasible on a small one. The following section describes how the method is improved and realized on a GIS thesaurus containing about 2000 terms.
INFORMATION RETRIEVAL BASED ON GIS THESAURUS
The tree-like GIS thesaurus is constructed by adding a few hundred GIS terms to an online GIS dictionary [25]. With the fast advancement of GIS technology, the terms of the GIS thesaurus have to be updated frequently. Even the structure of the GIS thesaurus might need to be renewed within a few years. To enhance reliability and provide an adequate service, any update to the GIS thesaurus should be approved by the GIS expert board.
The GIS thesaurus prototype adopted in No-Name, the GIS intelligent search engine, is composed of about 2000 GIS terms, which are further divided into 16 categories. Each category contains from a few tens to a few hundred terms. Figure 1 shows the structure of the GIS thesaurus prototype.
Figure 1: GIS thesaurus structure
The number of possible combinations of 2000 terms can be as large as 2000 × 2000 = 4,000,000. Let $a$ and $b$ represent any two keywords, and let $\beta(a,b)$ stand for the term connection value from $a$ to $b$. Note that $\beta(a,b)$ differs from $\beta(b,a)$. Two assumptions are made to simplify the computation of term connection strength.
Assumption 1: $\beta(A_0, A_i) = 0$; $\beta(A_i, A_{i,j}) = 0$.
Here $A_0$ stands for “GIS technology”. $A_i$ ($i = 1, \ldots, 16$) stands for categories such as “Data Collection” and “Data Organization” (note that the categories are also terms). $A_{i,j}$ ($j = 1, \ldots$) stands for the terms under category $i$.
Assumption 2: $\beta(A_i, A_j) = 0$ ($i \neq j$); $\beta(A_i, A_j) = 1$ ($i = j$); $\beta(A_{i,j}, A_{p,q}) = 0$ ($i \neq p$ or $j \neq q$); $\beta(A_{i,j}, A_{p,q}) = 1$ ($i = p$ and $j = q$).
The first assumption means that a document represented only by the term “Data Collection” has no relation to a query represented only by “GPS”. By contrast, a document represented only by “GPS” is related to some degree to a query represented only by “Data Collection”. The second assumption means that a document represented only by “GPS” has no relation to a query represented only by “Digitizing” or “Vector”, and vice versa.
Under these two assumptions, the number of term connections with unknown values is reduced from 4,000,000 to around 2,000. It is now a manageable task for the expert board to set a prior value for each term connection. These prior values can then be modified through users' feedback.
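The two assumptions can be checked mechanically on a toy tree. This is a minimal sketch with two assumptions of its own. First, it uses a hypothetical two-category thesaurus, whereas the real prototype has 16 categories and about 2000 terms. Second, it assumes that the values left to the expert board are the upward tree links: each term to its category, and each category to “GIS technology”. The text does not spell out which pairs make up the remaining ~2,000. Pairs that neither assumption covers are returned as `None` rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy thesaurus: A_0 -> categories A_i -> terms A_{i,j}.
ROOT = "GIS technology"
CATEGORIES: Dict[str, List[str]] = {
    "Data Collection": ["GPS", "Digitizing"],
    "Data Organization": ["Vector", "Raster"],
}

# PARENT maps each node to the node directly above it in the tree.
PARENT: Dict[str, str] = {cat: ROOT for cat in CATEGORIES}
PARENT.update({t: cat for cat, terms in CATEGORIES.items() for t in terms})
NODES: List[str] = [ROOT, *CATEGORIES, *(t for ts in CATEGORIES.values() for t in ts)]


def level(x: str) -> int:
    """0 for A_0, 1 for a category A_i, 2 for a term A_{i,j}."""
    if x == ROOT:
        return 0
    return 1 if x in CATEGORIES else 2


def fixed_beta(a: str, b: str) -> Optional[float]:
    """beta(a, b) when Assumption 1 or 2 fixes it, otherwise None."""
    # Assumption 1: a document on a node is not relevant to a query on a node
    # directly below it (A_0 -> A_i and A_i -> A_{i,j}).
    if PARENT.get(b) == a:
        return 0.0
    # Assumption 2: among categories, or among terms, only identical nodes relate.
    if level(a) == level(b) > 0:
        return 1.0 if a == b else 0.0
    return None


all_pairs: List[Tuple[str, str]] = [(a, b) for a in NODES for b in NODES]
fixed = [p for p in all_pairs if fixed_beta(*p) is not None]
# Assumed free parameters: one prior value per upward link (child -> parent).
upward = list(PARENT.items())

print(len(all_pairs), len(fixed), len(upward))  # 49 26 6
```

On the full prototype the same count yields one upward link per node below the root. That is roughly one per term, consistent with the “around 2,000” figure in the text. The pairs still returned as `None` would need a rule the two stated assumptions do not provide.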