ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001
Search engine    Relevance results    Irrelevance results
No-Name          1951                 547
Yahoo            1593                 2640
AlltheWeb        1145                 3256
Geo-Community    125                  23
Table 3 : Query results of term set with 50 terms
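One way to put the engines in Table 3 on a common scale is the share of returned results that were judged relevant (precision). The sketch below computes it from the counts in the table. The function name `precision` and the dictionary layout are illustrative choices and not part of No-Name.

```python
# Counts copied from Table 3: (relevance results, irrelevance results).
TABLE_3 = {
    "No-Name": (1951, 547),
    "Yahoo": (1593, 2640),
    "AlltheWeb": (1145, 3256),
    "Geo-Community": (125, 23),
}


def precision(relevant: int, irrelevant: int) -> float:
    """Fraction of returned results judged relevant; 0.0 if nothing was returned."""
    total = relevant + irrelevant
    return relevant / total if total else 0.0


if __name__ == "__main__":
    for engine, (rel, irr) in TABLE_3.items():
        # Print the total number of results next to the precision,
        # since a high precision on very few results means less.
        print(f"{engine:<14} returned={rel + irr:>5} precision={precision(rel, irr):.3f}")
```

With these counts, No-Name returns 2498 results at a precision of about 0.78, while Geo-Community reaches about 0.84 on only 148 results, so precision is best read together with the number of relevant results retrieved.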
Natural Language Support and Linguistic Analysis
Two sentences, "Tell me all about GPS activities in China" and
"I want to know something about AM/FM applications in
Canada", are used as queries to test how well No-Name and
current search engines support natural language input and
linguistic analysis. Here "GPS" stands for "Global Positioning
System" and "AM/FM" stands for "Automated Mapping and
Facilities Management". The results are shown in Table 4 and
Table 5.
Search engine    Relevance results    Irrelevance results
No-Name          261                  43
Yahoo            132                  68
AlltheWeb        1                    199
Geo-Community    0                    0
Table 4 : Query results of the first sentence
Search engine    Relevance results    Irrelevance results
No-Name          42                   7
Yahoo            10                   190
AlltheWeb        0                    200
Geo-Community    0                    11
Table 5 : Query results of the second sentence
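This section does not describe how No-Name analyses a sentence, so the following is only a minimal sketch of one common approach: drop function words and expand known abbreviations, leaving the content terms to be matched against a thesaurus. The stop-word list, the abbreviation table, and the function `extract_terms` are all hypothetical and are not taken from No-Name.

```python
import re

# Hypothetical stop-word list, sized only to cover the two test sentences.
STOP_WORDS = {
    "tell", "me", "all", "about", "in", "i", "want", "to", "know",
    "something", "the", "a", "an", "of",
}

# Standard expansions of the two abbreviations used in the test queries.
ABBREVIATIONS = {
    "gps": "Global Positioning System",
    "am/fm": "Automated Mapping and Facilities Management",
}


def extract_terms(sentence: str) -> list[str]:
    """Return the content words of a query, expanding known abbreviations."""
    # Allow '/' inside a token so that "AM/FM" survives as a single token.
    tokens = re.findall(r"[A-Za-z]+(?:/[A-Za-z]+)*", sentence)
    terms = []
    for token in tokens:
        key = token.lower()
        if key in STOP_WORDS:
            continue  # function word: carries no search meaning
        terms.append(ABBREVIATIONS.get(key, token))
    return terms


if __name__ == "__main__":
    print(extract_terms("Tell me all about GPS activities in China"))
    print(extract_terms("I want to know something about AM/FM applications in Canada"))
```

An engine that instead treats every word of the sentence as a required keyword would match few or no pages, which is one plausible reading of the low relevance counts for the keyword-oriented engines in Table 4 and Table 5.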
Comparison of system performance before and after learning
No-Name can revise term connection values by learning from
users' feedback. To test this, the category name "Database" is
chosen as termX, and five terms in this category ("Binary
Large Object", "Conceptual Model", "Data Definition
Language", "Georelational Model", and "Spatial Database") are
chosen as queries, each query containing exactly one term.
The initial term connection values between termX and the five
terms are set to 0.5. The expert board's evaluations serve as
the users' feedback for training the search engine: 100 of the
408 evaluation results are selected at random as feedback.
Table 6 shows the query results before and after learning, and
Table 7 shows the term connection values before and after
learning.
No-Name            Relevance results    Irrelevance results
Before learning    271                  137
After learning     256                  92
Table 6 : Query results before learning and after learning
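The effect of learning in Table 6 can be made concrete with a little arithmetic. The sketch below computes the relative change in each count and the share of relevant results before and after learning. The helper names are illustrative only.

```python
# Counts copied from Table 6: (relevance results, irrelevance results).
BEFORE = (271, 137)
AFTER = (256, 92)


def precision(relevant: int, irrelevant: int) -> float:
    """Fraction of returned results judged relevant; 0.0 if nothing was returned."""
    total = relevant + irrelevant
    return relevant / total if total else 0.0


def relative_change(before: int, after: int) -> float:
    """Signed change from before to after, as a fraction of the before value."""
    return (after - before) / before


if __name__ == "__main__":
    print(f"relevant change:   {relative_change(BEFORE[0], AFTER[0]):+.1%}")
    print(f"irrelevant change: {relative_change(BEFORE[1], AFTER[1]):+.1%}")
    print(f"precision before:  {precision(*BEFORE):.3f}")
    print(f"precision after:   {precision(*AFTER):.3f}")
```

With the Table 6 counts, relevant results fall by about 5.5% while irrelevant results fall by about 32.8%, which raises the share of relevant results from about 0.66 to about 0.74.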
Term connection value    TermX (before learning)    TermX (after learning)
Term1                    0.5                        0.7
Term2                    0.5                        0.3
Term3                    0.5                        0.8
Term4                    0.5                        0.5
Term5                    0.5                        0.9
Table 7 : Term connection values before learning
and after learning
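The update rule that produces the values in Table 7 is not given in this section, so the following is only a sketch of how feedback could move a connection value away from its initial 0.5. It uses a simple step toward 1 for positive feedback and toward 0 for negative feedback, which is an assumption made for illustration and not the paper's fuzzy-logic rule. The names `update_connection`, `train`, and the parameter `rate` are hypothetical.

```python
def update_connection(value: float, relevant: bool, rate: float = 0.1) -> float:
    """Move a connection value toward 1.0 on positive feedback, toward 0.0 otherwise.

    For a value in [0, 1] and a rate in [0, 1] the result stays in [0, 1].
    """
    target = 1.0 if relevant else 0.0
    return value + rate * (target - value)


def train(initial: float, feedback: list[bool], rate: float = 0.1) -> float:
    """Apply a sequence of relevant/irrelevant judgments to one connection value."""
    value = initial
    for relevant in feedback:
        value = update_connection(value, relevant, rate)
    return value


if __name__ == "__main__":
    # Illustrative feedback sequences, not data from the experiment.
    print(train(0.5, [True] * 8 + [False] * 2))   # mostly positive feedback
    print(train(0.5, [False] * 8 + [True] * 2))   # mostly negative feedback
```

Under this assumed rule, a connection value rises above the initial 0.5 when feedback is mostly positive and falls below it when feedback is mostly negative, and the result depends on the order in which the judgments arrive.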
CONCLUSIONS
Compared with the current search engines tested, No-Name, an
Intelligent GIS search engine, performs better in Relevant
Sites Retrieval, Irrelevant Sites Dodge, and Natural Language
Identification.
Spider, robot, and full-text analysis technologies are applied
successfully in No-Name. Information collection from the
Internet is efficient and adequate.
Using fuzzy logic, the strength of term connections in small
thesauri is determined from users' feedback. The experimental
results support the claim that the search engine can learn from
this feedback: after training, the number of relevance results
stays almost the same (271 before, 256 after), while the
number of irrelevance results drops markedly (137 before, 92
after).
The test results show that No-Name has the best natural
language support among the four search engines tested. Yahoo
also shows some support for natural language identification.
This improvement can be attributed to the partnership between
Yahoo and Google that began in June 2000 [27].
No-Name's unique structure of multiple independent layers can
speed up queries and help users find better-matched results.
Although the experimental results show that No-Name is a
high-performance Intelligent GIS search engine for retrieving
GIS information from the Internet, there is still room for
improvement. First, the expert board should contain more
experts to avoid misjudgment. Decisions made by only a few
experts are vulnerable to misjudgment and are therefore
doubtful; the noise introduced by one expert's misjudgment is
expected to be weakened by recruiting more experts to the
board. Second, assumption 2, that a term has no connection
with its peers in the same category, is too strong and may not
hold. Term connection values among terms in the same
category should be determined in the future, which means the
number of term connections will increase from a few thousand
to several tens of thousands. Finally, GIS is a rapidly
expanding interdisciplinary applied science, and the number of
terms it involves is increasing dramatically. It is a challenging
problem to keep GIS thesauri at a reasonable size for the sake
of efficiency while keeping up with the advancement of GIS
technologies.
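The first point above, that a larger expert board weakens the effect of individual misjudgments, can be illustrated with a simple probability model. The sketch assumes that each expert misjudges a result independently with the same probability and that the board decides by strict majority. Both are assumptions made for illustration; the paper does not state how the board combines its votes, and the error rate of 0.2 is a made-up example value.

```python
from math import comb


def majority_error(n_experts: int, p_error: float) -> float:
    """Probability that a strict majority of n independent experts misjudge a result.

    For an even board size a tie is not counted as an error here.
    """
    needed = n_experts // 2 + 1  # smallest number of votes forming a strict majority
    return sum(
        comb(n_experts, k) * p_error**k * (1 - p_error) ** (n_experts - k)
        for k in range(needed, n_experts + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 9):
        print(f"{n} experts: board error probability {majority_error(n, 0.2):.4f}")
```

Under these assumptions the board's error probability shrinks as experts are added whenever the individual error rate is below one half, which is the sense in which a larger board weakens the noise from any one expert.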