Proceedings; XXI International Congress for Photogrammetry and Remote Sensing: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing

chen, jun
109 
Beijing 2008 
THE CLASSIFICATION OF HIGH DIMENSIONAL INDICES FOR SPATIAL DATA 
SIMILARITY SEARCH 
Yu XIA a ’ *, Xinyan ZHU b , Chang LI a 
a School of Remote Sensing and Information Engineering, Wuhan University - geoxy@126.com 
b State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University 
WG IV/1 
KEY WORDS: Spatial Database; Data Management; Digital Photogrammetry; Query Processing; GIS; Similarity Search; Metric 
Space 
ABSTRACT: 
The applications of spatial data similarity search are increasingly needed nowadays, and accordingly high dimensional index 
becomes one key technology to solve the problem of spatial data similarity search. Firstly, the distribution of high dimensional data 
is in-depth analyzed, and then high dimensional data retrieval for spatial data similarity search is also discussed. Secondly, based on 
the research, the classification of high dimensional indices for spatial data similarity search is presented, which initially makes a 
clear distinction of the relationship between the high dimensional index and the application of spatial data similarity search. Finally, 
the principle of high dimensional indices and the state of the applications in spatial data similarity search are analyzed with an 
example of typical index structure respectively, which lays a foundation for the research on index technology in spatial data 
similarity search. 
1. INTRODUCTION 
How to search similar objects of a given object from spatial 
database efficiently, called spatial data similarity search, has a 
wide application needs nowadays, and becomes an increasingly 
important problem. One of the key technologies to solve the 
similarity search problem is the high dimensional index. 
Accordingly, high dimensional index technology has been a 
very active research area over the past decades, and attracts 
many scholars’ attentions. For example, (Antomn Guttman, 
1984) introduced a dynamic index structure called R-tree, 
which was well suited to data objects of non-zero size located 
in multidimensional spaces; In the later years, the classic 
improved index structure R+-tree (Timos Sellis, 1987) and R* - 
tree (Norbert Beckmann, 1990) were presented respectively; 
The TV-tree (King-lp Lin, 1994) was designed for indexing 
high dimensional objects which used a variable number of 
dimensions to get a larger fanouts and fewer disk accesses; In 
(Stefan Berchtold, 1996), the X-tree was proposed which used a 
split algorithm minimizing overlap and additionally utilized the 
concept of supemodes to keep the directory as hierarchy as 
possible and avoid splits in the directory; (David A. White, 
1996) proposed the similarity search tree, called SS-tree, which 
used spheres as page regions; The SR-tree (Norio Katayama, 
1997) integrated bounding spheres and bounding rectangles, 
and used the intersection as the region, and enhanced the 
performance especially for high dimensional and non-uniform 
data; In (Roger Weber, 1998), a vector approximation schema, 
called VA-file, was presented to give the approximation of 
vectors and make the unavoidable sequence scan as fast as 
possible; (Christians Bohm, 2001) gave a general overview of 
index structures and algorithms in high dimensional space; 
(Christians M. Garcia-Arellano, 2002) compared the index 
structures among VA-File, IQ and A-tree, and then proposed a 
new static similarity search strategy called Quantized 
Clustering tree or QC-tree, which integrated the best 
characteristics observed in the IQ-tree and A-tree; (Ye Hangjun, 
2003) presented the vector quantization (VQ) index schema, 
which assumed a Gaussian mixture distribution of real-world 
data, and trained optimized vector quantizer to partition data 
space; (Cui Jiangtao, 2005) proposed PCR-tree index based on 
R-tree and VA-File, which employed R-tree to manage the 
approximate vectors at principal components; In addition, many 
scholars paid much attention to high dimensional index in 
metric space, and proposed many distance based index 
structures where object proximity was only defined by a 
distance function satisfying the positivity, symmetry and 
triangle inequality; (Paolo Ciaccia,1997) introduced the M-tree 
index to organize and search large data sets, which was a 
dynamic balanced distance based index structure; (Tolga 
Bozkaya, 1999) proposed a distance based index structure 
called multi-vantage point (mvp) tree for similarity search, 
which used more than one vantage point and utilized pre 
computed distances between the data points and the vantage 
points; (Gonzalo Navarro, 2002) presented SA-tree index which 
was based on approaching the searched objects spatially. 
(Benjamin Bustos, 2006) presented the Multi-Metric M-tree 
(M3-tree), which was the extension of the M-tree, stored partial 
distances to better estimate the weighted distance between 
routing entries and each search, where a single distance 
function was used to build the whole index. Though the 
researches have attributes to the high dimensional index 
technology, there isn’t systematic research on high dimensional 
index classification. Furthermore, the relationship between high 
dimensional index structure and the application of spatial data 
similarity search is still not clear. Note that the deficiency limits 
further research on high dimensional index technology in 
spatial data similarity search. So the research on the 
classification of high dimensional index technology for feature- 
geoxy@126.com
1
2
...
134
135
136
137
138
...
540
541
Full text: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing (Part B4-1)

Access restriction

Copyright

Note to user