Full text: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing (Part B4-1)

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Voi. XXXVII. Part B4. Beijing 2008 
113 
From different points of view, distance based indices can be 
divided into different types (Dong Daoguo, 2005). (1) From the 
perspective of data organization, distance based indices can be 
divided into static indices and dynamic indices, where the static 
indices including VP-tree (Gisli R Hjaltason, 2003) and SA-tree 
(Gonzalo Navarro, 2002), etc., don’t support the dynamic 
operation of insert, delete and so on, and the dynamic indices 
are the index structures, whose creation is the process of 
inserting every data point orderly, and usually from bottom to 
top, including M-tree (Paolo Ciaccia, 1997), Slim-tree (C. Train 
Jr., 2000) and so on; (2) From the angle of distance measure, 
distance based indices can be divided into discrete distance 
indices and continuous distance indices. The former means the 
distance function is discrete, including BK-tree (Edgar Chavez, 
2001) and FQ-tree (Edgar Chavez, 2001). The latter refers to 
the indices whose distance function is continuous, including 
VP-tree (Gisli R Hjaltason, 2003) and M-tree (Paolo Ciaccia, 
1997); (3) From the point of view of index structure, distance 
based indices can be divided into tree type indices and non-tree 
type indices. Just as its names implies, tree type indices are 
organized as tree structures including BK-tree, VP-tree, BS-tree 
(Edgar Chavez, 2001). Non-tree type indices include FQA, 
AES A (Edgar Chavez, 2001 ) and so on. 
With an example of classic index structure VP-tree (Gisli R 
Hjaltason, 2003), the principal of distance based index 
structures is analyzed. The main idea of VP-tree is that data 
space is partitioned into two subspaces in term of the distance 
from vantage point, as shown in figure 5(a). The algorithm 
recursively repeats the partitioning process and forms a 
hierarchical binary tree. As figure 5(e) shows, thè tree is 
generated from data space SS where left branch is built from 
SSI and right branch is generated from SS2. Each internal node 
of the binary vp-tree is of the form (S v , M, R ptr ,L ptr ), where Sv is 
the vantage point, M is the median distance among the distances 
of all the points from S v indexed below that node, and R ptr and 
L ptr are points to the left and right branches ( Tolga Bozkaya, 
2000). 
For a given search point q, if d(q,Svì) - Y > R , as figure 
5 (b) shows, where d(q,Sv 1) can be calculated in real time, 
and r is from the search condition, and R is the median distance 
M of the vantage point Svi, then left branch can be pruned 
which are marked with dashed in figure 5(e); if 
d(q, Svi) + r < R , as figure 5 (c) shows, the right branch 
can be pruned, and so the distance from the search point to the 
vantage point is exempted from calculation; if the two 
conditions of d(q,Sv\) - y < R and d(q,SvY) + Y 
> R are satisfied at the same time, as figure 5 (d) shows, then 
both branches cannot be pruned and further travel in both 
branches is needed. 
Figure 5. The idea of distance-based indices 
From the analysis, we can see that data filtering in VP-tree is 
completed by triangular properties in metric space. In fact, the 
triangle inequality property is applied for pruning the search 
space in all distance based indices. Note that the cost of 
distance calculation is related with not only the definition of 
similarity measure, but also the number of dimension. Usually 
the distance calculation cost grows with the number of 
dimensions. For example, it is needed to calculate the distance 
of two feature vectors with n dimensions. When we take Euclid 
distance as the similarity measure method without consideration 
of the evolution operation, n-1 times addition and n times 
multiplication are needed. If the number of feature vector pairs 
is m, then n m times multiplication operation is needed. For high 
dimension data retrieval, n reaches hundreds, even thousands, 
and m depends on the dataset and the specific index structure. 
Therefore, with the increase of the dimension, the distance 
calculation becomes significantly more expensive. Distance 
based indices are suitable for the application of the spatial data 
similarity search with expensive distance calculation, and 
usually including the case of high dimension. It is true that they 
have been used in many applications of similarity search, and 
are a prosperous research area. 
6. CONCLUSION 
High dimensional data index is one key technology to solve the 
problem of spatial data similarity search. In this paper, the 
quantified analysis of the distribution of high dimensional data 
and the curve reflecting the relationship among the dimension, 
the radius and the probability are given, which show the 
sparsity and distribution tendency of high dimensional data. 
Because of the properties in high dimensional space, traditional 
partition based index can not reflect data distribution well and 
can also not avoid “dimension crisis”, which leads to poor 
performance. For the fact that the case of high dimension does 
often appear in spatial data similarity, aiming to the application 
of spatial data similarity search, we present the classification of 
high dimensional indices for spatial data similarity search: 
partition based indices, approximation based indices and 
distance based indices. Partition based indices locate the 
specific data according to the coordinates, which are suitable 
for the application of spatial data similarity search in the case of 
low dimension, such as less than 10 dimension; Approximation 
based indices implement the retrieval by reducing the cost of 
sequence scan, which are suitable for the high dimension 
similarity search such as over ten dimension to more than 100 
dimension; Distance based indices employ triangle inequity to 
complete the retrieval which are suitable for the similarity 
search of expensive distance calculation, usually including the 
case of high dimension, such as dozens to hundreds of 
dimension.
	        
Waiting...

Note to user

Dear user,

In response to current developments in the web technology used by the Goobi viewer, the software no longer supports your browser.

Please use one of the following browsers to display this page correctly.

Thank you.