Full text: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing (Part B4-1)

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B4. Beijing 2008 
110 
based spatial data similarity search has the theoretical value and 
practical significance. 
2. HIGH DIMENSIONAL DATA RETRIEVAL 
Spatial data similarity search is generally based on spatial data 
features, such as image features, including color features, 
texture features, shape features, etc. , and vector features, 
including topological features, direction features, measure 
features, etc., and so on. The kind of similarity search are called 
feature-based spatial data similarity search. These features are 
usually represented as high dimensional vectors. For example, 
the homogenous texture descriptors recommended by MPEG-7 
are represented as a vector of 62 dimensions, and the edge 
histogram descriptors are denoted as a vector of 150 dimensions. 
Therefore, one key problem of feature-based spatial data 
similarity search is high dimensional data retrieval efficiently. 
2.1 High dimensional data distribution characteristics 
To explore the nature of high dimensional data retrieval, high 
dimensional data distribution is firstly discussed. In 2 
dimension space, the difference between the area of the 
minimum boundary rectangle and that of the minimum 
boundary circle is relatively small, and the ratio is 
(2 X y[2 /2 X r) 2 /( 71 r 2 ) = 0.64; However, in 64 dimension 
space, the ratio becomes (2X >/64 /64Xr) 64 / {71 32 r 64 /32!) = 
9.54 X 10 20 ; In 256 dimension space, the ratio evenly decrease 
to (2 X >/256 /256 X r) 256 / ( 71 128 r 256 /128!) = 5.76 X 10' 80 , 
where r denotes the radius of the hypersphere. 
From the two examples, we can see that high dimension data 
are very sparse in high dimensional space, and the minimum 
boundary hypercube and minimum boundary hypersphere don’t 
reflect the distribution of high dimensional data well. Provided 
that the volume of over 75 percent of radius of the hypersphere 
is defined interior surface, the probability of a data point 
located in the interior surface in 2 dimension space is {71 r 2 - 
71 (3r/4) 2 )/( 7t r 2 ) = 0.438, and in 6 dimension space the 
probability is ( Jt 3 r 6 /6!- 71 3 (3r/4) 6 /6!)/( 7t 3 r 6 /6!) = 0.822. 
The rest may be deduced by analogy, and the change of the 
probability with the dimension is shown as figures 2. 
Figure 2. Probability change of data points located in interior 
surface with dimension in the case of the radius threshold is 
75% of that of the hypersphere. 
It is shown that from the figure 2, as the dimension grows, the 
probability of distribution in interior surface increases, and 
when the dimension is 8, the probability reaches 90 percent. 
As figure 1 shows, with the increase of the dimension, the ratio Note that if interior surface is defined as the volume of over 
decreases sharply, and the difference between the volume of the different radius threshold, the probability of data distribution in 
minimum boundary hypercube and that of the minimum interior surface is different in the case of the same dimension, 
boundary hypersphere becomes significant. Therefore, in the In other words, the dimension is different in the case of same 
case of high dimension, the minimum boundary hypersphere probability of data distribution in interior surface when the 
doesn’t reflect fittingly the distribution of high dimensional data. interior surface involves different radius. 
Figure 1. The change of the ratio of the volume of the 
hypercube and that of the hypersphere with dimension 
Further, supposing 1,000,000 points distributes in a 256- 
dimension hypercube and each dimension is divided into 2 units, 
2 256 = 1.16 X 10 77 units cubes are generated which exceed 
extremely the number of data points. For uniform datasets, there 
aren’t points in the most of the units, and the minimum 
boundary hyperscube doesn’t reflect the distribution of high 
dimensional data as well as the minimum boundary hypersphere. 
Figure 3. The dimension changes with the ratio of the radius 
threshold and the radius of the hypersphere when the 
probability of data distribution in interior surface reaches 80% 
From the analysis, we conclude (1) high dimensional data point 
is very sparse in high dimension space, and traditional partition 
based indices don’t reflect the distribution of high dimensional 
data well; (2) high dimensional data points incline to locate the 
interior surface, and with the increase of the dimension, the 
tendency is more notable. The related work includes (Yue Li, 
2004), (Jinhua Li, 2001), (Tolga Bozkaya, 1997) and so on, but 
the researches don’t involve the quantified analysis and give the
	        
Waiting...

Note to user

Dear user,

In response to current developments in the web technology used by the Goobi viewer, the software no longer supports your browser.

Please use one of the following browsers to display this page correctly.

Thank you.