The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part Bl. Beijing 2008
464
continue to be split down. So, we should add a variable for each
leaf node to denote the number of points in this node, while
inserting a new data, examine the variable whether it’s greater
than N, if it is, split the node.
The method based on the space is given a fixed spatial scales R,
such as the volume or length of smallest bounding box
enclosing all data, or the radius of smallest sphere containing all
data in the node. When the node’s spatial scales are smaller
than R, the node needn’t to be split. This approach requires
adding some parameters of smallest bounding box or sphere.
And after each inserting, KD tree need to modify all parameters
from the root to the node where the inserted data stayed.
The N and R defined as a standard above should be estimated
from original data in advance. The above two methods show
some limitations if data are not evenly distributed. With the
number-based standard, the node involves a very large region in
sparse areas, and splits too much in dense areas; while in the
spaced-based standard, the node contains only one point in
sparse area, and involves large number of points in dense areas.
It takes advantage of finding nearest point while based on the
number of points, and takes advantage of simple operation in
display while based on the space. For our data management
purpose, we choose the space-based standard, but some rules
are added:
1) . Based on the needs of displaying and processing, set a
normal space scale rMin for splitting. If the node’s spatial scale
is less than rMin, the node needn’t be split;
2) . Considering the operational environment and complexity,
set the maximum number of points nMax. If the node’s number
of points is larger than nMax, the node must be split into two
nodes until the number is less then nMax;
3) . While the node’s spatial scale is no less than rMin and the
number of points is no larger than nMax, we could split the
node as less as possible by taking in some error. Assume the
node is P. If it is split into two sub-nodes—A and B, calculate
the ‘percentage of precision loss’ f(P). Give a variable u, if
f(P)<u, point P needn’t to be split into two.
PL is the plane which minimizes the squared distances to the
points of P. g(P) means the squared minimum distances, V1(P)
and V2 represent the volume of smallest bounding box of P and
the cube of rMin respectively, k is a specified coefficient
defined by the programmer. The formula can be defined as
f(P)= g(P)/[g(A)+g(B)], and u=k*Vl(P)/V2 .
Figure 1. The left figure shows a KD partition based on the space, where the node contains only one point is spares areas,
and can involves a large number of points in dense areas. The other shows a KD partition based on the number of 7,
this method can make the region very large or very small depending on the dense.
The structure of KD tree’s node and the tree is:
template <typename Xtype>
class KDNode
{
public:
int axis;
Xtype xfSDJ;
Xtype (*xList)[SDJ;
int dataNum;
void insertData(Xtype x[][3],int n);
Xtype bound cube jnin[3];
Xtype bound cube max[3];
KDNode(Xtype*xO, int axisO);
KDNODE * Insert(Xtype * x);
// Definition of data in nodes
// Definition of the number of data in each node
//While the node needn 't to split, put all the data into node
// The bound of smallest bounding box