4.1.2 Feature Index
ICP algorithm is critical to the selection of initial point pairs.
Otherwise the algorithm can easily fall into local minimum.
When camera tracking fails, the assumption that high degree of
similarity between two frames exists will no longer tenable. So
it is obligatory to provide initial point pairs for ICP algorithm.
Note that it is difficult to extract effective feature information
from depth image. In practical applications visual features are
more reliable and visual features can measure the degree of
similarity between two frames. So in SLAM visual features are
applied to detect loop closure. For the establishment of Feature
Index, RGB image is used to extract SIFT features. In practice it
has been proven that SIFT feature is invariant, even for images
with scale change and rotation.
4.1.3 Virtual Camera
Raw depth image is not accurate and contains lots of noise. The
reason that voxel-based data fusion can generate 3D model with
high geometric fidelity is that at same area RGB-D data are
collected from multiple angles. So the generated 3D model is a
weighted average of those data. To improve the accuracy of
registration, virtual depth image is generated combining the 3D
model with current camera position %, Compared to raw depth it
has a higher geometric precision.
As we have a dense surface reconstruction and camera’s global
position Ti, per pixel ray-cast can be performed (Parker, 98).
Each pixel’s corresponding ray is marched starting from the
minimum depth for the pixel and stopping when a zero crossing
is found indicating the surface interface. Then the distances
corresponding to its pixel position is recorded which is used to
generate a virtual depth image.
4.2 Sample Frame Extraction
There are two main components in the process of camera
relocalization. First, a process of data collection, a set of sample
frames is extracted from the map. Second, a sample frame
which is the best match of current frame is determined from the
data set. Note that when the graph is constructed geometric
constraints between consecutive frames are recorded. And there
is only small motion of the camera from frame to frame. So we
can utilize the geometric relationship to reduce the number of
data to be used soon afterwards. As we know that visual
information is often to be used to evaluate the degree of
similarity between frames. So in this paper we extract SIFT
feature from RGB image and match them between current
frame and sample frames. Besides, SIFT pairs can provide
initial point pairs for ICP algorithm. SIFT is widely used feature
detector and descriptor (Lowe, 2004). Though the descriptors
are very distinctive, they must be matched heuristically and
there can be false matches. To determine a subset of feature
pairs corresponding to a consistent rigid transformation
RANSAC algorithm is used. Additionally, the RANSAC
associations act as an initialization for ICP, which 1s a local
optimizer. For the 3D point clouds we employ a point-to-plane
ICP algorithm to compute rigid 6DOF transformation. To
evaluate the accuracy of those matches color similarity
measurement is conducted. Eventually the best match is
selected to recover current camera position. The overall process
of camera relocalization is as follows in listing 1,
Listing 1 Camera Re-localization
1: Nodes = Find Sub Data Set From Graph(distance)
2F= Extract RGB SIFT Features(Ps)
3:1f {F} = Find_ Similar Samples(Feature Index; F”)
For each sample in {F}
4: t=Perform RANSAC Alignment(F;; F”)
Repeat
270
S T'- Compute Closest Points(t; P,; P)
6: until ( Error Converge(t) 0 ) or (max Iteration reached)
g return T'
8: S,= Computer Color Similarity Measurement(P,; Pi; T^)
9: T- Get Max Similarity(S;)
10: Relocate Camera Position (T)
Consecutive depth frame, with an associated live camera pose
estimate, is fused incrementally into one single 3D
reconstruction. After a period of time a huge amount of RGB-D
data are accumulated and maintained by the graph structure.
Note that camera moves at a certain rate. So even if camera
tracking has failed, current position of the camera must locate
around the last sample frame within a certain range.
Consequently if the last sample frame in the graph is picked as
data centre and a radius is pre-defined we can employ Spatial
Index to extract a data set {F}. For simplicity, sample frames
are noted as F; and current frame F. F; which is the best match
of F' will lie in the data set. However, as F; varied from each
other we have to measure the degree of similarity between F;
and F'. Firstly we extract sparse visual features from F' and
associate them with their corresponding depth values to
generate feature points in 3D which will be used later. Then
those features are matched heuristically with features kept by
Feature Index of each node in the data set. This part will be
elaborated in section 1.1.2. Then the number of successfully
matched feature pairs is an indication of the degree of data
similarity. And we will choose 2-3 sample frames from the data
set which rank the top in this procedure.
4.3 Selection Strategy
The ICP algorithm iterates between associating each point in
one time frame to the closest point in the other frame and
computing the rigid transformation that minimizes distance
between the point pairs. However the important first step of ICP
is to find correspondences between frame pairs, otherwise the
ICP algorithm will easily converge to a local minimum. So we
use feature selection to provide initial corresponding point pairs
for ICP algorithm. Through the above procedures 2-3 sample
frames are selected. To determine which frame should be used
to recover camera pose color similarity is introduced.
Combining the color similarity criterion, registering is much
more robust in difficult cases and the result becomes more
reliable.
4.3.1 SIFT+RANSAC
In the process of SIFT match the best candidate match for each
keypoint of F' is found by identifying its nearest neighbor in the
database of keypoints of F;. Consider N pairs of initial feature
pairs between frame F and Fj, represented by vectors (X; Y) in
their respective coordinate system. RANSAC samples the
solution space of (R; T) (rotation and translation) and counts
the number of inliers, f,
f(,T) - 1GC Y^, RT) (D
Where I will be inlier if X' and Y' fit well with a pre-defined
threshold under the constraint of (R; T). A inlier will be
counted as 1 otherwise 0. RANSAC chooses the transform
consistent with the largest number of inlier matches.
4.3.2 ICP
In 2D because of the scale indeterminacy the frame pairs are not
finely aligned. That means the registration accuracy is not
precise enough. ICP is a popular and well-studied algorithm for
fir
thi
Tc
se
di
Th
th
fir