XXII ISPRS Congress 2012: Technical Commission III

  
4.1.2 Feature Index 
ICP algorithm is critical to the selection of initial point pairs. 
Otherwise the algorithm can easily fall into local minimum. 
When camera tracking fails, the assumption that high degree of 
similarity between two frames exists will no longer tenable. So 
it is obligatory to provide initial point pairs for ICP algorithm. 
Note that it is difficult to extract effective feature information 
from depth image. In practical applications visual features are 
more reliable and visual features can measure the degree of 
similarity between two frames. So in SLAM visual features are 
applied to detect loop closure. For the establishment of Feature 
Index, RGB image is used to extract SIFT features. In practice it 
has been proven that SIFT feature is invariant, even for images 
with scale change and rotation. 
4.1.3 Virtual Camera 
Raw depth image is not accurate and contains lots of noise. The 
reason that voxel-based data fusion can generate 3D model with 
high geometric fidelity is that at same area RGB-D data are 
collected from multiple angles. So the generated 3D model is a 
weighted average of those data. To improve the accuracy of 
registration, virtual depth image is generated combining the 3D 
model with current camera position %, Compared to raw depth it 
has a higher geometric precision. 
As we have a dense surface reconstruction and camera’s global 
position Ti, per pixel ray-cast can be performed (Parker, 98). 
Each pixel’s corresponding ray is marched starting from the 
minimum depth for the pixel and stopping when a zero crossing 
is found indicating the surface interface. Then the distances 
corresponding to its pixel position is recorded which is used to 
generate a virtual depth image. 
4.2 Sample Frame Extraction 
There are two main components in the process of camera 
relocalization. First, a process of data collection, a set of sample 
frames is extracted from the map. Second, a sample frame 
which is the best match of current frame is determined from the 
data set. Note that when the graph is constructed geometric 
constraints between consecutive frames are recorded. And there 
is only small motion of the camera from frame to frame. So we 
can utilize the geometric relationship to reduce the number of 
data to be used soon afterwards. As we know that visual 
information is often to be used to evaluate the degree of 
similarity between frames. So in this paper we extract SIFT 
feature from RGB image and match them between current 
frame and sample frames. Besides, SIFT pairs can provide 
initial point pairs for ICP algorithm. SIFT is widely used feature 
detector and descriptor (Lowe, 2004). Though the descriptors 
are very distinctive, they must be matched heuristically and 
there can be false matches. To determine a subset of feature 
pairs corresponding to a consistent rigid transformation 
RANSAC algorithm is used. Additionally, the RANSAC 
associations act as an initialization for ICP, which 1s a local 
optimizer. For the 3D point clouds we employ a point-to-plane 
ICP algorithm to compute rigid 6DOF transformation. To 
evaluate the accuracy of those matches color similarity 
measurement is conducted. Eventually the best match is 
selected to recover current camera position. The overall process 
of camera relocalization is as follows in listing 1, 
  
Listing 1 Camera Re-localization 
1: Nodes = Find Sub Data Set From Graph(distance) 
2F= Extract RGB SIFT Features(Ps) 
3:1f {F} = Find_ Similar Samples(Feature Index; F”) 
For each sample in {F} 
4: t=Perform RANSAC Alignment(F;; F”) 
Repeat 
  
270 
S T'- Compute Closest Points(t; P,; P) 
6: until ( Error Converge(t) 0 ) or (max Iteration reached) 
g return T' 
8: S,= Computer Color Similarity Measurement(P,; Pi; T^) 
9: T- Get Max Similarity(S;) 
10: Relocate Camera Position (T) 
Consecutive depth frame, with an associated live camera pose 
estimate, is fused incrementally into one single 3D 
reconstruction. After a period of time a huge amount of RGB-D 
data are accumulated and maintained by the graph structure. 
Note that camera moves at a certain rate. So even if camera 
tracking has failed, current position of the camera must locate 
around the last sample frame within a certain range. 
Consequently if the last sample frame in the graph is picked as 
data centre and a radius is pre-defined we can employ Spatial 
Index to extract a data set {F}. For simplicity, sample frames 
are noted as F; and current frame F. F; which is the best match 
of F' will lie in the data set. However, as F; varied from each 
other we have to measure the degree of similarity between F; 
and F'. Firstly we extract sparse visual features from F' and 
associate them with their corresponding depth values to 
generate feature points in 3D which will be used later. Then 
those features are matched heuristically with features kept by 
Feature Index of each node in the data set. This part will be 
elaborated in section 1.1.2. Then the number of successfully 
matched feature pairs is an indication of the degree of data 
similarity. And we will choose 2-3 sample frames from the data 
set which rank the top in this procedure. 
4.3 Selection Strategy 
The ICP algorithm iterates between associating each point in 
one time frame to the closest point in the other frame and 
computing the rigid transformation that minimizes distance 
between the point pairs. However the important first step of ICP 
is to find correspondences between frame pairs, otherwise the 
ICP algorithm will easily converge to a local minimum. So we 
use feature selection to provide initial corresponding point pairs 
for ICP algorithm. Through the above procedures 2-3 sample 
frames are selected. To determine which frame should be used 
to recover camera pose color similarity is introduced. 
Combining the color similarity criterion, registering is much 
more robust in difficult cases and the result becomes more 
reliable. 
4.3.1 SIFT+RANSAC 
In the process of SIFT match the best candidate match for each 
keypoint of F' is found by identifying its nearest neighbor in the 
database of keypoints of F;. Consider N pairs of initial feature 
pairs between frame F and Fj, represented by vectors (X; Y) in 
their respective coordinate system. RANSAC samples the 
solution space of (R; T) (rotation and translation) and counts 
the number of inliers, f, 
f(,T) - 1GC Y^, RT) (D 
Where I will be inlier if X' and Y' fit well with a pre-defined 
threshold under the constraint of (R; T). A inlier will be 
counted as 1 otherwise 0. RANSAC chooses the transform 
consistent with the largest number of inlier matches. 
4.3.2 ICP 
In 2D because of the scale indeterminacy the frame pairs are not 
finely aligned. That means the registration accuracy is not 
precise enough. ICP is a popular and well-studied algorithm for 
  
fir 
thi 
Tc 
se 
di 
Th 
th 
fir
1
2
...
281
282
283
284
285
...
586
587
Full text: Technical Commission III (B3)

Access restriction

Copyright

Note to user