3D shape alignment. ICP has been shown to be effective when two point clouds are already nearly aligned. Since the two frames have been coarsely aligned in the SIFT+RANSAC step, this prerequisite of the ICP algorithm is satisfied. To generate more accurate alignments than point-to-point ICP, a variant based on the point-to-plane error metric has been shown to improve convergence rates and is the preferred algorithm when surface normal measurements are available (Rusinkiewicz, 2002; Segal, 2009). As mentioned in the previous section, the virtual depth image is more accurate and less noisy than the raw depth image, so in this part point-to-plane ICP is applied between the virtual depth image and the current RGB-D frame.
In the first iteration of the ICP algorithm, $(R, T)$ is initialized by the SIFT+RANSAC match. When the point-to-plane error metric is used, the object of minimization is the sum of the squared distances between each source point and the tangent plane at its corresponding destination point. More specifically, if $s_i = (s_{ix}, s_{iy}, s_{iz}, 1)^T$ is a source point, $d_i = (d_{ix}, d_{iy}, d_{iz}, 1)^T$ is the corresponding destination point, and $n_i = (n_{ix}, n_{iy}, n_{iz}, 1)^T$ is the unit normal vector at $d_i$, then the goal of each ICP iteration is to find $(R_{opt}, T_{opt})$ such that (Low, 2004)

$$(R_{opt}, T_{opt}) = \arg\min_{(R,T)} \sum_i \Big( \big( (R, T) \cdot s_i - d_i \big) \cdot n_i \Big)^2 \qquad (4)$$

After the registration of the 3D point clouds, the final transformation $(\hat{R}, \hat{T})$ is computed.
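For illustration, each iteration of this minimization can be solved in closed form after linearizing the rotation, following Low (2004). The sketch below is our own Python/NumPy rendering under that small-angle assumption (reasonable here, since SIFT+RANSAC has already pre-aligned the frames); the function name and array layout are illustrative, not the paper's implementation.

```python
import numpy as np

def point_to_plane_icp_step(src, dst, normals):
    """One point-to-plane ICP iteration via Low's (2004) linearization.

    src, dst : (N, 3) corresponding source/destination points
    normals  : (N, 3) unit normals at the destination points
    Returns a 4x4 rigid transform moving src towards dst.
    """
    # Residual ((R,T)*s_i - d_i) . n_i with R ~ I + skew(alpha, beta, gamma)
    # gives the linear system A x = rhs, x = (alpha, beta, gamma, tx, ty, tz).
    A = np.hstack([np.cross(src, normals), normals])    # (N, 6)
    rhs = np.einsum('ij,ij->i', normals, dst - src)     # (N,)
    x, *_ = np.linalg.lstsq(A, rhs, rcond=None)

    alpha, beta, gamma, tx, ty, tz = x
    # Small-angle rotation plus translation, as a homogeneous 4x4 matrix.
    return np.array([[1.0,   -gamma, beta,  tx],
                     [gamma,  1.0,  -alpha, ty],
                     [-beta,  alpha, 1.0,   tz],
                     [0.0,    0.0,   0.0,   1.0]])
```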
4.3.3 Color Similarity Measurement
The framework above utilizes only the small fraction of depth-image pixels that correspond to SIFT features. It is assumed that if $(\hat{R}, \hat{T})$ is applied to a frame pair, their common areas should overlap perfectly. However, the rigid transformation may be unreliable under difficult circumstances, so this does not always hold in practice. To compute color similarity, we choose a set of points from the RGB image, including all SIFT feature points and some other visual features such as Harris corners. SIFT features often lie on object edges, where point clouds are less reliable, so we place larger weight on the pixels corresponding to SIFT features in the color similarity measurement.
Every feature point has information including location, gradient
magnitude and orientation. For each image sample, L(x, y), the
gradient magnitude, m(x, y), is precomputed using pixel
differences to produce weight W(x, y):
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2} \qquad (5)$$

$$W(x, y) = 1 / m(x, y) \qquad (6)$$
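As an illustrative sketch, the weight map of Eqs. (5)-(6) can be computed with central pixel differences as follows; the small epsilon guarding against division by zero in flat regions is our own addition, not part of the paper.

```python
import numpy as np

def sift_weight_map(L, eps=1e-6):
    """Weights W(x, y) = 1 / m(x, y) from Eqs. (5)-(6).

    L : 2-D array of image intensities.
    """
    L = np.asarray(L, dtype=float)
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x+1, y) - L(x-1, y)
    dy[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)       # gradient magnitude, Eq. (5)
    return 1.0 / (m + eps)               # weight, Eq. (6)
```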
To measure color similarity, the correlation coefficient method is used. First we set $F$ as the master image and $S$ as the slave image and extract the pixels corresponding to SIFT features from both RGB images. The difference is that the pixel window in $F$ is 4×4, while the search window in $S$ is a larger 16×16. The coefficient between the matching window and the target window of the stereo pair can be calculated by the formula below; the final coefficient $r$ is the maximum value over the search window:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (7)$$
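A minimal sketch of Eq. (7) under the window sizes described above: the coefficient is computed between the 4×4 master window and every 4×4 sub-window of the 16×16 slave search window, keeping the maximum as $r$.

```python
import numpy as np

def max_correlation(master_win, slave_win):
    """Eq. (7): max correlation r of a 4x4 master window over a
    16x16 slave search window (any compatible sizes work)."""
    h, w = master_win.shape
    X = master_win.astype(float).ravel()
    Xc = X - X.mean()
    best = -1.0
    for i in range(slave_win.shape[0] - h + 1):
        for j in range(slave_win.shape[1] - w + 1):
            Y = slave_win[i:i + h, j:j + w].astype(float).ravel()
            Yc = Y - Y.mean()
            denom = np.sqrt((Xc ** 2).sum() * (Yc ** 2).sum())
            if denom > 0:
                best = max(best, (Xc * Yc).sum() / denom)
    return best
```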
Combining the coefficients with the pre-defined weights, we sum them all up. For each sample frame obtained in Section 1.1.2 there is one such coefficient, so to compare color similarity across frames the value $S$ has to be normalized:

$$S = \frac{\sum_{i=1}^{M} W(x_i, y_i)\, r_i}{\sum_{i=1}^{M} W(x_i, y_i)} \qquad (8)$$
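Under our reading of Eq. (8) (a weighted sum of the per-point coefficients, normalized by the total weight), the per-frame similarity could be computed as follows:

```python
import numpy as np

def color_similarity(r, W):
    """Eq. (8): weight-normalized color similarity for one sample frame.

    r : per-feature-point correlation coefficients from Eq. (7)
    W : per-feature-point weights W(x_i, y_i) from Eq. (6)
    """
    r = np.asarray(r, dtype=float)
    W = np.asarray(W, dtype=float)
    return (W * r).sum() / W.sum()
```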
5. RESULTS & DISCUSSION
We have conducted a number of experiments to investigate the performance of our system, including its ability to keep tracking during very rapid motion and the performance of automatic relocalization. In our experiment an indoor space is reconstructed. Figure 2 shows an example frame observed with this RGB-D camera.
Figure 2: (left) RGB image and (right) depth information captured by an RGB-D camera. Black pixels in the right image have no depth value, mostly due to the maximum range or the surface material.

Figure 3: The reconstructed 3D model. The colored points in the middle, linked as a polygonal line, represent the sample frames and hence the camera positions.
During mapping, the camera was carried by a person; meanwhile, to test the performance of automatic relocalization, the camera was at times moved swiftly. As shown in Figure 3, there are no obvious holes or ghost images in the reconstructed model. The remaining holes at object edges are caused by missing data in areas the camera could not reach. Camera tracking failures did occur in our experiment, but the system takes only a few milliseconds to re-initialize the camera position, which demonstrates the efficiency of our camera relocalization method.
6. CONCLUSION
Building accurate, dense models of indoor environments has many applications in robotics and gaming. In this paper we investigate how a potentially inexpensive depth camera, the Kinect, can be utilized to reconstruct 3D models using a voxel-based method. To maintain the stability of our system, a graph-based method combined with SIFT matching and color similarity measurement has been proposed, and we obtain promising results for camera relocalization in the 3D reconstruction process.
REFERENCES