ACCURACY TEST OF MICROSOFT KINECT FOR HUMAN MORPHOLOGIC
MEASUREMENTS
B. Molnár a,b, C. K. Toth b, A. Detrekói a
a Department of Photogrammetry and Geoinformatics
Budapest University of Technology and Economics, Műegyetem rkp 3., Budapest, H-1111, Hungary -
molnar.bence@fmt.bme.hu
b The Center for Mapping, The Ohio State University
470 Hitchcock Hall, 2070 Neil Avenue, Columbus, OH 43210 - toth@cfm.ohio-state.edu
Commission TCs III and V
KEY WORDS: Flash LiDAR, MS Kinect, point cloud, accuracy
ABSTRACT:
The Microsoft Kinect sensor, a popular gaming console accessory, is widely used in a large number of applications, including close-range 3D
measurements. This low-end device is rather inexpensive compared to similar active imaging systems. The Kinect includes an
RGB camera, an IR projector, an IR camera and an audio unit. Human morphologic measurements require high accuracy and a
fast data acquisition rate. To achieve the highest accuracy, the depth sensor and the RGB camera should be calibrated and
co-registered to obtain a high-quality 3D point cloud as well as optical imagery. Since this is a low-end sensor, developed for a different
purpose, its accuracy could be critical for 3D measurement-based applications. Therefore, two types of accuracy test are performed:
(1) to describe the absolute accuracy, the ranging accuracy of the device is estimated over the range of 0.4 to 15 m, and (2)
the relative accuracy of points is characterized as a function of range. For the accuracy investigation, a test field was
created with two spheres; the relative accuracy is described by the sphere fitting performance and by the estimated distance between
the sphere center points. Some other factors can also be considered, such as the angle of incidence or the material used in these tests.
The non-ambiguity range of the sensor is from 0.3 to 4 m but, based on our experience, it can be extended up to 20 m. Obviously,
this extension raises accuracy issues, which makes accuracy testing essential.
1. INTRODUCTION
Their superior performance and efficiency have made
laser scanning systems the primary source for 3D measurements.
The main LiDAR methods are well explained by Shan and Toth
(2008). The two typical LiDAR platforms are
airborne and terrestrial (TLS) laser scanning (Vosselman, 2010),
though mobile LiDAR (MLS) is gaining rapid acceptance.
These methods use pulse-based technology with discrete
return detection or, more recently, waveform recording. For close-
range LiDAR scanning, Flash LiDAR is increasingly used. This
technology is based on a sensor array, which makes it possible
to measure multiple ranges at the same time. The range of the
captured depth image is mainly limited by the emitted
pulse power. The frequency is also somewhat limited for
eye safety and technological reasons. For example, the early
Flash LiDAR model, the SWR3000 (Kahlmann et al., 2006), is
based on a continuous-wave (CW) approach, offering an operating range of up to 7.5 m
and a frame rate of 15 Hz. The newer PMD[vision] CamCube
2.0 has a range of 0.4 to 7 m and a frame rate of 25 fps (PMD).
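The link between a CW sensor's modulation frequency and its maximum unambiguous range can be illustrated with a short sketch. This is only an illustration: the 20 MHz modulation frequency used in the usage lines is an assumption (it is not stated in the text) chosen because it gives a range close to the 7.5 m quoted above, and the function names are hypothetical.

```python
import math

# Speed of light in vacuum, in m/s.
SPEED_OF_LIGHT = 299_792_458.0


def non_ambiguity_range(modulation_hz: float) -> float:
    """Maximum unambiguous range (m) of a continuous-wave ranging sensor.

    The measured phase repeats after one modulation wavelength, and the
    signal travels to the target and back, hence the factor of 2.
    """
    if modulation_hz <= 0:
        raise ValueError("modulation frequency must be positive")
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)


def range_from_phase(phase_rad: float, modulation_hz: float) -> float:
    """Range (m) for a measured phase shift given in [0, 2*pi)."""
    if not 0.0 <= phase_rad < 2.0 * math.pi:
        raise ValueError("phase must lie in [0, 2*pi)")
    return phase_rad / (2.0 * math.pi) * non_ambiguity_range(modulation_hz)


if __name__ == "__main__":
    f_mod = 20e6  # assumed modulation frequency, not taken from the paper
    print(f"non-ambiguity range: {non_ambiguity_range(f_mod):.2f} m")
    print(f"range at half cycle: {range_from_phase(math.pi, f_mod):.2f} m")
```

A target beyond the non-ambiguity range produces the same phase as a nearer one, which is why extending the usable range of such a sensor requires extra care.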
Successful facial reconstruction requires an appropriate model
of the human face. Therefore, a wide range of data collection
procedures have been developed, mostly based on
photogrammetry (Schrott et al., 2008). Flash LiDAR is a good
alternative to other surface point acquisition methods. In addition, it
offers fast data acquisition. The post-processing and model creation,
however, require some specific knowledge, as the human face
has special surface conditions (Aoki et al., 2000). The
developed model provides a good basis for plastic surgery.
2. MICROSOFT KINECT SENSOR
The Kinect™ sensor is a motion sensing input device for the
Xbox 360 video game console, originally developed by
PrimeSense (PrimeSense), and acquired by Microsoft®. Its
primary purpose is to enable users to control and interact with
the Xbox 360 through a natural user interface using gestures
and spoken commands, without the need to touch a game
controller at all. The Kinect has three primary sensors: a Flash
LiDAR (3D camera), a conventional optical RGB sensor (2D
camera), and a microphone array input. The device is
USB-interfaced, similar to a webcam, and appears as a "black box"
to the users.
Very little is known about the sensors, internal components and
processing methods stored in the firmware. The laser (IR)
emitter projects a structured light pattern of random points to
support 3D recovery. The 2D camera can acquire standard
VGA (640x480) and SXGA (1280x1024) images at 30 Hz. The
color formation is based on a Bayer filter solution, transmitted in
32-bit and formatted in the sRGB color space. The FOV of the
2D camera is 57° x 43°. The 3D camera can work in two
resolutions with frame sizes of 640x480 and 320x240,
respectively. The range data comes in 12-bit resolution. The
sensors' spatial relationship is shown in Figure 1. The
distance between the laser emitter and detector that
form a stereo pair is approximately 7.96 cm, and the baseline between the
2D and 3D cameras is about 2.5 cm.
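Since the emitter and detector form a stereo pair, the standard triangulation relation between disparity and depth gives a feel for how the depth step grows with range. The sketch below is a simplified illustration and not the Kinect's actual processing, which is undocumented. It assumes an ideal pinhole model, assumes that the IR camera has the same 57° horizontal FOV quoted above for the 2D camera at a width of 640 pixels, and ignores any reference-plane offset. The 1/8 pixel disparity step in the usage lines is an assumed value, and all function names are hypothetical.

```python
import math

BASELINE_M = 0.0796    # emitter-detector distance quoted in the text
IMAGE_WIDTH_PX = 640   # full-resolution frame width of the 3D camera
HFOV_DEG = 57.0        # quoted for the 2D camera; assumed for the IR camera


def focal_length_px(width_px: float = IMAGE_WIDTH_PX,
                    hfov_deg: float = HFOV_DEG) -> float:
    """Pinhole focal length in pixels derived from the horizontal FOV."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def depth_from_disparity(disparity_px: float, f_px: float,
                         baseline_m: float = BASELINE_M) -> float:
    """Depth (m) from disparity (px) for an ideal stereo pair: Z = f*b/d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px


def depth_step(depth_m: float, disparity_step_px: float, f_px: float,
               baseline_m: float = BASELINE_M) -> float:
    """First-order depth change (m) caused by one disparity step.

    Differentiating Z = f*b/d gives |dZ| = Z**2 / (f*b) * |dd|.
    """
    return depth_m ** 2 * disparity_step_px / (f_px * baseline_m)


if __name__ == "__main__":
    f = focal_length_px()
    step_px = 0.125  # assumed disparity quantization, not from the paper
    for z in (0.5, 1.0, 2.0, 4.0):
        print(f"{z:.1f} m -> depth step {1000 * depth_step(z, step_px, f):.1f} mm")
```

Under these assumptions the depth step grows with the square of the range, which is consistent with the expectation that the relative accuracy depends on range.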