GAZE TRACKING CONTROL USING AN ACTIVE STEREO CAMERA
Masafumi NAKAGAWA*, Eisuke ADACHI, Ryuichi TAKASE, Yumi OKAMURA,
Yoshihiro KAWAI, Takashi YOSHIMI, Fumiaki TOMITA
National Institute of Advanced Industrial Science and Technology, 1-1-1, Umezono, Tsukuba-city, Ibaraki, Japan -
(m.nakagawa, e-adachi, r-takase, y.okamura, y.kawai, tak-yoshimi, f.tomita)@aist.go.jp
Commission III, WG III/4
KEY WORDS: Active stereo camera, Object recognition, Segment based image matching, Gaze control, Real time processing, 3-D
spatial data, Versatile Volumetric Vision
ABSTRACT:
The full automation of 3-D spatial data reference and revision requires spatial registration between existing spatial data and newly
acquired data. In addition, the system must be able to recognize an object’s shape and behavior. Therefore, the authors propose a real-time
gaze tracking system capable of 3-D object recognition, in which an active stereo camera recognizes 3-D objects without markers.
The real-time gaze tracking system was developed, and scenario-based experiments with the system were conducted. The results
confirmed that our system could gaze at and track moving objects successfully. Moreover, the proposed system achieves high-resolution
3-D spatial data acquisition and recognition, relative object behavior detection, and wide-range coverage.
1. INTRODUCTION
1.1 Background
Recently, semi-automated procedures have been developed to
achieve low-cost data handling in the field of 3-D Geographic
Information Systems (GIS), such as 3-D urban data generation,
3-D urban data revision, and Intelligent Transport Systems.
These procedures should be improved from semi-automation to
full automation so that data can be processed in real time.
The full automation of 3-D spatial data reference and revision
requires spatial registration between existing spatial data and
newly acquired data. In addition, the system must be able to
recognize an object’s shape and behavior.
The optical flow algorithm is one of the traditional approaches
to the detection of moving objects [1][2][3][4]. However, this
approach has difficulty recognizing moving objects in images
that contain occlusions, mainly because of the lack of 3-D
spatial information.
An image sensor has the advantage of high-speed data
acquisition [5]. However, when a single camera orbits an
object, the objects that can be handled are restricted to
simple shapes such as points and spheroids.
Light Detection and Ranging (LIDAR) is also an effective
sensing technique for detecting objects [6]. However, the
low resolution of LIDAR data requires manual registration for
object recognition [7].
In addition, self-position estimation requires continuous 3-D
information in a wide-range environment. A fisheye camera is
usually used to acquire such wide-range information [8].
However, the resolution of the camera is insufficient for
generating precise 3-D spatial data.
1.2 Objective
The full automation of 3-D spatial data reference and revision
requires the following capabilities to achieve spatial
registration between existing spatial data and newly acquired
data:
- high-resolution 3-D spatial data acquisition and
  recognition using image sensors without markers,
- relative object behavior detection using temporal data, and
- wide-range coverage by a combination of camera
  translations and rotations.
For local area surveys such as aerial photogrammetry from
low-altitude flight, the authors believe that an active stereo
camera is a suitable sensor for satisfying the above
requirements. However, a gaze tracking procedure is necessary
to realize the advantages of the active stereo camera. Therefore,
we have developed a spatial registration system using an active
stereo camera. In addition, a real-time gaze tracking system
without markers is proposed in this research.
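As a rough illustration of what "gazing" at a target involves, the sketch below computes the pan and tilt angles that would point a camera's optical axis at a 3-D point. The camera frame (x to the right, y downward, z forward along the optical axis) and the function name `gaze_angles` are assumptions made for this example; they are not taken from the system described in this paper.

```python
import math


def gaze_angles(x, y, z):
    """Return (pan, tilt) in radians that aim the optical axis at (x, y, z).

    Assumed (hypothetical) camera frame: x right, y down, z forward.
    """
    # Pan: rotation about the vertical axis toward the target.
    pan = math.atan2(x, z)
    # Tilt: elevation above the horizontal plane; y points down, so negate it.
    tilt = math.atan2(-y, math.hypot(x, z))
    return pan, tilt


if __name__ == "__main__":
    # A target 1 m to the right and 1 m ahead needs a 45-degree pan.
    pan, tilt = gaze_angles(1.0, 0.0, 1.0)
    print(math.degrees(pan), math.degrees(tilt))
```

In a tracking loop, such angles would be recomputed each time the target position is updated, so that the object stays near the image centre.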
2. APPROACH
Here, we describe two cases of the gaze tracking procedure.
The first case is gaze tracking with known 3-D models such as
existing 3-D urban data. When 3-D data have been prepared for
an area, they can be used as reference data for the gaze tracking
procedure. The known 3-D model may have been prepared as a
CAD model generated via manual operations.
Alternatively, the known 3-D model could be generated via a
stereo matching procedure.
The second case is gaze tracking without a known 3-D model.
When no 3-D data have been prepared for an area, reference data
must be generated on site before the gaze tracking procedure
can be conducted.
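The choice between the two cases can be sketched as a simple selection of reference data. This is a minimal illustration only: the `Model` type, the function `select_reference`, and the `build_on_site` callback are hypothetical names introduced here, and a real system would hold far richer model data than a list of points.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a 3-D model: a sequence of (x, y, z) points.
Model = Sequence[tuple]


def select_reference(known_model: Optional[Model],
                     build_on_site: Callable[[], Model]) -> tuple:
    """Return (case, reference_model) for the gaze tracking procedure."""
    if known_model is not None:
        # Case 1: a model prepared in advance (e.g. CAD or stereo matching).
        return 1, known_model
    # Case 2: no prior model, so reference data is generated on site.
    return 2, build_on_site()


if __name__ == "__main__":
    case, model = select_reference(None, lambda: [(0.0, 0.0, 1.0)])
    print(case, model)
```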
This leads to three scenarios, described as follows.
Scenario 1: Camera positioning via a known 3-D model.
The gaze tracking procedure is performed with a known 3-D
model such as a CAD model (e.g., change detection using
existing 3-D GIS data, or camera positioning for
autonomous robots).