The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3b. Beijing 2008
Scenario 2: Texture data acquisition via a known 3-D
model.
The gaze tracking procedure is performed with a known 3-D
model, generated through a stereo matching procedure (e.g.
texture mapping for 3-D GIS geometrical data).
Scenario 3: Gaze tracking of objects appearing in
sequence data
The gaze tracking procedure is performed without a known 3-D
model such as a CAD model or existing GIS data (e.g.
pedestrian tracking or vehicle tracking).
A real-time gaze tracking system was developed using these
scenarios. In addition, experiments were conducted for these
scenarios to evaluate the performance of the system. Three
approaches are described in this paper.
Approach 1: The gaze tracking procedure with a known
3-D model (CAD model).
Approach 2: The gaze tracking procedure with a known
3-D model (stereo matching procedure).
Approach 3: The gaze tracking procedure without a known
3-D model.
3. THE REAL-TIME GAZE TRACKING SYSTEM
3.1 Concept of the real-time gaze tracking system
The system comprises an airborne stereo camera simulator
system and a 3-D object recognition system. The airborne
stereo camera simulator has five degrees of freedom, namely
X-Y-Z translations and TILT-PAN axes of rotation. The 3-D
object recognition system is applied to the real-time gaze
tracking of objects. Continuous gaze tracking of objects is
achieved by a combination of the simulator and the 3-D object
recognition system.
The basic gaze tracking procedure keeps the object of
interest at the center of the captured images, as follows. First, a
model matching procedure uses segment-based stereo to obtain
spatial information about the object in the captured images. Then,
the active stereo camera moves to bring the object to the
centers of the captured images, using the results of the preceding
model matching procedure.
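The two-step loop above (match, then steer) can be sketched as follows. This is a minimal illustration, not the system's actual interface: `match_object` and `move_camera` are hypothetical stand-ins for the model matching module and the TILT-PAN camera controller, and the proportional steering is an assumption.

```python
def centering_step(match_object, move_camera, image_left, image_right,
                   image_center=(320, 240)):
    """One gaze tracking iteration: locate the object via model matching,
    then steer the camera so the object moves toward the image center."""
    # Model matching returns the object's image position (sketch).
    u, v = match_object(image_left, image_right)
    # The pixel offset from the image center drives the PAN/TILT command.
    du = u - image_center[0]
    dv = v - image_center[1]
    move_camera(pan=du, tilt=dv)  # proportional steering (assumed)
    return du, dv
```

Iterating this step on each new stereo frame keeps the object centered as it moves.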
3.2 Airborne stereo camera simulator
The hardware of the airborne stereo camera simulator system
comprises a turntable and an active stereo camera hung on a
crane (Figure 1).
Figure 1. The airborne stereo camera simulator
Objects are placed on the turntable, which simulates horizontal
rotation of the objects. The crane simulates vertical and
horizontal translation of the objects. These rotation and
translation data are transferred through controllers from the
turntable and the crane to the 3-D object recognition system.
Note that these motions are simulated as relative rotation and
translation parameters between the objects and the cameras.
The active stereo camera comprises three cameras mounted in
one head. The head can rotate on TILT and PAN axes. The
positions and orientations of the cameras are derived from the
angle of the turntable and the position of the crane.
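Deriving the relative object-camera pose from the controller readings can be sketched as below. The parameterization (turntable rotation about the vertical axis, crane position as an X-Y-Z translation) is an assumption for illustration; the actual controllers report these values to the 3-D object recognition system.

```python
import math

def relative_pose(turntable_deg, crane_xyz):
    """Sketch: combine the turntable angle (rotation about the vertical
    axis) with the crane position (X-Y-Z translation) into a relative
    rotation matrix and translation vector between objects and cameras.
    TILT/PAN of the camera head would be composed separately."""
    a = math.radians(turntable_deg)
    # Rotation about the vertical (Z) axis simulated by the turntable.
    R = [[math.cos(a), -math.sin(a), 0.0],
         [math.sin(a),  math.cos(a), 0.0],
         [0.0,          0.0,         1.0]]
    t = list(crane_xyz)  # translation simulated by the crane
    return R, t
```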
3.3 3-D object recognition system
A block diagram of the gaze tracking procedure is shown in
Figure 2.
[Figure omitted: block diagram linking 'Identification', 'Model alignment', 'Tracking', and 'Camera displacement'; camera displacement is invoked when an object moves outside the images.]
Figure 2. Block diagram of gaze tracking procedure
Traditional 3-D object recognition methodology cannot easily
track moving 3-D objects in real time, because the processing
time for 3-D object recognition, even for a single stereo shot, is
too long to keep up with moving objects.
Here, we are developing Versatile Volumetric Vision (VVV) [9]
technology for the 3-D vision system in the real-time gaze
tracking system. The 3-D object recognition module of this
system detects and localizes an object in the most recent frame
of a ‘frame memory,’ which requires hundreds of megabytes of
RAM. It is based on two continuous procedures, namely the
rough matching and the precise matching of camera position
estimation. These procedures perform the ‘identification’ in
Figure 2. The ‘3-D model,’ which is generated from CAD, or
acquired via 3-D sensors, is used as the given data. ‘Tracking’
tracks object motion, frame-by-frame, within the stereo image
sequence buffered in the frame memory.
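The division of labor between identification (rough matching followed by precise matching) and frame-by-frame tracking can be outlined as below. This is a structural sketch only; the matching and pose-update functions are placeholders for the VVV modules described in the text, passed in here so the outline is self-contained.

```python
def identify(frame, model, rough_match, precise_match):
    """Identification: rough matching narrows the candidate poses,
    then precise matching refines the camera-position estimate."""
    candidates = rough_match(frame, model)
    return precise_match(frame, model, candidates)

def track(buffered_frames, pose, update_pose):
    """Tracking: propagate the object pose frame-by-frame through the
    stereo image sequence buffered in frame memory."""
    for frame in buffered_frames:
        pose = update_pose(frame, pose)
    return pose
```

Identification runs occasionally on the most recent frame, while tracking runs continuously over the buffered sequence.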
Sequential stereo images must be captured and processed at
high speed to track the objects in the gaze tracking procedure
for camera position estimation. Here, Hyper Frame Vision
technology [10] is used with a high-capacity frame memory.
The computational time for the identification task generally
exceeds one frame period, whereas the tracking task takes less
than one frame period. While waiting for the identification task
to be completed, the stereo image sequence is buffered into the
frame memory for the tracking task. Buffering the image
sequence in this way exploits the difference in computational
time between the identification task and the tracking task.
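The buffering scheme can be illustrated with the toy timeline below: frames captured while the slow identification task is running accumulate in frame memory, and the fast tracking task then drains the backlog frame by frame. The timing model (identification finishing after a fixed number of frames) is an assumption for illustration only.

```python
from collections import deque

def run_cycle(frames, ident_period=5):
    """Sketch: buffer frames in 'frame memory' while identification runs,
    then let the faster tracking task catch up on the backlog."""
    frame_memory = deque()
    tracked = []
    for i, frame in enumerate(frames):
        frame_memory.append(frame)            # buffer every captured frame
        if i == ident_period - 1:
            # Identification finishes: tracking drains the whole backlog,
            # since it runs in less than one frame period per frame.
            while frame_memory:
                tracked.append(frame_memory.popleft())
        elif i >= ident_period:
            tracked.append(frame_memory.popleft())  # now keeps pace
    return tracked
```

No frame is dropped: every captured frame is eventually processed by the tracking task, despite the identification latency.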
Through memory sharing, independent tasks, such as
recognition, tracking, and image viewing, are processed in
parallel. By this means, the 3-D object recognition system