
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3b. Beijing 2008 
Scenario 2: Texture data acquisition via a known 3-D 
model. 
The gaze tracking procedure is performed with a known 3-D 
model, generated through a stereo matching procedure (e.g. 
texture mapping for 3-D GIS geometrical data). 
Scenario 3: Gaze tracking of objects appearing in 
sequence data 
The gaze tracking procedure is performed without a known 3-D 
model such as a CAD model or existing GIS data (e.g. 
pedestrian tracking or vehicle tracking). 
A real-time gaze tracking system was developed using these 
scenarios. In addition, experiments were conducted for these 
scenarios to evaluate the performance of the system. Three 
approaches are described in this paper. 
Approach 1: The gaze tracking procedure with a known 
3-D model (CAD model). 
Approach 2: The gaze tracking procedure with a known 
3-D model (stereo matching procedure). 
Approach 3: The gaze tracking procedure without a known 
3-D model. 
3. THE REAL-TIME GAZE TRACKING SYSTEM 
3.1 Concept of the real-time gaze tracking system 
The system comprises an airborne stereo camera simulator 
system and a 3-D object recognition system. The airborne 
stereo camera simulator has five degrees of freedom, namely 
X-Y-Z translations and TILT-PAN axes of rotation. The 3-D 
object recognition system is applied to the real-time gaze 
tracking of objects. Continuous gaze tracking of objects is 
achieved by a combination of the simulator and the 3-D object 
recognition system. 
A basic procedure in gaze tracking is to locate the object of 
interest at the center of a captured image, as follows. First, a 
model matching procedure estimates the position of the object 
from segment-based stereo spatial information extracted from 
the captured images. Then, the active stereo camera moves to 
bring the object to the centers of the captured images, using the 
result of this model matching procedure. 
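The centering step above can be sketched with a simple pinhole camera model: given the object's pixel location and the camera intrinsics, the PAN and TILT corrections that move the object toward the image center follow directly. The function name and the intrinsic values used here are illustrative assumptions, not values from the paper.

```python
import math

def centering_angles(u, v, cx, cy, fx, fy):
    """PAN/TILT corrections (radians) that move an object imaged at
    pixel (u, v) toward the principal point (cx, cy), assuming a
    pinhole camera with focal lengths fx, fy (in pixels)."""
    pan = math.atan2(u - cx, fx)    # horizontal offset -> PAN rotation
    tilt = math.atan2(v - cy, fy)   # vertical offset   -> TILT rotation
    return pan, tilt

# Example: object detected to the right of center in a 640x480 image
pan, tilt = centering_angles(400, 240, 320, 240, fx=500.0, fy=500.0)
```

A positive pan here means the head must rotate toward the object's side of the image; the sign conventions would depend on the actual camera head.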
3.2 Airborne stereo camera simulator 
The hardware of the airborne stereo camera simulator system 
comprises a turntable and an active stereo camera hung on a 
crane (Figure 1). 
Figure 1. The airborne stereo camera simulator 
Objects are placed on the turntable, which simulates horizontal 
rotation of the objects. The crane simulates vertical and 
horizontal translation of the objects. These rotation and 
translation data are transferred through controllers from the 
turntable and the crane to the 3-D object recognition system. 
Note that these represent the relative rotation and 
translation parameters between the objects and the cameras. 
The active stereo camera comprises three cameras mounted in 
one head, which can rotate about the TILT and PAN axes. The 
positions and orientations of the cameras are derived from the 
angle of the turntable and the position of the crane. 
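Under assumed axis conventions (turntable spin and head PAN about the vertical Z axis, head TILT about the X axis), the relative object-to-camera rotation can be composed from the simulator parameters as follows. These conventions, and the function names, are illustrative assumptions; the paper does not specify the axis assignments.

```python
import math

def rot_z(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(turntable_deg, pan_deg, tilt_deg):
    """Relative object-to-camera rotation composed from the simulator
    parameters. Rotating the object by +theta on the turntable is
    equivalent to rotating the camera by -theta about the same axis."""
    R_obj = rot_z(-turntable_deg)                 # turntable spin (vertical axis)
    R_head = matmul(rot_z(pan_deg), rot_x(tilt_deg))  # PAN then TILT of the head
    return matmul(R_head, R_obj)
```

The crane contributes only a translation (its X-Y-Z position), so the full relative pose is this rotation paired with the crane offset.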
3.3 3-D object recognition system 
A block diagram of the gaze tracking procedure is shown in 
Figure 2. 
Figure 2. Block diagram of the gaze tracking procedure 
(blocks: Identification, Tracking, Model alignment, and 
Camera displacement, triggered when an object moves outside 
the images) 
Traditional 3-D object recognition methodology cannot easily 
track moving 3-D objects in real time, because the processing 
time for 3-D object recognition is too great to track moving 3-D 
objects, even for a single stereo shot. 
Here, we are developing Versatile Volumetric Vision (VVV) [9] 
technology for the 3-D vision system in the real-time gaze 
tracking system. The 3-D object recognition module of this 
system detects and localizes an object in the most recent frame 
of a ‘frame memory,’ which requires hundreds of megabytes of 
RAM. It is based on two successive procedures, namely 
rough matching and precise matching, for camera position 
estimation. These procedures perform the ‘identification’ in 
Figure 2. The ‘3-D model,’ which is generated from CAD, or 
acquired via 3-D sensors, is used as the given data. ‘Tracking’ 
tracks object motion, frame-by-frame, within the stereo image 
sequence buffered in the frame memory. 
Sequential stereo images must be captured and processed at 
high speed to track the objects in the gaze tracking procedure 
for camera position estimation. Here, Hyper Frame Vision 
technology [10] is used with a high-capacity frame memory. 
The computational time for the identification task generally 
exceeds one frame period, whereas the tracking task takes 
less than one frame period. While the identification task is 
running, the stereo image sequence is buffered into the frame 
memory for the tracking task. Buffering the image sequence 
in this way exploits the difference in computational time 
between the two tasks: no frames are lost during identification, 
and the faster tracking task later catches up by processing the 
buffered frames. 
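The buffering scheme can be sketched as a discrete-time simulation: frames keep arriving at one per frame period while the slow identification task runs, and the faster tracking task then drains the backlog from the frame memory. The task costs (in frame periods) and names below are illustrative assumptions, not figures from the paper.

```python
from collections import deque

IDENT_COST = 5.0   # identification is assumed to span several frame periods
TRACK_COST = 0.2   # tracking is assumed to finish well within one period

def simulate(n_frames):
    """One loop iteration = one frame period. Every captured frame is
    buffered; tracking consumes frames whenever processing capacity
    is available, starting after identification completes."""
    frame_memory = deque()       # the buffered stereo image sequence
    tracked = []
    busy_until = IDENT_COST      # identification occupies the start
    for t in range(n_frames):
        frame_memory.append(t)   # buffer the newly captured frame
        # spend any remaining capacity in this period on tracking
        while frame_memory and busy_until <= t + 1:
            tracked.append(frame_memory.popleft())
            busy_until += TRACK_COST
    while frame_memory:          # drain the tail after capture stops
        tracked.append(frame_memory.popleft())
    return tracked

frames = simulate(20)
```

Because tracking is faster than the capture rate, the backlog built up during identification shrinks over time, so every frame is eventually processed in order and none are dropped.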
Through memory sharing, independent tasks, such as 
recognition, tracking, and image viewing, are processed in 
parallel. By this means, the 3-D object recognition system 