Methods that require the person to remain immobile for several seconds can pose accuracy problems for human body modeling.
Photogrammetry methods (D'Apuzzo 1998) can instead acquire all data in less than one second.
The second component of the human body modeling process is the capture of motion (Dyer et al. 1995). The
different systems can be divided into groups depending on which characteristic is used for classification, e.g. accuracy,
data processing time, measurement method, price or portability of the system. Photogrammetric systems measure
very accurately the trajectories of signalized target points on the body (Boulic et al. 1998; Vicon, Qualisys, Northern
Digital); some of them compute the data in real time. Other systems use electromagnetic sensors connected to
a computer unit that processes the measurements and delivers 3-D data in real time (Ascension, Polhemus). There are also
mechanical systems, where the person wears a special suit with mechanical sensors that register the movements of
the different articulations (Analogus). Motorized video theodolites in combination with a digital video camera have also
been used for human motion analysis (Anai et al. 1999). A different approach is taken by image-based methods, where
image sequences are acquired from different positions and then processed to recover the 3-D motion of the body (Gavrila et
al. 1996).
The common characteristic of these systems is the separate treatment of the two modeling aims: shape and motion
are modeled in two different steps. In this paper, we present instead a method to solve the two problems simultaneously,
recovering from one data set both 3-D shape and 3-D motion information.
The core of this paper is the description of the least squares matching tracking algorithm (LSMTA). It uses the least
squares matching process to establish the correspondences between subsequent frames of the same view as well as
correspondences between the images of the different views. Least squares matching has been chosen among other
methods for its adaptivity.
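To make the idea concrete, the sketch below shows a stripped-down least squares matching step in Python: it refines the position of a grey-value template in a search image by Gauss-Newton iteration, estimating only a shift. The full algorithm additionally estimates affine shape and radiometric parameters; the name lsm_shift and its interface are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def lsm_shift(template, search, x0, y0, iters=20, tol=1e-3):
        """Estimate the sub-pixel position of `template` inside `search`,
        starting from the approximate patch centre (x0, y0)."""
        template = np.asarray(template, dtype=float)
        search = np.asarray(search, dtype=float)
        h, w = template.shape
        gy, gx = np.mgrid[0:h, 0:w].astype(float)   # patch-local pixel grid
        gy -= (h - 1) / 2.0                         # centre the grid at (0, 0)
        gx -= (w - 1) / 2.0
        x, y = float(x0), float(y0)
        for _ in range(iters):
            # resample the search image at the current estimate (bilinear)
            patch = map_coordinates(search, [gy + y, gx + x], order=1)
            dgy, dgx = np.gradient(patch)           # grey-value gradients
            r = (template - patch).ravel()          # grey-value residuals
            A = np.column_stack([dgx.ravel(), dgy.ravel()])
            dx, dy = np.linalg.lstsq(A, r, rcond=None)[0]
            x, y = x + dx, y + dy
            if abs(dx) < tol and abs(dy) < tol:     # convergence of the shift
                break
        return x, y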
2 EXTRACTION OF 3-D DATA FROM VIDEO SEQUENCES
In this section, we will first describe the system for data acquisition and the method used for its calibration. We then
describe our methods for the extraction of 3-D data from the multi-image video sequence. The extracted information is of
two different types: 3-D point clouds of the visible parts of the human body for each time step and a 3-D vector field of
trajectories. The LSMTA can also be used in 2-D mode; we give an example of this use in the tracking of facial
expressions.
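As a rough sketch of how these two kinds of output can be held together (the container and field names are illustrative assumptions, not taken from the paper):

    from dataclasses import dataclass, field
    from typing import List
    import numpy as np

    @dataclass
    class ExtractionResult:
        """The two kinds of 3-D information extracted from one multi-image sequence."""
        # one (N_t x 3) point cloud of the visible body surface per time step
        point_clouds: List[np.ndarray] = field(default_factory=list)
        # one (T x 3) array per tracked surface point, together forming the
        # 3-D vector field of trajectories
        trajectories: List[np.ndarray] = field(default_factory=list)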
2.1 Data Acquisition and Calibration
Three synchronized CCD cameras in a linear arrangement (left,
center, right) are used. A sequence of image triplets is acquired with
a frame grabber and the images are stored at 768x576 pixels with 8-bit
quantization. The CCD cameras are interlaced, i.e. a full frame
is split into two fields which are recorded and read out
consecutively. As the odd and even lines of an image are captured at
different times, a saw-tooth pattern appears in the image when
recording moving objects. For this reason only the odd lines of the
images are processed, at the cost of reducing the resolution in
the vertical direction by 50 percent. In the future, the use of progressive scan cameras, which acquire full frames, is planned.

Figure 1. Automatically measured image coordinates of the two points on the reference bar.
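As a minimal illustration of the field extraction described above (whether the retained lines carry even or odd array indices depends on the grabber, so the 0-based indexing here is an assumption):

    import numpy as np

    def odd_field(frame):
        """Keep every second line of an interlaced 768x576 frame, halving the
        vertical resolution (576 -> 288 lines) while avoiding the saw-tooth
        artefact on moving objects."""
        return frame[1::2, :]

    frame = np.zeros((576, 768), dtype=np.uint8)   # placeholder 8-bit image
    field = odd_field(frame)                       # shape (288, 768)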
To calibrate the system, the reference bar method (Maas 1998) is
used. A reference bar with two retroreflective target points is moved through the object space and at each location image
triplets are acquired. The image coordinates of the two target points are automatically measured and tracked during the
sequence with a least squares matching based process (Figure 1).
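As an illustration of this tracking step for a single view, the sketch below follows the two bar targets through an image sequence by repeatedly applying the illustrative lsm_shift routine from the earlier sketch; the inputs frames (a list of grey-value images) and starts (approximate target positions in the first frame) are assumed, and the real process also matches the targets across the three views.

    def track_bar_targets(frames, starts, half=7):
        """Track the two reference-bar targets frame to frame in one view."""
        trajectories = [[(float(x), float(y))] for x, y in starts]
        for prev, curr in zip(frames[:-1], frames[1:]):
            for traj in trajectories:
                xr, yr = (int(round(c)) for c in traj[-1])
                # patch around the last accepted position serves as template
                tpl = prev[yr - half:yr + half + 1, xr - half:xr + half + 1]
                traj.append(lsm_shift(tpl, curr, xr, yr))
        return trajectories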
The three-camera system can then be calibrated by self-calibrating bundle adjustment with the additional information of
the known distance between the two points at every location. The results of the calibration process are the exterior
orientation of the three cameras (position and rotations: 6 parameters), the parameters of the interior orientation of the
cameras (camera constant, principal point, sensor size, pixel size: 7 parameters), the parameters for the radial and decentring
distortion of the lenses and optical systems (5 parameters) and 2 additional parameters modeling other effects such as
differential scaling and shearing (Brown 1971). A thorough determination of these parameters modeling distortions and
other effects is required to achieve high accuracy.
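For illustration, the sketch below applies such a parameter set to a measured image coordinate in the usual Brown-style form, with radial terms k1, k2, k3, decentring terms p1, p2 and two affinity/shear terms b1, b2; the parameter names and this exact formulation are assumptions and may differ from the calibration software actually used.

    def correct_image_point(x, y, x0, y0, k1, k2, k3, p1, p2, b1, b2):
        """Apply radial, decentring and affinity/shear corrections to an image
        point (x, y), with (x0, y0) the principal point; all in image units."""
        dx, dy = x - x0, y - y0
        r2 = dx * dx + dy * dy
        dr = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3            # radial part
        xc = x + dx * dr + p1 * (r2 + 2 * dx * dx) + 2 * p2 * dx * dy + b1 * dx + b2 * dy
        yc = y + dy * dr + p2 * (r2 + 2 * dy * dy) + 2 * p1 * dx * dy
        return xc, yc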