techniques. Its basic idea is to track triplets of
corresponding points in the three images through the
sequence and compute their 3-D trajectories. The spatial
correspondences between the three images at the same time
and the temporal correspondences between subsequent frames
are determined with a least squares matching algorithm.
The results of the tracking process are the coordinates of
a point in the three images through the sequence; the 3-D
trajectory is then determined by computing the 3-D
coordinates of the point at each time step by forward ray
intersection. Velocities and accelerations are also
computed.
The tracking process is applied to all the points matched
in the region of interest, resulting in a vector field of
trajectories (position, velocity and acceleration) that
can be checked for consistency and local uniformity of the
movement. Key points can be defined and tracked in the
vector field, producing 3-D information that can be used
to establish the approximate posture of the body, e.g. the
position of joints.
Figure 2 depicts the output of this tracking process on one
of the image triplets we work with.
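The forward ray intersection and the derivation of velocities and accelerations described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each camera contributes a ray given by a center and a direction, takes the 3-D point as the least-squares point closest to all rays, and uses central finite differences for the derivatives (the text does not say how they are computed). All function names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x

def intersect_rays(centers, directions):
    """Point minimizing the sum of squared distances to the rays c + t * d.

    Normal equations: sum_k (I - d_k d_k^T) X = sum_k (I - d_k d_k^T) c_k.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * c[j]
    return solve3(A, b)

def finite_differences(traj, dt):
    """Central-difference velocity and acceleration at interior time steps
    (an assumption; the paper does not specify the scheme)."""
    vel, acc = [], []
    for t in range(1, len(traj) - 1):
        vel.append([(traj[t + 1][k] - traj[t - 1][k]) / (2 * dt)
                    for k in range(3)])
        acc.append([(traj[t + 1][k] - 2 * traj[t][k] + traj[t - 1][k]) / dt ** 2
                    for k in range(3)])
    return vel, acc
```

With three cameras, `intersect_rays` is called once per time step with the rays through the matched image points, and the resulting list of 3-D positions is passed to `finite_differences`.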
4 MODEL FITTING
We use the body model of Section 2 both to track the human
figure and to recover shape parameters. Our system is
intended to run in batch mode, which means that we expect
the two or more video sequences we use to have been
acquired before running our system. It goes through the
following steps:
• Initialization: We initialize the model interactively
in one frame of the sequence. The user has to enter
the approximate position of some key joints, such as
shoulders, elbows, hands, hips, knees and feet. Here,
it was done by clicking on these features in two images
and triangulating the corresponding points. This
initialization gives us a rough shape, that is, a scaling
of the skeleton, and an approximate model pose.
Techniques such as those proposed by [Barron and
Kakadiaris, 2000; Taylor, 2000] could eliminate most
of the currently necessary interaction.
• Data Acquisition: We use clouds of 3-D points
derived from the input stereo-pairs or triplets using
either a simple correlation-based algorithm [Fua, 1993]
or the higher quality data derived using the techniques
introduced in Section 3. In the first case, the 3-D
points form a noisy and irregular sampling of the
underlying body surface. To reduce the size of the cloud
and begin eliminating outliers, we robustly fit local
surface patches to the raw 3-D points [Fua, 1997] and
use the centers of those patches as input to our system.
• Frame-to-frame tracking: At a given time step the
tracking process adjusts the model's joint angles by
minimizing, with respect to the joint angle values that
relate to that frame, the distance of the model to the
3-D point clouds. This modified posture is saved for
the current frame and serves as initialization for the
next one. Optionally, the system may use the model's
projection into the images to derive initial silhouette
estimates, optimize these using image gradients and
derive from the results silhouette observations that it
uses to constrain the minimization [Plänkers and Fua,
2002].
• Global fitting: The results from the tracking in all
frames serve as initialization for global fitting. Its
goal is to refine the postures in all frames and to
adjust the skeleton and/or metaball parameters to make
the model correspond more closely to the person. To
this end, it optimizes over all frames simultaneously,
again by minimizing the same distance as before but,
this time, with respect to the full state vector
including the parameters that control the length and
width of body parts.
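The frame-to-frame tracking step above can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: a planar two-link chain stands in for the articulated body model, squared point-to-segment distance stands in for the model-to-data distance, and a finite-difference gradient descent with backtracking stands in for the actual optimizer. The names `LINKS`, `fit_frame`, `track` and `sample_points` are hypothetical.

```python
import math

LINKS = (1.0, 0.8)  # hypothetical link lengths of a planar two-link chain

def joint_positions(angles):
    """Forward kinematics: root at the origin, then elbow, then hand."""
    a0, a1 = angles
    elbow = (LINKS[0] * math.cos(a0), LINKS[0] * math.sin(a0))
    hand = (elbow[0] + LINKS[1] * math.cos(a0 + a1),
            elbow[1] + LINKS[1] * math.sin(a0 + a1))
    return [(0.0, 0.0), elbow, hand]

def sq_dist_to_segment(p, a, b):
    abx, aby = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))
    dx, dy = p[0] - (a[0] + t * abx), p[1] - (a[1] + t * aby)
    return dx * dx + dy * dy

def cost(angles, points):
    """Sum of squared distances from each data point to the nearest link."""
    j = joint_positions(angles)
    return sum(min(sq_dist_to_segment(p, j[0], j[1]),
                   sq_dist_to_segment(p, j[1], j[2])) for p in points)

def fit_frame(angles, points, iters=300, h=1e-6):
    """Adjust the joint angles of one frame by gradient descent with an
    Armijo backtracking line search (stand-in optimizer)."""
    angles = list(angles)
    for _ in range(iters):
        c0 = cost(angles, points)
        grad = []
        for k in range(len(angles)):
            plus, minus = angles[:], angles[:]
            plus[k] += h
            minus[k] -= h
            grad.append((cost(plus, points) - cost(minus, points)) / (2 * h))
        g2 = sum(g * g for g in grad)
        if g2 == 0.0:
            break
        step = 1.0
        while step > 1e-12:
            trial = [a - step * g for a, g in zip(angles, grad)]
            if cost(trial, points) <= c0 - 0.5 * step * g2:
                angles = trial
                break
            step *= 0.5
        else:
            break  # no acceptable step found: treat as converged
    return angles

def track(initial_angles, frames):
    """Fit each frame in turn, starting from the previous frame's posture."""
    poses, angles = [], list(initial_angles)
    for points in frames:
        angles = fit_frame(angles, points)
        poses.append(angles)
    return poses

def sample_points(angles, fractions=(0.25, 0.5, 0.75)):
    """Synthetic, noise-free data points lying on both links."""
    j = joint_positions(angles)
    return [(a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
            for a, b in ((j[0], j[1]), (j[1], j[2])) for f in fractions]
```

Calling `track` with a rough initial posture and one point set per frame mirrors the role of the interactive initialization followed by per-frame refinement. In the global fitting step, the same cost would instead be summed over all frames and minimized with respect to the link lengths as well as the angles.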
The final fitting step is required to correctly model the
proportions of the skeleton and derive the exact position
of the articulations inside the skin surface. This must be
done over many frames and allows us to find a configuration
that conforms to every posture. To stabilize the
optimization, we add to our objective function additional
observations that favor constant angular speeds. Their
weight is taken to be small so that they do not degrade the
quality of the fit but, nevertheless, help avoid local
minima in isolated frames and yield smoother and more
realistic motions. Figure 3 depicts the results on a
difficult fully 3-dimensional motion.
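One way to write the stabilized objective described above is given below. It is a sketch under assumed notation, not necessarily the authors' exact formulation: $\theta_t$ is the vector of joint angles at frame $t$, $\phi$ collects the shape parameters shared by all frames, $x_{t,i}$ is the $i$-th of $N_t$ data points at frame $t$, $d$ is the model-to-data distance, and $w$ is a small weight. A constant angular speed means $\theta_{t+1} - \theta_t = \theta_t - \theta_{t-1}$, so the penalty is taken on the second difference of the angles.

```latex
F(\theta_1,\dots,\theta_T,\phi)
  = \sum_{t=1}^{T}\sum_{i=1}^{N_t} d\!\left(x_{t,i};\,\theta_t,\phi\right)^{2}
  + w \sum_{t=2}^{T-1} \left\lVert \theta_{t+1} - 2\,\theta_t + \theta_{t-1} \right\rVert^{2}
```

With $w$ small, the second term barely changes well-constrained frames but pulls an isolated frame that has fallen into a local minimum back toward the motion of its neighbors.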
5 CONCLUSION
In this work, we use a flexible framework for video-based
modeling using articulated 3-D soft objects. The volumetric
models we use are sophisticated enough to recover shape
and simple enough to track motion using potentially noisy
image data. This has allowed us to validate our approach
using complex video sequences featuring fully
3-dimensional motions without engineering the environment
or adding markers.
The implicit surface approach to modeling we advocate
extends earlier robotics approaches designed to handle
articulated bodies. It has a number of advantages for our
purposes. First, it allows us to define a distance function
from data points to models that is both differentiable and
computable without search. Second, it lets us describe
accurately both shape and motion using a fairly small
number of parameters. Last, the explicit modeling of 3-D
geometry lets us predict the expected location of image features