Figure 2: Top row: An image triplet. Bottom row: Measured 3-D point cloud.
Figure 3: Tracking results in frames 1, 21, 31, 51, and 71 of a 300-frame sequence exhibiting a complex fully 3-dimensional
motion. Top row: Frames from one of three synchronized video sequences. Bottom row: Shaded representation of the
recovered model.
such as silhouettes and occluded areas, thereby increasing
the reliability of image-based algorithms.
Our approach relies on optimization to deform the gene-
ric model so that it conforms to the image data. This in-
volves computing first and second derivatives of the dis-
tance function from model to data points. The main con-
tribution of this paper is a mathematical formalism that
greatly simplifies these computations and allows a fast and
robust implementation. This contribution is in many ways orthogonal
to recent approaches to human body tracking, because we ad-
dress the question of how best to represent the human body
for tracking and fitting purposes. The specific optimiza-
tion scheme we use could easily be replaced by a more
sophisticated one that incorporates statistics and can han-
dle multiple hypotheses [Deutscher et al., 2000, Davison
et al., 2001, Choo and Fleet, 2001]. Another natural ex-
tension of this work would be to develop better body and
motion models: The current model constrains the shape
and imposes joint angle limits. This is not quite enough
under difficult circumstances: A complete model ought to
also include more bio-mechanical constraints that dictate
how body parts can move with respect to each other, for
example in terms of dependencies between joint angles.
In our current work, we rely on cheap and easily installed
video cameras to provide data. This, we hope, will lead
to practical applications in the fields of medicine, athletics
and entertainment. It would also be interesting to test our
approach using high quality data coming from a new breed
of image or laser-based dynamic 3-dimensional scanners
[Saito and Kanade, 1999, Davis et al., 1999]. Our tech-
nique will provide the relative position of the skeleton in-
side the data and a standard joint angle based description
of the subject’s motion. Having high-resolution front and
back data coverage of the subject should allow us to re-
cover very high-quality animatable body models.
REFERENCES
[Aggarwal and Cai, 1999] Aggarwal, J. and Cai, Q., 1999.
Human motion analysis: a review. Computer Vision and
Image Understanding 73(3), pp. 428-440.
[Barron and Kakadiaris, 2000] Barron, C. and Kakadiaris,
I., 2000. Estimating anthropometry and pose from a single
image. In: Conference on Computer Vision and Pattern
Recognition, Vol. 1, Hilton Head Island, South Carolina.
[Blinn, 1982] Blinn, J. F., 1982. A Generalization of
Algebraic Surface Drawing. ACM Transactions on Graphics
1(3), pp. 235-256.