Close-range imaging, long-range vision

  
MARKERLESS FULL BODY SHAPE AND MOTION CAPTURE FROM VIDEO SEQUENCES 
P. Fua¹, A. Gruen², N. D'Apuzzo², R. Plänkers¹*
¹ VrLab, EPFL, 1015 Lausanne; ² IGP, ETH-Hönggerberg, 8093 Zürich
KEY WORDS: Body Modeling, Motion Capture, Stereo, Silhouettes, Least-Squares Matching. 
ABSTRACT 
We develop a framework for 3-D shape and motion recovery of articulated deformable objects. We propose a formalism 
that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated 
structures. We demonstrate its effectiveness for human body modeling from video sequences. Our method is both robust 
and generic. It could easily be applied to other shape and motion recovery problems. 
1 INTRODUCTION 
Recently, many approaches to tracking and modeling articulated 3-D objects have been proposed. They have been used to capture people's motion in video sequences, with potential applications to animation, surveillance, medicine, and man-machine interaction. See [Aggarwal and Cai, 1999, Gavrila, 1999, Moeslund and Granum, 2001] for recent reviews.
Such systems are promising. However, they typically use oversimplified models, such as cylinders or ellipsoids attached to articulated skeletons. Such models are too crude for precise recovery of both shape and motion. In our work, we have proposed a framework that retains the articulated skeleton but replaces the simple geometric primitives by soft objects. Each primitive defines a field function, and the skin is taken to be a level set of the sum of these fields. This implicit surface formulation has the following advantages:
• Effective use of stereo and silhouette data: Defining surfaces implicitly allows us to define a distance function from data points to the model that is both differentiable and computable without search.
• Accurate shape description by a small number of parameters: Varying a few dimensions yields models that can match different body shapes and allow both shape and motion recovery.
• Explicit modeling of 3-D geometry: Geometry can be taken into account to predict the expected location of image features and occluded areas, thereby making the extraction algorithm more robust.
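To make the first advantage concrete, here is a minimal sketch of a soft-object skin. It assumes a hypothetical Gaussian field per primitive and an illustrative level L = 0.5; the field function actually used is not specified in this section. The residual F(x) - L and its gradient are evaluated in closed form at any 3-D point, with no closest-point search over a mesh.

```python
import numpy as np

# Assumption: each primitive i contributes a Gaussian field exp(-||x - c_i||^2 / r_i^2).
# This stands in for the paper's field function, which is not given here.

def field(x, centers, radii):
    """Total field F(x) = sum_i exp(-||x - c_i||^2 / r_i^2)."""
    d2 = np.sum((x - centers) ** 2, axis=1)  # squared distance to each center
    return float(np.sum(np.exp(-d2 / radii ** 2)))

def field_gradient(x, centers, radii):
    """Analytic gradient dF/dx: no finite differences, no closest-point search."""
    diff = x - centers
    f = np.exp(-np.sum(diff ** 2, axis=1) / radii ** 2)
    return np.sum((-2.0 * f / radii ** 2)[:, None] * diff, axis=0)

def pseudo_distance(x, centers, radii, level=0.5):
    """Signed residual F(x) - L: zero on the skin, positive inside, negative outside."""
    return field(x, centers, radii) - level

# Usage: two overlapping hypothetical primitives, queried at an arbitrary data point.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
radii = np.array([1.0, 0.7])
point = np.array([0.3, 0.2, -0.1])
print(pseudo_distance(point, centers, radii), field_gradient(point, centers, radii))
```

Because the residual is a smooth function of both the data point and the primitive parameters, it can be differentiated directly inside an optimizer.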
Our approach, like many others, relies on optimization to deform the generic model so that it conforms to the data. This involves computing first- and second-order derivatives of the distance from the data points to the model, which turns out to be prohibitively complex and slow if done in a brute-force fashion. The main contribution of this approach is a mathematical formalism that greatly simplifies these computations and allows a fast and robust implementation of articulated soft objects. It extends the traditional robotics approach designed to handle articulated bodies [Craig, 1989] and allows the use of implicit surfaces. For additional details, we refer the interested reader to our earlier publications [Plänkers and Fua, 2001, Plänkers and Fua, 2002].

*This work was supported in part by the Swiss National Science Foundation.
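The optimization described above can be illustrated on the smallest possible case: one soft primitive fitted to 3-D points. This is a sketch under stated assumptions, not the authors' formalism. The Gaussian field, the level L = 0.5, and the parameterization (center c, width r) are hypothetical, and Levenberg-Marquardt is used here as a generic solver. The Jacobian of the residuals is written out analytically, and J^T J serves as the usual Gauss-Newton approximation of the second-order term.

```python
import numpy as np

LEVEL = 0.5  # assumed iso-level of the skin

def residuals_and_jacobian(params, pts):
    """Residuals e_j = f(x_j) - L for f = exp(-||x - c||^2 / r^2), plus analytic Jacobian.

    params = (cx, cy, cz, r); the Jacobian columns are d e_j / d(cx, cy, cz, r).
    """
    c, r = params[:3], params[3]
    diff = pts - c
    d2 = np.sum(diff ** 2, axis=1)
    f = np.exp(-d2 / r ** 2)
    J = np.empty((len(pts), 4))
    J[:, :3] = (2.0 * f / r ** 2)[:, None] * diff  # derivative w.r.t. the center
    J[:, 3] = 2.0 * f * d2 / r ** 3                # derivative w.r.t. the width
    return f - LEVEL, J

def fit_primitive(pts, params0, iters=200, lam=1e-3):
    """Levenberg-Marquardt fit of one primitive so the points lie on its skin."""
    p = np.asarray(params0, dtype=float)
    e, J = residuals_and_jacobian(p, pts)
    cost = e @ e
    for _ in range(iters):
        A = J.T @ J  # Gauss-Newton approximation of the Hessian
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -J.T @ e)
        e_new, J_new = residuals_and_jacobian(p + step, pts)
        if e_new @ e_new < cost:  # accept: trust the quadratic model more
            p, e, J, cost = p + step, e_new, J_new, e_new @ e_new
            lam *= 0.1
        else:                     # reject: damp the step more
            lam *= 10.0
    return p

# Usage: points sampled on a unit sphere centered at (0.3, -0.2, 0.1).
rng = np.random.default_rng(0)
u = rng.normal(size=(200, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
pts = np.array([0.3, -0.2, 0.1]) + u
print(fit_primitive(pts, [0.0, 0.0, 0.0, 1.0]))
```

On the skin, ||x - c|| = |r| sqrt(ln 2), so for unit-sphere data the fitted width should approach 1 / sqrt(ln 2), about 1.201. Only r^2 enters the field, so the sign of r carries no meaning.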
We have integrated our formalism into a complete framework for tracking and modeling, and we demonstrate its robustness on video sequences of complex 3-D motions. We have set up a comprehensive approach to fitting animation models to a variety of data [Fua et al., 1998], including image silhouettes, key body points, and surface data generated by stereo or multi-image matching. All these observations are combined in a joint least-squares estimation, from which the body model parameters are derived.
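One way to read "joint least-squares estimation" is a single weighted objective whose normal equations accumulate a contribution from each observation type. The minimal linear sketch below assumes hypothetical per-type weights w_k and toy design matrices. In a nonlinear fit such as the one described above, each A_k would be a Jacobian re-linearized at every iteration.

```python
import numpy as np

def joint_least_squares(blocks):
    """Minimize sum_k w_k * ||A_k p - b_k||^2 by accumulating normal equations.

    `blocks` is a list of (A_k, b_k, w_k) tuples, one per observation type.
    The grouping and weights are illustrative assumptions, not values from the paper.
    """
    n = blocks[0][0].shape[1]
    N = np.zeros((n, n))
    rhs = np.zeros(n)
    for A, b, w in blocks:
        N += w * A.T @ A
        rhs += w * A.T @ b
    return np.linalg.solve(N, rhs)

# Usage: neither hypothetical block alone determines all three parameters.
silhouette = (np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), np.array([1.0, 2.0]), 1.0)
stereo = (np.array([[0.0, 0.0, 1.0]]), np.array([3.0]), 2.0)
print(joint_least_squares([silhouette, stereo]))  # [1. 2. 3.]
```

The silhouette block alone yields a singular normal matrix; adding the stereo block makes it invertible, which mirrors the complementarity argument in the next paragraph.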
To validate the framework, we focus on stereo and silhouette data because they are complementary sources of information, as illustrated by Figure 1. Stereo works well on both textured clothes and bare skin for surfaces facing the camera, but it fails where the viewing direction and the surface normal are close to orthogonal, which is exactly where silhouettes provide information. To increase the performance of our system, we have also developed an improved approach to extracting stereo data using least-squares matching and tracking methods.
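The complementarity argument rests on the angle between the surface normal and the viewing ray. The sketch below is a hypothetical helper, not part of the described system. It computes that cosine for each surface point and labels the point as facing the camera (where stereo is well conditioned), grazing (an occluding-contour, i.e. silhouette, candidate), or back-facing. The threshold of 0.2 is an arbitrary illustrative value.

```python
import numpy as np

def classify_surface_points(points, normals, camera_center, grazing_cos=0.2):
    """Label surface points by which cue is likely informative from this camera.

    cos = n . v, with v the unit vector from the point toward the camera.
    |cos| < grazing_cos : ray grazes the surface -> silhouette candidate
    cos > 0 otherwise   : surface faces the camera -> stereo
    cos < 0 otherwise   : back-facing -> hidden
    """
    v = camera_center - points
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos = np.sum(n * v, axis=1)
    labels = np.where(np.abs(cos) < grazing_cos, "silhouette",
                      np.where(cos > 0, "stereo", "hidden"))
    return labels, cos

# Usage: three points at the origin, camera on the +z axis.
pts = np.zeros((3, 3))
nrm = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
labels, _ = classify_surface_points(pts, nrm, np.array([0.0, 0.0, 10.0]))
print(labels.tolist())  # ['stereo', 'silhouette', 'hidden']
```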
In the remainder of this paper, we first introduce our models. We then discuss our approach to extracting 3-D information from the video sequences and, finally, to fitting the 3-D body models to it.
2 ARTICULATED MODEL AND SURFACES 
The human body model we use in this work [Thalmann et al., 1996] is depicted in Figure 1(a,b). It incorporates a highly effective multi-layered approach for constructing and animating realistic human bodies. The first layer is a skeleton, a connected set of segments corresponding to limbs and joints. A joint is the intersection of two segments, that is, a skeleton point around which the limb attached to it may move.
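The skeleton layer can be sketched as a tree of joints in which each joint stores its parent, the segment offset in the parent's frame, and a local rotation that moves every segment below it. This is the standard forward-kinematics scheme for articulated chains, shown under simplifying assumptions: rotations about z only, and hypothetical function names. It is not the representation used by the cited model.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z axis (used here only to keep the example short)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, offsets, rotations):
    """World positions of the joints of a skeleton tree.

    parents[i]   : index of joint i's parent (-1 for the root); parents precede children
    offsets[i]   : position of joint i in its parent's frame (the segment vector)
    rotations[i] : local rotation at joint i; it moves every segment below joint i
    """
    n = len(parents)
    world_R = [None] * n
    world_p = np.zeros((n, 3))
    for i in range(n):
        p = parents[i]
        if p < 0:
            world_R[i] = rotations[i]
            world_p[i] = offsets[i]
        else:
            world_R[i] = world_R[p] @ rotations[i]               # compose down the chain
            world_p[i] = world_p[p] + world_R[p] @ offsets[i]    # parent rotation moves the segment
    return world_p

# Usage: hypothetical arm with shoulder (root), elbow, and wrist, unit-length segments.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pose = [rot_z(np.pi / 2), rot_z(np.pi / 2), np.eye(3)]
print(forward_kinematics(parents, offsets, pose))  # wrist ends up at approximately (-1, 1, 0)
```

Rotating a joint changes the positions of all joints below it but not its own position, which matches the description of a joint as the point around which the attached limb moves.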