Full text: XIXth congress (Part B5,1)

  
Fua, Pascal 
  
since the two reconstruction methods are independent, places of agreement are very likely to be correct for both. As 
discussed in Section 4.1.1, we model the deformation induced by our arbitrary choice of internal camera parameters as an 
affine transform. We have therefore computed the affine transform that brings the bundle-adjustment triangulation closest 
to the laser-scanner model. In both cases, the median distance between the affine-transformed face models and the laser 
output is approximately 1 millimeter, which, given the camera geometry, corresponds to a shift in disparity of less than 
1/5 of a pixel. The precision of the correlation-based algorithm we use is on the order of half a pixel, outliers excluded (Fua, 
1993). We therefore conclude that our motion recovery algorithm performs an effective and robust averaging of the input 
data. 
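The text does not say how the affine transform is estimated. The following is only a minimal sketch, assuming that point correspondences between the two models are already known (an assumption made for illustration; in practice they would have to be established first, for instance by closest-point search). The names fit_affine, apply_affine and median_distance are hypothetical. The sketch solves the linear least-squares problem through its normal equations and reports the median residual distance, the statistic quoted above.

```python
import math
from statistics import median


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map from src to dst (paired 3-D points).

    Returns three rows [a1, a2, a3, t], one per output coordinate.
    """
    hom = [list(p) + [1.0] for p in src]  # homogeneous coordinates
    normal = [[sum(h[i] * h[j] for h in hom) for j in range(4)] for i in range(4)]
    rows = []
    for k in range(3):
        rhs = [sum(h[i] * q[k] for h, q in zip(hom, dst)) for i in range(4)]
        rows.append(solve_linear(normal, rhs))
    return rows


def apply_affine(rows, p):
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in rows)


def median_distance(rows, src, dst):
    """Median Euclidean distance between transformed src points and dst points."""
    return median(math.dist(apply_affine(rows, p), q) for p, q in zip(src, dst))


# Synthetic check: dst is an exact affine image of src, so the residual is tiny.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
dst = [(2 * x + 0.1 * y + 1, 0.9 * y - 0.2 * z - 2, 0.3 * x + 1.1 * z + 0.5)
       for x, y, z in src]
rows = fit_affine(src, dst)
residual = median_distance(rows, src, dst)
```

The median is used rather than the mean because it is insensitive to a few outlying points.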
4.2 Body Modeling 
Whereas the face can be assumed to be relatively rigid as long as the subject does not change expression, the body is 
articulated, and its modeling must take this into account. Here, we use two or three video sequences acquired using 
synchronized cameras. The body model and the image data are used throughout the fitting process. 
Recently, a number of techniques have been proposed (Kakadiaris and Metaxas, 1996, Gavrila and Davis, 1996, Lerasle 
et al., 1996, Bregler and Malik, 1998) to track human motions from video sequences. They are fairly effective but use 
very simplified models of the human body, such as ellipsoids or cylinders, that do not precisely model the human shape 
and would not be sufficient for a truly realistic simulation. By contrast, we use the full body model of Figure 1. 
The algorithm goes through four steps that we summarize below. For a more complete description, we refer the interested 
reader to our earlier publications (D'Apuzzo et al., 1999, Plänkers et al., 1999). 
Data Acquisition: Clouds of 3-D points are derived from the input images using correlation-based stereo (Fua, 1993). 
Alternatively, we can use least-squares matching to derive these clouds (D'Apuzzo et al., 2000). Silhouette edges may be 
delineated in several key-frames or automatically generated for the whole sequence. 
Initialization: We first initialize the model interactively in one frame of the sequence. The user has to enter the approximate 
position of some key joints, such as the shoulders, elbows, hands, hips, knees and feet. Here, it was done by clicking on 
these features in two images and triangulating the corresponding points. This initialization gives us a rough shape, i.e. a 
scaling of the skeleton, and an approximate posture of the model. 
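The triangulation of a clicked joint can be sketched as follows. This is only an illustration under stated assumptions: each click is assumed to have already been converted into a viewing ray (camera center plus direction) using a known calibration, and the midpoint method is used here as one simple choice, not necessarily the one used by the authors. The function name triangulate_midpoint is hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)


# Two hypothetical cameras looking at the same joint.
joint = (1.0, 2.0, 5.0)
cam1, cam2 = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
estimate = triangulate_midpoint(cam1, sub(joint, cam1), cam2, sub(joint, cam2))
```

With noisy clicks the two rays no longer intersect, and the midpoint gives a reasonable compromise between them, which is sufficient for a rough initialization.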
Tracking: At a given time step, the tracking process adjusts the model’s joint angles by minimizing an objective function. 
This modified posture is saved for the current frame and serves as initialization for the next one. The computing power 
of today’s PCs allows for interactivity. If, for some reason, the algorithm loses track, the user simply pauses the program, 
adjusts the posture interactively and hands the control back to the algorithm for further processing. 
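The frame-to-frame structure of the tracking step can be sketched on a toy problem. Everything below is an assumption made for illustration: the model is a single planar limb with one joint angle, the observation is the limb's endpoint, and the minimizer is plain gradient descent with a numerical derivative. The actual objective function and optimizer are described in the cited publications.

```python
import math


def endpoint(angle, length=1.0):
    """Endpoint of a planar limb of the given length rotated by angle."""
    return (length * math.cos(angle), length * math.sin(angle))


def objective(angle, observed):
    """Squared distance between the model endpoint and the observed point."""
    x, y = endpoint(angle)
    return (x - observed[0]) ** 2 + (y - observed[1]) ** 2


def minimize(f, x0, step=0.2, iters=200, h=1e-6):
    """Gradient descent on a scalar function using a central-difference derivative."""
    x = x0
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= step * grad
    return x


def track(observations, initial_angle):
    """Fit each frame in turn, starting from the previous frame's result."""
    angles = []
    angle = initial_angle
    for obs in observations:
        angle = minimize(lambda a: objective(a, obs), angle)
        angles.append(angle)  # saved posture, reused as the next initialization
    return angles


# Synthetic sequence: the limb rotates by 0.1 rad per frame.
true_angles = [0.1 * k for k in range(10)]
observations = [endpoint(a) for a in true_angles]
estimated = track(observations, initial_angle=0.05)
```

Because each frame starts close to its solution, a local minimizer suffices as long as the motion between frames is small. This is also why a lost track calls for manual correction.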
Fitting: The results from the tracking step serve as initialization for a fitting step. Its goal is to refine the postures in all 
frames and to adjust the skeleton and/or metaball parameters to make the model correspond more closely to the person. 
The fitting optimizes over all frames simultaneously, by minimizing the same objective function as before. This allows 
us to find a single set of parameters that describe a model that is consistent with the images of the whole sequence. The 
results are further improved by introducing inter-frame constraints such as smoothness or limits on velocity/acceleration. 
In practice, the model and the constraints it imposes are used to overcome the inherent noisiness of the data. We recover 
both motion and body shape from stereo video sequences. The corresponding parameters can be used to recreate realistic 
3-D animations. 
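The text does not give the exact form of the inter-frame constraints. One common choice, shown here purely as an assumption, penalizes the squared second difference of each joint angle over time (a discrete acceleration) and adds it to the data term with a weight. The names smoothness_penalty, total_objective and smooth_weight are hypothetical.

```python
def smoothness_penalty(angles):
    """Sum of squared second differences of one joint angle over the sequence."""
    return sum(
        (angles[t - 1] - 2 * angles[t] + angles[t + 1]) ** 2
        for t in range(1, len(angles) - 1)
    )


def total_objective(angles, data_terms, smooth_weight):
    """Per-frame data terms plus a weighted smoothness term over all frames."""
    data = sum(term(a) for term, a in zip(data_terms, angles))
    return data + smooth_weight * smoothness_penalty(angles)


# A steady motion incurs almost no penalty, a jittery one a large penalty.
steady = [0.0, 0.1, 0.2, 0.3]
jittery = [0.0, 0.3, 0.0, 0.3]
steady_cost = smoothness_penalty(steady)
jittery_cost = smoothness_penalty(jittery)

# Hypothetical data terms that pull each frame toward a noisy measurement.
measurements = [0.0, 0.12, 0.19, 0.31]
terms = [lambda a, m=m: (a - m) ** 2 for m in measurements]
cost = total_objective(steady, terms, smooth_weight=10.0)
```

Because the penalty couples neighboring frames, the minimization has to run over all frames at once, as the fitting step does.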
4.2.1 Least Squares Framework Our system must deal with heterogeneous sources of information—3-D data and 
2-D outlines—whose contributions may not be commensurate. To this end, we have developed the following framework. 
In standard least-squares fashion, we use the image data to write $n_{obs}$ observation equations of the form 
$F_i(S) = obs_i - \epsilon_i , \quad 1 \le i \le n_{obs} ,$ (3) 
where S is the state vector that defines the shape and position of the body model and $\epsilon_i$ is the deviation from the model. 
We will then minimize 
$v^T P v \Rightarrow \mathrm{Min} ,$ (4) 
where v is the vector of residuals and P is a weight matrix associated with the observations. P is usually introduced as 
diagonal. 
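For a diagonal P, the quantity being minimized is a weighted sum of squared residuals. The sketch below illustrates this on a deliberately simple case, assuming a two-parameter linear model F_i(S) = s0 + s1 * x_i (an assumption made for illustration; the body model itself is nonlinear and is minimized iteratively). The function names are hypothetical.

```python
def weighted_line_fit(xs, obs, weights):
    """Minimize sum_i w_i * (s0 + s1*x_i - obs_i)**2, i.e. v^T P v with P = diag(w).

    Solves the 2x2 weighted normal equations with Cramer's rule.
    """
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, obs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, obs))
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("degenerate configuration: normal matrix is singular")
    s0 = (swxx * swy - swx * swxy) / det
    s1 = (sw * swxy - swx * swy) / det
    return s0, s1


def weighted_cost(state, xs, obs, weights):
    """Value of v^T P v for a given state (s0, s1)."""
    s0, s1 = state
    return sum(w * (s0 + s1 * x - y) ** 2 for w, x, y in zip(weights, xs, obs))


# Three observations from a trusted source and one from an unreliable one.
xs = [0.0, 1.0, 2.0, 3.0]
obs = [1.0, 3.0, 5.0, 20.0]
weights = [1.0, 1.0, 1.0, 0.0]  # zero weight removes the last observation
s0, s1 = weighted_line_fit(xs, obs, weights)
uniform = weighted_line_fit(xs, obs, [1.0, 1.0, 1.0, 1.0])
```

With the last observation given zero weight, the fit follows the three consistent points. With uniform weights, the same observation pulls the solution away from them. The weights in P are what allow observations of different reliability or type to be balanced.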
Our system must be able to deal with observations coming from different sources that may not be commensurate with 
each other. Formally, we can rewrite the observation equations of Equation 3 as 
$F_i^{type}(S) = obs_i^{type} - \epsilon_i , \quad 1 \le i \le n_{obs} ,$ (5) 
  
264 International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000. 
  
  