Fua, Pascal
4.1.1 Relative Motion Recovery

First, we estimate the relative motion of the face with respect to the camera. Given
sequences in which the subjects keep a fairly neutral facial expression, we treat the head as a rigid object. We assume
that the intrinsic camera parameters remain constant throughout the sequence. In theory, given high precision matches,
bundle-adjustment can recover both intrinsic parameters and camera motion (Gruen and Beyer, 1992). The same holds
true for recent auto-calibration techniques, but typical sequences of head images are close to exhibiting degenerate motions (Sturm, 1997; Zisserman et al., 1998). Again, extremely precise matches would be required.
In practice, however, face images exhibit little texture and we must be prepared to deal with the potentially poor quality
of the point matches. Therefore, we have chosen to roughly estimate the intrinsic parameters and to concentrate on
computing the extrinsic ones using bundle-adjustment: We use an approximate value for the focal length and assume that
the principal point remains in the center of the image. By so doing, we generate 3-D models that are deformed versions
of the real heads. When the motion between the camera viewpoints is a pure translation, this deformation is an affine
transform (Luong and Viéville, 1996). In practice, the deformation is still adequately modeled by an affine transform even
if the motion is not a pure translation (Fua, 2000). The closer the approximate focal length is to its true value, the closer that affine transform is to a combination of rotation, translation, and scaling.
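The approximate intrinsics described above can be sketched as follows. This is a minimal illustration of the stated assumptions, not code from the paper; the function name and the focal-length value are hypothetical.

```python
import numpy as np

def approximate_intrinsics(image_width, image_height, focal_guess):
    """Build a rough camera intrinsic matrix: an approximate focal
    length and the principal point fixed at the image center, as the
    text assumes (illustrative helper, not the authors' code)."""
    cx, cy = image_width / 2.0, image_height / 2.0
    return np.array([[focal_guess, 0.0,         cx],
                     [0.0,         focal_guess, cy],
                     [0.0,         0.0,         1.0]])

# Hypothetical values for a 640x480 sequence.
K = approximate_intrinsics(640, 480, focal_guess=700.0)
```

With these fixed intrinsics, bundle adjustment then only has to estimate the extrinsic parameters, at the cost of the affine deformation discussed above.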
Initialization A well-known limitation of bundle-adjustment algorithms is that, to ensure convergence, one
must provide an adequate initial estimate. To fulfil this requirement, we begin by retriangulating the surface of the
generic face model introduced in Section 2 to produce the regular mesh shown in Figure 7(b). We will refer to it as the
bundle-adjustment triangulation.
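One common way to retriangulate a coarse mesh into a more regular one is midpoint (1-to-4) subdivision. The paper does not specify its retriangulation scheme, so the sketch below is only an assumed illustration of the idea:

```python
import numpy as np

def subdivide(vertices, faces):
    """One round of midpoint subdivision: split every triangle into
    four by inserting edge midpoints, sharing midpoints between
    adjacent faces (illustrative; not the paper's actual scheme)."""
    vertices = [np.asarray(v, dtype=float) for v in vertices]
    midpoint_index = {}          # edge -> index of its midpoint vertex

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            midpoint_index[key] = len(vertices)
            vertices.append((vertices[a] + vertices[b]) / 2.0)
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(vertices), new_faces
```

Repeating such a step on the generic face model yields a denser, more regular triangulation of the kind used for bundle adjustment.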
Figure 8: Input video sequence: For each person, 3 of a sequence of 9 consecutive images.
Figure 9: Regularized bundle-adjustment: (a) The five manually supplied keypoints used to compute the orientation of the first camera. (b) The projections of the bundle-adjustment triangulation vertices of Figure 7(b) into the central image of Figure 8(a). (c,d) Matching points in the images immediately following and immediately preceding the central image of Figure 8(a) in the sequence. (e,f) Shaded representation of the bundle-adjustment triangulation. (g) Recovered relative camera positions.
To initialize the process for a video sequence such as the ones shown in Figure 8, we manually supply the approximate
positions of the five feature points depicted in Figure 9(a) in one, and only one, reference image. We compute the position
and orientation of the reference camera that brings the five projections of the corresponding keypoints as close as possible
to those positions. We then estimate the positions and orientations for the two images on either side of the reference image
as follows.
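Fitting the reference camera to the five keypoints amounts to minimizing their reprojection error. The sketch below shows one way to do this with a nonlinear least-squares solver; the function names, the rotation-vector parameterization, and the initial guess are all assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, rvec, tvec, pts3d):
    """Project 3-D points through a camera with intrinsics K and an
    extrinsic pose given as a rotation vector and a translation."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = pts3d @ R.T + tvec            # world -> camera frame
    proj = cam @ K.T
    return proj[:, :2] / proj[:, 2:3]   # perspective division

def estimate_reference_pose(K, keypoints3d, keypoints2d, pose0):
    """Find the pose that brings the projections of the model keypoints
    as close as possible to their manually supplied 2-D positions
    (illustrative sketch; pose0 = [rvec, tvec] initial guess)."""
    def residual(p):
        return (project(K, p[:3], p[3:], keypoints3d) - keypoints2d).ravel()
    sol = least_squares(residual, pose0)
    return sol.x[:3], sol.x[3:]
```

With five 2-D/3-D correspondences and six pose parameters, the problem is overdetermined and a local solver converges reliably given a reasonable initial depth.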
Generic Bundle Adjustment As shown in Figure 9(b), our initial orientation guarantees that the bundle-adjustment
triangulation vertices’ projections fall roughly on the face. We match these projections into the other images using
a simple correlation-based algorithm. Figure 9(c,d) depicts the results. For each vertex $(x_i, y_i, z_i)$ of the bundle-adjustment triangulation and each projection $(u_i^j, v_i^j)$ of this vertex in image $j$, we write two observation equations:
\[
\mathrm{Pr}_u^j(x_i + dx_i,\, y_i + dy_i,\, z_i + dz_i) = u_i^j + \epsilon_{u,i}^j ,
\qquad
\mathrm{Pr}_v^j(x_i + dx_i,\, y_i + dy_i,\, z_i + dz_i) = v_i^j + \epsilon_{v,i}^j
\tag{1}
\]
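The observation equations can be stacked into a residual vector for the least-squares solver. The sketch below assumes a generic projection function `Pr(pose, X)` returning pixel coordinates; it is an illustration of the equations' structure, not the authors' code.

```python
import numpy as np

def observation_residuals(Pr, poses, vertices, displacements, observations):
    """Stack the residuals of the observation equations: for every match
    (i, j, u, v) -- vertex i observed at (u, v) in image j -- project
    the displaced vertex through image j's camera model and subtract
    the measured position."""
    res = []
    for i, j, u, v in observations:
        pu, pv = Pr(poses[j], vertices[i] + displacements[i])
        res.extend([pu - u, pv - v])    # the epsilon terms of Eq. (1)
    return np.asarray(res)
```

Bundle adjustment then minimizes the squared norm of this vector over the camera poses, with the vertex displacements regularized as discussed in the text.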
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000. 261