Fua, Pascal
4.1.1 Relative Motion Recovery

First, we estimate the relative motion of the face with respect to the camera. Given
sequences in which the subjects keep a fairly neutral facial expression, we treat the head as a rigid object. We assume
that the intrinsic camera parameters remain constant throughout the sequence. In theory, given high precision matches,
bundle-adjustment can recover both intrinsic parameters and camera motion (Gruen and Beyer, 1992). The same holds
true for recent auto-calibration techniques, but typical sequences of head images are close to exhibiting degenerate motions (Sturm, 1997; Zisserman et al., 1998). Again, extremely precise matches would be required.
In practice, however, face images exhibit little texture and we must be prepared to deal with the potentially poor quality
of the point matches. Therefore, we have chosen to roughly estimate the intrinsic parameters and to concentrate on
computing the extrinsic ones using bundle-adjustment: We use an approximate value for the focal length and assume that
the principal point remains in the center of the image. By so doing, we generate 3-D models that are deformed versions
of the real heads. When the motion between the camera viewpoints is a pure translation, this deformation is an affine
transform (Luong and Viéville, 1996). In practice, the deformation is still adequately modeled by an affine transform even
if the motion is not a pure translation (Fua, 2000). The closer the approximate focal length is to its true value, the closer that affine transform is to a combination of rotation, translation, and scaling.
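The approximate intrinsics described above can be sketched as follows. This is a minimal illustration of the stated assumptions, not code from the paper; the function name and the focal-length value are hypothetical.

```python
import numpy as np

def approximate_intrinsics(image_width, image_height, focal_guess):
    """Build a rough camera intrinsic matrix: an approximate focal
    length and the principal point fixed at the image center, as the
    text assumes (illustrative helper, not the authors' code)."""
    cx, cy = image_width / 2.0, image_height / 2.0
    return np.array([[focal_guess, 0.0,         cx],
                     [0.0,         focal_guess, cy],
                     [0.0,         0.0,         1.0]])

# Hypothetical values for a 640x480 sequence.
K = approximate_intrinsics(640, 480, focal_guess=700.0)
```

With these fixed intrinsics, bundle adjustment then only has to estimate the extrinsic parameters, at the cost of the affine deformation discussed above.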
Initialization A well-known limitation of bundle-adjustment algorithms is that, to ensure convergence, one
must provide an adequate initial estimate. To fulfil this requirement, we begin by retriangulating the surface of the
generic face model introduced in Section 2 to produce the regular mesh shown in Figure 7(b). We will refer to it as the
bundle-adjustment triangulation.
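One common way to retriangulate a coarse mesh into a more regular one is midpoint (1-to-4) subdivision. The paper does not specify its retriangulation scheme, so the sketch below is only an assumed illustration of the idea:

```python
import numpy as np

def subdivide(vertices, faces):
    """One round of midpoint subdivision: split every triangle into
    four by inserting edge midpoints, sharing midpoints between
    adjacent faces (illustrative; not the paper's actual scheme)."""
    vertices = [np.asarray(v, dtype=float) for v in vertices]
    midpoint_index = {}          # edge -> index of its midpoint vertex

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            midpoint_index[key] = len(vertices)
            vertices.append((vertices[a] + vertices[b]) / 2.0)
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(vertices), new_faces
```

Repeating such a step on the generic face model yields a denser, more regular triangulation of the kind used for bundle adjustment.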
Figure 8: Input video sequence: For each person, 3 of a sequence of 9 consecutive images.
Figure 9: Regularized bundle-adjustment: (a) The five manually supplied keypoints used to compute the orientation of the first camera. (b) The projections of the bundle-adjustment triangulation vertices of Figure 7(b) into the central image of Figure 8(a). (c,d) Matching points in the images immediately following and immediately preceding the central image of Figure 8(a) in the sequence. (e,f) Shaded representation of the bundle-adjustment triangulation. (g) Recovered relative camera positions.
To initialize the process for a video sequence such as the ones shown in Figure 8, we manually supply the approximate
positions of the five feature points depicted in Figure 9(a) in one, and only one, reference image. We compute the position
and orientation of the reference camera that brings the five projections of the corresponding keypoints as close as possible
to those positions. We then estimate the positions and orientations for the two images on either side of the reference image
as follows.
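Fitting the reference camera to the five keypoints amounts to minimizing their reprojection error. The sketch below shows one way to do this with a nonlinear least-squares solver; the function names, the rotation-vector parameterization, and the initial guess are all assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, rvec, tvec, pts3d):
    """Project 3-D points through a camera with intrinsics K and an
    extrinsic pose given as a rotation vector and a translation."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = pts3d @ R.T + tvec            # world -> camera frame
    proj = cam @ K.T
    return proj[:, :2] / proj[:, 2:3]   # perspective division

def estimate_reference_pose(K, keypoints3d, keypoints2d, pose0):
    """Find the pose that brings the projections of the model keypoints
    as close as possible to their manually supplied 2-D positions
    (illustrative sketch; pose0 = [rvec, tvec] initial guess)."""
    def residual(p):
        return (project(K, p[:3], p[3:], keypoints3d) - keypoints2d).ravel()
    sol = least_squares(residual, pose0)
    return sol.x[:3], sol.x[3:]
```

With five 2-D/3-D correspondences and six pose parameters, the problem is overdetermined and a local solver converges reliably given a reasonable initial depth.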
Generic Bundle Adjustment As shown in Figure 9(b), our initial orientation guarantees that the bundle-adjustment
triangulation vertices’ projections fall roughly on the face. We match these projections into the other images using
a simple correlation-based algorithm. Figure 9(c,d) depicts the results. For each vertex $(x_i, y_i, z_i)$ of the bundle-adjustment triangulation and each projection $(u_i^j, v_i^j)$ of this vertex in image $j$, we write two observation equations:
\[
\mathrm{Pr}_u^j(x_i + dx_i,\, y_i + dy_i,\, z_i + dz_i) = u_i^j + \epsilon_{u,i}^j ,
\qquad
\mathrm{Pr}_v^j(x_i + dx_i,\, y_i + dy_i,\, z_i + dz_i) = v_i^j + \epsilon_{v,i}^j
\tag{1}
\]
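The observation equations can be stacked into a residual vector for the least-squares solver. The sketch below assumes a generic projection function `Pr(pose, X)` returning pixel coordinates; it is an illustration of the equations' structure, not the authors' code.

```python
import numpy as np

def observation_residuals(Pr, poses, vertices, displacements, observations):
    """Stack the residuals of the observation equations: for every match
    (i, j, u, v) -- vertex i observed at (u, v) in image j -- project
    the displaced vertex through image j's camera model and subtract
    the measured position."""
    res = []
    for i, j, u, v in observations:
        pu, pv = Pr(poses[j], vertices[i] + displacements[i])
        res.extend([pu - u, pv - v])    # the epsilon terms of Eq. (1)
    return np.asarray(res)
```

Bundle adjustment then minimizes the squared norm of this vector over the camera poses, with the vertex displacements regularized as discussed in the text.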
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000. 261