Fua, Pascal
[Figure 7(a): flowchart relating the input Images, the Generic Face Model, and the Generic Hair Model, via Bundle Adjustment, the 5 Required Feature Points, 2-D Silhouettes, the Face Texture, and the Hair Texture Map, to the Complete Head Model.]
Figure 7: Face Reconstruction Procedure: (a) Flow chart. The manually and semi-automatically entered data appear on the right. The locations of five 2-D points must be supplied; the rest is optional. (b) Regular sampling of the face used to perform bundle adjustment. (c) Control triangulation used to deform the face.
• Correspondences are hard to establish and can be expected to be neither precise nor reliable due to lack of texture.
• A Euclidean or Quasi-Euclidean (Beardsley et al., 1997) reconstruction is required for realism.
• The motion is far from optimal for most of the auto-calibration techniques that have been developed in recent years (Sturm, 1997, Zisserman et al., 1998).
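The first difficulty can be made concrete with a small illustration, not taken from the paper's implementation: a zero-mean normalized cross-correlation (ZNCC) score computed on synthetic patches, showing why matching scores are reliable on textured regions but ambiguous on texture-less skin.

```python
# Minimal sketch (illustrative only): ZNCC between two image patches.
# Textured patches match themselves robustly; flat patches produce
# noise-dominated, ambiguous scores. All data here are synthetic.
import numpy as np

def zncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)

rng = np.random.default_rng(0)
textured = rng.uniform(0, 255, (11, 11))   # strongly textured patch
flat = np.full((11, 11), 128.0)            # texture-less "skin" patch
noise = rng.normal(0, 2.0, (11, 11))       # sensor noise

# A textured patch matches a noisy copy of itself almost perfectly...
s_tex = zncc(textured, textured + noise)
# ...while two noisy copies of a flat patch correlate only through noise.
s_flat = zncc(flat + noise, flat + rng.normal(0, 2.0, (11, 11)))
print(s_tex, s_flat)  # s_tex close to 1, s_flat near 0
```

The same effect holds for any window-based matcher: without intensity variation inside the window, the score carries no information about the true disparity.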
To overcome these difficulties, we have developed an approach based on bundle adjustment that takes advantage of our rough knowledge of the face's shape, in the form of the generic face model of Section 2, to introduce regularization constraints. This has allowed us to robustly estimate the relative head motion. The resulting image registration is accurate enough to use a simple correlation-based stereo algorithm to derive 3-D information from the data and to fit the animation model to it.
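The idea of regularizing bundle adjustment with a shape prior can be sketched, under strong simplifying assumptions (known intrinsics, rotation-free cameras, synthetic data; all names and values below are made up for the demo and are not the paper's implementation), as a least-squares problem that combines reprojection error with a term pulling each 3-D point toward its position on a generic model:

```python
# Toy regularized bundle adjustment: jointly refine one camera
# translation and the 3-D points by minimizing reprojection error
# plus a shape-prior penalty. Purely illustrative.
import numpy as np
from scipy.optimize import least_squares

F = 500.0  # assumed focal length in pixels, principal point at origin

def project(X, t):
    """Pinhole projection of 3-D points X after translating by t."""
    Xc = X + t
    return F * Xc[:, :2] / Xc[:, 2:3]

# Synthetic ground truth: 20 points and two camera translations.
rng = np.random.default_rng(1)
X_true = rng.uniform(-1, 1, (20, 3)) + np.array([0.0, 0.0, 10.0])
t_true = [np.zeros(3), np.array([0.5, 0.0, 0.0])]
obs = [project(X_true, t) + rng.normal(0, 0.5, (20, 2)) for t in t_true]

# "Generic model": a rough prior on the points (truth plus an offset).
X_prior = X_true + rng.normal(0, 0.2, X_true.shape)
lam = 1.0  # regularization weight

def residuals(p):
    t1 = p[:3]                       # second-camera translation
    X = p[3:].reshape(-1, 3)         # 3-D point estimates
    r = [project(X, np.zeros(3)) - obs[0],   # reprojection, camera 1
         project(X, t1) - obs[1],            # reprojection, camera 2
         lam * (X - X_prior)]                # shape-prior regularizer
    return np.concatenate([a.ravel() for a in r])

p0 = np.concatenate([np.zeros(3), X_prior.ravel()])
sol = least_squares(residuals, p0)
print(np.linalg.norm(sol.x[:3] - t_true[1]))  # small translation error
```

The prior term both fixes the gauge (scale and origin) and keeps the optimization well-conditioned when the image correspondences are noisy, which is exactly the role the generic face model plays here.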
Bundle adjustment is, of course, a well-established technique in the photogrammetric community (Gruen and Beyer, 1992).
However, it is typically used in contexts, such as mapping or close-range photogrammetry, where reliable and precise correspondences can be established. In addition, because it involves nonlinear optimization, it requires good initialization for proper convergence. Lately, it has increasingly been used in the computer vision community to refine the output of auto-calibration techniques. There again, however, most results have been demonstrated in man-made environments where feature points can be reliably extracted and matched across images. One cannot assume that those results carry over directly to ill-textured objects, such as faces, for which only low-quality correspondences are available.
Successful approaches to automating the fitting process have involved the use of optical flow (DeCarlo and Metaxas,
1998) or appearance based techniques (Kang, 1997) to overcome the fact that faces have little texture and that, as a
result, automatically and reliably establishing correspondences is difficult. This latter technique is closely related to ours
because head shape and camera motion are recovered simultaneously. However, the optical flow approach avoids the
"correspondence problem" at the cost of making assumptions about constant illumination of the face that may be violated
as the head moves. This tends to limit the range of images that can be used, especially if the lighting is not diffuse.
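The brightness-constancy assumption at issue can be illustrated with a toy example (synthetic data; not drawn from either cited system): a pure shift of the image keeps the constancy residual at zero, but the same shift combined with a global illumination change makes it large, even though the motion is identical.

```python
# Brightness constancy I(x + u, t+1) = I(x, t): holds under pure
# motion, violated by an illumination change (modeled as a gain).
import numpy as np

rng = np.random.default_rng(2)
frame = rng.uniform(0, 255, (64, 64))

shifted = np.roll(frame, 1, axis=1)   # head/camera motion only
relit = 1.3 * shifted                 # same motion, brighter lighting

def residual(a, b, u=1):
    """Mean absolute brightness-constancy residual for a known shift u."""
    return float(np.abs(np.roll(a, u, axis=1) - b).mean())

r_motion = residual(frame, shifted)   # 0.0: constancy holds exactly
r_relit = residual(frame, relit)      # large: constancy violated
print(r_motion, r_relit)
```

An optical-flow estimator built on this residual would misattribute the lighting change to motion, which is why non-diffuse lighting limits the usable image range.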
More recently, another extremely impressive appearance-based approach that uses a sophisticated statistical head model
has been proposed (Blanz and Vetter, 1999). This model has been learned from a large database of human heads and its
parameters can be adjusted so that it can synthesize images that closely resemble the input image or images. While the
results are outstanding even when only one image is used, the recovered shape cannot be guaranteed to be correct unless more than one image is used. Because the model is Euclidean, initial camera parameters must be supplied when dealing with
uncalibrated imagery. Therefore, the technique proposed here could be used to initialize the Blanz & Vetter system in an
automated fashion. In other words, if we had had their model, we could have used it to develop the technique described
here.
Our procedure follows the steps depicted in the flowchart of Figure 7 and described in more detail below. The only mandatory manual intervention is supplying the approximate 2-D locations, in a single image, of five feature points: the corners of the eyes and mouth and the tip of the nose.
260 International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000.