Fua, Pascal
where $Pr_u^j(x,y,z)$ and $Pr_v^j(x,y,z)$ denote the two image coordinates of the projection of point $(x,y,z)$ in image $j$
using the current estimate of the camera models; $(dx_i, dy_i, dz_i)$ represents the 3-D displacement of vertex $i$ to conform
to the actual face shape; and $\epsilon_{u_i^j}$, $\epsilon_{v_i^j}$ are the projection errors to be minimized. The camera position parameters can be
recovered by minimizing the sum of the squares of the $\epsilon_{u_i^j}$ and $\epsilon_{v_i^j}$ with respect to the six external parameters of each
camera and the $(dx_i, dy_i, dz_i)$ displacement vectors. The solution can only be found up to a global rotation, translation,
and scaling. To remove this ambiguity, we fix the position of the first camera and one additional parameter, such as the
distance of one vertex in the triangulation.
Robust Bundle Adjustment If the correspondences were perfect, the above procedure would suffice. However, the
point correspondences can be expected to be noisy and to include mismatches. To increase the procedure’s robustness, we
introduce the following two techniques.
Iteratively reweighted least squares. We first run the bundle adjustment algorithm with all the observations of Equation 1
having the same weight. We then recompute these weights so that they are inversely proportional to the final residual
errors. We minimize our criterion again using these new weights and iterate the whole process until the weights stabilize.
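
The reweighting loop described above can be illustrated on a toy problem. The following is a minimal sketch, not the bundle-adjustment code itself: it assumes a linear model (a line fit) in place of the camera and displacement parameters, and the function name `irls`, the damping constant `eps`, and the normalization of the weights by their maximum are all hypothetical choices made for this illustration.

```python
import numpy as np

def irls(A, b, n_iter=50, eps=1e-3, tol=1e-6):
    """Solve A @ x ~ b by iteratively reweighted least squares (illustrative)."""
    w = np.ones(len(b))                      # first pass: all observations weigh the same
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_i w_i * r_i**2.
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = np.abs(A @ x - b)                # residuals at the end of this pass
        w_new = 1.0 / (r + eps)              # inversely proportional to the residual (eps is an assumption)
        w_new /= w_new.max()                 # scale so that the largest weight is 1
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:                        # stop once the weights have stabilized
            break
    return x, w

# Toy data: samples of y = 2*t + 1 with two gross mismatches.
t = np.arange(20, dtype=float)
y = 2.0 * t + 1.0
y[3] += 30.0
y[15] -= 25.0
A = np.column_stack([t, np.ones_like(t)])
x_fit, w_fit = irls(A, y)
```

In this toy setting the two corrupted observations end up with weights close to zero, so the final fit is driven by the consistent observations.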
Regularization. We prevent excessive deformation of the bundle-adjustment triangulation by treating its
facets as $C^0$ finite elements and adding a quadratic regularization term to the sum of the squares of the $\epsilon_{u_i^j}$
and $\epsilon_{v_i^j}$ of Equation 1.
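
To make the idea of a quadratic regularization term concrete, here is a small sketch. It does not build the $C^0$ finite-element stiffness matrix used in the paper; as a simpler stand-in, it penalizes differences between the displacements of vertices that share an edge. The names `edge_stiffness` and `regularization_energy` are hypothetical.

```python
import numpy as np

def edge_stiffness(n_vertices, triangles):
    """Edge-based stiffness matrix K (a stand-in, not the C^0 finite-element matrix)."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))   # count each shared edge once
    K = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        K[i, i] += 1.0
        K[j, j] += 1.0
        K[i, j] -= 1.0
        K[j, i] -= 1.0
    return K

def regularization_energy(K, d):
    """Quadratic term 0.5 * sum over x, y, z of d^T K d, with d of shape (n_vertices, 3)."""
    return 0.5 * float(np.trace(d.T @ K @ d))

# Two triangles sharing the edge (1, 2).
K = edge_stiffness(4, [(0, 1, 2), (1, 2, 3)])
rigid_shift = np.ones((4, 3))                   # same displacement everywhere
single_move = np.zeros((4, 3))
single_move[0, 0] = 1.0                         # only vertex 0 moves
```

With this stand-in, a uniform translation of all vertices costs nothing, while moving a single vertex relative to its neighbors is penalized, which is the qualitative behavior a deformation regularizer needs.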
For the image triplet formed by the central image of the video sequence of Figure 8(a) and the images immediately
preceding and following it, the procedure yields the bundle-adjustment triangulation shape depicted in Figure 9(e,f).
By repeating this computation over all overlapping triplets of images in the video sequences, we can compute the camera
positions depicted in Figure 9(g).
4.1.2 Model Fitting Given the camera models computed above, we can now recover additional information about the
surface by using a simple correlation-based algorithm (Fua, 1993) to compute a disparity map for each pair of consecutive
images in the video sequences and by turning each valid disparity value into a 3-D point. Because these 3-D points
typically form an extremely noisy and irregular sampling of the underlying global 3-D surface, we begin by robustly
fitting surface patches to the raw 3-D points. This first step eliminates some of the outliers and generates meaningful local
surface information for arbitrary surface orientation and topology (Fua, 1997).
Our goal, then, is to deform the generic mask so that it conforms to the cloud of points, that is, to treat each patch as
an attractor and to minimize its distance to the final mask. In our implementation, this is achieved by computing the
orthogonal distance $d_i$ of each attractor to the closest facet as a function of the x, y, and z coordinates of its vertices and
minimizing the objective function:
$$\sum_i d_i^2 \qquad (2)$$
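
As an illustration of this objective, the sketch below evaluates the sum of squared orthogonal distances from a set of attractor points to the planes of their closest facets. It rests on a simplifying assumption that is not from the paper: the closest facet is taken to be the one with the nearest centroid. The function names are hypothetical.

```python
import numpy as np

def facet_distances(points, vertices, triangles):
    """Signed orthogonal distance of each point to the plane of its closest facet.

    Assumption: "closest" means nearest facet centroid, a simplification.
    """
    tri = vertices[triangles]                        # (T, 3, 3) facet corner coordinates
    centroids = tri.mean(axis=1)                     # (T, 3)
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    gaps = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.argmin(gaps, axis=1)                    # closest facet per point
    d = np.einsum("ij,ij->i", points - tri[idx, 0], normals[idx])
    return d, idx

def fitting_objective(points, vertices, triangles):
    """Sum of squared orthogonal distances, as in Equation 2."""
    d, _ = facet_distances(points, vertices, triangles)
    return float(np.sum(d ** 2))

# A unit square in the plane z = 0, split into two facets.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
triangles = np.array([[0, 1, 2], [0, 2, 3]])
points = np.array([[0.6, 0.2, 0.5], [0.2, 0.7, -0.2]])
d, idx = facet_distances(points, vertices, triangles)
```

Because each distance is a function of the coordinates of the three vertices of its facet, the objective can be minimized with respect to those coordinates by a least-squares solver.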
Control Triangulation In theory we could optimize with respect to the state vector P of all x, y, and z coordinates of
the surface triangulation. However, because the image data is very noisy, we would have to impose a very strong
regularization constraint. Instead, we introduce control triangulations such as the one shown in Figure 7(c). The vertices of
the surface triangulation are "attached" to the control triangulation and the range of allowable deformations of the surface
triangulation is defined in terms of weighted averages of displacements of the vertices of the control triangulation (Fua
and Miccio, 1998).
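
The attachment of the surface triangulation to the control triangulation can be sketched as a linear map. In the example below, each surface vertex moves by a weighted average of the control-vertex displacements. How the weights are chosen is not specified here, so the weight matrix is simply taken as given, with rows summing to one; `deform_surface` and the toy values are hypothetical.

```python
import numpy as np

def deform_surface(surface_vertices, weights, control_displacements):
    """Move each surface vertex by a weighted average of control displacements.

    weights has shape (n_surface, n_control); each row is assumed to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of weights must sum to 1")
    return surface_vertices + weights @ control_displacements

# Three surface vertices attached to two control vertices.
surface = np.zeros((3, 3))
weights = np.array([[1.0, 0.0],
                    [0.5, 0.5],
                    [0.0, 1.0]])
control_disp = np.array([[0.0, 0.0, 1.0],
                         [0.0, 0.0, 3.0]])
deformed = deform_surface(surface, weights, control_disp)
```

The optimization then only has to estimate the control displacements, which are far fewer than the surface coordinates.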
Because there may be gaps in the image data, it is necessary to add a small stiffness term into the optimization to ensure
that the displacements of the control vertices are consistent with their neighbors where there is little or no data. As before,
we treat the control triangulation's facets as $C^0$ finite elements and add a quadratic stiffness term to the objective function
of Equation 2.
Because there is no guarantee that the image data covers equally both sides of the head, we also add a small number of
symmetry observations between control vertices on both sides of the face. They serve the same purpose as the stiffness
term: Where there is no data, the shape is derived by symmetry. An alternative would have been to use a completely
symmetric model with half of the degrees of freedom of the one we use. We chose not to do so because, in reality, faces
are somewhat asymmetric.
Because the control triangulation has fewer vertices, and they are more regularly spaced than those of the
surface triangulation, the least-squares optimization has better convergence properties. Of course, the finer the control
triangulation, the less smoothing it provides. By using a precomputed set of increasingly refined control triangulations,
we implement a hierarchical fitting scheme that has proved very useful when dealing with noisy data. We recompute the
facet closest to each attractor at each stage of our hierarchical fitting scheme, that is, each time we introduce a new control
triangulation. To discount outliers, we also recompute the weight associated with each attractor and take it to be inversely
proportional to the initial distance of the data point to the surface triangulation.
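
The outlier discounting described above can be sketched as follows. This is an illustration under stated assumptions: the small constant `eps`, the normalization by the largest weight, and the way the weights enter a weighted sum of squared distances are hypothetical choices, not details given in the text.

```python
import numpy as np

def attractor_weights(initial_distances, eps=1e-9):
    """Weights inversely proportional to the initial attractor-to-surface distances."""
    d0 = np.abs(np.asarray(initial_distances, dtype=float))
    w = 1.0 / (d0 + eps)          # eps avoids division by zero (assumption)
    return w / w.max()            # scale so that the largest weight is 1

def weighted_objective(weights, distances):
    """Weighted sum of squared distances (assumed form of the weighted criterion)."""
    return float(np.sum(weights * np.asarray(distances, dtype=float) ** 2))

# Recomputed at the start of each stage, from distances to the current surface.
d_start = [0.1, 1.0, 10.0]
w_stage = attractor_weights(d_start)
e_stage = weighted_objective(w_stage, d_start)
```

Attractors that start far from the surface, which are the likeliest outliers, contribute little to the fit at that stage.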
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000.