MEASUREMENT AND MODELING OF HUMAN FACES FROM MULTI IMAGES
Nicola D'Apuzzo
Institute of Geodesy and Photogrammetry, ETH-Hoenggerberg, 8093 Zurich, Switzerland, nicola@geod.baug.ethz.ch
Commission V, WG V/6
KEYWORDS: Automation, Photogrammetry, Surface, Measurement, Visualization, Photo-Realism
ABSTRACT:
Modeling and measurement of the human face have been increasing in importance for various purposes. Laser scanning, coded light
range digitizers, image-based approaches and digital stereo photogrammetry are the methods currently employed in medical
applications, computer animation, video surveillance, teleconferencing and virtual reality to produce three-dimensional computer
models of the human face. The requirements differ depending on the application. Ours are primarily high accuracy of the
measurement and automation of the process. The method presented in this paper is based on multi-image photogrammetry. The
equipment, the method and the results achieved with this technique are described here. The process is composed of five steps: acquisition
of multi-images, calibration of the system, establishment of corresponding points in the images, computation of their 3-D coordinates
and generation of a surface model. The images captured by five CCD cameras arranged in front of the subject are digitized by a
frame grabber. The complete system is calibrated using a reference object with coded target points, which can be measured fully
automatically. To facilitate the establishment of correspondences in the images, texture in the form of random patterns can be
projected from two directions onto the face. The multi-image matching process, based on a geometrically constrained least squares
matching algorithm, produces a dense set of corresponding points in the five images. Neighborhood filters are then applied to the
matching results to remove errors. After filtering the data, the three-dimensional coordinates of the matched points are computed
by forward intersection using the results of the calibration process; the achieved mean accuracy is about 0.2 mm in the sagittal
direction and about 0.1 mm in the lateral direction. The last step of data processing is the generation of a surface model from the
point cloud and the application of smoothing filters. Moreover, a color texture image can be draped over the model to achieve a
photorealistic visualization. The advantage of the presented method over laser scanning and coded light range digitizers is the
acquisition of the source data in a fraction of a second, allowing the measurement of human faces with higher accuracy and the
possibility to measure dynamic events like the speech of a person.
1. INTRODUCTION
Modeling and measurements of the human face have wide
applications ranging from medical purposes (Banda et al., 1992;
Koch et al., 1996; Motegi et al., 1996; D'Apuzzo, 1998; Okada,
2001) to computer animation (Pighin et al., 1998; Blanz and
Vetter, 1999; Lee and Magnenat-Thalmann, 2000; Liu et al.,
2000; Marschner et al., 2000; Sitnik and Kujawinska, 2000),
from video surveillance (CNN, 2001) to lip reading systems
(Minaku et al., 1995), from video teleconferencing to virtual
reality (De Carlo et al., 1998; Borghese and Ferrari, 2000; Fua,
2000; Shan et al., 2001). The issues that must be considered
when modeling the face of a real person are how realistic and
accurate the obtained shape is, how long it takes to get a
result, how simple the equipment is and how much it
costs.
The different approaches to reconstructing a human face
can be classified according to the requirements.
For animation, virtual reality and teleconferencing purposes, the
photorealistic aspect is essential. In contrast, high accuracy is
required for medical applications. Two major groups can also
be distinguished based on their data source: the first using range
digitizers and the second using only images.
To date, the most popular measurement technique is laser
scanning (Motegi et al., 1996; Hasegawa, 1999; Marschner et
al., 2000; Okada, 2001), for example the head scanner of
Cyberware (Cyberware, 2002). These scanners are expensive
and the data is usually noisy, requiring touchups by hand and
sometimes manual registration. Another solution is offered by
the structured light range digitizers (Proesmans and Van Gool,
1996; Wolf, 1996; Sitnik and Kujawinska, 2000) which are
usually composed of a stripe projector and one or more CCD
cameras. These can be used for face reconstruction with
relatively inexpensive equipment compared to laser scanners.
The accuracy of both systems is satisfactory for static objects;
however, their acquisition time ranges from a couple of seconds
to half a minute, depending on the size of the surface to be
measured. Thus, a person must remain stationary during the
measurement. Not only does this place a burden on the subject,
but it is also difficult to obtain stable measurement results. In
fact, even when the acquisition time is short, the person
unconsciously moves slightly.
A different approach to face modeling uses images as source
data. Various image-based techniques have been developed.
They can be distinguished by the type of image data used: a
single photograph, two orthogonal photographs, a set of images,
video sequences or multi-images acquired simultaneously.
Parametric face modeling techniques (Blanz and Vetter, 1999)
start from a single photograph to generate a complete 3-D
model of the face. Exploiting the statistics of a large data set of
3-D face scans, the face model is built by applying pattern
classification methods. The results are impressively realistic;
however, the accuracy of the reconstructed shape is low.
A number of researchers have proposed creating models from
two orthogonal views (Ip and Yin, 1996). The modeling process
requires manual intervention to select feature points
in the images. It is basically a simplified method to produce
realistic models of human faces. The obtained shape does