IMAGE SEQUENCE ANALYSIS FOR HUMAN BODY RECONSTRUCTION
Fabio Remondino
Institute for Geodesy and Photogrammetry, ETH Zurich, Switzerland
E-mail: fabio@geod.baug.ethz.ch
Commission V, ICWG V/III
KEY WORDS: Camera Calibration, Least Squares Matching, Reconstruction
ABSTRACT
The generation of 3-D models from uncalibrated image sequences is a challenging problem that has been investigated in many
research activities over the last decade. A topic of particular interest is the modeling of realistic humans for animation,
manufacturing or medical purposes. Current approaches usually reconstruct the human body with specialized hardware
(laser scanners), at high cost. This paper presents a different method for the three-dimensional reconstruction of human bodies
from image sequences acquired with a standard video camera. The core of the work is the calibration and orientation
of the images, but the whole process also includes the extraction of correspondences on the body by least squares
matching and the reconstruction of the 3-D body model.
1. INTRODUCTION
Current interest in 3-D object reconstruction is motivated
by a wide spectrum of applications, such as object recognition,
city modeling, video games, animation, surveillance and
visualization. In recent years, great progress has been made in
creating and visualizing 3-D models from images, with
particular attention to the visual quality of the results. Existing
systems are often built around specialized hardware
(e.g. laser scanners), which results in high costs. Other methods,
based on photogrammetry [Grün et al., 2001; Remondino,
2002] or computer vision [Pollefeys, 2000], can instead obtain
3-D models of objects with low-cost acquisition systems, using
photo or video cameras. For many years, photogrammetry
has dealt with high-accuracy measurements from image sequences,
including 3-D object tracking [Maas, 1991], deformation
measurements or motion analysis [D'Apuzzo et al., 2000];
although these applications require very precise calibration,
automated and reliable procedures are available.
Concerning the reconstruction and modeling of human bodies,
the demand for 3-D models has increased drastically in recent years.
A complete model of a human consists of both the shape and
the movements of the body. These two modeling processes are
often treated separately, even though they are closely related. A
classical approach to building human shape models uses 3-D
scanners [Cyberware, 2002; Vitus, 2002; Horiguchi, 1998]: they
are expensive but simple to use, and software is available to
edit and model the resulting point cloud. Other techniques use
structured light [Wolf, 1996], silhouette extraction
[Zheng, 1994] or multi-image photogrammetry [D'Apuzzo, 2002].
Human body models can be used in different fields, such as
animation, manufacturing or medicine. For animation purposes,
only approximate measurements are necessary: the shape can
first be defined (e.g. by smoothing a 3-D mesh with splines or by
attaching generalized cylinders or volumetric primitives to a
skeleton) and then animated using motion capture data. For
medical applications or in the manufacturing industry, digital
surfaces are required for metric body information and the design of
clothes [McKenna, 1996]; therefore exact 3-D models of the
body are needed, and these are usually produced with scanning
devices [Tailor, 2002].
In this paper a photogrammetric approach for the reconstruction
of 3-D models of static humans from uncalibrated image
sequences is described. The process consists of three parts:
1) Acquisition and analysis of the image sequence (section 2)
2) Calibration and orientation of the images (section 3)
3) Matching process on the human body surface and point
cloud generation (section 4).
This work belongs to a project called Characters Animation and
Understanding from SEquence of images (CAUSE). Its goal is
the extraction of complete 3-D animation models of characters
from old movies or video sequences, where no information
about the cameras or the objects is available.
2. IMAGE ACQUISITION
The images can be acquired with a still-video camera or with a
camcorder. A complete reconstruction of the human body
requires 360-degree azimuth coverage; for the time being,
however, only frames in front of the body are acquired. The
acquisition lasts about 30 seconds and requires the person to
remain motionless. This could be considered a limitation of the
procedure, but 3-D scanners also need at least 15 seconds to
acquire a full body model. Figure 1 shows three images (out of 6)
of a sequence acquired with a Sony DSC-S70, with a resolution of
768x1024 pixels. During the acquisition, the camera constant
was kept fixed to avoid having to model a varying camera
constant. If a video camera is used (section 5), the acquired video
has to be digitized and the artefacts created by interlacing must be
removed.
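The paper does not say how the interlacing artefacts are removed. In an interlaced frame the even and odd rows belong to two fields recorded at slightly different instants, which produces comb-like edges wherever something moves. One common remedy, assumed here purely for illustration, is to keep a single field and rebuild the rows of the other field by interpolating between neighboring rows of the kept field. The sketch below works on a grayscale frame stored as a list of pixel rows; the function name `deinterlace` and its `keep` parameter are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame stored as a list of pixel rows


def deinterlace(frame: Frame, keep: str = "even") -> Frame:
    """Rebuild a progressive frame from one field of an interlaced frame.

    Hypothetical helper (line-interpolation deinterlacing, assumed method):
    rows of the discarded field are replaced by the mean of the kept rows
    directly above and below; at the top or bottom border the single
    neighboring kept row is copied instead.
    """
    if keep not in ("even", "odd"):
        raise ValueError("keep must be 'even' or 'odd'")
    height = len(frame)
    if height < 2:
        raise ValueError("frame must have at least two rows")
    first_kept = 0 if keep == "even" else 1
    out: Frame = [list(row) for row in frame]  # copy, the input is not modified
    for y in range(height):
        if (y - first_kept) % 2 == 0:
            continue  # row belongs to the kept field: leave it unchanged
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is not None and below is not None:
            # integer mean of the two neighboring kept rows
            out[y] = [(a + b) // 2 for a, b in zip(above, below)]
        else:
            # border row: copy the only available kept neighbor
            out[y] = list(above if above is not None else below)
    return out


interlaced = [[10, 10], [99, 99], [30, 30], [99, 99]]
print(deinterlace(interlaced))  # [[10, 10], [20, 20], [30, 30], [30, 30]]
```

This removes the comb effect at the cost of halving the vertical resolution of the recorded detail; motion-adaptive deinterlacers keep both fields in static regions and would be a reasonable alternative for a subject that stands still.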
3. CALIBRATION AND ORIENTATION OF THE IMAGES
Camera calibration and image orientation are prerequisites for
accurate and reliable results, in particular for those applications
that rely on the extraction of precise 3-D information from
imagery. The early theories and formulations of orientation
procedures were developed in the first half of the 19th century,
and today a great number of procedures and algorithms are
available. A fundamental criterion for grouping the orientation
procedures is the camera model used, i.e. the projective
camera model or the perspective camera model. Camera
mode
requii
a stal
deal :
minin
(equa
The |
presei
adjust
corres
steps:
In the
is cor
1998]
extens
3.1 F
The fi
image
thresh
image
the im
the hig
The ne
first ci
using
The cr
point i;
that ar
point
approx
the bes
first in
the thr
betwee
points.
The fo
unguid
corresp
dispari:
and P,
points
points (