CHARACTER RECONSTRUCTION AND ANIMATION
FROM MONOCULAR SEQUENCE OF IMAGES
Fabio Remondino
Institute for Geodesy and Photogrammetry, ETH Zurich, Switzerland
E-mail: fabio@geod.baug.ethz.ch
Commission V, ICWG V/III
KEY WORDS: Calibration, Orientation, Matching, Reconstruction, Body Modeling, Animation
ABSTRACT
In this paper we present different methods for the calibration and orientation of monocular image sequences and the 3D
reconstruction of human characters. Three different situations are considered: a static character imaged with a moving camera, a
moving character imaged with a fixed camera and a moving character imaged with a moving camera. A self-acquired sequence is used
in the first case, while in the other cases we use existing sequences available on the Internet or digitized from old videotapes. Most
image-based techniques use probabilistic approaches to model a character from monocular sequences; we instead
use a deterministic approach, recovering the character's model and movement through a camera model. The recovered human models can
be used for visualization purposes, to generate new virtual scenes of the analyzed sequence or for gait analysis.
1. INTRODUCTION
The realistic modeling of human characters from video
sequences is a challenging problem that has been widely
investigated in the last decade. Recently the demand for 3D human
models has increased drastically for applications like movies,
video games, ergonomics, e-commerce, virtual environments
and medicine. In this short introduction we consider only the
passive image- and triangulation-based reconstruction methods,
neglecting those techniques that do not use correspondences
(e.g. shape from shading) or that rely on computer animation software. A
complete human model consists of the 3D shape and the
movements of the body (Table 1): most of the available systems
treat these two modeling procedures as separate even though they
are closely related. A standard approach to capture the static 3D
shape (and colour) of an entire human body uses laser scanner
technology: it is quite expensive but it can generate a whole
body model in ca. 20 seconds. On the other hand, precise
information related to character movements is generally
acquired with motion capture techniques: they involve a
network of cameras and have proved an effective and successful
means to replicate human movements. In between, single- or
multi-station videogrammetry offers an attractive alternative
technique, requiring cheap sensors, allowing markerless
tracking and providing, at the same time, 3D shape and
movement information. Model-based approaches are very
common, in particular with monocular video streams, while
deterministic approaches are almost neglected, often due to the
difficulties in recovering the camera parameters. The analysis
of existing videos can moreover allow the generation of 3D
models of characters who may be long dead or unavailable for
common modeling techniques.
                               Single-station       Multi-stations
                               Videogrammetry       Videogrammetry
3D Shape     Active Sensors    Howe [2000]          Gavrila [1996]
                               Sidenbladh [2000]    Yamamoto [98]
Movements    Motion Capture    Sminchisescu [02]    Vedula [1999]
                               Remondino [02, 03]   D'Apuzzo [03]
Table 1: Techniques for human shape and movements modeling.
In this paper we present the analysis of monocular sequences
with the aim of (1) generating reliable procedures to calibrate
and orient image sequences without typical photogrammetric
information and (2) reconstructing 3D models of characters for
visualization and animation purposes. The virtual characters
can be used in areas like film production, entertainment, fashion
design and augmented reality. Moreover the recovered 3D
positions can also serve as a basis for the analysis of human
movements or for medical studies.
2. RECOVERING APPROXIMATIONS OF THE CAMERA PARAMETERS
FROM EXISTING SEQUENCES
As we want to recover metric information from video
sequences (3D characters, scene models or human movement
information), we need some metric information about the
camera (interior and exterior parameters) and the images (pixel
size). The approximations of these parameters are also
necessary in the photo-triangulation procedure (bundle
adjustment), as we must solve a non-linear problem, based on
the fundamental collinearity condition, to obtain a rigorous
solution. We assume that we do not know the parameters of the
camera used and that we can always define some control points,
knowing the dimensions of some objects in the imaged scene.
The pixel size is mainly a scale factor for the camera focal
length. Its value can be recovered from a set of corresponding
object and image coordinates distributed on a plane.
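For reference, the collinearity condition mentioned above is commonly written in the following form. The symbols are assumptions introduced here for illustration: (x, y) are image coordinates, (x_0, y_0) the principal point, c the camera constant, (X_0, Y_0, Z_0) the projection centre, (X, Y, Z) an object point and r_ij the elements of the rotation matrix. Sign and index conventions vary between textbooks. The equations are non-linear in the orientation unknowns, which is why approximate values are needed before the adjustment.

```latex
\begin{aligned}
x &= x_0 - c\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}
                   {r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)},\\[4pt]
y &= y_0 - c\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}
                   {r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}.
\end{aligned}
```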
The camera interior parameters can be recovered with an
approach based on vanishing points and line segment clustering
[Caprile et al., 1990; Remondino, 2002] or with orthogonality
conditions on line measurements [Krauss, 1996; Van den
Heuvel, 1999]. If the image quality does not allow the
extraction of lines, the decomposition of the 3x4 matrix of the
projective camera model can simultaneously derive the interior
parameters given at least 6 control points [Hartley et al., 2000;
Remondino, 2003].
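As a minimal sketch of the matrix decomposition mentioned above, assume that the 3x4 projective camera matrix P has already been estimated from the control points and follows the common model P ~ K [R | -R C]. The interior parameters can then be separated from the rotation with an RQ factorization. The function name, the NumPy-based RQ step and the sign handling are illustrative assumptions, not the procedure of the cited works.

```python
import numpy as np


def decompose_projection_matrix(P):
    """Split a 3x4 camera matrix P ~ K [R | -R C] into K, R and C.

    Hypothetical helper for illustration only.
    K: upper-triangular interior matrix, scaled so that K[2, 2] == 1
    R: rotation matrix (determinant +1)
    C: projection centre in object coordinates
    """
    P = np.asarray(P, dtype=float)
    M = P[:, :3]  # left 3x3 block, equal to K R up to scale

    # RQ factorization built from NumPy's QR via the exchange matrix E.
    E = np.fliplr(np.eye(3))
    Q_, R_ = np.linalg.qr((E @ M).T)
    K = E @ R_.T @ E  # upper triangular
    R = E @ Q_.T      # orthogonal

    # Make the diagonal of K positive; S @ S is the identity, so K R is unchanged.
    signs = np.sign(np.diag(K))
    signs[signs == 0] = 1.0
    S = np.diag(signs)
    K = K @ S
    R = S @ R

    # P is only defined up to scale, including sign: enforce det(R) = +1.
    if np.linalg.det(R) < 0:
        R = -R

    # The projection centre satisfies M C + p4 = 0 and is independent of scale.
    C = -np.linalg.solve(M, P[:, 3])

    return K / K[2, 2], R, C
```

With this convention the focal length in pixels appears on the diagonal of K and the principal point in its last column. The result is only as good as the estimated P, so it is best treated as an approximation to be refined in the bundle adjustment.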
Concerning the exterior parameters, an approximate solution
can be achieved with a closed form space resection [Zeng et al.,
1992] or the classical non-linear space resection based on
collinearity, given more than 4 points. The DLT method can
sequentially recover all 9 camera parameters given at least 6
control points [Abdel-Aziz et al., 1971]. The DLT contains 11
parameters, two of which mainly account for film deformation: if
no film deformation is present, two constraints can be added to
resolve the singularity of the redundant parameters [Bopp et al.,
1978]. Other approaches are also described in [Slama, 1980;
Criminisi, 1999; Foerstner, 2000; Wolf et al., 2000].
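The 11-parameter DLT described above can be sketched as a linear least-squares problem: each control point contributes two equations, so 6 non-coplanar points give 12 equations for 11 unknowns. The function names dlt_calibrate and dlt_project and the use of NumPy are assumptions for illustration. The ordering L1..L11 follows the usual textbook form of the DLT equations, not necessarily the notation of the cited works, and no film-deformation constraints are applied.

```python
import numpy as np


def dlt_calibrate(object_pts, image_pts):
    """Estimate the 11 DLT parameters from >= 6 control points.

    Hypothetical helper for illustration only. Model:
        x = (L1 X + L2 Y + L3 Z + L4) / (L9 X + L10 Y + L11 Z + 1)
        y = (L5 X + L6 Y + L7 Z + L8) / (L9 X + L10 Y + L11 Z + 1)
    """
    obj = np.asarray(object_pts, dtype=float)
    img = np.asarray(image_pts, dtype=float)
    n = obj.shape[0]
    if n < 6 or img.shape[0] != n:
        raise ValueError("need at least 6 matching object/image points")

    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i in range(n):
        X, Y, Z = obj[i]
        x, y = img[i]
        # Each equation is the DLT model multiplied by its denominator.
        A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z]
        b[2 * i] = x
        b[2 * i + 1] = y

    # Linear least-squares solution of A L = b.
    L, *_ = np.linalg.lstsq(A, b, rcond=None)
    return L


def dlt_project(L, object_pts):
    """Project object points into the image with the 11 DLT parameters."""
    obj = np.asarray(object_pts, dtype=float)
    den = obj @ L[8:11] + 1.0
    x = (obj @ L[0:3] + L[3]) / den
    y = (obj @ L[4:7] + L[7]) / den
    return np.column_stack([x, y])
```

The control points must not all lie on one plane, otherwise the linear system becomes rank deficient. The solution minimizes an algebraic rather than a geometric error, which is acceptable when the result only serves as an approximation for the subsequent adjustment.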