International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXV, Part B5, Istanbul 2004
3. MODELING A STATIC CHARACTER WITH AN
IMAGE SEQUENCE
For the complete reconstruction of a static human model, a full
360 degree azimuth image coverage is required. A single
camera, with a fixed focal length, is used. The image acquisition
can last 30 seconds, and this could be considered a limitation of
the process, as no movement of the person is allowed during the
acquisition. The 3D shape reconstruction from the uncalibrated
image sequence is obtained with a process based on (1) calibration
and orientation of the images, (2) matching on the human body
surface and (3) 3D point cloud generation and modeling.
3.1 Camera calibration and image orientation
The calibration and orientation of the images have to be
performed in order to extract precise 3D information about the
character. The process (Figure 1) is based on a
photogrammetric bundle adjustment; the required image
correspondences (Section 3.1.1) are found with an improved
version of a process already presented in [Remondino, 2002].
[Figure: pipeline diagram: feature extraction; cross-correlation +
ALSM matching; pairwise and three-view epipolar geometry;
correspondence tracking (automated tie point extraction);
self-calibrating bundle adjustment using approximations of the
unknown parameters.]
Figure 1: The camera calibration and image orientation pipeline.
3.1.1 Automatic tie points extraction
Most of the presented systems [e.g. Fitzgibbon et al., 1998;
Pollefeys et al., 1998; Roth et al., 2000] developed for the
orientation of image sequences with automatic extraction of
corresponding points require very short baselines between the
images (an approach typically called 'shape-from-video'). Few
strategies can instead reliably deal with wide-baseline images
[Tuytelaars et al., 2000; Remondino, 2002]. Our approach
extracts corresponding points automatically with the following 6
steps:
1. Interest point identification. A set of interest points or
corners in each image of the sequence is extracted using the
Foerstner operator or the Harris corner detector, with a threshold
on the number of corners extracted based on the image size.
A good point distribution is ensured by subdividing the
images into small patches (9x9 pixels on an image of
1200x1600) and keeping only the points with the highest
interest value in each patch.
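A minimal sketch of this best-per-patch selection, assuming a precomputed corner-response map (e.g. the output of the Foerstner or Harris operator); the function name and default values are illustrative, not taken from the paper:

```python
import numpy as np

def best_per_patch(response, patch=9, min_response=0.0):
    """Keep only the strongest interest point inside each patch x patch
    tile of a corner-response map, ensuring an even distribution of
    corners over the image."""
    H, W = response.shape
    points = []
    for y0 in range(0, H, patch):
        for x0 in range(0, W, patch):
            tile = response[y0:y0 + patch, x0:x0 + patch]
            dy, dx = np.unravel_index(np.argmax(tile), tile.shape)
            # Discard tiles whose best response does not pass the threshold
            if tile[dy, dx] > min_response:
                points.append((y0 + dy, x0 + dx))
    return points
```

At most one corner per tile survives, which is what prevents the detector from clustering all points in a few highly textured regions.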
2. Correspondence matching. The extracted features between
adjacent images are matched at first with cross-correlation,
and the results are then refined using adaptive least squares
matching (ALSM) [Gruen, 1985]. Cross-correlation alone
cannot always guarantee the correct match, while ALSM,
with template rotation and reshaping, provides more accurate
results. The point with the largest correlation coefficient
is used as the approximation for the template matching process.
The process returns the best match in the second image for
each interest point in the first image.
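The cross-correlation stage can be sketched as follows; this is a hypothetical minimal implementation that only computes the correlation maximum used as the ALSM approximation (the ALSM refinement itself is not shown):

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation coefficient of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(template, image, search):
    """Slide `template` over `image` inside the search window
    ((y0, y1), (x0, x1)) and return the top-left position with the
    largest correlation coefficient, to be used as the approximation
    for a subsequent least-squares refinement."""
    th, tw = template.shape
    (y0, y1), (x0, x1) = search
    best, best_pos = -1.0, None
    for y in range(y0, y1 - th + 1):
        for x in range(x0, x1 - tw + 1):
            c = ncc(template, image[y:y + th, x:x + tw])
            if c > best:
                best, best_pos = c, (y, x)
    return best_pos, best
```

In practice the search window is restricted around the expected position, since an exhaustive scan over the full image is both slow and ambiguous.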
3. Filtering false correspondences. Because of the unguided
matching process, the found matched pairs often contain
outliers. Therefore a filtering of the incorrect matches is
performed using the disparity gradient between the found
correspondences: the smaller the disparity gradient, the more
the two correspondences are in agreement. The sum of
all disparity gradients of each matched point relative to all
other neighbouring matches is computed, and those matches
with a disparity gradient sum greater than the median of
the sums are removed. In case of large baselines, or in the
simultaneous presence of translation, rotation, shearing and
scale changes between consecutive images, the algorithm can
yield incorrect results if applied to the whole image; therefore
the filtering process has to be performed on small image regions.
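A sketch of this median filter, assuming the common cyclopean definition of the disparity gradient (the paper does not spell out the exact formula, so this choice is an assumption):

```python
import numpy as np

def filter_by_disparity_gradient(left, right):
    """Reject matches by the disparity-gradient criterion.  `left` and
    `right` are (N, 2) arrays of matched image coordinates.  For every
    pair of matches the disparity gradient |d_i - d_j| / |m_i - m_j| is
    computed (d = disparity vector, m = cyclopean midpoint, an assumed
    definition); matches whose summed gradient exceeds the median of all
    sums are removed."""
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    d = right - left                  # disparity vectors
    m = (left + right) / 2.0          # cyclopean midpoints
    n = len(left)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sep = np.linalg.norm(m[i] - m[j])
            if sep > 0:
                sums[i] += np.linalg.norm(d[i] - d[j]) / sep
    return sums <= np.median(sums)    # boolean mask of surviving matches
```

Consistent matches have nearly identical disparity vectors, so their pairwise gradients are small; a match whose disparity disagrees with its neighbours accumulates a large sum and falls above the median.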
4. Epipolar geometry between image pairs. A pairwise relative
orientation and an outlier rejection using those matches that
pass the filtering process are afterwards performed. Based on
the coplanarity condition, the fundamental matrix is
computed with the Least Median of Squares (LMedS)
method; LMedS estimators solve non-linear minimization
problems and yield the smallest value for the median of the
squared residuals computed over the whole data set. They are
therefore very robust against false matches and outliers due to
poor localisation. The computed epipolar geometry is then
used to refine the matching process (step 3), which is now
performed as guided matching along the epipolar line.
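The LMedS principle is independent of the model being fitted; the following sketch applies it to a simple 2D line rather than the fundamental matrix (the 8-point solver is omitted for brevity, and all names are illustrative):

```python
import numpy as np

def lmeds_line(points, n_trials=500, seed=0):
    """Least Median of Squares fit of a line y = a*x + b, illustrating
    the estimator used for the fundamental matrix: random minimal
    subsets (2 points for a line, 7-8 correspondences for F) propose a
    model, and the model minimising the *median* squared residual wins,
    so up to roughly half the data can be outliers without corrupting
    the estimate."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, float)
    best_model, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                      # degenerate minimal subset
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        med = np.median((pts[:, 1] - (a * pts[:, 0] + b)) ** 2)
        if med < best_med:
            best_med, best_model = med, (a, b)
    return best_model, best_med
```

Note that, unlike RANSAC, LMedS needs no inlier threshold: the median itself provides the robust score.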
5. Epipolar geometry between image triplets. Not all the
correspondences that support the pairwise relative orientation
are necessarily correct. In fact a pair of correspondences can
support the epipolar geometry by chance (e.g. a repeated
pattern aligned with the epipolar line). These kinds of
ambiguities and blunders are reduced considering the epipolar
geometry between three consecutive images. A linear
representation for the relative orientation of three frames is
represented by the trifocal tensor T [Shashua, 1994]. T is
represented by a set of three 3x3 matrices and is computed
from at least 7 correspondences without knowledge of the
motion or calibration of the cameras. In our process, the
tensor is computed with a RANSAC algorithm [Fischler et
al., 1981] using the correspondences that support two
adjacent pairs of images and their epipolar geometry. The
RANSAC is a robust estimator, which fits a model (tensor T)
to a data set (triplet of correspondences) starting from a
minimal subset of the data. The found tensor T is used (1) to
verify whether the image points are correct corresponding
features between three views and (2) to compute the image
coordinates of a point in a view, given the corresponding
image positions in the other two images. This transfer is very
useful when only few correspondences are found in one of the
views. As a result of this step, for each triplet of images, a set
of corresponding points supporting the related epipolar
geometry is recovered.
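The RANSAC loop itself is model-agnostic; the sketch below fits a 2D line instead of the trifocal tensor (whose 7-correspondence solver is omitted here), purely to illustrate the consensus-scoring principle described above:

```python
import numpy as np

def ransac_line(points, thresh=1.0, n_trials=500, seed=0):
    """Generic RANSAC loop: a model is fitted to a minimal subset of the
    data (2 points for a line, 7 correspondences for the tensor T) and
    scored by the number of samples it explains within `thresh`; the
    model with the largest consensus set wins."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, float)
    best_model, best_inliers = None, 0
    for _ in range(n_trials):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                      # degenerate minimal subset
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus: count points within `thresh` of the candidate line
        inliers = int(np.sum(np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < thresh))
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers
```

In contrast to LMedS, RANSAC requires an explicit inlier threshold, but it remains reliable even when outliers form well over half of the data.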
6. Tracking image correspondences through the sequence.
After the computation of a tensor T for every consecutive
triplet of images, we consider all the overlapping tensors (e.g.
T_123, T_234, T_345, ...) and we look for those correspondences
which support consecutive tensors. That is, given two
adjacent tensors T_abc and T_bcd with supporting points
(x_a,y_a; x_b,y_b; x_c,y_c) and (x'_b,y'_b; x'_c,y'_c; x'_d,y'_d),
if (x_b,y_b; x_c,y_c) in the first tensor T_abc is equal to
(x'_b,y'_b; x'_c,y'_c) in the successive tensor T_bcd, the point
in images a, b, c and d is the same and therefore must keep the
same identifier. Each point is tracked as long as possible in the
sequence and the obtained correspondences are used as tie
points for the successive bundle-adjustment.
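The tracking logic can be sketched as follows; this is a hypothetical minimal implementation in which points are assumed to be hashable coordinate tuples that match exactly across overlapping triplets:

```python
def track_correspondences(triplets):
    """Chain correspondences supported by consecutive trifocal tensors.
    triplets[k] holds (p_k, p_k+1, p_k+2) point tuples for the images
    (k, k+1, k+2); entries of adjacent triplets sharing the same points
    in the two overlapping images receive the same track identifier."""
    tracks = {}                         # track id -> {image index: point}
    link = {}                           # (first image, pt, pt') -> track id
    next_id = 0
    for k, matches in enumerate(triplets):
        for a, b, c in matches:         # points in images k, k+1, k+2
            tid = link.get((k, a, b))   # does (a, b) extend a previous track?
            if tid is None:
                tid = next_id
                next_id += 1
                tracks[tid] = {}
            tracks[tid].update({k: a, k + 1: b, k + 2: c})
            link[(k + 1, b, c)] = tid   # let triplet k+1 chain via (b, c)
    return tracks
```

Each resulting track maps image indices to point positions, which is exactly the tie-point structure consumed by the bundle adjustment.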
3.1.2 Photo-triangulation with bundle-adjustment
A photogrammetric self-calibrating bundle-adjustment is
performed using the found image correspondences and the
approximations of the unknown camera parameters (section 2).
Because of the network geometry and the lack of accurate