1. INTRODUCTION

Three-dimensional models of human bodies are required in many applications, e.g. for animation, and are usually generated with specialized hardware. This work presents a procedure for the reconstruction of static human bodies from uncalibrated image sequences; the calibration and orientation of the images are recovered with a photogrammetric least squares bundle adjustment. The procedure consists of three parts: the calibration and orientation of the images (section 2 and section 3), the matching process on the body and the generation of the 3D point cloud and surface, and the modelling of the results for animation and visualization. The goal is the reconstruction of characters using only image information, acquired with a still camera or with a camcorder; the imaged human body has to stay still for the time the sequence is acquired. The procedure requires the acquisition of a full view (Fig. 1 shows three images out of 6) of the person. The image resolution and the camera constant are kept constant during the acquisition. If a video has to be used, interlacing effects must be taken into account.
2. CALIBRATION AND ORIENTATION

Camera calibration and image orientation are prerequisites for all applications that require precise 3D information from images; both problems have been investigated since the end of the 19th century. A basic distinction between the calibration and orientation algorithms is the camera model used for the orientation, i.e. the perspective or the projective one.
Camera models based on perspective collineation have high stability and require a minimum of three corresponding points per image and stable optics. On the other hand, projective approaches can deal with variable focal length, but they need more parameters and a minimum of six corresponding points, and they are quite unstable (the equations need normalization).
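For reference, the two model classes can be summarized with their standard textbook formulations; the parameterizations below (rotation matrix $R = (r_{ij})$ mapping object into image coordinates, DLT parameters $L_1 \dots L_{11}$) are generic forms and not necessarily the exact models used in this work.

% Perspective model: collinearity equations with interior orientation (x_0, y_0, c)
% and exterior orientation (projection centre X_0, Y_0, Z_0 and rotation R)
\[
x - x_0 = -c\,\frac{r_{11}(X - X_0) + r_{12}(Y - Y_0) + r_{13}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)},
\qquad
y - y_0 = -c\,\frac{r_{21}(X - X_0) + r_{22}(Y - Y_0) + r_{23}(Z - Z_0)}{r_{31}(X - X_0) + r_{32}(Y - Y_0) + r_{33}(Z - Z_0)}
\]
% Projective model: direct linear transformation (DLT) with 11 parameters
\[
x = \frac{L_1 X + L_2 Y + L_3 Z + L_4}{L_9 X + L_{10} Y + L_{11} Z + 1},
\qquad
y = \frac{L_5 X + L_6 Y + L_7 Z + L_8}{L_9 X + L_{10} Y + L_{11} Z + 1}
\]

With known interior orientation, each point contributes two collinearity equations, so three points suffice for a spatial resection, while the eleven DLT parameters require at least six object-image correspondences.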
The calibration and orientation process is the core of the presented work and is based on a photogrammetric bundle adjustment (section 3.3); the required tie points (image correspondences) are found automatically with the following steps:
• extraction of interest points from each image;
• matching of potential feature pairs between adjacent images;
• removal of false matches using a local filtering;
• computation of the epipolar geometry to refine the matching process and remove remaining outliers;
• tracking of the correspondences through the whole image sequence.
These steps are described in the following sections. The process is completely automated; it is similar to [Fitzgibbon et al., 1998] and [Roth et al., 2000], but additional changes and extensions to these algorithms are presented and discussed.
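As an illustration, a compact sketch of such a pipeline is given below, written in Python with OpenCV. It only mirrors the sequence of steps and is not the implementation of this work: Lucas-Kanade tracking stands in for the cross-correlation/ALSM matching of section 3.1, the file names are placeholders, and all parameter values are illustrative.

# Illustrative tie-point pipeline sketch (not the implementation of this paper):
# Harris corner extraction, candidate correspondences, robust epipolar filtering.
import cv2
import numpy as np

image_files = ["frame0.png", "frame1.png", "frame2.png"]      # placeholder names
images = [cv2.imread(f, cv2.IMREAD_GRAYSCALE) for f in image_files]

pairwise_matches = []
for img_a, img_b in zip(images[:-1], images[1:]):
    # interest point extraction (Harris response, see section 3.1)
    pts_a = cv2.goodFeaturesToTrack(img_a, 1000, 0.01, 5,
                                    useHarrisDetector=True, k=0.04)
    # candidate correspondences; Lucas-Kanade tracking is used here as a
    # stand-in for the cross-correlation / ALSM matching of section 3.1
    pts_b, status, _ = cv2.calcOpticalFlowPyrLK(img_a, img_b, pts_a, None)
    ok = status.ravel() == 1
    cand_a, cand_b = pts_a[ok], pts_b[ok]
    # robust epipolar geometry: LMedS estimation of the fundamental matrix
    F, inlier_mask = cv2.findFundamentalMat(cand_a, cand_b, cv2.FM_LMEDS)
    inliers = inlier_mask.ravel().astype(bool)
    pairwise_matches.append((cand_a[inliers], cand_b[inliers]))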
Fig.1: Three images (out of six) used for the reconstruction
3.1 Finding image correspondences
The first step is to find a set of interest points or corners in each image of the sequence. The Harris corner detector is used. The threshold on the number of corners extracted is based on the image size. A good point distribution is ensured by subdividing the images into small patches and keeping only the points with the highest interest value within each patch.
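A possible implementation of this distribution-preserving corner extraction is sketched below; the patch size and the number of points kept per patch are illustrative values, not the thresholds of the paper.

# Sketch of interest point extraction with the Harris detector and a grid-based
# selection that enforces an even point distribution over the image.
import cv2
import numpy as np

def distributed_harris_corners(gray, patch=64, per_patch=5, k=0.04):
    """Return an (N, 2) array of (x, y) corners, at most `per_patch` per patch."""
    response = cv2.cornerHarris(np.float32(gray), blockSize=3, ksize=3, k=k)
    h, w = gray.shape
    corners = []
    for y0 in range(0, h, patch):
        for x0 in range(0, w, patch):
            block = response[y0:y0 + patch, x0:x0 + patch]
            # indices of the strongest responses inside this patch
            flat = np.argsort(block, axis=None)[::-1][:per_patch]
            ys, xs = np.unravel_index(flat, block.shape)
            for x, y in zip(xs, ys):
                if block[y, x] > 0:          # keep only positive Harris responses
                    corners.append((x0 + x, y0 + y))
    return np.array(corners, dtype=np.float32)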
The next step is to match points between adjacent images. At first, cross-correlation is used and then the results are refined using adaptive least squares matching (ALSM) [Grün, 1985]. The cross-correlation process uses a small window around each point in the first image and tries to correlate it against all points that lie inside a bigger search window in the adjacent image. The point with the highest correlation coefficient is used as approximation for the subsequent matching process. The process returns the best match in the second image for each interest point in the first image. The final number of possible matches depends on the threshold parameters of the matching and on the disparity between the image pairs; usually it is around 40% of the extracted points.
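The unguided cross-correlation step can be sketched as follows; the template and search window sizes and the acceptance threshold are assumed values, and the ALSM refinement [Grün, 1985] is not shown.

# Sketch of the unguided cross-correlation matching between two adjacent images.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation coefficient of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else -1.0

def match_by_correlation(img1, img2, pts1, pts2, win=7, search=50, min_cc=0.7):
    """For each corner in img1, return the img2 corner with the highest NCC
    inside a `search`-pixel window; pts1, pts2 are (N, 2) integer (x, y) arrays."""
    matches = []
    for x1, y1 in pts1:
        t = img1[y1 - win:y1 + win + 1, x1 - win:x1 + win + 1].astype(np.float64)
        if t.shape != (2 * win + 1, 2 * win + 1):
            continue                                  # too close to the border
        best_cc, best_pt = min_cc, None
        for x2, y2 in pts2:
            if abs(x2 - x1) > search or abs(y2 - y1) > search:
                continue                              # outside the search window
            c = img2[y2 - win:y2 + win + 1, x2 - win:x2 + win + 1].astype(np.float64)
            if c.shape != t.shape:
                continue
            cc = ncc(t, c)
            if cc > best_cc:
                best_cc, best_pt = cc, (x2, y2)
        if best_pt is not None:
            matches.append(((x1, y1), best_pt))
    return matches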
The found matched pairs always contain outliers, due to the unguided matching process. Therefore a filtering of false correspondences has to be performed. A process based on the disparity gradient concept is used [Klette et al., 1998]. If $P_{LEFT}$ and $P_{RIGHT}$ as well as $Q_{LEFT}$ and $Q_{RIGHT}$ are corresponding points in the left and right image, the disparity gradient G of the two points (P, Q) is defined as:

$G = \dfrac{\left| D(P) - D(Q) \right|}{D_{CS}(P, Q)}$   [1]

where:
$D(P) = (P_{LEFT,x} - P_{RIGHT,x}, \; P_{LEFT,y} - P_{RIGHT,y})$ is the parallax of P, i.e. the pixel distance of P between the two images;
$D(Q) = (Q_{LEFT,x} - Q_{RIGHT,x}, \; Q_{LEFT,y} - Q_{RIGHT,y})$ is the parallax of Q, i.e. the pixel distance of Q between the two images;
$D_{CS}(P, Q) = \left| \dfrac{P_{LEFT} + P_{RIGHT}}{2} - \dfrac{Q_{LEFT} + Q_{RIGHT}}{2} \right|$ is the cyclopean separation, i.e. the distance between the midpoints of the straight line segments connecting each point in the left image to its corresponding point in the right image.
Figure 2: The disparity gradient between two correspondences (P and Q) in the left and right images.
If P and Q are close together in both images, they should have a similar parallax (i.e. a small numerator in equation 1). Therefore, the smaller the disparity gradient G is, the more the two correspondences are in agreement. This filtering process is performed locally and not on the whole image, because the algorithm can produce incorrect results in the case of very different disparity values and in the presence of translation, rotation, shearing and scale changes between consecutive images. For each matched point, the sum of the disparity gradients G with respect to all other matches in its neighbourhood is computed. Those matches whose disparity gradient sum is greater than the median of the sums of G are removed. The process removes ca. 80% of the false correspondences. Other possible approaches to remove false matches are described in [Pilu, 1997] and [Zhang et al., 1994].
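A direct implementation of this local filtering could look as follows; the neighbourhood radius is an assumed value, and the (P, Q) pairs are taken from the matches themselves, as described above.

# Sketch of the local disparity-gradient filter (equation 1). For every match the
# disparity gradient G is accumulated over the other matches in a local
# neighbourhood, and matches whose sum exceeds the median of all sums are removed.
import numpy as np

def disparity_gradient(pL, pR, qL, qR):
    """Disparity gradient G of two correspondences (P, Q), equation (1)."""
    d_p = pL - pR                                  # parallax of P
    d_q = qL - qR                                  # parallax of Q
    cyclopean = np.linalg.norm((pL + pR) / 2.0 - (qL + qR) / 2.0)
    return np.linalg.norm(d_p - d_q) / cyclopean if cyclopean > 0 else np.inf

def filter_matches(left_pts, right_pts, radius=100.0):
    """left_pts, right_pts: (N, 2) arrays of corresponding points.
    Returns a boolean mask of the matches kept after the filtering."""
    n = len(left_pts)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # only matches in the local neighbourhood (in the left image) contribute
            if np.linalg.norm(left_pts[i] - left_pts[j]) > radius:
                continue
            sums[i] += disparity_gradient(left_pts[i], right_pts[i],
                                          left_pts[j], right_pts[j])
    return sums <= np.median(sums)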
The next step performs a pairwise relative orientation and an outlier rejection using those matches that pass the previous filtering step. Based on the coplanarity condition, the process computes the projective singular correlation between two images [Niini, 1994], also called epipolar transformation (because it maps an image point of the first image onto an epipolar line in the second image) or fundamental matrix (in case the interior orientation parameters of both images are the same) [Faugeras et al., 1992]. The fundamental matrix $M_{12}$ is defined by the equation:

$p_2^T \, M_{12} \, p_1 = 0$, with $p_i = (x_i, y_i, 1)^T$   [2]

for every pair of matching points $p_1$, $p_2$ (homogeneous vectors) in image 1 and 2. The epipoles of the two images are defined as the right and left null-space of $M_{12}$ and can be computed with the singular value decomposition of $M_{12}$. A point $p_1$ in the first image is mapped by $M_{12}$ to the epipolar line $l_2 = M_{12} \, p_1$ in the second image; the corresponding point $p_2$ must lie on this line, i.e. satisfy the relation $p_2^T l_2 = 0$. Similarly, $l_1 = M_{12}^T \, p_2$ represents the epipolar line in the first image corresponding to $p_2$ in the second image. The 3x3 singular matrix $M_{12}$ can be computed from image points alone; at least 8 correspondences are needed to compute it. Many solutions have been published to compute $M_{12}$, but to cope with possible blunders a robust estimation method is required. Least median estimators are in general very powerful in the presence of outliers, so the Least Median of Squares (LMedS) method is used to achieve a robust computation of the epipolar geometry and to reject possible outliers [Scaioni, 2001; Zhang et al., 1994]. LMedS estimators solve non-linear minimization problems and yield the smallest value for the median of the squared residuals computed over the whole data set. Therefore they are very robust against false matches and against outliers due to bad localisation.
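A compact sketch of such a LMedS estimation is given below: random 8-point samples, a normalized 8-point solution and selection of the candidate with the smallest median of squared Sampson residuals, followed by outlier rejection with a robust threshold. The number of samples, the use of the Sampson distance and the threshold factor are assumptions; OpenCV's cv2.findFundamentalMat(p1, p2, cv2.FM_LMEDS) offers the same functionality.

# LMedS estimation of the fundamental matrix from point correspondences.
import numpy as np

def _normalize(pts):
    """Hartley normalization: translate to centroid, scale to mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    d = np.linalg.norm(pts - c, axis=1).mean()
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def eight_point(p1, p2):
    """Fundamental matrix from >= 8 correspondences (normalized 8-point algorithm)."""
    x1, T1 = _normalize(p1)
    x2, T2 = _normalize(p2)
    A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                         x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)            # enforce rank 2 (singular matrix)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1

def sampson_sq(F, p1, p2):
    """Squared Sampson distance of each correspondence to the epipolar geometry."""
    x1 = np.column_stack([p1, np.ones(len(p1))])
    x2 = np.column_stack([p2, np.ones(len(p2))])
    Fx1, Ftx2 = x1 @ F.T, x2 @ F
    num = np.sum(x2 * Fx1, axis=1) ** 2
    return num / (Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2)

def lmeds_fundamental(p1, p2, n_samples=500, rng=np.random.default_rng(0)):
    """p1, p2: (N, 2) corresponding points. Returns (F, inlier mask)."""
    best_F, best_med = None, np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(p1), 8, replace=False)
        F = eight_point(p1[idx], p2[idx])
        med = np.median(sampson_sq(F, p1, p2))
        if med < best_med:
            best_F, best_med = F, med
    # reject outliers using a robust scale estimate derived from the best median
    sigma = 1.4826 * np.sqrt(best_med)
    inliers = sampson_sq(best_F, p1, p2) < (2.5 * sigma) ** 2
    return best_F, inliers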