preceding iterations and compute for it the mean μ and standard deviation σ. Empirically we found that 1 is a good threshold for μ and 0.1 for σ.
Finally, sub-pixel estimation of the surface is based on a parabola fit through the matching scores of the voxels having a lower and a higher disparity than the given disparity for a pixel. We found that sub-pixel interpolation improves the smoothness of the result. Yet, evaluation criteria such as the percentage of bad pixels mostly deteriorate, while the RMS error is only very slightly improved.
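As an illustration, the parabola-based sub-pixel refinement can be sketched as follows; the function and argument names are ours, not from the implementation described here:

```python
def subpixel_disparity(c_prev, c_curr, c_next, d):
    """Refine the integer disparity d by fitting a parabola through the
    matching costs at d-1, d, and d+1 and returning the vertex abscissa.
    (Illustrative sketch; names are not from the paper.)"""
    denom = c_prev - 2.0 * c_curr + c_next
    if denom == 0.0:  # flat cost profile: keep the integer disparity
        return float(d)
    # Vertex of the parabola through (-1, c_prev), (0, c_curr), (1, c_next)
    offset = 0.5 * (c_prev - c_next) / denom
    return d + offset
```

When c_curr is the minimum of the three costs, the offset lies in [-0.5, 0.5], so the refined disparity stays within half a pixel of the integer estimate.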
6 VIEW SYNTHESIS WITH THE TRIFOCAL TENSOR
We use the view synthesis scheme proposed in (Avidan and Shashua,
1998). The basic idea is to use calibrated imagery together with a
disparity map. With the latter, points corresponding to given points
in the first image are obtained for the second image. At least a
weak calibration is necessary to make a navigation through the im-
age meaningful for the user as only then rotation matrices and trans-
lation vectors are defined in a Euclidean sense.
The trifocal tensor is initially instantiated from the fundamental matrix as

T_i^{jk} = ε^{kjh} F_{hi},

where ε^{kjh} is the cross-product tensor and F_{hi} is F in tensor notation.
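As a minimal sketch, this instantiation can be written with NumPy's einsum; the index ordering T[i, j, k] and the function names are our assumptions:

```python
import numpy as np

def levi_civita():
    """Cross-product (Levi-Civita) tensor eps[i, j, k]."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0   # even permutations
        eps[i, k, j] = -1.0  # odd permutations
    return eps

def tensor_from_fundamental(F):
    """Instantiate a trifocal tensor T_i^{jk} = eps^{kjh} F_{hi},
    stored as T[i, j, k] (an assumed index ordering)."""
    eps = levi_civita()
    # Contract over h; the result is indexed T[i, j, k].
    return np.einsum('kjh,hi->ijk', eps, F)
```

A useful sanity check follows from the identity ε^{kjh} ε_{kjl} = 2 δ^h_l: contracting the tensor back with the cross-product tensor recovers 2F.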
Then, the view synthesis is accomplished by modifying the trifocal tensor by rotation matrices R (R^k_l in tensor notation) and translation vectors t given by the user. The modified tensor is

G_i^{jk} = R^k_l T_i^{jl} + t^k a_i^j,

where a_i^j is the first part A of the calibrated projection matrix of the second camera.
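A sketch of this update, assuming tensors stored as T[i, j, k] with i the covariant index; all names here are illustrative:

```python
import numpy as np

def modify_tensor(T, R, t, A):
    """User-driven update G_i^{jk} = R^k_l T_i^{jl} + t^k a_i^j.

    T : (3, 3, 3) trifocal tensor, stored as T[i, j, k] (assumed ordering)
    R : (3, 3) rotation matrix, R[k, l] = R^k_l
    t : (3,) translation vector, t[k] = t^k
    A : (3, 3) first part of the second camera's calibrated projection
        matrix, with A[j, i] = a_i^j
    """
    rotated = np.einsum('kl,ijl->ijk', R, T)      # R^k_l T_i^{jl}
    translated = np.einsum('k,ji->ijk', t, A)     # t^k a_i^j
    return rotated + translated
```

With R = I and t = 0 the tensor is unchanged, which matches the expectation that the identity motion reproduces the original view.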
The actual projection is done based on the optimized scheme proposed in Section 3. The synthesized image is produced indirectly by mapping the pixels via the affine transformation obtained from the known coordinates of the triangle meshes in the given and the synthesized image.
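A minimal sketch of recovering such an affine transformation from the three vertex correspondences of a triangle (function names are ours):

```python
import numpy as np

def affine_from_triangle(src, dst):
    """Solve for the 2x3 affine transform M with dst = M @ [x, y, 1]^T,
    given three corresponding triangle vertices src, dst of shape (3, 2)."""
    # Each vertex contributes two equations; stack [x, y, 1] rows.
    X = np.hstack([src, np.ones((3, 1))])  # (3, 3)
    # Solve X @ M.T = dst (unique if the triangle is non-degenerate).
    M_T = np.linalg.solve(X, dst)          # (3, 2)
    return M_T.T                           # (2, 3)

def warp_points(M, pts):
    """Apply the affine transform M to an (n, 2) array of points."""
    return pts @ M[:, :2].T + M[:, 2]
```

In practice one such transform is computed per mesh triangle, and every pixel inside the triangle is mapped with it.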
We have obtained results based on calibrated test data of ISPRS
Working Group V/2 and images courtesy of the Robotvis group at
INRIA. Figure 5 from the V/2 data set shows the result of the disparity estimation and view synthesis for the first two views of Figure 2. One can clearly see the corner structure. Figure 6 is also from the V/2 data set. Finally, Figure 7, based on the images from INRIA, shows how the chair, which is relatively close to the camera, occludes the background when the camera is moved.
Figure 2: Image triplet with points and corresponding epipolar lines. a) Epipolar lines for b) and c). b) and c) Epipolar lines for a) only.
7 CONCLUSIONS
In this paper we have presented the estimation of the fundamental matrix as well as of the trifocal tensor, an improved novel approach for disparity estimation, and the use of the trifocal tensor for view synthesis. All results presented have been obtained fully automatically, without any user interaction, and the same parameters have been used for all examples. While the estimation of the fundamental matrix and the trifocal tensor based on pyramids, least squares matching, and RANSAC works reliably for a wide range of imagery, the end-to-end automation of view synthesis is still an intricate problem. In particular, we still need to improve the disparity estimation.
As opposed to the determination of the orientation, which is defined by very few parameters and is therefore a highly redundant problem, disparity estimation aims at determining many parameters. Although the approach we use is relatively sophisticated, the results are in many instances unstable and not really good. One way to improve would be to utilize sophisticated recent approaches based, e.g., on graph cuts (Kolmogorov and Zabih, 2002) or on Markov random fields and belief propagation (Sun et al., 2002). Another way would be to use more images, although this makes the computation more expensive, as the simple epipolar geometry cannot be used any more. As the most important problem is the determination of approximate values, a combination with direct sensors of possibly lower resolution, such as cheap laser scanners planned, e.g., for airbag inflation control, might be considered for the application domain of video communication.
Finally, new results show that an image can be synthesized from a large number of views by matching the gray value profiles on corresponding lines of view (epipolar lines) (Irani et al., 2002).
ACKNOWLEDGMENTS
We thank Peter Krzystek for making his code for least squares matching available to us.
REFERENCES
Avidan, S. and Shashua, A., 1998. Novel View Synthesis by Cas-
cading Trilinear Tensors. IEEE Transactions on Visualization and
Computer Graphics 4(4), pp. 293-306.
Carlsson, S., 1995. Duality of Reconstruction and Positioning from
Projective Views. In: IEEE Workshop on Representation of Visual
Scenes, Boston, USA.