ISPRS Commission III, Vol.34, Part 3A „Photogrammetric Computer Vision“, Graz, 2002
PARALLEL APPROACH TO BINOCULAR STEREO MATCHING
Herbert Jahn
DLR, Institute of Space Sensor Technology and Planetary Exploration, Berlin, Germany
Herbert.Jahn@dlr.de
Commission III, WG III/8
KEY WORDS: Vision Sciences, Stereoscopic Matching, Real-time Processing, Dynamic Networks
ABSTRACT:
An approach to parallel-sequential binocular stereo matching is presented. It is based on discrete dynamical models which can be
implemented in neural multi-layer networks. The underlying idea is that some features (edges) in the left image exert forces on
similar features in the right image in order to attract them. Each feature point (i,j) of the right image is described by a coordinate
x(i,j). The coordinates obey a system of time-discrete Newtonian equations of motion, which allow the recursive updating of the
coordinates until they match the corresponding points in the left image. The model is very flexible: it allows shift, expansion and
compression of regions of the right image, and it takes occlusion into account to a certain extent. To obtain good results, a
robust and efficient edge detection filter is necessary. It relies on a non-linear averaging algorithm which can also be implemented
using discrete dynamical models. Both networks use processing elements (neurons) of different kinds, i.e. the processing function is
not given a priori but derived from the models. This is justified by the fact that in the visual system of mammals (including humans) a variety
of different neurons adapted to specific tasks exists. A few examples show that the problem of edge preserving smoothing can be
solved with a quality which is sufficient for many applications (various images not shown here have been processed
successfully). Some success was also achieved in the main problem of stereo matching, but further improvements are necessary.
1. INTRODUCTION
Real-time stereo processing, which is necessary in many
applications, needs very fast algorithms and processing
hardware. The stereo processing capability of the human visual
system, together with the parallel-sequential neural network
structures of the brain (Hubel, 1995), leads to the conjecture that
there exist parallel-sequential algorithms which do the job very
efficiently. Therefore, it seems natural to concentrate
effort on the development of such algorithms.
In prior attempts to develop parallel-sequential matching
algorithms (Jahn, 2000a; Jahn, 2000b) some promising results
have been obtained. In some image regions, however, serious errors
occurred, which led to the new attempt presented here.
If one misaligns one's eyes by pressing on one eye with the
thumb, one has the impression that one of the images is
pulled towards the other until matching is achieved.
This has led to the idea that prominent features (especially edge
elements) of one image exert forces on corresponding features in
the other image in order to attract them. A (homogeneous)
region between such features is shifted together with the features
bounding it, and it can be compressed or stretched in the process,
because corresponding regions may have different extensions.
Therefore, an adequate model for the matching process seems to
be a system of Newtonian equations of motion governing the
shift of the pixels of one image. Assuming epipolar geometry, a
pixel (i,j) of the left image corresponds to a pixel (i',j) of the
right image in the same image row. If a mass point with
coordinate x(i',j) and mass m is assigned to that pixel, then with
appropriate forces of various origins acting on that point it can
be shifted to match the corresponding point (i,j). To match
points inside homogeneous regions, the idea is to couple
neighbouring points by springs in order to shift these points
together with the edge points. The model thus somewhat
resembles the older model which Julesz proposed in (Julesz, 1971)
for stereo matching.
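The spring-coupled mass point idea can be illustrated for a single image row by the following minimal Python sketch. It is not the model specified later in the paper: the linear attraction force, the function name match_row, all parameter values and the damped update rule are assumptions chosen for illustration. In particular, the target positions of the edge points are assumed to be known, which sidesteps the actual correspondence problem.

```python
import numpy as np

def match_row(x0, edge_idx, targets, k_attr=1.0, k_spring=0.1,
              velocity_decay=0.5, mass=1.0, steps=500):
    """Toy 1D relaxation: edge points are pulled towards given targets,
    all neighbouring points are coupled by springs (hypothetical force laws)."""
    x = np.asarray(x0, dtype=float).copy()
    rest = np.diff(x)                  # spring rest lengths = initial spacing
    v = np.zeros_like(x)               # velocities of the mass points
    edge_idx = np.asarray(edge_idx)
    targets = np.asarray(targets, dtype=float)
    for _ in range(steps):
        force = np.zeros_like(x)
        # Attraction of edge points towards their (assumed known) targets.
        force[edge_idx] += k_attr * (targets - x[edge_idx])
        # Spring forces between neighbouring points.
        stretch = np.diff(x) - rest
        force[:-1] += k_spring * stretch   # pulls point i towards point i+1
        force[1:] -= k_spring * stretch    # pulls point i+1 towards point i
        # Time-discrete Newtonian update with velocity decay (friction).
        v = velocity_decay * v + force / mass
        x = x + v
    return x

# Row of 11 pixels; the edges at indices 2 and 8 should move to 4.0 and 9.0.
x = match_row(np.arange(11.0), edge_idx=[2, 8], targets=[4.0, 9.0])
print(np.round(x, 3))
```

In this toy setting the points to the left of the first edge and to the right of the second edge are shifted rigidly, whereas the region between the two edges is compressed, since the targets lie closer together than the original edge positions. The edge points settle close to, but not exactly at, their targets because the springs pull back slightly.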
To obtain good results a robust and efficient edge detection
filter is necessary. The filter used here is based on a non-linear
edge preserving smoothing algorithm which can be
implemented with the same type of parallel-sequential
networks, the so-called discrete dynamical networks (Serra,
Zanarini, 1990) which can be described (in 2D notation) by
z_{i,j}(t+1) = f_{i,j}(z(t), P, K(t))    (1)
(i = 1, ..., N_i; j = 1, ..., N_j)
Here, z_{i,j} is a state vector defined in each image point (i,j) (z(t)
denotes the matrix of the z_{i,j}(t)), K is an external force vector,
and P is a parameter vector. The initial state z_{i,j}(0) is given by a
feature vector which is derived from the given image data.
Then, according to (1), the feature vector is updated recursively
leading to a final state (hopefully a fixed point) at t → ∞ (or
approximately at t = t_max). That final state is the result of the
image processing task.
The algorithm (1) is of complexity O(N) (N = N_i · N_j) if the
number of iterations is limited (t ≤ t_max). In each iteration step it
needs a constant number n of calculations for every image point
(i,j). Then the total number of operations is N · n · t_max.
Therefore, it is very fast if it is implemented in a multi-layer
network structure. Here, each neural layer is assigned to a
discrete time t of (1), and the state of neuron (i,j) in layer t is
given by z_{i,j}(t). Via the (nonlinear) function f_{i,j} each neuron (i,j)
of layer t+1 is coupled with neurons (k,l) of layer t.
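The recursion (1) and its layer-wise interpretation can be sketched in Python as follows. This is a minimal illustration only: the function names run_network and relax_step are hypothetical, and the update rule is a plain (not edge preserving) averaging over the four nearest neighbours, used merely as a placeholder for the functions f_{i,j} that are derived later in the paper.

```python
import numpy as np

def run_network(z0, f, P, K, t_max):
    """Iterate z(t+1) = f(z(t), P, K) for t_max steps.
    Each pass of the loop plays the role of one network layer."""
    z = np.asarray(z0, dtype=float)
    for _ in range(t_max):
        z = f(z, P, K)
    return z

def relax_step(z, P, K):
    """Placeholder update rule (an assumption, not the paper's filter):
    every point moves a fraction P towards the mean of its four
    neighbours and receives the external input K."""
    padded = np.pad(z, 1, mode="edge")   # replicate border values
    neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
             padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return z + P * (neigh - z) + K

rng = np.random.default_rng(0)
z0 = rng.random((8, 8))                  # initial state from "image data"
z_final = run_network(z0, relax_step, P=0.5, K=0.0, t_max=20)
print(z0.std(), z_final.std())
```

Every point is visited once per iteration with a fixed amount of work, so the cost is proportional to the number of image points times the number of iterations, in line with the operation count given above. Because each new value depends only on the previous layer, all points of a layer can be updated in parallel.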
In chapter 2 algorithm (1) is specialized to edge preserving
smoothing. Then, chapter 3 is dedicated to stereo matching
within the same framework. Some results are shown. Finally,
in the conclusions some ideas for future research are presented.