A NEW TECHNIQUE FOR OBTAINING DEPTH
INFORMATION
FROM A MOVING SENSOR*
H. Harlyn Baker
Robert C. Bolles
David H. Marimont
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025.
Abstract
We present a new approach to depth measurement, one which combines the advantages
of narrow and wide baseline imaging, giving easier matching and greater accuracy. The
technique works with a sequence of images, and unifies the spatial and temporal analysis of
data obtained from a camera moving in a straight line. The technique is based on the use
of a dense sequence of images — images taken sufficiently close together that they have both
spatial and temporal continuity. The sequence of data then forms a solid, slices of which
directly encode changes due to the motion of the camera. We will discuss the theory behind
this technique, describe our current implementation of the process, present our preliminary
results, and, finally, comment on the direction in which we are taking the work.
1: Introduction
Most approaches to depth measurement through stereo analysis suffer from the dichotomy of
choosing between a wide baseline, with high precision of matched features but increased match
failures and increased difficulties of perspective and occlusion effects, and a narrow baseline,
with easy matching but poor depth accuracy. This is an obvious limitation confronted whenever
depth measurements must be made on the basis of just two views of a scene. In this work we
take a direction that gains the advantages of both approaches. We process a large number
of closely spaced images: the close spacing makes matching easy, while the large number of
images means a wider baseline, and therefore higher accuracy. Although it bears the increased
cost of processing the large number of images and requires precise knowledge of the position and
attitude of the camera at each imaging site, our technique brings significant advantages in
accuracy and reliability over existing depth-measurement approaches.
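The baseline trade-off can be made concrete with a small numerical sketch (not from the paper; the focal length, depth, and disparity uncertainty below are assumed values). For a parallel-camera pair, triangulation gives depth z = f·b/d, and propagating a fixed disparity uncertainty shows the depth error shrinking as the baseline grows:

```python
# Illustrative sketch: depth from disparity for a parallel stereo pair,
# and the first-order depth uncertainty for a fixed disparity error.

def depth_from_disparity(f, b, d):
    """Depth z = f * b / d for focal length f (pixels), baseline b, disparity d (pixels)."""
    return f * b / d

def depth_uncertainty(f, b, z, sigma_d):
    """First-order error propagation: |dz/dd| * sigma_d = z**2 / (f * b) * sigma_d."""
    return z ** 2 / (f * b) * sigma_d

f = 500.0        # focal length in pixels (assumed)
z = 10.0         # true depth in metres (assumed)
sigma_d = 0.5    # half-pixel disparity uncertainty (assumed)

for b in (0.05, 0.5):                 # narrow vs. wide baseline, in metres
    d = f * b / z                     # disparity this pair would observe
    err = depth_uncertainty(f, b, z, sigma_d)
    print(f"baseline {b} m: disparity {d:.1f} px, depth error {err:.3f} m")
```

With these numbers the narrow baseline gives a 2 m depth error and the wide baseline 0.2 m, which is the accuracy advantage the many-image approach recovers without paying the matching cost of a single wide-baseline pair.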
Since we work with a sequence of images, this research is more closely connected to motion
detection than to traditional stereo analysis. Although most motion-detection techniques (e.g.,
[Barnard 1980], [Haynes 1983], and [Hildreth 1984]) analyze pairs of images, and hence are
fundamentally similar to conventional stereo techniques, a few researchers have considered
sequences of three or more images (e.g., [Nevatia 1976], [Ullman 1979], and [Yen 1983]). Even
in these, the process is one of matching discrete items at discrete times. Furthermore, image
matching techniques are designed to process images that contain significant changes from one
to another — features may move more than a score of pixels between views. These large changes
force the techniques to tackle the difficult problem of stereo correspondence (see [Baker 1982]).
Our approach, on the other hand, is to take a sequence of images from positions that are very
close together — close enough that almost nothing changes from one image to the next. In
particular, we take images close enough together that none of the image features moves more
than a few pixels (Figure 1 shows the first four images from one of our sequences containing 125
images). This sampling frequency guarantees a continuity in the temporal domain that is similar
to continuity in the spatial domain. Thus, an edge of an object in one image appears temporally
adjacent to (within a pixel or so of) its occurrence in both the preceding and following images.
This temporal continuity makes it possible to construct a solid of data in which time is the
third dimension and continuity is maintained over all three dimensions (see Figure 2). This
solid of data is referred to as spatio-temporal data.
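The temporal-continuity property described above can be sketched with a toy construction (this is an illustration, not the authors' implementation): stacking one scanline per image produces an (x, t) slice of the spatio-temporal solid, and a feature that moves about a pixel per frame traces a near-continuous line through it.

```python
# Illustrative sketch: build an (x, t) slice of a spatio-temporal solid
# from a hypothetical 125-image sequence, one scanline per image.
import numpy as np

n_frames, width = 125, 200
slice_xt = np.zeros((n_frames, width))

# Hypothetical feature: starts at x = 20 and drifts one pixel per frame,
# as a point would under a camera translating along a straight line.
for t in range(n_frames):
    x = 20 + t                 # feature position in frame t
    if x < width:
        slice_xt[t, x] = 1.0

# Temporal continuity: between consecutive frames the feature moves by
# exactly one pixel, so adjacent rows of the slice stay within a pixel
# of each other, and the feature appears as a straight line in (x, t).
rows, cols = np.nonzero(slice_xt)
steps = np.diff(cols[np.argsort(rows)])
print(int(steps.max()))  # → 1
```

The slope of that line in the slice is what encodes the feature's depth: nearby points sweep across the image quickly (steep drift per frame), distant points slowly.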
"This research was supported by DARPA Contracts MDA 903-83-C-0027 and DACA 76-85-C-0004