' "• : ' '
photogrammetric applications, such as quality control, camera
calibration, sensor navigation and object reconstruction.
Reviews of algorithms for the estimation of motion/structure parameters from image sequences have been provided by Aggarwal and Nandhakumar (1988) and Huang and Netravali (1994).
3.2 Short Range vs. Long Range (Continuous vs. Discrete)
Motion Analysis
Generally, there are two complementary classifications of schemes to compute visual motion. The first classifies methods according to the spatio-temporal range over which they are applicable, analogous to the human visual system: (1) the short range (continuous) motion process and the long range (discrete) motion process. The second classification distinguishes between the fundamentally different processes involved: (2) optical flow and correspondence. In fact, the optical flow scheme, which uses image gradients to derive image motion, is intrinsically restricted to short range, while correspondence or similarity matching schemes can be of either short or long range.
In short range motion analysis, images are taken at video rate. Thus the emphasis is generally placed on the estimation of the optical flow field between two successive frames, or on the direct use of the spatio-temporal derivatives of the image brightness. These observations must then be combined with a measure of the camera velocity (instead of the camera displacement) to determine the 3-D structure of objects.
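As an illustration of the short range (continuous) scheme, the following sketch estimates a dense optical flow field between two successive video frames. It uses the Farneback algorithm from OpenCV purely as one example of a gradient-based method; the frame file names are placeholders, and the sketch is not the specific method discussed here.

    # Minimal sketch: dense optical flow between two successive video frames.
    # The frame file names are placeholders.
    import cv2
    import numpy as np

    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # flow[y, x] = (dx, dy): the apparent image motion of each pixel.
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # A small mean flow magnitude confirms the short range regime in which
    # differential (gradient-based) methods are valid.
    magnitude = np.linalg.norm(flow, axis=2)
    print("mean flow magnitude (pixels):", magnitude.mean())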
In long range motion analysis, images are acquired at larger time intervals, so that a large camera displacement is observed between frames. Since the image motion of the features is “large” compared to the temporal sampling rate, the visual system has to solve the correspondence problem, i.e., it has to establish which feature at one time instant corresponds to which feature at the next time instant. Therefore, in long range motion analysis, a set of relatively sparse, distinguishable two-dimensional features, such as points, straight lines, curved lines, corners and regions, is first extracted from the successive images. Second, feature correspondences are established between consecutive frames, and finally the 3-D structure of the object and its relative motion with respect to the camera are determined from the motion of these features. It is worth mentioning that most of the research on long range motion analysis has concentrated on motion estimation and feature correspondence over short image sequences (i.e., two to three images).
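The three steps just described (feature extraction, correspondence and motion/structure recovery) can be sketched for a two-frame sequence as follows. The sketch uses ORB features and essential matrix decomposition from OpenCV as one possible realization; the calibration matrix K and the file names are assumed example values, not part of the discussion above.

    # Minimal sketch of long range (discrete) motion analysis over two frames:
    # (1) extract sparse features, (2) establish correspondences,
    # (3) recover the relative camera motion. K and file names are assumed.
    import cv2
    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])   # assumed camera calibration matrix

    img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

    # Step 1: sparse, distinguishable features (here ORB corners).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Step 2: feature correspondences between the consecutive frames.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Step 3: relative motion (rotation R, translation direction t),
    # estimated robustly from the matched features.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    print("R =\n", R, "\nt direction =", t.ravel())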
In general, if the scene has many easily identifiable feature
points or lines, the discrete approach based on feature
correspondence is suitable. If the surfaces in the scene are
smooth and have no texture, then the continuous approach
based on intensity derivatives is better. However, robust and
accurate computation of feature correspondence and optical
flow still remains a difficult problem. The optical flow field is
often corrupted by image noise or occlusion, leading to
generally poor and unstable results in the 3-D reconstruction.
Feature correspondence also fails easily in areas where the distortion is large or occlusion occurs. Hybrid approaches that combine feature correspondence and optical flow are one way to alleviate these problems (Baker et al., 1994; Hanna and Okamoto, 1993; Navab and Zhang, 1994).
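One simple instance of such a hybrid, sketched below under assumed file names, tracks sparse feature points with a pyramidal Lucas-Kanade optical flow: the features contribute the selectivity of correspondence methods, while the flow contributes sub-pixel motion estimates and a status flag that exposes occluded or lost points. It illustrates the general idea only, not the specific schemes of the cited works.

    # Minimal sketch of a hybrid approach: sparse features (correspondence
    # side) tracked by pyramidal Lucas-Kanade optical flow (flow side).
    import cv2
    import numpy as np

    img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

    # Detect well-textured corners, which both schemes handle reliably.
    pts1 = cv2.goodFeaturesToTrack(img1, maxCorners=500,
                                   qualityLevel=0.01, minDistance=10)

    # Track each feature with optical flow; 'status' flags lost or occluded
    # points so that unstable correspondences can simply be discarded.
    pts2, status, err = cv2.calcOpticalFlowPyrLK(img1, img2, pts1, None)
    good1 = pts1[status.ravel() == 1]
    good2 = pts2[status.ravel() == 1]
    print("tracked", len(good2), "of", len(pts1), "features")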
The research showed that an optical flow based approach is not suitable for the VISAT images, since the image capture interval is about 0.4 seconds and the camera movement between imaging intervals is large, of the order of 6-10 meters.
Intuitively, our research falls in the category of long range
motion analysis. However, compared to the processing of
monocular image sequences commonly addressed in the
literature, we are dealing with binocular image sequences. Such
redundant image information allows us to develop more robust
algorithms for the processing of image sequences. In this
research, feature correspondence and image matching
techniques are mainly used in the proposed methods. There are
a number of good references available with reviews of
techniques for feature correspondence and image matching
(Agouris, 1992; Baltsavias, 1991; Barnard and Fischler, 1982; Dhond and Aggarwal, 1989; Forstner, 1993; Gruen, 1994; Jones, 1997; Lemmens, 1988; Mass, 1996).
3.3 Visual Motion Analysis with Known Ego-Motion
Vision analysis with known ego-motion refers to motion
analysis under known dynamics of the camera (observer). In
fact, known ego-motion analysis forms the basis of an active
vision system. Under the condition of known ego-motion, the 3-
D reconstruction problem can be solved more efficiently. This
fact has motivated some investigations (Aloimonos et al., 1988;
Bajcsy, 1988). On the other hand, accurate geometric constraints, such as the epipolar line constraint, also become available, resulting in a more robust determination of feature correspondences.
In the VISAT mobile mapping system, the kinematic trajectory of the vehicle can be determined with a high accuracy of 5-15 cm, and the camera dynamics can be recovered rigorously using the GPS/INS georeferencing technique (Schwarz and El-Sheimy, 1996). As a result, visual analysis can be conducted
under the constraint of known ego-motion. It will be seen that
this constraint is very valuable for automating and optimizing a
reliable procedure for object measurement and feature
extraction.
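To illustrate how known ego-motion constrains correspondence, the sketch below forms the essential and fundamental matrices from a known relative pose (R, t), such as GPS/INS georeferencing would deliver, and derives the epipolar line on which the conjugate point must lie. R, t and K are assumed example values, not VISAT parameters.

    # Minimal sketch: with the ego-motion (R, t) known, the epipolar line
    # constraint restricts the search for a conjugate point to a single line.
    # R, t and K are assumed example values.
    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])     # camera calibration matrix (assumed)
    R = np.eye(3)                        # known relative rotation (assumed)
    t = np.array([1.0, 0.0, 0.1])        # known relative translation (assumed)

    def skew(v):
        # Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u).
        return np.array([[0.0, -v[2], v[1]],
                         [v[2], 0.0, -v[0]],
                         [-v[1], v[0], 0.0]])

    E = skew(t) @ R                                 # essential matrix [t]_x R
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)   # fundamental matrix

    x1 = np.array([400.0, 250.0, 1.0])   # a feature in image 1 (homogeneous)
    a, b, c = F @ x1                     # epipolar line a*x + b*y + c = 0
    print("conjugate point lies on: %.4f x + %.4f y + %.4f = 0" % (a, b, c))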
3.4 Active Vision
A very important advance in the theoretical framework of
computer vision is the concept of active vision, proposed by
Aloimonos et al. (1988). Active vision represents the behaviorist school, which stands in direct opposition to Marr’s theory of vision, the recovery school (Marr, 1982).
It is an uncontroversial observation that vision is an underconstrained problem. Thus the main goal of vision work is
to find and develop constraints. However, it is argued that, rather than focusing on narrow and often oversimplified sources of constraints, such as the smoothness constraints widely used in the recovery school, one must exploit constraints from all possible sources and incorporate them systematically.
The basic idea of active vision is the introduction of a new source of constraints arising from the internal architecture of the system itself and the interaction of its components, such as observer-based constraints, e.g., the sensor and/or the computer (Jolion, 1994). Under the constraint that the active observer
moves with known motion, a unique solution is available,
resulting in a well-posed formulation of the problem. Moreover,
the knowledge of these viewpoints of the active observer
increases the robustness to noise.
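As a concrete illustration, once the viewpoints are known, recovering a 3-D point from matched image points reduces to a well-posed triangulation. The sketch below uses assumed poses, calibration and image measurements; it illustrates the principle rather than any particular implementation.

    # Minimal sketch: with both viewpoints known, reconstructing an object
    # point reduces to well-posed triangulation. Poses, K and the matched
    # image points are assumed example values.
    import cv2
    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])

    # Known camera poses (e.g., from navigation data): P = K [R | t].
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

    # One matched feature in each image (2 x N arrays, here N = 1).
    pts1 = np.array([[400.0], [250.0]])
    pts2 = np.array([[320.0], [250.0]])

    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4 x N
    X = (X_h[:3] / X_h[3]).ravel()                    # Euclidean 3-D point
    print("reconstructed 3-D point:", X)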
The known motion of the observer can be determined by the use
of advanced navigation technology. In this context, an imaging