(Figure: block diagram relating real-world processes in space and time, the internal spatio-temporal representation with 3-D shapes and motion laws, feature extraction from the video image (edges, corners, areas, intensities), feature prediction from a nominal and systematically varied model image, object and aspect hypothesis generation, elimination of outliers, and least-squares discrepancy interpretation via the Jacobian matrix.)
Fig.8: Survey block diagram of 4D approach
The key tools for integrating space and time in the
internal representation are the dynamical models, which
are used for capturing the behavior over time of a physical
process. As usual in rigid body mechanics, the motion of
bodies is separated into center-of-gravity (cg) translation
and rotation around the cg. These motion components are
described by ordinary differential equations including the
effects of control input. For digital control, transition
matrices and control-effect matrices are derived using
well-known methods.
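For illustration, such a discretization can be sketched as follows; the double-integrator model, the 20 ms cycle time as step size, and the truncated-series evaluation are assumptions for the sketch, not the paper's actual vehicle dynamics:

```python
# Zero-order-hold discretization of a continuous model  x' = F x + G u
# into  x[k+1] = A x[k] + B u[k], via truncated power series:
#   A = exp(F T),   B = (integral_0^T exp(F s) ds) G
# F, G below form a toy double integrator (position, velocity);
# T is taken as the 20 ms video cycle.

def discretize(F, G, T, terms=10):
    n = len(F)
    A = [[float(i == j) for j in range(n)] for i in range(n)]      # identity
    S = [[T * float(i == j) for j in range(n)] for i in range(n)]  # integral series
    M = [row[:] for row in A]                                      # holds (F T)^k / k!
    for k in range(1, terms):
        M = [[sum(M[i][l] * F[l][j] for l in range(n)) * T / k
              for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                A[i][j] += M[i][j]
                S[i][j] += M[i][j] * T / (k + 1)
    B = [sum(S[i][l] * G[l] for l in range(n)) for i in range(n)]
    return A, B

F = [[0.0, 1.0], [0.0, 0.0]]   # x1' = x2 (position' = velocity)
G = [0.0, 1.0]                 # control acts on the velocity
A, B = discretize(F, G, 0.02)  # A = [[1, T], [0, 1]], B = [T^2/2, T]
```

Propagating the state over one video cycle is then a single matrix-vector multiply plus the control-effect term.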
Control inputs to the mobile robot carrying the vision
system lead to changes in the visual appearance of the
world through egomotion. The continuous motion of the
vehicle and its relative position in the world over time are
sensed by conventional black-and-white video cameras.
They record the incoming light intensity from a certain
field of view at a fixed sampling rate. By this imaging
process the information flow is discretized in several
ways.
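To get a feeling for the resulting data flow: the 512 x 512 pel format and 8 bit grey-value resolution below are illustrative assumptions; only the fixed 20 ms (50 Hz) cycle is taken from the text.

```python
# Raw data flow of the discretized imaging process
# (image format and grey-value resolution are assumed, not from the paper).

rows, cols = 512, 512
bits_per_pel = 8
rate_hz = 50                                       # one 2-D array per 20 ms

bytes_per_image = rows * cols * bits_per_pel // 8  # 262144 bytes
bytes_per_second = bytes_per_image * rate_hz       # about 13.1 MB/s
```

Even under these modest assumptions, the sensor delivers on the order of 13 MB/s, which motivates the selective processing described below.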
There is a limited spatial resolution in the image plane
and a temporal discretization of 16 2/3 ms (60 Hz video) or 20 ms (50 Hz video), usually
including some averaging over time. This reduces the data
flow to a sequence of 2D arrays at fixed time intervals (20
ms). Instead of trying to invert this image sequence for
3-D scene understanding, a different approach, analysis
through synthesis, has been selected. From previous
human experience, generic models of objects in the 3-D
world are assumed to be known in the interpretation
process. This comprises both 3-D shape, recognizable by
certain feature aggregations, given the aspect conditions,
and motion behavior over time. In an initialisation phase,
starting from a collection of features extracted by the low
level pel processing (BVV 2, lower center left in fig.8),
object hypotheses including the aspect conditions and the
motion behavior (transition matrices) in space have to be
generated (upper center left). The motion capabilities of
the robot, which act as constraints characterizing the object,
are represented by difference equations describing the
state evolution. With the help of these so-called dynamical
models, it is possible to predict the object states to that
point in time when the next measurement is going to be
taken. By applying forward perspective projection to
features measured, using the same mapping conditions as
the video camera, a model image can be generated, which
should duplicate the measured image if the situation has
been interpreted properly. The situation is thus 'imagined'
(right and lower center right in fig.8). The big advantage
of this approach is that due to the internal 4-D model not
only the actual situation at the present time but also the
sensitivity matrix of the feature positions with respect to
state changes can be determined and exploited over time,
the so-called Jacobian matrix. This rich information is then used for
adjusting the state estimates recursively in a least squares
manner based on the differences between the predicted
and the measured feature positions. By this approach, the
nonunique inversion of the perspective projection is
bypassed based on the continuity conditions captured in the
spatio-temporal world model (4-D model). For details see
[Dickmanns, Graefe 88] and the references given there.
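A minimal numerical sketch of one such update step (a scalar Kalman-type correction with an illustrative pinhole measurement model; the focal length f, depth z, and all other numbers are invented for the example, and the actual system estimates full spatio-temporal state vectors):

```python
# One recursive least-squares (Kalman-type) correction of a single state
# (lateral offset x) from one image feature, using the Jacobian of the
# perspective projection. All values below are made up for the sketch.

f, z = 0.5, 10.0          # assumed focal length [m] and feature depth [m]
x_est, P = 1.0, 0.04      # predicted state and its variance
R = 1e-6                  # measurement noise variance

y_pred = f * x_est / z    # predicted feature position in the model image
J = f / z                 # Jacobian dy/dx of the projection
y_meas = 0.0515           # measured feature position

K = P * J / (J * P * J + R)              # gain (scalar special case)
x_est = x_est + K * (y_meas - y_pred)    # least-squares state correction
P = (1.0 - K * J) * P                    # reduced uncertainty after update
```

The same structure carries over to vector states and many features per cycle, with J becoming the Jacobian matrix of all predicted feature positions with respect to the state.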
This approach has several very important practical advan-
tages:
- no previous images need be stored and retrieved for
computing optical flow or velocity components in the
image as an intermediate step;
- the transition from signals (pel data in the image) to
symbols (spatio-temporal motion state of objects) is done
in a very direct way, firmly based on higher-level knowledge,
the 4-D world model integrating spatial and temporal
aspects;
- intelligent nonuniform image analysis becomes possible,
allowing computer resources to be concentrated on areas of
interest known to carry meaningful information;
- viewing direction control can be done directly in an
object-oriented manner;
- the image processing computer architecture can be
structured modularly according to the internal
representation of spatial objects.
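The nonuniform-analysis point can be illustrated with a toy windowed edge search, in which grey-value gradients are evaluated only inside a small window placed around the predicted feature position (the image line and window size are invented for the illustration):

```python
# Toy version of nonuniform image analysis: search for the strongest
# grey-value edge only inside a window around the predicted position,
# instead of processing the full image line.

row = [10, 10, 11, 10, 80, 82, 81, 10, 10, 10, 10, 10]  # one image line
predicted_col, half_width = 4, 2    # window centre from the 4-D prediction

lo = max(1, predicted_col - half_width)
hi = min(len(row) - 1, predicted_col + half_width + 1)

# strongest backward-difference edge, evaluated inside the window only
best_col = max(range(lo, hi), key=lambda c: abs(row[c] - row[c - 1]))
```

Only the window pels are touched; the second edge outside the window (around column 7) is never even evaluated.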
Dynamical model
As mentioned above, it is intended to recover the actual
positions relative to landmarks by measuring their feature
position in a temporal image sequence. The prime interest
within a known planar surrounding is the position
(x_P, y_P) and the angular orientation ψ of the vehicle.
Control inputs (u_a, u_v) result either in an acceleration
in the longitudinal direction or in turning the front wheel