Close-range imaging, long-range vision

techniques. Its basic idea is to track triplets of cor- 
responding points in the three images through the se- 
quence and compute their 3-D trajectories. The spa- 
tial correspondences between the three images at the 
same time and the temporal correspondences between 
subsequent frames are determined with a least squares 
matching algorithm. The tracking process yields the
coordinates of each point in the three images throughout
the sequence; its 3-D trajectory is then determined by
computing the 3-D coordinates of the point at each time
step by forward ray intersection. Velocities and
accelerations are also computed.
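The forward ray intersection and the derivative step can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation. It assumes that each calibrated camera supplies its projection center and the direction of the ray through the matched image point. The rays are intersected in the least-squares sense, finding the point that minimizes the sum of squared perpendicular distances to all rays; this is a simple geometric stand-in for a full collinearity-equation adjustment. Velocity and acceleration are estimated with finite differences over a hypothetical frame interval `dt`. The helper names `intersect_rays` and `velocity_acceleration` are illustrative.

```python
import numpy as np


def intersect_rays(centers, directions):
    """Least-squares intersection of 3-D rays (hypothetical helper).

    centers:    (N, 3) projection centers of the calibrated cameras.
    directions: (N, 3) directions of the rays through the matched image points.
    Returns the point X minimizing sum_i ||(I - d_i d_i^T)(X - c_i)||^2,
    i.e. the sum of squared perpendicular distances to the rays.
    Needs at least two non-parallel rays, otherwise the system is singular.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(np.asarray(centers, dtype=float), np.asarray(directions, dtype=float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)


def velocity_acceleration(trajectory, dt):
    """Finite differences along a (T, 3) trajectory sampled every dt seconds.

    np.gradient uses central differences inside the sequence and
    one-sided differences at its two ends.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    velocity = np.gradient(trajectory, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return velocity, acceleration
```

Consider three cameras centered at (0, 0, 0), (10, 0, 0) and (0, 10, 0) whose rays all pass through (1, 2, 3). Given these, `intersect_rays` returns that point up to rounding. With noisy rays it returns the point closest to all of them in the least-squares sense.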
The tracking process is applied to all the points matched
in the region of interest, resulting in a vector field of
trajectories (position, velocity and acceleration) that
can be checked for consistency and local uniformity of
the movement. Key points can be defined and tracked in
the vector field, producing 3-D information that can be
used to establish the approximate posture of the body,
e.g. the positions of the joints.
Figure 2 depicts the output of this tracking process on one 
of the image triplets we work with. 
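The consistency check on the vector field is not specified in detail here. The sketch below, written under our own assumptions, shows one plausible local-uniformity test. At a single time step, it compares each point's velocity with the component-wise median velocity of its `k` nearest tracked neighbors. A point is flagged when the difference exceeds a tolerance `tol`. The function name and both parameter values are hypothetical.

```python
import numpy as np


def flag_nonuniform(positions, velocities, k=4, tol=0.5):
    """Flag tracked points whose motion disagrees with their neighborhood.

    positions, velocities: (M, 3) arrays for M tracked points at one time step.
    k and tol are illustrative values; tol is in the units of the velocities.
    Returns a boolean array, True where ||v_i - median(v of k nearest)|| > tol.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    # Pairwise squared distances; the diagonal is set to infinity so that
    # a point is never counted as its own neighbor.
    d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    flags = np.zeros(len(positions), dtype=bool)
    for i in range(len(positions)):
        neighbors = np.argsort(d2[i])[:k]
        # The component-wise median is robust to a single outlying neighbor.
        local = np.median(velocities[neighbors], axis=0)
        flags[i] = np.linalg.norm(velocities[i] - local) > tol
    return flags
```

Using a median rather than a mean keeps one erroneous trajectory from contaminating the reference velocity of the points around it.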
4 MODEL FITTING 
We use the body model of Section 2 both to track the hu- 
man figure and to recover shape parameters. Our system 
is intended to run in batch mode, which means that we expect
the two or more video sequences we use to have been
acquired before running our system. It goes through the
following steps: 
• Initialization: We initialize the model interactively
in one frame of the sequence. The user has to enter
the approximate position of some key joints, such as
shoulders, elbows, hands, hips, knees and feet. Here,
this was done by clicking on these features in two
images and triangulating the corresponding points. This
initialization gives us a rough shape, that is, a scaling
of the skeleton, and an approximate model pose.
Techniques such as those proposed by [Barron and
Kakadiaris, 2000, Taylor, 2000] could eliminate most
of the currently necessary interaction.
• Data Acquisition: We use clouds of 3-D points derived
from the input stereo pairs or triplets, computed either
with a simple correlation-based algorithm [Fua, 1993]
or with the higher-quality techniques introduced in
Section 3. In the first case, the 3-D points form a noisy
and irregular sampling of the underlying body surface.
To reduce the size of the cloud and begin eliminating
outliers, we robustly fit local surface patches to the raw
3-D points [Fua, 1997] and use the centers of those
patches as input to our system.
• Frame-to-frame tracking: At a given time step, the
tracking process adjusts the model's joint angles by
minimizing the distance of the model to the 3-D point
clouds with respect to the joint angle values that relate
to that frame. This modified posture is saved for the
current frame and serves as initialization for the next
one. Optionally, the system may use the model's
projection into the images to derive initial silhouette
estimates, optimize these using image gradients, and
derive from the results silhouette observations that it
uses to constrain the minimization [Plänkers and Fua,
2002].
• Global fitting: The results from the tracking in all
frames serve as initialization for global fitting. Its
goal is to refine the postures in all frames and to adjust
the skeleton and/or metaball parameters to make the
model correspond more closely to the person. To this
end, it optimizes over all frames simultaneously, again
by minimizing the same distance as before but, this
time, with respect to the full state vector, including
the parameters that control the length and width of
body parts.
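The frame-to-frame loop can be illustrated with a deliberately small toy problem. This sketch is an assumption-laden stand-in for the real system. It replaces the metaball body model with a planar two-link limb made of line segments, with hypothetical lengths in `LENGTHS`. For each data point, it measures the offset to the nearest segment. It then minimizes the summed squared offsets over the two joint angles using Gauss-Newton iterations with a finite-difference Jacobian, a solver of our choosing. The feature it shares with the step described above is the warm start: the posture fitted in one frame initializes the next.

```python
import numpy as np

LENGTHS = (1.0, 1.0)  # hypothetical upper/lower link lengths of a toy planar limb


def skeleton(theta):
    """Shoulder (at the origin), elbow and hand positions for angles theta = (a, b)."""
    a, b = theta
    elbow = LENGTHS[0] * np.array([np.cos(a), np.sin(a)])
    hand = elbow + LENGTHS[1] * np.array([np.cos(a + b), np.sin(a + b)])
    return np.zeros(2), elbow, hand


def closest_on_segment(p, s, e):
    """Closest point to p on the segment from s to e."""
    d = e - s
    t = np.clip(np.dot(p - s, d) / np.dot(d, d), 0.0, 1.0)
    return s + t * d


def residuals(theta, points):
    """Stacked 2-D offsets from each data point to the nearest limb segment."""
    shoulder, elbow, hand = skeleton(theta)
    out = []
    for p in points:
        c1 = closest_on_segment(p, shoulder, elbow)
        c2 = closest_on_segment(p, elbow, hand)
        c = c1 if np.sum((p - c1) ** 2) <= np.sum((p - c2) ** 2) else c2
        out.append(p - c)
    return np.concatenate(out)


def fit_frame(theta0, points, iters=20, eps=1e-6):
    """Gauss-Newton over the joint angles of one frame, starting from theta0."""
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        r = residuals(theta, points)
        # Forward-difference Jacobian of the residual vector w.r.t. each angle.
        J = np.column_stack([(residuals(theta + eps * u, points) - r) / eps
                             for u in np.eye(len(theta))])
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        theta += step
        if np.linalg.norm(step) < 1e-12:
            break
    return theta


def track(frames, theta_init):
    """Fit the frames in order; each fitted posture initializes the next frame."""
    theta = np.array(theta_init, dtype=float)
    postures = []
    for points in frames:
        theta = fit_frame(theta, points)
        postures.append(theta.copy())
    return postures
```

Consecutive frames differ only slightly, so the warm start keeps each fit near the correct solution. Restarting every frame from the same initial posture would typically be more fragile for fast motions.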
The final fitting step is required to correctly model the pro- 
portions of the skeleton and derive the exact position of 
the articulations inside the skin surface. This must be done 
over many frames and allows us to find a configuration that
conforms to every posture. To stabilize the optimization, 
we add to our objective function additional observations 
that favor constant angular speeds. Their weight is taken 
to be small so that they do not degrade the quality of the 
fit but, nevertheless, help avoid local minima in isolated 
frames and yield smoother and more realistic motions. Fig- 
ure 3 depicts the results on a difficult fully 3-dimensional 
motion. 
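The constant-angular-speed observations can be expressed as residuals on the second differences of each joint angle over time. These residuals are zero exactly when a joint turns at a constant rate. The sketch below is an illustrative formulation, not necessarily the one used here. `data_term` is a hypothetical callable standing in for the model-to-point-cloud distance of one frame. `shape` stands for the skeleton and metaball parameters, and the default `weight` is an arbitrary small value.

```python
import numpy as np


def smoothness_residuals(thetas):
    """Second differences theta[t+1] - 2*theta[t] + theta[t-1] for a (T, K) angle array.

    They vanish exactly when every joint turns at a constant angular speed.
    """
    thetas = np.asarray(thetas, dtype=float)
    return thetas[2:] - 2.0 * thetas[1:-1] + thetas[:-2]


def global_objective(thetas, shape, data_term, weight=1e-2):
    """Sum of per-frame data terms plus a small constant-angular-speed penalty.

    data_term(theta_t, shape, t) is a hypothetical callable returning the
    model-to-data distance for frame t; weight is an illustrative value kept
    small so the penalty regularizes the fit without dominating it.
    """
    thetas = np.asarray(thetas, dtype=float)
    data = sum(data_term(th, shape, t) for t, th in enumerate(thetas))
    return data + weight * np.sum(smoothness_residuals(thetas) ** 2)
```

Since the penalty only involves neighboring frames, it discourages a posture in an isolated frame from jumping to a different local minimum. It still leaves the overall motion free to follow the data.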
5 CONCLUSION 
In this work, we use a flexible framework for video-based 
modeling using articulated 3-D soft objects. The volu- 
metric models we use are sophisticated enough to recover 
shape and simple enough to track motion using potentially 
noisy image data. This has allowed us to validate our
approach using complex video sequences featuring fully
3-dimensional motions without engineering the environment
or adding markers. 
The implicit surface approach to modeling we advocate ex- 
tends earlier robotics approaches designed to handle artic- 
ulated bodies. It has a number of advantages for our pur- 
poses. First, it allows us to define a distance function from 
data points to models that is both differentiable and com- 
putable without search. Second, it lets us describe accu- 
rately both shape and motion using a fairly small number 
of parameters. Last, the explicit modeling of 3-D geome- 
try lets us predict the expected location of image features 