Fua, Pascal
Figure 3: Skeleton and Marker Model. (a) Generic skeleton model. (b) The generic model is scaled to conform to the performer's
anatomy. Each marker is attached to a joint and can move on a sphere centered around that joint.
these markers are reconstructed by trinocular stereo, that is, using at least three cameras. This is in contrast to markers
reconstructed using only two camera views, and for which the projections into the other views failed.
Once we have reconstructed these trinocular 3-D markers in the first frame, we need to compare the number of recon-
structed markers with the number of markers known to be carried by the actor. As all remaining processing is automatic,
it is absolutely essential that all markers be identified in the first frame. Any marker not present in the first frame is lost
for the entire sequence. Therefore, if the number of reconstructed markers is insufficient, a second stereo matching is
performed, this time also taking into account markers seen in only two views. As binocular stereo matching is bound to
introduce errors, the user is then prompted to confirm whether or not these binocular reconstructions are correct.
As soon as all markers are found in the first frame, the user is asked to associate each marker with a joint. For each
highlighted marker, the user must select a body part and corresponding joint. Any marker not associated with a body part
is discarded during the fitting process. Once these associations have been manually created, we can proceed with 2-D
and 3-D tracking of the markers over the entire sequence. 2-D tracking is carried out at the same time as 3-D tracking
because 2-D sequences are bound to provide more continuity than reconstructed 3-D sequences. We therefore use 2-D
tracking in order to accelerate 3-D reconstruction: For each reliably reconstructed marker in frame [f], we consider the
two sets of 2-D coordinates that were used to compute its 3-D coordinates. After 2-D tracking, these two sets of 2-D
coordinates will most likely have links to two sets of 2-D coordinates in [f+1], the next frame. If so, we can then use
them in [f+1] to construct the corresponding 3-D marker. To determine the related 2-D positions in the other camera
views, we reproject the 3-D coordinates, as in the stereo matching process described above. 3-D tracking propagates
the information attached to each marker in the first frame throughout the entire gym motion, so that as many markers as
possible are identified in all frames. A broken link in the tracked trajectory of a marker implies the loss of its identity
and the user must then be prompted. In Section 3.2, we will see how we use the skeleton to overcome that problem in an
automated fashion.
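The propagation step above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes calibrated cameras given as 3x4 projection matrices and uses standard linear (DLT) triangulation, which the text does not specify. The names `triangulate`, `project`, and `propagate_marker`, and the dictionary layout of the 2-D links, are hypothetical.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two views.

    P1, P2: 3x4 projection matrices (assumed known from calibration).
    x1, x2: 2-D image coordinates of the same marker in each view.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3-D point is the null vector of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Reproject a 3-D point into a view, e.g. to look for the marker there."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def propagate_marker(P1, P2, links1, links2, i1, i2):
    """Build the 3-D marker in frame f+1 from 2-D tracking links.

    links1, links2: hypothetical dicts mapping the index of a 2-D point in
    frame f to its tracked 2-D coordinates in frame f+1, one dict per view.
    i1, i2: indices of the two 2-D points used for the marker in frame f.
    Returns None when either 2-D link is broken.
    """
    if i1 not in links1 or i2 not in links2:
        return None
    return triangulate(P1, P2, links1[i1], links2[i2])
```

The point returned by `propagate_marker` inherits the identity of the marker in frame [f], and `project` can then be used to search for the corresponding 2-D positions in the remaining views.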
To compute the trajectory of a marker from frame [f] into frame [f+1], both in 2-D and 3-D, we look at the displacement of
the marker over a four-frame sliding window (Malik et al., 1993). The basic assumption is that displacement is minimal
from one frame into the next, and the idea is to predict and confirm the position of a marker in the next frame. The
displacement of a marker from [f-1] into [f] predicts the position in [f+1]. The actual position in [f+1] and the projection
of the movement into [f+2] should confirm the previously-made hypothesis by eliminating ambiguities.
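A minimal sketch of this predict-and-confirm idea is given below. It assumes a constant-velocity prediction and a fixed search radius, neither of which is stated in the text. The function names and the `radius` parameter are hypothetical. The same code applies to 2-D or 3-D points.

```python
import numpy as np

def nearest(points, target):
    """Return (index, distance) of the point closest to target."""
    d = np.linalg.norm(points - target, axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

def track_step(p_prev, p_curr, cand_next, cand_next2, radius):
    """Choose the candidate in frame f+1 that continues a marker trajectory.

    p_prev, p_curr: marker position in frames f-1 and f.
    cand_next, cand_next2: (N, D) arrays of candidates in frames f+1 and f+2.
    radius: assumed maximum distance between a prediction and a candidate.
    Returns the index into cand_next, or None if no candidate is confirmed.
    """
    if len(cand_next) == 0 or len(cand_next2) == 0:
        return None
    # Predict: the displacement from f-1 to f is extrapolated into f+1.
    predicted = p_curr + (p_curr - p_prev)
    dists = np.linalg.norm(cand_next - predicted, axis=1)
    best, best_cost = None, np.inf
    for i in np.flatnonzero(dists <= radius):
        # Confirm: extrapolate the new displacement into f+2 and check
        # that some candidate lies close to that second prediction.
        predicted2 = cand_next[i] + (cand_next[i] - p_curr)
        _, d2 = nearest(cand_next2, predicted2)
        cost = dists[i] + d2
        if d2 <= radius and cost < best_cost:
            best, best_cost = int(i), cost
    return best
```

When two candidates in [f+1] are equally close to the prediction, the look-ahead into [f+2] is what separates them. A return value of `None` corresponds to a broken link in the trajectory.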
At the end of the marker reconstruction process and 2-D/3-D tracking steps, we have the gym motion reconstructed in
3-D, the trajectories of the markers throughout the sequence, as well as the identification of the markers with respect to
the skeleton model.
3.1.2 Initial Joint Localization Let us consider a reference frame attached to a bone represented as a segment. Under the
assumption that the distance between markers and joints remains constant, the markers that are attached to adjacent
segments move on a sphere centered on the joint that links the two segments. The position of a segment in space is
completely defined by three non-collinear points. Thus, if we have a minimum of three markers on a segment, we can define the
position and orientation of that segment in space. Afterwards, we compute the movement of the markers on adjacent
segments in the reference frame established by these markers and we estimate their centers of rotation (Silaghi et al., 1998).
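As an illustration of the geometry described above, the sketch below builds a local frame from three markers on a segment and estimates a center of rotation by fitting a sphere to the positions of a marker on the adjacent segment, expressed in that frame. It uses a generic algebraic least-squares sphere fit, which is an assumption and not necessarily the estimator of Silaghi et al. (1998). All function names are hypothetical.

```python
import numpy as np

def segment_frame(m0, m1, m2):
    """Orthonormal frame from three non-collinear markers on one segment.

    Returns the origin and a 3x3 matrix R whose rows are the frame axes.
    """
    x = (m1 - m0) / np.linalg.norm(m1 - m0)
    z = np.cross(x, m2 - m0)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return m0, np.stack([x, y, z])

def to_local(p, origin, R):
    """Express a world-space point p in the segment frame."""
    return R @ (p - origin)

def fit_sphere(points):
    """Algebraic least-squares sphere fit to an (N, 3) array, N >= 4.

    Uses |p|^2 = 2 p.c + (r^2 - |c|^2), which is linear in the center c
    and in the scalar k = r^2 - |c|^2.
    """
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius
```

In use, one would call `segment_frame` in every frame of the sequence, map a marker of the adjacent segment through `to_local`, and pass the collected local positions to `fit_sphere`; the fitted center is then an estimate of the joint position in the segment frame.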
To take advantage of this observation, we partition the markers into sets that appear to move rigidly and estimate the
3-D location of the center of rotation between adjacent subsets, which corresponds to the joint location. This yields the
256 International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B5. Amsterdam 2000.