Figure 4: (a) An image example extracted from an image sequence used in the first stage. The pixel size is 640×480. Thirteen markers are placed around the lips. (b) Cropped area for f. The original pixel size 148×84 is re-quantized to 37×21.
Markers are placed on the subject's face as shown in figure 1. A marker tracking program extracts the markers and determines their trajectories.
We tested the following two types of facial motion:
• Horizontal motion of the face
• Chewing motion
Each motion was viewed with a CCD camera and digitized into 640×480 pixels by a video capture device. The sampling rate was 30 frames/second. The lengths of the sequences are between 240 and 250 frames (about 8 seconds). Image vector f consists of the gray values within a rectangular area around the lips. Figure 4(a) shows an image used in the first stage. Figure 4(b) is the cropped area for f. The image, originally 148×84, is reduced to 37×21.
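The exact crop coordinates and reduction scheme are not given; as a minimal sketch (the crop position and 4×4 block averaging are assumptions), the image vector f could be built as follows:

```python
import numpy as np

def image_vector(frame, top, left, h=84, w=148, factor=4):
    # Crop the perioral region (148x84 pixels in the paper) from a
    # 640x480 gray-level frame; top/left are hypothetical coordinates.
    crop = frame[top:top + h, left:left + w].astype(np.float64)
    # 4x4 block averaging: 148x84 -> 37x21
    small = crop.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))
    return small.ravel()  # gray-value vector f, 37*21 = 777 elements
```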
Figure 5 compares the estimated and actual positions of a marker during horizontal motion. In this estimation, the f_i's and p_i's for frames 0 to 149 are used as the learning sample. The SVD result of these frames is applied to the f_i's for frames 150 to 200. The result shows that the amplitude of the horizontal motion is reproduced from gray-level images alone. The marker is located just under the lips, and is labeled C2 in figure 1.
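The paper's exact estimation equation is not reproduced here; the following is a minimal least-squares sketch, assuming marker positions are estimated by a linear map fitted with the SVD pseudoinverse (F stacks the training f_i's as rows, P the corresponding marker coordinates p_i):

```python
import numpy as np

def fit(F, P, rank=None):
    # Solve P ~= F @ W in the least-squares sense via the SVD of F.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    if rank is not None:         # optionally keep only the leading
        U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]  # singular vectors
    return Vt.T @ ((U.T @ P) / s[:, None])  # pseudoinverse solution

def estimate(W, F_new):
    # Predicted marker coordinates for unseen image vectors.
    return F_new @ W

# Frames 0-149 as the learning sample, frames 150-200 as input:
# W = fit(F[:150], P[:150]); P_hat = estimate(W, F[150:201])
```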
Figure 6 shows the results of the proposed method applied to chewing motion. Figure 6(a) compares the output and the true position of point C2 during mastication. In this experiment, frames 50 to 150
[Plot for Figure 5: X-axis position (pixel) versus Frame Number, frames 150-200.]
Figure 5: An example of estimation. Frames from 0 to 149 are the learning sample. The positions of the markers are estimated from f for frames 150-200. Crosses: estimation. Diamonds: true position.
are used as the training sample. The estimation error is much greater than in figure 5 because the subject's face motion made the basis insufficient. Nevertheless, rough motion is reproduced by the estimation. Figures 6(b)-(d) show the true and estimated trajectories of the thirteen points. Figure 6(c) is the result for the training sample, and figure 6(d) is the case where the SVD result from (c) is applied to another input sequence. Results (a) and (c) show that rough motion can be recovered with the method. However, motion that cannot be spanned by the training sample is distorted.
In the example, the first 50 frames of the sequence are not spanned by the training frames 50-150. This made the error large in the first frames (figure 6(a)).
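A hedged way to flag such frames (an assumption for illustration, not part of the paper's method) is to measure how much of a new image vector lies outside the span of the training basis:

```python
import numpy as np

def span_residual(f, Vt):
    # Vt: right singular vectors from the SVD of the training matrix.
    # Returns the fraction of f's energy outside their span; values
    # near 0 mean f is well supported by the basis, values near 1
    # mean the basis is insufficient for this frame.
    proj = Vt.T @ (Vt @ f)
    return np.linalg.norm(f - proj) / np.linalg.norm(f)
```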
4. CONCLUSIONS
An advantage of the proposed method is that the algorithm measures virtual marker locations without markers once the estimation equation is created in the first stage. In clinical applications, an examination is often done periodically. In such a case, it is troublesome to attach the markers to the same locations on the face. Our method does not require any preparation after the first measurement.
A problem with the current method is that the orthonormal basis must span a sufficient image space. The algorithm outputs incorrect estimates if the input image is not supported by the basis. The sample image sequences must be carefully chosen to satisfy this requirement. Practically, if the facial position is fixed, the variation of the sample images will be reduced because no degree of freedom is required for translation.