Full text: Real-time imaging and dynamic analysis

Figure 4: (a) An example image extracted from an
image sequence used in the first stage. The image size
is 640 x 480 pixels. Thirteen markers are placed around
the lips. (b) Cropped area for f. The original 148 x 84
pixel area is re-quantized to 37 x 21.
ject's face as shown in figure 1. A marker tracking 
program extracts the markers, and determines their 
We tested the following two types of facial motion.
• Horizontal motion of face
• Chewing motion
Each motion was recorded with a CCD camera and
digitized into 640 x 480 pixels by a video capture device.
The sampling rate was 30 frames/second. Each sequence
is between 240 and 250 frames long (about 8 seconds).
The image vector f consists of the gray values within a
rectangular area around the lips. Figure 4(a) shows the
image used in the first stage. Figure 4(b) is the cropped
area for f. The image, originally 148 x 84, is reduced to
37 x 21, as stated in the caption of Figure 4.
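The construction of the image vector f described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `image_vector`, the crop offsets `top` and `left`, and the use of 4 x 4 block averaging as the re-quantization step are all assumptions. The source only states that a 148 x 84 area is reduced to 37 x 21.

```python
import numpy as np

def image_vector(frame, top, left, height=84, width=148, factor=4):
    """Crop a lip region from a gray-level frame and flatten it into a vector f.

    `top`, `left` and `factor` are hypothetical parameters chosen so that
    a 148 x 84 crop becomes 37 x 21, as in the caption of Figure 4.
    """
    crop = frame[top:top + height, left:left + width].astype(np.float64)
    # Average each factor x factor block of pixels (assumed re-quantization scheme).
    blocks = crop.reshape(height // factor, factor, width // factor, factor)
    small = blocks.mean(axis=(1, 3))       # shape (21, 37)
    return small.ravel()                   # one gray value per reduced pixel

# Synthetic 640 x 480 frame standing in for a digitized camera image.
frame = (np.arange(480 * 640, dtype=np.float64).reshape(480, 640)) % 256
f = image_vector(frame, top=300, left=250)
print(f.shape)
```

With these sizes the vector f has 37 x 21 = 777 components, which keeps the later SVD small compared with working on the full 640 x 480 frame.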
Figure 5 compares the estimated and actual positions
of a marker for horizontal motion. In this estimation,
the f_i's and p_i's for frames 0 to 149 are used as the
learning sample. The SVD result of these frames is
applied to the f_i's for frames 150 to 200. The result
shows that the amplitude of the horizontal motion is
reproduced from gray-level images alone. The marker is
located just under the lips and is labeled C2 in figure 1.
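The train-then-apply procedure in this experiment can be illustrated with a short sketch. The estimation equation itself is defined outside this excerpt, so the code below assumes a plain linear map from image vectors to marker coordinates, fitted with a truncated SVD pseudo-inverse; the authors' formulation may differ. The names `fit_estimator`, `estimate` and the `rank` parameter are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_estimator(F, P, rank):
    """Fit a linear map from image vectors to marker coordinates.

    F: (n_frames, n_pixels) training image vectors f_i, one per row.
    P: (n_frames, n_coords) marker coordinates p_i for the same frames.
    rank: number of singular vectors kept (hypothetical tuning parameter).
    """
    f_mean, p_mean = F.mean(axis=0), P.mean(axis=0)
    U, s, Vt = np.linalg.svd(F - f_mean, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    # Truncated pseudo-inverse solution of (F - f_mean) @ W ~= (P - p_mean).
    W = Vt.T @ ((U.T @ (P - p_mean)) / s[:, None])
    return f_mean, p_mean, W

def estimate(F_new, f_mean, p_mean, W):
    """Estimate marker coordinates from image vectors alone."""
    return (F_new - f_mean) @ W + p_mean

# Synthetic sequence: 201 frames driven by a two-dimensional periodic motion.
rng = np.random.default_rng(0)
t = np.arange(201)
latent = np.column_stack([np.sin(2 * np.pi * t / 40), np.cos(2 * np.pi * t / 40)])
F = latent @ rng.normal(size=(2, 777)) + 100.0   # 37 x 21 gray values per frame
P = latent @ rng.normal(size=(2, 26)) + 300.0    # 13 markers, (x, y) each

model = fit_estimator(F[:150], P[:150], rank=2)  # frames 0-149: learning sample
P_hat = estimate(F[150:201], *model)             # frames 150-200: images only
print(np.abs(P_hat - P[150:201]).max())
```

In this synthetic case the test frames lie in the span of the learning sample, so the estimate matches the true positions up to rounding. Real images would add noise and motion outside the learned basis.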
Figure 6 shows the results of the proposed method
applied to chewing motion. Figure 6(a) compares the
output and the true position of point C2 during
mastication. In the experiment, frames from 50 to 150
[Plot: X-axis position (pixel), 290 to 320, versus frame number, 150 to 200]
Figure 5: An example of estimation. Frames from
0 to 149 are the learning sample. The positions of
the markers are estimated from f for frames 150-200.
Crosses: estimation. Diamonds: true position.
are used as the training sample. The estimation error
is much greater than in figure 5 because the motion of
the subject's face made the basis insufficient.
Nevertheless, rough motion is reproduced by the
estimation. Figures 6(b)-(d) show the true and estimated
trajectories for the thirteen points. Figure 6(c) is the
result for the training sample. Figure 6(d) is the case
in which the SVD result for (c) is applied to another
input sequence. Results (a) and (c) show that rough
motion can be recovered with the method. However,
motion that cannot be spanned by the training
sample will be distorted by the method.
In this example, the first 50 frames of the sequence
are not spanned by training frames 50-150. This caused
the large error in the first frames (figure 6(a)).
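One way to see whether an input image is spanned by the training sample is to measure how much of it falls outside the learned basis. The sketch below is an illustrative check and not part of the described method: the function `span_residual` and the `rank` parameter are hypothetical, and the data are synthetic.

```python
import numpy as np

def span_residual(F_train, f_new, rank):
    """Relative part of a centered image vector lying outside the training basis.

    Returns a value near 0 when f_new is spanned by the basis and near 1
    when almost none of it is.
    """
    f_mean = F_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(F_train - f_mean, full_matrices=False)
    basis = Vt[:rank]                    # orthonormal rows from the training images
    d = f_new - f_mean
    inside = basis.T @ (basis @ d)       # orthogonal projection onto the basis
    return np.linalg.norm(d - inside) / max(np.linalg.norm(d), 1e-12)

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 50))                      # two underlying image patterns
F_train = rng.normal(size=(100, 2)) @ A           # training images built from them
f_in = np.array([0.3, -1.2]) @ A                  # new image of the same kind
f_out = rng.normal(size=50)                       # image with unseen variation

r_in = span_residual(F_train, f_in, rank=2)
r_out = span_residual(F_train, f_out, rank=2)
print(r_in, r_out)
```

A large residual would flag frames, such as the first 50 frames here, for which the estimate should not be trusted.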
An advantage of our proposed method is that the
algorithm measures virtual marker locations without
markers once the estimation equation has been created
in the first stage. In clinical applications, an
examination is often repeated periodically. In such a
case, it is troublesome to attach the markers to the same
locations on the face each time. Our method does not
require any preparation after the first measurement.
A limitation of the current method is that the
orthonormal basis must span a sufficient part of the
image space. The algorithm outputs incorrect estimates
if the input image is not supported by the basis. The
sample image sequences must be carefully chosen to
satisfy this requirement. In practice, if the facial position is
fixed, the variation of sample images will be reduced 
because no degree of freedom is required for trans- 

