in the victory
1e eyebrows,
are input into
essions are
ctory neuron
the eyebrows,
zing map for
| are image
to one of the
are 30 such
pare various
aps an ability
ns, including
elf-organizing
aps are given
includes an
] is converted
"hen intensity
ment is input
The neuron
he feature of
suron. The
re defined by
output layers.
uron and the
to make the
> of the input
lerconnection
bers of which
beginning of
s small as its
it of learning.
ess. A self-
this process
| to the eyes
[Figure 2 diagram: input image sequences; a selected segment of the
eyebrow is input; next image, repeat 20000 times. The interconnection
weights are changed in order to make the features of the victory neuron
and the neighborhood neurons approximate those of the input segment.]
Figure 2 Learning Self-organizing Maps
if iN, (t)
and the mouth are prepared through the same learning
process.
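The learning step summarized in Figure 2 can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the standard Kohonen update on a one-dimensional map, with a learning rate and a neighborhood radius that shrink linearly during learning. The names winner, update and train_som, the parameter alpha0, the initial radius and the number of neurons are all hypothetical. Only the 20000 repetitions are taken from Figure 2. Intensities are assumed to be scaled to the range 0 to 1.

```python
import random


def winner(weights, x):
    """Return the index of the neuron whose weights are closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def update(weights, x, alpha, radius):
    """Move the victory neuron and its neighbors toward the input x.

    Assumption: neighbors are the neurons within `radius` positions of
    the victory neuron on a one-dimensional map.
    """
    c = winner(weights, x)
    lo = max(0, c - radius)
    hi = min(len(weights) - 1, c + radius)
    for i in range(lo, hi + 1):
        weights[i] = [w + alpha * (v - w) for w, v in zip(weights[i], x)]
    return c


def train_som(segments, n_neurons, n_steps=20000, alpha0=0.5, seed=0):
    """Train a map on intensity vectors taken from one facial segment.

    n_steps=20000 follows Figure 2; alpha0 and the linear decay of the
    learning rate and radius are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(segments[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    radius0 = n_neurons // 2
    for t in range(n_steps):
        remaining = 1.0 - t / n_steps  # goes from 1 toward 0
        x = segments[t % len(segments)]  # next image segment
        update(weights, x, alpha0 * remaining, round(radius0 * remaining))
    return weights
```

Under these assumptions, one map would be trained per facial segment, for example train_som(eyebrow_segments, n_neurons=16), where eyebrow_segments is a hypothetical list of flattened intensity vectors.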
3.3 Feature Extraction Method
The features of the image sequences to be recognized are
extracted by using the trained self-organizing maps.
The positions of facial segments such as the eyebrows,
eyes and mouth in the facial images are assumed to be
known in advance, and rectangular segments that include
these important facial segments are selected. The
intensities of the elements in each rectangular segment
are input into the corresponding trained self-organizing
map, and the feature of the segment is expressed as the
number of the victory neuron. This process is carried out
for all image sequences, and the changes in the victory
neuron number that accompany changes in the facial
expression are analyzed. To recognize facial
expressions, it is necessary to take the movements of all
facial segments into consideration. Thus, the victory
neuron numbers of the eyebrows, the eyes and the mouth
constitute three axes, and the feature of the facial
expression at each point in time is represented by a
three-dimensional point defined by the victory neuron
numbers of the eyebrow, the eye and the mouth. The point
moves as the facial expression changes. Facial
expressions are classified by analyzing the movement of
the point.
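The mapping from an image sequence to a moving three-dimensional point can be sketched as follows. This is a minimal sketch under stated assumptions: each frame is given as a dictionary of flattened intensity vectors keyed by the hypothetical names "eyebrow", "eye" and "mouth", and each trained map is a list of weight vectors. The paper does not state how the movement of the point is analyzed, so the frame-to-frame displacement computed here is only one illustrative possibility.

```python
SEGMENTS = ("eyebrow", "eye", "mouth")  # hypothetical segment names


def winner(weights, x):
    """Return the number of the victory neuron for input vector x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def expression_trajectory(frames, maps):
    """Convert an image sequence into a list of three-dimensional points.

    frames: list of dicts, segment name -> intensity vector
    maps:   dict, segment name -> trained weights for that segment
    Each point is (eyebrow neuron, eye neuron, mouth neuron).
    """
    return [
        tuple(winner(maps[name], frame[name]) for name in SEGMENTS)
        for frame in frames
    ]


def displacements(trajectory):
    """Frame-to-frame movement of the point (illustrative assumption)."""
    return [
        tuple(b - a for a, b in zip(p, q))
        for p, q in zip(trajectory, trajectory[1:])
    ]


# Toy maps with one-dimensional weights, for illustration only.
toy_maps = {
    "eyebrow": [[0.0], [1.0]],
    "eye": [[0.0], [0.5], [1.0]],
    "mouth": [[0.0], [1.0]],
}
toy_frames = [
    {"eyebrow": [0.1], "eye": [0.1], "mouth": [0.2]},
    {"eyebrow": [0.9], "eye": [0.6], "mouth": [0.2]},
]
toy_points = expression_trajectory(toy_frames, toy_maps)
```

In this toy example the point moves along the eyebrow and eye axes while the mouth coordinate stays fixed; a classifier would then operate on such trajectories.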
4. DISCRIMINATION EXPERIMENT
4.1 Making Image Sequences of Facial Expressions
Subjects are ordinary people without formal training in
acting. The filming is done indoors. Lights are placed so
that the illuminance in front of the face is 700 lux. A video
camera is placed directly in front of the subject's face.
Subjects are asked to change from an expressionless face
to one of the six types of facial expressions, one at a
time, while trying their best not to move their heads.
Each facial expression is recorded three times. Subjects
are not asked to express multiple or mixed emotions.
Under these conditions, facial expressions are recorded
on VTR. Video sequences last from two to four seconds.
Video sequences are converted into image sequences of 30
images per second. Figure 3 shows the subjects and
[Figure 3 panels: Happiness, Anger, Surprise, Sadness, Disgust, Fear]
Figure 3 Subjects and Their Facial Expressions