4 INTERPRETATION OF IMAGES AS POSTURES
Image interpretation corresponds to the medium level of the Recognition Phase. Interpretation is performed with reference
to coarse geometric and kinematic models. Geometric models are based on symbolic representations of an articulated
hand in terms of a graph guided by the largest segments obtained in each image. The Euclidean version of this method is
strongly dependent on the viewpoint, but its symbolic representation by means of an adjacency graph is efficient and
robust w.r.t. small perturbations, and this symbolic approach does not require a complicated evaluation of invariant
properties w.r.t. rigid and scale transformations, thanks to the strong constraints imposed by articulated mechanisms.
Furthermore, it allows us to simplify posture analysis, since tracking segments is more robust than tracking control
points. In this way, the model can integrate position-based control and trajectory-based control without a cumbersome
kinematic analysis.
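As an illustration, the following minimal sketch builds such an adjacency graph from the largest extracted segments. The Segment class, the adjacency_graph function and the join_tol tolerance are hypothetical names introduced here for illustration, not the authors' implementation.

    # Minimal sketch: symbolic hand model as an adjacency graph of segments.
    from dataclasses import dataclass
    import math

    @dataclass
    class Segment:
        x1: float
        y1: float
        x2: float
        y2: float

        def length(self) -> float:
            return math.hypot(self.x2 - self.x1, self.y2 - self.y1)

    def endpoint_distance(a: Segment, b: Segment) -> float:
        """Smallest distance between any pair of endpoints of a and b."""
        pts_a = [(a.x1, a.y1), (a.x2, a.y2)]
        pts_b = [(b.x1, b.y1), (b.x2, b.y2)]
        return min(math.hypot(px - qx, py - qy)
                   for px, py in pts_a for qx, qy in pts_b)

    def adjacency_graph(segments, join_tol=5.0):
        """Connect segments whose endpoints (nearly) meet.

        The node/edge structure is insensitive to small perturbations of
        the segment endpoints, which is what makes the symbolic model
        robust w.r.t. viewpoint noise.
        """
        segments = sorted(segments, key=Segment.length, reverse=True)
        graph = {i: set() for i in range(len(segments))}
        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                if endpoint_distance(segments[i], segments[j]) <= join_tol:
                    graph[i].add(j)
                    graph[j].add(i)
        return segments, graph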
Thus, in this article we have concentrated our attention on posture recognition based on geometric aspects, at the expense
of other aspects related to tracking and interaction with the environment.
Firstly, we select the parallel segments with the greatest width, corresponding to the palm of the hand. We assign a variable
weight to parallel segments, proportional to the relative orientation of the camera w.r.t. the hand. A correct
identification of the palm is crucial for the rest of the process. After identification of the palm, we evaluate the
characteristics shared by the segments associated with the fingers.
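A hedged sketch of this palm-selection step follows, reusing the Segment class from the previous listing. The 10-degree parallelism tolerance and the view_weight factor standing in for the camera/hand relative orientation are our own assumptions.

    # Sketch: pick the group of roughly parallel segments with the greatest
    # weighted width; view_weight models the orientation-dependent factor.
    import math

    def orientation(seg):
        """Segment orientation folded into [0, pi)."""
        return math.atan2(seg.y2 - seg.y1, seg.x2 - seg.x1) % math.pi

    def parallel(a, b, tol=math.radians(10)):
        d = abs(orientation(a) - orientation(b))
        return min(d, math.pi - d) < tol   # handle wrap-around at pi

    def select_palm(segments, view_weight):
        """Return the group of parallel segments with the best weighted score."""
        groups = []
        for seg in segments:
            for group in groups:
                if parallel(seg, group[0]):
                    group.append(seg)
                    break
            else:
                groups.append([seg])
        def score(group):
            # total extent of the group, scaled by the camera/hand factor
            return view_weight(orientation(group[0])) * sum(s.length() for s in group)
        return max(groups, key=score)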
Fingertips and base points (corresponding to the first knuckles) are located in regions containing many corners.
The selection procedure depends on the largest segments connecting such corners. In this case, we also need to assign a
variable weight that allows an automatic selection of the finger segments meaningful for each posture.
These weights have been obtained experimentally. After this selection, one can verify that posture recognition can be
performed with only a small number of segments: usually two or three segments are sufficient (in the worst case,
eight segments are needed to perform the identification without error).
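This selection step could look roughly as follows, again reusing the Segment class above; the weight function is a placeholder for the experimentally obtained weights, which are not published in the paper.

    # Sketch: keep the best segments whose endpoints lie near detected corners.
    import math

    def finger_segments(segments, corners, weight, k=3, tol=4.0):
        """Pick the k highest-weighted segments joining corner regions."""
        def near_corner(x, y):
            return any(math.hypot(x - cx, y - cy) <= tol for cx, cy in corners)
        candidates = [s for s in segments
                      if near_corner(s.x1, s.y1) and near_corner(s.x2, s.y2)]
        # weight(s) stands for the experimentally tuned per-segment weight
        candidates.sort(key=lambda s: weight(s) * s.length(), reverse=True)
        return candidates[:k]   # two or three usually suffice; eight at worst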
Coarse kinematic models are obtained by propagating hypotheses between standard geometric models, which act
as possible attractors for evolving postures. Determining the hand kinematics is a very hard mathematical problem,
due to our partial knowledge of the non-linear effects of each finger, of the modelling of distributed processes involving
different fingers, and of the coordination problems between fingers performing a concrete task. Additional dynamical
aspects related to stiffness and compliance, which are crucial for coordination, have not been considered here.
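One plausible reading of this hypothesis propagation is sketched below, with each standard posture acting as an attractor for the current vector of relative angles; the prototypes and the step size are illustrative values, not the authors'.

    # Sketch: drift the observed angle vector toward the nearest prototype.
    def propagate(angles, prototypes, step=0.2):
        """One update of the evolving posture toward its closest attractor."""
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        nearest = min(prototypes, key=lambda p: dist2(angles, p))
        updated = [x + step * (y - x) for x, y in zip(angles, nearest)]
        return updated, nearest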
Traditional posture tracking is a very expensive approach and strongly dependent on the specific model we are
looking for [RK94]. In most presentations one needs a very concrete model, well-specified states for the model, and
some knowledge of the kinematics connecting different well-known states. In this scheme, one must characterise
the geometric features that are meaningful for each posture, measure these features, estimate states, and track their
spatio-temporal evolution. Furthermore, if we wish to interact with the artificial hand, one must perform a 3D
reconstruction from some stereo vision system.
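For contrast, that model-based scheme reduces to the measure/estimate/predict cycle sketched below; all callbacks are placeholders, and the point is only how much model-specific machinery the loop requires.

    # Skeleton of the classic model-based tracking loop criticised above.
    def track(frames, measure, estimate, predict, state):
        history = []
        for frame in frames:
            features = measure(frame, state)    # posture-specific features
            state = estimate(features, state)   # state update from measurements
            history.append(state)
            state = predict(state)              # spatio-temporal evolution model
        return history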
The next challenge is to deduce inverse kinematics from posture learning in order to improve man-machine
interaction. This goal requires transferring visual information to a motor representation under partial or incomplete
information, i.e. without an exact knowledge of the dynamical equations controlling the motion, but only some
partial qualitative data arising from effects observed in neural fields. These neural fields act by means of stimuli on very
few neurones, by selecting the preferred orientation and activation/inhibition level for each task at each knuckle.
The multivector representing this orientation in the configuration space gives the posture up to scale by means of a
(3×3)-matrix with entries sin(θ_ij), where the θ_ij represent the nine relative angles. Relative angles are given as
differences between the angles of consecutive phalanxes at the virtual knuckles (the points where the segments
symbolising the phalanxes intersect). To improve accuracy, one must add geometric information about the DOF of the
articulated hand and the allowed movements of the fingers. This representation (as first-order differences) is especially
well suited to a discrete approach to kinematic questions, which can be easily adapted to Neurodynamical Models [Ha94].
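A small sketch of this representation follows, under the assumption (consistent with the text, though not spelled out there) that the nine relative angles come from three phalanx differences on each of three fingers.

    # Sketch: relative angles as first-order differences, packed into a
    # 3x3 matrix of sines; the posture is recovered only up to scale.
    import math

    def relative_angles(phalanx_orientations):
        """Differences between consecutive phalanx orientations (virtual knuckles)."""
        return [b - a for a, b in zip(phalanx_orientations, phalanx_orientations[1:])]

    def posture_matrix(fingers):
        """fingers: per-finger list of four phalanx orientations -> 3x3 matrix."""
        return [[math.sin(t) for t in relative_angles(f)] for f in fingers]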
The description of neural fields in the working space is not easy, since we would need to specify Euclidean information
about the control points where the neural fields must be applied. Instead, we describe several basic neural fields
depending on the agonist/antagonist situation (corresponding to flexion/extension postures of the involved muscles). We
have a specific neural field for each finger (not only for each neurone, because functionally similar neurones appear in
different fingers) and a codebook for different joints, which allows us to discriminate between signals corresponding to
the different articulations of each finger. The activation/inhibition mechanisms of the neural fields depend on the task
and on the proximity relations w.r.t. obstacles present in the scene.
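To fix ideas, a minimal discrete sketch of such a per-finger neural field follows; the codebook values, the agonist/antagonist gains and the inhibition law are illustrative assumptions only.

    # Sketch: one neural field per finger, with a joint codebook and
    # task/obstacle-dependent activation/inhibition.
    FLEXION, EXTENSION = "flexion", "extension"

    class FingerField:
        def __init__(self, finger, joint_codebook):
            self.finger = finger
            self.codebook = joint_codebook   # joint name -> signal code

        def activation(self, joint, task, obstacle_distance):
            """Activation level: agonist task excites, nearby obstacles inhibit."""
            code = self.codebook[joint]                   # discriminate joint signals
            gain = 1.0 if task == FLEXION else 0.5        # agonist/antagonist split
            inhibition = 1.0 / (1.0 + obstacle_distance)  # nearer -> stronger inhibition
            return max(0.0, gain * code - inhibition)

    index_field = FingerField("index", {"MCP": 1.0, "PIP": 0.8, "DIP": 0.6})
    print(index_field.activation("PIP", FLEXION, obstacle_distance=3.0))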