Another application is exception handling. There can be mixed object streams (e.g., on conveyor belts). The robot vision system knows object models for most of the objects that have to be grasped, but every once in a while an unknown object arrives. Whenever the vision system realizes that it cannot recognize the object, it invokes an auxiliary system (not relying on object models) which then analyzes the scene and derives possible gripping points. Alternatively, the model-based system and the auxiliary system could continuously work in parallel; the results of the auxiliary system would only be used if the model-based component fails.
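Expressed as code, this fallback strategy could look like the following minimal sketch; all function names and stub bodies here are hypothetical placeholders for the actual recognition components, not part of any system described in this paper:

```python
from typing import Optional, Tuple

Grasp = Tuple[float, float, float]  # a gripping point in workspace coordinates

def model_based_grasp(scene) -> Optional[Grasp]:
    """Hypothetical model-based component: returns None whenever the
    object cannot be matched against any of the known object models."""
    return None  # stub: recognition failed

def model_free_grasp(scene) -> Grasp:
    """Hypothetical auxiliary component: analyzes the scene without
    object models and derives a possible gripping point."""
    return (0.0, 0.0, 0.0)  # stub: dummy gripping point

def plan_grasp(scene) -> Grasp:
    # Sequential variant: the auxiliary system is invoked only when
    # the model-based component fails to recognize the object.
    grasp = model_based_grasp(scene)
    return grasp if grasp is not None else model_free_grasp(scene)
```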
From the given examples it is evident that techniques for object manipulation without the use of object models can be useful in many application areas. Up to now, however, little work has been done by the robot vision community on scenes consisting of heaps of unmodeled objects.
The work reported in this paper aims at developing methods to infer, from sensor data alone, sufficient information for a robot to grasp an unmodeled object and take it away. Obviously, when we say that an object is unknown, we still make some minimal assumptions, e.g., that the objects are not too soft or elastic, that their size and weight are compatible with the properties of the gripper, and that their surfaces are piecewise smooth so that they can be modeled at least locally. The robot's vision system must be provided with the capability to extract and represent surface patches in its work space. This representation of the scene must be rich enough to support segmentation, to find out which patches are likely to belong to one object, to decide which hypothesized object has the best “grippability” at a certain moment, and finally to take that object away without collision with other objects. All knowledge needed for this is extracted from the range data. In accordance with this outline we are building a complete system, from data acquisition to action, adhering to the paradigm of “purposive vision”.
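The following sketch renders this outline in Python; every function name and stub body is our own illustration of the intended flow of control, not the system's actual interface:

```python
from typing import List, Optional

def extract_surface_patches(range_data) -> List[dict]:
    """Hypothetical: fit piecewise-smooth surface patches to the range data."""
    return [{"points": view} for view in range_data]          # stub

def group_patches(patches: List[dict]) -> List[List[dict]]:
    """Hypothetical: group patches likely to belong to one object."""
    return [patches]                                          # stub: one hypothesis

def grippability(hypothesis: List[dict]) -> float:
    """Hypothetical score of how well the gripper could seize the
    hypothesized object at this moment."""
    return 0.0                                                # stub

def collision_free_grasp(hypothesis, all_patches) -> Optional[dict]:
    """Hypothetical: a grasp that avoids the other objects, or None."""
    return {"object": hypothesis}                             # stub

def take_away_one_object(range_data) -> Optional[dict]:
    patches = extract_surface_patches(range_data)
    hypotheses = sorted(group_patches(patches), key=grippability, reverse=True)
    for hypothesis in hypotheses:          # best "grippability" first
        grasp = collision_free_grasp(hypothesis, patches)
        if grasp is not None:
            return grasp                   # hand over to the robot controller
    return None                            # nothing grippable at this moment
```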
2 RELATED WORK
Work on deriving grasps for single unmodeled objects includes [Boissonnat 1982] and [Stansfield 1991]. Boissonnat describes a method to find stable grasps for a robot gripper without the use of object models, merely from an analysis of the object silhouette, which is approximated by a polygonal sequence. The grasps are ranked by a quality measure determined by a criterion with four components. An extension to three-dimensional silhouettes is proposed. Stansfield proposes to grasp single unmodeled objects with a knowledge-based approach which draws on theories about human grasping behavior. From range data, a representation of the sensed object in the form of a set of up to five aspects is generated. This symbolic representation is used by a rule-based system to derive a set of possible grasps for the object. The gripper is a three-fingered Salisbury hand.
Several authors have employed generic object models, where the type of the admitted objects is known but the dimensions of the instances occurring in the scene have to be determined. [Ikeuchi and Hebert 1990] describe a vision system for a planetary explorer that is supposed to collect rock samples automatically. Pebbles which are not touching each other are partially buried in sand, and range images of them are taken. The visible surface parts serve to estimate shape and pose parameters of superquadrics (a sketch of such a fit is given after this paragraph). The pebbles are then picked up by a kind of shovel-excavator robot. [Tsikos and Bajcsy 1991] describe a system which is able to remove objects from a heap one by one. The heap is lying on a base plane, and single range views and/or intensity images are taken from an essentially vertical direction; thus it is not possible to see vertical or overhanging surfaces. However, they assume that only convex objects are admitted, more specifically objects from the postal domain, i.e., flats, parcels, and tubes. This generic model knowledge helps them to interpret the views and to identify grasps. [Mulgaonkar et al. 1992] have also been working on a project for the US Postal Service. They tried to physically understand object configurations using range images. Generic object models, i.e., boxes and cylinders with circular cross sections, were used.
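The following is our minimal sketch of such a superquadric fit: a least-squares estimate of the five shape parameters from the inside-outside function, assuming the pose has already been normalized. It is an illustration under these assumptions, not the estimator actually used by [Ikeuchi and Hebert 1990]:

```python
import numpy as np
from scipy.optimize import least_squares

def inside_outside(params, pts):
    # Superquadric inside-outside function; it equals 1 on the surface.
    a1, a2, a3, e1, e2 = params            # semi-axes and shape exponents
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    f = (np.abs(x / a1) ** (2.0 / e2) +
         np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1) + np.abs(z / a3) ** (2.0 / e1)
    return f - 1.0                         # residuals driven to zero by the fit

def fit_superquadric(pts):
    # pts: (n, 3) array of visible surface points, pose-normalized.
    x0 = np.array([1.0, 1.0, 1.0, 1.0, 1.0])          # crude initial guess
    bounds = ([1e-3] * 3 + [0.1, 0.1],                # keep exponents well-behaved
              [np.inf] * 3 + [2.0, 2.0])
    return least_squares(inside_outside, x0, args=(pts,), bounds=bounds).x
```

In practice the pose (rotation and translation) would be estimated jointly with these five parameters, which only enlarges the parameter vector of the same least-squares problem.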
Our work differs from the above in that it deals with heaps of unknown objects which need not be convex or conform to a generic model. We use two opposed oblique range sensors, whereby it is also possible to sense vertical and even overhanging surfaces, which is an advantage in terms of descriptive power.
Our emphasis is on identifying grasping opportunities in the heap rather than objects. This could be termed action-based recognition, and it is interesting to compare it to function-based object recognition as proposed by [Stark et al. 1993]. The latter is (generic) object recognition based on the detection of features in the object instance which (after evidence accumulation) allow the identification of the function for which people use the object and therefore of the object class. In our approach we do not recognize object classes but action classes that a robot might perform on a certain part of a heap. The means to arrive at the recognition of grasping opportunities is accumulation of evidence, as in [Stark et al. 1993].
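A minimal sketch of such evidence accumulation for action classes might read as follows; the data layout (each feature carrying a list of (grasp hypothesis, support weight) votes) is a hypothetical illustration of ours:

```python
from collections import defaultdict
from typing import List

def accumulate_grasp_evidence(features: List[dict], threshold: float = 1.0):
    """Each observed feature (e.g., a pair of surface patches with
    gripper-compatible spacing) votes for the grasp hypotheses it
    supports; hypotheses with enough accumulated support survive."""
    evidence = defaultdict(float)
    for feature in features:
        for hypothesis, weight in feature["votes"]:   # (name, support) pairs
            evidence[hypothesis] += weight
    # Rank the surviving grasping opportunities by accumulated evidence.
    return sorted((h for h, e in evidence.items() if e >= threshold),
                  key=lambda h: evidence[h], reverse=True)
```

Thresholding the accumulated support mirrors the evidence-accumulation step of the function-based approach, applied here to actions rather than object classes.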