A Hierarchical Neural Network Approach to
Three-Dimensional Object Recognition
Yongsheng Zhang
Department of Photogrammetry & Remote Sensing
Zhengzhou Institute of Surveying & Mapping, PR China
KEY WORDS: Three-dimensional Object Recognition, Neural Network, Image Matching
ABSTRACT-This paper proposes a hierarchical approach to solving the surface and vertex correspondence
problems in multiple-view-based three-dimensional object recognition systems. The proposed scheme is a
coarse-to-fine search process and a Hopfield network is employed at each stage. Compared with conventional
object matching schemes, the proposed technique provides a more general and compact formulation of the
problem and a solution more suitable for parallel implementation. At the coarse search stage, the surface
matching scores between the input image and each object model in the database are computed through a Hopfield
network and are used to select the candidates for further consideration. At the fine search stage, the object
models selected from the previous stage are fed into another Hopfield network for vertex matching. The
object model that has the best surface and vertex correspondences with the input image is finally singled out
as the best matched model.
I. INTRODUCTION
THREE-DIMENSIONAL (3-D) object recogni-
tion is the process of matching an object to a scene
description to determine the object's identity and/or
its pose (position and orientation) in space [1]-[3].
Any system capable of recognizing its input image
must in some sense be model-based. The problem of
object recognition can be separated into two closely
related subproblems: that of model building and that
of recognition. There are different approaches to
both these subproblems, and the procedure used for
recognition will have a strong impact on the kind of
model that will be required and vice versa.
The multiple-view approach to 3-D object recogni-
tion [4]-[6] models an object by collecting all its
topologically different 2-D projections from various
viewing angles. In the model database, each 2-D
projection is topologically different from the others
and is referred to as a characteristic view (CV) [4],
[5]. In [7], we proposed a computer system to
automatically construct a multiple-view model database
for polyhedral objects. The database is organized as a
graph in which a node represents a characteristic view
and an arc represents the transformation between two
characteristic views. It is also referred to as a CV li-
brary (or aspect graph [6]).
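The graph organization described above can be illustrated with a small data structure. This is only a sketch under assumptions: the class name `CVLibrary`, its methods, the view identifiers, and the example view descriptions are all hypothetical and are not taken from the system in [7].

```python
from dataclasses import dataclass, field


@dataclass
class CVLibrary:
    """Graph of characteristic views (hypothetical layout, not the one in [7])."""
    views: dict = field(default_factory=dict)  # view id -> view description
    arcs: dict = field(default_factory=dict)   # (src id, dst id) -> transformation

    def add_view(self, view_id, description):
        # A node of the graph: one characteristic view.
        self.views[view_id] = description

    def add_arc(self, src, dst, transformation):
        # An arc of the graph: the transformation between two views.
        if src not in self.views or dst not in self.views:
            raise KeyError("both views must be added before linking them")
        self.arcs[(src, dst)] = transformation

    def neighbours(self, view_id):
        # Views reachable from view_id through a single arc.
        return sorted(dst for (src, dst) in self.arcs if src == view_id)


# Illustrative content only: two made-up views of a polyhedral object.
library = CVLibrary()
library.add_view("cv0", {"surfaces": 3, "vertices": 7})
library.add_view("cv1", {"surfaces": 2, "vertices": 6})
library.add_arc("cv0", "cv1", "rotate about vertical axis")
library.add_arc("cv1", "cv0", "inverse rotation")
```

Storing arcs in a dictionary keyed by view pairs keeps the sketch short; a real library would likely store richer transformation data on each arc.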
Although the redundancy of the model database
has been reduced as far as possible in the CV library
generation process, the size of the library is still
large if the target object is complex in shape. This
makes the subsequent recognition process very time-
consuming if a traditional sequential matching scheme
is adopted. Generally, the bottleneck of the recognition
process is to establish the correspondence relationships
between the contents of the image and the
object model.
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
In this paper, we propose a coarse-to-fine strategy
to solve the correspondence problem in 3-D object
recognition based on Hopfield networks [8], [9].
Compared with the conventional object matching
schemes, the proposed technique provides a more
general and compact formulation of the problem and a
solution more suitable for parallel implementation.
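The coarse-to-fine strategy can be outlined as a two-stage selection. The sketch below is an assumption-laden illustration: the function `coarse_to_fine`, the parameter `keep`, the toy score tables, and the choice to combine the two scores by simple addition are all hypothetical. In the proposed scheme both scores come from Hopfield networks, which are replaced here by lookup tables.

```python
def coarse_to_fine(models, coarse_score, fine_score, keep=3):
    """Pick the best model in two stages (illustrative sketch only).

    coarse_score and fine_score are placeholder callables standing in for
    the surface-matching and vertex-matching scores of a model.
    """
    # Coarse stage: rank every model by its surface matching score
    # and keep only the top candidates.
    ranked = sorted(models, key=coarse_score, reverse=True)
    candidates = ranked[:keep]
    # Fine stage: among the candidates, choose the model with the best
    # combined surface and vertex score (summing is an assumption).
    best = max(candidates, key=lambda m: coarse_score(m) + fine_score(m))
    return best, candidates


# Toy scores, made up for illustration.
surface = {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.7}
vertex = {"A": 0.5, "B": 0.95, "C": 0.99, "D": 0.1}
best, candidates = coarse_to_fine(
    ["A", "B", "C", "D"], surface.get, vertex.get, keep=2
)
```

Note that model "C" has the highest vertex score in the toy data but is never examined at the fine stage, because the coarse stage has already discarded it. This is where the time saving of the strategy comes from.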
II. HOPFIELD NETWORKS FOR
IMAGE MATCHING
A Hopfield net is built from a single layer of neurons,
with feedback connections from each unit to every
other unit (although not to itself). The weights on
these connections are constrained to be symmetrical.
Generally, a problem to be solved by a Hopfield net
can be characterized by an energy function E.
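As a concrete illustration of such a network, the sketch below uses the textbook discrete Hopfield model with bipolar states and the standard energy E = -1/2 * sum_ij w_ij s_i s_j. It is not the matching energy function of this paper: the Hebbian weight rule, the function names, and the five-unit pattern are assumptions chosen only to show symmetric weights, the absence of self-connections, and the decrease of energy under asynchronous updates.

```python
def energy(weights, state):
    # Textbook Hopfield energy: E = -1/2 * sum_ij w_ij * s_i * s_j.
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def hebbian_weights(pattern):
    # Symmetric weights with a zero diagonal (no unit feeds back to itself).
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]


def relax(weights, state, max_sweeps=10):
    # Asynchronous updates: each unit takes the sign of its local field.
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            local_field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if local_field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # a stable state is a local energy minimum
            break
    return state


stored = [1, -1, 1, -1, 1]   # made-up pattern to store
w = hebbian_weights(stored)
noisy = [1, -1, -1, -1, 1]   # stored pattern with one unit flipped
recovered = relax(w, noisy)
```

With symmetric weights and a zero diagonal, an asynchronous update never raises the energy, so the network settles into a stable state.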
Through minimizing the energy function, an optimal
(or near optimal) solution is ultimately reflected in
the outputs of the neurons in the network. The ap-
plications of the Hopfield net are multifarious. In
[10], object recognition is based on subgraph match-
ing. The graph matching technique is formulated as
an optimization problem where an energy function is
minimized. The optimization problem is then solved
by a discrete Hopfield network. In [11], a Hopfield
network realizes a constraint satisfaction process to
match visible surfaces of 3-D objects. In [12], the
object recognition problem is cast as an inexact
graph matching problem and then formulated in
terms of constrained optimization. In [13], the
problem of constraint satisfaction in computer vision
is mapped to a network where the nodes are the hy-
potheses and the links are the constraints. The net-
work is then
hypotheses w
In this par
is in the forn
of the array :
and the colu
model. The
similarity be
and the othe
process can t
ing energy fi
E =—
N
4
* SM
where V is
1. 0 if the «tt
node in the
0. The first
The second :
the uniquene
ject model e
input image
neurons in e
jor compone
strength of i
à column k
pressed in te
F(z
where 0 is a
pertaining t
this paper,
will be usec
they will be
IV , respecti
where z, is
umn k, y, i
ume /, and
Ws is equa
can be sim
form of a H
1
E
where Z, =
where d,,=
Matching
faction pro
from the in
for local fea