A detailed description of the developed algorithms is presented
in Section 2 of this paper. Results obtained on synthetic and real
datasets are shown in Section 3.
2. ITERATIVE POSE ESTIMATION ALGORITHM
BASED ON CONTOUR TEMPLATES
2.1 Distance between image and projected contours
Assuming that an approximate initial pose estimate is available, the
model is projected onto the image plane, and a contour
representation of the projection is obtained. This contour, which
is one pixel thick, is called a contour template. The pose
estimation problem consists in finding the pose parameters (camera
position specified by three rotation angles and three
coordinates) which minimize the difference between the
contour template and the image. In this work we present an
iterative pose estimation algorithm based on the distances between
the model's projected contours and image edges.
The first step is to obtain the projected contour of the model.
Since the 3D model of the object is known, this task is
performed using z-buffering and rasterization algorithms. Next,
at each point belonging to the projected contour we compute
the distance from that point to the nearest edge in the image.
The search for the nearest edge is performed along the line
parallel to the normal vector at the current contour point. Only
image edges which are parallel to the projected contour are
considered. To achieve this, the image is convolved with a bank
of filters which accentuate edges of a particular direction. In this
work we quantize the search directions into four bins
corresponding to 0°, 45°, 90° and 135°. The video frame is thus
convolved with four 7×7 filters whose kernels are equal to the
Gaussian derivative along the x-coordinate rotated to the
direction $\theta$ (Geusebroek, J.-M., 2003). Different parts of the ISS are
accentuated by different directional filters.
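The filter bank described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function names, the Gaussian scale (sigma = 1.0) and the use of scipy's generic convolution are our own assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def oriented_gaussian_derivative_kernel(theta_deg, size=7, sigma=1.0):
    """7x7 kernel: first Gaussian derivative along x, rotated to theta_deg."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.deg2rad(theta_deg)
    # Rotate coordinates so the derivative is taken across direction theta.
    u = xs * np.cos(theta) + ys * np.sin(theta)
    v = -xs * np.sin(theta) + ys * np.cos(theta)
    g = np.exp(-(u**2 + v**2) / (2 * sigma**2))
    return -u / sigma**2 * g  # derivative of the Gaussian along u

def directional_edge_maps(image):
    """Convolve the frame with the four oriented kernels (0, 45, 90, 135 deg)."""
    return {a: convolve(image.astype(float),
                        oriented_gaussian_derivative_kernel(a))
            for a in (0, 45, 90, 135)}
```

A vertical intensity step (gradient along x) responds most strongly to the 0° filter and is nearly invisible to the 90° one, which is what allows the later search step to consider only edges parallel to the projected contour.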
The contour-based pose estimation algorithm originates from
the works of Lowe (Lowe, D. G., 1991) and (Vacchetti, L., et
al., 2004). For each point $(x_c, y_c)$ located on the contour
template we inspect a set of image points
$X_j(x_c, y_c; \theta) = (x_c + j\cos\theta,\; y_c - j\sin\theta)$, $j \in [-R, R]$. The
points are located on both sides of the contour, on the straight
line $L$ which is perpendicular to the contour at the current
point. The radius of the search neighborhood along the line is
denoted by $R$. At every point $X_j(x_c, y_c; \theta)$ the absolute value
of the previously computed directional derivative along the
direction $\theta$ is analyzed. Points where these values exceed a
threshold are stored in a list. For every point $(x_e, y_e)$ from
the list, we compute the signed distance from $(x_e, y_e)$ to the
line specified by the normal vector with coordinates
$(\cos\theta, -\sin\theta)$ passing through the point $(x_c, y_c)$ (this line is
tangential to the projected contour). We use the screen
coordinate system in which the y-axis points downwards and (0,
0) corresponds to the upper left corner of the image. The search
for the nearest edge is illustrated in Figure 2; the projected
contour is drawn with a dashed line. The signed distance is
computed as

$$d_j = (x_e - x_c)\cos\theta - (y_e - y_c)\sin\theta. \quad (1)$$

The distance is positive if the points $(x_e, y_e)$ and (0, 0) are
located in the same half-plane with respect to this straight line;
otherwise it is negative.
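The search along the normal and the signed distance of Eq. (1) can be sketched as below. This is an illustrative helper, not the paper's implementation; the function name, the edge threshold and the choice to keep the candidate with the smallest absolute distance are our own assumptions.

```python
import numpy as np

def nearest_edge_distance(edge_mag, xc, yc, theta, R=10, thresh=0.1):
    """Signed distance from contour point (xc, yc) to the nearest strong
    edge, searched along the line perpendicular to the contour.
    Screen coordinates: y axis points down, origin at the top-left corner."""
    best = None
    for j in range(-R, R + 1):
        # Candidate point X_j = (xc + j cos(theta), yc - j sin(theta)).
        xe = int(round(xc + j * np.cos(theta)))
        ye = int(round(yc - j * np.sin(theta)))
        if not (0 <= ye < edge_mag.shape[0] and 0 <= xe < edge_mag.shape[1]):
            continue
        if abs(edge_mag[ye, xe]) > thresh:
            # Signed distance to the tangent line through (xc, yc), Eq. (1).
            d = (xe - xc) * np.cos(theta) - (ye - yc) * np.sin(theta)
            if best is None or abs(d) < abs(best):
                best = d
    return best
```

For example, with a strong vertical edge two pixels to the right of a contour point and theta = 0, the returned signed distance is 2.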
The misfit function is the sum of all squared reprojection errors,
i.e. the distances between all projected contour points and their
closest corresponding image edges:

$$F(\mathbf{m}) = \|\mathbf{d}(\mathbf{m})\|^2. \quad (2)$$

The vector $\mathbf{m} = (\omega, \varphi, \kappa, T_x, T_y, T_z)^T$ contains the pose parameters:
the three rotation angles and the three camera coordinates that are to be
found.
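Evaluating the misfit of Eq. (2) is then a single sum of squares over the per-point signed distances (a trivial sketch; the function name is ours):

```python
import numpy as np

def misfit(distances):
    """Misfit F(m) of Eq. (2): sum of squared signed distances d_i(m),
    one per projected contour point."""
    d = np.asarray(distances, dtype=float)
    return float(np.dot(d, d))
```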
Figure 2. Search for the closest edge
2.2 Expressions for contour-based distance and its
derivatives
Before starting the process of iterative pose estimation using the
projected contour template, it is necessary to find the three-
dimensional coordinates of the points whose projections belong to
the contour. The “inverse projections” of the points lying on the
contour are found by tracing imaginary rays from
the camera through the image plane and finding their
intersections with the 3D model, assuming that the pose parameters
are known. In this way, every contour point is represented by a
set of its 2D pixel coordinates $(x_c, y_c)$ and 3D coordinates
$(X_c, Y_c, Z_c)$. Let us assume that the camera (or observer)
coordinates are given by $\mathbf{T} = (T_x, T_y, T_z)^T$. The angles of
rotation around the coordinate axes are $\omega$, $\varphi$ and $\kappa$. The
pixel coordinates $(x, y)$ of a point with homogeneous
coordinates $\tilde{\mathbf{x}} = (X, Y, Z, 1)^T$ can be found according to

$$x = u/t, \qquad y = v/t, \quad (3)$$

$$[u, v, w, t]^T = P V \tilde{\mathbf{x}}, \quad (4)$$

$$V = \begin{pmatrix} R^T & -R^T \mathbf{T} \\ 0\;\;0\;\;0 & 1 \end{pmatrix}. \quad (5)$$
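The projection chain of Eqs. (3)–(5) can be sketched as follows. The paper does not state the composition order of the three rotations, so the order Rz(κ)·Ry(φ)·Rx(ω) and the shape of the projection matrix P (taken as 4×4 so that Eq. (4) holds as written) are assumptions of this sketch.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """R = Rz(kappa) @ Ry(phi) @ Rx(omega); the composition order is assumed."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_point(P, angles, T, X):
    """Pixel coordinates (x, y) of a 3D point X via Eqs. (3)-(5)."""
    R = rotation_matrix(*angles)
    V = np.eye(4)
    V[:3, :3] = R.T
    V[:3, 3] = -R.T @ np.asarray(T, dtype=float)   # Eq. (5)
    x_h = np.append(np.asarray(X, dtype=float), 1.0)
    u, v, w, t = P @ V @ x_h                        # Eq. (4)
    return u / t, v / t                             # Eq. (3)
```

With identity rotation, the camera at the origin, and a simple pinhole-style P whose last two rows both pick out Z, the point (1, 2, 10) with focal length 100 projects to (10, 20), as expected from x = fX/Z.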