Close-range imaging, long-range vision

fundamental 
= 0 with A 
- vector cor- 
9-matrix A, 
The SVD of 
and diagonal 
ned equation 
he condition 
1e column of 
ilar value, of 
the system is 
solution for 
‘2. Using the 
. This results 
'eal solutions 
he Carlsson- 
1995). This 
s. The basic 
being viewed 
is means, the 
be employed 
pyramids we 
the efficiency 
least-squares 
stner operator 
of the match- 
iere no reduc- 
s, is yet avail- 
al least-square 
ipproximation 
relation score 
irselves in the 
'est points and 
n even-spaced 
ental matrices. 
by taking into 
| matrices. For 
cal tensor. On 
n the epipolar 
ositions in the 
rified by least 
Section 4.1). If 
e solution non- 
ley and Zisser- 
1ize the twelve 
and the 24 pa- 
cal tensor. On 
e, We compute 
a rotation ma- 
ey make up the 
[0]. Finally, 
to improve the results, the three parameters of the rotation matrix 
and the two parameters of the translation vector are optimized by the 
non-linear Levenberg-Marquardt algorithm. If no calibration infor- 
mation is available, we assume that both images are taken with the 
same camera without modifications. We further assume that princi- 
pal point equals the image center, that there is no sheer, i.e., zp, = 0, 
yn — 0and s = 0, the focal length c = 1 and the pixels are 
square. We provisionally calibrate the fundamental matrix with these 
assumptions and when optimizing by the Levenberg-Marquardt al- 
gorithm we vary two additional parameters describing c as well as 
the relation of the width and the height of a pixel. 
The approach was implemented in C++ using the linear algebra 
package LAPACK interfaced by the template numerical toolkit 
(TNT; math.nist.gov/tnt). For image processing we employ HAL- 
CON (www.mvtec.com). A result for the estimation of the trifocal 
tensor with a quadratic search space covering 50% of the image area 
is presented in Figure 2. 
5 DISPARITY ESTIMATION 
To utilize the point transfer described in Section 3, the coordinates of 
homologous points need to be known. We generate them by disparity 
estimation. Opposed to optical flow estimation we take into account 
that points can only be homologous if they are on the corresponding 
epipolar line in the other image. This constraint is taken into account 
by epipolar resampling. 
Our approach for disparity estimation can deal with strong occlu- 
sions and large disparity ranges also for details, i.e., very tiny struc- 
tures in the image. It is built upon (Zitnick and Kanade, 2000). 
The main idea is a cooperation between support and inhibition 
(cf. Fig. 3). The support region is a 2D- or usually 3D-box. All 
matching scores in this box corroborate to generate a disparity map 
which is locally continuous. When employing a 3D-box also sloped 
regions are implicitly modeled. 
Support Inhibition 
disparity 
   
—_— ——— 
width width 
Figure 3: Support and inhibition 
Inhibition enforces the uniqueness of a match. Assuming opaque 
and non-reflecting surfaces, a ray of view emanating from a cam- 
era will hit the scene only at one point. The idea is to gradually 
weight down all matches on a ray of view besides the strongest. For 
a stereo pair there are two rays (cf. Fig. 3). We store the matching 
scores in a 3D array where for every pixel in the left/reference im- 
age the matching score is stored as third dimension (cf. also Fig. 4). 
Therefore, for the left image the ray of view is just a column in the 
2D-slice of width and disparity. Because we work in disparity and 
not in depth space, the ray of view of the right image consists of the 
45? Jeft-slanted diagonal going through the pixel of interest. Putting 
everything together, the support Sy, for a pixel at row r and column 
c with disparity d is defined as 
TTT EN EZ 
/ 
Generation 
staff sheet box 
| (mete | | Hor 
height 
| | Update 
pue bv 8 (11 à 
fi EF » LB p 
S 
Et s cd 
Edd subtract f add £7 
Figure 4: Separation of 3 x 3 x 3 box: Generation and update 
Ls(r 4- r',ct cdd), 
Sn(r,c,d) = y 
(r',c',d')ee 
with L,, the score for the preceding iteration and ® the support re- 
gion (usually a box). The new score for iteration n + 1 is obtained 
as 
œ 
Sy, (ried) 
, 
Sn (r^. en d") 
(r^ c^ ,d")ev 
  
Lt (7, C, d) = Lo(r, C, d) 3 
with V the union of the left and right inhibition region and o an 
exponent controling the speed of convergence. o has to be chosen 
greater than 1 to make the scores converge to 1. The multiplication 
with the original matching scores Lo avoids hallucination in weak 
matching areas. Finally, for each pixel of the left image the disparity 
is chosen which has the maximum score. Practically, it is important 
to correct the inhibition value for the fact that on the left and the right 
side of the image a number of pixels depending on search width 
and offset are not matched and therefore do not contribute to the 
inhibition. 
A critical parameter is the disparity range as it defines the search 
space in the image. We automatically estimate it from the mean 
and the standard deviation of the disparities for a number of sample 
lines. This works relatively robustly and considerably reduces the 
computation time. For the matching scores we employ (normalized) 
cross-correlation optimized by separating the operations for mean 
and variance computation. 
Filtering with a 3D box-filter based on simple summation is highly 
redundant. Even though this is true also for 2D, we restrict the pre- 
sentation to 3D. To get rid of this redundancy, we separate the filter 
into one-dimensional (1D) staffs and 2D sheets. By adding pixels 
on top of each other we generate staffs (cf. Fig. 4). From them we 
build sheets and finally from the sheets the box. The important part 
is updating. To filter with a translated box, instead of adding sheets 
we add a (new) sheet on one side and subtract the (old) sheet on the 
other side. The same can be done for the sheets and the staffs. By 
this means the complexity becomes independent of the size of the 
box. 
We have also automated the setting of one important parameter, 
namely the number of iterations, by mapping it to the problem 
of convergence determination. For the latter also parameters are 
needed, but our empirical investigations show that they are relatively 
independent from the images at hand. To determine the convergence, 
we compute the difference image of the disparity maps from the two 
—195—
1
2
...
208
209
210
211
212
...
640
641
Full text: Close-range imaging, long-range vision

Access restriction

Copyright

Note to user