[Figure residue: plot legend listing matching costs, path costs 1 to 8, and sum costs.]
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
[Figure 3: block diagram with two stages. Matching Cost Calculation: rank (2) or census (3) transform of both input images, followed by the initial cost computation. Semi-Global Matching: path cost calculation (5), (6), disparity selection for the left (7) and right (8) image, and a left/right check that yields the disparity map D(p).]
Figure 3: Processing steps for disparity estimation using rank transform/census transform and semi-global matching.
3 EVALUATION AND RESULTS
The four evaluated penalty functions are:
(a) empirically determined constant value, i. e.
$P_{2a} = \text{const.}$ (9)
(b) negatively proportional to the absolute luminous intensity gradient
of the currently processed pixels along the path, i. e.
$P_{2b} = -\alpha \cdot |I(\mathbf{p}) - I(\mathbf{p}-\mathbf{r})| + \gamma$ (10)
(c) inversely proportional to the absolute luminous intensity gradient
of the currently processed pixels along the path. This follows the
original proposal from SGM, i. e.
$P_{2c} = \frac{\alpha}{|I(\mathbf{p}) - I(\mathbf{p}-\mathbf{r})| + \beta} + \gamma$ (11)
(d) negatively proportional to the variance of the luminous intensity
in a local window, i. e.
$P_{2d} = -\alpha \cdot \operatorname{Var}(N(\mathbf{p})) + \gamma$ (12)
In all cases it has to be ensured that $P_2 \geq P_1$. Therefore, a
lower bound $P_{2,\min}$ is introduced to which the values are clipped.
An upper bound is not required because penalties higher than
$C_{\max} + P_1$ cause that value never to be taken in the outer
min-term in Eq. (5). It follows that (b) does not require a parameter
$\beta$ for a shift in x direction; this is done implicitly by
adjusting $\gamma$. Cases (b) and (c) are based on the hypothesis that
depth changes are often visible as luminance changes. Case (d) is
based on the hypothesis that matching costs in highly structured areas
are highly discriminative and that luminance changes do not occur only
due to object changes.
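The four penalty functions and the lower-bound clipping can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function and argument names are hypothetical, the shifted form of (c) with $\beta$ and $\gamma$ is an assumption inferred from the parameter list, and the population variance is assumed for (d).

```python
from statistics import pvariance


def clip_lower(p2, p2_min):
    """Clip a penalty to the lower bound P2_min (chosen so that P2 >= P1)."""
    return max(p2, p2_min)


def p2_constant(const, p2_min):
    """(a) constant penalty, Eq. (9)."""
    return clip_lower(const, p2_min)


def p2_neg_proportional(i_cur, i_prev, alpha, gamma, p2_min):
    """(b) negatively proportional to the intensity gradient, Eq. (10)."""
    return clip_lower(-alpha * abs(i_cur - i_prev) + gamma, p2_min)


def p2_inv_proportional(i_cur, i_prev, alpha, beta, gamma, p2_min):
    """(c) inversely proportional to the intensity gradient, Eq. (11).

    Assumed form: alpha / (|gradient| + beta) + gamma, with beta > 0
    so that a zero gradient does not divide by zero.
    """
    return clip_lower(alpha / (abs(i_cur - i_prev) + beta) + gamma, p2_min)


def p2_variance(window, alpha, gamma, p2_min):
    """(d) negatively proportional to the local intensity variance, Eq. (12).

    `window` is a flat sequence of intensities around the pixel; using the
    population variance here is an assumption.
    """
    return clip_lower(-alpha * pvariance(window) + gamma, p2_min)
```

Here `i_cur` and `i_prev` stand for $I(\mathbf{p})$ and $I(\mathbf{p}-\mathbf{r})$, so (b) and (c) depend on the path direction, whereas (d) yields the same value for every path through a pixel.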
3.4 Methodology and Middlebury Images
For the first set of experiments the established Middlebury stereo
data set (Cones, Teddy, Venus and Tsukuba) is used (Scharstein
and Szeliski, 2002). These were taken under controlled labora-
tory conditions. Intensity differences and noise are expected to
be minimal. The disparity ranges are 64 px for Cones and Teddy,
32 px for Venus, and 16 px for Tsukuba. Each penalty function
is parametrized for each image with both matching cost func-
tions for 4 and 8 paths. The resulting disparity maps are eval-
uated by counting the number of erroneous disparities in non-
occluded areas. An erroneous disparity differs by more than a
defined threshold from ground truth. Two thresholds are considered:
$|\Delta| > 1$ px and $|\Delta| > 0.5$ px. Percentages stated in the
following are the share of erroneous pixels among all non-occluded
pixels (not the entire image). Ignoring occluded areas, i. e. where
disparities cannot be computed, allows focusing on the performance of
the disparity estimation algorithm rather than any post-processing
steps. Otherwise, the results would be biased by the
quality of the hole interpolation algorithm. For the same reasons
no post-processing steps are applied to the disparity maps.
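The error measure described above can be sketched as follows. This is a minimal illustration under assumptions: the function name and the representation of the disparity map, ground truth, and non-occlusion mask as nested lists are hypothetical and not taken from the paper.

```python
def error_rate(disparity, ground_truth, non_occluded, threshold=1.0):
    """Percentage of non-occluded pixels whose disparity is erroneous.

    A pixel counts as erroneous if its disparity differs from ground truth
    by more than `threshold` (in pixels). Occluded pixels are skipped, so
    the percentage refers to non-occluded pixels only.
    """
    total = 0
    bad = 0
    for d_row, g_row, m_row in zip(disparity, ground_truth, non_occluded):
        for d, g, visible in zip(d_row, g_row, m_row):
            if not visible:
                continue  # occluded: no disparity can be computed here
            total += 1
            if abs(d - g) > threshold:
                bad += 1
    if total == 0:
        raise ValueError("mask contains no non-occluded pixels")
    return 100.0 * bad / total
```

Calling the function once with `threshold=1.0` and once with `threshold=0.5` gives the two error counts considered in the text.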
The first set of experiments aims to answer the following questions:
Is there a clear favorite among the penalty functions? How sensitive
is the performance to the parametrization of the penalty function? Is
the parametrization robust across different images taken with
different setups and cameras? These questions are relevant for
real-world systems, since insensitivity to non-optimal parametrization
and to camera-imposed differences is mandatory.
Fig. 4 shows the results computed with census and 8 paths for
the four test images as the parametrization of each function is
changed. The parameter configurations for each penalty function are
sorted by increasing error and the best 100 configurations are shown.
The parameters of each function ($P_1$, $P_{2,\min}$, $\alpha$,
$\beta$, and $\gamma$) are changed systematically, with step sizes
large enough to yield sufficiently different configuration sets and
small enough not to miss local minima.
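The systematic parameter variation and the ranking of configurations can be sketched as an exhaustive grid sweep. This is only an illustration: `evaluate` is a hypothetical callback that runs the stereo pipeline for one configuration and returns its error percentage, and the grid values are placeholders rather than the step sizes used in the experiments.

```python
from itertools import product


def sweep(evaluate, grid, keep=100):
    """Evaluate every parameter combination and return the best `keep`.

    `grid` maps a parameter name to the list of values to try;
    `evaluate` maps one configuration (a dict) to an error percentage.
    The result is a list of (error, configuration) pairs sorted by
    increasing error.
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((evaluate(config), config))
    results.sort(key=lambda pair: pair[0])
    return results[:keep]
```

With a grid such as `{"P1": [...], "P2_min": [...], "alpha": [...], "gamma": [...]}`, the returned list corresponds to one sorted curve of the kind plotted per penalty function.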
Setting $P_2$ constant performs well if carefully adjusted to the
particular image, but quality degrades quickly as the value is
changed. The adaptive functions $P_{2b}$ and $P_{2c}$ perform
significantly better, with up to 1 percentage point improvement. Both
are comparable in quality; which one is superior depends on the
particular image, and the difference is minimal. The variance-based
approach performs significantly worse than the other adaptive
approaches and sometimes even worse than the fixed approach. This
could be due to the fact that $P_{2d}$ does not calculate penalties
along the currently processed path but from the local window, giving
the same penalty value for all path directions. For the census-based
matching costs, $P_{2b}$ and $P_{2c}$ are the best functions.
The second row of Fig. 4 shows the data re-grouped according
to penalty function, this time over all configurations analyzed.
All functions are insensitive to a certain degree of non-optimal
parametrization to the image content. However, it is also clear
that good parametrization is essential for obtaining the maximum
of correct information.
The third row of Fig. 4 assesses whether optimal configurations
coincide from image to image. The configurations are now ordered
according to the parameter values, and identical configurations are at
the same x-position. Clearly, the performance of a particular
configuration coincides across all images. Further, the best
configuration for one image is usually found for the other images when
allowing a minimal error margin of 0.5 percentage points. When going
from 8 paths to 4 paths (data not shown), the same observations and
conclusions can be made with just slightly increased error counts. For
half-pel error thresholds there are no changes in configurations (data
not shown).
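The cross-image comparison can be expressed as a set intersection: for each image, collect the configurations whose error lies within the margin of that image's best result, then keep those common to all images. The sketch below assumes a hypothetical data layout (a dict per image mapping a configuration label to its error percentage) and is not the authors' evaluation code.

```python
def near_optimal(errors, margin=0.5):
    """Configurations within `margin` percentage points of the best error."""
    best = min(errors.values())
    return {cfg for cfg, err in errors.items() if err - best <= margin}


def shared_configurations(errors_per_image, margin=0.5):
    """Configurations that are near-optimal for every image.

    `errors_per_image` maps an image name to a dict that maps a
    configuration label to its error percentage on that image.
    """
    if not errors_per_image:
        raise ValueError("no images given")
    sets = [near_optimal(errs, margin) for errs in errors_per_image.values()]
    return set.intersection(*sets)
```

A non-empty result means that at least one parametrization stays within the margin of the optimum on all images, which is the robustness property the third row of Fig. 4 is used to assess.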
Results employing the rank transform are shown in the fourth row of
Fig. 4. Error counts for best performance are always slightly higher