The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B5. Beijing 2008
accurate than other approaches that account for the projective
distortion due to the orientation of the imaged surface (Zabulis,
2007). For this reason, approaches that employ sweeping in
multiple directions (Mordohai et al., 2007) or refine an initial
estimation obtained by space-sweeping (Zabulis and Kordelas,
2006) have been proposed.
The proposed technique, based on spherical sweeping, provides
higher reconstruction accuracy, especially in the periphery of
the images (see Zabulis (2007) for an explanation) and, thus,
the available images are more efficiently utilized. In addition, a
memory-conserving extension is made to the conventional
space-sweeping approaches. This extension also facilitates the
acceleration of the methods, based on a coarse-to-fine depth
map computation. The importance of memory conservation is
twofold. First, the memory of conventional PCs is insufficient
to process high-resolution images and using virtual memory
renders the process extremely slow. Second, state-of-the-art
approaches to stereo reconstruction utilize the graphics
hardware to process large amounts of data
(Mordohai et al., 2007).
The sweeping procedure, which is similar to plane-sweeping, is
summarized here briefly. For each depth the images are
backprojected onto the backprojection surface and locally
compared. The output of this comparison is a similarity image
S_i at each depth, whose size is equal to that of the
backprojection surface. At each iteration i, the pixels in S_i are
compared to their corresponding pixels in S_{i+1} and S_{i-1}. As depth
increases, the values for a point in the similarity image
correspond to locations along a ray of visibility from the
cyclopean eye. The strongest local similarity maximum along
each such ray is selected as the optimum depth. The
requirement for maxima to be local is used to avoid artifacts
that may occur in the textureless areas of the input images.
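The selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sweep_depth_indices` and the flattened list-of-scores layout are assumptions made for the example, and it keeps only three similarity images in memory at a time, which is one possible reading of the comparison between neighbouring depths.

```python
def sweep_depth_indices(similarity_images):
    """Return, per pixel, the depth index of the strongest strict local
    maximum of similarity, or None where no local maximum exists.

    similarity_images: iterable of equal-length lists, one list of
    similarity scores per swept depth (hypothetical layout).
    """
    best_score, best_index = [], []
    prev = cur = None
    for i, nxt in enumerate(similarity_images):
        if not best_index:
            best_score = [float("-inf")] * len(nxt)
            best_index = [None] * len(nxt)
        if prev is not None:
            # cur holds depth i-1; prev and nxt are its two neighbours.
            for p, score in enumerate(cur):
                is_local_max = score > prev[p] and score > nxt[p]
                if is_local_max and score > best_score[p]:
                    best_score[p] = score
                    best_index[p] = i - 1
        prev, cur = cur, nxt
    return best_index
```

A pixel whose similarity profile is flat, as may happen in a textureless area, has no strict local maximum and is left unassigned (None) rather than given an arbitrary depth.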
Memory conservation is achieved by tessellating the
backprojection image into, say, k x k equal spherical segments.
This tessellation is parameterized along the two spherical
coordinates that, also, correspond to image width and height.
The sweeping algorithm is performed independently for each
such partition. These partitions overlap slightly, in order to
avoid “blocking artifacts” at their boundaries. The amount of
overlap is exactly determined by the size of the comparison
kernel so that a scene point is not reconstructed twice.
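As an illustration of such a tessellation, the sketch below splits a backprojection image into k x k segments and pads each one by half the comparison kernel. The padding rule (kernel half-width) and the name `overlapping_tiles` are assumptions for the example; the text only states that the overlap is determined by the kernel size.

```python
def overlapping_tiles(width, height, k, kernel_size):
    """Split a width x height grid into k x k tiles.

    Each tile is a pair (column_span, row_span); each span is
    (core_start, core_end, padded_start, padded_end) with exclusive ends.
    Cores partition the grid exactly, so every pixel is reconstructed once;
    the padding only supplies the kernel with neighbouring pixels.
    """
    half = kernel_size // 2  # assumed overlap: kernel half-width

    def spans(length):
        out = []
        for t in range(k):
            core_start = t * length // k
            core_end = (t + 1) * length // k
            out.append((core_start, core_end,
                        max(0, core_start - half),
                        min(length, core_end + half)))
        return out

    return [(cs, rs) for rs in spans(height) for cs in spans(width)]
```

Only the core of each tile would be written to the output depth map, so the overlap does not cause any scene point to be reconstructed twice.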
The acceleration of the space-sweeping approach is based on an
iterative and coarse-to-fine approach that is combined with the
above memory conservation technique. The image data in each
iteration are obtained from traditional image pyramids of the
input images, starting from the smallest image of the pyramid
and advancing a layer in each iteration; at the last iteration the
original image is utilized. Also in each iteration, the
parameterization of the backprojection surface becomes denser.
As described above, the backprojection surface is tessellated
and the sweeping algorithm is executed independently for each
segment. At each iteration, though, each spherical segment is
re-segmented into k x k more segments. After the 2nd iteration,
the range of evaluated depths (d_i) is drastically constrained,
based on the reconstruction result previously obtained for the
“parent” segment.
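One way to constrain the depth range of a child segment from its parent's result is sketched below. The safety margin, the fallback to the full range, and the name `child_depth_range` are assumptions made for illustration; the text does not specify how the constrained range is derived.

```python
def child_depth_range(parent_depths, margin, d_min, d_max):
    """Return the (near, far) depth interval to sweep for a child segment.

    parent_depths: depths reconstructed for the parent segment in the
    previous iteration, with None for pixels that were not reconstructed.
    margin: hypothetical safety margin added on both sides.
    d_min, d_max: the full sweeping range, used as a fallback and a clamp.
    """
    valid = [d for d in parent_depths if d is not None]
    if not valid:
        # Nothing was reconstructed in the parent: sweep the full range.
        return (d_min, d_max)
    near = max(d_min, min(valid) - margin)
    far = min(d_max, max(valid) + margin)
    return (near, far)
```

For example, `child_depth_range([4.0, None, 5.5], 0.5, 1.0, 20.0)` narrows a sweep over [1.0, 20.0] to (3.5, 6.0), so far fewer depths need to be evaluated at the finer resolution.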
The obtained depth map is filtered very conservatively (as in
Mulligan et al., 2004), to suppress artifacts at depth
discontinuities and remove outliers. By doing so, some valid
matches are indeed rejected; however, in the utilized multiview
setup the corresponding points are most likely to be
reconstructed from another binocular pair. The result is
spatially quantized as it is too large (on the order of 10^9 points for 35 views
of 8Mpix each, in this experiment) to fit in memory. To cope
with the same memory limitations, the merging process is performed
volumetrically, by tessellating the reconstruction volume into
cubical segments. Finally, a thin plate interpolating surface is fit
(Carr et al., 2001), to yield a mesh that is output in the VRML or
KML formats.
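The spatial quantization mentioned above could, for instance, be realized by binning points into cubical cells and keeping one representative per cell. The sketch below uses the cell centroid; this choice, the `cell_size` parameter, and the name `quantize_points` are assumptions, since the text does not describe the quantization rule.

```python
import math
from collections import defaultdict


def quantize_points(points, cell_size):
    """Map 3D points to cubical cells and return one centroid per cell.

    points: iterable of (x, y, z) tuples.
    Returns a dict from integer cell index (ix, iy, iz) to the centroid
    of the points that fall into that cell.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {key: (sx / n, sy / n, sz / n)
            for key, (sx, sy, sz, n) in sums.items()}
```

Because each cell is processed independently, the same indexing can also be used to handle the reconstruction volume one cubical segment at a time.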
In Figure 2, the proposed method is demonstrated for the Dion
(Greece) archaeological site. In the experiments presented in
this paper, images were 2448 x 3264, 16-bit per layer, color
images acquired with a Canon Powershot SLR camera, the
number of iterations was 5 and the initial tessellation was 3x3.
The coarse-to-fine refinement factor was 2, so that in each
iteration: (a) the image rows and columns of the stereo and
backprojection images were doubled and (b) the number of
segments was increased by a factor of 4. The above scheme was measured
to provide a speedup factor of ~50 for the scene of this experiment.
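The schedule implied by these parameters can be tabulated with a short script. It is a sketch under the assumption that the last iteration uses the original image size and that each earlier iteration halves both dimensions; the function name `coarse_to_fine_schedule` is hypothetical.

```python
def coarse_to_fine_schedule(full_size, initial_k, factor, iterations):
    """List image size and segment count for each coarse-to-fine iteration.

    full_size: (columns, rows) of the original image, used at the last
    iteration. initial_k: the initial tessellation is initial_k x initial_k.
    factor: refinement factor applied per dimension at each iteration.
    """
    cols, rows = full_size
    schedule = []
    for it in range(iterations):
        scale = factor ** (iterations - 1 - it)  # downsampling at this step
        k = initial_k * factor ** it             # segments per dimension
        schedule.append({
            "iteration": it + 1,
            "image_size": (cols // scale, rows // scale),
            "segments": k * k,
        })
    return schedule


for step in coarse_to_fine_schedule((2448, 3264), 3, 2, 5):
    print(step)
```

Under these assumptions the first iteration works on 153 x 204 images with 9 segments, and the fifth on the full 2448 x 3264 images with 2304 segments.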
Figure 2. Coarse-to-fine acceleration scheme, for space
sweeping methods. Top row shows the reconstructions for the 3
first iterations of the proposed procedure. In the middle-left an
original image from a ~40cm baseline stereo pair (left) is shown.
Others are the views of the RBF interpolated reconstruction
with and without texture mapping.
Figure 3 shows the result of an experiment that compares the
reconstructions obtained from the proposed method in the Harris
and SIFT conditions of the previous section. The images in the
first 2 rows show the result of the reconstruction for an early
frame (20 views): in the SIFT condition, a larger proportion of
the scene is reconstructed. The last row shows the result of the
SIFT condition after 35 frames.