International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B5. Istanbul 2004
RANSAC estimation of corresponding matches yield a more
reliable recovery of the camera structure.
Moreover, if the left and right shots are simultaneous, the
epipolar geometry between the images is unchanged through
the whole sequence: a first calibration of the relative camera
leads to a precise and reliable computation of the fundamental
matrix that can be used to “robustify” the matching in a guided
procedure. As the relative orientation and baseline length are constant
along the sequence, the Euclidean reconstruction achieved
using this information makes it possible to join different image subsets
and eliminates the scale factor ambiguity.
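The removal of the scale ambiguity can be illustrated with a minimal sketch. It assumes a reconstruction known only up to scale and rescales it so that the distance between the left and right projection centres equals the known baseline (about 1.7 m for the synchronous pair). The function name and data layout are hypothetical and are not taken from the implementation described here.

```python
import math

def rescale_to_baseline(points, left_centre, right_centre, baseline_m):
    """Scale 3D points so that the stereo baseline has its known metric length.

    points: list of (X, Y, Z) tuples in an arbitrary-scale model frame.
    left_centre, right_centre: projection centres in the same frame.
    baseline_m: known baseline length in metres.
    Returns the rescaled points and the scale factor applied.
    """
    model_baseline = math.dist(left_centre, right_centre)
    if model_baseline == 0.0:
        raise ValueError("projection centres coincide; scale is undefined")
    scale = baseline_m / model_baseline
    scaled = [tuple(scale * c for c in p) for p in points]
    return scaled, scale
```

Because the baseline is the same for every synchronous pair, the same factor fixes the scale of each sub-block independently, which is what allows different image subsets to be joined.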
Merging the two approaches greatly improves the
performance of the procedure: while the "along sequence" pair leads
to a great number of correspondences (most of which are correct),
the epipolar constraint arising from the "across sequence" pair
tends to eliminate some ambiguities (e.g. points on the road
markings) or troublesome wrong matches (e.g. on building facade
texture). This merging is performed through a trifocal tensor
estimation using a robust algorithm. The resultant sub-block
(using two consecutive trifocal tensor computations we obtain a
symmetric four-image block) has a balanced size: the distance
between consecutive frames is approximately 4 m, while the
baseline of a synchronous pair is about 1.7 m. Since the camera
pose is estimated within this symmetric configuration,
matched points can also be found in the vicinity of the
projection centres (thus filling the image format more uniformly);
using two trifocal tensors provides a strong filter and leads to a
better constrained and more redundant bundle block adjustment at the
end of the pipeline; finally, though less precise, points far away
tend to be tracked along many images of the sequence,
providing ties between sub-blocks.
4. FIRST TESTS AND RESULTS
In this section we present a summary of the results achieved
during the test procedure. Although the testing of the
implementation cannot yet be considered complete, we
processed a fairly representative set of images in terms of
arrangement of the images in the processing sequence, road
traffic, vehicle speed and scene background. Given the
variety of situations along roads, no definitive conclusion can
be drawn yet, but the tests have been useful to understand how and
when satisfactory results can be obtained using the proposed
approach. In the first part of this section we illustrate results on
a block along a small countryside road around Parma, as
shown in Figure 4; then we evaluate, with a simulated data set,
the error propagation along a sequence of about 250 m and how
point or camera constraints can improve the motion and
structure computation.
4.1 Matching procedure and metric reconstruction
As already mentioned, all the tests were performed using two digital
cameras (Basler A101f) with an 8 mm focal length and a
resolution of 1300x1030 pixels. Since the camera lenses we use
produce strong barrel distortion in the images, we first
determined their calibration parameters with a purpose-built test
field. The estimated distortion model is accurate to within 0.5 pixel.
A good camera distortion model is necessary because
attempts at automatic self-calibration never gave the desired
and expected results.
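The form of the distortion model is not specified here. As an illustration only, the sketch below assumes a purely radial polynomial model with coefficients k1 and k2 applied to normalised image coordinates, and removes the distortion by fixed-point iteration. The function names and the choice of model are assumptions. In this convention a negative k1 produces barrel distortion.

```python
def distort(x, y, k1, k2=0.0):
    """Apply an assumed radial model: (x, y) -> (x, y) * (1 + k1 r^2 + k2 r^4)."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2=0.0, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the radial
    factor evaluated at the current estimate. Converges for the moderate
    distortion values typical of normalised coordinates.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y
```

Correcting the feature coordinates in this way before the epipolar and trifocal estimation keeps the projective relations between the images valid.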
About 2000 feature points were extracted in every image
of the sequence using the Harris operator. The number of features
to accept was determined, considering the camera resolution
and quality, as a compromise: with too many points
the epipolar and trifocal estimation may lead to uncertain
results, since points that are too close to each other give rise to
ambiguous matches; with too few points, on the contrary,
their ground distribution is poor and the camera pose is affected
by a weak geometry.
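The compromise described above can be sketched as a greedy selection that keeps the strongest corners while enforcing a minimum separation between them. This is a common strategy, shown here as an assumption and not as the selection rule actually used. The function name, the parameters and the (x, y, strength) layout are hypothetical.

```python
def select_features(candidates, max_points, min_dist):
    """Keep at most max_points corners, strongest first, at least min_dist apart.

    candidates: list of (x, y, strength) tuples, e.g. Harris responses.
    """
    kept = []
    min_d2 = min_dist ** 2
    # Visit candidates from the strongest response to the weakest.
    for x, y, s in sorted(candidates, key=lambda c: c[2], reverse=True):
        if len(kept) >= max_points:
            break
        # Reject a corner that lies too close to an already accepted one.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_d2 for kx, ky, _ in kept):
            kept.append((x, y, s))
    return kept
```

A larger min_dist reduces ambiguous matches between neighbouring points, while a larger max_points improves the distribution of points in the image.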
Through the disparity threshold a first putative matching is
computed between the images: though the along sequence
approach tends to give more correspondences than the across
sequence one, the difference is negligible (see table 5).
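A putative matching restricted by a disparity threshold can be sketched as follows. The similarity measure is not stated here, so the sketch assumes a normalised cross-correlation between small grey-value patches stored as flat lists. All names and the default score threshold are hypothetical.

```python
import math

def ncc(a, b):
    """Normalised cross-correlation of two equally long grey-value lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0

def putative_matches(feats_a, feats_b, max_disparity, min_score=0.8):
    """Match features whose image coordinates differ by at most max_disparity.

    feats_a, feats_b: lists of (x, y, patch) tuples.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, (xa, ya, pa) in enumerate(feats_a):
        best_j, best_score = None, min_score
        for j, (xb, yb, pb) in enumerate(feats_b):
            # The disparity threshold limits the search window.
            if abs(xb - xa) > max_disparity or abs(yb - ya) > max_disparity:
                continue
            score = ncc(pa, pb)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches.append((i, best_j))
    return matches
```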
The main difference between the two approaches arises during
the first outlier filtering: here, the perspective differences
between left and right images lead to a more difficult matching
of the putative correspondences; with a least squares
matching approach, the differences arising from the disparity in
the viewing angle may be taken into account and more matches
might be obtained.
In order to limit computation time, for the across sequence pairs the
epipolar geometry is evaluated only on the first pair using a
robust algorithm; then, in the other pairs, a guided matching
procedure is performed using the estimated fundamental matrix
between left and right images.
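The guided step can be sketched as a test on the distance between a candidate point in the right image and the epipolar line of its left image partner. The sketch assumes the convention x_right^T F x_left = 0, with F stored as a 3x3 nested list. The function names and the default tolerance in pixels are hypothetical.

```python
import math

def epipolar_distance(F, x_left, x_right):
    """Distance of x_right from the epipolar line F * x_left, in pixels."""
    x, y = x_left
    # Epipolar line (a, b, c) in the right image.
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line, treat as a failed test
    u, v = x_right
    return abs(a * u + b * v + c) / norm

def guided_filter(F, matches, max_dist=1.5):
    """Keep only the (x_left, x_right) pairs consistent with F."""
    return [m for m in matches if epipolar_distance(F, m[0], m[1]) <= max_dist]
```

For example, with the fundamental matrix of an ideal rectified pair, F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], the distance reduces to the difference between the row coordinates of the two points.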
Figure 1. Matched points between left and right image of a
synchronous stereo pair (across sequence).
Figure 2. Matched points between consecutive left images
(along sequence).
The algorithm proceeds by joining the two epipolar matched
data sets in order to perform the trifocal outlier filtering; the
common points are therefore fewer than those found separately
in the image pairs.
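Joining the two pairwise data sets amounts to keeping the features of the shared image that appear in both. A minimal sketch, assuming matches are stored as pairs of feature indices and that the left image is the one shared by the across and along pairs (names hypothetical):

```python
def join_pairs(across, along):
    """Build three-view correspondences from two sets of pairwise matches.

    across: list of (left_idx, right_idx) matches in the synchronous pair.
    along: list of (left_idx, next_left_idx) matches between consecutive frames.
    Returns (left_idx, right_idx, next_left_idx) triplets.
    """
    right_of = dict(across)
    return [(l, right_of[l], n) for l, n in along if l in right_of]
```

Only features matched in both pairs survive, which is why the number of common points is smaller than in either pair.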
The trifocal tensor manages to eliminate those outliers that
satisfy the epipolar geometry constraints: in Figure 3 we can
see how the parallax effect of the tree branches against the
background of the white building, arising from the different
standpoints of the left and right cameras, satisfies
the epipolar geometry but is detected by the tensor geometry.
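The way a three-view constraint detects such matches can be illustrated with a point transfer test: a point in the first image and a line through its match in the second image are transferred to the third image through the tensor, and the result is compared with the measured point. The sketch below uses the standard point-line-point transfer with cameras [I | 0], [A | a4] and [B | b4]. It is an illustration, not the robust estimation used here. The function names are hypothetical, and the vertical line through the second point is assumed not to coincide with its epipolar line.

```python
import math

def tensor_from_cameras(A, a4, B, b4):
    """Trifocal tensor T[i][j][k] = A[j][i] * b4[k] - a4[j] * B[k][i]."""
    return [[[A[j][i] * b4[k] - a4[j] * B[k][i] for k in range(3)]
             for j in range(3)] for i in range(3)]

def transfer_residual(T, x1, x2, x3):
    """Distance between x3 and the point transferred from (x1, x2) via T."""
    x = (x1[0], x1[1], 1.0)
    # Vertical line through x2; assumed not to be the epipolar line of x1.
    line = (1.0, 0.0, -x2[0])
    p = [sum(x[i] * line[j] * T[i][j][k] for i in range(3) for j in range(3))
         for k in range(3)]
    if abs(p[2]) < 1e-12:
        return math.inf  # transferred point at infinity, treat as outlier
    return math.hypot(p[0] / p[2] - x3[0], p[1] / p[2] - x3[1])
```

With A = B = I, a4 = (1, 0, 0) and b4 = (0, 1, 0), the object point (2, 1, 4) projects to (0.5, 0.25), (0.75, 0.25) and (0.5, 0.5), and the residual is zero. Moving the second point along its epipolar line to (0.9, 0.25) still satisfies the two-view constraint, but the transferred point shifts to (0.5, 0.65) and the match is rejected.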