1981). Since we do not know how many outliers affect our data
set, to reduce computation time we use an adaptive algorithm
(as suggested in Hartley, R., Zisserman, A., 2000) that starts by
assuming a 99% outlier contamination and then updates the
number of iterations required to assure the elimination of all the
outliers (at least for the epipolar constraints). When a first
camera geometry has been established we then try to find some
more correspondences through guided matching (we use again
the cross-correlation algorithm); the final estimate of the
fundamental matrix derives from a least squares solution over all
matches.
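As an illustration of the adaptive termination criterion, the sketch below (in Python, not the actual implementation) computes the number of RANSAC samples needed for a given confidence and inlier ratio; starting from the pessimistic 99% outlier assumption, the count is updated whenever a larger consensus set is found.

import numpy as np

def ransac_trials(inlier_ratio, sample_size=7, confidence=0.99):
    """Number of random samples needed so that, with probability
    `confidence`, at least one sample is free of outliers
    (7-point samples for the fundamental matrix)."""
    p_clean = inlier_ratio ** sample_size          # all-inlier sample
    return int(np.ceil(np.log(1.0 - confidence) / np.log1p(-p_clean)))

# Adaptive use: start from the pessimistic 99%-outlier assumption and shrink
# the trial count every time a larger consensus set for F is found.
n_trials = ransac_trials(0.01)        # astronomically large at first
# ... inside the sampling loop, after a better model is found:
#     n_trials = ransac_trials(best_inlier_count / n_matches)
print(ransac_trials(0.5))   # ~588 samples once half the matches are inliers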
Since the epipolar constraint cannot filter out all false matches
(see next section) the data set undergoes another, more
restrictive, control: joining three (or more) consecutive images
we can estimate the three-view geometry (Hartley, R.,
Zisserman, A., 2000) through a robust algorithm, finding a more
reliable camera reconstruction and getting rid of the remaining
outliers. The tests we carried out and the results published in
the literature indicate that a 99% probability of success in outlier
elimination is reached.
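One simple way to express the three-view consistency test, once camera matrices for the triplet are available, is to triangulate each candidate match from the first two views and check its reprojection error in the third; the hypothetical sketch below follows this scheme, which is equivalent in effect to the trifocal check, although the robust estimation of the three-view geometry itself is not shown.

import numpy as np
import cv2

def triplet_residuals(P1, P2, P3, x1, x2, x3):
    """Transfer-based three-view check: triangulate each match from views
    1-2 and measure its reprojection error (pixels) in view 3.
    P1, P2, P3 are 3x4 projection matrices; x1, x2, x3 are (N, 2) arrays
    of corresponding image points."""
    Xh = cv2.triangulatePoints(P1, P2,
                               x1.T.astype(float), x2.T.astype(float))
    Xh /= Xh[3]                        # normalise homogeneous coordinates
    proj = P3 @ Xh                     # project into the third view
    proj = (proj[:2] / proj[2]).T      # back to pixels, shape (N, 2)
    return np.linalg.norm(proj - x3, axis=1)

# keep only matches that survive the three-view test, e.g. within 1.5 px:
#   mask = triplet_residuals(P1, P2, P3, x1, x2, x3) < 1.5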
2.2.3 Metric reconstruction and bundle adjustment.
Until now we have only determined image point
correspondences, filtering out the wrong ones; we finally recover the
structure and motion of the whole sequence through a self-
calibration approach (as in Pollefeys, M., 1999). Besides, since
in our mobile mapping van we use calibrated cameras, we can
estimate the metric frame of the reconstruction directly
through the use of the essential matrix. The calibrated approach
gives more reliable results (mainly in the error estimation)
even if it leads to larger residuals. Once the metric reconstruction
of the sequence has been achieved, a bundle adjustment of all
the observations leads to an optimal estimate of the whole S&M
(in terms of minimization of a geometric cost function).
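As a minimal sketch of the calibrated (metric) initialisation, assuming the OpenCV library and a known calibration matrix K, the relative pose of an image pair can be recovered directly from the essential matrix as follows; the subsequent bundle adjustment then refines all parameters.

import numpy as np
import cv2

# `pts1`, `pts2` are (N, 2) arrays of matched image points; `K` is the 3x3
# calibration matrix, assumed known from laboratory calibration.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.99, threshold=1.0)
_, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# (R, t) fixes the metric frame of the reconstruction up to the global
# scale; a bundle adjustment over all observations then minimises the
# reprojection error of structure and motion together.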
In order to limit error propagation and the probability of finding
local minima during bundle adjustment, we adopted a
hierarchical approach to compute an initial estimate of the
ground point coordinates and of the exterior orientation parameters
of the cameras. The whole sequence is subdivided into shorter
sub-sequences and the set of points traced in every image of
each sub-sequence is found. The optimal number of sub-
sequences may depend on the problem at hand: our goal is to
ensure that the relative geometry of the cameras along the
sequence changes enough to allow a better intersection of the
homologous rays. Of course, if the changes in attitude between
consecutive images are not smooth, if the scene changes very
quickly (as in curved road sections) or if an object moving fast
through the scene (such as a truck on the opposite lane) occludes
most points in the background, this strategy too may not be
enough. Nevertheless, we found that it normally improves the
quality of the approximations.
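The splitting step can be summarised by the hypothetical sketch below: the sequence is cut into sub-sequences that share one frame at each junction and, within each block, only the points tracked in all of its frames are kept (the block length of 15 frames is just the hand-held value quoted later in the text, not a fixed choice).

def split_sequence(n_frames, block_len=15):
    """[start, end] frame indices of consecutive sub-sequences that share
    one frame at the junction."""
    blocks, start = [], 0
    while start < n_frames - 1:
        end = min(start + block_len - 1, n_frames - 1)
        blocks.append((start, end))
        start = end            # last frame of a block = first of the next
    return blocks

def common_tracks(tracks, start, end):
    """tracks: point id -> {frame index: (x, y)}; keep the ids observed in
    every frame of the sub-sequence."""
    frames = set(range(start, end + 1))
    return [pid for pid, obs in tracks.items() if frames.issubset(obs)]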
In each sub-sequence the trifocal geometry among the first, last
and middle frame is computed, with the rationale that these
three images should have the best relative geometry. A metric
reconstruction is performed through the essential matrix,
yielding by triangulation the coordinates of the common set of
points. Based on that, the exterior orientation parameters of the
intermediate frames and the approximate coordinates of the
remaining points along the sequence are calculated by
alternating resection and intersection, using a linear algorithm
and the unit quaternion as in (Horn, B.K.P., 1987) and (Quan,
L., Lan, Z., 1999). Optionally, a l.s. bundle block adjustment
(Forlani, G., Pinto, L., 1994) with data snooping is
executed to improve the orientation parameters and discard the
remaining outliers.
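A possible form of the resection/intersection alternation is sketched below; for brevity it relies on OpenCV's PnP and triangulation routines rather than on the linear unit-quaternion solution cited above, so it should be read as an illustration of the scheme, not as our implementation.

import numpy as np
import cv2

def resect(K, object_pts, image_pts):
    """Camera pose (R, t) from already triangulated 3D points and their
    measurements in an intermediate frame (object_pts: Nx3, image_pts: Nx2)."""
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None,
                                  flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec

def intersect(K, R1, t1, R2, t2, x1, x2):
    """Triangulate new points seen in two oriented frames
    (t1, t2 are 3x1 translation columns; x1, x2 are (N, 2) image points)."""
    P1 = K @ np.hstack([R1, t1])
    P2 = K @ np.hstack([R2, t2])
    Xh = cv2.triangulatePoints(P1, P2,
                               x1.T.astype(float), x2.T.astype(float))
    return (Xh[:3] / Xh[3]).T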
Finally, all sub-sequences are joined together by using the
points of the last image of each sub-sequence, which is also the
first of the next one. This also propagates the scale of
the metric reconstruction along the whole sequence. Once the
sequence is complete, a final l.s. bundle block adjustment with
data snooping is performed using all images and including all
available information on the object reference system.
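The scale propagation at a junction can be illustrated as follows: the points of the shared frame have coordinates in both adjacent reconstructions, and a robust ratio of their spreads gives the relative scale (a full similarity transformation, including rotation and translation, could be estimated with the quaternion method of Horn cited above). The function below is a hypothetical sketch of this step.

import numpy as np

def relative_scale(X_prev, X_next):
    """X_prev, X_next: (N, 3) coordinates of the same points in the frames
    of two adjacent sub-sequences. Returns the factor that scales X_next
    to the frame of X_prev."""
    d_prev = np.linalg.norm(X_prev - X_prev.mean(axis=0), axis=1)
    d_next = np.linalg.norm(X_next - X_next.mean(axis=0), axis=1)
    return np.median(d_prev / d_next)   # robust to a few residual outliers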
Though the all-purpose implementation of the algorithm has only
recently been completed, tests on image sequences around
buildings as well as along a rock face showed good results,
fairly comparable with those of manual orientation of the same
images. As mentioned above, the number of images in the sub-
sequences may vary depending on the scene characteristics and
on the camera motion: while for movements of a hand-held
camera towards a distant subject we found that 15-20 was a
good compromise, in the MM case it is likely to be much
smaller, as will be discussed in the next section. Although cutting the
sequence indeed complicates the processing a bit, since an
additional step is needed to put the pieces back together, we believe this is
a price worth paying for increased (at least local) stability of the
solution.
3. THE MOBILE MAPPING CAMERA GEOMETRY
As previously pointed out, the geometry of image
acquisition of a mobile mapping system presents some
disadvantages for a general structure and motion reconstruction
algorithm. On the other hand, since the cameras are calibrated and
their relative orientation is known with sufficient accuracy, we
have in fact two overlapping image strips, a fact we can exploit
to eliminate some of the problem's unknowns. We leave open
the possibility for the algorithm to manage sequences along the
motion direction (i.e. a sequence produced by a single camera)
and across the motion direction (i.e. a sequence from a
stereoscopic synchronous system); also, the pipeline structure
of the program allows single camera and stereoscopic
sequences to be merged. This flexibility leads to a great improvement in
performance, because the system gains robustness from the
combination of both approaches.
If we process a sequence of images from a single camera
pointing along the vehicle trajectory, the first difficulty arising
is due to the small overlap between consecutive frames. For
reliability reasons we consider a tracked point as good only if it
has been seen in at least three frames: therefore the accepted
points are almost always located in the middle of the scene,
quite far from the vehicle. Using the procedure described in
Section 2, this would lead to large uncertainties in the
estimation of point coordinates and exterior orientation.
Moreover, the epipolar constraint used to filter outliers often
performs poorly when tracking well defined points along the
motion direction (for instance lane markings): indeed the
epipolar line tends to overlap with the vanishing line of the road
borders, so little discrimination is achieved. This increases the
number of wrong matches, later removed by the trifocal tensor,
in what is often, in the countryside, the best source of interest
points.
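The weakness of the epipolar test in this configuration can be quantified by the angle between the epipolar line of a candidate point and the apparent motion direction of the road markings; the sketch below (illustrative only, with hypothetical names) computes this angle from the fundamental matrix.

import numpy as np

def epipolar_alignment(F, x, motion_dir):
    """Angle (degrees) between the epipolar line of the homogeneous point
    `x` and a 2D image direction `motion_dir` (e.g. towards the road
    vanishing point). Small angles mean poor discrimination, because wrong
    matches sliding along the marking stay close to the epipolar line."""
    a, b, _ = F @ x                      # epipolar line a*u + b*v + c = 0
    line_dir = np.array([-b, a])         # direction along the line
    cosang = abs(line_dir @ motion_dir) / (
        np.linalg.norm(line_dir) * np.linalg.norm(motion_dir))
    return np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0)))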
Despite this, the forward approach also has advantages: in
straight road sections about % of the image frame depicts the
same scene in three consecutive pictures: this generally leads to
good results of the cross-correlation matching procedure (with
more trouble arising because of the increase in scale for points at the
frame bottom).
Across the left and right images of the sequence some other
useful constraints apply: in the normal stereo configuration, the
epipolar lines are almost orthogonal to the vanishing direction
of the road markings: therefore the fundamental matrix and the