International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B5. Istanbul 2004
survey without major GPS outages, for instance along plains or
gently hilly areas and small villages with low-rise buildings.
We noticed indeed that in many cases the loss of lock is very
short in time and space, say from 20 to 100 m. Often we
have a relatively long sequence of small interruptions each
followed by small sections where the ambiguity is recovered for
some tens of metres or less. In other cases, for instance crossing
a tree row, the loss of lock is just 30-50 m long. In such
cases, we may use the information recovered by the image
sequence itself.
Although the most important motivation is to bridge over GPS
outages, by applying the procedure to successfully
georeferenced image sequences, we can improve their
orientation parameters, very much like applying integrated
sensor orientation in aerial blocks (Heipke et al., 2002). This
may allow point restitution also among images of either the left
or the right camera or even multiple collimation, to increase
accuracy and reliability when needed.
Our goal has therefore been to find a robust algorithm capable
of automatically determining the cameras' motion structure along
the whole unreferenced image sequence. To this aim we built upon
the theories and applications extensively developed over the last few
years by the CV community. To our understanding their
application to mobile mapping has not received much attention
(Tao et al., 1999 and Crosilla, F., Visintini, D., 1998 are two
exceptions) but we believe they may be appropriate to solve this
task, provided the loss of lock is not too long. It is well known
indeed that, without ground control or auxiliary information, the
error propagation on a strip is rather unfavourable and the
solution quickly deviates significantly, especially in height.
1.2 The imaging geometry of a mobile mapping
There are a number of issues characteristic of the imaging
geometry of an image sequence taken by a van with a pair of
synchronized cameras: a large variation in depth (or image
scale), a small base, fast moving objects, and so on. They will
be discussed later in the detailed description of the method. It is
clear nevertheless that while for a robust and efficient image
matching and S&M recovery the imperative is to take shots not
too different from one another (i.e. the frame rate has to be
quite high compared to the vehicle’s velocity, especially along
curved paths) the position error will rapidly increase with the
number of images processed. So, we need a method satisfying
both constraints: a robust estimation of the camera poses and
limited systematic errors in the exterior orientation. The basic
geometry of our blocks will therefore be a double strip, with
longitudinal overlap larger than 60-70 percent of the image
format along straight road sections (less on curves), side
overlap of about 80%. The relative orientation of each pair is
known and constant and the strip ends are constrained to the
exterior orientation values provided by the GPS solution (just
before and after the loss of lock). Whenever the loss of lock
lasts too long, some human interaction may be accepted: in
order to constrain the solution of S&M estimation we can bring
in (Crosilla, F., Visintini, D., 1998) point coordinates from a
GIS system. If the number of points in one image is enough,
this may allow a spatial resection; in most cases just a partial
constraint will be enforced, if just a few points are available.
Another (though less reliable) option would be to use the noisy
code solution of the GPS, which may be available along the
sequence.
In the following section we describe how our general S&M
recovery system works; in Section 3 we discuss how we
tailored it to the MM application; finally, in Section 4 we show
and analyze the results obtained during a test session.
2. STRUCTURE AND MOTION RECOVERY FROM
IMAGE SEQUENCES
2.1 Introduction.
The last ten years have witnessed the growth of several methods and
algorithms for recovering structure and motion from an image
sequence, exploiting the geometric relationships between the
images of a sequence and their similarity. The use and
improvement of robust algorithms (MLS, RANSAC,...) capable
of eliminating a large percentage of outliers in a data set, and of
increasingly reliable correlation procedures, has allowed the
development of fully automatic vision systems capable of solving the
S&M problem.
As previously pointed out, these algorithms require that the
images of the sequence do not differ too much in order to
achieve a good match of feature correspondences, which are the
basis for a successful camera pose reconstruction. To limit error
propagation some constraints are usually introduced, such as a
closed sequence around an object; besides, a key element is the
ability to track a consistent number of points along a
sufficiently long section of the sequence, to allow a good
relative geometry among cameras and objects. In developing our
general system for S&M recovery, which largely builds on
the techniques presented in (Hartley, R., Zisserman, A., 2000),
we therefore tried to specialize it in order to take advantage of
some constraints that apply in the mobile mapping case, in an
attempt to overcome some of the restrictions of the general case
and at the same time to optimize the error propagation.
2.2 Robust automatic recovery of structure and motion.
2.2.1 Feature extraction and putative correspondences
evaluation.
The first step in our workflow is the extraction of interest points
from the sequence, possibly ensuring that they can be easily
related to the same image points in other images of the
sequence. We used the Harris operator (Harris C., Stephens M.,
1987) but the Förstner operator also provides good results
(Förstner, W. and E. Gülch, 1997). The algorithm we developed tries
to achieve a uniform distribution of the extracted points over the
image frame, in order to give better results during camera pose
estimation, and rejects points without a sufficient grey value (g.v.)
gradient.
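The text does not say how the uniform distribution is obtained. One common way, shown here only as a minimal sketch, is to divide the image into a regular grid and keep the strongest responses in each cell. The function name `select_uniform`, the grid size, and the `min_response` cut-off are all hypothetical and are not taken from the paper; the interest operator itself (Harris or Förstner) is assumed to have already produced the candidate list.

```python
from collections import defaultdict


def select_uniform(candidates, width, height, cells_x=8, cells_y=6,
                   per_cell=1, min_response=0.0):
    """Keep the strongest interest points in each cell of a regular grid.

    candidates: iterable of (x, y, response) tuples, e.g. Harris responses.
    Points with a response below min_response are rejected outright
    (an assumed stand-in for the "insufficient gradient" test).
    """
    cell_w = width / cells_x
    cell_h = height / cells_y
    buckets = defaultdict(list)
    for x, y, response in candidates:
        if response < min_response:
            continue  # too weak to be matched reliably
        if not (0 <= x < width and 0 <= y < height):
            continue  # outside the image frame
        cx = min(int(x / cell_w), cells_x - 1)
        cy = min(int(y / cell_h), cells_y - 1)
        buckets[(cx, cy)].append((x, y, response))

    selected = []
    for cell in sorted(buckets):
        # Strongest responses first, then keep at most per_cell of them.
        strongest = sorted(buckets[cell], key=lambda p: p[2], reverse=True)
        selected.extend(strongest[:per_cell])
    return selected
```

Capping the number of points per cell trades a few strong but clustered corners for better coverage of the frame, which is what the pose estimation benefits from.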
In order to compute a first geometry of the camera pose, we
need to establish for every extracted point in an image a
potential correspondent point (if any) in the next image of the
sequence. This correspondence is accepted or rejected based on a
disparity threshold and on the similarity of the g.v. in the
neighbourhood. Currently, in order to limit computation
time, we use a simple cross-correlation between two windows; a
possible improvement might be to use least squares matching (LSM)
to improve the accuracy and correctness of the matched points. Even
if the algorithm eliminates many wrong correspondences (we use a
0.8 threshold on the cross-correlation coefficient and adopt a
bidirectional uniqueness criterion for matching) the data set is still
affected by a large number of outliers.
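The matching step described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the disparity test is assumed to be a simple bound on the coordinate differences, "bidirectional uniqueness" is interpreted as a mutual best-match check, and the names `ncc`, `best_match`, `putative_matches`, `max_disp` and `half` are hypothetical. Only the 0.8 threshold comes from the text.

```python
import math


def ncc(img_a, pa, img_b, pb, half=2):
    """Normalized cross-correlation of two (2*half+1)^2 windows.

    Images are lists of rows of grey values; points are integer (x, y).
    Returns None if a window leaves the image or has no texture.
    """
    def window(img, p):
        x, y = p
        if x - half < 0 or y - half < 0:
            return None
        if y + half >= len(img) or x + half >= len(img[0]):
            return None
        return [img[r][c]
                for r in range(y - half, y + half + 1)
                for c in range(x - half, x + half + 1)]

    wa, wb = window(img_a, pa), window(img_b, pb)
    if wa is None or wb is None:
        return None
    mean_a, mean_b = sum(wa) / len(wa), sum(wb) / len(wb)
    da = [v - mean_a for v in wa]
    db = [v - mean_b for v in wb]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return None
    return sum(u * v for u, v in zip(da, db)) / denom


def best_match(img_a, p, img_b, pts_b, max_disp, min_ncc, half):
    """Index of the best candidate in pts_b for point p, or None."""
    best, best_score = None, None
    for j, q in enumerate(pts_b):
        # Assumed disparity test: bound on both coordinate differences.
        if abs(q[0] - p[0]) > max_disp or abs(q[1] - p[1]) > max_disp:
            continue
        score = ncc(img_a, p, img_b, q, half)
        if score is None or score < min_ncc:
            continue
        if best_score is None or score > best_score:
            best, best_score = j, score
    return best


def putative_matches(img_a, pts_a, img_b, pts_b,
                     max_disp=20, min_ncc=0.8, half=2):
    """Pairs (i, j) that are each other's best match in both directions."""
    fwd = [best_match(img_a, p, img_b, pts_b, max_disp, min_ncc, half)
           for p in pts_a]
    bwd = [best_match(img_b, q, img_a, pts_a, max_disp, min_ncc, half)
           for q in pts_b]
    return [(i, j) for i, j in enumerate(fwd)
            if j is not None and bwd[j] == i]
```

Even with both checks, repetitive texture and moving objects can still produce mutually consistent but wrong pairs, which is why a geometric outlier filter is still needed afterwards.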
2.2.2 Outlier detection.
To achieve an error-free set of correspondences in the image
pair we filter the data set taking into account that all points must
satisfy some geometric constraints due to the cameras’ relative
position (normally unknown). First of all we estimate the
epipolar geometry between the first two images of the sequence
with a robust estimation algorithm (Fischler M., Bolles R.