Fusion of visual data through dynamic stereo-motion cooperation
Nassir Navab and Zhengyou Zhang
INRIA Sophia Antipolis
2004 Route des Lucioles
06565 Valbonne Cedex
FRANCE
Abstract
Integrating information from sequences of stereo images can lead to robust visual data fusion. Instead of considering stereo and temporal matching as two independent processes, we propose a unified scheme in which each dynamically borrows information from the other. Using an iterative approach and statistical error analysis, different observations are appropriately combined to estimate the motion of the stereo rig and build a dynamic 3D model of the environment. We also show how motion estimation and temporal matching can be used to add new stereo matches. The algorithm is demonstrated on real images. Implemented on a mobile robot, it shows how fusion of visual data can be useful for an autonomous vehicle working in an unknown environment.
1 Introduction
In stereo and motion analysis, most previous work has been conducted using either two or three static cameras [27] or a sequence of monocular images obtained by a moving camera [4]. Several researchers have tried to combine these two processes to obtain faster and more robust algorithms [7, 23, 18, 16, 21, 17, 22, 19, 2, 11].
We believe in the efficiency of stereo-motion cooperation; this paper is a further attempt to develop this idea.
To extract 3D information from real images, "meaningful" extracted features, such as corner points, edges, and regions, are often used to reduce the computational cost and matching ambiguities. In this paper, we use the line segments obtained by an edge detector. Line segments are present in most real-world scenes, such as highways, car traffic tunnels, long indoor hallways, or industrial assembly.
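As a concrete illustration (not part of the paper's original text), a 2D segment token might be represented as follows. This is a minimal sketch in Python; the class and attribute names are our own assumptions, not the paper's implementation:

```python
import numpy as np

# A minimal sketch of a 2D line segment token as produced by an edge
# detector plus polygonal approximation (hypothetical representation).
class Segment2D:
    def __init__(self, p1, p2):
        self.p1 = np.asarray(p1, dtype=float)  # first endpoint (x, y)
        self.p2 = np.asarray(p2, dtype=float)  # second endpoint (x, y)

    @property
    def midpoint(self):
        return 0.5 * (self.p1 + self.p2)

    @property
    def length(self):
        return float(np.linalg.norm(self.p2 - self.p1))

    @property
    def direction(self):
        d = self.p2 - self.p1
        return d / np.linalg.norm(d)  # unit direction of the segment
```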
In [7], [19] and [9], we tried to make two existing algorithms cooperate: a hypothesis-verification-based stereo matching algorithm [3] and a monocular line tracking algorithm [5]. We soon realized that each of these processes could work faster and more robustly if it could dynamically borrow some information from the other, and that motion estimation could play an important intermediary role between the two. If we want a tighter cooperation between stereo and motion, we must not consider them as two separate processes with only occasional interactions.
We present a unified iterative algorithm for both the temporal tracking of stereo pairs of segments and the estimation of the camera system's ego-motion, which consequently allows us to keep track of our 3D reconstructions. The algorithm is based on a dynamic interaction between different sources of information.
Figure 3 shows the general scheme of the algorithm. This scheme is adapted from that of Droid [14, 10]. The basic difference is that we use straight line features as tokens, whereas Droid makes use of point features, and once the camera system's ego-motion is estimated, we use that information for tracking 2D lines in each camera.
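To make the scheme concrete, here is a minimal sketch of one iteration, with the stage functions supplied by the caller; all names (extract, predict, match, estimate, update, add_stereo) are hypothetical placeholders, not the paper's implementation:

```python
# One iteration of the Figure 3 scheme, written as a skeleton whose
# stages are injected as callables (hypothetical names and interfaces).
def process_stereo_frame(left_img, right_img, model, motion,
                         extract, predict, match, estimate, update,
                         add_stereo):
    # 1. Extract 2D line segments in both images.
    left_segs, right_segs = extract(left_img), extract(right_img)
    # 2. Predict the tracked segments' 2D positions from the current
    #    ego-motion estimate and match them against new observations.
    matches = match(predict(model, motion), left_segs, right_segs)
    # 3. Re-estimate the rig's ego-motion and its covariance from the
    #    matched line tokens.
    motion, motion_cov = estimate(matches)
    # 4. Update the dynamic 3D model, then hypothesize new stereo
    #    matches among the still-unmatched segments.
    model = update(model, matches, motion, motion_cov)
    model = add_stereo(model, left_segs, right_segs, motion)
    return model, motion
```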
In Section 4, we describe how to use straight line tokens to estimate the camera system's ego-motion and its associated covariance matrix. The algorithm is decomposed into three different steps, as shown in Figure 3. In Section 6, we describe these three steps. Finally, Section 7 briefly presents the results of the different steps of our algorithm on real images obtained by the INRIA mobile robot.
2 Preliminaries
Vectors are represented in bold face, e.g. $\mathbf{x}$. Transposition of vectors and matrices is indicated by $^T$, e.g. $\mathbf{x}^T$. $\dot{\mathbf{x}}$ denotes the time derivative of $\mathbf{x}$, i.e. $\dot{\mathbf{x}} = d\mathbf{x}/dt$. 3D points are represented by vectors $\mathbf{P} = (X, Y, Z)^T$. For a given three-dimensional vector $\mathbf{x}$, we also use $\tilde{X}$ to represent the $3 \times 3$ antisymmetric matrix such that $\tilde{X}\mathbf{y} = \mathbf{x} \wedge \mathbf{y}$ for all vectors $\mathbf{y}$. $I_n$ represents the $n \times n$ identity matrix.
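The antisymmetric matrix is the usual cross-product matrix; as an illustration only, a short runnable check in Python (numpy):

```python
import numpy as np

# Skew-symmetric matrix associated with a 3-vector x, such that
# skew(x) @ y == x ^ y (the cross product) for every vector y.
def skew(x):
    x1, x2, x3 = x
    return np.array([[0.0, -x3,  x2],
                     [ x3, 0.0, -x1],
                     [-x2,  x1, 0.0]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
assert np.allclose(skew(x) @ y, np.cross(x, y))
```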
Fig. 1. Pinhole model of a camera
We model our camera with the standard pinhole model of Figure 1 and assume that everything is referred to the camera's standard coordinate frame $(o, x, y, z)$. We know from work on calibration [24, 6] that it is always possible, to a very good approximation, to go from the real pixel values to the standardized values $x$ and $y$. When using a pair of calibrated stereo cameras, everything is written in one of the cameras' coordinate systems.
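For instance, with the conventional intrinsic parameters (focal lengths in pixels $\alpha_u, \alpha_v$ and principal point $(u_0, v_0)$; these names are the usual convention, not taken from [24, 6]), the conversion from pixel to standardized coordinates is a simple affine change of units:

```python
# Pixel coordinates (u, v) -> standardized pinhole coordinates (x, y),
# assuming conventional intrinsic parameters from calibration.
def pixel_to_standard(u, v, alpha_u, alpha_v, u0, v0):
    x = (u - u0) / alpha_u
    y = (v - v0) / alpha_v
    return x, y

# Example: principal point at the center of a 512 x 512 image,
# focal length of about 1000 pixels.
print(pixel_to_standard(300.0, 200.0, 1000.0, 1000.0, 256.0, 256.0))
# -> (0.044, -0.056)
```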
3 The Pluckerian line representation
Different line representations in $\mathbb{R}^2$ and $\mathbb{R}^3$ have been used in computer vision work. Though the theoretical results may be equivalent, a given representation can be more or less suitable for implementation. Here, we use the Pluckerian