Full text: Close-range imaging, long-range vision

VIDEO-TO-3D 
Marc Pollefeys*, Luc Van Gool, Maarten Vergauwen, Kurt Cornelis, Frank Verbiest, Jan Tops 
Center for Processing of Speech and Images, K.U.Leuven 
Dept. of Computer Science, University of North Carolina - Chapel Hill, 
Marc.Pollefeys@cs.unc.edu 
Working Group III/V 
KEY WORDS: 3D modeling, video sequences, structure from motion, self-calibration, stereo matching. 
ABSTRACT 
In this contribution we intend to present a complete system that takes a video sequence of a static scene as input and 
outputs a 3D model. The system can deal with images acquired by an uncalibrated hand-held camera, with intrinsic 
camera parameters possibly varying during the acquisition. In a first stage, features are extracted and tracked throughout 
the sequence. Using robust statistics and multiple view relations the 3D structure of the observed features and the camera 
motion and calibration are computed. In a second stage stereo matching is used to obtain a detailed estimate of the 
geometry of the observed scene. The presented approach integrates state-of-the-art algorithms developed in computer 
vision, computer graphics and photogrammetry. 
1 INTRODUCTION 
In recent years the emphasis for applications of 3D model- 
ing has shifted from measurements to visualization. New 
communication and visualization technologies have created 
an important demand for photo-realistic 3D content. In 
most cases virtual models of existing scenes are desired. 
This has created a lot of interest in image-based approaches. 
Applications can be found in e-commerce, real estate, games, 
post-production and special effects, simulation, etc. For 
most of these applications there is a need for simple and 
flexible acquisition procedures. Therefore calibration should 
be absent or restricted to a minimum. Many new applica- 
tions also require robust low cost acquisition systems. This 
stimulates the use of consumer photo- or video cameras. 
The approach presented in this paper makes it possible to capture 
photo-realistic virtual models from images. The user ac- 
quires the images by freely moving a camera around an 
object or scene. Neither the camera motion nor the cam- 
era settings have to be known a priori. There is also no 
need for preliminary models. The approach can also be 
used to combine virtual objects with real video, yielding 
augmented video sequences. 
The approach proposed in this paper builds further on ear- 
lier work, e.g. (Pollefeys et al., 2000). Several important 
improvements were made to the system. To deal more effi- 
ciently with video, we have developed an approach that can 
automatically select key-frames suited for structure and mo- 
tion recovery. The projective structure and motion recov- 
ery stage has been made completely independent of the ini- 
tialization which avoids some instability problems that oc- 
curred with the quasi-Euclidean initialization proposed in 
(Beardsley et al., 1997). Several optimizations have been 
implemented to obtain more efficient robust algorithms (Matas 
and Chum, 2001). To guarantee a maximum likelihood re- 
construction at the different levels a state-of-the-art bundle 
adjustment algorithm was implemented that can be used 
both at the projective and the Euclidean level. A much 
more robust linear self-calibration algorithm was obtained 
  
* corresponding author 
by incorporating general a priori knowledge on meaning- 
ful values for the camera intrinsics. This avoids 
most problems related to critical motion sequences (Sturm, 
1997), i.e. motions that do not yield a unique solution 
for the calibration of the intrinsics, which caused the ini- 
tial linear algorithm proposed in (Pollefeys et al., 1998) 
to yield poor results under some circumstances. Another 
problem was also addressed. Previously, 
observing a purely planar scene at some point during the 
acquisition would have caused uncalibrated approaches to 
fail. A solution that detects this case and deals with it ac- 
cordingly has been proposed (Pollefeys et al., 2002). Both 
correction for radial distortion and stereo rectification have 
been integrated in a single image resampling pass, which 
minimizes the image degradation. Our processing 
pipeline uses a non-linear rectification scheme (Pollefeys 
et al., 1999b) that can deal with all types of camera motion 
(including forward motion). For the integration of multiple 
depth maps into a single surface representation a volumet- 
ric approach has been implemented (Curless and Levoy, 
1996). The texture is obtained by blending the original 
images based on the surface geometry so that the texture 
quality is optimized. The resulting system is much more 
robust and accurate than the earlier one, so that it can be 
used efficiently for many different applications. 
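As an illustration of the volumetric integration step cited above (Curless and Levoy, 1996), the following minimal sketch shows the cumulative weighted averaging of signed distances that this kind of method relies on. It is not the implementation used in the system described here. The function name `integrate`, the array layout and the weights are assumptions made for the example, and the conversion of each depth map into per-voxel signed distances is not shown.

```python
import numpy as np

def integrate(D, W, d, w):
    """Fold one depth map into a running volumetric average.

    D, W : accumulated signed distance and accumulated weight per voxel.
    d, w : signed distance and confidence weight contributed by the new
           depth map (w == 0 where the depth map does not see the voxel).
    Returns the updated (D, W). All names are hypothetical.
    """
    D, W, d, w = (np.asarray(a, dtype=float) for a in (D, W, d, w))
    W_new = W + w
    # Guard against division by zero for voxels that were never observed.
    safe = np.where(W_new > 0, W_new, 1.0)
    D_new = np.where(W_new > 0, (W * D + w * d) / safe, D)
    return D_new, W_new

# Two depth maps observing three voxels; the third voxel is never seen.
D, W = np.zeros(3), np.zeros(3)
D, W = integrate(D, W, d=[1.0, -1.0, 2.0], w=[1.0, 1.0, 0.0])
D, W = integrate(D, W, d=[3.0, 1.0, 4.0], w=[1.0, 3.0, 0.0])
```

The merged surface is then taken as the zero level set of the accumulated distance field `D`, restricted to voxels with non-zero weight.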
2 FROM VIDEO TO 3D MODELS 
Starting from a sequence of images the first step consists 
of recovering the relative motion between consecutive im- 
ages. This process goes hand in hand with finding corre- 
sponding image features between these images (i.e. im- 
age points that originate from the same 3D feature). In the 
case of video data, features are tracked until disparities be- 
come sufficiently large for an accurate estimation of the 
epipolar geometry. 
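The idea of tracking until disparities are large enough can be sketched as follows. This is only an illustration: the text above does not state which criterion the system uses, so the median-displacement test, the threshold of 20 pixels and the names `median_disparity` and `select_keyframes` are all assumptions made for the example.

```python
import numpy as np

def median_disparity(pts_ref, pts_cur):
    """Median Euclidean displacement (in pixels) of the tracked features."""
    pts_ref = np.asarray(pts_ref, dtype=float)
    pts_cur = np.asarray(pts_cur, dtype=float)
    return float(np.median(np.linalg.norm(pts_cur - pts_ref, axis=1)))

def select_keyframes(tracks, min_disparity=20.0):
    """Pick frames whose disparity to the previous key-frame is large enough.

    tracks : array of shape (n_frames, n_features, 2) holding the image
             position of every feature in every frame (assumes that all
             features are tracked through the whole sequence).
    min_disparity : hypothetical threshold in pixels.
    Returns the list of selected frame indices, starting with frame 0.
    """
    tracks = np.asarray(tracks, dtype=float)
    keyframes = [0]
    for i in range(1, len(tracks)):
        if median_disparity(tracks[keyframes[-1]], tracks[i]) >= min_disparity:
            keyframes.append(i)
    return keyframes

# Synthetic example: four features drifting 5 pixels per frame along x.
tracks = np.zeros((10, 4, 2))
tracks[:, :, 0] = 5.0 * np.arange(10)[:, None]
keyframes = select_keyframes(tracks)
```

Each selected pair of consecutive key-frames would then be passed on to the epipolar geometry estimation. A real system also has to cope with features that are lost or appear during tracking, which this sketch ignores.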
The next step consists of recovering the motion and cali- 
bration of the camera and the 3D structure of the tracked 
or matched features. This process is done in two phases. At 