
Virtual navigation in remote environments can be achieved by building an image-based model made of multiple panoramas gathered 
from cameras moving around the scene. In such models, it could be useful to acquire knowledge of the 3D structure of the scene. In 
this paper, we propose a method that constructs a sparse but rich 3D representation of a scene from a set of calibrated panoramic images. The proposed method is a heuristic search algorithm that finds 3D points corresponding to the surfaces of objects in the scene by searching for matching edge pixels in pairs of images under the epipolar constraint. Empirical results show that the proposed method performs well at locating 3D points of interest
in different scenes. 
INTRODUCTION

A goal of tele-presence applications is to allow someone to visually experience a remote environment such that they can freely navigate through the environment with the impression of "being there". One way to reach this goal is to create an image-based model of the scene composed of a multitude of panoramas captured in the scene of interest. Starting from a user-selected geo-referenced panorama, virtual navigation is then achieved by allowing the user to move from one panorama to a neighboring one, thus simulating motion along some path in the scene. Under such a framework, knowledge of the 3D structure of the scene is not a necessary requirement; however, extracting 3D information from the scene can be beneficial in many ways: i) it allows the panoramic images to be registered more accurately with respect to one another and with respect to maps or other representations of the scene; ii) the image-based model can then be augmented with virtual objects or virtual annotations that are coherently displayed across the different panoramic images; iii) 3D measurements can be made in the scene and infeasible motions (e.g., going through an obstacle) can be invalidated; iv) it facilitates the generation of photo-realistic virtual views from a finite set of images, in order to simulate smooth motion while navigating through the scene.
The purpose of this work is, given a sparse set of calibrated panoramic images, to obtain a rich set of 3D points that correspond to the surfaces of the objects in a scene. Towards this goal, we have developed a method that searches for matches using features that appear more frequently in each image than those used during the calibration procedure. Our method uses a multi-start search strategy that is a variation of the method proposed in (Louchet, 1999).
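As a rough illustration of this multi-start idea (the actual scoring and search procedure are described in Section 3; the projection functions, thresholds, and perturbation schedule below are assumptions made for the sketch, not the authors' algorithm), a candidate 3D point can be scored by projecting it into each calibrated panorama and measuring how far its projections fall from the nearest edge pixels:

```python
# Sketch only: keep 3D candidates whose projections land near edge pixels
# in every calibrated panorama, starting from many random restarts.
# All names and parameters are illustrative; boundary handling is omitted.
import numpy as np

def multi_start_search(project_fns, edge_dists, bounds,
                       n_starts=5000, n_steps=50, sigma=0.05, keep_thresh=2.0):
    # project_fns: one function per panorama mapping a 3D point to pixel (u, v)
    # edge_dists:  per-panorama distance transforms of the edge maps
    # bounds:      (lo, hi) corners of the 3D search volume
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)

    def score(p):
        total = 0.0
        for project, dist in zip(project_fns, edge_dists):
            u, v = project(p)                           # projection of p into this panorama
            total += dist[int(round(v)), int(round(u))] # distance to the nearest edge pixel
        return total

    points = []
    for _ in range(n_starts):
        p = lo + np.random.rand(3) * (hi - lo)          # random restart inside the scene volume
        best = score(p)
        for _ in range(n_steps):                        # simple local perturbation refinement
            q = np.clip(p + np.random.normal(0.0, sigma, 3), lo, hi)
            s = score(q)
            if s < best:
                p, best = q, s
        if best < keep_thresh:                          # keep candidates lying on matching edges
            points.append(p)
    return np.asarray(points)
```

Searching directly in 3D and requiring small edge distances in every view implicitly enforces the epipolar constraint, since a retained point projects onto mutually consistent edge pixels in each calibrated panorama.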
The rest of this paper is organized as follows: Section 2 gives a brief description of methods that have been developed to estimate the 3D structure of a scene; our proposed heuristic search algorithm is presented in Section 3; the results of testing our proposed algorithm on sets of real calibrated images can be found in Section 4; and finally, our conclusions are given in Section 5.
STRUCTURE FROM MOTION 
The purpose of structure from motion algorithms is to estimate 
the position and orientation of each image in a set of images, and 
to estimate the 3D structure of the scene. 
Recent work has been done by Snavely et al. (Snavely et al., 2008) on calibrating images of a scene taken from different viewpoints, and in turn estimating the 3D structure of the scene. In both cases, camera calibration is carried out by (1) finding correspondences between pixels among subsets of the images using the scale invariant feature transform (SIFT) (Lowe, 2004), (2) estimating the camera parameters (internal and external) using the epipolar constraint and the RANSAC algorithm, and then (3) using bundle adjustment to optimize these parameters, minimizing the reprojection error over all correspondences. The correspondences constitute a sparse description of the 3D structure of the scene. Goesele et al. (Goesele et al., 2007) then proceed to estimate the complete 3D structure of the scene from these sparse 3D points. Both of these methods were tested using large, densely located sets of non-panoramic images.
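For concreteness, a two-view sketch of steps (1)-(3) is given below using OpenCV; it is not the cited systems, which operate on many images at once. The intrinsic matrix K is assumed known here, and the final bundle-adjustment stage is only indicated by a comment.

```python
# Sketch: SIFT matching, RANSAC essential-matrix estimation, pose recovery,
# and triangulation for a single image pair. K is an assumed intrinsic matrix.
import cv2
import numpy as np

def two_view_sparse_structure(img1, img2, K):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)       # (1) SIFT keypoints + descriptors
    k2, d2 = sift.detectAndCompute(img2, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test

    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])

    # (2) Epipolar constraint with RANSAC, then relative pose.
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K)

    # Triangulate the inlier correspondences into sparse 3D points.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    keep = inliers.ravel().astype(bool)
    X_h = cv2.triangulatePoints(P1, P2, p1[keep].T, p2[keep].T)
    X = (X_h[:3] / X_h[3]).T                          # homogeneous -> Euclidean

    # (3) A full pipeline would refine all poses and points with bundle
    # adjustment (a sparse nonlinear least-squares problem); omitted here.
    return R, t, X
```

In a multi-image pipeline, such pairwise poses and points would seed an incremental reconstruction that is then refined globally by bundle adjustment.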
An alternative to SIFT, called speeded up robust features (SURF), is proposed by Bay et al. (Bay et al., 2008), who claim that SURF is faster to compute and more accurate than SIFT.
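Purely as an illustration of how SURF could be substituted for SIFT in the matching stage of such a pipeline (SURF lives in OpenCV's contrib module and is only available in builds with the non-free algorithms enabled, which is an assumption about the installed OpenCV):

```python
import cv2

def surf_features(gray_img, hessian_threshold=400):
    # Detect SURF keypoints and compute their descriptors; requires an
    # OpenCV build compiled with the non-free algorithms enabled (assumption).
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    keypoints, descriptors = surf.detectAndCompute(gray_img, None)
    return keypoints, descriptors
```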
Although this calibration method is very effective at estimating the camera parameters, it may result in a set of 3D points that is too sparse to adequately describe the 3D structure of the scene. Figure 1 shows an example of how SURF-detected correspondences may not adequately cover the scene.
Pollefeys et al. (Pollefeys et al., 2008) and Cornelis et al. (Cornelis et al., 2008) designed systems that perform 3D reconstruction of urban environments from video sequences. Camera pose estimation is carried out using camera calibration techniques that are similar to the technique summarized above. In order to perform faster and more accurately in urban environments, both systems use simplifying geometric assumptions of the scene to model objects such as roads and buildings. The system designed by