
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B5. Beijing 2008 
The scene is then reconstructed with the technique proposed in 
Section 3. In that section, an approach for reconstructing wide-
area scenes from high-resolution images is proposed, along with 
the associated computational issues. In our technique, the 
conventional space-sweeping approach (e.g. Zabulis et al. 2003) 
is slightly modified to employ a sweeping spherical, instead of 
planar, back-projection surface. The result is a more accurate 
and memory-conserving technique. Moreover, this extension 
facilitates acceleration of the method through a coarse-to-fine 
depth map computation. 
The proposed approach offers the user the ability to 
reconstruct a scene from a few snapshots acquired with an off- 
the-shelf camera, preferably of high resolution. This way, a few 
snapshots suffice for the reconstruction and the image 
acquisition process becomes much simpler than capturing the 
scene with a video camera or with a multicamera apparatus 
(Mordohai et al., 2007). 
The final result is a textured mesh in either the Keyhole Markup 
Language (KML) or Virtual Reality Modeling Language 
(VRML) formats. The KML output allows integration with the 
Google Earth™ platform, so that the reconstructed 3D models and 
their virtual walkthrough applications can easily become a part 
of a large geographical information system (GIS) in the near 
future. Section 4 explains the Web-based virtual tour 
application developed. 
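The KML export mentioned above can be illustrated with a minimal sketch. A textured mesh is typically exported as a COLLADA (.dae) file that a KML <Model> element references; the file name, placemark name, and coordinates below are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of a KML wrapper for a reconstructed, textured mesh.
# The mesh itself would live in a separate COLLADA (.dae) file that the
# <Link><href> element points to; all concrete values are placeholders.
KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>{name}</name>
    <Model>
      <Location>
        <longitude>{lon}</longitude>
        <latitude>{lat}</latitude>
        <altitude>{alt}</altitude>
      </Location>
      <Link><href>{mesh_href}</href></Link>
    </Model>
  </Placemark>
</kml>"""

def write_kml(name, lon, lat, alt, mesh_href):
    """Return a KML document placing a mesh model at a geographic location."""
    return KML_TEMPLATE.format(name=name, lon=lon, lat=lat,
                               alt=alt, mesh_href=mesh_href)
```

Loading such a file in Google Earth places the referenced mesh at the given longitude, latitude, and altitude, which is what makes the reconstructed models usable inside a larger GIS.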
2. ROBUST CAMERA MOTION ESTIMATION 
BASED ON SIFT DETECTION AND MATCHING 
Robust estimation of the camera motion is essential, since the 
accuracy of the produced 3D reconstruction is based on this 
information. Our work is based on the approach proposed 
initially by Beardsley et al. (1997), and, subsequently, extended 
by Pollefeys et al. (1999, 2004) and Tola (2005). The approach 
establishes correspondences across consecutive images of a 
sequence to estimate camera motion. 
Previous approaches used the Harris corner detector (Harris and 
Stephens, 1988) to extract point features in images. The 
matching procedure utilized similarity as well as proximity 
criteria (Tola, 2005) to avoid spurious matches. In this paper, an 
alternative procedure was tested, utilizing SIFT feature 
detection and matching (Lowe, 2004). In both cases 
(Harris/SIFT), a RANSAC framework is then utilized to 
remove spurious correspondences, followed by a Levenberg- 
Marquardt post-processing step to further improve the 
estimation. Intrinsic camera parameters are estimated a priori 
through a simple calibration procedure (Bouguet, 2007). 
Besides reducing the unknowns in the following external 
calibration and bundle adjustment procedures, intrinsic 
calibration is used to compensate for radial distortion. As a 
result, the perspective camera model is better approximated and 
the system produces more accurate results. The output is an 
estimation of the essential matrix E, which is thereafter 
decomposed into rotation matrix (R) and translation vector (t) 
of the new view. Finally, triangulation is used to estimate the 
3D coordinates of the corresponding features. 
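The final triangulation step can be sketched with the standard linear (DLT) method: each view contributes two equations to a homogeneous system whose least-squares solution is the 3D point. This is a minimal numpy sketch of that textbook step, not the paper's exact implementation; the projection matrices are assumed given.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 2D-2D correspondence.
    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v)."""
    # Each view contributes two rows of the homogeneous system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize
```

With noisy correspondences this linear estimate is usually refined by minimizing reprojection error, which is exactly what the bundle adjustment step described below does over all views and points.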
When a sequence of views is available, the above technique is 
applied to the first two views. For each new view i, the feature 
detection and matching approaches are applied to establish 2D 
correspondences with the previous view i-1; these are then 
matched against the already established 3D points, and a 
RANSAC-based technique yields a robust estimate of the 
projection matrix Pi of the new view. We have used an 
efficient Bundle Adjustment procedure (Lourakis and Argyros, 
2004) as a final step at each addition of a new view. The 
procedure is illustrated in Figure 1. 
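The RANSAC-based registration of a new view can be sketched as follows: a projection matrix is fitted to minimal samples of the 3D-2D correspondences via the Direct Linear Transform, and the hypothesis with the most inliers under a reprojection-error threshold is kept. This is a minimal numpy sketch of the general technique, with hypothetical parameter values; the paper's actual implementation may differ in its sampling and refinement details.

```python
import numpy as np

def dlt_projection(X3d, x2d):
    """Fit a 3x4 projection matrix to >= 6 3D-2D correspondences (DLT)."""
    rows = []
    for X, x in zip(X3d, x2d):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -x[0] * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -x[1] * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)

def ransac_projection(X3d, x2d, iters=200, thresh=2.0, seed=0):
    """RANSAC: fit P on minimal samples, keep the one with most inliers."""
    rng = np.random.default_rng(seed)
    best_P, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(X3d), 6, replace=False)
        P = dlt_projection(X3d[idx], x2d[idx])
        # Reprojection residuals over all correspondences.
        Xh = np.hstack([X3d, np.ones((len(X3d), 1))])
        proj = (P @ Xh.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.sum(np.linalg.norm(proj - x2d, axis=1) < thresh)
        if inliers > best_inliers:
            best_P, best_inliers = P, inliers
    return best_P
```

The inlier set of the winning hypothesis is what the subsequent bundle adjustment step refines jointly with the structure.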
Although several error suppression and outlier removal steps 
are included, results show that the accuracy of the whole chain 
greatly relies on the success of the feature detection and 
matching. Despite the efficiency of the Harris corner detector 
and the neighborhood-based constraints utilized in 
correspondence establishment, we observed that SIFT yields 
better correspondences in terms of number and accuracy. This 
is especially important for camera positions with wider 
baselines. For our problem, robustness to large disparities or 
severe view angle changes is important because the scene is to 
be reconstructed from a few snapshots instead of a high-frame 
rate video. 
A technical issue encountered when high resolution images are 
utilized is that the computation of the SIFT features may require 
more memory than available. The proposed treatment is to 
tessellate the image into blocks, compute the features 
independently in each block, and merge the results. To avoid 
blocking artifacts, the blocks of this tessellation overlap 
adequately. Duplicate features are often encountered, either due 
to block overlap or due to the collocation of distinct SIFT 
features detected at different scales; all duplicates are removed 
at the merging stage. 
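The block-wise treatment above can be sketched generically: tessellate the image into overlapping tiles, run the detector on each tile, shift the resulting coordinates back into the full-image frame, and deduplicate. The `detect` callable below stands in for any SIFT front end returning (x, y, scale) rows; its interface is a hypothetical placeholder, not an API from the paper.

```python
import numpy as np

def tiled_features(image, detect, block=1024, overlap=128):
    """Run a feature detector on overlapping blocks of a large image and
    merge the results. `detect` maps an image patch to an array of
    (x, y, scale) rows -- a stand-in for a SIFT detector."""
    h, w = image.shape[:2]
    step = block - overlap
    feats = []
    for y0 in range(0, h, step):
        for x0 in range(0, w, step):
            f = detect(image[y0:y0 + block, x0:x0 + block])
            if len(f):
                f = np.asarray(f, dtype=float)
                f[:, 0] += x0          # shift back to full-image coords
                f[:, 1] += y0
                feats.append(f)
    if not feats:
        return np.empty((0, 3))
    merged = np.vstack(feats)
    # Remove duplicates arising from block overlap (and collocated
    # detections): keep one feature per rounded (x, y, scale) triple.
    _, keep = np.unique(np.round(merged, 1), axis=0, return_index=True)
    return merged[np.sort(keep)]
```

Features detected inside an overlap region appear in several tiles but collapse to a single entry at the merging stage, so the downstream matching never sees the tessellation.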
Figure 1. Illustration of the camera motion estimation procedure. 
3. 3D RECONSTRUCTION 
In this section, an approach for 3D scene reconstruction from 
high-resolution images is proposed and the associated 
computational issues are discussed. In the proposed method, the 
space-sweeping approach is slightly modified to employ a 
sweeping spherical, instead of planar, backprojection surface 
(see Zabulis, Kordelas et al. (2006) for an analytical 
formulation). 
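The spherical sweeping idea can be sketched for a single pixel: candidate 3D points are sampled along the pixel's viewing ray at increasing radii from the camera centre, so each sweep position is a sphere rather than a fronto-parallel plane; each candidate is projected into another view and scored for photo-consistency. The numpy sketch below uses simplifying assumptions (identity rotation for the reference camera, nearest-neighbour sampling, absolute intensity difference) and is not the analytical formulation of Zabulis, Kordelas et al. (2006).

```python
import numpy as np

def sphere_sweep_pixel(K, c, pixel, radii, other_P, other_img, ref_val):
    """Score candidate depths for one pixel of the reference view.
    Candidates lie on spheres of the given radii around the reference
    camera centre c (identity rotation assumed); each candidate is
    projected into the other view and compared with the reference
    intensity ref_val. Returns the best-scoring radius."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray /= np.linalg.norm(ray)          # unit viewing direction
    scores = []
    for r in radii:
        X = c + r * ray                 # point on the sphere of radius r
        u, v, w = other_P @ np.append(X, 1.0)
        u, v = int(round(u / w)), int(round(v / w))
        if 0 <= v < other_img.shape[0] and 0 <= u < other_img.shape[1]:
            scores.append(abs(float(other_img[v, u]) - ref_val))
        else:
            scores.append(np.inf)       # candidate falls outside the view
    return radii[int(np.argmin(scores))]
```

Because candidates at a fixed radius are equidistant from the camera centre, the sweeping surface samples depth uniformly along every viewing ray, which is the property that makes the spherical variant better suited to wide fields of view than a swept plane.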
The conventional space-sweeping approach is frequently used 
for multiview stereo reconstruction, due to its computational 
efficiency and its straightforward acceleration by graphics 
hardware (Yang et al., 2002; Li et al., 2004). However, it is less
	        