Full text: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing (Part B5-2)

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B5. Beijing 2008 
reference image, the only unknown is the relative scale, which is 
computed as the median of the ratios of the distances to the five 
3D points in the two models. 
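The median-of-ratios scale computation can be sketched as follows; the function and argument names are ours, not from the paper, and we take the distances relative to a reference point such as a camera centre in each model:

```python
import numpy as np

def relative_scale(points_a, points_b, center_a, center_b):
    """Relative scale between two models sharing 3D points.

    For each common 3D point, the ratio of its distance to a reference
    point (e.g., a camera centre) in model B versus model A is one
    scale estimate; the median over all points makes the result robust
    to outliers. Hypothetical helper, illustrating the idea only.
    """
    points_a = np.asarray(points_a, float)
    points_b = np.asarray(points_b, float)
    d_a = np.linalg.norm(points_a - center_a, axis=1)
    d_b = np.linalg.norm(points_b - center_b, axis=1)
    return float(np.median(d_b / d_a))
```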
We employ image pyramids to make the procedure more 
efficient. Because of this we can afford to use the whole image as 
initial search space, though on the highest pyramid level 
with a typical resolution of 100 × 100 pixels. On the second and 
third highest levels the epipolar lines derived from the essential 
matrices and the trifocal constraints are employed, respectively. 
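A minimal pyramid of the kind used here can be built by repeated 2 × 2 averaging until the top level reaches roughly the stated 100 × 100 pixels; the function name and the mean filter are our illustrative assumptions:

```python
import numpy as np

def build_pyramid(image, top_size=100):
    """Halve the resolution by 2x2 averaging until the largest side
    of the top level is at most top_size pixels.

    Returns the levels from full resolution (levels[0]) to the
    coarsest level (levels[-1]). Illustrative sketch only.
    """
    levels = [np.asarray(image, float)]
    while max(levels[-1].shape) > top_size:
        img = levels[-1]
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]                 # trim odd rows/columns
        levels.append(0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                              + img[0::2, 1::2] + img[1::2, 1::2]))
    return levels
```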
After reconstructing triplets, they are linked based on the 
overlapping images. E.g., triplets consisting of images 1-2-3 
and 2-3-4 overlap in images 2 and 3. For these two images 
projection matrices can be computed from the trifocal tensors 
(Hartley and Zisserman, 2003) and from them in turn a 
Euclidean transformation mapping from the first to the second 
triplet. In (Mayer, 2007b) we have shown how to speed up 
linking by conducting it hierarchically, at the same time 
also avoiding a bias in the estimation process due to the 
combination of sequences of very different lengths (e.g., when 
one links 3 images to 90 images). During linking we also track 
points by projecting them into newly linked images and 
determining the image coordinates via least squares matching 
(LSM), resulting in highly precise n-fold points. 
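The transformation between two overlapping triplets (a similarity once the relative scale is included) can be estimated in closed form from corresponding 3D coordinates, e.g., with Umeyama's least squares method. This is an illustrative stand-in for deriving the transformation from the shared projection matrices, not the paper's implementation:

```python
import numpy as np

def similarity_from_correspondences(x, y):
    """Least squares similarity transform y ~ s * R @ x + t (Umeyama).

    x, y: (n, 3) coordinates of the same 3D points in the frames of
    two overlapping triplets. Names and interface are ours.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(0), y.mean(0)
    xc, yc = x - mx, y - my
    U, S, Vt = np.linalg.svd(yc.T @ xc / len(x))   # cross-covariance
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                  # keep R a proper rotation
        D[2, 2] = -1
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / xc.var(0).sum()
    t = my - s * R @ mx
    return s, R, t
```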
We determine planes from the 3D points. In particular, we follow 
(Mayer, 2007a). Because the vertical direction is predominant in 
urban scenes, we determine it first from the vanishing point, i.e., 
the intersection point of the projections of vertical scene lines 
into the images, computed by means of RANSAC. Orienting the whole 
scene vertically helps considerably to determine the boundaries 
of the partially vertical planes. 
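The RANSAC determination of the vertical vanishing point as the common intersection of the projected vertical lines might look as follows; the threshold, iteration count, and names are illustrative assumptions:

```python
import numpy as np

def ransac_vanishing_point(lines, thresh=2.0, iters=500, rng=None):
    """RANSAC vanishing point from homogeneous image lines (a, b, c).

    Sample two lines, intersect them via the cross product, and count
    the lines whose perpendicular distance to the candidate point is
    below thresh pixels. Sketch of the cited procedure, not the
    authors' implementation.
    """
    rng = np.random.default_rng(rng)
    lines = np.asarray(lines, float)
    best_vp, best_inliers = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(lines), 2, replace=False)
        vp = np.cross(lines[i], lines[j])
        if abs(vp[2]) < 1e-12:
            continue                      # intersection at infinity, skip
        vp = vp / vp[2]
        # distance from the point to each line a*x + b*y + c = 0
        d = np.abs(lines @ vp) / np.hypot(lines[:, 0], lines[:, 1])
        n = int((d < thresh).sum())
        if n > best_inliers:
            best_vp, best_inliers = vp[:2], n
    return best_vp, best_inliers
```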
The planes themselves are also obtained by RANSAC, followed by 
least squares adjustment. For the planes, two parameters must be 
given by the user: a threshold on the distance of points from the 
plane and the maximum distance between points on the plane. The 
latter avoids large planes consisting of a dense cluster of correct 
points and a few randomly distributed points which by chance lie 
on the plane. 
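A sketch of plane RANSAC with the two user parameters described above, a point-to-plane distance threshold and a maximum distance between points on the plane; the connectivity pruning and all names are our illustrative choices, and the final least squares adjustment is omitted:

```python
import numpy as np

def ransac_plane(points, dist_thresh=0.05, max_gap=0.5, iters=300, rng=None):
    """Plane RANSAC with two user parameters: dist_thresh is the
    maximum point-to-plane distance for an inlier; max_gap is the
    maximum distance to the nearest other inlier, which prunes
    isolated points that only by chance lie on the plane.
    """
    pts = np.asarray(points, float)
    rng = np.random.default_rng(rng)
    best = np.zeros(len(pts), bool)
    for _ in range(iters):
        p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-12:
            continue                      # degenerate (collinear) sample
        n /= np.linalg.norm(n)
        inl = np.abs((pts - p0) @ n) < dist_thresh
        idx = np.flatnonzero(inl)
        sub = pts[idx]
        # grow the connected component containing the sample point p0
        seed = int(idx[np.argmin(np.linalg.norm(sub - p0, axis=1))])
        keep, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            for k in idx[np.linalg.norm(sub - pts[cur], axis=1) < max_gap]:
                if int(k) not in keep:
                    keep.add(int(k))
                    frontier.append(int(k))
        mask = np.zeros(len(pts), bool)
        mask[list(keep)] = True
        if mask.sum() > best.sum():
            best = mask
    return best                           # boolean inlier mask
```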
For each plane, texture is determined also for partially occluded 
regions by means of a consensus approach (Mayer, 2007a). The 
latter allows the correct texture to be reconstructed even if it 
is visible in less than 50% of the images that see the particular 
region. The results of plane reconstruction have been used for 
facade interpretation (Mayer and Reznik, 2007; Reznik and 
Mayer, 2007). 
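One possible reading of such a per-pixel consensus: the value supported by the largest cluster of samples wins, so the true surface can prevail even when it is visible in fewer than half of the images, as long as the occluders disagree with each other. This is our illustration, not the implementation of (Mayer, 2007a):

```python
import numpy as np

def consensus_texture(samples, tol=10.0):
    """Per-pixel consensus over candidate textures of a plane region.

    samples: (k, h, w) gray-value rectifications of the same region
    from k images. For each pixel, the sample within tol of the most
    other samples wins; unlike a plain median, this does not require
    a majority. Hypothetical names and tolerance.
    """
    s = np.asarray(samples, float)                              # (k, h, w)
    # support of each sample = number of samples within tol of it
    support = (np.abs(s[:, None] - s[None, :]) < tol).sum(axis=0)
    best = support.argmax(axis=0)                               # (h, w)
    h, w = best.shape
    return s[best, np.arange(h)[:, None], np.arange(w)[None, :]]
```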
The linking of the triplets is done on the second or third highest 
level of the pyramid, depending on the image size. After linking, 
the points are projected into the original resolution images, once 
again producing highly accurate relative coordinates by means 
of LSM. 
After all steps we employ robust bundle adjustment (McGlone 
et al., 2004). E.g., also when estimating essential matrices and 
trifocal tensors we compute a bundle solution every couple of 
hundred iterations, as we found that only the maximum 
likelihood bundle solution is reliable for difficult sequences 
(Mayer, 2008). 
The outcome of the above process consists of the relative 
orientations of the cameras as well as 3D points. The coordinate 
system is fixed to the first camera and the scale is determined by 
the base from the first to the second camera, for which the length 
is set to one. 
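Fixing the gauge as described, with the coordinate frame attached to the first camera and unit base length to the second camera, can be sketched as follows; cameras are given as world-to-camera pairs (R, t) with x_cam = R @ X + t, and the helper names are hypothetical:

```python
import numpy as np

def fix_gauge(Rs, ts, points):
    """Transform a relative reconstruction so that the first camera
    is at the origin with identity rotation and the base from the
    first to the second camera has length one. The camera centre of
    (R, t) is C = -R.T @ t. Illustrative sketch.
    """
    R0, t0 = np.asarray(Rs[0], float), np.asarray(ts[0], float)
    C1 = -np.asarray(Rs[1], float).T @ np.asarray(ts[1], float)
    s = 1.0 / np.linalg.norm(R0 @ C1 + t0)        # makes the base unit length
    new_Rs = [np.asarray(R, float) @ R0.T for R in Rs]
    new_ts = [s * (np.asarray(t, float) - np.asarray(R, float) @ R0.T @ t0)
              for R, t in zip(Rs, ts)]
    new_pts = s * (np.asarray(points, float) @ R0.T + t0)
    return new_Rs, new_ts, new_pts
```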
While this gives basic information about the 3D structure of the 
scene, it does not allow one, e.g., to compute visibility. Dense 
depth maps would be ideal for this, but there is no standard 
robust approach available for their computation. Recent approaches 
such as (Strecha et al., 2004; Lhuillier and Quan, 2005; 
Hirschmüller, 2008) all have their shortcomings. 
(Pollefeys et al., 2008) have shown dense depth maps computed 
in real time for extended areas, but the resulting 3D model 
suffers from occlusions and incorrect shapes, as no information 
about the imaged objects is included. (Cornelis et al., 2008) 
make use of the knowledge that facades or similar objects are 
imaged by employing ruled surfaces parallel to the vertical 
direction. This improves the result, but some non-vertical 
objects are still not reconstructed with their correct shape. 
Finally, we note that (Pollefeys et al., 2008) and (Cornelis 
et al., 2008) both employ dense video data, which considerably 
restricts the search space, thus allowing for real-time 
processing on graphics processing units (GPUs). 
As we focus on urban scenes where planes are abundant and 
often describe important objects such as walls, we decided to 
determine planes from the 3D points. 
3. 3D RECONSTRUCTION FROM IMAGES FROM A 
MICRO UAV 
A Micro UAV is a very small and light UAV. Thus, it is well 
suited for exploring built-up areas. It makes it possible to fly 
through streets and into courtyards and to take images of 
buildings and their facades there from an off-ground perspective, 
independently of ground conditions or obstacles on the ground. 
In our first experiments we investigated if images from a Micro 
UAV can be used for 3D reconstruction. We employed a 
quad-copter, i.e., a UAV with four rotors, with a diameter of 1 
meter and a weight under 1 kg. It carried a ten megapixel 
consumer camera. Figure 1 shows the planned image 
configuration "Circle".

Figure 1. Planned image configuration "Circle".
	        