OM IMAGE
79, China
ose Estimation, RANSAC
try and computer vision
ace from which the image
lidean space in which no
'S obtained in cities. We
construction system.
| presented the well known
ple Consensus (RANSAC)
the "perspective-n-point"
hich is an equivalent but
nt of the LDP is originally
: Given the relative spatial
en the angle to every pair
point called the Center of
the line segments ("legs")
ol points. The aim of the
to determine the position
intrinsic parameters and a
3D points and their 2D
t, Fua, 2007). Therefore,
sorts to pose estimation in
To automate the process,
ted in image I should be
scene is used to determine
nage I. And the calculated
ted to image J to visually
g images of a scene shot
asy to create a panorama
i| images and reconstruct
pondences among several
8) presented method that
Cameras and acquire the
1 objects using closed or
; Lowe, 2004) presented
SIFT) operator to extract
le and rotation, which can
oss a substantial range of
viewpoint. (Zhang and
te images by finding geo-
ase. But we don’t assume
tags in our research for
)06; Snavely et al., 2008)
alled Bundler) that can
ons of unordered images
ire From Motion (SFM)
ies (Agarwal et al., 2011)
age resource website like
While systems like Bundler can automatically reconstruct 3D scenes from a large number of unordered images, the system may produce erroneous results for two reasons: insufficient overlap between images, and the lack of a good initial estimate of each image's focal length. The second problem can be solved by adding a CCD width database to the EXIF reading script. As to the first problem, the authors of the system recommend at most 15 degrees between nearby viewpoints, but this condition cannot be satisfied in our case because we do not constrain the viewing angle between image I and the other images. As a result, image I will not be registered with the other photos: there are not enough matches, and the angle between the camera viewpoints is relatively large.
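For reference, Bundler needs the focal length expressed in pixels, which can be computed from the EXIF focal length (in mm) and the physical CCD width (in mm) of the camera model. The following is a minimal sketch of that conversion in Python; the CCD_WIDTHS_MM table (its entries are illustrative) and the focal_length_pixels helper are assumptions for illustration, not the actual Perl script used by Bundler.

CCD_WIDTHS_MM = {
    # Illustrative entries only; real values come from the camera vendor specs.
    "Canon PowerShot A640": 7.176,
    "NIKON D70": 23.7,
}

def focal_length_pixels(camera_model, focal_length_mm, image_width_px):
    """Convert the EXIF focal length (mm) to pixels via the CCD width."""
    ccd_width_mm = CCD_WIDTHS_MM.get(camera_model)
    if ccd_width_mm is None:
        return None  # no prior available; the focal length must be estimated
    return image_width_px * focal_length_mm / ccd_width_mm

# e.g. focal_length_pixels("Canon PowerShot A640", 7.3, 3648) is roughly 3711 pixels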
Based on the above observations, we present a method, detailed in section 2, to solve the LDP problem in urban environments. The experimental results and discussion are described in section 3.
2. METHODOLOGY
If we have an image I depicting a set of landmarks with known locations, then we can determine the point in space from which the image was obtained by space resection or pose estimation. So we can first reconstruct the 3D scene that appears in image I using overlapping images collected afterwards, and extend the image sequences to cover the whole building. Because image I cannot be registered with the other images, as mentioned above, we cannot directly obtain the 3D positions corresponding to points in image I from the 3D reconstruction of the scene. We therefore have to match image I with the images used in the 3D reconstruction and obtain the 3D positions of points in image I by transfer: given the position of a point in one (or more) image(s), determine where it will appear in all other images of the set (Hartley and Zisserman, 2004). After we have obtained the 3D position of the viewpoint of image I by pose estimation, we can project it into the images covering the building and visually locate the place. Our method can thus be divided into three main steps:
1. 3D reconstruction of the scene: reconstruct the scene using
image sequences covering the landmarks in image I and the
nearby environment (e.g. the building).
2. Point transfer: match image I with the images used in the 3D reconstruction and transfer image points with known 3D positions to image I.
3. Pose estimation and viewpoint projection: solve the PnP problem using the points in image I and their corresponding 3D positions obtained in step 2, and project the calculated viewpoint of image I into the images covering the nearby environment.
2.1 3D reconstruction of the scene
We compiled Bundler v0.4 under Linux and use the system to create a 3D reconstruction. We first extract image information (including focal length and image resolution) using a Perl script. Interest points are detected in the given image I as well as in each image of the image sequences using the SIFT operator. Images are matched against each other using approximate nearest neighbour search. Mismatches often result from clutter and shadows, which are common in urban scenes; RANSAC is used to detect and remove outliers in the point correspondences. The main program “bundler” solves the bundle adjustment problem using the Levenberg-Marquardt algorithm. After all possible images have been registered, Bundler outputs a 3D reconstruction containing the reconstructed cameras and sparse 3D points. The estimated intrinsic and extrinsic parameters of each registered camera include:
f: the focal length
k1, k2: radial distortion coefficients
R: a 3x3 matrix representing the camera rotation
t: a 3-vector describing the camera translation
The parameters of each reconstructed point have the form:
position: a 3-vector describing the 3D position of the point
color: a 3-vector describing the RGB color of the point
view list: a list of cameras the point is visible in
The view list begins with the number of cameras the point is visible in, followed by a list of quadruplets <camera> <key> <x> <y>, where <camera> is a camera index, <key> is the index of the SIFT keypoint detected in that camera, and <x> and <y> are the detected 2D position of that keypoint in that camera.
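Since the later steps only need the reconstructed cameras and the view lists, a small reader for this output is sufficient. The sketch below assumes the plain-text bundle.out layout summarized above (a comment header, the camera and point counts, then the camera blocks f k1 k2 / R / t and the point blocks position / color / view list); the read_bundle helper is our own illustration, not part of Bundler.

import numpy as np

def read_bundle(path):
    """Sketch: parse Bundler's bundle.out into camera and point records."""
    with open(path) as fh:
        lines = [ln for ln in fh if not ln.startswith("#")]   # skip header
    vals = iter(" ".join(lines).split())
    take = lambda n: [float(next(vals)) for _ in range(n)]

    num_cams, num_pts = int(next(vals)), int(next(vals))
    cameras, points = [], []
    for _ in range(num_cams):
        f, k1, k2 = take(3)                       # focal length, radial distortion
        R = np.array(take(9)).reshape(3, 3)       # camera rotation
        t = np.array(take(3))                     # camera translation
        cameras.append({"f": f, "k1": k1, "k2": k2, "R": R, "t": t})
    for _ in range(num_pts):
        position = np.array(take(3))              # 3D position
        color = take(3)                           # RGB color
        n_views = int(next(vals))                 # number of cameras seeing the point
        views = [(int(next(vals)), int(next(vals)),        # <camera> <key>
                  float(next(vals)), float(next(vals)))    # <x> <y>
                 for _ in range(n_views)]
        points.append({"position": position, "color": color, "views": views})
    return cameras, points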
We use a pinhole camera model. The origin of the camera coordinate system is the center of the image, the positive x-axis points right, the positive y-axis points up, and the positive z-axis points backwards, so the camera looks down the negative z-axis. Therefore, the estimated parameters of each camera specified above can be used to project a 3D point X into a camera (R, t, f) by:
P = R * X + t (1)
p = -P / zp (2)
p' = f * r(p) * p (3)
where zp is the third coordinate of P. Equation 1 transforms the coordinates of a 3D point from the world coordinate system to the current camera coordinate system, Equation 2 performs the perspective division, and Equation 3 converts the coordinates to pixel values. In the last equation, r(p) is a function that computes a scaling factor to undo the radial distortion (Equation 4):
r(p) = 1.0 + k1 * ||p||^2 + k2 * ||p||^4 (4)
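As a direct transcription of Equations 1-4 (the project helper and the use of NumPy are ours for illustration; the camera model is the one described above), the projection of a 3D point X into a camera (R, t, f) with distortion coefficients k1, k2 can be written as:

import numpy as np

def project(X, R, t, f, k1, k2):
    """Project a 3D point X into a camera (R, t, f, k1, k2); Equations 1-4."""
    P = R @ X + t                    # Eq. 1: world -> camera coordinates
    p = -P[:2] / P[2]                # Eq. 2: perspective division by zp
    r = 1.0 + k1 * (p @ p) + k2 * (p @ p) ** 2   # Eq. 4: radial distortion factor
    return f * r * p                 # Eq. 3: pixel coordinates (image-centre origin)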
2.2 Point Transfer
From the output of the first step we obtain 3D points and their corresponding positions in the image sequences. To find the projections of these 3D points in image I, we must first establish the relationship between image I and the images in the sequences using a set of auxiliary point correspondences. If image I were registered with the other images in the bundle adjustment process, we would directly get the 3D position of the viewpoint of image I by:
X(I) = -R^T * t (5)
where R^T denotes the transpose of R. The projection of the viewpoint of camera I into each other camera could then be calculated by Equations 1, 2 and 3, and point transfer and pose estimation would not be necessary. In practice, however, image I cannot be registered with the other images (none could be registered in our experiment) due to large variations in scale and viewing angle. The alternative procedure we take in this research is therefore to transfer points from the image(s) used in the reconstruction to image I and use the transferred points to estimate the pose of image I.
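A rough sketch of this alternative procedure is given below, using OpenCV's SIFT, ratio-test matching and RANSAC-based PnP as stand-ins for the operators discussed in this paper. The estimate_pose_of_I helper, its inputs (pts2d_J and pts3d taken from the view lists, f_I from the EXIF focal length), and the 0.75 ratio and 2-pixel tolerances are assumptions for illustration.

import cv2
import numpy as np

def estimate_pose_of_I(img_I, img_J, pts2d_J, pts3d, f_I, width_I, height_I):
    """Transfer reconstructed points from a registered image J to image I via
    SIFT matches, then estimate the pose of I with RANSAC-based PnP.
    pts2d_J: (N, 2) view-list projections in J (top-left pixel coordinates);
    pts3d:   (N, 3) corresponding reconstructed 3D positions."""
    sift = cv2.SIFT_create()
    kp_I, des_I = sift.detectAndCompute(img_I, None)
    kp_J, des_J = sift.detectAndCompute(img_J, None)

    # Approximate nearest-neighbour matching with Lowe's ratio test
    raw = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_J, des_I, k=2)
    matches = [m for m, n in raw if m.distance < 0.75 * n.distance]

    obj_pts, img_pts = [], []
    for m in matches:
        xy_J = np.array(kp_J[m.queryIdx].pt)
        d = np.linalg.norm(pts2d_J - xy_J, axis=1)
        if d.min() < 2.0:                          # J's keypoint has a reconstructed 3D point
            obj_pts.append(pts3d[d.argmin()])
            img_pts.append(kp_I[m.trainIdx].pt)    # transferred position in image I

    # Intrinsics of I: focal length from EXIF/CCD width, principal point at the centre
    K = np.array([[f_I, 0, width_I / 2.0],
                  [0, f_I, height_I / 2.0],
                  [0, 0, 1.0]])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(obj_pts, np.float64), np.asarray(img_pts, np.float64), K, None)
    R, _ = cv2.Rodrigues(rvec)
    return -R.T @ tvec, inliers                    # camera centre: -R^T * t (Eq. 5)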