OM IMAGE
79, China
ose Estimation, RANSAC
try and computer vision
ace from which the image
lidean space in which no
'S obtained in cities. We
construction system.
| presented the well known
ple Consensus (RANSAC)
the "perspective-n-point"
hich is an equivalent but
nt of the LDP is originally
: Given the relative spatial
en the angle to every pair
point called the Center of
the line segments ("legs")
ol points. The aim of the
to determine the position
intrinsic parameters and a
3D points and their 2D
t, Fua, 2007). Therefore,
sorts to pose estimation in
To automate the process,
ted in image I should be
scene is used to determine
nage I. And the calculated
ted to image J to visually
g images of a scene shot
asy to create a panorama
i| images and reconstruct
pondences among several
8) presented method that
Cameras and acquire the
1 objects using closed or
; Lowe, 2004) presented
SIFT) operator to extract
le and rotation, which can
oss a substantial range of
viewpoint. (Zhang and
te images by finding geo-
ase. But we don’t assume
tags in our research for
)06; Snavely et al., 2008)
alled Bundler) that can
ons of unordered images
ire From Motion (SFM)
ies (Agarwal et al., 2011)
age resource website like
While systems like Bundler can automatically reconstruct 3D scenes from a large number of unordered images, the system may produce erroneous results for two reasons: insufficient overlap between images, and the lack of a good initial estimate of each image's focal length. The second problem can be solved by adding a CCD width database to the EXIF reading script. As to the first problem, the authors of the system recommend at most 15 degrees between nearby viewpoints, but this condition cannot be satisfied in our case because we do not constrain the viewing angle between image I and the other images. As a result, image I will not be registered with the other photos: there are not enough matches, and the angle between the camera viewpoints is relatively large.
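For reference, Bundler needs the focal length expressed in pixels, which can be computed from the EXIF focal length (in mm) and the physical CCD width (in mm) of the camera model. The following is a minimal sketch of that conversion in Python; the CCD_WIDTHS_MM table (its entries are illustrative) and the focal_length_pixels helper are assumptions for illustration, not the actual Perl script used by Bundler.

CCD_WIDTHS_MM = {
    # Illustrative entries only; real values come from the camera vendor specs.
    "Canon PowerShot A640": 7.176,
    "NIKON D70": 23.7,
}

def focal_length_pixels(camera_model, focal_length_mm, image_width_px):
    """Convert the EXIF focal length (mm) to pixels via the CCD width."""
    ccd_width_mm = CCD_WIDTHS_MM.get(camera_model)
    if ccd_width_mm is None:
        return None  # no prior available; the focal length must be estimated
    return image_width_px * focal_length_mm / ccd_width_mm

# e.g. focal_length_pixels("Canon PowerShot A640", 7.3, 3648) is roughly 3711 pixels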
Based on the above observations, we present a method, detailed in section 2, to solve the LDP problem in urban environments. The experimental results and discussion are described in section 3.
2. METHODOLOGY
If we have an image I depicting a set of landmarks with known locations, then we can determine the point in space from which the image was obtained by space resection or pose estimation. So we can first reconstruct the 3D scene that appears in image I using overlapping images collected afterwards, and extend the image sequences to cover the whole building. Because image I cannot be registered with the other images, as mentioned above, we cannot directly obtain the 3D positions corresponding to points in image I from the 3D reconstruction of the scene. We therefore have to match image I with the images used in the 3D reconstruction and obtain the 3D positions of points in image I by transfer: given the position of a point in one (or more) image(s), determine where it will appear in all other images of the set (Hartley and Zisserman, 2004). After we have obtained the 3D position of the viewpoint of image I by pose estimation, we can project it into the images covering the building and visually locate the place. Our method can thus be divided into three main steps:
1. 3D reconstruction of the scene: reconstruct the scene using
image sequences covering the landmarks in image I and the
nearby environment (e.g. the building).
2. Point transfer: match image I with the images used in the 3D reconstruction and transfer image points with known 3D positions to image I.
3. Pose estimation and viewpoint projection: solve the PnP problem using the points in image I and their corresponding 3D positions obtained in step 2, and project the calculated viewpoint of image I into the images covering the nearby environment.
2.1 3D reconstruction of the scene
We compiled Bundler v0.4 under Linux and use the system to create a 3D reconstruction. We first extract image information (including focal length and image resolution) using a Perl script. Interest points are detected in the given image I as well as in each image of the image sequences using the SIFT operator. Images are matched against each other using approximate nearest neighbour search. Mismatches often result from clutter and shadows, which are common in urban scenes; RANSAC is used to detect and remove outliers in the point correspondences. The main program “bundler” solves the bundle adjustment problem using the Levenberg-Marquardt algorithm. After all possible images have been registered, Bundler outputs a 3D reconstruction containing the reconstructed cameras and sparse 3D points. The estimated intrinsic and extrinsic parameters of each registered camera include:
f: the focal length
k1, k2: radial distortion coefficients
R: a 3x3 matrix representing the camera rotation
t: a 3-vector describing the camera translation
The parameters of each reconstructed point have the form:
position: a 3-vector describing the 3D position of the point
color: a 3-vector describing the RGB color of the point
view list: a list of cameras the point is visible in
The view list begins with the number of cameras the point is visible in, followed by a list of quadruplets <camera> <key> <x> <y>, where <camera> is a camera index, <key> is the index of the SIFT keypoint detected in that camera, and <x> and <y> are the detected 2D position of that keypoint in that camera.
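Since the later steps only need the reconstructed cameras and the view lists, a small reader for this output is sufficient. The sketch below assumes the plain-text bundle.out layout summarized above (a comment header, the camera and point counts, then the camera blocks f k1 k2 / R / t and the point blocks position / color / view list); the read_bundle helper is our own illustration, not part of Bundler.

import numpy as np

def read_bundle(path):
    """Sketch: parse Bundler's bundle.out into camera and point records."""
    with open(path) as fh:
        lines = [ln for ln in fh if not ln.startswith("#")]   # skip header
    vals = iter(" ".join(lines).split())
    take = lambda n: [float(next(vals)) for _ in range(n)]

    num_cams, num_pts = int(next(vals)), int(next(vals))
    cameras, points = [], []
    for _ in range(num_cams):
        f, k1, k2 = take(3)                       # focal length, radial distortion
        R = np.array(take(9)).reshape(3, 3)       # camera rotation
        t = np.array(take(3))                     # camera translation
        cameras.append({"f": f, "k1": k1, "k2": k2, "R": R, "t": t})
    for _ in range(num_pts):
        position = np.array(take(3))              # 3D position
        color = take(3)                           # RGB color
        n_views = int(next(vals))                 # number of cameras seeing the point
        views = [(int(next(vals)), int(next(vals)),        # <camera> <key>
                  float(next(vals)), float(next(vals)))    # <x> <y>
                 for _ in range(n_views)]
        points.append({"position": position, "color": color, "views": views})
    return cameras, points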
We use a pinhole camera model. The origin of the camera coordinate system is the center of the image, the positive x-axis points right, the positive y-axis points up, and the positive z-axis points backwards, so the camera looks down the negative z-axis. Therefore, the estimated parameters of each camera specified above can be used to project a 3D point X into a camera (R, t, f) by:
P = R * X + t (1)
p = -P / zp (2)
p' = f * r(p) * p (3)
where zp is the third coordinate of P. Equation 1 transforms the coordinates of a 3D point from the world coordinate system to the current camera coordinate system, Equation 2 performs the perspective division, and Equation 3 converts the coordinates to pixel values. In the last equation, r(p) is a function that computes a scaling factor to undo the radial distortion (Equation 4):
r(p) = 1.0 + k1 * ||p||^2 + k2 * ||p||^4 (4)
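As a direct transcription of Equations 1-4 (the project helper and the use of NumPy are ours for illustration; the camera model is the one described above), the projection of a 3D point X into a camera (R, t, f) with distortion coefficients k1, k2 can be written as:

import numpy as np

def project(X, R, t, f, k1, k2):
    """Project a 3D point X into a camera (R, t, f, k1, k2); Equations 1-4."""
    P = R @ X + t                    # Eq. 1: world -> camera coordinates
    p = -P[:2] / P[2]                # Eq. 2: perspective division by zp
    r = 1.0 + k1 * (p @ p) + k2 * (p @ p) ** 2   # Eq. 4: radial distortion factor
    return f * r * p                 # Eq. 3: pixel coordinates (image-centre origin)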
2.2 Point Transfer
From the output of the first step we obtain 3D points and their corresponding positions in the image sequences. To find the projections of these 3D points in image I, we must first establish the relationship between image I and the images in the sequences using a set of auxiliary point correspondences. If image I were registered with the other images in the bundle adjustment process, we would directly get the 3D position of the viewpoint of image I by:
X(I) = -R^T * t (5)
where R^T denotes the transpose of R. The projection of the viewpoint of camera I into each other camera could then be calculated by Equations 1, 2 and 3, and point transfer and pose estimation would not be necessary. In practice, however, image I cannot be registered with the other images (none could be registered in our experiment) due to large variations in scale and viewing angle. The alternative procedure we take in this research is therefore to transfer points from the image(s) used in the reconstruction to image I and use the transferred points to estimate the pose of image I.
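A rough sketch of this alternative procedure is given below, using OpenCV's SIFT, ratio-test matching and RANSAC-based PnP as stand-ins for the operators discussed in this paper. The estimate_pose_of_I helper, its inputs (pts2d_J and pts3d taken from the view lists, f_I from the EXIF focal length), and the 0.75 ratio and 2-pixel tolerances are assumptions for illustration.

import cv2
import numpy as np

def estimate_pose_of_I(img_I, img_J, pts2d_J, pts3d, f_I, width_I, height_I):
    """Transfer reconstructed points from a registered image J to image I via
    SIFT matches, then estimate the pose of I with RANSAC-based PnP.
    pts2d_J: (N, 2) view-list projections in J (top-left pixel coordinates);
    pts3d:   (N, 3) corresponding reconstructed 3D positions."""
    sift = cv2.SIFT_create()
    kp_I, des_I = sift.detectAndCompute(img_I, None)
    kp_J, des_J = sift.detectAndCompute(img_J, None)

    # Approximate nearest-neighbour matching with Lowe's ratio test
    raw = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_J, des_I, k=2)
    matches = [m for m, n in raw if m.distance < 0.75 * n.distance]

    obj_pts, img_pts = [], []
    for m in matches:
        xy_J = np.array(kp_J[m.queryIdx].pt)
        d = np.linalg.norm(pts2d_J - xy_J, axis=1)
        if d.min() < 2.0:                          # J's keypoint has a reconstructed 3D point
            obj_pts.append(pts3d[d.argmin()])
            img_pts.append(kp_I[m.trainIdx].pt)    # transferred position in image I

    # Intrinsics of I: focal length from EXIF/CCD width, principal point at the centre
    K = np.array([[f_I, 0, width_I / 2.0],
                  [0, f_I, height_I / 2.0],
                  [0, 0, 1.0]])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(obj_pts, np.float64), np.asarray(img_pts, np.float64), K, None)
    R, _ = cv2.Rodrigues(rvec)
    return -R.T @ tvec, inliers                    # camera centre: -R^T * t (Eq. 5)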