LOCATION DETERMINATION IN URBAN ENVIRONMENT FROM IMAGE
SEQUENCES
Qingming Zhan, Yubin Liang, Yinghui Xiao
School of Urban Design, Wuhan University, Wuhan 430072, China
Research Center for Digital City, Wuhan University, Wuhan 430072, China
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
qmzhan@whu.edu.cn; lyb.whu@gmail.com; yhxiaoitc@126.com
KEY WORDS: Location Determination Problem, Bundle Adjustment, Image Matching, Point Transfer, Pose Estimation, RANSAC
ABSTRACT:
The Location Determination Problem (LDP) is a classic and interesting problem for both the photogrammetry and computer vision communities: given an image depicting a set of landmarks with known locations, determine the point in space from which the image was obtained. In this paper we use image sequences to automatically solve the LDP in a local Euclidean space, so that no georeference information is needed. Overlapping image sequences are well suited for matching images obtained in cities. We implement a method that can semi-automatically solve the LDP in an urban scenario using a state-of-the-art 3D reconstruction system.
1. INTRODUCTION
Nowadays Google Maps and other city-scale 3D reconstruction
systems with street view are widely used for visual exploration
of cities. Those systems often rely on structured photos captured
by sensors equipped with GPS and inertial navigation units,
which makes post-processing much easier. However, these
systems only cover large cities and famous avenues attractive to
tourists. Furthermore, many people do not need absolute
georeference information in daily vision-related applications
such as augmented reality; location information in a local space
is enough. For example, given an image taken from some place
(Figure 1a), one can guess from the viewing direction that the
photo was taken from a window of a nearby building (Figure 1b),
but it is difficult to pinpoint the precise location. The authors of
this paper are interested in the following problem: given an
image I, locate the place in another image J from which image I
was taken.
Figure 1: (a) the given image; (b) the building from which the image was taken
The problem is known as 'space resection' in the photogrammetry
community and as 'pose estimation' or 'extrinsic camera
calibration' in the computer vision community. Extrinsic camera
calibration is often carried out in a calibration field using well-
designed targets/rigs. This is not the case in our problem,
because there are no pre-installed rigs and image I is taken
arbitrarily. The difference between space resection and pose
estimation is that the control points in space resection are
georeferenced, whereas pose estimation is usually performed in
a local Euclidean space. The Location Determination Problem is
a general definition for both the photogrammetry and computer vision
communities: Given a set of m control points, whose 3-
dimensional coordinates are known in some coordinate frame,
and given an image in which some subset of the m control
points is visible, determine the location (relative to the
coordinate system of the control points) from which the image
was obtained. (Fischler, Bolles, 1981) presented the well-known
model-fitting paradigm Random Sample Consensus (RANSAC)
and used its model inliers to solve the "perspective-n-point"
(PnP) problem. The PnP problem, which is an equivalent but
mathematically more concise statement of the LDP, is originally
defined in (Fischler, Bolles, 1981) as: Given the relative spatial
locations of n control points, and given the angle to every pair
of control points from an additional point called the Center of
Perspective (CP), find the lengths of the line segments ("legs")
joining the CP to each of the control points. The aim of the
Perspective-n-Point problem (PnP) is to determine the position
and orientation of a camera given its intrinsic parameters and a
set of n correspondences between 3D points and their 2D
projections (Moreno-Noguer, Lepetit, Fua, 2007). Therefore,
the solution to our problem mainly resorts to pose estimation in
a reconstructed local Euclidean space. To automate the process,
a 3D reconstruction of the scene depicted in image I is performed
first. The reconstructed scene is then used to determine the 3D
position of the viewpoint of image I, and the computed 3D
coordinates of the viewpoint are projected into image J to
visually locate the position.
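To make this last step concrete, the following minimal sketch (assuming OpenCV and NumPy, with synthetic 3D-2D correspondences standing in for points from a real reconstruction) estimates the pose of image I by RANSAC-based PnP, recovers its viewpoint, and projects that viewpoint into image J, whose pose is assumed known from the same reconstruction:

import numpy as np
import cv2

# Synthetic stand-in for the reconstructed scene: random 3D points in front
# of the camera (in practice these come from the SfM point cloud).
rng = np.random.default_rng(0)
object_points = rng.uniform([-5, -5, 10], [5, 5, 30], (100, 3)).astype(np.float32)

# Hypothetical intrinsics of image I (focal length and principal point).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# Ground-truth pose, used here only to synthesize 2D observations in image I.
rvec_true = np.array([0.05, -0.10, 0.02])
tvec_true = np.array([0.5, -0.3, 2.0])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)
image_points = image_points.reshape(-1, 2).astype(np.float32)

# RANSAC-based PnP: recover the pose of image I in the local Euclidean frame
# of the reconstruction while rejecting gross mismatches.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, None,
    reprojectionError=4.0, iterationsCount=1000)

# Viewpoint (camera center) of image I: C = -R^T t.
R, _ = cv2.Rodrigues(rvec)
C = (-R.T @ tvec.reshape(3, 1)).ravel()
print('Estimated viewpoint of image I:', C)

# Project that viewpoint into image J, whose pose in the same frame is
# assumed known from the reconstruction (placeholder values here).
rvec_J = np.array([0.0, 0.3, 0.0])
tvec_J = np.array([-1.0, 0.0, 4.0])
pt_J, _ = cv2.projectPoints(C.reshape(1, 1, 3), rvec_J, tvec_J, K, None)
print('Viewpoint of image I projects to pixel', pt_J.ravel(), 'in image J')

In practice the 3D points come from the reconstructed point cloud and the 2D points from feature matches between image I and the reconstructed images; the RANSAC loop inside the PnP solver rejects mismatched correspondences, in the spirit of the original LDP formulation.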
Nowadays, given a set of overlapping images of a scene shot
from nearby camera locations, it is easy to create a panorama
that seamlessly combines the original images, or to reconstruct
the 3D scene using correspondences extracted among several
images. (Fitzgibbon, Zisserman, 1998) presented a method that
could simultaneously localize the cameras and acquire the
sparse 3D point cloud of the imaged objects using closed or
open image sequences. (Lowe, 1999; Lowe, 2004) presented
Scale Invariant Feature Transform (SIFT) operator to extract
features that are invariant to image scale and rotation, which can
be used to robustly match images across a substantial range of
affine distortion and change in 3D viewpoint. (Zhang and
Kosecka, 2006) used SIFT to geo-locate images by finding geo-
tagged image matches in a pre-built database, but for generality
we do not assume geo-location information such as geo-tags in
our research. (Snavely et al., 2006; Snavely et al., 2008)
presented a state-of-the-art system (called Bundler) that can
automatically structure large collections of unordered images,
and they have scaled up the Structure from Motion (SfM)
algorithms to work on entire cities (Agarwal et al., 2011)
using photographs obtained from photo-sharing websites such as
Flickr.
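As a brief illustration of the matching step underlying these systems, the following sketch (assuming OpenCV and two hypothetical overlapping photographs, img1.jpg and img2.jpg) extracts SIFT features and keeps correspondences that pass Lowe's ratio test:

import cv2

# Two hypothetical overlapping photographs of the scene, loaded in grayscale.
img1 = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('img2.jpg', cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute descriptors, which are invariant to
# image scale and rotation and robust to moderate viewpoint change.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors with k-nearest neighbours and keep only matches that
# pass Lowe's ratio test, which discards ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

print(len(good), 'putative correspondences after the ratio test')

The surviving correspondences are the input to the SfM pipeline: relative pose estimation with RANSAC, triangulation, and bundle adjustment.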