Close-range imaging, long-range vision

Flowchart boxes of Figure 1, in order:
1. MI acquisition with GPS-enabled sensors
2. Approximate camera orientation (azimuth detection) by comparing a video frame to the corresponding synthetic panorama image
3. Absolute orientation of the frame and filing of the orientation with time information for further
4. Change Detection
5. VR Model Update

Figure 1: Flowchart for VR model updates using motion
imagery (MI).
In this paper we present an innovative approach for the fast
recovery of azimuth information for MI datasets in large urban
scenes, making use of a pre-existing VR model (the second box
of the process outlined in Fig. 1, marked by bold lettering). The
paper is organized as follows. In Section 2 we present an
overview of the current state of the art in VR modeling of
large-scale urban scenes. In Section 3 we address the
hierarchical analysis of scene content, and present metrics that
we have developed to compare object configurations,
considering distances and relative orientations between objects.
We present experimental results in Section 4 and conclude with
comments and plans for future work in Section 5.
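As a rough illustration of the azimuth-detection step in Fig. 1, the sketch below slides a frame across a full 360-degree panorama and converts the best-matching column offset into an azimuth. It is not the method developed in this paper, which compares object configurations; plain sum-of-squared-differences matching on 1-D intensity profiles is swapped in for brevity. The function name `estimate_azimuth`, the `north_column` parameter, and the assumptions that panorama columns are evenly spaced in azimuth and that frame and panorama share the same angular resolution are all hypothetical.

```python
def estimate_azimuth(panorama, frame, north_column=0):
    """Return the azimuth (degrees, 0-360) of the frame center.

    panorama: 1-D intensity profile covering a full 360-degree sweep
              (assumption: columns are evenly spaced in azimuth).
    frame:    1-D intensity profile of the video frame, sampled at the
              same angular resolution as the panorama (assumption).
    north_column: panorama column that faces north (hypothetical parameter).
    """
    n = len(panorama)
    w = len(frame)
    if not 0 < w <= n:
        raise ValueError("frame must be non-empty and no wider than the panorama")

    best_offset, best_cost = 0, float("inf")
    for offset in range(n):
        # Sum of squared differences, wrapping around the 360-degree seam.
        cost = sum((panorama[(offset + k) % n] - frame[k]) ** 2 for k in range(w))
        if cost < best_cost:
            best_offset, best_cost = offset, cost

    degrees_per_column = 360.0 / n
    center_column = best_offset + w / 2.0 - north_column
    return (center_column * degrees_per_column) % 360.0
```

The exhaustive search costs one comparison per panorama column, which is acceptable for a single profile; a practical system would narrow the search with whatever prior orientation is available.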
Arguably the premier effort in this direction is the collaboration
between the groups of Bill Jepson and Richard Muntz at UCLA
on the development of Virtual LA, a large-scale virtual model of
the city of Los Angeles (see e.g. [Jepson et al., 1996] and the
web site www.aud.ucla.edu/~bill/UST.html). The photorealistic
3D model of LA was created using aerial and street-level
imagery, and is used to support a variety of cross-disciplinary
simulations (e.g. evaluating urban planning, and rehearsing
emergency response actions). From a research point of view, the
major strength of this effort lies in the development of a system
that supports interactive navigation over the entire model by
integrating more than a dozen smaller models into a
large virtual environment. The UCLA team is currently working
towards extending this model to cover the complete LA basin,
an area in excess of 4,000 square miles. Further plans are to
extend coverage beyond LA to San Diego and Las Vegas.
Other notable efforts focus on image analysis issues to create
3D urban scene models. They include the work of [Brenner,
2000] on the automatic 3D reconstruction of complex urban
scenes using height data from airborne laser scanning and the
groundplans of buildings as they are provided by existing 2D
GIS or map data. Height data are used to create a digital
elevation model (DEM) of the city, and a photorealistic virtual
city model is generated by projecting aerial or terrestrial
images onto this DEM. This approach has been used to create a
virtual model of the city of Stuttgart (Germany), covering more
than 5000 buildings in an area of 2 km x 3 km [Haala & Brenner,
1999]. Before the work of the Stuttgart group, the group of
Gruen at ETH (Zurich) had worked on the integration of terrain
imagery and aerial-sensor-derived 3D city models [Gruen et al.,
1996; Gruen & Wang, 1999].
Similar approaches have been followed in the UK to develop
virtual models of the city of Bath, covering several square
kilometers of the historic center of the city at sub-meter
resolution [Day et al., 1996], in Austria to establish models of
the cities of Graz and Vienna [Ranzinger & Gleixner, 1997], and
in Australia to develop a 3D GIS model for the city of Adelaide
[Kirkby et al., 1997].
Notable work on city modeling has also been performed by the
MIT group of Seth Teller, focusing mostly on image capturing,
sensor calibration, and scene modeling using specially
developed equipment like the Argus camera and the roaming
platform of Rover [Coorg & Teller, 1999; Antone & Teller,
2000]. Argus is a high-resolution digital camera mounted on a
small mobile platform and wheeled around campus. It
incorporates specialized instrumentation to estimate the
geolocation of the exposure station and the camera orientation
parameters for each image acquired. Rover is a controlled
vehicle used to acquire geo-referenced video images of interiors
and exteriors.
At the center of our work is the development of an efficient
approach for the multiresolutional analysis of scene content, to
compare the content of an incoming image to a synthetic view
provided by the virtual model. What is needed is an efficient
technique to proceed from scenes and abstract object relations
(e.g. topology, orientation) to objects and their specific
properties (e.g. shape, pixel values).
Scale space theory has provided the field of image processing
with a formal framework to describe the decomposition of the
content of raster datasets as a function of changes in a scale
parameter [Lindeberg, 1994]. The GIS community has also
produced substantial work on higher-level abstraction concepts
by modeling object relations like topology and direction (see
e.g. [Egenhofer & Franzosa, 1991; Goyal & Egenhofer, 2000]).
However, a gap still separates these two communities, and it
prevents us from moving from pixels to
abstract descriptions of image contents. Our work in this paper
attempts to bridge this gap by using GIS concepts to support
image matching.
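To make the role of the scale parameter concrete, the following minimal sketch builds a Gaussian scale-space of a 1-D signal (for example, one row of a raster image) by smoothing it with Gaussians of increasing standard deviation. It illustrates the general idea only and is not taken from this paper; the helper names (`gaussian_kernel`, `smooth`, `scale_space`, `count_local_maxima`), the kernel truncation at three standard deviations, and the border replication are all assumptions made for the example.

```python
import math


def gaussian_kernel(sigma, truncate=3.0):
    """Sampled 1-D Gaussian, normalized so that the weights sum to 1."""
    radius = max(1, int(math.ceil(truncate * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def scale_space(signal, sigmas):
    """Map each scale parameter sigma to the signal smoothed at that scale."""
    return {sigma: smooth(signal, sigma) for sigma in sigmas}


def count_local_maxima(values):
    """Count interior samples that are strictly larger than both neighbors."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])


if __name__ == "__main__":
    # Two broad structures with fine texture superimposed (made-up values).
    row = [0, 1, 0, 9, 10, 9, 10, 9, 0, 1, 0, 1, 0, 7, 8, 7, 8, 7, 0, 1]
    for sigma, level in scale_space(row, [0.5, 1.0, 2.0, 4.0]).items():
        print(sigma, count_local_maxima(level))
```

Running the script prints, for each scale, how many local maxima survive, which gives a simple view of how fine detail is suppressed as the scale parameter grows.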
We base our work on an extension of our recent efforts to
develop scale- and orientation-independent metrics for scene
similarity [Stefanidis et al., 2002]. More specifically, we
presented metrics that quantify abstract measures of similarity
between two scenes making use of shape, topology, orientation,
and distance metrics. A combinatorial expression for all these
properties was introduced by [Blaser, 2000] to function within a
general query-by-sketch environment. The function combines
similarity metrics for individual object shapes and relations