Close-range imaging, long-range vision

  
  
  
MI acquisition with GPS-enabled sensors
v
Approximate camera orientation (azimuth detection) by comparing a
video frame to the corresponding synthetic panorama image.
v
Absolute orientation of the frame and filing of the orientation
with time information for further processing
v
Change Detection
v
VR Model Update

Figure 1: Flowchart for VR model updates using motion
imagery (MI).

In this paper we present an innovative approach for the fast
recovery of azimuth information for MI datasets in large urban
scenes, making use of a pre-existing VR model (the second box
of the process outlined in Fig. 1, marked by bold lettering). The
paper is organized as follows. In Section 2 we present an
overview of the current state of the art in VR modeling of
large-scale urban scenes. In Section 3 we address the
hierarchical analysis of scene content and present the metrics
that we have developed to compare object configurations in
terms of the distances and relative orientations between objects.
We present experimental results in Section 4 and conclude with
comments and plans for future work in Section 5.
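
As a rough illustration of the azimuth-detection step in Fig. 1, the
sketch below slides a frame's column signature around a 360-degree
panorama signature and keeps the best-fitting offset. It is a
simplified stand-in under stated assumptions, not the
object-configuration method developed in this paper: the function
name estimate_azimuth, the one-value-per-column signatures, and the
degrees_per_column parameter are all hypothetical.

```python
def estimate_azimuth(panorama, frame, degrees_per_column=1.0):
    """Estimate the azimuth (in degrees) of the frame centre.

    panorama: one numeric value per column of a full 360-degree
              synthetic panorama (hypothetical signature, e.g. mean
              intensity or skyline height per column).
    frame:    the same kind of signature for the video frame, sampled
              at the same angular resolution (an assumption).
    """
    n, m = len(panorama), len(frame)
    if m == 0 or m > n:
        raise ValueError("frame must be non-empty and no wider than the panorama")

    best_offset, best_cost = 0, float("inf")
    for offset in range(n):
        # Sum of squared differences, wrapping around the 360-degree seam.
        cost = sum((panorama[(offset + i) % n] - frame[i]) ** 2 for i in range(m))
        if cost < best_cost:
            best_offset, best_cost = offset, cost

    # Report the azimuth of the frame centre rather than its left edge.
    centre_column = best_offset + (m - 1) / 2.0
    return (centre_column * degrees_per_column) % 360.0


# Usage: a synthetic panorama signature and a frame cut out of it.
panorama = [float(i * i) for i in range(360)]
frame = panorama[40:70]
azimuth = estimate_azimuth(panorama, frame)
```

An exhaustive circular search like this costs one comparison per
panorama column, which is one reason a GPS-derived position and a
coarse first estimate are useful for narrowing the search.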
2. LARGE-SCALE VR MODELS OF URBAN SCENES 
Arguably the premier effort in this direction is the collaboration
between the groups of Bill Jepson and Richard Muntz at UCLA
to develop Virtual LA, a large-scale virtual model of
the city of Los Angeles (see e.g. [Jepson et al., 1996] and the
web site www.aud.ucla.edu/~bill/UST.html). The photorealistic
3D model of LA was created using aerial and street-level
imagery, and is used to support a variety of cross-disciplinary
simulations (e.g. evaluating urban planning and rehearsing
emergency response actions). From a research point of view, the
major strength of this effort lies in the development of a system
that supports interactive navigation over the entire model by
integrating more than a dozen smaller models into a
large virtual environment. The UCLA team is currently working
towards extending this model to cover the complete LA basin,
an area in excess of 4,000 square miles. Further plans are to
extend coverage beyond LA to San Diego and Las Vegas.
Other notable efforts focus on the image analysis issues involved
in creating 3D urban scene models. They include the work of
[Brenner, 2000] on the automatic 3D reconstruction of complex urban
scenes using height data from airborne laser scanning and the
ground plans of buildings as provided by existing 2D
GIS or map data. Height data are used to create a digital
elevation model (DEM) of the city, and a photorealistic virtual
city model is generated by projecting aerial or terrestrial
images onto this DEM. This approach has been used to create a
virtual model of the city of Stuttgart (Germany), covering more
than 5,000 buildings in an area of 2 km x 3 km [Haala & Brenner,
1999]. Before the work of the Stuttgart group, the group of
Gruen at ETH (Zurich) had worked on the integration of terrain
imagery and aerial-sensor-derived 3D city models [Gruen et al.,
1996; Gruen & Wang, 1999].
Similar approaches have been followed in the UK to develop 
virtual models of the city of Bath, covering several square 
kilometers of the historic center of the city at sub-meter 
resolution [Day et al., 1996], in Austria to establish models of 
the cities of Graz and Vienna [Ranzinger & Gleixner, 1997], and
in Australia to develop a 3D GIS model for the city of Adelaide
[Kirkby et al., 1997].
Notable work on city modeling has also been performed by the 
MIT group of Seth Teller, focusing mostly on image capturing, 
sensor calibration, and scene modeling using specially 
developed equipment like the Argus camera and the roaming 
platform of Rover [Coorg & Teller, 1999; Antone & Teller, 
2000]. Argus is a high-resolution digital camera mounted on a 
small mobile platform and wheeled around campus. It 
incorporates specialized instrumentation to estimate the
geolocation of the exposure station and the camera orientation
parameters for each image acquired. Rover is a controlled
vehicle used to acquire geo-referenced video images of interiors 
and exteriors. 
3. HIERARCHICAL SCENE ANALYSIS 
At the center of our work is the development of an efficient 
approach for the multiresolutional analysis of scene content, to 
compare the content of an incoming image to a synthetic view 
provided by the virtual model. What is needed is an efficient 
technique to proceed from scenes and abstract object relations 
(e.g. topology, orientation) to objects and their specific 
properties (e.g. shape, pixel values). 
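
To make the idea of comparing abstract object relations concrete,
the sketch below scores two object configurations by their pairwise
distances and relative directions. It is a minimal illustration
under stated assumptions, not the metric defined in this paper: the
function names, the equal default weights, and the requirement that
both scenes list corresponding objects in the same order are all
hypothetical. Distances are divided by the largest pairwise distance,
so the score is independent of scale, but directions are compared as
given, so the score is not independent of rotation.

```python
import math
from itertools import combinations


def pairwise_relations(centroids):
    """Return normalized pairwise distances and pairwise bearings (radians)."""
    distances, bearings = [], []
    for i, j in combinations(range(len(centroids)), 2):
        dx = centroids[j][0] - centroids[i][0]
        dy = centroids[j][1] - centroids[i][1]
        distances.append(math.hypot(dx, dy))
        bearings.append(math.atan2(dy, dx))
    longest = max(distances)
    if longest == 0:
        raise ValueError("all objects coincide")
    # Dividing by the longest distance removes the effect of scale.
    return [d / longest for d in distances], bearings


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def configuration_similarity(scene_a, scene_b, w_dist=0.5, w_orient=0.5):
    """Score in [0, 1] when the weights sum to 1; 1 means identical relations."""
    if len(scene_a) != len(scene_b) or len(scene_a) < 2:
        raise ValueError("scenes need the same number (>= 2) of objects")
    dist_a, bear_a = pairwise_relations(scene_a)
    dist_b, bear_b = pairwise_relations(scene_b)
    n = len(dist_a)
    dist_term = sum(abs(x - y) for x, y in zip(dist_a, dist_b)) / n
    orient_term = sum(
        angular_difference(x, y) for x, y in zip(bear_a, bear_b)
    ) / (n * math.pi)
    return 1.0 - (w_dist * dist_term + w_orient * orient_term)


# Usage: object centroids from a synthetic view and from an incoming frame.
synthetic_view = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
incoming_frame = [(0.0, 0.0), (8.0, 0.0), (8.0, 6.0)]  # same layout, twice the size
score = configuration_similarity(synthetic_view, incoming_frame)
```

A coarse score of this kind only uses object positions; object-level
properties such as shape or pixel values would be examined in a later,
finer step for the candidates that pass.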
Scale space theory has provided the field of image processing 
with a formal framework to describe the decomposition of the 
content of raster datasets as a function of changes in a scale
parameter [Lindeberg, 1994]. The GIS community has also 
produced substantial work on higher-level abstraction concepts 
by modeling object relations like topology and direction (see 
e.g. [Egenhofer & Franzosa, 1991; Goyal & Egenhofer, 2000]). 
However, there still exists a gap between these two 
communities that does not allow us to move from pixels to 
abstract descriptions of image contents. Our work in this paper 
attempts to bridge this gap by using GIS concepts to support 
image matching. 
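
For readers unfamiliar with scale space, the sketch below builds a
small Gaussian scale-space stack for a 1D signal: the same data
smoothed at increasing values of the scale parameter, so that fine
detail disappears first. It is a generic textbook-style illustration,
not part of the method in this paper; the function names, the kernel
truncation at three standard deviations, and the edge-clamping rule
are assumptions made for the example.

```python
import math


def gaussian_kernel(sigma):
    """Discrete, normalized Gaussian kernel truncated at 3 * sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [
        math.exp(-(k * k) / (2.0 * sigma * sigma))
        for k in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat the border sample
            acc += w * signal[j]
        out.append(acc)
    return out


def scale_space(signal, sigmas):
    """Map each scale parameter to the signal smoothed at that scale."""
    return {sigma: smooth(signal, sigma) for sigma in sigmas}


# Usage: a single narrow peak flattens as the scale parameter grows.
signal = [0.0] * 20 + [10.0] + [0.0] * 20
levels = scale_space(signal, [1.0, 2.0, 4.0])
peak_heights = {sigma: max(values) for sigma, values in levels.items()}
```

The same idea extends to images by smoothing along rows and then
columns, since the Gaussian kernel is separable.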
We base our work on an extension of our recent efforts on 
developing scale- and orientation-independent metrics for scene 
similarity [Stefanidis et al., 2002]. More specifically, we
presented metrics that quantify abstract measures of similarity 
between two scenes making use of shape, topology, orientation, 
and distance metrics. A combinatorial expression for all these 
properties was introduced by [Blaser, 2000] to function within a 
general query-by-sketch environment. The function combines 
similarity metrics for individual object shapes and relations 