CMRT09: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation
2002 STEREO product imagery is done by image co-registration
in ENVI. The 2005 image is resampled according to a
first-order polynomial transformation to geometrically align the
multi-temporal imagery. A first-order polynomial
transformation corrects for rotation, translation, scaling and
shearing. As the orientation of the 2005 image has changed after
registration, it was necessary to calculate a posteriori RPCs for
the resampled image, which is not a straightforward task. Ad
hoc RPC generation was done in collaboration with the team of
Prof. Dr. Crespi from the Area di Geodesia e Geomatica, La
Sapienza University of Rome. An algorithm, developed and
embedded in the software package SISAR (Software per
Immagini Satellitari ad Alta Risoluzione), makes it possible to
generate RPCs starting from physical sensor models, image
metadata, transformation parameters and a set of 15 to 20
ground control points with known map coordinates (Bianconi,
2008; Crespi, 2009). Image coordinates for the GCPs were
collected on the original and resampled 2005 Ikonos image.
Based on this method, RPCs could be generated with an
accuracy of 3.8 pixels in line direction and 5.1 pixels in sample
direction.
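As an illustration of the first-order polynomial transformation mentioned above, the following minimal sketch fits the six coefficients to a set of tie points by least squares. It is not the ENVI or SISAR implementation; the function names and the tie-point values are hypothetical and serve only to show the principle.

```python
import numpy as np

def fit_first_order_polynomial(src, dst):
    """Least-squares fit of x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Design matrix with one row [1, x, y] per tie point.
    A = np.column_stack([np.ones(len(src)), src[:, 0], src[:, 1]])
    # Both coordinate equations are solved at once; coeffs has shape (3, 2).
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return coeffs

def apply_first_order_polynomial(coeffs, pts):
    """Map points with the fitted coefficients."""
    pts = np.asarray(pts, dtype=float)
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    return A @ coeffs

# Hypothetical tie points (not taken from the study).
src = [(0, 0), (100, 0), (0, 100), (100, 100)]
true_coeffs = np.array([[5.0, -3.0],    # translation
                        [1.01, 0.02],   # x terms
                        [-0.02, 0.99]]) # y terms
dst = apply_first_order_polynomial(true_coeffs, src)
coeffs = fit_first_order_polynomial(src, dst)
```

With at least three non-collinear tie points the six coefficients are determined; additional points provide redundancy for the least-squares fit.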
4.3 Bundle adjustment for image orientation
During the bundle adjustment process, the rotation about the
three axes and the position of the sensor at the time of image
capture are calculated for all images simultaneously by least-squares
estimation. At the same time, the relationship between
image and object space is described. To calculate the best fit for
all images, however, initial values for internal and external
orientation are needed. As no information on the physical camera
model of Ikonos is released, rational polynomial coefficients,
provided by the image vendor, are used to calculate initial
values for internal and external image orientation. The rational
polynomial function model uses a general polynomial
transformation to describe the mathematical relationship
between object and image space, instead of a physical sensor
model. The rational function model is the ratio of two
polynomials and is derived from the physical sensor model and
on-board sensor orientation (Grodecki & Dial, 2003).
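The ratio-of-polynomials form of the rational function model can be sketched as follows. This is a generic illustration only: it assumes cubic polynomials with 20 terms in three normalised object coordinates, and the term ordering used here is arbitrary and does not follow any vendor's RPC file convention. The function names and coefficient values are hypothetical.

```python
import numpy as np
from itertools import product

# All exponent triples (i, j, k) with i + j + k <= 3, i.e. the 20 terms of a
# cubic polynomial in three variables. The ordering is illustrative only.
EXPONENTS = [e for e in product(range(4), repeat=3) if sum(e) <= 3]

def cubic_terms(x, y, z):
    """Evaluate the 20 monomials x**i * y**j * z**k."""
    return np.array([x**i * y**j * z**k for i, j, k in EXPONENTS])

def rational_function(num, den, x, y, z):
    """One image coordinate as the ratio of two cubic polynomials."""
    t = cubic_terms(x, y, z)
    return float(np.dot(num, t) / np.dot(den, t))

# Hypothetical coefficients: numerator 0.1 + x, denominator 1 + 0.5*z.
num = np.zeros(len(EXPONENTS))
den = np.zeros(len(EXPONENTS))
num[EXPONENTS.index((0, 0, 0))] = 0.1
num[EXPONENTS.index((1, 0, 0))] = 1.0
den[EXPONENTS.index((0, 0, 0))] = 1.0
den[EXPONENTS.index((0, 0, 1))] = 0.5

line = rational_function(num, den, 0.2, 0.0, 0.4)  # (0.1 + 0.2) / (1 + 0.2)
```

A complete model would use one such ratio for the line coordinate and one for the sample coordinate, each with its own coefficient sets.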
As RPCs are calculated only from on-board sensor orientation data,
satellite ephemeris and star tracker observations, the accuracy of
image orientation can be improved by using ground control points.
During a field trip to Istanbul the necessary GCPs for
photogrammetric processing of the DSMs were collected in
close collaboration with the Istanbul Metropolitan Planning
Centre (IMP-Bimtas). Because accurate large-scale
ortho-images were available for the study area and because of the
difficulties of GPS measurements in the narrow streets of the
densely built-up area, an approach was chosen to derive the
GCPs from ortho-images supplemented with 1:5000 scale
topographic maps. A set of 37 clearly visible GCPs was derived,
homogeneously distributed over the study area. In total, 17 points
with known map coordinates that are clearly identifiable in all three
images were used to describe the relationship between the
imagery and terrain. The a priori geometric accuracy for the
DSM extraction is expressed by an overall RMSE value of 0.79 m
for X residuals, 0.78 m for Y residuals and 2.36 m for Z
residuals.
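For reference, an RMSE value per coordinate axis is obtained from the residuals at the control points as the square root of the mean squared residual. The short sketch below shows the computation; the residual values are hypothetical and are not the residuals of this study.

```python
import math

def rmse(residuals):
    """Root mean square error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical X residuals in metres at four control points.
dx = [0.5, -0.5, 1.0, -1.0]
rmse_x = rmse(dx)  # sqrt((0.25 + 0.25 + 1.0 + 1.0) / 4) = sqrt(0.625)
```

The same function would be applied separately to the Y and Z residuals.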
4.4 Epipolar geometry
Before extracting the surface model, the original images are
resampled to an epipolar orientation. Y-parallax is removed,
while the parallax in the X-direction is retained and can
be interpreted as height differences. This reduces the process of
finding conjugate points in overlapping images from a
two-dimensional to a one-dimensional search along
epipolar lines.
4.5 Multi-image matching
During the image matching process, conjugate features need to
be found automatically between the overlapping images. The
surface model can then be derived by calculating
height differences from the measured disparity
between corresponding pixels. The applied algorithm follows
a coarse-to-fine hierarchical matching strategy. Image
pyramids consist of different versions of an image at
exponentially decreasing resolutions. The bottom level of the
pyramid contains the original image. The matching results of
each higher pyramid level are used as approximations in the
next lower level. At each level, an intermediate
DSM is also generated from the matched features and is refined
through the image pyramid. Based on all data in each pyramid
level, the matching parameters are fine-tuned progressively.
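The image pyramid described above can be sketched as follows. The sketch assumes a reduction factor of two per level and simple 2 x 2 block averaging; the actual reduction factor and filter of the applied software are not stated here, and the function name is hypothetical.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [original, 1/2 resolution, 1/4 resolution, ...]."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Crop to even dimensions so the image splits into 2 x 2 blocks.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]
        # Average each 2 x 2 block to halve the resolution.
        pyramid.append(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid

# Small synthetic image: index 0 is the bottom level (original image).
pyr = build_pyramid(np.arange(64).reshape(8, 8), levels=3)
shapes = [p.shape for p in pyr]
```

Under the factor-of-two assumption, a disparity measured at one level corresponds to twice that value at the next finer level, which is how a coarse match can serve as an approximation further down the pyramid.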
The matching algorithm is a combination of feature point, grid
point and 3D edge matching. This redundancy leads to better
constraints and more reliable results. Grid point matching is
especially valuable in areas with little texture, where conjugate
feature points are hard to detect. For each grid point to be
matched in the first image, the matching algorithm searches for
the conjugate pixel in the other images that correlates most strongly
by shifting a kernel of a given size along the epipolar line. A
correlation constraint is used to identify possible matching
candidates. The geometrically constrained cross-correlation
(GC³) method is an extension of the standard cross-correlation
technique (Zhang & Gruen, 2006). If there is more than one
matching candidate, the information from multiple images, i.e.
more than two, can provide geometric constraints that help
to identify a unique matching solution.
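The one-dimensional search along an epipolar line can be illustrated with standard normalised cross-correlation between two images. The sketch below is plain two-image cross-correlation, not the GC³ method, and it assumes epipolar-resampled images in which conjugate points lie on the same row. The function names, kernel size, search range and correlation threshold are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation coefficient of two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def match_along_row(left, right, row, col, half=2, max_disp=10, min_corr=0.8):
    """Shift a kernel along one row of `right`; return (column, score) of the
    best candidate, or (None, None) if no score exceeds min_corr."""
    tpl = left[row - half:row + half + 1, col - half:col + half + 1]
    best_col, best_score = None, None
    for d in range(-max_disp, max_disp + 1):
        c = col + d
        if c - half < 0 or c + half + 1 > right.shape[1]:
            continue  # kernel would fall outside the image
        win = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(tpl, win)
        # The correlation threshold rejects weak candidates.
        if score > min_corr and (best_score is None or score > best_score):
            best_col, best_score = c, score
    return best_col, best_score

# Synthetic test: the right image is the left image shifted by 3 columns.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, 3, axis=1)
best_col, best_score = match_along_row(left, right, row=10, col=20)
```

In this synthetic case the pixel at column 20 of the left image reappears at column 23 of the right image, so the disparity along the row is 3 pixels.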
3D edge matching is extremely valuable when dealing with
urban areas, as the matched edges assist in modelling surface
discontinuities. Edges are detected by the Canny operator (Canny, 1986).
During surface model generation, the matched edges are
taken into account as break lines to avoid smoothing effects.
Figure 5, which illustrates matched edges in an urban area on Ikonos
imagery, shows that the main shape of most of the
buildings is estimated quite well by the detected edges.
Building shadows are an important source of error in edge
detection. As shadow areas contrast strongly
with the surrounding pixels, edges are detected at the
shadow borders.