Figure 1 shows a brief overview of our UltraMap v3 processing
pipeline. The RawDataCenter is responsible for processing the
UltraCam imagery into a so-called Level-2 data format. This
data contains the digital negative of the camera (radiometrically
and geometrically calibrated). The Aerial Triangulation (AT)
module is responsible for calculating image correspondences in
order to generate a precise exterior orientation for a whole
image block. The Radiometry module is used to remove any
physically-based colour artefacts as well as to adjust the desired
final colour tone.
The DSM Generation module takes the Level-2 images
including the precise exterior orientation information and
generates per-pixel height values. The final Ortho Generation
module takes all available inputs (i.e. Level-2 imagery, AT
result, radiometric settings, and the DSM/DTM) in order to
generate the final ortho mosaic.
The paper is organized as follows: after a brief related work
section about semi-global matching and ortho mosaicking, the
technical part of the UltraMap v3 system including dense
matching, ortho mosaicking, user interaction, and distributed
processing is explained. Before showing some results, we also
outline our processing environment including some words about
the interactive visualization.
2. RELATED WORK
The first part of the UltraMap v3 is the generation of a digital
surface model. Semi-global matching is a known technique in
the photogrammetry community. In 2011, Heiko Hirschmueller
(Hirschmueller, 2011) presented a good overview of the
semi-global matching strategy, including different applications.
His approach can be seen as the current state-of-the-art
technique for processing aerial imagery. Another comparable
approach in the computer vision community can be found in
Klaus et al. (Klaus, Sormann, & Karner, 2006). This method
led the Middlebury stereo evaluation ranking for a
long period of time (http://vision.middlebury.edu/stereo/eval/).
Related research in the field of ortho image mosaic generation
can be found in the area of visual analysis, which has been well
studied in computer graphics, computer vision and
photogrammetry. Amhar and Ecker (Amhar & Ecker, 1996)
proposed a methodology based on photogrammetric
principles to create DSMOrtho images from digital terrain
models. Korytnik et al. (Korytnik, Kuzmin, & Long, 2004)
proposed a polygon-based approach for the detection of
occluded areas during the DSMOrtho image generation. In
contrast to the method of Korytnik et al., most other existing
DSMOrtho image generation approaches are based on the Z-
buffer algorithm, e.g. Chen et al. (Chen, Rau, & Chen, 2002)
and Zhou (Zhou, 2004). Another closely related line of research is
the well-studied problem of image stitching and compositing by
Uyttendaele et al. (Uyttendaele, Eden, & Szeliski, 2006),
(Uyttendaele, Szeliski, & Steedly, 2011). Uyttendaele et al.
propose a graph cut based approach for finding seams between
overlapping areas and furthermore apply Poisson blending for
compositing the final image.
3. FULLY AUTOMATED ORTHO PIPELINE
3.1 Dense matching and fusion
Dense matching is the process of finding corresponding pixels
in a pair of images in order to do a 3D reconstruction. As a
prerequisite, the exterior orientation and the intrinsic calibration
of the camera must be known. In order to establish
correspondences, image-based correlation methods are used
(e.g. normalized cross correlation). The output of the stereo
dense matching approach is a range image which stores the
calculated disparity values of a single image pair.
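As an illustration of correlation-based matching, the following minimal sketch searches along one image row for the disparity that maximizes the normalized cross correlation between two patches. It assumes rectified images (corresponding pixels lie on the same row); the function names, patch size, and search range are hypothetical and are not taken from UltraMap.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_disparity(left, right, row, col, half=2, max_disp=8):
    """Find d maximizing NCC between left patch at (row, col)
    and right patch at (row, col - d). Assumes rectified images."""
    ref = left[row - half: row + half + 1, col - half: col + half + 1]
    best_d, best_score = 0, -1.0
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:  # candidate patch would leave the image
            break
        cand = right[row - half: row + half + 1, c - half: c + half + 1]
        score = ncc(ref, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score

# Synthetic test pair: the left image is the right image shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.random((11, 40))
left = np.roll(right, 3, axis=1)
d, score = best_disparity(left, right, row=5, col=20)
```

Repeating this search for every pixel would yield a disparity (range) image for the pair. A semi-global approach would additionally regularize the per-pixel costs, which this sketch omits.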
The next step is a range image fusion, which takes all
generated range images and calculates first a
3D representation (i.e. a point cloud) and finally a 2.5D height
field known as the digital surface model (DSM). The range image
fusion can be formulated as a global optimization step
minimizing an objective function.
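The objective function itself is not specified here. As a purely illustrative stand-in, the sketch below fuses several height maps with a per-cell median, which is the minimizer of the sum of absolute deviations in each cell. It assumes the range images have already been converted to heights on a common grid, marks missing measurements with NaN, and has no smoothness term; the function name is hypothetical.

```python
import numpy as np

def fuse_heights(height_maps):
    """Fuse co-registered height maps (NaN = no measurement)
    by taking the per-cell median over all available values."""
    stack = np.stack(height_maps, axis=0)
    return np.nanmedian(stack, axis=0)

# Three small height maps; the value 50.0 is an outlier from a bad match.
maps = [
    np.array([[10.0, 12.0], [np.nan, 30.0]]),
    np.array([[10.2, 50.0], [20.0, 30.2]]),
    np.array([[9.8, 12.2], [20.4, np.nan]]),
]
dsm = fuse_heights(maps)
```

The median is chosen in this sketch because it suppresses isolated outliers such as the 50.0 value; a global formulation would couple neighbouring cells as well.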
DTM filtering
The generated DSM can further be post-processed by applying a
constrained filter operation. A gradient-based approach allows
us to filter out buildings while preserving hills. The generated
DTM is then used to generate a DTMOrtho in a fully automated
way.
3.2 Ortho rectification
The first step in the ortho pipeline is ortho rectification,
which re-projects the input images onto a defined proxy
geometry. For this purpose, we introduce a virtual camera, which is
defined as a three-dimensional plane emitting parallel rays to the
ground (compare Figure 2). Those rays are intersected with the
scene and thereby generate a 2.5D surface. In Figure 2, the
upper half depicts two input images and the ortho projection
whereas the 2.5D height field profile or surface is illustrated at
the bottom. The process of generating an image from a new
viewpoint is also known as image-based rendering. Because
one input image can cover only a certain area of the
ortho projection, some regions are occluded (e.g. by tall buildings).
These regions are then filled by using neighbouring image
information.
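The occlusion effect can be sketched on a 1D height profile. The following brute-force line-of-sight test marks, for each ground cell, whether a camera at a given position can see it. It is a minimal illustration under simplifying assumptions (a 1D profile, point cameras, hypothetical function and parameter names) and not the visibility method used in the pipeline.

```python
import numpy as np

def visible_cells(heights, cam_x, cam_z, cell_size=1.0):
    """For each cell of a 1D height profile, test whether the straight
    line from the cell surface to the camera at (cam_x, cam_z) is free."""
    n = len(heights)
    xs = np.arange(n) * cell_size
    visible = np.ones(n, dtype=bool)
    for i in range(n):
        lo, hi = sorted((xs[i], cam_x))
        for j in range(n):
            # only cells strictly between cell i and the camera can block
            if j == i or not (lo < xs[j] < hi):
                continue
            t = (xs[j] - xs[i]) / (cam_x - xs[i])
            ray_z = heights[i] + t * (cam_z - heights[i])
            if heights[j] > ray_z:
                visible[i] = False
                break
    return visible

# Flat ground with a 10 m building in the middle, seen by two cameras.
profile = np.array([0, 0, 0, 10, 10, 0, 0, 0], dtype=float)
vis_a = visible_cells(profile, cam_x=0.0, cam_z=20.0)
vis_b = visible_cells(profile, cam_x=7.0, cam_z=20.0)
covered = vis_a | vis_b
```

In this example each camera misses the ground behind the building, but the union of both views covers every cell, which is why occluded ortho regions can be filled from a neighbouring image.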
Figure 2 Concept of an ortho projection with two input images (figure labels: Input image, Input image, Orthogonal Projection).
3.3 Seamline generation
After the ortho rectification process, the next step is to find
seamlines between projected ortho patches. This step is also
known as contribution mask generation, since the contribution
mask is the dual structure to the seamlines (see Figure 3). Seams
correspond to transitions from one input image to another one.
This process can be formulated as the minimization of an
objective function consisting of a sum of unary and binary
costs. This function incorporates the viewing angle of the
input image as well as the colour differences. The
optimization for finding the best path is done by applying a
graph-cut (Kolmogorov & Zabih, 2004) algorithm.
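The exact cost terms are not given here. A generic labelling energy of this kind, written with assumed symbols, has the following form, where $l_p$ is the input image assigned to ortho pixel $p$, $D_p$ is the unary cost, $V_{pq}$ is the binary cost, and $\mathcal{N}$ is the set of neighbouring pixel pairs:

```latex
E(l) \;=\; \sum_{p} D_p(l_p) \;+\; \sum_{(p,q) \in \mathcal{N}} V_{pq}(l_p, l_q)
```

One plausible reading, stated as an assumption, is that the unary term penalizes unfavourable viewing angles while the binary term penalizes colour differences across a label change between neighbouring pixels. Seams then appear wherever $l_p \neq l_q$ in the minimizing labelling.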