ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Once the three masks are determined for each LNF image of a facade,
the weight $w_k$ at pixel $[i, j]$ of LNF image $k$ is computed by:

$$w_k[i, j] = M_I[i, j]\, M_O[i, j]\, M_C[i, j], \quad (5)$$

and the normalized weight by:

$$W_k[i, j] = \frac{w_k[i, j]}{\sum_{k'} w_{k'}[i, j]}. \quad (6)$$
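As a concrete illustration, the per-pixel blend of Eqs. (5)-(6) can be sketched in NumPy; the argument names and the $(K, H, W)$ stack layout are assumptions for this sketch, not the paper's implementation:

```python
import numpy as np

def consensus_blend(lnf_stack, M_I, M_O, M_C, eps=1e-12):
    """Weighted average of K registered LNF images (Eqs. 5-6).

    lnf_stack, M_I, M_O, M_C : arrays of shape (K, H, W).
    Returns the (H, W) CTF image.
    """
    w = M_I * M_O * M_C                 # Eq. 5: combined per-pixel weight w_k
    W = w / (w.sum(axis=0) + eps)       # Eq. 6: normalize over the K images
    return (W * lnf_stack).sum(axis=0)  # weighted average -> CTF image
```

Pixels masked out in every image receive zero total weight; the small `eps` simply avoids division by zero there.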
2.2 Iterative Deblurring
The CTF image thus obtained may look blurred (Figure 1(e)) because
the LNF images may not be perfectly registered due to errors in camera
parameters. (Note that our algorithm does not require precise input
camera parameters; that is, any two versions of LNF images may not
align accurately with each other.) A deblurring process is used that
rewarps the source LNF images to align with the CTF image, similar
to that of (Szeliski, 1996):
$$[u, v, 1]^T \cong P\,[u', v', 1]^T, \quad (7)$$

which warps pixel $[u', v']$ to $[u, v]$ using $P$. Our goal is to find a warp
$P$ that best registers the two images. We use the following constraint
functions in our method:

$$E_{CTF} = \sum_{u,v} [e(u, v)]^2, \quad (8)$$

$$[e(u, v)]^2 = W_k[u', v']\,(Y_{CTF}[u, v] - Y_{LNF}[u', v'])^2. \quad (9)$$
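The weighted error of Eqs. (7)-(9) can be sketched as follows; nearest-neighbour sampling is an assumption made to keep the sketch short (a real implementation would use a smooth interpolant so that derivatives with respect to $P$ exist):

```python
import numpy as np

def warp_point(P, u_prime, v_prime):
    """Eq. 7: map [u', v'] to [u, v] via the 3x3 homography P."""
    x = P @ np.array([u_prime, v_prime, 1.0])
    return x[0] / x[2], x[1] / x[2]

def registration_error(P, Y_ctf, Y_lnf, W):
    """Eqs. 8-9: weighted SSD between the CTF image and one warped LNF image."""
    E = 0.0
    for v_p in range(Y_lnf.shape[0]):
        for u_p in range(Y_lnf.shape[1]):
            u, v = warp_point(P, u_p, v_p)
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < Y_ctf.shape[1] and 0 <= vi < Y_ctf.shape[0]:
                r = Y_ctf[vi, ui] - Y_lnf[v_p, u_p]
                E += W[v_p, u_p] * r * r  # Eq. 9, summed into Eq. 8
    return E
```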
Note that the overall weight mask $W_k$ is used, reflecting the degree
of confidence we have for each pixel of $Y_{LNF}$. The Levenberg-Marquardt
algorithm (Press et al., 1992) is employed to solve the
constrained minimization problem. It is an iterative process (starting
from the identity matrix); in each iteration, P is incremented by
$$\Delta p = -(H + \lambda I)^{-1} g, \quad (10)$$

where

$$g = \sum_{u,v} e(u, v)\,[\partial e(u, v)/\partial p], \quad (11)$$

$$H = \sum_{u,v} [\partial e(u, v)/\partial p]\,[\partial e(u, v)/\partial p]^T, \quad (12)$$
in which $p$ is an $8 \times 1$ vector representation of $P$ (note that only 8 parameters
are needed to describe $P$), and $\lambda$ is a parameter reduced to 0 as
the procedure converges. After a new P is calculated, the LNF im-
ages are rewarped and the weighted-average algorithm (Section 2.1)
is rerun using the rewarped LNF images to compute a new CTF
image. Figure 1(f) shows such a CTF image with deblurring.
The deblurring process is also executed in a recursive manner. Recall
that the correlation mask $M_C$ is dependent on an initial CTF image.
After deblurring, the new CTF image is used to compute a more
accurate $M_C$, which then again updates the CTF image and triggers
another round of deblurring. The convergence of the recursion is
ensured by stopping when the difference between two successive CTF
images is sufficiently small.
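The outer recursion can be sketched as the loop below; the three callables are placeholders standing in for the paper's correlation-mask, deblurring, and weighted-average components, and the mean-absolute-difference stopping test is one simple reading of "sufficiently small":

```python
import numpy as np

def refine_ctf(lnf_images, ctf, compute_Mc, deblur, blend,
               tol=1e-3, max_iters=10):
    """Alternate mask update and deblurring until two successive
    CTF images differ by less than `tol` (sketch, Section 2.2)."""
    for _ in range(max_iters):
        Mc = compute_Mc(lnf_images, ctf)      # correlation mask from current CTF
        lnf_images = deblur(lnf_images, ctf)  # rewarp LNF images toward the CTF
        new_ctf = blend(lnf_images, Mc)       # weighted average (Section 2.1)
        if np.mean(np.abs(new_ctf - ctf)) < tol:  # successive CTFs agree
            return new_ctf
        ctf = new_ctf
    return ctf
```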
2.3 Experiments
Experiments were carried out to test the consensus texture generation
algorithm against an image dataset acquired at Technology Square,
an office park of four buildings located on the MIT campus. About
4,000 images were captured using the movable platform (Section 1)
at 81 nodes in this site. At each node, 47 images were acquired with
distinct rotations. LNF images were extracted for each facade.

Figure 2: CTF textured model.
Figure 1(f) shows the CTF result of the iterative weighted-average
algorithm on a facade, for which 28 LNF images were extracted from
the database and used to generate the CTF image. Most occlusions
caused by modeled/unmodeled objects were satisfactorily removed;
the luminance is also reasonably consistent across the entire CTF
image. Figure 2 shows a perspective view of the resulting textured
model of this site.
Our experiments also show that only about a dozen original facade
images, with the quality shown in Figure 1(d), are needed for texture
recovery with a satisfactory result. In addition, the iterative deblurring
is a very stable process; only a couple of iterations are necessary to
reach the image quality shown in Figure 1(f), even with up to 5-pixel
misalignment of LNF images due to input camera pose error. Therefore,
the halting of the iterations can be simplified to a fixed number of
iterations, instead of complex convergence criteria.
3 MICROSTRUCTURE DETECTION
In the area of urban site reconstruction, a large body of research
has focused on methods of establishing geometric models for
large-scale structures, especially buildings, whose structural features
(corners, edges, etc.) typically possess sufficient image cues to sup-
port direct and reliable 3D reconstruction from the images (Firschein
and Strat, 1996; Mayer, 1999). In this paper, we emphasize the
importance of microstructures because they provide rich information
about the buildings and add realism for visualization.
Two pieces of evidence are used for microstructure extraction: the
relative 3D depth of the structures and their 2D appearances. The
relative depth of a surface microstructure is typically very small (see
Section 4). Thus directly extracting these structures from noisy 3D
depth data may be beyond the state of the art of current computer
vision algorithms without a priori knowledge. In this section, we use
a 2D-based strategy to detect the locations of microstructures in the
CTF images generated in Section 2.
The CTF images provide a good texture representation of the facade,
free from effects of occlusions and local illumination variations if
enough views are provided. However, symbolic extraction of win-
dows is still difficult due to the existence of noise. One type of noise
is the global illumination variation on the facade. In Figure 2, for
example, the lower part of the walls is uniformly darker than the
upper part (sometimes even darker than the upper windows). This is
because lower parts of buildings typically receive less sunlight than
upper parts in a densely urbanized area. The pixel-based weighted-
average algorithm is unable to remove global illumination variations,
because the lower part is darker on the majority of LNF images. A