boundaries searching for structures that satisfy regularity and
symmetry rules. In addition to that, they extract three-
dimensional models of windows by searching for image
features. Teboul et al. (2010) used shape grammars towards
fixed tree representations which are able to capture a wide
variety of building topologies for detailed facade segmentation.
They obtained very high performance even for buildings which
are partially occluded or which appear under different
illumination conditions. Ripperda (2008) uses reversible jump
Markov chain Monte Carlo (rjMCMC) to estimate the optimal
parameters of the windows and a formal grammar to describe
their behaviour. Mayer & Reznik (2006) propose a combination
of Markov chain Monte Carlo with information from Implicit
Shape Models and with plane sweeping. Thanks to this, they
achieve a 3D interpretation of building facades, determining
windows and their 3D extent. In Mayer &
Reznik (2008) the method is extended with self-diagnosis and
model selection to choose the most appropriate model for the
configuration of windows in terms of rows or columns.
Most of these algorithms are computationally expensive and not
suitable for real-time applications. Sirmacek (2011a) proposed a
segmentation and graph theory based facade classification
method with emphasis on real-time requirements. However, this
method requires very uniform and also correctly ortho-rectified
color images as input.
In Europe there has been a joint research effort on facade
classification called eTRIMS (Foerstner et al., 2009). The
syntactic formulation of Gestalt laws plays a special role. Within
eTRIMS, such an approach has been formulated by Tylecek & Sara
(2011), using stochastic grammars for the description and
random sampling as the search method.
A similar formulation uses production systems as declarative
knowledge representation and special interpreters for search.
This has the advantage of clear modularity and explicit
declarative inclusion or exclusion of particular constraints or
recursive principles. This facilitates the comparison of their
benefits and drawbacks. For example, Matsuyama & Hwang (1990) proposed
the SIGMA system for automatic understanding of aerial images
of man-made objects. This system featured declarative
knowledge coding using production rules and a special
interpretation scheme quite similar to the one used here.
Unfortunately this work has not been continued. Another such
approach was called BPI (Stilla & Michaelsen, 1997) and this is
being continued as the GESTALT system (Michaelsen et al.,
2010).
In Sirmacek (2011b) the use of L-shaped feature primitives is
proposed for window and door detection in thermal facade
images. Iwaszczuk et al. (2011) suggest using a local dynamic
threshold and masked correlation for corner detection and
order the detected window candidates into rows and columns. In
this paper we merge the idea of detecting
primitives (corners and L-shapes) with gestalt rules to find
windows in thermal images robustly.
This contribution is organized as follows: Section 2 presents
production systems in general and two specific systems that
encode the likely organization of windows on facades.
Section 3 comparatively studies the behaviour of these systems
on example data obtained in the city of Munich. The work
closes with a discussion on the results and an outlook for future
work in Section 4.
2. PRODUCTION SYSTEMS
Structural knowledge e.g. about the part-of hierarchies of man-
made objects, about geometric properties of their mutual
arrangements, and about their appearance can be coded in a
declarative way using systems of production rules.
2.1 Extraction of Primitives
A prerequisite for all syntactic work on images is segmentation
into primitive objects. Here a corner detector based on masked
correlation is applied, whose mask consists of “on” and “off”
fields and “don’t care” areas. Masked correlation was originally
applied by Stilla (1993) to recognise stamped characters. The
advantage of this method is that it can cope with blurred edges.
We adapt the idea of masked correlation to search for
changes in intensity between facade and window. We place
a “don’t care” area between the “on” and “off” fields, which keeps
blurred edge pixels out of the correlation. The correlation
coefficient c is calculated using
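$$c = \frac{\operatorname{sgn}(p_{on} - p_{off}) \cdot \operatorname{sgn}(\bar{g}_{on} - \bar{g}_{off})}{\sqrt{1 + \dfrac{m \left( m_{on} \sigma_{on}^{2} + m_{off} \sigma_{off}^{2} \right)}{m_{on} \, m_{off} \left( \bar{g}_{on} - \bar{g}_{off} \right)^{2}}}}$$

where $p_{on}$ is the value of the “on” mask, $p_{off}$ the value of
the “off” mask, $\bar{g}_{on}$ the mean of the intensity values in the
image covered by the “on” mask, $\bar{g}_{off}$ the mean of the
intensity values in the image covered by the “off” mask, $m_{on}$
the number of “on” pixels in the mask, $m_{off}$ the number of “off”
pixels in the mask, $m$ the number of “on” and “off” pixels in the
mask, $\sigma_{on}$ the standard deviation of the intensity values
covered by the “on” mask, and $\sigma_{off}$ the standard deviation
of the intensity values covered by the “off” mask.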
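As an illustration, the coefficient can be computed from the listed quantities alone. The following is a minimal sketch in Python, assuming that c is the ordinary correlation coefficient between a two-valued mask and the intensities it covers, and that the standard deviations are population values. The function name and the default mask values of +1 and -1 are illustrative choices, not taken from the original work.

```python
import math


def masked_correlation(on_values, off_values, p_on=1.0, p_off=-1.0):
    """Correlation between a two-valued mask and the intensities under it.

    on_values / off_values hold the image intensities covered by the
    "on" and "off" fields; "don't care" pixels are simply not passed in.
    Both lists must be non-empty.
    """
    m_on, m_off = len(on_values), len(off_values)
    m = m_on + m_off
    g_on = sum(on_values) / m_on
    g_off = sum(off_values) / m_off
    # Population variances of the intensities under each field.
    var_on = sum((g - g_on) ** 2 for g in on_values) / m_on
    var_off = sum((g - g_off) ** 2 for g in off_values) / m_off

    diff = g_on - g_off
    if diff == 0 or p_on == p_off:
        return 0.0  # no contrast between the fields: no correlation
    sign = math.copysign(1.0, p_on - p_off) * math.copysign(1.0, diff)
    ratio = m * (m_on * var_on + m_off * var_off) / (m_on * m_off * diff ** 2)
    return sign / math.sqrt(1.0 + ratio)
```

For noise-free fields the value is exactly +1 or -1; for example, intensities [9, 11] under “on” and [-1, 1] under “off” give c of about 0.98.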
Four corner types are assumed: upper left, upper right, lower
right and lower left. Each type is correlated with the whole
image and pixels which result in a correlation coefficient higher
than our detection threshold are selected. Each selected pixel is
coded as a primitive instance with an orientation attribute
(“upper left”, “upper right”, “lower right” or “lower left”) and
its correlation coefficient c. Fig. 2 shows an example of corner
detection. For the red, green, blue and yellow
pixels the correlation coefficient was higher than the detection
threshold. Colours encode the orientation attribute. For a typical
facade texture, in this case of size 1024 x 524 pixels, some
20,000 such pixels remain.
Figure 2. Corners extracted in a thermal façade texture: red -
“upper left”, green - “upper right”, blue - “lower right” and
yellow - “lower left”
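The detection step described above, correlating each of the four corner types with the whole image and keeping the pixels above the threshold, might be sketched as follows. The mask layout (a square neighbourhood split into quadrants, with the window quadrant as the “on” field and a “don’t care” band along the edges), the parameter names and the pluggable `score` function are assumptions made for this example. The actual mask geometry is not specified here.

```python
# Hypothetical layout: for each corner type, the signs (row, column) of the
# quadrant that lies inside the window; the other quadrants are facade.
INSIDE = {
    "upper left": (1, 1),
    "upper right": (1, -1),
    "lower right": (-1, -1),
    "lower left": (-1, 1),
}


def mask_offsets(kind, size=3, gap=1):
    """Return (on_offsets, off_offsets) relative to the candidate pixel.

    Offsets closer than `gap` to the pixel's row or column are
    "don't care" and appear in neither list.
    """
    sr, sc = INSIDE[kind]
    on, off = [], []
    for dr in range(-size, size + 1):
        for dc in range(-size, size + 1):
            if abs(dr) < gap or abs(dc) < gap:
                continue  # "don't care" band along the window edges
            inside = dr * sr > 0 and dc * sc > 0
            (on if inside else off).append((dr, dc))
    return on, off


def detect_corners(image, score, threshold, size=3, gap=1):
    """Return (row, col, kind, c) for every pixel scoring above threshold.

    `image` is a list of rows of intensities; `score` maps
    (on_values, off_values) to a coefficient c.
    """
    rows, cols = len(image), len(image[0])
    hits = []
    for kind in INSIDE:
        on, off = mask_offsets(kind, size, gap)
        for r in range(size, rows - size):
            for c in range(size, cols - size):
                on_vals = [image[r + dr][c + dc] for dr, dc in on]
                off_vals = [image[r + dr][c + dc] for dr, dc in off]
                value = score(on_vals, off_vals)
                if value > threshold:
                    hits.append((r, c, kind, value))
    return hits
```

Even on a synthetic image with a single ideal rectangular window, several neighbouring pixels per corner pass the threshold, because the “don’t care” band tolerates small shifts of the mask.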
It is obvious that for each corner perceived by a human observer
several such pixels cluster together. Following Marr’s principle
of avoiding early decisions, these objects would be entered
into the production system as primitives, using a corresponding
clustering production as the first knowledge source. However, this
overloac
followin
perform
compute
are disp
Figure
From tl
primitiv
process
22 Tv
The p
product
groupin
1) “can
A faga
horizon
of same
structur
two L-]
A caref
that on
Fig. 2
system.
Accord
instanc
interpr.
2) Exp
other c
standar
axes ol
are inte
be quit
is der
instanti
be ins
produc
instanc
namely
used he
Experi
3) The
rows fi
higher
orienta
genera
estima!
of U-s
and af
height
differe
follow