EVALUATION OF CAMERA CALIBRATION APPROACHES FOR VIDEO IMAGE
DETECTION SYSTEMS
S. Bauer, A. Luber, R. Reulke
German Aerospace Center, Institute for Transportation Systems, Rutherfordstr. 2, 12489 Berlin, Germany
(sascha.bauer, andreas.luber, ralf.reulke)@dlr.de
Commission I, WG 1/1
KEY WORDS: Calibration, Multisensor, Fusion, Orientation, Camera, Georeferencing
ABSTRACT:
In modern traffic management, Video Image Detection Systems (VIDS) are becoming increasingly important as traffic sensors. They
are becoming more affordable and, unlike the commonly used induction loops, do not require any road construction. Furthermore,
because they are able to monitor a wide area, they can potentially deliver a whole new set of traffic parameters. Good
examples are source-destination relations, queue lengths, travel times and general event detection, such as atypical
movements, accidents, blockages and congestion. Additionally, by using more than one camera the surveillance area can be
enlarged or the detection accuracy can be increased due to redundancy of observations. However, in order to take advantage of a
multiple camera system, the observations from different cameras have to be fused. In the presented setup, a geometric
fusion is proposed that projects the observations into a common geo-referenced coordinate frame. The basic requirement for this
transformation is knowledge of the interior and exterior orientation of every camera. Three different approaches for determining
the exterior orientation have been implemented, namely a Newton method, a least-squares adjustment based on ground control points
and a method based on line features. Furthermore, direct linear transformation and minimum space resection are applied to calculate
initial estimates. These algorithms are subject to an in-depth evaluation with respect to their use in traffic monitoring sensors.
1. INTRODUCTION
1.1 Motivation
The task of modern traffic management is to utilize the limited
resources in transportation infrastructure as efficiently as possible.
In order to meet the requirements of this challenging task, a
precise and up-to-date knowledge of the traffic situation is
needed. Nowadays, a wide variety of traffic sensors that are
commercially available can be applied as a source of such
information (Klein et al., 1997). Induction loops and microwave
radar systems are the most commonly used detectors. They
typically provide traffic parameters like presence, speed and
length of a vehicle as well as the time gap between vehicles. Since
these parameters offer only a rudimentary description of the
traffic situation, a great deal of research has concentrated on
new detectors capable of providing more complex traffic
parameters.
Video image detection systems (VIDS) constitute an important
group of traffic detectors (Michalopoulos, 1991; Wigan, 1992;
Kastrinaki et al., 2003). In contrast to the typical focus on a
single location they are able to monitor a wide area, and hence,
they potentially provide a whole new range of traffic
parameters (Datta et al., 2000; Harlow and Wang, 2001;
Setchell and Dagless, 2001; Yung and Lai, 2001). Examples of such
parameters are source-destination relations, queue length,
travel times and general event detection, such as atypical
movements, accidents, blockages and congestion. Furthermore,
a combination of several cameras is often beneficial to enlarge
the surveillance area or to increase the detection accuracy due
to redundancy of observations. However, in order to take
advantage of a multiple camera system, the observations from
different cameras have to be fused in a way that allows their
subsequent comparison.
Information can be fused on different levels. Combining
observations on a geometric level is a common approach for
object detection by multi-camera systems. This is done by
projecting the observations into a common coordinate frame.
In general, a geo-referenced frame is preferred (Ernst et al.,
2005). The knowledge of the interior and exterior orientation of
every camera is an essential requirement for this transformation.
The interior orientation can be determined before camera
installation using a well-known test field. Different strategies can
be applied to compute the unknown parameters of the exterior
orientation of the final setup. The following approaches have
been implemented:
1. Direct Linear Transformation (DLT)
2. Space Resection
for determining initial estimates and
3. Newton Method
4. Adjustment using Gauss-Markov and point features
5. Adjustment using Gauss-Markov and line features
to calculate the position and orientation of the cameras. These
algorithms differ in their complexity, ease of use and their
expected accuracy. The objective of this paper is to give an
in-depth comparison with respect to these properties. In particular,
the relationship between the required scene information and the
achieved accuracy is highlighted with regard to the requirements
of modern traffic surveillance.
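
To illustrate why both orientations are needed for the geometric fusion described above, the following minimal Python sketch maps an image observation into a world frame and back. It is not the implementation evaluated in this paper. It assumes a distortion-free pinhole camera, a flat road surface at Z = 0, and a rotation matrix R that maps world to camera coordinates. The function names (world_to_pixel, pixel_to_ground) and all numeric values are hypothetical and chosen for illustration only.

```python
import numpy as np


def world_to_pixel(X, K, R, C):
    """Project a 3D world point into the image (pinhole / collinearity model).

    K: 3x3 interior orientation (calibration matrix, assumed distortion-free)
    R: 3x3 rotation from world to camera frame (exterior orientation)
    C: projection centre in world coordinates (exterior orientation)
    """
    x_cam = R @ (np.asarray(X, dtype=float) - np.asarray(C, dtype=float))
    if x_cam[2] <= 0:
        raise ValueError("point lies behind the camera")
    u = K @ x_cam
    return u[:2] / u[2]


def pixel_to_ground(pixel, K, R, C, ground_z=0.0):
    """Intersect the viewing ray of a pixel with the plane Z = ground_z."""
    C = np.asarray(C, dtype=float)
    # Direction of the viewing ray in the camera frame, then in the world frame
    ray_cam = np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    ray_world = R.T @ ray_cam
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the ground plane")
    s = (ground_z - C[2]) / ray_world[2]
    if s <= 0:
        raise ValueError("ground plane is not in front of the camera")
    return C + s * ray_world


# Hypothetical values for illustration only (not taken from the paper)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])   # camera looking straight down
C = np.array([0.0, 0.0, 10.0])   # projection centre 10 units above the ground

pixel = world_to_pixel([2.0, 3.0, 0.0], K, R, C)
ground = pixel_to_ground(pixel, K, R, C)
print(pixel, ground)
```

Under these assumptions, observations of the same object from different cameras map to the same ground coordinates, which is what makes them comparable. Errors in the estimated R and C propagate directly into the ground position, hence the interest in the accuracy of the exterior orientation.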