ically is represented as an intensity image. This intensity image might provide a higher level of distinctiveness than shape features (Seo et al., 2005) and thus convey information about the local environment that is not represented in the range measurements.
Hence, the registration process can efficiently be supported by
using reliable feature correspondences between the respective in-
tensity images. Although different kinds of features can be used
for this purpose, most of the current approaches are based on the
use of feature points or keypoints as these tend to yield the most
robust results for registration without assuming the presence of
regular surfaces in the scene. Distinctive feature points simplify
the detection of point correspondences and for this reason, SIFT
features are commonly used. These features are extracted from
the co-registered camera images (Al-Manasir and Fraser, 2006;
Barnea and Filin, 2007) or from the reflectance images (Wang and
Brenner, 2008; Kang et al., 2009). For all point correspondences,
the respective 2D feature points are projected into 3D space us-
ing the spatial information. This yields a much smaller set of 3D
points for the registration process and thus a much faster estima-
tion of the transformation parameters between two point clouds.
Furthermore, additional constraints considering the reliability of
the point correspondences (Weinmann et al., 2011; Weinmann
and Jutzi, 2011) allow for increasing the accuracy of the registra-
tion results.
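As an illustration of this image-based strategy, the following sketch matches SIFT keypoints between the intensity images of two scans and lifts each matched keypoint into 3D along its viewing ray using the corresponding range value. It is only a rough sketch of the general idea, not any of the cited implementations; the OpenCV-based matching, the Lowe ratio threshold of 0.8 and the pinhole intrinsics fx, fy, cx, cy are assumptions.

```python
import cv2
import numpy as np

def match_and_lift(intensity_a, intensity_b, range_a, range_b, fx, fy, cx, cy):
    """Detect SIFT keypoints in two 8-bit intensity images, match them, and lift
    the matched 2D points into 3D using the co-registered range images.
    The pinhole intrinsics (fx, fy, cx, cy) are assumed to be known from calibration."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(intensity_a, None)
    kp_b, des_b = sift.detectAndCompute(intensity_b, None)

    # Nearest-neighbour matching with Lowe's ratio test to discard ambiguous matches
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]

    def lift(kp, rng):
        # Back-project pixel (u, v) with measured range r along its viewing ray
        u, v = kp.pt
        ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
        ray /= np.linalg.norm(ray)
        return rng[int(round(v)), int(round(u))] * ray

    pts_a = np.array([lift(kp_a[m.queryIdx], range_a) for m in good])
    pts_b = np.array([lift(kp_b[m.trainIdx], range_b) for m in good])
    return pts_a, pts_b   # corresponding sparse 3D point sets
```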
Once 2D/2D correspondences are detected between images of
different scans, the respective 3D/3D correspondences can be de-
rived. Thus knowledge about the closest neighbor is available
and the computationally expensive ICP algorithm can be replaced
by a least squares adjustment. Least squares methods involv-
ing all points of a scan have been used for 3D surface matching
(Gruen and Akca, 2005), but since a large overlap between the point clouds is required, which cannot always be assumed, typically sparse 3D point clouds consisting of a very small subset of
points are derived from the original 3D point clouds (Al-Manasir
and Fraser, 2006; Kang et al., 2009). To further exclude unre-
liable 3D/3D correspondences, filtering schemes based on the
RANSAC algorithm (Fischler and Bolles, 1981) have been pro-
posed in order to estimate the rigid transformation aligning two
point clouds (Seo et al., 2005; Böhm and Becker, 2007; Barnea
and Filin, 2007).
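The following is a minimal sketch of such a procedure: a closed-form least-squares estimate of the rigid transformation (via the SVD of the cross-covariance matrix) wrapped in a simple RANSAC loop in the spirit of Fischler and Bolles (1981). The threshold, iteration count and minimal sample size are illustrative assumptions, not values taken from the cited works.

```python
import numpy as np

def rigid_transform(P, Q):
    """Closed-form least-squares estimate of R, t such that Q ~ R @ P + t (SVD-based)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                       # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

def ransac_rigid(P, Q, iters=500, thresh=0.05, rng=np.random.default_rng(0)):
    """Estimate a rigid transformation from noisy 3D/3D correspondences while
    rejecting outliers with a RANSAC scheme (Fischler and Bolles, 1981)."""
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal sample
        R, t = rigid_transform(P[idx], Q[idx])
        residuals = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return rigid_transform(P[best_inliers], Q[best_inliers])  # refit on all inliers
```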
For dynamic environments, terrestrial laser scanners, which perform a time-dependent spatial scanning of the scene, are not suitable.
Furthermore, due to the background illumination, monitoring outdoor environments remains challenging for devices based on structured light such as the Microsoft Kinect, which projects random dot patterns of infrared points in order to obtain reliable and dense close-range measurements in real time. Hence,
this paper is focused on airborne scene monitoring with range
imaging devices mounted on a sensor platform. Although the
captured point clouds are corrupted with noise and the field of
view is very limited, a fast, but still reliable approach for point
cloud registration is presented. The approach involves an ini-
tial camera calibration for increased accuracy of the respective
3D point clouds and the extraction of distinctive 2D features.
The detection of 2D/2D correspondences between two successive frames and the subsequent projection of the respective 2D points into 3D space yield 3D/3D correspondences. Using such
sparse point clouds significantly increases the performance of the
registration process, but the influence of outliers has to be con-
sidered. Hence, a new weighting scheme derived from the respective point quality is introduced in order to adapt the influence of each 3D/3D correspondence within a weighted estimation of the rigid transformation.
Additionally, an extension of this approach is presented which is
based on the already detected features and focuses on a decou-
pling of sensor and object motion.
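The weighted estimation mentioned above can be sketched as a weighted variant of the closed-form least-squares alignment; the per-correspondence weights w are assumed to encode the respective point quality, while the concrete weighting scheme is the one introduced later in Section 2.6 and is not reproduced here.

```python
import numpy as np

def weighted_rigid_transform(P, Q, w):
    """Weighted closed-form estimate of R, t minimizing
    sum_i w_i * || (R @ P_i + t) - Q_i ||^2.
    The weights w are assumed to reflect the quality of each 3D/3D correspondence."""
    w = w / w.sum()
    cP = (w[:, None] * P).sum(axis=0)            # weighted centroids
    cQ = (w[:, None] * Q).sum(axis=0)
    H = (P - cP).T @ (w[:, None] * (Q - cQ))     # weighted cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```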
The remainder of this paper is organized as follows. In Section 2,
the proposed methodology for successive pairwise registration in
dynamic environments is described as well as a simple extension
for decoupling sensor and object motion. The configuration of
the sensor platform is outlined in Section 3. Subsequently, the
performance of the presented approach is tested in Section 4. The
derived results are discussed in Section 5. Finally, in Section 6
conclusions are drawn and suggestions for future work are outlined.
2 METHODOLOGY
The proposed methodology provides fast algorithms which are essential for time-critical surveillance applications and should be suited for a real-time implementation on graphics processors.
After data acquisition (Section 2.1), preprocessing has to be carried out in order to obtain the respective 3D point cloud (Section 2.2). However, the point cloud is corrupted with noise and hence a quality measure is calculated for each point of the point cloud (Section 2.3). Subsequently, extracting distinctive features from 2D images allows for detecting reliable 2D/2D correspondences between different frames (Section 2.4), and projecting the respective 2D points into 3D space yields 3D/3D correspondences in which each 3D point is assigned a value for the respective point quality (Section 2.5). The point cloud registration is then carried out by estimating the rigid transformation between two sparse point clouds, where the weights of the 3D/3D correspondences are derived from the point quality of the respective 3D points (Section 2.6). Finally, a feature-based method for object detection and segmentation is introduced (Section 2.7), which can be applied for decoupling sensor and object motion.
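Purely as a usage illustration (not part of the described methodology), successive pairwise estimates (R_i, t_i), each mapping frame i into the coordinate frame of its predecessor, could be chained into global sensor poses as in the following hypothetical helper:

```python
import numpy as np

def accumulate_poses(pairwise_transforms):
    """Chain successive pairwise rigid transformations (R_i, t_i), where each pair
    maps points of frame i into the coordinate frame of frame i-1, into global
    poses relative to the first frame."""
    R_g, t_g = np.eye(3), np.zeros(3)
    poses = [(R_g, t_g)]
    for R, t in pairwise_transforms:
        # x_0 = R_g (R x_i + t) + t_g  =>  updated global rotation and translation
        R_g, t_g = R_g @ R, R_g @ t + t_g
        poses.append((R_g, t_g))
    return poses
```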
2.1 Data Acquisition
In contrast to the classical stereo observation techniques with pas-
sive sensors, where data from at least two different viewpoints
has to be captured, the monostatic sensor configuration of the
PMD[vision] CamCube 2.0 preserves information without the
need for a co-registration of the captured data. A PMD[vision] CamCube 2.0 simultaneously captures various types of data, i.e. geometric and radiometric information, as images with a single shot. The images have a size of 204 × 204 pixels, which corresponds to a field of view of 40° × 40°. Thus, the device provides measurements with an angular resolution of approximately 0.2°.
For each pixel, three features are measured, namely the respective range R, the active intensity I_a, and the passive intensity I_p.
The active intensity depends on the illumination emitted by the
sensor, whereas the passive intensity depends on the background
illumination arising from the sun or other external light sources.
As a single frame, consisting of a range image I_R, an active intensity image I_a and a passive intensity image I_p, can be updated with high frame rates of more than 25 frames per second, this device is well-suited for capturing dynamic scenes.
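The following lines only illustrate how such a frame might be organized in software and verify the stated angular resolution; the container class and its field names are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RangeImagingFrame:
    """Hypothetical container for one CamCube frame: range image I_R plus the
    active and passive intensity images I_a and I_p (all 204 x 204 arrays)."""
    range_image: np.ndarray        # I_R: measured range per pixel
    active_intensity: np.ndarray   # I_a: backscatter of the emitted illumination
    passive_intensity: np.ndarray  # I_p: background illumination (e.g. sunlight)

# The stated 40 deg x 40 deg field of view over 204 x 204 pixels corresponds to
# an angular resolution of roughly 40 / 204 = 0.196 deg per pixel, i.e. about 0.2 deg.
print(f"angular resolution: {40.0 / 204:.3f} deg per pixel")
```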
2.2 Preprocessing
In a first step, the intensity information of each frame, i.e. I_a and I_p, has to be adapted. This is achieved by applying a histogram normalization of the form

$$ I_n = \frac{I - I_{\min}}{I_{\max} - I_{\min}} \cdot 255 \qquad (1) $$
which adapts the intensity information I of each pixel to the interval [0, 255], where I_min and I_max denote the minimum and maximum intensity values. The modified frames thus consist of a normalized active intensity image I_{a,n}, a normalized passive intensity image I_{p,n} and the range image I_R, which are illustrated in Figure 1.
Figure 1: Normalized active intensity image, normalized passive intensity image, and range image.
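A direct implementation of the normalization in Eq. (1) is straightforward; the sketch below assumes that the minimum and maximum are taken over the respective intensity image and adds a guard against constant images.

```python
import numpy as np

def normalize_intensity(I):
    """Histogram normalization of Eq. (1): map the intensities of image I
    linearly to the interval [0, 255]."""
    I = I.astype(np.float64)
    I_min, I_max = I.min(), I.max()
    if I_max == I_min:            # guard against a constant image
        return np.zeros_like(I)
    return (I - I_min) / (I_max - I_min) * 255.0

# Applied to both intensity images of a frame:
# I_a_n = normalize_intensity(I_a)
# I_p_n = normalize_intensity(I_p)
```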