ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W1, 2013
VCM 2013 - The ISPRS Workshop on 3D Virtual City Modeling, 28 May 2013, Regina, Canada
TOWARDS 4D VIRTUAL CITY RECONSTRUCTION FROM LIDAR POINT CLOUD SEQUENCES
Oszkár Józsa, Attila Börcs and Csaba Benedek
Distributed Events Analysis Research Laboratory, Computer and Automation Research Institute
H-1111 Budapest, Kende u. 13-17, Hungary
E-mail: firstname.lastname@sztaki.mta.hu
KEY WORDS: Lidar, point cloud registration, scene analysis, reconstruction, moving object detection
ABSTRACT:
In this paper we propose a joint approach to virtual city reconstruction and dynamic scene analysis based on point cloud sequences of a single car-mounted Rotating Multi-Beam (RMB) Lidar sensor. The aim of the addressed work is to create 4D spatio-temporal models of large dynamic urban scenes containing various moving and static objects. Standalone RMB Lidar devices have frequently been applied in robot navigation tasks and have proved to be efficient in moving object detection and recognition. However, they have not yet been widely exploited for the geometric approximation of ground surfaces and building facades, due to the sparseness and inhomogeneous density of the individual point cloud scans. In our approach we propose an automatic registration method for the consecutive scans without any additional sensor information such as IMU data, and introduce a process for simultaneously extracting reconstructed surfaces, motion information and objects from the registered dense point cloud, augmented with per-point time stamp information.
1 INTRODUCTION
Vision-based understanding of large dynamic scenes and 3D virtual city reconstruction are two research fields that have attracted great interest in recent years. Although these tasks have usually been handled separately, connecting the two modalities may lead to realistic 4D video flows of large-scale real world scenarios, which can be viewed and analyzed from an arbitrary viewpoint and virtually modified by user interaction, resulting in a significantly improved visual experience for the observer.
However, the proposed integration process faces several technical and algorithmic challenges. On one hand, moving object detection, classification, tracking and event recognition from optical videos or 2.5D range image sequences are still challenging problems, in particular if the measurements are provided by moving sensors. Most existing approaches first extract key features, such as characteristic points, edges, blob centroids, trajectories or histograms, and the recognition process works in a feature space of significantly reduced dimension compared to the original data (Lai and Fox, 2010). On the other hand, virtual 3D city visualization needs dense registered information extracted from the scene, enabling the realistic reconstruction of fine details of building facades, street objects, etc. SICK Lidar systems are able to provide dense and accurate point clouds of the environment, with homogeneous scanning of the surfaces and a nearly linear increase of the number of points as a function of the distance (Behley et al., 2012). However, since the measurement recording frequency is typically less than 1 Hz (often significantly less), these sensors are not well suited to dynamic event analysis.
As an example of alternative Time-of-Flight (ToF) solutions, (Kim et al., 2012) introduced a portable stereo system for capturing and 3D reconstruction of dynamic outdoor scenes. However, in this case the observed scenario must be surrounded beforehand by several (8-9) calibrated cameras, which does not allow quick data acquisition over large urban areas. In addition, the reconstruction process is extremely computation-intensive: processing a short 10-second sequence takes several hours, and full automation is difficult due to the usual stereo artifacts such as featureless regions and occlusions.
In this paper, we jointly focus on the understanding and reconstruction of dense dynamic urban scenes using a single Rotating Multi-Beam (RMB) Lidar sensor (Velodyne HDL-64E), which is mounted on the top of a moving car. Velodyne's RMB Lidar system provides a stream of full 360° point cloud scans with a frame-rate of 20 Hz, so that we can capture the scene from viewpoints spaced about every 30-60 centimeters as the car travels at typical urban traffic speed. Due to its scanning frequency, this configuration is highly appropriate for analyzing moving objects in the scene. However, a single scan is quite sparse, consisting of around 65K points within a radius of 120 meters; moreover, the sampling density drops significantly at larger distances from the sensor, and the scans exhibit a ring pattern in which points of the same ring lie much closer to each other than points of different rings (Benedek et al., 2012).
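To make these sparsity characteristics concrete, the short Python sketch below inspects a single RMB Lidar scan: it bins the points by horizontal distance from the sensor to expose the density drop at larger ranges, and counts the points per laser ring. The file name, array layout and ring-id column are illustrative assumptions for this sketch, not the data format used in this work.

import numpy as np

# Assumed layout (illustrative only): a single RMB Lidar scan stored as an
# N x 4 array of (x, y, z, laser_ring_id) values in sensor coordinates.
scan = np.load("velodyne_scan_0001.npy")          # hypothetical file name
xyz, ring_id = scan[:, :3], scan[:, 3].astype(int)

# Horizontal distance of each point from the sensor axis.
dist = np.hypot(xyz[:, 0], xyz[:, 1])

# Count points in 10 m wide radial bins up to the ~120 m range limit,
# which makes the drop in sampling density at larger distances visible.
bins = np.arange(0.0, 130.0, 10.0)
counts, _ = np.histogram(dist, bins=bins)
for lo, hi, c in zip(bins[:-1], bins[1:], counts):
    print(f"{lo:5.0f}-{hi:5.0f} m : {c:6d} points")

# Points sharing a ring id originate from the same laser, so neighbours
# within a ring lie much closer together than neighbours across rings.
print("points per ring:", np.bincount(ring_id, minlength=64))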
A number of automatic point cloud analysis methods have been proposed in the literature for RMB Lidar streams. These approaches mainly focus on real-time point cloud classification for robot navigation and quick intervention, rather than on the complex situation interpretation and scene visualization addressed in our current work. (Douillard et al., 2011) presents a set of clustering methods for various types of 3D point clouds, including dense 3D data (e.g. Riegl scans) and sparse point sets (e.g. Velodyne scans), where the main goal is to approach real-time performance. The object recognition problem for a segmented point cloud sequence is often addressed with machine learning techniques relying on training samples. A boosting framework was introduced in (Teichman et al., 2011) for the classification of arbitrary object tracks obtained from Lidar streams. This step needs accurately separated obstacles or obstacle groups as input, but it deals neither with the context of the objects nor with large surface elements such as wall segments. In (Xiong et al., 2011) the authors model the contextual relationships among the 3D points, and train this procedure to use point cloud statistics and to learn relational information over fine and coarse scales, e.g. that tree trunks are below vegetation. This point cloud segmentation method shows its advantage on classes for which enough training samples are available; however, domain adaptation remains a difficult challenge. (Quadros et al., 2012) presented a feature called the line image to support object classification that outperforms the widely used NARF descrip-