ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W1, 2013
VCM 2013 - The ISPRS Workshop on 3D Virtual City Modeling, 28 May 2013, Regina, Canada
TOWARDS 4D VIRTUAL CITY RECONSTRUCTION FROM LIDAR POINT CLOUD SEQUENCES
Oszkár Józsa, Attila Börcs and Csaba Benedek
Distributed Events Analysis Research Laboratory, Computer and Automation Research Institute
H-1111 Budapest, Kende u. 13-17, Hungary
E-mail: firstname.lastname@sztaki.mta.hu
KEY WORDS: Lidar, point cloud registration, scene analysis, reconstruction, moving object detection
ABSTRACT:
In this paper we propose a joint approach to virtual city reconstruction and dynamic scene analysis based on point cloud sequences of a single car-mounted Rotating Multi-Beam (RMB) Lidar sensor. The aim of the addressed work is to create 4D spatio-temporal models of large dynamic urban scenes containing various moving and static objects. Standalone RMB Lidar devices have frequently been applied in robot navigation tasks and have proved to be efficient in moving object detection and recognition. However, they have not yet been widely exploited for the geometric approximation of ground surfaces and building facades, due to the sparseness and inhomogeneous density of the individual point cloud scans. In our approach we propose an automatic registration method for the consecutive scans without any additional sensor information such as IMU data, and introduce a process for simultaneously extracting reconstructed surfaces, motion information and objects from the registered dense point cloud, augmented with per-point time stamp information.
1 INTRODUCTION
Vision-based understanding of large dynamic scenes and 3D virtual city reconstruction are two research fields that have attracted great interest in recent years. Although these tasks have usually been handled separately, connecting the two modalities may lead to realistic 4D video flows of large-scale real world scenarios, which can be viewed and analyzed from an arbitrary viewpoint and virtually modified by user interaction, resulting in a significantly improved visual experience for the observer.
However, the proposed integration process faces several technical and algorithmic challenges. On one hand, moving object detection, classification, tracking and event recognition from optical videos or 2.5D range image sequences are still challenging problems, in particular if the measurements are provided by moving sensors. Most existing approaches first extract key features, such as characteristic points, edges, blob centroids, trajectories or histograms, and the recognition process works in a feature space of significantly reduced dimension compared to the original data (Lai and Fox, 2010). On the other hand, virtual 3D city visualization needs dense registered information extracted from the scene, enabling the realistic reconstruction of fine details of building facades, street objects, etc. SICK Lidar systems are able to provide dense and accurate point clouds of the environment, with homogeneous scanning of the surfaces and a nearly linear increase of the number of points as a function of the distance (Behley et al., 2012). However, since the measurement recording frequency is typically less than 1 Hz (often significantly less), these sensors are not well suited to dynamic event analysis.
As an example of alternative Time-of-Flight (ToF) solutions, (Kim et al., 2012) introduced a portable stereo system for capturing and 3D reconstruction of dynamic outdoor scenes. However, in this case the observed scenario must be surrounded beforehand by several (8-9) calibrated cameras, which does not allow quick data acquisition over large urban areas. In addition, the reconstruction process is extremely computation-intensive: processing a short 10-second sequence takes several hours, and full automation is difficult due to the usual stereo artifacts such as featureless regions and occlusions.
In this paper, we jointly focus on the understanding and reconstruction of dense dynamic urban scenes using a single Rotating Multi-Beam (RMB) Lidar sensor (Velodyne HDL-64E), which is mounted on the top of a moving car. Velodyne's RMB Lidar system provides a stream of full 360° point cloud scans with a frame-rate of 20 Hz, so that we can capture the scene from viewpoints spaced about every 30-60 centimeters as the car travels at typical urban traffic speed. Due to its scanning frequency, this configuration is highly appropriate for analyzing moving objects in the scene. However, a single scan is quite sparse, consisting of around 65K points within a radius of 120 meters; moreover, the sampling density drops significantly at larger distances from the sensor, and the scans exhibit a ring pattern in which points of the same ring lie much closer to each other than points of different rings (Benedek et al., 2012).
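To make these sparsity characteristics concrete, the short Python sketch below inspects a single RMB Lidar scan: it bins the points by horizontal distance from the sensor to expose the density drop at larger ranges, and counts the points per laser ring. The file name, array layout and ring-id column are illustrative assumptions for this sketch, not the data format used in this work.

import numpy as np

# Assumed layout (illustrative only): a single RMB Lidar scan stored as an
# N x 4 array of (x, y, z, laser_ring_id) values in sensor coordinates.
scan = np.load("velodyne_scan_0001.npy")          # hypothetical file name
xyz, ring_id = scan[:, :3], scan[:, 3].astype(int)

# Horizontal distance of each point from the sensor axis.
dist = np.hypot(xyz[:, 0], xyz[:, 1])

# Count points in 10 m wide radial bins up to the ~120 m range limit,
# which makes the drop in sampling density at larger distances visible.
bins = np.arange(0.0, 130.0, 10.0)
counts, _ = np.histogram(dist, bins=bins)
for lo, hi, c in zip(bins[:-1], bins[1:], counts):
    print(f"{lo:5.0f}-{hi:5.0f} m : {c:6d} points")

# Points sharing a ring id originate from the same laser, so neighbours
# within a ring lie much closer together than neighbours across rings.
print("points per ring:", np.bincount(ring_id, minlength=64))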
A number of automatic point cloud analysis methods have been proposed in the literature for RMB Lidar streams. These approaches mainly focus on real-time point cloud classification for robot navigation and quick intervention, rather than on the complex situation interpretation and scene visualization addressed in our current work. (Douillard et al., 2011) presents a set of clustering methods for various types of 3D point clouds, including dense 3D data (e.g. Riegl scans) and sparse point sets (e.g. Velodyne scans), where the main goal is to approach real-time performance. The object recognition problem for a segmented point cloud sequence is often addressed with machine learning techniques relying on training samples. A boosting framework was introduced in (Teichman et al., 2011) for the classification of arbitrary object tracks obtained from Lidar streams. This step needs accurately separated obstacles or obstacle groups as input, but it deals neither with the context of the objects nor with large surface elements such as wall segments. In (Xiong et al., 2011) the authors model the contextual relationships among the 3D points, and train this procedure to use point cloud statistics and to learn relational information over fine and coarse scales, e.g. that tree trunks are below vegetation. This point cloud segmentation method shows its advantage on classes for which enough training samples are available; however, domain adaptation remains a difficult challenge. (Quadros et al., 2012) presented a feature called the line image to support object classification that outperforms the widely used NARF descrip-