International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012 
XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia 
A ROBUST PARALLEL FRAMEWORK FOR MASSIVE SPATIAL DATA PROCESSING 
ON HIGH PERFORMANCE CLUSTERS 
Xuefeng Guan a, *
a State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 
129 Luoyu Road, Wuhan 430079, P. R. China - guanxuefeng@whu.edu.cn
Commission IV, WG IV/5 
KEY WORDS: Data parallel processing, Split-and-Merge paradigm, Parallel framework, LiDAR 
ABSTRACT: 
Massive spatial data requires considerable computing power for real-time processing. With the development of multicore
technology and the falling cost of computer components in recent years, high performance clusters have become the only economically viable
solution for this requirement. However, massive spatial data processing demands heavy I/O operations and should be characterized
as a data-intensive application. Parallelization strategies for data-intensive applications are incompatible with currently available
processing frameworks, which are basically designed for traditional compute-intensive applications. In this paper we introduce a
Split-and-Merge paradigm for spatial data processing and propose a robust parallel framework in a cluster environment to
support this paradigm. The Split-and-Merge paradigm efficiently exploits data parallelism for massive data processing. The
proposed framework is based on the open-source TORQUE project and hosted on a multicore-enabled Linux cluster. One common
LiDAR point cloud algorithm, Delaunay triangulation, was implemented on the proposed framework to evaluate its efficiency and
scalability. Experimental results demonstrate that the system provides efficient performance speedup.
1. INTRODUCTION 
1.1 Introduction 
Spatial datasets in many fields, such as laser scanning, continue
to grow as data acquisition technologies improve.
The size of LiDAR point clouds has increased
from gigabytes to terabytes, even to petabytes, requiring
significant computing resources to process them in a
short time. This is well beyond the capability of a single
desktop personal computer (PC).
A practical solution to meet this resource requirement is to 
design parallel algorithms and run them on a distributed 
platform. Parallelism can be exploited by decomposing the 
domain into smaller subsets that can be executed concurrently. 
Multicore-enabled Central Processing Units (CPUs) are
becoming ubiquitous, from the single desktop PC to clusters
(Borkar and Chien, 2011), while the cost of building a powerful
computing cluster keeps falling. It is therefore natural and
necessary for spatial analysts to employ high performance clusters
(HPC) to efficiently process massive LiDAR point clouds.
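As a rough illustration of domain decomposition, the following minimal sketch groups points into square grid tiles so that each tile could be handed to a separate worker. The function name `split_into_tiles`, the `tile_size` parameter, and the uniform-grid scheme are assumptions made for this example only; they are not taken from the framework described in this paper.

```python
from collections import defaultdict


def split_into_tiles(points, tile_size):
    """Group (x, y, z) points by the square grid tile that contains them.

    Hypothetical helper: a uniform grid is only one possible
    decomposition and is not necessarily the one used in the paper.
    """
    tiles = defaultdict(list)
    for x, y, z in points:
        # Floor division maps a coordinate to its integer tile index.
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].append((x, y, z))
    return dict(tiles)


if __name__ == "__main__":
    sample = [(0.5, 0.5, 1.0), (1.5, 0.2, 2.0), (0.1, 0.9, 3.0)]
    for key, members in sorted(split_into_tiles(sample, 1.0).items()):
        print(key, len(members))
```

A uniform grid is simple but can produce unbalanced tiles when point density varies, which is one reason load balance matters for this kind of workload.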
Most existing data processing algorithms were designed without
any consideration of concurrency. For applied scientists,
adapting these serial programs to a distributed platform is
challenging and error-prone, because they usually do not have much
knowledge of or experience in parallelization for the distributed
context. Furthermore, processing massive LiDAR point clouds is
inherently different from classical compute-intensive
applications. Such processing devotes most of its
time to Input/Output (I/O) and manipulation of input data. This
type of application should be characterized as a data-intensive
application, as opposed to a traditional compute-intensive
application. Thus, the manipulation of input data must be taken
into consideration during decomposition, scheduling, and load
balancing.
A framework in which low-level thread/process operation routines
are hidden and high-level functions/classes are supplied in an
application programming interface (API) library would therefore be
helpful and desirable. This paper proposes a general parallel
framework on an HPC platform to facilitate this transition from a
single-core PC to an HPC context. This framework defines a
Split-and-Merge programming paradigm for users/programmers.
With the help of this paradigm, our framework can
automatically parallelize and schedule users' tasks. Finally, we
evaluate this robust framework with one typical massive LiDAR
point cloud processing example, Delaunay triangulation.
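The general shape of a split-and-merge workflow can be sketched as follows. This is an illustration under stated assumptions, not the API of the proposed framework: `run_split_and_merge` and the three callbacks are hypothetical names, a local thread pool stands in for the cluster scheduler, and point counting stands in for a real algorithm such as Delaunay triangulation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_split_and_merge(data, split, process, merge, max_workers=4):
    """Split `data`, process the pieces concurrently, merge the partial results.

    Hypothetical driver: a thread pool is used here only to keep the
    sketch self-contained; a real deployment would dispatch the pieces
    to processes or cluster nodes instead.
    """
    pieces = split(data)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as `pieces`.
        partials = list(pool.map(process, pieces))
    return merge(partials)


# Hypothetical user-supplied callbacks: count the points in each chunk,
# then add the per-chunk counts together.
def split_in_chunks(points, chunk_size=2):
    return [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]


def count_points(chunk):
    return len(chunk)


def sum_counts(partials):
    return sum(partials)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    print(run_split_and_merge(points, split_in_chunks, count_points, sum_counts))
```

In this arrangement the user writes only the three callbacks, while the driver owns the concurrency, which mirrors the idea of hiding low-level thread/process routines behind a high-level interface.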
Section 2 presents related work on parallel data
processing frameworks. Section 3 introduces the Split-and-Merge
paradigm for the parallel framework. Section 4
describes the detailed implementation of our parallel framework.
Section 5 presents the results and discussion of the experiments. 
Section 6 closes the paper with our conclusions. 
2. RELATED WORK 
Parallel data processing has been an active research field for 
many years. Presently, a large body of work on parallel 
frameworks for data-intensive applications can be found in the 
literature. 
Hawick et al. (2003) used grid computing techniques to
build an operational infrastructure for data processing and data
mining. Dean and Ghemawat (2008) proposed a programming
* Corresponding author. Email: guanxuefeng@whu.edu.cn, Tel: +86 27 68778311, Fax: +86 27 68778969
	        