Google proposed a distributed programming
model, called MapReduce, for processing and generating large
datasets. Users’ common processing algorithms are abstracted
into two functions, Map and Reduce. Programs written in this
model are automatically parallelized and executed by the
runtime framework on a large cluster of commodity machines.
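The Map/Reduce abstraction can be sketched with the canonical word-count example. This is an illustrative, single-process sketch; the function names and the in-memory shuffle are our own, not Google's API, and a real MapReduce runtime would distribute the phases across machines.

```python
from collections import defaultdict

def map_func(document):
    """Map: emit (word, 1) pairs for every word in one input document."""
    return [(word, 1) for word in document.split()]

def reduce_func(word, counts):
    """Reduce: combine all partial counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(documents):
    # Shuffle phase: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_func(doc):
            groups[key].append(value)
    # Reduce phase: one call per key; the runtime would execute these in parallel.
    return dict(reduce_func(k, v) for k, v in groups.items())

result = run_mapreduce(["a b a", "b c"])
# result == {"a": 2, "b": 2, "c": 1}
```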
Tehranian et al. (2006) presented an architectural model and a
system prototype for massive satellite data processing. Their
prototype shows that considerable performance improvement
can be gained for specific real-time science algorithms.
Jonker described a data- and task-parallel framework for
conveniently programming parallel image processing applications
(Jonker and Nicolescu, 2008). In their proposed framework they also
introduced a special paradigm for parallel processing, called the
distributed bucket processing (DBP) paradigm, which is able to
dynamically distribute data sets over processors.
Guan developed a general-purpose parallel raster processing
programming library (pRPL) and applied it to speed up a
commonly used cellular automaton model (Guan and Clarke,
2010).
Wang et al. (2011) presented a common parallel computing
framework based on the message passing interface (MPI)
protocol for modeling hydrological river-basin processes. This
framework is computationally efficient, and independent of the
type of physical models chosen.
All these frameworks target distributed environments and aim to
improve resource usage efficiency. Google's MapReduce,
distributed bucket processing, and pRPL provide users with a
higher-level abstract model and greatly reduce the
implementation burden for programmers. However, Google's
MapReduce model is too simple and strict for spatial algorithms.
For example, if some spatial algorithms are expressed in the
MapReduce model, the Reduce steps must be executed
hierarchically, thus conflicting with the original linear and out-
of-order execution pattern. As for distributed bucket processing
and pRPL, these two frameworks are specifically designed for
raster image processing and do not support vector data
processing. Thus, a general parallel framework should match the
efficiency of distributed bucket processing and pRPL while also
supporting both image and vector data.
3. THE SPLIT-AND-MERGE PARADIGM
3.1 Data locality
Jonker and Nicolescu (2008) classified image processing
into three levels: image-oriented operations (point or local
neighborhood), symbolic processing, and knowledge-based
processing. In this classification, the kernel of a point
operation focuses on a single pixel or feature, while in a local
neighborhood operation the neighboring elements also
participate in processing the current element.
A common characteristic between point operations and local
neighborhood operations is data locality. Data locality means
that the kernel of these algorithms requires no other elements or
only the adjacent elements when processing one element. Data
locality provides fine-grain data parallelism. It can be illustrated
by k-nearest neighbor search in Fig. 1.
Figure 1. Data locality illustrated with k-nearest neighbor search
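As an illustrative sketch of this locality property (the function and data below are our own, not from any cited framework): for a dense point cloud, finding the k nearest neighbors of a point only requires the candidates from the point's own block and its adjacent blocks, not the whole dataset.

```python
import math

def k_nearest(query, candidate_points, k):
    """Return the k nearest neighbors of `query` among `candidate_points`.

    Data locality: `candidate_points` only needs to hold the points from
    the query's own block plus its adjacent blocks, so each block can be
    processed independently of the rest of the point cloud.
    """
    return sorted(candidate_points,
                  key=lambda p: math.dist(query, p))[:k]

neighbors = k_nearest((0.0, 0.0),
                      [(1, 0), (3, 3), (0, 2), (-1, 0)], k=2)
# neighbors == [(1, 0), (-1, 0)]
```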
This characteristic provides a basis for LiDAR point cloud
processing in a Split-and-Merge paradigm. In this paradigm, the
entire LiDAR point cloud is first decomposed into many
discrete blocks; then these blocks are individually processed by
the user's program S; and finally the intermediate results are
merged into the actual output by the user's program M.
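The three stages can be sketched as follows. This is a minimal serial sketch under our own naming (split, s_task, m_task); in the actual framework the S tasks run concurrently and the merge may be hierarchical.

```python
def split(points, block_size):
    """Decompose a point list into fixed-size blocks (regular decomposition)."""
    return [points[i:i + block_size]
            for i in range(0, len(points), block_size)]

def s_task(block):
    """User program S: process one block independently (here: count points)."""
    return len(block)

def m_task(partials):
    """User program M: merge intermediate results into the final output."""
    return sum(partials)

points = list(range(10))                      # stand-in for LiDAR returns
blocks = split(points, 3)
total = m_task([s_task(b) for b in blocks])   # the S tasks could run in parallel
# total == 10
```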
3.2 The Split-and-Merge Paradigm
In the split-and-merge paradigm, the whole process is abstracted
into two types of tasks: Split task S and Merge task M. Split
tasks are mainly controlled by two factors: data decomposition
and neighbor definition. All the Merge tasks are connected into
a tree.
Regular decomposition usually divides the spatial domain into
rows, columns, or blocks, illustrated in Fig.2. Currently the
proposed Split-and-Merge paradigm only supports regular
domain decomposition. The size of rows, columns, or blocks
represents the granularity of parallel tasks. The grid index
method is used to index decomposed blocks, which are named
in the style dataname_colnum_rownum.
Figure 2. Three regular domain decomposition methods
(rows, columns, blocks)
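A grid index of this kind can be sketched as below. The helper name and the underscore-separated naming convention are assumptions based on the style described above, not the framework's actual API.

```python
def block_name(dataname, x, y, xmin, ymin, block_w, block_h):
    """Compute the grid cell containing point (x, y) and its block name.

    Assumed naming convention: dataname_colnum_rownum, with column and
    row numbers counted from the lower-left corner of the spatial domain.
    """
    col = int((x - xmin) // block_w)
    row = int((y - ymin) // block_h)
    return f"{dataname}_{col}_{row}"

name = block_name("tile42", x=250.0, y=120.0,
                  xmin=0.0, ymin=0.0, block_w=100.0, block_h=100.0)
# name == "tile42_2_1"
```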
The requirement for neighboring elements when processing one
element varies by algorithm. The Split-and-Merge paradigm
provides an option for users to specify the
neighborhood configuration for each algorithm. The first
attribute of neighbor definition is the shape. Two shapes are
supported in this paradigm, cross and square. The second attribute
is the size of the neighborhood, defined as the number of blocks
in one direction. The neighbor definition is illustrated in Fig.3.
Figure 3. Two types of neighbor definition
(left: cross; right: square)
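Enumerating the neighboring blocks for the two shapes can be sketched as below; the function name and return format are our own illustrative choices.

```python
def neighbor_offsets(shape, size):
    """List relative (dcol, drow) block offsets for a neighbor definition.

    shape: "cross"  -> only blocks sharing a row or column with the center;
           "square" -> all blocks within `size` steps in both directions.
    size:  neighborhood radius, in number of blocks per direction.
    """
    offsets = []
    for dx in range(-size, size + 1):
        for dy in range(-size, size + 1):
            if (dx, dy) == (0, 0):
                continue  # the center block itself is not a neighbor
            if shape == "square" or dx == 0 or dy == 0:
                offsets.append((dx, dy))
    return offsets

# A size-1 cross has 4 neighboring blocks; a size-1 square has 8.
cross_1 = neighbor_offsets("cross", 1)    # 4 offsets
square_1 = neighbor_offsets("square", 1)  # 8 offsets
```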
After decomposing a LiDAR point cloud and concurrently
processing each block, all the intermediate results are merged