More specifically, the GPU is especially well suited to address
problems that can be expressed as data-parallel computations, in
which the same program is executed on many data elements in
parallel with high arithmetic intensity, that is, a high ratio of
arithmetic operations to memory operations. Because the same
program is executed for each data element, there is a lower
requirement for sophisticated flow control; and because it is
executed on many data elements with high arithmetic intensity,
memory access latency can be hidden with calculations instead
of large data caches.
Data-parallel processing maps data elements to parallel 
processing threads. Many applications that process large data 
sets can use a data-parallel programming model to speed up the 
computations. In 3D rendering, large sets of pixels and vertices 
are mapped to parallel threads. Similarly, image and media 
processing applications such as post-processing of rendered 
images, video encoding and decoding, image scaling, stereo 
vision, and pattern recognition can map image blocks and pixels 
to parallel processing threads. In fact, many algorithms outside 
the field of image rendering and processing are accelerated by 
data-parallel processing, from general signal processing or 
physics simulation to computational finance or computational 
biology. 
3. CUDA (COMPUTE UNIFIED DEVICE ARCHITECTURE)
Developers have long tried to use GPUs for parallel computing.
The first of these efforts (such as rasterizing and Z-buffering)
were very primitive and could not fully exploit the hardware,
but shader calculations did accelerate matrix computations.
At the 2003 SIGGRAPH conference there was a session called
"GPGPU" devoted to GPU computing, though it drew almost no
participants. The best-known topic of this session was
"BrookGPU", a stream programming language. Before this
language was published, the two available development
interfaces were Direct3D and OpenGL; however, only a limited
range of GPU applications could be developed with them. The
Brook project, developed at Stanford University, subsequently
made it possible to use the GPU as a parallel processor
programmable in the C language, and it attracted the attention
of NVIDIA and ATI, the two graphics card designers and
manufacturers. Later, some of the developers of Brook joined
NVIDIA and began marketing the GPU as a parallel
computation unit. Thus the direct use of graphics hardware
emerged, under a structure called NVIDIA CUDA.
Although announcements were made earlier, Nvidia introduced
CUDA to the public in February 2007. This technology was
designed to meet several requirements important for a wide
audience. One of the most important requirements is the ability
to program GPUs easily: simplicity is necessary to ease GPU
parallel programming and enable its use in more disciplines.
Before CUDA, GPU parallel programming was limited to the
shader models of the graphics APIs, so only problems well
suited to the nature of vertex and fragment shaders could be
computed with GPU parallel processing. In addition, the need
to express general algorithms in terms of textures, and the fact
that the GPU provided 3D operations using only floating-point
numbers, were among the issues that limited the popularity of
GPU computing. To make GPU parallel programming easy and
practical, Nvidia offered the C programming language with
minimal extensions. Another important feature is the
heterogeneous computing model, which makes it possible to use
CPU and GPU resources together. CUDA lets programmers
divide the code and data into sub-parts, considering their
suitability to the CPU/GPU architectures and the respective
programming techniques. Such a division is possible because
the host and device have their own memories. In this sense, it
also becomes possible to port existing implementations
gradually from the CPU to the GPU (Yilmaz, 2010). Briefly,
CUDA is a software-hardware computing architecture
developed by NVIDIA, based on the C programming language,
for parallel computation that controls GPU commands and
video memory.
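As a minimal sketch of this host/device division (the buffer
names and the size N are illustrative assumptions, not taken
from the paper), the host allocates a separate buffer in device
memory, copies the input across, and copies the result back:

#include <cuda_runtime.h>

#define N 1024                      /* illustrative problem size     */

int main(void)
{
    float h_data[N];                /* host (CPU) memory             */
    float *d_data;                  /* device (GPU) memory           */

    for (int i = 0; i < N; ++i)     /* prepare input on the host     */
        h_data[i] = (float)i;

    /* The host and device memories are separate, so device memory
       must be allocated and the data copied across explicitly.      */
    cudaMalloc((void **)&d_data, N * sizeof(float));
    cudaMemcpy(d_data, h_data, N * sizeof(float),
               cudaMemcpyHostToDevice);

    /* ... kernels running on the GPU would operate on d_data ...    */

    cudaMemcpy(h_data, d_data, N * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return 0;
}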
CUDA works with all Nvidia GPUs from the G8x series
onwards, including the newer GeForce, Quadro and Tesla lines.
The data-parallel and thread-parallel architecture introduces
scalability: no extra effort is needed to run an existing solution
on newer GPUs that are capable of running more processing
threads. This means that code designed for the Nvidia 8 series
runs faster on the Nvidia GTX series without any additional
coding. Nvidia states that programs developed for the G8x
series will also work without modification on all future Nvidia
video cards, due to binary compatibility.
The three abstractions offered by Nvidia ensure the granularity
required for good data parallelism and thread parallelism. The
abstractions listed below are designed to make CUDA
programmers' lives easier.
* Thread group hierarchy: threads are packed into blocks,
which are in turn packed into a single grid.
* Shared memories: CUDA lets threads use six different
memories, designed to meet different requirements.
* Barrier synchronization: this abstraction synchronizes the
threads within a single block, making a thread wait for the
others to finish the related computation before going further
(a sketch follows this list).
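The following sketch illustrates the last two abstractions under
assumed names (the kernel reverseBlock and the fixed block
size of 512 are hypothetical, not from the paper): each block
stages its data in shared memory, and __syncthreads() is the
barrier that makes every thread wait before the staged data are
read back:

__global__ void reverseBlock(const float *in, float *out)
{
    __shared__ float tile[512];   /* shared memory, visible to the
                                     threads of one block           */
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[i];    /* each thread stages one element */
    __syncthreads();              /* barrier: wait until every thread
                                     in the block has written        */
    /* now it is safe to read elements written by other threads     */
    out[i] = tile[blockDim.x - 1 - threadIdx.x];
}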
C for CUDA makes it possible to write functions that run on the
GPU using the C language. These functions are called
"kernels"; unlike conventional serial functions, which run only
once, a kernel is executed once for each thread, in parallel.
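A minimal kernel might look as follows (the name vecAdd and
the vector-addition task are assumptions for illustration). The
__global__ qualifier marks the function as a kernel, and its body
is executed once by every thread:

/* Executed once per thread: each thread adds one pair of elements. */
__global__ void vecAdd(const float *a, const float *b, float *c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; /* global thread id */
    c[i] = a[i] + b[i];
}

/* Host-side launch with the <<<blocks, threads>>> configuration:   */
/* vecAdd<<<2048, 512>>>(d_a, d_b, d_c);                            */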
CUDA's architecture offers a thread hierarchy, in top-down
order, as follows:
1. Grid: a grid contains one- or two-dimensional blocks.
2. Block: a block contains one-, two- or three-dimensional
threads. Current GPUs allow a block to contain at most 512
threads. Blocks are executed independently and are directed to
the available processors, which provides scalability.
3. Thread: a thread is the basic execution element.
This hierarchy and structure are depicted in Figure 4. For
example, if 1,048,576 pixels are to be processed independently
in parallel and the block size is set to 512, the grid contains
2048 blocks.
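A sketch of this launch configuration (the kernel name
processPixels and the device pointer d_pixels are hypothetical
placeholders; the pixel count is assumed to be an exact multiple
of the block size, as in the example above):

int numPixels = 1048576;                      /* e.g. a 1024 x 1024 image */
int threadsPerBlock = 512;                    /* block size limit above   */
int numBlocks = numPixels / threadsPerBlock;  /* 1048576 / 512 = 2048     */

/* one grid of 2048 one-dimensional blocks, each with 512 threads */
processPixels<<<numBlocks, threadsPerBlock>>>(d_pixels);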