International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXV, Part B3. Istanbul 2004
This problem can only be addressed using High Performance Computing (HPC). High
Performance Computing is defined as the technology that is 
used to provide solutions to problems that either require 
significant computational power or need to process very large 
amounts of data quickly. In HPC, enormous processing power, 
fast networks, and huge amounts of data storage are applied to 
solve complex problems by utilizing industry standard, volume 
components. HPC clusters exploit recent performance advances 
in commodity processor and networking technology to make 
supercomputing affordable and accessible. 
This paper endeavors to summarize the challenges of dealing with the considerable amount of data being generated by digital sensors and the need for fast turnaround times. It also shows how the computationally intensive processes in ADS40 ground processing are distributed to an HPC cluster. We conclude by indicating our current processing capacity and our long-term plans for meeting these challenges.
2. SOLUTION 
North West Geomatics was one of the early adopters of the ADS40. They were quick to point out that if they were to fulfill their business plan for fast turnaround of projects, they needed more than a highly optimized solution. To address this problem, Leica started looking at various technologies for parallel or distributed computing. The following criteria were used to select the HPC technology that best addresses our customer's problems and gives us a fast-to-market solution:
* Minimum changes to current software 
* Easy to set up and configure on the client's computer 
* Be able to schedule jobs
* Work on the Microsoft Windows platform
* Prefer to use idle cycles of workstations 
The requirement for minimum changes to the current software results from the fact that we already had a working solution and did not want to burden a typical user with the setup and configuration of a cluster of computers. GPro's applications are highly threaded and optimized, so a production shop that deals with only a small amount of data at a time can continue using the existing solution without complicating its workflow. The next requirement, ease of deployment, was a major criterion in our selection process. Most of our users do not have a large IT department with the knowledge and budget to deploy, configure, and fine-tune a dedicated cluster. One of our main goals in testing has been to see how easily the solution works out of the box and what the minimum hardware and software requirements are.
As in any time-consuming process, being able to schedule jobs for processing, set their priorities, and cancel them is critical to fully utilizing a computational resource. Large-volume production projects come in stages, and there is heavy contention for resources. A good job-scheduling tool will allow easy prioritization of jobs, monitoring, and above all fault tolerance, since hardware failure is inevitable in a cluster environment. In addition, since Microsoft Windows is our main development environment, the tool selected for distribution has to be mature and well supported on this platform. Most HPC tools are available only for Linux, and finding mature implementations of libraries for Windows was a challenge. Lastly, production shops have high-end workstations that are used for triangulation, feature collection, and quality control.
These high-end machines are busy during the day when the 
operators are working but sit idle during the night. It was 
considered an added bonus if the solution could take advantage of this computational power during off-hours.
Based on these criteria, the following technologies were selected for investigation:
* MPI: Message Passing Interface
* PVM: Parallel Virtual Machine
* Condor: High Throughput Job Scheduler
* DCOM: Distributed Component Object Model
There are a variety of other distributed and parallel computing technologies not covered here. The above were selected because they are extensively used in the scientific community to solve large, memory- and CPU-intensive numerical problems and are the most relevant for the Microsoft Windows platform.
There were two choices for using the different parallel/distribution technologies. The first was to write proxies that use these technologies to distribute the load across machines and to continue using the current programs with as little change as possible. This, however, implied that the granularity of distribution had to be large enough to fit the existing applications, which work on an image-by-image basis. The other solution was to rewrite all of our computationally intensive applications using parallel methodologies. Parallelization would have resulted in the greatest scalability, since it would allow a production shop to reduce the turnaround time from flight to product generation simply by adding more computation hardware. However, this revolutionary approach was rejected from the start because of time constraints, since it would require a large redesign and implementation effort.
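To make the coarse-grained option concrete, the following is a minimal sketch, not the actual GPro code, of a driver that treats each whole image as one indivisible job and hands it to the existing, unmodified command-line application. The executable name gpro_rectify.exe and the dispatch_job routine are hypothetical; dispatch_job marks the single point where a distribution technology such as MPI, PVM, Condor, or DCOM would be plugged in, and here it simply runs the job locally as a stand-in.

/*
 * Coarse-grained, image-by-image job distribution sketch (hypothetical).
 * Each work unit is one whole image; the existing per-image application
 * is invoked without modification.
 */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-image executable; not a real GPro tool name. */
#define IMAGE_TOOL "gpro_rectify.exe"

static int dispatch_job(const char *image_path)
{
    char cmd[1024];
    /* In the distributed version this command line would be handed to a
       remote worker instead of being executed locally. */
    snprintf(cmd, sizeof(cmd), "%s \"%s\"", IMAGE_TOOL, image_path);
    printf("dispatching: %s\n", cmd);
    return system(cmd);
}

int main(int argc, char **argv)
{
    /* Every command-line argument is treated as one indivisible work unit. */
    for (int i = 1; i < argc; ++i) {
        if (dispatch_job(argv[i]) != 0)
            fprintf(stderr, "job failed for %s\n", argv[i]);
    }
    return 0;
}

Because the unit of work is an entire image, the existing application needs no internal changes; only the thin driver above would differ from one distribution technology to the next.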
The following sections will point out the advantages and disadvantages of each proposal. Except for the Distributed Component Object Model, which was evaluated purely on the basis of a literature survey, all other options were installed and experimented with.
2.1 Message Passing Interface (MPI) 
MPI addresses the message-passing model of parallel 
computation, in which processes with separate address spaces 
synchronize with one another and move data from the address 
space of one process to that of another by sending and receiving 
messages (Sterling, 2002). MPI is a standardized interface (not 
an implementation) that is widely used to write parallel
programs. It was designed by a forum that sought to define an 
interface that attempts to establish a practical, portable, 
efficient, and flexible standard for message passing. There are many MPI implementations, ranging from free, open-source, multi-platform libraries to versions highly optimized for specific hardware. This range of choices allows an MPI program to be portable across multiple HPC platforms, from a simple cluster to a supercomputer.
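As an illustration of this message-passing model, and of the image-based job distribution discussed next, the following is a minimal master/worker sketch. The process_image routine is a placeholder rather than any real GPro entry point, and the number of images is an arbitrary assumption; rank 0 hands out image identifiers while the remaining ranks receive and process them.

/*
 * Minimal MPI master/worker sketch: rank 0 distributes image ids,
 * the other ranks process them.  Build with an MPI compiler wrapper
 * such as mpicc.
 */
#include <mpi.h>
#include <stdio.h>

static void process_image(int image_id)
{
    /* Placeholder for the actual per-image computation. */
    printf("processed image %d\n", image_id);
}

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "at least two MPI processes are required\n");
        MPI_Finalize();
        return 1;
    }

    const int num_images = 8;   /* hypothetical number of images in a job */

    if (rank == 0) {
        /* Master: deal out one image id per message, round-robin over workers. */
        for (int img = 0; img < num_images; ++img) {
            int worker = 1 + img % (size - 1);
            MPI_Send(&img, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
        }
        /* A negative id tells each worker that no more work is coming. */
        for (int w = 1; w < size; ++w) {
            int stop = -1;
            MPI_Send(&stop, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
    } else {
        /* Worker: receive image ids from the master until the sentinel arrives. */
        for (;;) {
            int img;
            MPI_Recv(&img, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (img < 0)
                break;
            process_image(img);
        }
    }

    MPI_Finalize();
    return 0;
}

An actual proxy would of course pass image file names and collect results rather than exchanging integers, but the pattern of moving work between separate address spaces purely through MPI_Send and MPI_Recv is the same.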
Writing proxies based on MPI turned out to be easy for the simplistic approach of distributing the work based on individual images or strips. The prototypes were developed using the public-domain implementation MPICH (Ashton, 2002) for Windows. Configuration and setup of clusters using MPI was straightforward on the Windows platform. MPICH even had a plain GUI for feedback and control of running jobs. Image-based job distribution was carried out with minimal modification to the existing software. This implementation was