
also instrumental in introducing us to the large potential 
available in parallel computing. One major disadvantage of the 
MPI implementation was that it required dedicated nodes for 
computation, which rendered most of the idle high-end 
workstations unusable. Even though simple programs could be 
written easily, a significant learning curve was required to get the 
maximum benefit out of the MPI parallel computing model. 
Some of the issues that need to be addressed in parallel 
programs are minimization of communication, load balancing 
and fault tolerance. The scheduling, monitoring and management 
of jobs are not addressed by MPI and would require integration 
with another tool. 
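To illustrate the programming model (a minimal sketch only, not the proxy code used in our system; the file count and the process_file routine are hypothetical placeholders), a serial task can be statically partitioned across MPI ranks as follows:

    #include <stdio.h>
    #include <mpi.h>

    /* Hypothetical placeholder for the per-file computation. */
    static void process_file(int index)
    {
        printf("processing input file %d\n", index);
    }

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        const int num_files = 100;   /* assumed total number of input files */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Static load balancing: rank r handles files r, r+size, r+2*size, ... */
        for (i = rank; i < num_files; i += size)
            process_file(i);

        /* Wait for all ranks before shutting down. */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

Even a sketch this small shows the cost of the model: communication, load balancing and failure handling beyond this static split are left entirely to the programmer.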
2.2 Parallel Virtual Machine (PVM) 
PVM is an integrated set of software tools and libraries that 
emulates a general purpose, flexible, heterogeneous parallel 
computing framework on interconnected computers of varied 
architecture (Sterling, 2002). It is a portable message-passing 
programming system, designed to link separate host machines 
to form a "virtual machine" which is a single, manageable 
computing resource (Geist, 1994). 
The biggest advantage of PVM is that it has a large user base 
supporting the libraries and tools. Writing our proxies using 
PVM allowed us to leverage this knowledge, which resulted in the 
least amount of modification to the existing applications. It also 
introduced us to parallel programming. The disadvantages were 
that it was not easy to set up and not mature on the Windows 
platform. It has also been superseded by MPI and is generally 
considered slower. As in the MPI case, job scheduling and 
management are not addressed. 
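For comparison with MPI, a minimal PVM master/worker sketch in C follows (a sketch only, assuming the program is installed under the hypothetical name worker_demo; it is not the proxy code used in our system):

    #include <stdio.h>
    #include <pvm3.h>

    #define NWORKERS 4

    int main(void)
    {
        int mytid = pvm_mytid();          /* enrol in the virtual machine */

        if (pvm_parent() == PvmNoParent) {
            /* Master: spawn workers and send each one a chunk index. */
            int tids[NWORKERS], i;
            pvm_spawn("worker_demo", (char **)0, PvmTaskDefault,
                      "", NWORKERS, tids);
            for (i = 0; i < NWORKERS; i++) {
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&i, 1, 1);
                pvm_send(tids[i], 1);     /* message tag 1: work assignment */
            }
        } else {
            /* Worker: receive the chunk index and process it. */
            int chunk;
            pvm_recv(pvm_parent(), 1);
            pvm_upkint(&chunk, 1, 1);
            printf("task %x processing chunk %d\n", mytid, chunk);
        }

        pvm_exit();                       /* leave the virtual machine */
        return 0;
    }

The spawn/pack/send idiom maps cleanly onto our one-program-per-file-subset distribution, which is why the existing applications needed so little modification.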
2.3 Condor: High Throughput Job Scheduler 
Condor is a sophisticated and unique distributed job scheduler 
developed by the Condor research project at the University of 
Wisconsin-Madison Department of Computer Science (Sterling, 
2002). "Condor is a specialized workload management system 
for compute-intensive jobs. Like other full-featured batch 
systems, Condor provides a job queuing mechanism, scheduling 
policy, priority scheme, resource monitoring, and resource 
management. Users submit their serial or parallel jobs to 
Condor, Condor places them into a queue, chooses when and 
where to run the jobs based upon a policy, carefully monitors 
their progress, and ultimately informs the user upon 
completion." (Condor Team, 2003). It provides a high 
throughput computing environment that delivers large amounts 
of computational power over a long period of time, even in the 
case of machine failures. Condor has many more features, which 
are well documented in the manual and in numerous 
publications from the research team. 
Condor proxies were developed to submit jobs to the Condor 
pool and monitor their progress using the Condor command line 
executables. The most impressive thing about Condor was that 
setup was a breeze and everything worked right out of the box. 
The ability to effectively harness wasted CPU power from idle 
desktop workstations was an added benefit that gave Condor an 
edge over the others. The scheduling, monitoring and built-in 
fault tolerance were also features that could not be matched by 
any of the other HPC models. Moreover, since it could control 
both serial and parallel (MPI, PVM) programs, it provided us 
with growth potential. 
The vanilla universe (which is Condor terminology for serial 
job management) directly matched our distribution model. In 
writing the proxies, all we had to do was make minor changes to 
each application so that it worked on a subset of the input XML 
files, and modify Condor's job submission text files to kick off a 
batch file that sets up the shared drives (a sample submit 
description is sketched below). The major difficulty in 
using Condor was the absence of an API that allowed easy third-party 
integration to monitor and manipulate jobs and machines. 
Another shortcoming was the lack of GUIs for job control and 
feedback. Clipped functionality on the Windows platform (no 
transparent authentication and failure handling, automatic drive 
mappings, etc.) was among the minor problems we came across 
at the beginning; these have been partially addressed in 
newer versions of Condor. 
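As an illustration of such a submit description (a sketch only; the executable name, file names and job count are hypothetical), a vanilla-universe submission of ten input subsets could look like this:

    # Hypothetical vanilla-universe submit description
    universe    = vanilla
    executable  = setup_and_run.bat
    arguments   = subset_$(Process).xml
    log         = jobs.log
    output      = job_$(Process).out
    error       = job_$(Process).err
    queue 10

Here $(Process) is Condor's built-in macro that expands to the job's index within the submitted cluster, so each of the ten queued jobs receives a different input subset.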
2.4 Distributed Component Object Model (DCOM) 
DCOM is a protocol that extends the Component 
Object Model (COM) from Microsoft, using remote 
procedure calls to provide location independence by allowing 
components to run on a different machine from the client. It 
enables software components to communicate directly over a 
network in a reliable, secure, and efficient manner. The theory 
of DCOM is great, but the implementation has proved less than 
stellar (Templeman, 2003). By the time we started evaluating 
distributed computing, DCOM had lost favor to newer 
technologies such as web services (Templeman, 2003). 
Job management for COM components could be handled 
by Microsoft Application Center 2000, which is designed to enable 
Web applications to achieve on-demand scalability and 
mission-critical availability through software scaling, while 
reducing operational complexity and costs (Microsoft, 2001). 
As a fully GUI-based product, it appears uncomplicated to 
configure and set up Application Center clusters. However, it is 
mainly geared towards web services, and our research did not 
turn up how it could be used for other types of applications. It 
was also not clear how one would write a component that could 
be load balanced. Moreover, this solution would also require a 
dedicated computation farm and carries a high software price for 
each server added to the cluster. 
2.5 Selected Solution 
Condor, with its ease of setup and high throughput computing, 
was finally selected for implementation. Proxies were 
developed for each computationally intensive process. These 
proxies submit the job, get the status reported from each node 
and summarize the results, as shown in Figure 1. The installation 
assumes that all the nodes have fast access to the file servers. 
The maximum output from this configuration would be obtained 
with a high-end fiber array Storage Area Network (SAN) that 
provides high read and write performance. Using the Windows 
Distributed File System (DFS), or drawing input and output from 
different locations on various file servers, could also avoid an 
I/O bottleneck. 
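To illustrate how such a proxy can drive Condor through its command line executables (a minimal sketch, not the actual LGGM proxy code; the submit file name job.sub is hypothetical), the standard tools condor_submit and condor_q can be invoked and their output captured:

    #include <stdio.h>
    #include <stdlib.h>

    /* Run a Condor command line tool and echo its output.
       (popen is POSIX; on Windows the equivalent is _popen.)
       Returns the exit status of the command. */
    static int run_condor_cmd(const char *cmd)
    {
        char line[512];
        FILE *pipe = popen(cmd, "r");
        if (pipe == NULL) {
            perror("popen");
            return -1;
        }
        while (fgets(line, sizeof line, pipe) != NULL)
            fputs(line, stdout);      /* a real proxy would parse this */
        return pclose(pipe);
    }

    int main(void)
    {
        /* Submit the cluster described in a (hypothetical) submit file. */
        if (run_condor_cmd("condor_submit job.sub") != 0)
            return EXIT_FAILURE;

        /* Check the queue once; a real proxy would poll repeatedly. */
        run_condor_cmd("condor_q");
        return EXIT_SUCCESS;
    }

A production proxy along these lines would parse the cluster id printed by condor_submit, poll condor_q until the cluster drains, and then gather and summarize the per-node results.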
The following section will show a typical setup that uses 
Condor and the proxies developed at LGGM. We will finally 
summarize the actual results attained for some projects. 