Grid
Computing is a relatively new term describing a not so new
activity. We’ve been using networks of computers for decades.
What’s new is the cheap bandwidth allowing large numbers of
computers to converse. A typical business has network capacity
exceeding what NASA had when they landed on the moon. The
combined computing power of a company’s desktops can exceed that
of a supercomputer. Products are appearing that exploit this
network and computational capacity to solve business,
scientific, and engineering problems. For certain classes of
problems, the increase in performance is linear with the number
of computers on the grid.
Suppose you have a number
crunching application (or a data crunching application) that
takes minutes or hours to complete. Can a grid complete it
faster? This paper described a particular type of grid and
explains what kinds of problems can or cannot be solved on it.
It then describes the steps involved in adapting a single
computer application to the grid.
Look around. See the Grid?
A grid could be a set of
servers in a computer center that are configured to share
computational tasks in a flexible way. That’s not what we’re
talking about. The grid we’re discussing is the set of desktop
computers connected by your company local area network. Suppose
software exists (it does) that allows any idle desktop computer
to contribute free CPU cycles to a grid. Now think about how
many desktops aren’t being used at any given moment. People go
to meetings. People go to lunch. People go home at night.
You’ve got a grid.
Can you use it for a given
application? Here’s how to tell:
Does the entire application
complete in a few seconds on one computer? If so, there is no
advantage in distributing it on the grid. There is a second or
two overhead in sending a task to an idle computer. Distributing
a fast application will slow it down.
Can the application be broken
into pieces? Look for loops and arrays. These are the features
that suggest a program could be adapted to distributed
execution. Could iterations of a loop be executed in any order?
That’s good. Does one iteration depend on data from another?
That’s bad.
Is the application slow ONLY
because it’s consuming huge amounts of data from a file or
database? Is the actual computation time trivial? Even if you
can make the computation parallel, the I/O time may not improve.
It’s possible to get some benefit from the grid, but it depends
on the nature of the data accesses. If different parts of the
file/database are accessed by different pieces, you’ll see a
benefit. If the whole file must be read by each piece, probably
not.
An exception to the above is if
each piece is reading the same file multiple times. If the file
isn’t too big, you can distribute a copy of the file to each
participating computer and still get an improvement in execution
time.
Does one piece of the
application depend on other pieces? This is where things get
tricky.
Consider a 3D graphics
application that traces rays from a light source, bouncing off
objects, and eventually reaching a viewer. This is a great
application to distribute, because light rays (photons) don’t
interact with each other. Give each piece a range of angles to
work on, distribute the pieces, and sum the results.
What if you want to simulate a
billiards game? In each time slice, the movement of each ball
depends on whether it’s touching another ball. It’s tough to
distribute this kind of computation because after each simulated
time slice you’d have to communicate every ball’s position to
all the computers. The communication cost overwhelms the
processing gain.
If the billiard game had
millions of tiny balls, you could get some benefit by breaking
up the table into a few squares. Each square could be processed
separately. After each simulation slice, you’d only have to
communicate information about the balls near the edges of the
squares, which would be an order of magnitude less data. In
general, distributing programs that require a lot of interaction
between pieces is better done on a dedicated grid, rather than
an ad hoc grid of desktop computers.
Even with the above
restrictions, there are many applications that can benefit from
distribution on a grid of PC’s. The next sections describe a
general plan to adapt an existing program to the grid.
What if I don’t have the source
code?
You may be using a program for which you don’t have the source
code. If so, think about whether you could call that program
multiple times (with subsets of the input data), and then
generate a combined result manually, in a spreadsheet, or
through a simple program or script. If so, then you could
distribute the program, perhaps with a simple wrapper program
that would adapt its I/O interfaces to the framework. If it’s a
commercial program, be careful about license issues.
Adapting a Program for
Distributed Execution
If you have the source code,
see if you can reorganize the program so that code within an
outer loop is reduced to a single function call. That function
should not directly access any global data. It should operate
only on parameters passed to it. Code before the loop may
prepare data. Code after the loop will display and/or combine
the results of each iteration.
Test to see if you broke
anything.
Next, make the function accept
only two parameters – an input filename and an output filename.
The calling program will put all parameters into the input file
and extract results from the output file. If possible, format
the files in ASCII to ease debugging.
Test again.
Next, put the function in its
own executable module. Inside the loop, the calling program
should invoke the new module using “system”, “spawn”, “execl”,
or the like.
Test again.
Finally, replace the local
invocation of the new module with a call to a grid framework.
The framework will distribute
the module and the input files, and collect the output files.
Particular frameworks will have
unique interface requirements, but the basic functionality is
the same.
Test again.
You are done!
This step-by-step adaptation
will help you isolate problems as early as possible. Debugging
distributed programs can be very difficult. You should always
retain the ability to execute the same core logic in a
single-threaded, non-distributed mode. This will allow stepping
through the program in a debugger, and inspecting the input and
output files as they are generated.
Two final topics to consider
are error handling and forced termination. Distributed programs,
particularly on ad hoc grids, must expect that executions of one
or more pieces will be interrupted or fail. The framework will
handle retries, but the application code must help by behaving
properly. Handling forced termination means the application
should check for some signal from the framework, and then exit.
Below is the C source for a
trivial application before and after adaptation for grid
computing. In calculates the sum and difference between two
values in each element of an array. The framework specific final
step is not shown.
ORIGINAL PROGRAM
/* File
SumDiffMem.c
//
// Sample
of science/engineering program that processes an array of data
structures.
//
//
The processData
routine is called sequentially with each element of the array.
// Basic
processing is independent of other elements, but...
// In
addition, there is a total structure that is modified.
//
*/
#include
<stdio.h>
typedef
struct {
int
value1;
int
value2;
int sum;
int diff;
}SD_DATA;
#define
NUM_JOBS 3
SD_DATA
dataArray[NUM_JOBS]
= {
{1,3,0,0},
{2,5,0,0},
{4,1,0,0}
};
typedef
struct {
int sum;
int diff;
}SD_TOTAL;
SD_TOTAL
total;
void
processData(SD_DATA *dp,
SD_TOTAL *tot)
{
dp->sum =
dp->value1 + dp->value2;
dp->diff
= dp->value1 -
dp->value2;
tot->sum += dp->sum;
tot->diff += dp->diff;
}
int
main(int argc,
char **argv)
{
int
i;
total.diff = 0;
total.sum = 0;
for(i =
0; i < NUM_JOBS;
i++){
processData(dataArray
+ i, &total);
}
for(i =
0; i < NUM_JOBS;
i++){
printf("%d
%d %d %d %d\n",
i,
dataArray[i].value1,
dataArray[i].value2,
dataArray[i].sum,
dataArray[i].diff);
}
printf("Total
%d %d\n",total.sum,total.diff);
}
ADAPTED PROGRAM - CALLING MODULE
/* File
SumDiffFile.c
//
// Sample
of science/engineering program that processes an array of data
structures.
//
//
Together with SumDiffProcess.exe, implements same processing as
SumDiffMem.exe
//
//
As a step toward distributed
processing, SumDiffMem.c has been
broken into
// this
outer program (SumDiffFile.c) and
SumDiffProcess.c, which operates
// on a
single data structure.
//
// Data
is communicated to/from SumDiffProcess
by invoking a command:
// SumDiffProcess
inputFile
outputFile
//
//
Unlike the memory version,
calculation of totals is handled in this calling program.
//
*/
#include
<stdio.h>
#include
<process.h>
#include
<stdlib.h>
typedef
struct {
int
value1;
int
value2;
int sum;
int diff;
}SD_DATA;
SD_DATA
dataArray[3]
= {
{1,3,0,0},
{2,5,0,0},
{4,1,0,0}
};
typedef
struct {
int sum;
int diff;
}SD_TOTAL;
SD_TOTAL
total;
int
externalProcessData(int
index,SD_DATA *dp)
{
char inFileName[100];
char outFileName[100];
char cmdLine[100];
char buf[100];
FILE *f;
//
make unique filenames for debugging and future parallel
processing
sprintf(inFileName,"dataFile%d.txt",index);
sprintf(outFileName,"resultFile%d.txt",index);
f
= fopen(inFileName,"w+t");
fprintf(f,"%d\n",dp->value1);
fprintf(f,"%d\n",dp->value2);
fclose(f);
//
build command line to pass to system
sprintf(cmdLine,"SumDiffProcess
%s %s",inFileName,
outFileName);
system(cmdLine);
//
get results from output file
f
= fopen(outFileName,"rt");
if (!f)return 0;
fgets(buf,100,f);
dp->sum =
atoi(buf);
fgets(buf,100,f);
dp->diff
= atoi(buf);
fclose(f);
return 1;
}
int
main(int argc,
char **argv)
{
int
i;
total.diff = 0;
total.sum = 0;
for(i =
0; i < 3; i++){
externalProcessData(i,dataArray
+ i);
}
for(i =
0; i < 3; i++){
// totals calculated here, not in individual
element processing
total.sum +=
dataArray[i].sum;
total.diff +=
dataArray[i].diff;
printf("%d
%d %d %d %d\n",
i,
dataArray[i].value1,
dataArray[i].value2,
dataArray[i].sum,
dataArray[i].diff);
}
printf("Total
%d %d\n",total.sum,total.diff);
}
ADAPTED PROGRAM -
CALLED MODULE
/* File
SumDiffProcess.c
//
// Sample
of science/engineering program that processes an array of data
structures.
//
//
Together with SumDiffFile.exe, implements same processing as
SumDiffMem.exe.
//
//
As a step toward distributed
processing, SumDiffMem.c has been
broken into
// an
outer program (SumDiffFile.exe) and this one, which operates
// on a
single data structure.
//
//
This program is invoked by
SumDiffFile via the system command.
// Data
is communicated via an input file (argv[1])
and an output file (argv[2]);
//
//
Unlike the memory version,
calculation of totals is handled
// in the
calling program (SumDiffFile)
//
The processData
routine still modifies totals, but the data goes nowhere.
//
This approach allows the
processData routine to remain
unchanged
// from
memory to file versions.
//
In real applications, it may be
complex code that is better left untouched.
//
Feeding it dummy data references may
be safer.
//
*/
#include
<stdio.h>
#include
<stdlib.h>
typedef
struct {
int
value1;
int
value2;
int sum;
int diff;
}SD_DATA;
typedef
struct {
int sum;
int diff;
}SD_TOTAL;
//
The following routine should be
called on the order of once per second when
//
processing is going to take a long time.
// Most
real applications will have a loop where you can put a call to
checkAbort().
// If
not, a separate thread should be created to call
checkAbort()
every second.
void
checkAbort()
{
FILE *f = fopen("DGabort.flg","r");
if (f){
fclose(f);
exit(0);
}
}
void
processData(SD_DATA *dp,
SD_TOTAL *tot)
{
dp->sum =
dp->value1 + dp->value2;
dp->diff
= dp->value1 -
dp->value2;
tot->sum += dp->sum;
tot->diff += dp->diff;
checkAbort();
// silly in this example
}
int
main(int argc,
char ** argv)
{
FILE *fin;
FILE *fout;
char buf[100];
SD_TOTAL tot; // dummy - never output
SD_DATA data;
//
Clear the abort flag
//
DeskGrid wrapper
pgm should have already done this
unlink("DGabort.flg");
//
Input file is first command line argument
fin = fopen(argv[1],"rt");
//
value1 is in first line of input file
fgets(buf,100,fin);
data.value1 = atoi(buf);
//
value2 is in second line of input file
fgets(buf,100,fin);
data.value2 = atoi(buf);
fclose(fin);
processData(&data,&tot);
//
Output file is second command line
argument
fout =
fopen(argv[2],"w+t");
fprintf(fout,"%d\n",data.sum);
fprintf(fout,"%d\n",data.diff);
fclose(fout);
}