Home Unix/Linux Mainframe Other Servers Software Reviews
 

DeskGrid - Grid Computing on Desktop PCs

 John F. Doyle

Founder

               DeskGrid

 

In a thirty year career, John F. Doyle designed mission critical multiprocessor systems for NASA (telemetry system for Kennedy Space Center), state and regional lotteries, on and off-track betting, and interactive television. He served as Engineering VP for General Instrument and ICTV. As Info Designs’ principal he has developed a commercial telemetry system, and lead the performance tuning of a major corporation’s Java web application running on dozens of Unix servers. He is the sole or principal inventor on four U.S. patents, including one that is the basis for DeskGrid. He can be reached at jdoyle@deskgrid.com


Grid Computing on Desktop PCs Can Help You

Grid Computing is a relatively new term describing a not so new activity. We’ve been using networks of computers for decades. What’s new is the cheap bandwidth allowing large numbers of computers to converse. A typical business has network capacity exceeding what NASA had when they landed on the moon. The combined computing power of a company’s desktops can exceed that of a supercomputer. Products are appearing that exploit this network and computational capacity to solve business, scientific, and engineering problems. For certain classes of problems, the increase in performance is linear with the number of computers on the grid.

Suppose you have a number crunching application (or a data crunching application) that takes minutes or hours to complete. Can a grid complete it faster? This paper described a particular type of grid and explains what kinds of problems can or cannot be solved on it. It then describes the steps involved in adapting a single computer application to the grid.

Look around. See the Grid?

A grid could be a set of servers in a computer center that are configured to share computational tasks in a flexible way. That’s not what we’re talking about. The grid we’re discussing is the set of desktop computers connected by your company local area network.  Suppose software exists (it does) that allows any idle desktop computer to contribute free CPU cycles to a grid.  Now think about how many desktops aren’t being used at any given moment. People go to meetings. People go to lunch. People go home at night.

You’ve got a grid.

Can you use it for a given application? Here’s how to tell:

Does the entire application complete in a few seconds on one computer? If so, there is no advantage in distributing it on the grid. There is a second or two overhead in sending a task to an idle computer. Distributing a fast application will slow it down.

Can the application be broken into pieces?  Look for loops and arrays. These are the features that suggest a program could be adapted to distributed execution. Could iterations of a loop be executed in any order? That’s good. Does one iteration depend on data from another? That’s bad.

Is the application slow ONLY because it’s consuming huge amounts of data from a file or database?  Is the actual computation time trivial? Even if you can make the computation parallel, the I/O time may not improve. It’s possible to get some benefit from the grid, but it depends on the nature of the data accesses. If different parts of the file/database are accessed by different pieces, you’ll see a benefit. If the whole file must be read by each piece, probably not.

An exception to the above is if each piece is reading the same file multiple times. If the file isn’t too big, you can distribute a copy of the file to each participating computer and still get an improvement in execution time.

Does one piece of the application depend on other pieces? This is where things get tricky.

Consider a 3D graphics application that traces rays from a light source, bouncing off objects, and eventually reaching a viewer. This is a great application to distribute, because light rays (photons) don’t interact with each other.  Give each piece a range of angles to work on, distribute the pieces, and sum the results.

What if you want to simulate a billiards game? In each time slice, the movement of each ball depends on whether it’s touching another ball. It’s tough to distribute this kind of computation because after each simulated time slice you’d have to communicate every ball’s position to all the computers. The communication cost overwhelms the processing gain.

If the billiard game had millions of tiny balls, you could get some benefit by breaking up the table into a few squares. Each square could be processed separately. After each simulation slice, you’d only have to communicate information about the balls near the edges of the squares, which would be an order of magnitude less data. In general, distributing programs that require a lot of interaction between pieces is better done on a dedicated grid, rather than an ad hoc grid of desktop computers.

Even with the above restrictions, there are many applications that can benefit from distribution on a grid of PC’s. The next sections describe a general plan to adapt an existing program to the grid.

What if I don’t have the source code?
You may be using a program for which you don’t have the source code. If so, think about whether you could call that program multiple times (with subsets of the input data), and then generate a combined result manually, in a spreadsheet, or through a simple program or script. If so, then you could distribute the program, perhaps with a simple wrapper program that would adapt its I/O interfaces to the framework. If it’s a commercial program, be careful about license issues.

Adapting a Program for Distributed Execution

If you have the source code, see if you can reorganize the program so that code within an outer loop is reduced to a single function call. That function should not directly access any global data. It should operate only on parameters passed to it. Code before the loop may prepare data. Code after the loop will display and/or combine the results of each iteration.

Test to see if you broke anything.

Next, make the function accept only two parameters – an input filename and an output filename. The calling program will put all parameters into the input file and extract results from the output file.  If possible, format the files in ASCII to ease debugging.

Test again.

Next, put the function in its own executable module. Inside the loop, the calling program should invoke the new module using “system”, “spawn”, “execl”, or the like.

Test again.

Finally, replace the local invocation of the new module with a call to a grid framework.

The framework will distribute the module and the input files, and collect the output files.

Particular frameworks will have unique interface requirements, but the basic functionality is the same.

Test again.

You are done!

This step-by-step adaptation will help you isolate problems as early as possible. Debugging distributed programs can be very difficult. You should always retain the ability to execute the same core logic in a single-threaded, non-distributed mode. This will allow stepping through the program in a debugger, and inspecting the input and output files as they are generated.

Two final topics to consider are error handling and forced termination. Distributed programs, particularly on ad hoc grids, must expect that executions of one or more pieces will be interrupted or fail. The framework will handle retries, but the application code must help by behaving properly. Handling forced termination means the application should check for some signal from the framework, and then exit.

Below is the C source for a trivial application before and after adaptation for grid computing.  In calculates the sum and difference between two values in each element of an array. The framework specific final step is not shown.

 

ORIGINAL PROGRAM

/* File SumDiffMem.c

//

// Sample of science/engineering program that processes an array of data structures.

//

// The processData routine is called sequentially with each element of the array.

// Basic processing is independent of other elements, but...

// In addition, there is a total structure that is modified.

//

*/

#include <stdio.h>

 

typedef struct {

       int value1;

       int value2;

       int sum;

       int diff;

}SD_DATA;

 

#define NUM_JOBS 3

 

SD_DATA dataArray[NUM_JOBS] = {

       {1,3,0,0},

       {2,5,0,0},

       {4,1,0,0}

};

 

typedef struct {

       int sum;

       int diff;

}SD_TOTAL;

SD_TOTAL total;

 

void processData(SD_DATA *dp, SD_TOTAL *tot)

{

       dp->sum = dp->value1 + dp->value2;

       dp->diff = dp->value1 - dp->value2;

       tot->sum += dp->sum;

       tot->diff += dp->diff;

}

 

int main(int argc, char **argv)

{

       int i;

       total.diff = 0;

       total.sum = 0;

       for(i = 0; i < NUM_JOBS; i++){

              processData(dataArray + i, &total);     

       }

       for(i = 0; i < NUM_JOBS; i++){

              printf("%d %d %d %d %d\n",

                     i,

                     dataArray[i].value1,

                     dataArray[i].value2,

                     dataArray[i].sum,

                     dataArray[i].diff);

       }

       printf("Total %d %d\n",total.sum,total.diff);

 

}

 

 

ADAPTED PROGRAM - CALLING MODULE

/* File SumDiffFile.c

//

// Sample of science/engineering program that processes an array of data structures.

//

// Together with SumDiffProcess.exe, implements same processing as SumDiffMem.exe

//

// As a step toward distributed processing, SumDiffMem.c has been broken into

// this outer program (SumDiffFile.c) and SumDiffProcess.c, which operates

// on a single data structure.

//

// Data is communicated to/from SumDiffProcess by invoking a command:

//            SumDiffProcess inputFile outputFile

//

// Unlike the memory version, calculation of totals is handled in this calling program.

//

*/

 

 

#include <stdio.h>

#include <process.h>

#include <stdlib.h>

 

typedef struct {

       int value1;

       int value2;

       int sum;

       int diff;

}SD_DATA;

 

SD_DATA dataArray[3] = {

       {1,3,0,0},

       {2,5,0,0},

       {4,1,0,0}

};

 

typedef struct {

       int sum;

       int diff;

}SD_TOTAL;

SD_TOTAL total;

 

int externalProcessData(int index,SD_DATA *dp)

{

       char inFileName[100];

       char outFileName[100];

       char cmdLine[100];

       char buf[100];

       FILE *f;

 

       // make unique filenames for debugging and future parallel processing

       sprintf(inFileName,"dataFile%d.txt",index);

       sprintf(outFileName,"resultFile%d.txt",index);

       f = fopen(inFileName,"w+t");

       fprintf(f,"%d\n",dp->value1);

       fprintf(f,"%d\n",dp->value2);

       fclose(f);

 

       // build command line to pass to system

       sprintf(cmdLine,"SumDiffProcess %s %s",inFileName, outFileName);

       system(cmdLine);

 

       // get results from output file

       f = fopen(outFileName,"rt");

       if (!f)return 0;

       fgets(buf,100,f);

       dp->sum = atoi(buf);

 

       fgets(buf,100,f);

       dp->diff = atoi(buf);

       fclose(f);

       return 1;

}

 

int main(int argc, char **argv)

{

       int i;

       total.diff = 0;

       total.sum = 0;

       for(i = 0; i < 3; i++){

              externalProcessData(i,dataArray + i);   

       }

       for(i = 0; i < 3; i++){

              // totals calculated here, not in individual element processing

              total.sum += dataArray[i].sum;

              total.diff += dataArray[i].diff;

 

              printf("%d %d %d %d %d\n",

                     i,

                     dataArray[i].value1,

                     dataArray[i].value2,

                     dataArray[i].sum,

                     dataArray[i].diff);

       }

       printf("Total %d %d\n",total.sum,total.diff);

 

}

ADAPTED PROGRAM - CALLED MODULE

/* File SumDiffProcess.c

//

// Sample of science/engineering program that processes an array of data structures.

//

// Together with SumDiffFile.exe, implements same processing as SumDiffMem.exe.

//

// As a step toward distributed processing, SumDiffMem.c has been broken into

// an outer program (SumDiffFile.exe) and this one, which operates

// on a single data structure.

//

// This program is invoked by SumDiffFile via the system command.

// Data is communicated via an input file (argv[1]) and an output file (argv[2]);

//

// Unlike the memory version, calculation of totals is handled

// in the calling program (SumDiffFile)

// The processData routine still modifies totals, but the data goes nowhere.

// This approach allows the processData routine to remain unchanged

// from memory to file versions.

// In real applications, it may be complex code that is better left untouched.

// Feeding it dummy data references may be safer.

//

*/

 

 

#include <stdio.h>

#include <stdlib.h>

 

typedef struct {

       int value1;

       int value2;

       int sum;

       int diff;

}SD_DATA;

 

 

typedef struct {

       int sum;

       int diff;

}SD_TOTAL;

 

// The following routine should be called on the order of once per second when

// processing is going to take a long time.

// Most real applications will have a loop where you can put a call to checkAbort().

// If not, a separate thread should be created to call checkAbort() every second.

void checkAbort()

{

       FILE *f = fopen("DGabort.flg","r");

       if (f){

              fclose(f);

              exit(0);

       }

}

 

void processData(SD_DATA *dp, SD_TOTAL *tot)

{

       dp->sum = dp->value1 + dp->value2;

       dp->diff = dp->value1 - dp->value2;

       tot->sum += dp->sum;

       tot->diff += dp->diff;

       checkAbort(); // silly in this example

}

 

 

int main(int argc, char ** argv)

{

       FILE *fin;

       FILE *fout;

       char buf[100];

       SD_TOTAL tot; // dummy - never output

       SD_DATA data;

 

       // Clear the abort flag

       // DeskGrid wrapper pgm should have already done this

       unlink("DGabort.flg");

 

       // Input file is first command line argument

       fin = fopen(argv[1],"rt");

       // value1 is in first line of input file

       fgets(buf,100,fin);

       data.value1 = atoi(buf);

 

       // value2 is in second line of input file

       fgets(buf,100,fin);

       data.value2 = atoi(buf);

       fclose(fin);

 

       processData(&data,&tot);

 

       // Output file is second command  line argument

       fout = fopen(argv[2],"w+t");

       fprintf(fout,"%d\n",data.sum);

       fprintf(fout,"%d\n",data.diff);

       fclose(fout);

}