A parallel program is a computer program written to run on many processors at once. In this guide, that means a cluster of computers instead of just one computer. A Beowulf cluster is made of many ordinary, everyday computers networked together in a specific way so that they can communicate with each other. The point is to make programs that do a lot of computation run faster by dividing the work among the machines. Beowulf clusters are a cheaper alternative to supercomputers. However, they are less convenient, because you have to explicitly program what each computer does and how the computers interact. This website will guide you through running some sample parallel programs on a Beowulf cluster.
To write a parallel program in C++, first include the MPI (Message Passing Interface) header "mpi.h" in your code, then write the rest of the program. When you compile this program with mpic++ (or mpiCC) and run it with mpirun, several copies of the same executable run simultaneously, typically spread across the computers in the cluster.
In order to make the computers work together, you call functions from the MPI library (declared in "mpi.h") that do things such as:
-Give each computer (strictly, each running copy of the program) a number, called its "rank"; MPI_Comm_rank writes this number into a variable you pass to it
-Send data from one specified rank to another, or receive it (MPI_Send and MPI_Recv)
-Set a barrier (MPI_Barrier) at which each computer stops until all the computers have reached it; then they all continue, synchronized
To learn more about MPI, check out these links:
-a nice introductory MPI tutorial (PDF)
-a comprehensive list of MPI functions
-extraordinarily detailed documentation for MPI
Some notes from my experience with parallel programming in MPI:
-Sending information from computer to computer is usually what takes the most time, so keep it to an absolute minimum! Because of this, it is sometimes more time-efficient to perform the same calculation on every computer, even though that might seem wasteful, just so you don't have to send the results out.
-Every data object you declare is effectively an array of that object, with one copy on each computer.
-When writing code, it is difficult to get into the mindset that this code is being run multiple times, once on every computer.
-It helps to keep strict track of each function and whether it needs to be called by all machines at once (a "collective" call such as MPI_Barrier) or whether it can be called on its own.