Update your repository using git fetch and
git pull.
Often there are situations where it is desirable to have more than one implementation or version of a software package installed on a cluster or computer. The Lmod Environment Modules package facilitates this as it gives users a way to quickly change from one version to another.
To see what modules are available:
module avail
To see what modules are currently loaded:
module list
To use the module openmpi you would load it with the command
module load openmpi
Later, if you wanted to use MPICH, a different implementation of MPI, you can load it with
module load mpich
This will replace the MPI module that is already loaded since only one MPI implementation can be active at a time.
You can type man module or module help
for more information. Here is a list of the most common commands:
module avail | display available modules |
module list | list currently loaded modules |
module load <module name> | load specified module |
module unload <module name> | unload specified module |
module help | display help information for the module command |
module help <module name> | display help for specified module |
module whatis <module name> | display one-line description of specified module |
Type module list to see what modules are currently
loaded. Be sure that gcc/native is loaded; if it's
not then type module load gcc/native; module
save.
Type module unload openmpi mpich; module load hdf5 to make
sure that the HDF5 module is loaded but no MPI module is
loaded. Confirm that this is the case by typing module
list. The HDF5 module currently loaded is designed for
use with single-threaded or multithreaded programs, but not MPI
programs.
To confirm this, type module avail and notice that in
the section headed
by /shared/modulefiles/Compilers/gcc/native you will
see (L,D) next to the HDF5 module. The L
means the module is currently loaded and the D means
that it is the default HDF5 module to load if no module version is
specified.
Now type module load openmpi and notice that you get a
message saying that the HDF5 module has been reloaded.
Type module avail and notice that now there is a new
section headed by something
like /shared/modulefiles/MPI/gcc/native/openmpi/4.0.2
and that it now contains the loaded, default HDF5 module.
You may want to once again save your modules so the OpenMPI module will be loaded by default, but be aware that you must unload this module before compiling a non-MPI program that uses HDF5.
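For example, one possible sequence is
module load openmpi
module save
and then, before building a non-MPI program that uses HDF5,
module unload openmpi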
Our MPI implementations provide front-ends for the native compilers
that make it easy to build MPI programs. Rather than
using gcc or g++ to compile C or C++
programs, you should use mpicc or mpic++
to compile and link your programs.
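For example, a C++ MPI program in a single source file (the name my_prog.cc is just a placeholder here) can be compiled and linked with
mpic++ -o my_prog my_prog.cc
and a C source file with
mpicc -o my_prog my_prog.c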
When developing your program you will probably just want to run it
on your workstation. Do this with the mpiexec command.
Typical usage to run the program my_prog would be
something like
mpiexec -n 4 ./my_prog
This runs the program in parallel using the available cores on the workstation.
There are a couple of ways you can run your program so that it uses more than one workstation. One is to specify which machines to run the application on. This gets cumbersome, however, when a cluster has a large number of machines. Cluster resource managers are usually used to do this. In addition, they have an overall view of the resources available in a cluster and control how parallel jobs are run so that programs that need it will have exclusive use of the necessary resources.
The Minor Prophets cluster uses SLURM (Simple Linux Utility
for Resource Management) to manage the cluster's resources. To
run my_prog on 16 cores one would type
salloc -Q -n 16 mpiexec ./my_prog
If you're using MPICH rather than OpenMPI you can use the
salloc command as we did here, but you can also use the
slightly simpler command
srun -n 16 ./my_prog
Both of these work when using MPICH on the Minor Prophets workstations, but you should use only the
salloc command
if using OpenMPI. Regardless of which MPI module is loaded, however,
you can use srun to run non-MPI commands on cluster hosts.
Change to the cps343-hoe/05-intro-to-mpi
directory and use a directory listing to see the files there.
Examine the source code in hello.cc. Notice it
does three things common to nearly all MPI programs:
it calls MPI_Init(), it calls two other MPI functions, and it calls MPI_Finalize() (a generic sketch of this overall structure appears below). Compile hello.cc and run it:
smake hello.cc
salloc -Q -n 4 mpiexec ./hello
salloc -Q -n 16 mpiexec ./hello
salloc -Q -n 32 mpiexec ./hello
You should see output that indicates 3, 15, and 31 helper processes are running, in addition to the original master (rank 0) process. Compare the output you see with the source code until you understand how the program works.
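For reference, a minimal MPI program generally has the shape sketched below. This is not the actual hello.cc, and the assumption that the two other functions it calls are MPI_Comm_size() and MPI_Comm_rank() is based only on what nearly all MPI programs do:

#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);                  // set up the MPI environment

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's rank: 0, 1, ..., size-1

    std::printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                          // shut down MPI
    return 0;
}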
Compile the pi_mpi.cc program and run it:
smake pi_mpi.cc
salloc -Q -n 4 mpiexec ./pi_mpi
salloc -Q -n 16 mpiexec ./pi_mpi
salloc -Q -n 32 mpiexec ./pi_mpi
Run each of the commands several times and notice the variability in performance. In general, however, you should notice pretty close to linear speedup as the number of processors increases. How many processes can you start?
Compile the pass-msg.cc program and run it:
smake pass-msg.cc
salloc -Q -n 4 mpiexec ./pass-msg
salloc -Q -n 16 mpiexec ./pass-msg
salloc -Q -n 32 mpiexec ./pass-msg
You should see messages that indicate the message starts at the rank 0 process (the root or master process) and is passed from process to process until it is received by the highest rank process.
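The core of such a message-passing chain is a matched MPI_Send()/MPI_Recv() pair. The fragment below is only a sketch of the typical pattern, not the actual pass-msg.cc source; it assumes rank and size were obtained with MPI_Comm_rank() and MPI_Comm_size() and that at least two processes are running:

int msg = 0;
if (rank == 0) {
    msg = 42;                                        // arbitrary example value
    MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else {
    MPI_Recv(&msg, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < size - 1)                             // forward unless this is the last process
        MPI_Send(&msg, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
}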
Finally, you might be interested to note that you can use the salloc/mpiexec combination or srun to run non-MPI programs as well. Try
salloc -Q -n 16 mpiexec hostname
and
srun -n 16 hostname
Both of these commands run 16 instances of the
hostname program on multiple machines in the
cluster. This example is actually useful if you want to find
out what nodes (computers in a cluster) SLURM is assigning
jobs to.
The commands srun and salloc are both
part of SLURM (Simple Linux Utility for Resource Management). Jobs
are usually run on clusters through a workload manager
or resource manager, which is responsible for allocating
the cluster's resources to jobs. Submitted jobs are placed on a
queue and the workload manager (also called a job
scheduler) assigns them to a processor or processors as they
become available. At Gordon we use SLURM as the workload
manager.
In addition to the commands used to submit jobs, it is also easy to check on the status of jobs and nodes. The sinfo command shows the state of each node in the cluster; a node's state will normally be one of idle, idle~, mix, or alloc. The state idle means the node is
powered on and available and all of its CPU cores are currently free
and can be allocated to a job. The state idle~ indicates
the node is currently powered off but will be automatically started
when it is needed. The states mix and alloc
indicate that some or all of the node's cores are allocated to jobs.
The squeue command lists the jobs that are currently queued or running. There is also a command that presents the same information as squeue but with a graphical user interface.
Write ring-pass1.cc, which passes a message around a ring of processes.
Using pass-msg.cc as starter code, write
an MPI program called ring-pass1.cc that
uses MPI_Send() and MPI_Recv() to pass a
value around a ring of processes using the algorithm below. The
three main variables are
message - an integer, initialized to 1000
prev - an integer, the rank of the preceding process in the ring
next - an integer, the rank of the succeeding process in the ring
initialize MPI
initialize message to 1000
compute prev and next, the ranks of the two neighbors.
if this is the server process:
send message to next
receive message from prev
print message
else:
receive message from prev
print process rank and received message
increment message
send message to next
end if
finalize MPI
How does one determine prev and next? Suppose we're thinking about how process rank = i in a collection of N processes will behave. Let's define the ring so that prev = i − 1 and next = i + 1. This is fine for most processes, but will fail if the rank is 0 or N − 1. We can easily handle these special cases using modular arithmetic.
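In code, once rank and size have been obtained from MPI_Comm_rank() and MPI_Comm_size(), the wrap-around can be handled with something like the following sketch (variable names taken from the outline above):

int next = (rank + 1) % size;          // rank N-1 wraps around to 0
int prev = (rank + size - 1) % size;   // rank 0 wraps around to N-1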
Test your program with different numbers of processes.
Write ring-pass2.cc, which introduces timing.
Make a copy of your program, calling it ring-pass2.cc,
and modify it so that the master (server) task determines the
elapsed time it takes to pass the message around the ring. The
elapsed time should be displayed by the master process before it
terminates. Use the MPI function MPI_Wtime() to get the
timing information. Read the manual page (type man
mpi_wtime) for more information.
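A typical pattern, sketched here as a fragment with placeholder names, is for the master process to sample the clock just before starting the ring pass and again after the message comes back:

double t_start = MPI_Wtime();          // wall-clock time in seconds since an arbitrary past point
/* ... send the message to next and receive it back from prev ... */
double elapsed = MPI_Wtime() - t_start;
std::printf("Elapsed time: %g seconds\n", elapsed);   // requires <cstdio>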
Write ring-pass3.cc, which passes the message around the ring a specified number of times.
Copy ring-pass2.cc to ring-pass3.cc and
modify it so that it optionally reads an integer from the command
line that specifies the number of times the message should be passed
(and incremented) around the ring. The default value for this
optional parameter should be 1. You will run your program with a
command like
$ mpiexec -n 8 ./ring-pass3 5
This should pass the message around the ring until the master process receives it for the fifth time. Note that all processes will have access to the command line parameter and should make use of this to know when to stop. Try various ring sizes as you test your program.
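One way to handle the optional argument is sketched below; the variable name trips is just a placeholder. Since every process sees the same command line, each process can decide on its own when to stop:

int trips = 1;                         // default: pass the message around the ring once
if (argc > 1)
    trips = std::atoi(argv[1]);        // requires <cstdlib>; e.g. "./ring-pass3 5" gives trips == 5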
Complete all three versions of the ring pass program and write a short report that presents your results. Submit your report along with your well commented source code for the third ring-pass program.