Python & MPI
Launching Programs on the HPC using MPI
Here at William & Mary, we use the Message Passing Interface (MPI), which allows processes running on different nodes to communicate with one another. It is a relatively simple interface.
Packages
To use MPI from a Python environment, we need a package called mpi4py. You can install it from conda-forge by running: conda install -c conda-forge mpi4py
Please note that the example below was tested with mpi4py version 2.0.0.
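To confirm which mpi4py version the active environment provides, a quick check along these lines can help. This is a minimal sketch: the helper name installed_version is our own, and it relies on importlib.metadata from the standard library, which assumes Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mpi4py"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mpi4py is not installed in this environment")
    else:
        print(f"mpi4py version: {found}")
```

Run it with the same Python interpreter your job script will use, so that the check reflects the environment the job actually sees.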
Job Script
Our initial job script is very similar to other job scripts you will have seen in this tutorial. A few things to note include:
We are asking for 2 nodes and 12 processors per node (ppn). That means we should have 24 cores allocated.
We are loading the mvapich2-ib library, which allows us to use mvp2run. mvp2run is a wrapper around MPI, which handles the inter-node communications.
Note that you will probably need to change the two "cd" lines to match the directory where you placed your scripts.
Python Script
Here we have an example Python script that shows some of the basics of how MPI works. A few things to note:
Anything inside the "if" statements runs only on the processor whose rank matches the condition. For example, "rank 0" is the first processor.
In this example, we're only tasking two of our total 24 processors. Every processor runs the "print" statement at the top; then processor 0 declares that it is rank 0, creates a parameter list, and sends it to rank 1 (dest=1) with a tag of 11 (the tag can be any non-negative integer you like).
Then the rank 1 processor loads a pandas dataset and receives the list of parameters from rank 0 (specifically asking for tag 11).
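The exchange described in the notes above can be sketched roughly as follows. This is an illustrative reconstruction rather than the tutorial's exact script: the function names build_parameters and exchange and the parameter values are hypothetical, and the pandas loading step on rank 1 is left out so the sketch stays self-contained. The import is wrapped in a try/except only so the file still loads on a machine without MPI.

```python
try:
    from mpi4py import MPI
except ImportError:  # lets the helper functions load on machines without MPI
    MPI = None


def build_parameters():
    """Hypothetical parameter list; replace with whatever rank 1 needs."""
    return {"learning_rate": 0.01, "max_depth": 4}


def exchange(comm):
    """Send parameters from rank 0 to rank 1; return them on those two ranks."""
    rank = comm.Get_rank()
    size = comm.Get_size()
    # Every rank runs this line, so it is printed once per process.
    print(f"Hello from rank {rank} of {size}")

    if size < 2:
        # With a single process there is no rank 1 to talk to.
        return None

    if rank == 0:
        params = build_parameters()
        # Lowercase send() pickles an arbitrary Python object.
        comm.send(params, dest=1, tag=11)
        return params
    if rank == 1:
        # Blocks until a message with tag 11 arrives from rank 0.
        params = comm.recv(source=0, tag=11)
        print(f"Rank 1 received {params}")
        return params
    return None  # all other ranks stay idle in this example


if __name__ == "__main__":
    if MPI is None:
        print("mpi4py is not installed; nothing to run")
    else:
        exchange(MPI.COMM_WORLD)
```

When launched on 24 processes, ranks 2 through 23 would only print the greeting and then exit, which matches the point that just two processors are given real work here.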
You can extend this communication logic to implement a wide range of distribution strategies, for example a random search.
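As one possible illustration of such a strategy, the sketch below has every rank try a few random parameter sets on its own and then collects each rank's best result on rank 0 with comm.gather. Everything specific here is an assumption made for the example: the parameter names, the toy score function (a stand-in for real model training), and the helper names sample_parameters, local_search, and distributed_search.

```python
import random

try:
    from mpi4py import MPI
except ImportError:  # lets the helper functions load on machines without MPI
    MPI = None


def sample_parameters(rng):
    """Draw one hypothetical parameter set at random."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "max_depth": rng.randint(2, 10),
    }


def score(params):
    """Toy objective (higher is better); a stand-in for training a model."""
    return -((params["max_depth"] - 6) ** 2) - abs(params["learning_rate"] - 0.01)


def local_search(rank, trials_per_rank=5, base_seed=42):
    """Evaluate a few random candidates on one rank; return (score, params)."""
    # Seeding with the rank gives each process its own reproducible stream.
    rng = random.Random(base_seed + rank)
    best = None
    for _ in range(trials_per_rank):
        params = sample_parameters(rng)
        result = score(params)
        if best is None or result > best[0]:
            best = (result, params)
    return best


def distributed_search(comm, trials_per_rank=5):
    """Run local_search on every rank and pick the overall best on rank 0."""
    best_local = local_search(comm.Get_rank(), trials_per_rank)
    # gather() returns a list of all results on the root and None elsewhere.
    gathered = comm.gather(best_local, root=0)
    if comm.Get_rank() == 0:
        return max(gathered, key=lambda item: item[0])
    return None


if __name__ == "__main__":
    if MPI is None:
        print("mpi4py is not installed; nothing to run")
    else:
        winner = distributed_search(MPI.COMM_WORLD)
        if winner is not None:
            print(f"Best score {winner[0]:.4f} with parameters {winner[1]}")
```

Unlike the two-rank example, this pattern keeps all allocated processors busy, and no point-to-point tags are needed because gather is a collective call that every rank participates in.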