Introduction to Message Passing Interface (MPI)

Linh B. Ngo

Message Passing

  • Processes communicate via messages
  • Messages can be
    • Raw data to be used in actual calculations
    • Signals and acknowledgements for the receiving processes regarding the workflow

History of MPI

Early 80s:

  • Various message passing environments were developed
  • Many similar fundamental concepts
  • N-cube (Caltech), P4 (Argonne), PICL and PVM (Oak Ridge), LAM (Ohio SC)

1992:

  • More than 80 researchers from different institutions in the US and Europe agreed to develop and implement a common standard for message passing
  • First meeting colocated with Supercomputing 1992

After finalization:

  • MPI became the de facto standard for distributed memory parallel programming
  • Available on every popular operating system and architecture
  • Interconnect manufacturers commonly provide MPI implementations optimized for their hardware
  • MPI standard defines interfaces for C, C++, and Fortran
    • Language bindings available for many popular languages (quality varies)

1994: MPI-1

  • Communicators
    • Information about the runtime environments
    • Creation of customized topologies
  • Point-to-point communication
    • Send and receive messages
    • Blocking and non-blocking variations
  • Collectives
    • Broadcast and reduce
    • Gather and scatter

1998: MPI-2

  • One-sided communication (non-blocking)
    • Get & Put (remote memory access)
  • Dynamic process management
    • Spawn
  • Parallel I/O
    • Multiple readers and writers for a single file
    • Requires file-system level support (LustreFS, PVFS)

2012: MPI-3

  • Revised remote-memory access semantics
  • Fault tolerance model
  • Non-blocking collective communication
  • Access to internal variables, states, and counters for performance evaluation purposes

Set up MPI on Palmetto for C/C++

Open a terminal and run the following commands. For ssh-keygen, hit Enter at every prompt and leave the passphrase empty.

$ cd ~
$ sudo yum -y group install "Development Tools"
$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.2.tar.gz
$ tar xzf openmpi-3.1.2.tar.gz
$ cd openmpi-3.1.2
$ ./configure --prefix=/opt/openmpi/3.1.2
$ sudo make
$ sudo make all install
$ echo "export PATH='$PATH:/opt/openmpi/3.1.2/bin'" >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH='$LD_LIBRARY_PATH:/opt/openmpi/3.1.2/lib/'" >> ~/.bashrc
$ source ~/.bashrc
$ cd ~
$ ssh-keygen
$ ssh-copy-id localhost

After the above steps are completed successfully, you will need to return to the VM, run the command source ~/.bashrc and restart the Jupyter notebook server.

The next cells can be run from inside the Jupyter notebook.


In [1]:
%%writefile codes/openmpi/first.c
#include <stdio.h>
#include <sys/utsname.h>
#include <mpi.h>
int main(int argc, char *argv[]){
  MPI_Init(&argc, &argv);      /* start the MPI runtime */
  struct utsname uts;
  uname(&uts);                 /* look up this node's hostname */
  printf("My process is on node %s.\n", uts.nodename);
  MPI_Finalize();              /* shut down the MPI runtime */
  return 0;
}


Overwriting codes/openmpi/first.c

In [1]:
!mpicc codes/openmpi/first.c -o ~/first
!mpirun -np 2 ~/first


My process is on node localhost.localdomain.
My process is on node localhost.localdomain.

How MPI works in a nutshell

  • All processes are launched at the beginning of the program execution
    • The number of processes is user-specified
    • Typically, this number is matched to the total number of cores available across the entire cluster
  • Each process has its own memory space and has access to the same source code

Basic parameters available to individual processes:

MPI_COMM_WORLD
MPI_Comm_rank()
MPI_Comm_size()
MPI_Get_processor_name()
  • MPI defines communicator groups for point-to-point and collective communications
    • Unique IDs (ranks) are assigned to the individual processes within a communicator group
    • Communications are performed based on these ranks
    • The default global communicator (MPI_COMM_WORLD) contains all processes
    • For $N$ processes, ranks go from $0$ to $N-1$

In [11]:
%%writefile codes/openmpi/hello.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  char proc_name[MPI_MAX_PROCESSOR_NAME];
  int proc_name_len;
    
  /* Initialize the MPI environment */
  MPI_Init(&argc, &argv);
  
  /* Get the number of processes */
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Get the rank of the process */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Get the name of the processor */
  MPI_Get_processor_name(proc_name, &proc_name_len);

  /* Print off a hello world message */
  printf("Hello world from processor %s, rank %d out of %d processes\n",
           proc_name, my_rank, size);

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/hello.c

In [14]:
!mpicc codes/openmpi/hello.c -o ~/hello
!mpirun -np 2 ~/hello


Hello world from processor localhost.localdomain, rank 0 out of 2 processes
Hello world from processor localhost.localdomain, rank 1 out of 2 processes

Important

On many VMs, you might not have enough physical cores allocated to the VM in VirtualBox. To run more MPI processes than there are cores (oversubscription), add --map-by core:OVERSUBSCRIBE to your mpirun commands


In [68]:
!mpirun -np 8 ~/hello
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/hello


--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8 slots
that were requested by the application:
  /home/lngo/hello

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Hello world from processor localhost.localdomain, rank 3 out of 8 processes
Hello world from processor localhost.localdomain, rank 6 out of 8 processes
Hello world from processor localhost.localdomain, rank 7 out of 8 processes
Hello world from processor localhost.localdomain, rank 0 out of 8 processes
Hello world from processor localhost.localdomain, rank 1 out of 8 processes
Hello world from processor localhost.localdomain, rank 4 out of 8 processes
Hello world from processor localhost.localdomain, rank 5 out of 8 processes
Hello world from processor localhost.localdomain, rank 2 out of 8 processes
  • Ranks are used to control which processes execute (or skip) particular segments of the shared source code

In [17]:
%%writefile codes/openmpi/evenodd.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int my_rank; 
    
  /* Initialize the MPI environment */
  MPI_Init(&argc, &argv);

  /* Get the rank of the process */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  if (my_rank % 2 == 0) {
    printf ("Process %d is even \n", my_rank);
  } else {
    printf ("Process %d is odd \n", my_rank);
  }
  MPI_Finalize();
}


Overwriting codes/openmpi/evenodd.c

In [18]:
!mpicc codes/openmpi/evenodd.c -o ~/evenodd
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/evenodd


Process 2 is even 
Process 3 is odd 
Process 0 is even 
Process 1 is odd 
  • Ranks and size are used to calculate and distribute the workload (data) among the processes

In [24]:
%%writefile codes/openmpi/rank_size.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int A[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  for (i = 0; i < 16; i++){
    if (i % size == my_rank){
      printf ("Process %d has elements %d at index %d \n",
               my_rank, A[i], i);
    }
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/rank_size.c

In [28]:
!mpicc codes/openmpi/rank_size.c -o ~/rank_size
!mpirun -np 4 ~/rank_size


Process 2 has elements 4 at index 2 
Process 2 has elements 0 at index 6 
Process 2 has elements 7 at index 10 
Process 2 has elements 15 at index 14 
Process 3 has elements 3 at index 3 
Process 3 has elements 12 at index 7 
Process 3 has elements 9 at index 11 
Process 3 has elements 14 at index 15 
Process 0 has elements 2 at index 0 
Process 0 has elements 5 at index 4 
Process 0 has elements 10 at index 8 
Process 0 has elements 11 at index 12 
Process 1 has elements 13 at index 1 
Process 1 has elements 1 at index 5 
Process 1 has elements 8 at index 9 
Process 1 has elements 6 at index 13 
  • Individual processes rely on communication (message passing) to enforce workflow
    • Point-to-point Communication
    • Collective Communication

Point-to-Point: Send and Receive

Original MPI C Syntax: MPI_Send

int MPI_Send(void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int dest, 
    int tag, 
    MPI_Comm comm)
  • MPI_Datatype may be MPI_BYTE, MPI_PACKED, MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR
  • dest is the rank of the process the message is sent to
  • tag is an integer identifying the message. The programmer is responsible for managing tags

Original MPI C Syntax: MPI_Recv

int MPI_Recv(
    void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int source, 
    int tag, 
    MPI_Comm comm,
    MPI_Status *status)
  • MPI_Datatype may be MPI_BYTE, MPI_PACKED, MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR
  • source is the rank of the process from which the message was sent.
  • tag is an integer identifying the message. MPI_Recv will only place data in the buffer if the tag from MPI_Send matches. The constant MPI_ANY_TAG may be used when the source tag is unknown or not important.

We want to write an MPI program that exchanges the values (ranks) of two processes, 0 and 1.

In [29]:
%%writefile codes/openmpi/send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  printf("Process %2d has original value %2d \n",my_rank,buf);
    
  if (my_rank == 0){
    MPI_Send(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
  }
  
  if (my_rank == 1){
    MPI_Recv(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
    MPI_Send(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD);  
  }    
  printf("Process %2d now has value %2d\n",my_rank,buf);

  MPI_Finalize();
} /* end main */


Writing codes/openmpi/send_recv.c

In [30]:
!mpicc codes/openmpi/send_recv.c -o ~/send_recv
!mpirun -np 2 ~/send_recv


Process  0 has original value  0 
Process  0 now has value  0
Process  1 has original value  1 
Process  1 now has value  0
  • What went wrong?

In [31]:
%%writefile ~/send_recv_fixed.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  printf("Process %2d has original value %2d \n",my_rank,buf);
    
  if (my_rank == 0){
    MPI_Send(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
  }
  
  if (my_rank == 1){
    /* Receive into a temporary so that process 1's own value
       is not overwritten before it has been sent back        */
    int recv_buf;
    MPI_Recv(&recv_buf,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
    MPI_Send(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD);
    buf = recv_buf;
  }    
  printf("Process %2d now has value %2d\n",my_rank,buf);

  MPI_Finalize();
} /* end main */


Writing /home/lngo/send_recv_fixed.c

In [32]:
!mpicc ~/send_recv_fixed.c -o ~/send_recv_fixed
!mpirun -np 2 ~/send_recv_fixed


Process  0 has original value  0 
Process  0 now has value  1
Process  1 has original value  1 
Process  1 now has value  0

How do we do point-to-point communication at scale?

  • Rely on rank and size

In [32]:
%%writefile codes/openmpi/multi_send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  //printf("Process %2d has original value %2d \n",my_rank,buf);
    
  /* set up source and destination */
  des1 = (my_rank + 1) % size;
  des2 = (my_rank + size - 1) % size;
  //printf("Process %2d has des1 %2d and des2 %2d\n",my_rank,des1,des2);
    
  /* shift the data n/2 steps */
  for (i = 0; i < size/2; i++){
    MPI_Send(&buf,1,MPI_INT,des1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&status);
    MPI_Barrier(MPI_COMM_WORLD);
  }

  MPI_Send(&buf,1,MPI_INT,des2,tag,MPI_COMM_WORLD);
  MPI_Recv(&buf,1,MPI_INT,MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&status);
  
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Process %2d now has value %2d\n",my_rank,buf);

  /* Shut down MPI */
  MPI_Finalize();

} /* end main */


Overwriting codes/openmpi/multi_send_recv.c

In [39]:
!mpicc codes/openmpi/multi_send_recv.c -o ~/multi_send_recv
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/multi_send_recv


Process  1 now has value  0
Process  2 now has value  1
Process  3 now has value  2
Process  0 now has value  3

Blocking risks

  • A blocking send can hang when the message is larger than the available system buffer and no matching receive has been posted
  • A blocking receive hangs indefinitely when the expected message is lost or the sender never sends

In [42]:
%%writefile codes/openmpi/deadlock_send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) 
{
  int my_rank;       /* rank of process     */
  int size;             /* number of processes */
  int source;        /* rank of sender      */
  int dest;          /* rank of receiver    */

  int tag=0;         /* tag for messages    */
  char message[100]; /* storage for message */
  MPI_Status status; /* return status for receive */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  fprintf(stderr,"I am here!  ID = %d\n", my_rank);
  sprintf(message, "Greetings from process %d!", my_rank);

  if (my_rank == 0) {
    dest = 1;
    MPI_Recv(message, 100, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    printf("Process 0 printing:  %s\n", message);
  }
  else { 
    /* my rank == 1 */
    dest = 0;
    MPI_Recv(message, 100, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    printf("Process 1 printing:  %s\n", message);
  }
  
  MPI_Finalize();
} /* end main */


Overwriting codes/openmpi/deadlock_send_recv.c

In [44]:
!mpicc codes/openmpi/deadlock_send_recv.c -o ~/deadlock_send_recv
!mpirun -np 2 ~/deadlock_send_recv

# The [*] next to a cell indicates that it is still running; if it does not turn into a number,
# the cell is hung (the MPI processes are deadlocked).
# To escape a hung cell, click the Square (Stop) button in the toolbar.


I am here!  ID = 0
I am here!  ID = 1
^C

To correct the above error, we need to change the order of the MPI_Recv and MPI_Send calls in one of the two communication code blocks, as sketched below.
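A minimal sketch of the corrected program, assuming two processes as before (this file is not among the course codes, and the separate incoming buffer is an addition so that each process still prints the other's greeting):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
  int my_rank;
  int tag = 0;
  char message[100];    /* this process's greeting         */
  char incoming[100];   /* greeting from the other process */
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  sprintf(message, "Greetings from process %d!", my_rank);

  if (my_rank == 0) {
    /* Reordered: send first so that process 1's receive can complete */
    MPI_Send(message, strlen(message)+1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
    MPI_Recv(incoming, 100, MPI_CHAR, 1, tag, MPI_COMM_WORLD, &status);
    printf("Process 0 printing:  %s\n", incoming);
  }
  else {
    /* my_rank == 1: receive first, then send its own greeting */
    MPI_Recv(incoming, 100, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
    printf("Process 1 printing:  %s\n", incoming);
  }

  MPI_Finalize();
} /* end main */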

Collective Communication

  • Must involve ALL processes within the scope of a communicator
  • Unexpected behavior, including program failure, can occur if even one process does not participate
  • Types of collective communications:
    • Synchronization: barrier (see the sketch below)
    • Data movement: broadcast, scatter/gather
    • Collective computation (aggregate data to perform computation): reduce
*https://computing.llnl.gov/tutorials/mpi/*
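The barrier listed above is the simplest collective: no data moves, and every process simply waits until all members of the communicator have reached the call. A minimal sketch (this file is not among the course codes; the staggered sleep is only there to make the synchronization visible):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
  int my_rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Stagger the ranks artificially: rank r sleeps r seconds */
  sleep(my_rank);
  printf("Process %d reached the barrier\n", my_rank);

  /* No rank can pass this point until every rank has arrived */
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Process %d passed the barrier\n", my_rank);

  MPI_Finalize();
  return 0;
}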
Original MPI C Syntax: MPI_Bcast

int MPI_Bcast(
    void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int root, 
    MPI_Comm comm);
  • Don’t need to specify a TAG or DESTINATION
  • Must specify the SENDER (root)
  • Blocking call for all processes

In [19]:
%%writefile codes/openmpi/bcast.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) 
{
  int my_rank;       
  int size;
  int value;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); 
  
  value = my_rank;
  printf("process %d: Before MPI_Bcast, value is %d\n", my_rank, value); 

  MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("process %d: After MPI_Bcast, value is %d\n", my_rank, value);

  MPI_Finalize();
  return 0;
}


Overwriting codes/openmpi/bcast.c

In [20]:
!mpicc codes/openmpi/bcast.c -o ~/bcast
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/bcast


process 0: Before MPI_Bcast, value is 0
process 0: After MPI_Bcast, value is 0
process 3: Before MPI_Bcast, value is 3
process 2: Before MPI_Bcast, value is 2
process 2: After MPI_Bcast, value is 0
process 1: Before MPI_Bcast, value is 1
process 1: After MPI_Bcast, value is 0
process 3: After MPI_Bcast, value is 0

Original MPI C Syntax: MPI_Scatter

int MPI_Scatter(
    void *sendbuf, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuf,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);

In [25]:
%%writefile codes/openmpi/scatter.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int sendbuf[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int recvbuf[5];
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  MPI_Scatter(&sendbuf, 5, MPI_INT, &recvbuf, 5, MPI_INT, 0, MPI_COMM_WORLD); 
  for (i = 0; i < 5; i++){
    printf ("Process %d has element %d at index %d in its recvbuf \n",
               my_rank, recvbuf[i], i);
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/scatter.c

In [26]:
!mpicc codes/openmpi/scatter.c -o ~/scatter
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/scatter


Process 0 has element 2 at index 0 in its recvbuf 
Process 0 has element 13 at index 1 in its recvbuf 
Process 0 has element 4 at index 2 in its recvbuf 
Process 0 has element 3 at index 3 in its recvbuf 
Process 0 has element 5 at index 4 in its recvbuf 
Process 1 has element 1 at index 0 in its recvbuf 
Process 1 has element 0 at index 1 in its recvbuf 
Process 1 has element 12 at index 2 in its recvbuf 
Process 1 has element 10 at index 3 in its recvbuf 
Process 1 has element 8 at index 4 in its recvbuf 
Process 2 has element 7 at index 0 in its recvbuf 
Process 2 has element 9 at index 1 in its recvbuf 
Process 2 has element 11 at index 2 in its recvbuf 
Process 2 has element 6 at index 3 in its recvbuf 
Process 2 has element 15 at index 4 in its recvbuf 
Process 3 has element 14 at index 0 in its recvbuf 
Process 3 has element 916566832 at index 1 in its recvbuf 
Process 3 has element 0 at index 2 in its recvbuf 
Process 3 has element 4 at index 3 in its recvbuf 
Process 3 has element 0 at index 4 in its recvbuf 
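Note that most of process 3's recvbuf above is garbage: with 4 processes and sendcount = 5, the root reads 20 elements from a 16-element sendbuf, so the last rank receives one valid element followed by whatever happens to lie past the end of the array. The sendcount and recvcnt arguments are per-process counts, not totals. A minimal sketch with matched counts, assuming exactly 4 processes and 4 elements per rank (not one of the course files):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int my_rank;
  int i;
  int sendbuf[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int recvbuf[4];   /* 16 elements / 4 processes = 4 elements per rank */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* sendcount (4) times the number of processes now matches the sendbuf size */
  MPI_Scatter(sendbuf, 4, MPI_INT, recvbuf, 4, MPI_INT, 0, MPI_COMM_WORLD);

  for (i = 0; i < 4; i++){
    printf("Process %d has element %d at index %d in its recvbuf \n",
            my_rank, recvbuf[i], i);
  }

  MPI_Finalize();
}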

Original MPI C Syntax: MPI_Gather

int MPI_Gather(
    void *sendbuf, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuf,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);

In [27]:
%%writefile codes/openmpi/gather.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int sendbuf[2];
  int recvbuf[8] = {-1,-1,-1,-1,-1,-1,-1,-1};
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  for (i = 0; i < 2; i++){
    sendbuf[i] = my_rank;
  }
  /* Only the root (process 0) receives the gathered data; the
     other ranks' recvbuf keeps its initial -1 values           */
  MPI_Gather(&sendbuf, 2, MPI_INT, &recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD); 
  for (i = 0; i < 8; i++){
    printf ("Process %d has element %d at index %d in its recvbuf \n",
               my_rank, recvbuf[i], i);
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/gather.c

In [28]:
!mpicc codes/openmpi/gather.c -o ~/gather
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/gather


Process 1 has element -1 at index 0 in its recvbuf 
Process 1 has element -1 at index 1 in its recvbuf 
Process 1 has element -1 at index 2 in its recvbuf 
Process 1 has element -1 at index 3 in its recvbuf 
Process 1 has element -1 at index 4 in its recvbuf 
Process 1 has element -1 at index 5 in its recvbuf 
Process 1 has element -1 at index 6 in its recvbuf 
Process 1 has element -1 at index 7 in its recvbuf 
Process 2 has element -1 at index 0 in its recvbuf 
Process 2 has element -1 at index 1 in its recvbuf 
Process 2 has element -1 at index 2 in its recvbuf 
Process 2 has element -1 at index 3 in its recvbuf 
Process 2 has element -1 at index 4 in its recvbuf 
Process 2 has element -1 at index 5 in its recvbuf 
Process 2 has element -1 at index 6 in its recvbuf 
Process 2 has element -1 at index 7 in its recvbuf 
Process 3 has element -1 at index 0 in its recvbuf 
Process 3 has element -1 at index 1 in its recvbuf 
Process 3 has element -1 at index 2 in its recvbuf 
Process 3 has element -1 at index 3 in its recvbuf 
Process 3 has element -1 at index 4 in its recvbuf 
Process 3 has element -1 at index 5 in its recvbuf 
Process 3 has element -1 at index 6 in its recvbuf 
Process 3 has element -1 at index 7 in its recvbuf 
Process 0 has element 0 at index 0 in its recvbuf 
Process 0 has element 0 at index 1 in its recvbuf 
Process 0 has element 1 at index 2 in its recvbuf 
Process 0 has element 1 at index 3 in its recvbuf 
Process 0 has element 2 at index 4 in its recvbuf 
Process 0 has element 2 at index 5 in its recvbuf 
Process 0 has element 3 at index 6 in its recvbuf 
Process 0 has element 3 at index 7 in its recvbuf 

Original MPI C Syntax: MPI_Reduce

int MPI_Reduce(
    void *sendbuf, 
    void *recvbuf,
    int count, 
    MPI_Datatype datatype, 
    MPI_Op op,
    int root, 
    MPI_Comm comm);
  • MPI_Op may be MPI_MIN, MPI_MAX, MPI_SUM, MPI_PROD (twelve total)
  • Programmers may also define their own operations with MPI_Op_create; these must be associative, and commutativity is declared when the operation is created (see the sketch after this list)
  • If count > 1, then operation is performed element-wise
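The second bullet above can be illustrated with MPI_Op_create. The sketch below is not part of the course codes; the abs_max function and the choice of operation (element-wise maximum of absolute values) are only an example of how a user-defined, associative operation is registered and then passed to MPI_Reduce like any built-in MPI_Op:

#include <mpi.h>
#include <stdio.h>

/* User-defined reduction function with the signature required by MPI_Op_create:
   combine invec into inoutvec, element by element                              */
void abs_max(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype) {
  int *in = (int *) invec;
  int *inout = (int *) inoutvec;
  int i, a, b;
  for (i = 0; i < *len; i++) {
    a = in[i] < 0 ? -in[i] : in[i];
    b = inout[i] < 0 ? -inout[i] : inout[i];
    inout[i] = (a > b) ? a : b;
  }
}

int main(int argc, char** argv) {
  int my_rank, value, result;
  MPI_Op my_op;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Give the ranks a mix of signs so the operation has something to do */
  value = (my_rank % 2 == 0) ? my_rank : -my_rank;

  /* Second argument 1 declares the operation as commutative */
  MPI_Op_create(abs_max, 1, &my_op);
  MPI_Reduce(&value, &result, 1, MPI_INT, my_op, 0, MPI_COMM_WORLD);

  if (my_rank == 0)
    printf("Largest absolute value across all ranks is %d \n", result);

  MPI_Op_free(&my_op);
  MPI_Finalize();
}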

In [30]:
%%writefile codes/openmpi/reduce.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int rank_sum;
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  
  /* Initialize rank_sum; only the root (process 0) will have it
     overwritten with the reduced result                          */
  rank_sum = my_rank;

  MPI_Reduce(&my_rank, &rank_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); 
  printf ("The total sum of all ranks at process %d is %d \n", my_rank, rank_sum);

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/reduce.c

In [31]:
!mpicc codes/openmpi/reduce.c -o ~/reduce
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/reduce


The total sum of all ranks at process 2 is 2 
The total sum of all ranks at process 3 is 3 
The total sum of all ranks at process 1 is 1 
The total sum of all ranks at process 0 is 6 

In [67]:
!mpicc codes/openmpi/reduce.c -o ~/reduce
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/reduce


The total sum of all ranks at process 3 is 3 
The total sum of all ranks at process 5 is 5 
The total sum of all ranks at process 1 is 1 
The total sum of all ranks at process 2 is 2 
The total sum of all ranks at process 6 is 6 
The total sum of all ranks at process 7 is 7 
The total sum of all ranks at process 0 is 28 
The total sum of all ranks at process 4 is 4