Introduction to Message Passing Interface (MPI)

Linh B. Ngo

Message Passing

  • Processes communicate via messages
  • Messages can be
    • Raw data to be used in actual calculations
    • Signals and acknowledgements for the receiving processes regarding the workflow

History of MPI

Early 80s:

  • Various message passing environments were developed
  • Many similar fundamental concepts
  • N-cube (Caltech), P4 (Argonne), PICL and PVM (Oak Ridge), LAM (Ohio SC)

1992:

  • More than 80 researchers from different institutions in the US and Europe agreed to develop and implement a common standard for message passing
  • First meeting colocated with Supercomputing 1992

After finalization:

  • MPI became the de facto standard for distributed memory parallel programming
  • Available on every popular operating system and architecture
  • Interconnect manufacturers commonly provide MPI implementations optimized for their hardware
  • MPI standard defines interfaces for C, C++, and Fortran
    • Language bindings available for many popular languages (quality varies)

1994: MPI-1

  • Communicators
    • Information about the runtime environments
    • Creation of customized topologies
  • Point-to-point communication
    • Send and receive messages
    • Blocking and non-blocking variations
  • Collectives
    • Broadcast and reduce
    • Gather and scatter

1998: MPI-2

  • One-sided communication (non-blocking)
    • Get & Put (remote memory access)
  • Dynamic process management
    • Spawn
  • Parallel I/O
    • Multiple readers and writers for a single file
    • Requires file-system level support (LustreFS, PVFS)

2012: MPI-3

  • Revised remote-memory access semantics
  • Fault tolerance model
  • Non-blocking collective communication
  • Access to internal variables, states, and counters for performance evaluation purposes

Set up MPI on Palmetto for C/C++

Open a terminal and run the following commands. For ssh-keygen, hit Enter at every prompt and leave the passphrase empty.

$ cd ~
$ sudo yum -y group install "Development Tools"
$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.2.tar.gz
$ tar xzf openmpi-3.1.2.tar.gz
$ cd openmpi-3.1.2
$ ./configure --prefix=/opt/openmpi/3.1.2
$ sudo make
$ sudo make all install
$ echo "export PATH='$PATH:/opt/openmpi/3.1.2/bin'" >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH='$LD_LIBRARY_PATH:/opt/openmpi/3.1.2/lib/'" >> ~/.bashrc
$ source ~/.bashrc
$ cd ~
$ ssh-keygen
$ ssh-copy-id localhost

After the above steps are completed successfully, you will need to return to the VM, run the command source ~/.bashrc and restart the Jupyter notebook server.

The next cells can be run from inside the Jupyter notebook.


In [1]:
%%writefile codes/openmpi/first.c
#include <stdio.h>
#include <sys/utsname.h>
#include <mpi.h>
int main(int argc, char *argv[]){
  MPI_Init(&argc, &argv);      /* start the MPI runtime */
  struct utsname uts;
  uname(&uts);                 /* look up this node's hostname */
  printf("My process is on node %s.\n", uts.nodename);
  MPI_Finalize();              /* shut down the MPI runtime */
  return 0;
}


Overwriting codes/openmpi/first.c

In [1]:
!mpicc codes/openmpi/first.c -o ~/first
!mpirun -np 2 ~/first


My process is on node localhost.localdomain.
My process is on node localhost.localdomain.

How MPI works in a nutshell

  • All processes are launched at the beginning of the program execution
    • The number of processes is user-specified
    • Typically, this number is matched to the total number of cores available across the entire cluster
  • Each process has its own memory space and has access to the same source code

Basic parameters available to individual processes:

MPI_COMM_WORLD
MPI_Comm_rank()
MPI_Comm_size()
MPI_Get_processor_name()
  • MPI defines communicator groups for point-to-point and collective communications
    • Unique IDs (ranks) are assigned to the individual processes within a communicator group
    • Communications are performed based on these ranks
    • The default global communicator (MPI_COMM_WORLD) contains all processes
    • For $N$ processes, ranks go from $0$ to $N-1$

In [11]:
%%writefile codes/openmpi/hello.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  char proc_name[MPI_MAX_PROCESSOR_NAME];
  int proc_name_len;
    
  /* Initialize the MPI environment */
  MPI_Init(&argc, &argv);
  
  /* Get the number of processes */
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Get the rank of the process */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Get the name of the processor */
  MPI_Get_processor_name(proc_name, &proc_name_len);

  /* Print off a hello world message */
  printf("Hello world from processor %s, rank %d out of %d processes\n",
           proc_name, my_rank, size);

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/hello.c

In [14]:
!mpicc codes/openmpi/hello.c -o ~/hello
!mpirun -np 2 ~/hello


Hello world from processor localhost.localdomain, rank 0 out of 2 processes
Hello world from processor localhost.localdomain, rank 1 out of 2 processes

Important

On many VMs, you might not have enough physical cores allocated to the VM in VirtualBox. To run more MPI processes than there are cores (oversubscription), add --map-by core:OVERSUBSCRIBE to your mpirun commands


In [68]:
!mpirun -np 8 ~/hello
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/hello


--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8 slots
that were requested by the application:
  /home/lngo/hello

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Hello world from processor localhost.localdomain, rank 3 out of 8 processes
Hello world from processor localhost.localdomain, rank 6 out of 8 processes
Hello world from processor localhost.localdomain, rank 7 out of 8 processes
Hello world from processor localhost.localdomain, rank 0 out of 8 processes
Hello world from processor localhost.localdomain, rank 1 out of 8 processes
Hello world from processor localhost.localdomain, rank 4 out of 8 processes
Hello world from processor localhost.localdomain, rank 5 out of 8 processes
Hello world from processor localhost.localdomain, rank 2 out of 8 processes
  • Ranks are used to control which processes execute (or skip) particular segments of the shared source code

In [17]:
%%writefile codes/openmpi/evenodd.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int my_rank; 
    
  /* Initialize the MPI environment */
  MPI_Init(&argc, &argv);

  /* Get the rank of the process */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  if (my_rank % 2 == 0) {
    printf ("Process %d is even \n", my_rank);
  } else {
    printf ("Process %d is odd \n", my_rank);
  }
  MPI_Finalize();
}


Overwriting codes/openmpi/evenodd.c

In [18]:
!mpicc codes/openmpi/evenodd.c -o ~/evenodd
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/evenodd


Process 2 is even 
Process 3 is odd 
Process 0 is even 
Process 1 is odd 
  • Ranks and size are used to calculate and distribute the workload (data) among the processes

In [24]:
%%writefile codes/openmpi/rank_size.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int A[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  for (i = 0; i < 16; i++){
    if (i % size == my_rank){
      printf ("Process %d has elements %d at index %d \n",
               my_rank, A[i], i);
    }
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/rank_size.c

In [28]:
!mpicc codes/openmpi/rank_size.c -o ~/rank_size
!mpirun -np 4 ~/rank_size


Process 2 has elements 4 at index 2 
Process 2 has elements 0 at index 6 
Process 2 has elements 7 at index 10 
Process 2 has elements 15 at index 14 
Process 3 has elements 3 at index 3 
Process 3 has elements 12 at index 7 
Process 3 has elements 9 at index 11 
Process 3 has elements 14 at index 15 
Process 0 has elements 2 at index 0 
Process 0 has elements 5 at index 4 
Process 0 has elements 10 at index 8 
Process 0 has elements 11 at index 12 
Process 1 has elements 13 at index 1 
Process 1 has elements 1 at index 5 
Process 1 has elements 8 at index 9 
Process 1 has elements 6 at index 13 
  • Individual processes rely on communication (message passing) to enforce workflow
    • Point-to-point Communication
    • Collective Communication

Point-to-Point: Send and Receive

Original MPI C Syntax: MPI_Send

int MPI_Send(void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int dest, 
    int tag, 
    MPI_Comm comm)
  • MPI_Datatype may be MPI_BYTE, MPI_PACKED, MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR
  • dest is the rank of the process the message is sent to
  • tag is an integer identifying the message. The programmer is responsible for managing tags

Original MPI C Syntax: MPI_Recv

int MPI_Recv(
    void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int source, 
    int tag, 
    MPI_Comm comm,
    MPI_Status *status)
  • MPI_Datatype may be MPI_BYTE, MPI_PACKED, MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR
  • source is the rank of the process from which the message was sent.
  • tag is an integer identifying the message. MPI_Recv will only place data in the buffer if the tag from MPI_Send matches. The constant MPI_ANY_TAG may be used when the source tag is unknown or not important.

We want to write an MPI program that exchanges the values (ranks) of two processes, 0 and 1.

In [29]:
%%writefile codes/openmpi/send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  printf("Process %2d has original value %2d \n",my_rank,buf);
    
  if (my_rank == 0){
    MPI_Send(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
  }
  
  if (my_rank == 1){
    MPI_Recv(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
    MPI_Send(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD);  
  }    
  printf("Process %2d now has value %2d\n",my_rank,buf);

  MPI_Finalize();
} /* end main */


Writing codes/openmpi/send_recv.c

In [30]:
!mpicc codes/openmpi/send_recv.c -o ~/send_recv
!mpirun -np 2 ~/send_recv


Process  0 has original value  0 
Process  0 now has value  0
Process  1 has original value  1 
Process  1 now has value  0
  • What went wrong?

In [31]:
%%writefile ~/send_recv_fixed.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  printf("Process %2d has original value %2d \n",my_rank,buf);
    
  if (my_rank == 0){
    MPI_Send(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
  }
  
  if (my_rank == 1){
    /* Receive into a temporary so that process 1's own value
       is not overwritten before it has been sent back        */
    int recv_buf;
    MPI_Recv(&recv_buf,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
    MPI_Send(&buf,1,MPI_INT,0,tag,MPI_COMM_WORLD);
    buf = recv_buf;
  }    
  printf("Process %2d now has value %2d\n",my_rank,buf);

  MPI_Finalize();
} /* end main */


Writing /home/lngo/send_recv_fixed.c

In [32]:
!mpicc ~/send_recv_fixed.c -o ~/send_recv_fixed
!mpirun -np 2 ~/send_recv_fixed


Process  0 has original value  0 
Process  0 now has value  1
Process  1 has original value  1 
Process  1 now has value  0

How do we do point-to-point communication at scale?

  • Rely on rank and size

In [32]:
%%writefile codes/openmpi/multi_send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) 
{
  int my_rank;       
  int size;             
  int tag=0;
  int buf,i;
  int des1,des2;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /*  set up data */
  buf = my_rank; 

  //printf("Process %2d has original value %2d \n",my_rank,buf);
    
  /* set up source and destination */
  des1 = (my_rank + 1) % size;
  des2 = (my_rank + size - 1) % size;
  //printf("Process %2d has des1 %2d and des2 %2d\n",my_rank,des1,des2);
    
  /* shift the data n/2 steps */
  for (i = 0; i < size/2; i++){
    MPI_Send(&buf,1,MPI_INT,des1,tag,MPI_COMM_WORLD);
    MPI_Recv(&buf,1,MPI_INT,MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&status);
    MPI_Barrier(MPI_COMM_WORLD);
  }

  MPI_Send(&buf,1,MPI_INT,des2,tag,MPI_COMM_WORLD);
  MPI_Recv(&buf,1,MPI_INT,MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&status);
  
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Process %2d now has value %2d\n",my_rank,buf);

  /* Shut down MPI */
  MPI_Finalize();

} /* end main */


Overwriting codes/openmpi/multi_send_recv.c

In [39]:
!mpicc codes/openmpi/multi_send_recv.c -o ~/multi_send_recv
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/multi_send_recv


Process  1 now has value  0
Process  2 now has value  1
Process  3 now has value  2
Process  0 now has value  3

Blocking risks

  • A blocking send can hang when the message is larger than the available system buffer and no matching receive has been posted
  • A blocking receive hangs indefinitely when the expected message is lost or the sender never sends

In [42]:
%%writefile codes/openmpi/deadlock_send_recv.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) 
{
  int my_rank;       /* rank of process     */
  int size;             /* number of processes */
  int source;        /* rank of sender      */
  int dest;          /* rank of receiver    */

  int tag=0;         /* tag for messages    */
  char message[100]; /* storage for message */
  MPI_Status status; /* return status for receive */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  fprintf(stderr,"I am here!  ID = %d\n", my_rank);
  sprintf(message, "Greetings from process %d!", my_rank);

  if (my_rank == 0) {
    dest = 1;
    MPI_Recv(message, 100, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    printf("Process 0 printing:  %s\n", message);
  }
  else { 
    /* my rank == 1 */
    dest = 0;
    MPI_Recv(message, 100, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    printf("Process 1 printing:  %s\n", message);
  }
  
  MPI_Finalize();
} /* end main */


Overwriting codes/openmpi/deadlock_send_recv.c

In [44]:
!mpicc codes/openmpi/deadlock_send_recv.c -o ~/deadlock_send_recv
!mpirun -np 2 ~/deadlock_send_recv

# The [*] next to a cell indicates that it is still running; if it does not turn into a number,
# the cell is hung (the MPI processes are deadlocked).
# To escape a hung cell, click the Square (Stop) button in the toolbar.


I am here!  ID = 0
I am here!  ID = 1
^C

To correct the above error, we need to change the order of the MPI_Recv and MPI_Send calls in one of the two communication code blocks, as sketched below.
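A minimal sketch of the corrected program, assuming two processes as before (this file is not among the course codes, and the separate incoming buffer is an addition so that each process still prints the other's greeting):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
  int my_rank;
  int tag = 0;
  char message[100];    /* this process's greeting         */
  char incoming[100];   /* greeting from the other process */
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  sprintf(message, "Greetings from process %d!", my_rank);

  if (my_rank == 0) {
    /* Reordered: send first so that process 1's receive can complete */
    MPI_Send(message, strlen(message)+1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
    MPI_Recv(incoming, 100, MPI_CHAR, 1, tag, MPI_COMM_WORLD, &status);
    printf("Process 0 printing:  %s\n", incoming);
  }
  else {
    /* my_rank == 1: receive first, then send its own greeting */
    MPI_Recv(incoming, 100, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
    printf("Process 1 printing:  %s\n", incoming);
  }

  MPI_Finalize();
} /* end main */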

Collective Communication

  • Must involve ALL processes within the scope of a communicator
  • Unexpected behavior, including program failure, can occur if even one process does not participate
  • Types of collective communications:
    • Synchronization: barrier (see the sketch below)
    • Data movement: broadcast, scatter/gather
    • Collective computation (aggregate data to perform computation): reduce
*https://computing.llnl.gov/tutorials/mpi/*
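The barrier listed above is the simplest collective: no data moves, and every process simply waits until all members of the communicator have reached the call. A minimal sketch (this file is not among the course codes; the staggered sleep is only there to make the synchronization visible):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
  int my_rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Stagger the ranks artificially: rank r sleeps r seconds */
  sleep(my_rank);
  printf("Process %d reached the barrier\n", my_rank);

  /* No rank can pass this point until every rank has arrived */
  MPI_Barrier(MPI_COMM_WORLD);
  printf("Process %d passed the barrier\n", my_rank);

  MPI_Finalize();
  return 0;
}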
Original MPI C Syntax: MPI_Bcast

int MPI_Bcast(
    void *buf, 
    int count, 
    MPI_Datatype datatype, 
    int root, 
    MPI_Comm comm);
  • Don’t need to specify a TAG or DESTINATION
  • Must specify the SENDER (root)
  • Blocking call for all processes

In [19]:
%%writefile codes/openmpi/bcast.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) 
{
  int my_rank;       
  int size;
  int value;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); 
  
  value = my_rank;
  printf("process %d: Before MPI_Bcast, value is %d\n", my_rank, value); 

  MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("process %d: After MPI_Bcast, value is %d\n", my_rank, value);

  MPI_Finalize();
  return 0;
}


Overwriting codes/openmpi/bcast.c

In [20]:
!mpicc codes/openmpi/bcast.c -o ~/bcast
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/bcast


process 0: Before MPI_Bcast, value is 0
process 0: After MPI_Bcast, value is 0
process 3: Before MPI_Bcast, value is 3
process 2: Before MPI_Bcast, value is 2
process 2: After MPI_Bcast, value is 0
process 1: Before MPI_Bcast, value is 1
process 1: After MPI_Bcast, value is 0
process 3: After MPI_Bcast, value is 0

Original MPI C Syntax: MPI_Scatter

int MPI_Scatter(
    void *sendbuf, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuf,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);

In [25]:
%%writefile codes/openmpi/scatter.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int sendbuf[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int recvbuf[5];
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  MPI_Scatter(&sendbuf, 5, MPI_INT, &recvbuf, 5, MPI_INT, 0, MPI_COMM_WORLD); 
  for (i = 0; i < 5; i++){
    printf ("Process %d has element %d at index %d in its recvbuf \n",
               my_rank, recvbuf[i], i);
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/scatter.c

In [26]:
!mpicc codes/openmpi/scatter.c -o ~/scatter
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/scatter


Process 0 has element 2 at index 0 in its recvbuf 
Process 0 has element 13 at index 1 in its recvbuf 
Process 0 has element 4 at index 2 in its recvbuf 
Process 0 has element 3 at index 3 in its recvbuf 
Process 0 has element 5 at index 4 in its recvbuf 
Process 1 has element 1 at index 0 in its recvbuf 
Process 1 has element 0 at index 1 in its recvbuf 
Process 1 has element 12 at index 2 in its recvbuf 
Process 1 has element 10 at index 3 in its recvbuf 
Process 1 has element 8 at index 4 in its recvbuf 
Process 2 has element 7 at index 0 in its recvbuf 
Process 2 has element 9 at index 1 in its recvbuf 
Process 2 has element 11 at index 2 in its recvbuf 
Process 2 has element 6 at index 3 in its recvbuf 
Process 2 has element 15 at index 4 in its recvbuf 
Process 3 has element 14 at index 0 in its recvbuf 
Process 3 has element 916566832 at index 1 in its recvbuf 
Process 3 has element 0 at index 2 in its recvbuf 
Process 3 has element 4 at index 3 in its recvbuf 
Process 3 has element 0 at index 4 in its recvbuf 
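Note that most of process 3's recvbuf above is garbage: with 4 processes and sendcount = 5, the root reads 20 elements from a 16-element sendbuf, so the last rank receives one valid element followed by whatever happens to lie past the end of the array. The sendcount and recvcnt arguments are per-process counts, not totals. A minimal sketch with matched counts, assuming exactly 4 processes and 4 elements per rank (not one of the course files):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int my_rank;
  int i;
  int sendbuf[16] = {2,13,4,3,5,1,0,12,10,8,7,9,11,6,15,14};
  int recvbuf[4];   /* 16 elements / 4 processes = 4 elements per rank */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* sendcount (4) times the number of processes now matches the sendbuf size */
  MPI_Scatter(sendbuf, 4, MPI_INT, recvbuf, 4, MPI_INT, 0, MPI_COMM_WORLD);

  for (i = 0; i < 4; i++){
    printf("Process %d has element %d at index %d in its recvbuf \n",
            my_rank, recvbuf[i], i);
  }

  MPI_Finalize();
}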

Original MPI C Syntax: MPI_Gather

int MPI_Gather(
    void *sendbuf, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuf,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);

In [27]:
%%writefile codes/openmpi/gather.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int sendbuf[2];
  int recvbuf[8] = {-1,-1,-1,-1,-1,-1,-1,-1};
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  for (i = 0; i < 2; i++){
    sendbuf[i] = my_rank;
  }
  /* Only the root (process 0) receives the gathered data; the
     other ranks' recvbuf keeps its initial -1 values           */
  MPI_Gather(&sendbuf, 2, MPI_INT, &recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD); 
  for (i = 0; i < 8; i++){
    printf ("Process %d has element %d at index %d in its recvbuf \n",
               my_rank, recvbuf[i], i);
  }

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/gather.c

In [28]:
!mpicc codes/openmpi/gather.c -o ~/gather
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/gather


Process 1 has element -1 at index 0 in its recvbuf 
Process 1 has element -1 at index 1 in its recvbuf 
Process 1 has element -1 at index 2 in its recvbuf 
Process 1 has element -1 at index 3 in its recvbuf 
Process 1 has element -1 at index 4 in its recvbuf 
Process 1 has element -1 at index 5 in its recvbuf 
Process 1 has element -1 at index 6 in its recvbuf 
Process 1 has element -1 at index 7 in its recvbuf 
Process 2 has element -1 at index 0 in its recvbuf 
Process 2 has element -1 at index 1 in its recvbuf 
Process 2 has element -1 at index 2 in its recvbuf 
Process 2 has element -1 at index 3 in its recvbuf 
Process 2 has element -1 at index 4 in its recvbuf 
Process 2 has element -1 at index 5 in its recvbuf 
Process 2 has element -1 at index 6 in its recvbuf 
Process 2 has element -1 at index 7 in its recvbuf 
Process 3 has element -1 at index 0 in its recvbuf 
Process 3 has element -1 at index 1 in its recvbuf 
Process 3 has element -1 at index 2 in its recvbuf 
Process 3 has element -1 at index 3 in its recvbuf 
Process 3 has element -1 at index 4 in its recvbuf 
Process 3 has element -1 at index 5 in its recvbuf 
Process 3 has element -1 at index 6 in its recvbuf 
Process 3 has element -1 at index 7 in its recvbuf 
Process 0 has element 0 at index 0 in its recvbuf 
Process 0 has element 0 at index 1 in its recvbuf 
Process 0 has element 1 at index 2 in its recvbuf 
Process 0 has element 1 at index 3 in its recvbuf 
Process 0 has element 2 at index 4 in its recvbuf 
Process 0 has element 2 at index 5 in its recvbuf 
Process 0 has element 3 at index 6 in its recvbuf 
Process 0 has element 3 at index 7 in its recvbuf 

Original MPI C Syntax: MPI_Reduce

int MPI_Reduce(
    void *sendbuf, 
    void *recvbuf,
    int count, 
    MPI_Datatype datatype, 
    MPI_Op op,
    int root, 
    MPI_Comm comm);
  • MPI_Op may be MPI_MIN, MPI_MAX, MPI_SUM, MPI_PROD (twelve total)
  • Programmers may also define their own operations with MPI_Op_create; these must be associative, and commutativity is declared when the operation is created (see the sketch after this list)
  • If count > 1, then operation is performed element-wise
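The second bullet above can be illustrated with MPI_Op_create. The sketch below is not part of the course codes; the abs_max function and the choice of operation (element-wise maximum of absolute values) are only an example of how a user-defined, associative operation is registered and then passed to MPI_Reduce like any built-in MPI_Op:

#include <mpi.h>
#include <stdio.h>

/* User-defined reduction function with the signature required by MPI_Op_create:
   combine invec into inoutvec, element by element                              */
void abs_max(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype) {
  int *in = (int *) invec;
  int *inout = (int *) inoutvec;
  int i, a, b;
  for (i = 0; i < *len; i++) {
    a = in[i] < 0 ? -in[i] : in[i];
    b = inout[i] < 0 ? -inout[i] : inout[i];
    inout[i] = (a > b) ? a : b;
  }
}

int main(int argc, char** argv) {
  int my_rank, value, result;
  MPI_Op my_op;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Give the ranks a mix of signs so the operation has something to do */
  value = (my_rank % 2 == 0) ? my_rank : -my_rank;

  /* Second argument 1 declares the operation as commutative */
  MPI_Op_create(abs_max, 1, &my_op);
  MPI_Reduce(&value, &result, 1, MPI_INT, my_op, 0, MPI_COMM_WORLD);

  if (my_rank == 0)
    printf("Largest absolute value across all ranks is %d \n", result);

  MPI_Op_free(&my_op);
  MPI_Finalize();
}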

In [30]:
%%writefile codes/openmpi/reduce.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int size;
  int my_rank; 
  int rank_sum;
  int i;
    
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  
  /* Initialize rank_sum; only the root (process 0) will have it
     overwritten with the reduced result                          */
  rank_sum = my_rank;

  MPI_Reduce(&my_rank, &rank_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); 
  printf ("The total sum of all ranks at process %d is %d \n", my_rank, rank_sum);

  /* Finalize the MPI environment. */
  MPI_Finalize();
}


Overwriting codes/openmpi/reduce.c

In [31]:
!mpicc codes/openmpi/reduce.c -o ~/reduce
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/reduce


The total sum of all ranks at process 2 is 2 
The total sum of all ranks at process 3 is 3 
The total sum of all ranks at process 1 is 1 
The total sum of all ranks at process 0 is 6 

In [67]:
!mpicc codes/openmpi/reduce.c -o ~/reduce
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/reduce


The total sum of all ranks at process 3 is 3 
The total sum of all ranks at process 5 is 5 
The total sum of all ranks at process 1 is 1 
The total sum of all ranks at process 2 is 2 
The total sum of all ranks at process 6 is 6 
The total sum of all ranks at process 7 is 7 
The total sum of all ranks at process 0 is 28 
The total sum of all ranks at process 4 is 4