Partitioning: Divide and Conquer

Linh B. Ngo

Partitioning

Partitioning simply divides the problem into parts, computes the parts, and then combines the results.

  • The basis of all parallel programming, in one form or another.

  • Pleasantly parallel computations use partitioning without any interaction between the parts.

  • Most partitioning formulations require the results of the parts to be combined to obtain the desired result.

  • Partitioning can be applied to the program data. This is called data partitioning or domain decomposition.

  • Partitioning can also be applied to the functions of a program. This is called functional decomposition.

Divide and Conquer

  • Characterized by dividing a problem into sub-problems of the same form as the larger problem. Further divisions into still smaller sub-problems are usually done by recursion.

  • Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts. The data is also usually naturally localized.

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.

In [3]:
%%writefile codes/openmpi/divide.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"

int main(int argc, char * argv[] ) {
  int rank;     /* rank of each MPI process */
  int size;     /* total number of MPI processes */
  int i;        /* counter */
  int distance; /* distance between sender and receiver */
    
  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
    
  /* Am I sender or receiver? */
  /* Who am I sending/receiving to/from */
  distance = 1;
  i = 1;
  while (distance <= size / 2){      
    if (rank < distance) {
      printf ("At time step %d, sender %d sends to %d\n", i, rank, rank + distance);
    }
    if ((rank >= distance) && (rank < distance * 2)){
      printf ("At time step %d, receiver %d receives from %d\n", i, rank, rank - distance);
    }
    printf ("Process %d has distance value %d and time step %d\n", rank, distance, i);
    distance = distance * 2;
    i += 1;
  }
    
  MPI_Finalize();
  return 0;  
}


Overwriting codes/openmpi/divide.c

In [4]:
!mpicc codes/openmpi/divide.c -lm -o ~/divide
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/divide


At time step 1, sender 0 sends to 1
Process 0 has distance value 1 and time step 1
At time step 2, sender 0 sends to 2
Process 0 has distance value 2 and time step 2
At time step 3, sender 0 sends to 4
Process 0 has distance value 4 and time step 3
At time step 1, receiver 1 receives from 0
Process 1 has distance value 1 and time step 1
At time step 2, sender 1 sends to 3
Process 1 has distance value 2 and time step 2
At time step 3, sender 1 sends to 5
Process 1 has distance value 4 and time step 3
Process 3 has distance value 1 and time step 1
At time step 2, receiver 3 receives from 1
Process 3 has distance value 2 and time step 2
At time step 3, sender 3 sends to 7
Process 3 has distance value 4 and time step 3
Process 4 has distance value 1 and time step 1
Process 4 has distance value 2 and time step 2
At time step 3, receiver 4 receives from 0
Process 4 has distance value 4 and time step 3
Process 5 has distance value 1 and time step 1
Process 5 has distance value 2 and time step 2
At time step 3, receiver 5 receives from 1
Process 5 has distance value 4 and time step 3
Process 2 has distance value 1 and time step 1
At time step 2, receiver 2 receives from 0
Process 2 has distance value 2 and time step 2
At time step 3, sender 2 sends to 6
Process 2 has distance value 4 and time step 3
Process 6 has distance value 1 and time step 1
Process 6 has distance value 2 and time step 2
At time step 3, receiver 6 receives from 2
Process 6 has distance value 4 and time step 3
Process 7 has distance value 1 and time step 1
Process 7 has distance value 2 and time step 2
At time step 3, receiver 7 receives from 3
Process 7 has distance value 4 and time step 3

In [5]:
%%writefile codes/openmpi/conquer.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"

int main(int argc, char * argv[] ) {
  int rank;     /* rank of each MPI process */
  int size;     /* total number of MPI processes */
  int i;        /* counter */
  int distance; /* distance between sender and receiver */
    
  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
    
  /* Am I sender or receiver? */
  /* Who am I sending/receiving to/from */
  distance = (int)(size / 2);
    
  i = 1;
  while (distance >= 1){      
    if ((rank >= distance) && (rank < distance * 2)){
      printf ("At time step %d, sender %d sends to %d\n", i, rank, rank - distance);
    }
    if (rank < distance) {
      printf ("At time step %d, receiver %d receives from %d\n", i, rank, rank + distance);
    }
    distance = distance / 2;
    i += 1;
  }
    
  MPI_Finalize();
  return 0;  
}


Overwriting codes/openmpi/conquer.c

In [6]:
!mpicc codes/openmpi/conquer.c -lm -o ~/conquer
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/conquer


At time step 1, receiver 0 receives from 4
At time step 2, receiver 0 receives from 2
At time step 3, receiver 0 receives from 1
At time step 1, sender 4 sends to 0
At time step 1, sender 6 sends to 2
At time step 1, receiver 3 receives from 7
At time step 2, sender 3 sends to 1
At time step 1, receiver 2 receives from 6
At time step 2, sender 2 sends to 0
At time step 1, receiver 1 receives from 5
At time step 2, receiver 1 receives from 3
At time step 3, sender 1 sends to 0
At time step 1, sender 7 sends to 3
At time step 1, sender 5 sends to 1

Many sorting algorithms can be parallelized by partitioning using divide and conquer

Bucket Sort

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
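
For reference, the sequential algorithm drops each value into an interval bucket, sorts each bucket independently, and concatenates the buckets in interval order. Below is a minimal sequential sketch; the value range [0, N) and the equal-width buckets mirror the MPI examples later in this section, while the specific constants are illustrative assumptions.

In [ ]:
#include <stdio.h>
#include <stdlib.h>

#define N 16        /* number of values, assumed to lie in [0, N) */
#define BUCKETS 4   /* number of equal-width interval buckets     */

int main(void) {
  int raw[N], sorted[N];
  int bucket[BUCKETS][N];       /* worst case: all values in one bucket */
  int count[BUCKETS] = {0};
  int i, b, j, k, tmp, pos = 0;

  for (i = 0; i < N; i++) raw[i] = rand() % N;

  /* drop each value into the bucket that owns its interval */
  for (i = 0; i < N; i++) {
    b = raw[i] / (N / BUCKETS);
    bucket[b][count[b]++] = raw[i];
  }

  /* sort each bucket independently, then concatenate in interval order */
  for (b = 0; b < BUCKETS; b++) {
    for (j = 0; j < count[b]; j++)
      for (k = j + 1; k < count[b]; k++)
        if (bucket[b][j] > bucket[b][k]) {
          tmp = bucket[b][j]; bucket[b][j] = bucket[b][k]; bucket[b][k] = tmp;
        }
    for (j = 0; j < count[b]; j++) sorted[pos++] = bucket[b][j];
  }

  for (i = 0; i < N; i++) printf("%d ", sorted[i]);
  printf("\n");
  return 0;
}

Sorting each bucket is independent work, which is exactly the part the parallel versions below distribute across MPI processes.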

Simple approach to parallel bucket sort

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
  • Broadcast data
  • Sort only those elements that fall within the local interval bucket (determined by rank)
  • Gather the sorted buckets

In [ ]:
int MPI_Scatter(
    void *sendbuf, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuf,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);
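
MPI_Scatter splits sendbuf at the root into equal-sized chunks and delivers the i-th chunk to rank i. A minimal usage sketch follows; the array contents and the assumption of 4 processes (2 elements each) are illustrative, not part of the lecture code.

In [ ]:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank;
  int sendbuf[8] = {2, 13, 4, 3, 5, 1, 0, 12}; /* only read at the root */
  int recvbuf[2];                              /* every rank receives 2 ints */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* rank 0 sends 2 elements to each rank, in rank order; assumes the
     program is launched with 4 processes so the 8 elements divide evenly */
  MPI_Scatter(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);

  printf("%d: %d %d\n", rank, recvbuf[0], recvbuf[1]);

  MPI_Finalize();
  return 0;
}

MPI_Scatterv below generalizes this by letting each rank receive a different number of elements from a different offset.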

In [ ]:
int MPI_Scatterv(
  void *sendbuf,
  int *sendcnts,
  int *displs,
  MPI_Datatype sendtype,
  void *recvbuf,
  int recvcnt,
  MPI_Datatype recvtype,
  int root,
  MPI_Comm comm
);
  • sendbuf: address of send buffer (choice, significant only at root)
  • sendcounts: integer array (of length group size) specifying the number of elements to send to each processor
  • displs: integer array (of length group size). Entry i specifies the displacement (relative to sendbuf) from which to take the outgoing data to process i
  • sendtype: data type of send buffer elements
  • recvbuf: address of receive buffer (choice)
  • recvcount: number of elements in receive buffer (integer)
  • recvtype: data type of receive buffer elements
  • root: rank of sending process (integer)
  • comm: communicator

In [7]:
%%writefile codes/openmpi/scatterv.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;    
    int i; 
    int sendcounts[4] = {1,2,3,4}; /* each process will receive its rank plus 1 numbers from the sendbuf array */
    int displs[4] = {0,0,0,0}; /* array describing the displacements where each segment begins and is initialized to all 0s */
    int sendbuf[10] = {2,13,4,3,5,1,0,12,10,8}; /* the buffer to be sent */
    int *recvbuf; /* array at each process to receive data. To be initialized based on process rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* initializes recvbuf to contain exactly rank plus 1 numbers */
    recvbuf = malloc(sizeof(int)* (rank + 1));
    
    // calculate displacements
    for (i = 1; i < 4; i++) {
        displs[i] = displs[i-1] + sendcounts[i-1];
    }

    // divide the data among processes as described by sendcounts and displs
    MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT, recvbuf, (rank + 1), MPI_INT, 0, MPI_COMM_WORLD);

    // print what each process received
    printf("%d: ", rank);
    for (i = 0; i < sendcounts[rank]; i++) {
        printf("%d ", recvbuf[i]);
    }
    printf("\n");
    
    free(recvbuf);

    MPI_Finalize();
    return 0;
}


Overwriting codes/openmpi/scatterv.c

In [8]:
!mpicc codes/openmpi/scatterv.c -o ~/scatterv
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/scatterv


0: 2 
1: 13 4 
2: 3 5 1 
3: 0 12 10 8 

In [ ]:
int MPI_Gather(
    void *sendbuff, 
    int sendcount, 
    MPI_Datatype sendtype, 
    void *recvbuff,
    int recvcnt,
    MPI_Datatype recvtype,
    int root, 
    MPI_Comm comm);
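
MPI_Gather is the inverse collective: every rank contributes the same number of elements, and the root concatenates them in rank order. A minimal usage sketch follows; the value each rank contributes (its rank squared) is an illustrative assumption.

In [ ]:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  int rank, size, i;
  int sendval;          /* one value contributed by each rank      */
  int *recvbuf = NULL;  /* gathered values; only allocated at root */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  sendval = rank * rank;

  if (rank == 0)
    recvbuf = malloc(size * sizeof(int));

  /* the root receives one value from every rank, stored in rank order */
  MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

  if (rank == 0) {
    for (i = 0; i < size; i++) printf("%d ", recvbuf[i]);
    printf("\n");
    free(recvbuf);
  }

  MPI_Finalize();
  return 0;
}

MPI_Gatherv below generalizes this by letting each rank contribute a different number of elements, which is what the bucket sort needs since bucket sizes vary.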

In [ ]:
int MPI_Gatherv(
  void *sendbuf,
  int sendcnt,
  MPI_Datatype sendtype,
  void *recvbuf,
  int *recvcnts,
  int *displs,
  MPI_Datatype recvtype,
  int root,
  MPI_Comm comm
);
  • sendbuf: starting address of send buffer (choice)
  • sendcount: number of elements in send buffer (integer)
  • sendtype: data type of send buffer elements
  • recvbuf: address of receive buffer (choice, significant only at root)
  • recvcounts: integer array (of length group size) containing the number of elements that are received from each process (significant only at root)
  • displs: integer array (of length group size). Entry i specifies the displacement relative to recvbuf at which to place the incoming data from process i (significant only at root)
  • recvtype: data type of recv buffer elements (significant only at root)
  • root: rank of receiving process (integer)
  • comm: communicator

In [13]:
%%writefile codes/openmpi/gatherv.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;    
    int i; 
    int recvcounts[4] = {1,2,3,4}; /* process 0 will receive rank + 1 numbers from each process */
    int displs[4] = {0,0,0,0}; /* displacements where each incoming segment begins; initialized to all 0s */
    int *sendbuf; /* the buffer to be sent; initialized individually at each process */
    int *recvbuf; /* array to receive data; only allocated at process 0 */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* initializes recvbuf to receive 10 numbers */
    if (rank == 0){
      recvbuf = malloc(sizeof(int) * (10));

      for (i = 0; i < 10; i ++)
        recvbuf[i] = -1;
    }
    
    /* initializes sendbuf with rank + 1 copies of the process rank */
    sendbuf = malloc(sizeof(int) * (rank + 1));
    for (i = 0; i < (rank + 1); i++){
        sendbuf[i] = rank;
    }
    
    // calculate displacements; the extra -1 makes consecutive receive segments
    // overlap by one element, so only the first 7 slots of recvbuf are written
    // and the last three keep their initial value of -1
    for (i = 1; i < 4; i++) {
        displs[i] = displs[i-1] + recvcounts[i-1] - 1;
    }

    // gather varying amounts of data from all processes into recvbuf at the root,
    // as described by recvcounts and displs
    MPI_Gatherv(sendbuf, rank + 1, MPI_INT, recvbuf, recvcounts, displs, MPI_INT, 0, MPI_COMM_WORLD);

    // print what process has at the end
    if (rank == 0){
      for (i = 0; i < 10; i++) {
        printf("%d ", recvbuf[i]);
      }
      printf("\n");
      free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}


Overwriting codes/openmpi/gatherv.c

In [14]:
!mpicc codes/openmpi/gatherv.c -o ~/gatherv
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/gatherv


1 2 2 3 3 3 3 -1 -1 -1 

Parallel Bucket Sort version 1


In [1]:
%%writefile codes/openmpi/bucket1.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 64

int main(int argc, char* argv[]){

  int rawNum[N];
  int sortNum[N];
  int* local_bucket;
  int rank,size;
  int* proc_count;
  int* disp;
  MPI_Status status;
  int i,j,counter;
  int local_min,local_max;
  int tmp;

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
    
  if (rank == 0){
    /* Initialize a random array with N integers whose values range between 0 and N */
    for (i = 0; i < N; i++){
      rawNum[i] = rand() % N;
    }
  }

  /* Broadcast contents of rawNum from 0 to all other processes */
  MPI_Bcast(rawNum, N, MPI_INT, 0, MPI_COMM_WORLD);

  /* Each process only works with numbers within their assigned interval */
  counter = 0;
  local_min = rank * (N/size);
  local_max = (rank + 1) * (N/size);  
  for (i = 0; i < N; i++){
    if ((rawNum[i] >= local_min) && (rawNum[i] < local_max)){
      counter += 1;
    }
  }    
    
  printf("For rank %d, max is %d, min is %d, and there are %d elements in rawNum that falls within max and min \n",
         rank,local_max,local_min,counter);


  /* Each process creates its own bucket containing values that fall within its interval */  
  local_bucket = malloc(counter * sizeof(int));
  counter = 0;
  for (i = 0; i < N; i++){
    if ((rawNum[i] >= local_min) && (rawNum[i] < local_max)){
      local_bucket[counter] = rawNum[i];
      counter += 1;
    }
  }

  /* Insertion sort */
  for (i = 0; i < counter; i++){
    for (j = i+1; j < counter; j++){
      if (local_bucket[i] > local_bucket[j]){
        tmp = local_bucket[i];
        local_bucket[i] = local_bucket[j];
        local_bucket[j] = tmp;
      }
    }
  }


  for (i = 0; i < counter; i++){
    printf("%d %d \n",rank,local_bucket[i]);
  }

  /* set up root process */
  if (rank == 0){
    proc_count = malloc(size * sizeof(int));
    disp = malloc(size * sizeof(int));
  }

  /* populate proc_count */
  MPI_Gather(&counter,1,MPI_INT,proc_count,1,MPI_INT,0,MPI_COMM_WORLD);

  if (rank == 0){
    disp[0] = 0;
    for (i = 0; i < size-1; i++){
      disp[i+1] = disp[i] + proc_count[i];
    }
  }

  // receive final result
  MPI_Gatherv(local_bucket,counter,MPI_INT,sortNum,proc_count,disp,MPI_INT,0,MPI_COMM_WORLD);

  if (rank == 0){
    printf("Before sort: \n");
    for (i = 0; i < N; i++) printf("%d ",rawNum[i]);
    printf("\nAfter sort: \n");
    for (i = 0; i < N; i++) printf("%d ",sortNum[i]);
  }

  MPI_Finalize();
  return 0;
}


Overwriting codes/openmpi/bucket1.c

In [2]:
!mpicc codes/openmpi/bucket1.c -o ~/bucket1
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/bucket1


For rank 0, max is 8, min is 0, and there are 8 elements in rawNum that falls within max and min 
0 1 
0 2 
0 2 
0 5 
0 6 
0 6 
0 6 
0 7 
For rank 2, max is 24, min is 16, and there are 6 elements in rawNum that falls within max and min 
2 17 
2 17 
2 20 
2 20 
2 20 
2 23 
For rank 4, max is 40, min is 32, and there are 8 elements in rawNum that falls within max and min 
4 33 
4 35 
4 35 
4 35 
4 37 
4 38 
4 39 
4 39 
For rank 1, max is 16, min is 8, and there are 7 elements in rawNum that falls within max and min 
1 9 
1 10 
1 13 
1 13 
1 13 
1 13 
1 14 
For rank 5, max is 48, min is 40, and there are 8 elements in rawNum that falls within max and min 
5 40 
5 41 
5 41 
5 41 
5 43 
5 43 
5 44 
5 46 
For rank 3, max is 32, min is 24, and there are 11 elements in rawNum that falls within max and min 
3 24 
3 24 
3 26 
3 26 
3 26 
3 27 
3 27 
3 28 
3 29 
3 30 
3 31 
For rank 6, max is 56, min is 48, and there are 10 elements in rawNum that falls within max and min 
6 49 
6 50 
6 50 
6 50 
6 51 
6 51 
6 52 
6 52 
6 54 
6 55 
For rank 7, max is 64, min is 56, and there are 6 elements in rawNum that falls within max and min 
7 56 
7 58 
7 59 
7 60 
7 61 
7 63 
Before sort: 
39 6 41 51 17 63 10 44 41 13 58 43 50 59 35 6 60 2 20 56 27 40 39 13 54 26 46 35 51 31 9 26 38 50 13 55 49 24 35 26 37 29 5 23 24 41 30 20 43 50 13 6 27 52 20 17 14 2 52 1 33 61 28 7 
After sort: 
1 2 2 5 6 6 6 7 9 10 13 13 13 13 14 17 17 20 20 20 23 24 24 26 26 26 27 27 28 29 30 31 33 35 35 35 37 38 39 39 40 41 41 41 43 43 44 46 49 50 50 50 51 51 52 52 54 55 56 58 59 60 61 63 
  • The data might be too large to be distributed via MPI_Bcast. A second version therefore scatters equal portions of the unsorted data, has each process drop its local elements into small per-destination buckets, and exchanges those buckets with an all-to-all operation before the final gather.
Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.

In [ ]:
int MPI_Alltoall(
  void *sendbuf,
  int sendcount,
  MPI_Datatype sendtype,
  void *recvbuf,
  int recvcount,
  MPI_Datatype recvtype,
  MPI_Comm comm
);
  • sendbuf: starting address of send buffer (choice)
  • sendcount: number of elements to send to each process (integer)
  • sendtype: data type of send buffer elements
  • recvbuf: address of receive buffer (choice)
  • recvcount: number of elements received from any process (integer)
  • recvtype: data type of receive buffer elements
  • comm: communicator

In [5]:
%%writefile codes/openmpi/alltoall.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char *argv[]){
  int rank, size;
  int *scounts, *rcounts;  /* per-destination send counts / received counts */
  int i;

  MPI_Init(&argc,&argv);
  MPI_Comm_size( MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  scounts=(int*)malloc(sizeof(int)*size);
  rcounts=(int*)malloc(sizeof(int)*size);

  /* each process fills a send-count pattern that depends on its rank */
  for(i=0; i < size; i++){
    scounts[i]= rank * i + i;
  }
  
  printf("myid = %d scounts = ",rank);
  for(i=0;i<size;i++)
    printf("%d ",scounts[i]);
  printf("\n");

  /* send the data */
  MPI_Alltoall(scounts,1,MPI_INT,rcounts,1,MPI_INT,MPI_COMM_WORLD);
  printf("myid = %d rcounts = ",rank);
  for(i=0;i<size;i++)
    printf("%d ",rcounts[i]);
  printf("\n");
  free(scounts);
  free(rcounts);
  MPI_Finalize();
  return 0;
}


Overwriting codes/openmpi/alltoall.c

In [6]:
!mpicc codes/openmpi/alltoall.c -o ~/alltoall
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/alltoall


myid = 3 scounts = 0 4 8 12 
myid = 2 scounts = 0 3 6 9 
myid = 1 scounts = 0 2 4 6 
myid = 0 scounts = 0 1 2 3 
myid = 3 rcounts = 3 6 9 12 
myid = 1 rcounts = 1 2 3 4 
myid = 2 rcounts = 2 4 6 8 
myid = 0 rcounts = 0 0 0 0 

In [ ]:
int MPI_Alltoallv(
  void *sendbuf,
  int *sendcnts,
  int *sdispls,
  MPI_Datatype sendtype,
  void *recvbuf,
  int *recvcnts,
  int *rdispls,
  MPI_Datatype recvtype,
  MPI_Comm comm
);
  • sendbuf: starting address of send buffer (choice)
  • sendcounts: integer array equal to the group size specifying the number of elements to send to each processor
  • sdispls: integer array (of length group size). Entry j specifies the displacement (relative to sendbuf) from which to take the outgoing data destined for process j
  • sendtype: data type of send buffer elements
  • recvbuf: address of receive buffer (choice)
  • recvcounts: integer array equal to the group size specifying the maximum number of elements that can be received from each processor
  • rdispls: integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i
  • recvtype: data type of receive buffer elements
  • comm: communicator

In [7]:
%%writefile codes/openmpi/alltoallv.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char *argv[]){
  int size, rank; 
  int *sray,*rray;
  int *sdisp,*scounts,*rdisp,*rcounts;
  int ssize,rsize,i,k,j;
  float z;

  MPI_Init(&argc,&argv);
  MPI_Comm_size( MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
  scounts=(int*)malloc(sizeof(int)*size);
  rcounts=(int*)malloc(sizeof(int)*size);
  sdisp=(int*)malloc(sizeof(int)*size);
  rdisp=(int*)malloc(sizeof(int)*size);

  
  /* find out how much data to send */
  srand((unsigned int) rank);    
  for(i=0;i<size;i++){
    z = (float) rand()/RAND_MAX;
    scounts[i]=(int)(5.0 * z) + 1;
  }
    
  printf("rank= %d scounts= %d %d %d %d\n",rank,scounts[0],scounts[1],scounts[2],scounts[3]);

  for(i=0;i<size;i++)
    printf("%d ",scounts[i]);
  printf("\n");
    
  /* tell the other processors how much data is coming */
  MPI_Alltoall(scounts,1,MPI_INT,rcounts,1,MPI_INT,MPI_COMM_WORLD);

  /* calculate displacements and the size of the arrays */
  sdisp[0]=0;
  for(i=1;i<size;i++){
    sdisp[i]=scounts[i-1]+sdisp[i-1];
  }
  rdisp[0]=0;
  for(i=1;i<size;i++){
    rdisp[i]=rcounts[i-1]+rdisp[i-1];
  }
  ssize=0;
  rsize=0;
  for(i=0;i<size;i++){
    ssize=ssize+scounts[i];
    rsize=rsize+rcounts[i];
  }
  
  /* allocate send and rec arrays */
  sray=(int*)malloc(sizeof(int)*ssize);
  rray=(int*)malloc(sizeof(int)*rsize);
  for(i=0;i<ssize;i++)
    sray[i]=rank;

  /* send/rec different amounts of data to/from each processor */
  MPI_Alltoallv( sray,scounts,sdisp,MPI_INT,rray,rcounts,rdisp,MPI_INT,MPI_COMM_WORLD);
                  
  printf("rank= %d rray=",rank);
  for(i=0;i<rsize;i++)
    printf("%d ",rray[i]);
  printf("\n");
  MPI_Finalize();
}


Overwriting codes/openmpi/alltoallv.c

In [8]:
!mpicc codes/openmpi/alltoallv.c -o ~/alltoallv
!mpirun -np 4 --map-by core:OVERSUBSCRIBE ~/alltoallv


rank= 1 scounts= 5 2 4 4
5 2 4 4 
rank= 2 scounts= 4 5 1 1
4 5 1 1 
rank= 0 scounts= 5 2 4 4
5 2 4 4 
rank= 3 scounts= 3 2 2 3
3 2 2 3 
rank= 1 rray=0 0 1 1 2 2 2 2 2 3 3 
rank= 0 rray=0 0 0 0 0 1 1 1 1 1 2 2 2 2 3 3 3 
rank= 2 rray=0 0 0 0 1 1 1 1 2 3 3 
rank= 3 rray=0 0 0 0 1 1 1 1 2 3 3 3 

In [9]:
%%writefile codes/openmpi/bucket2.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 32

int main(int argc, char* argv[]){

  int rank,size;
  MPI_Status status;

  int rawNum[N];
  int sortNum[N];
  int* local_array;

  int i,j;

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  // initialize the unsorted array at process 0
  if (rank == 0){
    for (i = 0; i < N; i++){
      rawNum[i] = rand() % N;
    }
  }

  // prepare the local container, then distribute equal portions of the
  // unsorted array from process 0 to all the processes
  local_array = malloc((N/size) * sizeof(int));
  MPI_Scatter(rawNum,(N/size),MPI_INT,local_array,(N/size),MPI_INT,0,MPI_COMM_WORLD);

  // initialize the local bucket matrix
  int local_bucket[size][N/size];
  for (i = 0; i < size; i++){
    for (j = 0; j < N/size; j++){
      local_bucket[i][j] = RAND_MAX;
    }
  }

  int counter = 0;
  int local_min,local_max;
 // populate the bucket matrix
  for (i = 0; i < size; i++){
    counter = 0;
    for (j = 0; j < N/size; j++){
      local_min = i * N/size;
      local_max = (i + 1) * N / size;
      if ((local_array[j] >= local_min)&&(local_array[j] < local_max)){
        local_bucket[i][counter] = local_array[j];
        counter += 1;
      }
    }
  }

  // sort the bucket matrix. 
  int tmp = 0;
  for (i = 0; i < size; i++){
    for (j = 0; j < N/size; j++){
      for (counter = j; counter < N/size; counter++){
        if (local_bucket[i][j] > local_bucket[i][counter]){
          tmp = local_bucket[i][j];
          local_bucket[i][j] = local_bucket[i][counter];
          local_bucket[i][counter] = tmp;
        }
      }
    }
  }

  // place the numbers from the buckets back into local_array (grouped by
  // destination) and record how many elements go to each destination
  counter = 0;
  int array_counter[size];
  for (i = 0; i < size; i++){
    array_counter[i] = N/size;  /* default in case bucket i is completely full */
    for (j = 0; j < N/size; j++){
      if (local_bucket[i][j] != RAND_MAX){
        local_array[counter] = local_bucket[i][j];
        counter += 1;
      }
      else {
        array_counter[i] = j;
        printf("Rank %d counter %d \n",rank,array_counter[i]);
        break;
      }
    }
  }
 /*
  for (i = 0; i < N/size; i++){
    printf("rank %d sorted num %d \n",rank,local_array[i]);
  }
  */

  if (rank == 0)  printf("-----------------\n");
  MPI_Barrier(MPI_COMM_WORLD);

  // preparation for bucket gathering
  int recvbuf[size];
  int rdisp[size];
  int sdisp[size];

  sdisp[0] = 0;
  for (i = 0; i < size - 1; i++){
    sdisp[i+1] = sdisp[i] + array_counter[i];
    printf("%d send displace %d \n",rank,sdisp[i+1]);
  }

  MPI_Alltoall(array_counter,1,MPI_INT,recvbuf,1,MPI_INT,MPI_COMM_WORLD);

  MPI_Barrier(MPI_COMM_WORLD);

  int sum = 0;
  for (i = 0; i < size; i++){
    sum += recvbuf[i];
    printf("rank %d recvbuf %d \n",rank,recvbuf[i]);
  }

  printf("rank %d total recv buf %d \n", rank,sum);

  MPI_Barrier(MPI_COMM_WORLD);

  rdisp[0] = 0;
  for (i = 0; i < size - 1; i++){
    rdisp[i+1] = rdisp[i] + recvbuf[i];
    printf("%d recv displace %d \n",rank,rdisp[i+1]);
  }

  int local_array_alltoall[sum];
  // initialize local_array_alltoall for testing purpose
  for (i = 0; i < sum; i++) local_array_alltoall[i] = -1;
  MPI_Alltoallv(local_array,array_counter,sdisp,MPI_INT,local_array_alltoall,recvbuf,rdisp,MPI_INT,MPI_COMM_WORLD);

  for (i = 0; i < sum; i++){
    printf("rank %d semi-sorted num %d \n",rank,local_array_alltoall[i]);
  }


  // local sort on big bucket one more time
  for (i = 0; i < sum; i++){
    for (j = i; j < sum; j++){
      if (local_array_alltoall[i] > local_array_alltoall[j]){
        tmp = local_array_alltoall[i];
        local_array_alltoall[i] = local_array_alltoall[j];
        local_array_alltoall[j] = tmp;
      }
    }
  }

  // preparation for the final gathering
  int proc_count[size];
  int disp[size];


  MPI_Gather(&sum,1,MPI_INT,proc_count,1,MPI_INT,0,MPI_COMM_WORLD);

  if (rank == 0){
    disp[0] = 0;
    for (i = 0; i < size-1; i++){
      disp[i+1] = disp[i] + proc_count[i];
    }
  }

  MPI_Gatherv(local_array_alltoall,sum,MPI_INT,sortNum,proc_count,disp,MPI_INT,0,MPI_COMM_WORLD);

  if (rank == 0){
    printf("Before sort: \n");
    for (i = 0; i < N; i++) printf("%d ",rawNum[i]);
    printf("\nAfter sort: \n");
    for (i = 0; i < N; i++) printf("%d ",sortNum[i]);
  }
MPI_Finalize();
  return 0;
}


Overwriting codes/openmpi/bucket2.c

In [10]:
!mpicc codes/openmpi/bucket2.c -o ~/bucket2
!mpirun -np 8 --map-by core:OVERSUBSCRIBE ~/bucket2


Rank 0 counter 0 
Rank 0 counter 2 
Rank 0 counter 1 
Rank 0 counter 0 
Rank 0 counter 1 
Rank 0 counter 0 
Rank 0 counter 0 
Rank 0 counter 0 
-----------------
Rank 1 counter 0 
Rank 1 counter 0 
Rank 1 counter 1 
Rank 1 counter 1 
Rank 1 counter 1 
Rank 1 counter 0 
Rank 1 counter 0 
Rank 1 counter 1 
Rank 6 counter 1 
Rank 6 counter 0 
Rank 6 counter 0 
Rank 6 counter 1 
Rank 6 counter 0 
Rank 6 counter 1 
Rank 6 counter 1 
Rank 6 counter 0 
Rank 4 counter 1 
Rank 4 counter 0 
Rank 4 counter 0 
Rank 4 counter 0 
Rank 4 counter 0 
Rank 4 counter 1 
Rank 4 counter 1 
Rank 4 counter 1 
Rank 3 counter 1 
Rank 3 counter 1 
Rank 3 counter 0 
Rank 3 counter 0 
Rank 3 counter 1 
Rank 3 counter 0 
Rank 3 counter 1 
Rank 3 counter 0 
Rank 2 counter 0 
Rank 2 counter 0 
Rank 2 counter 2 
Rank 2 counter 1 
Rank 2 counter 0 
Rank 2 counter 0 
Rank 2 counter 1 
Rank 2 counter 0 
Rank 7 counter 0 
Rank 7 counter 0 
Rank 7 counter 1 
Rank 7 counter 0 
Rank 7 counter 1 
Rank 7 counter 0 
Rank 7 counter 1 
Rank 7 counter 1 
4 send displace 1 
4 send displace 1 
4 send displace 1 
4 send displace 1 
4 send displace 1 
4 send displace 2 
4 send displace 3 
6 send displace 1 
6 send displace 1 
6 send displace 1 
6 send displace 2 
6 send displace 2 
6 send displace 3 
6 send displace 4 
Rank 5 counter 0 
Rank 5 counter 1 
Rank 5 counter 1 
Rank 5 counter 1 
Rank 5 counter 0 
Rank 5 counter 0 
Rank 5 counter 1 
Rank 5 counter 0 
5 send displace 0 
5 send displace 1 
5 send displace 2 
5 send displace 3 
5 send displace 3 
5 send displace 3 
5 send displace 4 
1 send displace 0 
1 send displace 0 
1 send displace 1 
1 send displace 2 
1 send displace 3 
1 send displace 3 
1 send displace 3 
7 send displace 0 
7 send displace 0 
7 send displace 1 
7 send displace 1 
7 send displace 2 
7 send displace 2 
7 send displace 3 
3 send displace 1 
3 send displace 2 
3 send displace 2 
3 send displace 2 
3 send displace 3 
3 send displace 3 
3 send displace 4 
2 send displace 0 
2 send displace 0 
2 send displace 2 
2 send displace 3 
2 send displace 3 
2 send displace 3 
2 send displace 4 
0 send displace 0 
0 send displace 2 
0 send displace 3 
0 send displace 3 
0 send displace 4 
0 send displace 4 
0 send displace 4 
rank 4 recvbuf 1 
rank 4 recvbuf 1 
rank 4 recvbuf 0 
rank 4 recvbuf 1 
rank 4 recvbuf 0 
rank 4 recvbuf 0 
rank 4 recvbuf 0 
rank 4 recvbuf 1 
rank 4 total recv buf 4 
rank 6 recvbuf 0 
rank 6 recvbuf 0 
rank 6 recvbuf 1 
rank 6 recvbuf 1 
rank 6 recvbuf 1 
rank 6 recvbuf 1 
rank 6 recvbuf 1 
rank 6 recvbuf 1 
rank 6 total recv buf 6 
rank 0 recvbuf 0 
rank 0 recvbuf 0 
rank 0 recvbuf 0 
rank 0 recvbuf 1 
rank 0 recvbuf 1 
rank 0 recvbuf 0 
rank 0 recvbuf 1 
rank 0 recvbuf 0 
rank 0 total recv buf 3 
rank 5 recvbuf 0 
rank 5 recvbuf 0 
rank 5 recvbuf 0 
rank 5 recvbuf 0 
rank 5 recvbuf 1 
rank 5 recvbuf 0 
rank 5 recvbuf 1 
rank 5 recvbuf 0 
rank 5 total recv buf 2 
rank 2 recvbuf 1 
rank 2 recvbuf 1 
rank 2 recvbuf 2 
rank 2 recvbuf 0 
rank 2 recvbuf 0 
rank 2 recvbuf 1 
rank 2 recvbuf 0 
rank 2 recvbuf 1 
rank 2 total recv buf 6 
rank 7 recvbuf 0 
rank 7 recvbuf 1 
rank 7 recvbuf 0 
rank 7 recvbuf 0 
rank 7 recvbuf 1 
rank 7 recvbuf 0 
rank 7 recvbuf 0 
rank 7 recvbuf 1 
rank 7 total recv buf 3 
rank 3 recvbuf 0 
rank 3 recvbuf 1 
rank 3 recvbuf 1 
rank 3 recvbuf 0 
rank 3 recvbuf 0 
rank 3 recvbuf 1 
rank 3 recvbuf 1 
rank 3 recvbuf 0 
rank 3 total recv buf 4 
rank 1 recvbuf 2 
rank 1 recvbuf 0 
rank 1 recvbuf 0 
rank 1 recvbuf 1 
rank 1 recvbuf 0 
rank 1 recvbuf 1 
rank 1 recvbuf 0 
rank 1 recvbuf 0 
rank 1 total recv buf 4 
1 recv displace 2 
1 recv displace 2 
1 recv displace 2 
1 recv displace 3 
1 recv displace 3 
1 recv displace 4 
1 recv displace 4 
3 recv displace 0 
3 recv displace 1 
3 recv displace 2 
3 recv displace 2 
3 recv displace 2 
3 recv displace 3 
3 recv displace 4 
5 recv displace 0 
5 recv displace 0 
5 recv displace 0 
5 recv displace 0 
5 recv displace 1 
5 recv displace 1 
5 recv displace 2 
7 recv displace 0 
7 recv displace 1 
7 recv displace 1 
7 recv displace 1 
7 recv displace 2 
7 recv displace 2 
7 recv displace 2 
2 recv displace 1 
2 recv displace 2 
2 recv displace 4 
2 recv displace 4 
2 recv displace 4 
2 recv displace 5 
2 recv displace 5 
0 recv displace 0 
0 recv displace 0 
0 recv displace 0 
0 recv displace 1 
0 recv displace 2 
0 recv displace 2 
0 recv displace 3 
4 recv displace 1 
4 recv displace 2 
4 recv displace 2 
4 recv displace 3 
4 recv displace 3 
4 recv displace 3 
4 recv displace 3 
6 recv displace 0 
6 recv displace 0 
6 recv displace 1 
6 recv displace 2 
6 recv displace 3 
6 recv displace 4 
6 recv displace 5 
rank 5 semi-sorted num 20 
rank 5 semi-sorted num 22 
rank 6 semi-sorted num 26 
rank 6 semi-sorted num 27 
rank 6 semi-sorted num 24 
rank 6 semi-sorted num 27 
rank 6 semi-sorted num 26 
rank 6 semi-sorted num 26 
rank 3 semi-sorted num 12 
rank 3 semi-sorted num 13 
rank 3 semi-sorted num 13 
rank 3 semi-sorted num 14 
rank 4 semi-sorted num 19 
rank 4 semi-sorted num 17 
rank 4 semi-sorted num 18 
rank 4 semi-sorted num 19 
rank 0 semi-sorted num 3 
rank 0 semi-sorted num 2 
rank 0 semi-sorted num 3 
rank 2 semi-sorted num 9 
rank 2 semi-sorted num 10 
rank 2 semi-sorted num 9 
rank 2 semi-sorted num 11 
rank 2 semi-sorted num 8 
rank 2 semi-sorted num 9 
rank 7 semi-sorted num 31 
rank 7 semi-sorted num 28 
rank 7 semi-sorted num 31 
rank 1 semi-sorted num 6 
rank 1 semi-sorted num 7 
rank 1 semi-sorted num 6 
rank 1 semi-sorted num 7 
Before sort: 
7 6 9 19 17 31 10 12 9 13 26 11 18 27 3 6 28 2 20 24 27 8 7 13 22 26 14 3 19 31 9 26 
After sort: 
2 3 3 6 6 7 7 8 9 9 9 10 11 12 13 13 14 17 18 19 19 20 22 24 26 26 26 27 27 28 31 31 

N-Body Problem

The fundamental setting for most, if not all, computational simulation problems:

  • Given a space
  • Given a group of entities whose activities are (often) bounded within this space
  • Given a set of equations that govern how these entities react to one another and to attributes of the containing space
  • Simulate how these reactions impact all entities and the entire space over time
  • Computation requires parallelization
  • Experimental spaces are simulated at massive scale (millions of entities)
  • Individual time steps are significantly smaller than the total simulation time.
  • Time complexity can be reduced by approximating a cluster of distant bodies as a single distant body whose mass is placed at the center of mass of the cluster (a minimal sketch of this reduction follows the list)
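
The approximation replaces a cluster with a single pseudo-body that carries the cluster's total mass and sits at its mass-weighted average position. A minimal 2-D sketch of this reduction; the struct and function names are illustrative assumptions, not part of any particular simulation code.

In [ ]:
#include <stdio.h>

typedef struct {
  double mass;
  double x, y;
} Body;

/* Collapse a cluster of n bodies into one pseudo-body located at the
   cluster's center of mass and carrying the cluster's total mass. */
Body cluster_center_of_mass(const Body *bodies, int n) {
  Body com = {0.0, 0.0, 0.0};
  int i;
  for (i = 0; i < n; i++) {
    com.mass += bodies[i].mass;
    com.x += bodies[i].mass * bodies[i].x;
    com.y += bodies[i].mass * bodies[i].y;
  }
  com.x /= com.mass;
  com.y /= com.mass;
  return com;
}

int main(void) {
  Body cluster[3] = {{1.0, 0.0, 0.0}, {2.0, 3.0, 0.0}, {1.0, 0.0, 4.0}};
  Body com = cluster_center_of_mass(cluster, 3);
  printf("total mass %.1f at (%.2f, %.2f)\n", com.mass, com.x, com.y);
  return 0;
}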

Barnes-Hut Algorithm (2-D)

Start with the whole region: a single square that contains all the bodies (or particles).

  • First, this square is divided into four subregions.
  • If a subregion contains no particles, it is deleted from further consideration.
  • If a subregion contains one body, it is retained.
  • If a subregion contains more than one body, it is recursively divided until every subregion contains one body.
  • Create a quadtree: a tree with up to four edges from each node
  • The leaves represent cells each containing one body.
  • After the tree has been constructed, the total mass and center of mass of its subregion are stored at each node (see the sketch below).
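
The construction above maps naturally onto a recursive node type plus a bottom-up pass that stores the total mass and center of mass of each subregion. A minimal sketch; the node layout and function names are illustrative assumptions, and tree construction itself is omitted.

In [ ]:
#include <stdio.h>

/* Quadtree node for the 2-D case: a leaf holds one body; an internal
   node holds the total mass and center of mass of its subregion. */
typedef struct QuadNode {
  double total_mass;          /* body mass, or total mass of the subregion */
  double com_x, com_y;        /* body position, or center of mass          */
  struct QuadNode *child[4];  /* four subregions; NULL where empty/deleted */
} QuadNode;

static int is_leaf(const QuadNode *n) {
  return !n->child[0] && !n->child[1] && !n->child[2] && !n->child[3];
}

/* Bottom-up pass: store total mass and center of mass at every node. */
void summarize(QuadNode *node) {
  int i;
  if (node == NULL || is_leaf(node)) return; /* leaves already hold one body */
  node->total_mass = node->com_x = node->com_y = 0.0;
  for (i = 0; i < 4; i++) {
    if (!node->child[i]) continue;
    summarize(node->child[i]);
    node->total_mass += node->child[i]->total_mass;
    node->com_x += node->child[i]->total_mass * node->child[i]->com_x;
    node->com_y += node->child[i]->total_mass * node->child[i]->com_y;
  }
  node->com_x /= node->total_mass;
  node->com_y /= node->total_mass;
}

int main(void) {
  /* two bodies in opposite subregions of the root square */
  QuadNode a = {1.0, 0.25, 0.25, {NULL, NULL, NULL, NULL}};
  QuadNode b = {3.0, 0.75, 0.75, {NULL, NULL, NULL, NULL}};
  QuadNode root = {0.0, 0.0, 0.0, {&a, NULL, NULL, &b}};
  summarize(&root);
  printf("total mass %.1f, center of mass (%.3f, %.3f)\n",
         root.total_mass, root.com_x, root.com_y);
  return 0;
}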

Orthogonal Recursive Bisection

  • First, a vertical line is found that divides the area into two areas, each with an equal number of bodies.
  • For each area, a horizontal line is found that divides it into two areas, each with an equal number of bodies.
  • This is repeated as required (a minimal sketch of one bisection step follows).
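
A minimal sketch of a single bisection step: find the vertical line that splits the bodies into two halves with equal counts. The helper names and the use of a full sort to locate the median are illustrative assumptions; a production code would use a faster selection method.

In [ ]:
#include <stdio.h>
#include <stdlib.h>

/* Comparison function for qsort over doubles. */
static int cmp_double(const void *a, const void *b) {
  double da = *(const double *)a, db = *(const double *)b;
  return (da > db) - (da < db);
}

/* One orthogonal-recursive-bisection step: return the x coordinate of the
   vertical line that splits n bodies into two equal halves (n assumed even).
   The next step would split each half with a horizontal line, and so on. */
double vertical_split(double *x, int n) {
  qsort(x, n, sizeof(double), cmp_double);
  return 0.5 * (x[n / 2 - 1] + x[n / 2]);
}

int main(void) {
  double x[8] = {0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6};
  printf("split at x = %.2f\n", vertical_split(x, 8));
  return 0;
}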