Parallelizing Common Sorting Algorithms: Transposition and Merge Sort

Linh B. Ngo

CPSC 3620

Bubble Sort: Transposition Sort

  • Largest number is moved to the end of the list by a series of compare-and-exchange operations, starting at the opposite end

  • Actions are repeated with subsequent numbers, stopping just before the previously positioned number

  • In this way, larger numbers move ("bubble") toward one end (a short sequential sketch follows below)

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
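
A minimal sequential sketch of the bubbling passes described above (plain Python, illustrative only; the function and variable names are our own, not from the MPI examples below):

def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):                        # after pass i, the i-th largest value is in place
        for j in range(n - 1 - i):                # stop just before the previously positioned numbers
            if a[j] > a[j + 1]:                   # compare ...
                a[j], a[j + 1] = a[j + 1], a[j]   # ... and exchange
    return a

print(bubble_sort([9, 2, 15, 0, 5, 1, 14, 6]))    # [0, 1, 2, 5, 6, 9, 14, 15]
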
  • Transposition Sort (also known as odd-even transposition sort) is a variation of Bubble Sort
  • Operates in two alternating phases, even and odd
  • Even phase: Even-numbered processes exchange numbers with their right neighbors
  • Odd phase: Odd-numbered processes exchange numbers with their right neighbors (see the sequential sketch after this list)
Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
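
Before the MPI version, here is a minimal sequential sketch of the two alternating phases, with each list index playing the role of one process (illustrative only):

def odd_even_transposition_sort(a):
    n = len(a)
    for phase in range(n):                          # n phases are sufficient to sort n numbers
        start = 0 if phase % 2 == 0 else 1          # even phase pairs (0,1),(2,3),...; odd phase pairs (1,2),(3,4),...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]     # exchange with the right neighbor
    return a

print(odd_even_transposition_sort([9, 2, 15, 0, 5, 1, 14, 6]))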

In [3]:
%%writefile codes/mpi4py/transposition.py
#!/usr/bin/env python
# transposition.py
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank(); size = comm.Get_size(); status = MPI.Status();
N = 16
unsorted = np.zeros(N, dtype="int")
final_sorted = np.zeros(N, dtype="int")
local_array = np.zeros(int(N / size), dtype="int")
local_tmp = np.zeros(int(N / size), dtype="int")
local_remain = np.zeros(2 * int(N / size), dtype="int")

if rank == 0:
    unsorted = np.random.randint(low=0,high=N,size=N)
    print (unsorted)
comm.Scatter(unsorted, local_array, root = 0)

local_array.sort()
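# each rank has sorted its own block; now run `size` phases of odd-even
# transposition, pairing with one neighbor per phase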
for step in range(0, size):
    print ("Step: ", step)
    if (step % 2 == 0):
        if (rank % 2 == 0):
            des = rank + 1
        else:
            des = rank - 1
    else:
        if (rank % 2 == 0):
            des = rank - 1
        else:
            des = rank + 1
            
    # ranks at either end of the line have no partner in some phases and skip the exchange
    if (des >= 0 and des < size):
        print ("My rank is ", rank, " and my des is ", des)
        comm.Send(local_array, dest = des, tag = 0)
        comm.Recv(local_tmp, source = des)    
        print ("Rank ", rank, " ", step, ": Initial local_array: ", local_array)
        print ("Rank ", rank, " ", step, ": Received local_tmp:", local_tmp)
        local_remain = np.concatenate((local_array, local_tmp), axis=0)
        local_remain.sort()
        
        # the lower-ranked partner keeps the smaller half; the higher-ranked keeps the larger half
        if (rank < des):
            local_array = local_remain[0:int(N/size)]
        else:
            local_array = local_remain[int(N/size):2 * int(N/size)]
        print ("Rank ", rank, " ", step, ": Retained portions: ", local_array)
comm.Gather(local_array, final_sorted)

if (rank  == 0):
    print (final_sorted)


Overwriting codes/mpi4py/transposition.py

In [1]:
!chmod 755 codes/mpi4py/transposition.py
!module load gcc/5.3.0 openmpi/1.10.3; mpirun -np 4 codes/mpi4py/transposition.py


--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries, 
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with 
--mca mpi_cuda_support 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------
[ 9  9  9  2  2  1 15  2  0  5  1 14 14  2  5  6]
Step:  0
My rank is  0  and my des is  1
Rank  0   0 : Initial local_array:  [2 9 9 9]
Rank  0   0 : Received local_tmp: [ 1  2  2 15]
Rank  0   0 : Retained portions:  [1 2 2 2]
Step:  1
Step:  2
My rank is  0  and my des is  1
Step:  0
My rank is  1  and my des is  0
Rank  1   0 : Initial local_array:  [ 1  2  2 15]
Rank  1   0 : Received local_tmp: [2 9 9 9]
Rank  1   0 : Retained portions:  [ 9  9  9 15]
Step:  1
My rank is  1  and my des is  2
Rank  1   1 : Initial local_array:  [ 9  9  9 15]
Rank  1   1 : Received local_tmp: [0 1 2 5]
Rank  1   1 : Retained portions:  [0 1 2 5]
Step:  2
My rank is  1  and my des is  0
Step:  0
My rank is  2  and my des is  3
Rank  2   0 : Initial local_array:  [ 0  1  5 14]
Rank  2   0 : Received local_tmp: [ 2  5  6 14]
Rank  2   0 : Retained portions:  [0 1 2 5]
Step:  1
My rank is  2  and my des is  1
Rank  2   1 : Initial local_array:  [0 1 2 5]
Rank  2   1 : Received local_tmp: [ 9  9  9 15]
Rank  2   1 : Retained portions:  [ 9  9  9 15]
Step:  2
My rank is  2  and my des is  3
Rank  2   2 : Initial local_array:  [ 9  9  9 15]
Rank  2   2 : Received local_tmp: [ 5  6 14 14]
Step:  0
My rank is  3  and my des is  2
Rank  3   0 : Initial local_array:  [ 2  5  6 14]
Rank  3   0 : Received local_tmp: [ 0  1  5 14]
Rank  3   0 : Retained portions:  [ 5  6 14 14]
Step:  1
Step:  2
My rank is  3  and my des is  2
Rank  3   2 : Initial local_array:  [ 5  6 14 14]
Rank  3   2 : Received local_tmp: [ 9  9  9 15]
Rank  3   2 : Retained portions:  [ 9 14 14 15]
Step:  3
Rank  0   2 : Initial local_array:  [1 2 2 2]
Rank  0   2 : Received local_tmp: [0 1 2 5]
Rank  0   2 : Retained portions:  [0 1 1 2]
Step:  3
[ 0  1  1  2  2  2  2  5  5  6  9  9  9 14 14 15]
Rank  1   2 : Initial local_array:  [0 1 2 5]
Rank  1   2 : Received local_tmp: [1 2 2 2]
Rank  1   2 : Retained portions:  [2 2 2 5]
Step:  3
My rank is  1  and my des is  2
Rank  1   3 : Initial local_array:  [2 2 2 5]
Rank  1   3 : Received local_tmp: [5 6 9 9]
Rank  1   3 : Retained portions:  [2 2 2 5]
Rank  2   2 : Retained portions:  [5 6 9 9]
Step:  3
My rank is  2  and my des is  1
Rank  2   3 : Initial local_array:  [5 6 9 9]
Rank  2   3 : Received local_tmp: [2 2 2 5]
Rank  2   3 : Retained portions:  [5 6 9 9]
[node0305.palmetto.clemson.edu:22617] 3 more processes have sent help message help-mpi-common-cuda.txt / dlopen failed
[node0305.palmetto.clemson.edu:22617] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Merge Sort

  • A classical sequential sorting algorithm using a divide-and-conquer approach

  • The unsorted list is first divided in half. Each half is again divided in two; this continues until individual numbers are obtained.

  • Pairs of numbers are then combined (merged) into sorted lists of two numbers.

  • Pairs of these lists are merged into sorted lists of four numbers, then eight numbers, and so on.

  • This continues until a single fully sorted list is obtained (a short sequential sketch follows below).

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
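
For reference, a minimal sequential sketch of the recursion described above (illustrative only; the MPI version below parallelizes the merging steps across ranks):

def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):         # repeatedly take the smaller head element
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]            # append whatever remains in either list

def merge_sort(a):
    if len(a) <= 1:                                 # an individual number is already sorted
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([3, 14, 8, 8, 2, 10, 5, 7]))       # [2, 3, 5, 7, 8, 8, 10, 14]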

In [5]:
%%writefile codes/mpi4py/merge.py
#!/usr/bin/env python
# merge.py
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank(); size = comm.Get_size(); status = MPI.Status();
N = 16
unsorted = np.zeros(N, dtype="int")
final_sorted = np.zeros(N, dtype="int")
local_array = np.zeros(int(N / size), dtype="int")
local_tmp = np.zeros(int(N / size), dtype="int")
local_remain = np.zeros(2 * int(N / size), dtype="int")

if rank == 0:
    unsorted = np.random.randint(low=0,high=N,size=N)
    print (unsorted)
comm.Scatter(unsorted, local_array, root = 0)

local_array.sort()
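# tree reduction: in each step the upper half of the active ranks send their sorted
# blocks to partners in the lower half, which merge them; rank 0 ends with the full list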

step = size // 2   # integer division: step is also used as a rank offset in Send/Recv
print ("Rank: ", rank)
while (step >= 1):
    if (rank >= step and rank < step * 2):
        # upper half of the active ranks: send the sorted block down to the partner
        comm.Send(local_array, dest = rank - step, tag = 0)
    elif (rank < step):
        # lower half: receive the partner's block and merge it with the local one
        local_tmp = np.zeros(local_array.size, dtype="int")
        local_remain = np.zeros(2 * local_array.size, dtype="int")
        comm.Recv(local_tmp, source = rank + step, tag = 0)
        i = 0 #local_array counter
        j = 0 # local_tmp counter
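        # two-way merge of the local block and the received block into local_remain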
        for k in range (0, 2 * local_array.size):
            if (i >= local_array.size):
                local_remain[k] = local_tmp[j]
                j += 1
            elif (j >= local_array.size):
                local_remain[k] = local_array[i]
                i += 1
            elif (local_array[i] > local_tmp[j]):
                local_remain[k] = local_tmp[j]
                j += 1
            else:
                local_remain[k] = local_array[i]
                i += 1        
        print ("Step: ", step)
        print ("local array: ", local_array)
        print ("local tmp: ", local_tmp)
        print ("local remain: ", local_remain)
        # the merged block becomes this rank's local_array for the next step
        local_array = local_remain
    step = step // 2
    
# after the final step only rank 0 holds the fully merged list
if (rank == 0):
    print (local_array)


Writing codes/mpi4py/merge.py

In [5]:
!chmod 755 codes/mpi4py/merge.py
!module load gcc/5.3.0 openmpi/1.10.3; mpirun -np 4 codes/mpi4py/merge.py


[ 3 14  8  8  8  2  8 10 14  5  3 10  7 11  7 10]
Rank:  0
Step:  2.0
local array:  [ 3  8  8 14]
Rank:  1
Step:  2.0
Rank:  2
Rank:  3
local tmp:  [ 3  5 10 14]
local remain:  [ 3  3  5  8  8 10 14 14]
Step:  1.0
local array:  [ 3  3  5  8  8 10 14 14]
local tmp:  [ 2  7  7  8  8 10 10 11]
local array:  [ 2  8  8 10]
local tmp:  [ 7  7 10 11]
local remain:  [ 2  7  7  8  8 10 10 11]
local remain:  [ 2  3  3  5  7  7  8  8  8  8 10 10 10 11 14 14]
[ 2  3  3  5  7  7  8  8  8  8 10 10 10 11 14 14]

In [ ]: