Parallelizing Common Sorting Algorithms: Transposition and Merge Sort

Linh B. Ngo

CPSC 3620

Bubble Sort: Transposition Sort

  • Largest number is moved to the end of the list by a series of compare-and-exchange operations, starting at the opposite end

  • Actions are repeated with subsequent numbers, stopping just before the previously positioned number

  • In this way, larger numbers move ("bubble") toward one end (a short sequential sketch follows below)

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
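
A minimal sequential sketch of the bubbling passes described above (plain Python, illustrative only; the function and variable names are our own, not from the MPI examples below):

def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):                        # after pass i, the i-th largest value is in place
        for j in range(n - 1 - i):                # stop just before the previously positioned numbers
            if a[j] > a[j + 1]:                   # compare ...
                a[j], a[j + 1] = a[j + 1], a[j]   # ... and exchange
    return a

print(bubble_sort([9, 2, 15, 0, 5, 1, 14, 6]))    # [0, 1, 2, 5, 6, 9, 14, 15]
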
  • Transposition Sort (also known as odd-even transposition sort) is a variation of Bubble Sort
  • Operates in two alternating phases, even and odd
  • Even phase: Even-numbered processes exchange numbers with their right neighbors
  • Odd phase: Odd-numbered processes exchange numbers with their right neighbors (see the sequential sketch after this list)
Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
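
Before the MPI version, here is a minimal sequential sketch of the two alternating phases, with each list index playing the role of one process (illustrative only):

def odd_even_transposition_sort(a):
    n = len(a)
    for phase in range(n):                          # n phases are sufficient to sort n numbers
        start = 0 if phase % 2 == 0 else 1          # even phase pairs (0,1),(2,3),...; odd phase pairs (1,2),(3,4),...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]     # exchange with the right neighbor
    return a

print(odd_even_transposition_sort([9, 2, 15, 0, 5, 1, 14, 6]))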

In [3]:
%%writefile codes/mpi4py/transposition.py
#!/usr/bin/env python
# transposition.py
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank(); size = comm.Get_size(); status = MPI.Status();
N = 16
unsorted = np.zeros(N, dtype="int")
final_sorted = np.zeros(N, dtype="int")
local_array = np.zeros(int(N / size), dtype="int")
local_tmp = np.zeros(int(N / size), dtype="int")
local_remain = np.zeros(2 * int(N / size), dtype="int")

if rank == 0:
    unsorted = np.random.randint(low=0,high=N,size=N)
    print (unsorted)
comm.Scatter(unsorted, local_array, root = 0)

local_array.sort()
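# each rank has sorted its own block; now run `size` phases of odd-even
# transposition, pairing with one neighbor per phase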
for step in range(0, size):
    print ("Step: ", step)
    if (step % 2 == 0):
        if (rank % 2 == 0):
            des = rank + 1
        else:
            des = rank - 1
    else:
        if (rank % 2 == 0):
            des = rank - 1
        else:
            des = rank + 1
            
    # ranks at either end of the line have no partner in some phases and skip the exchange
    if (des >= 0 and des < size):
        print ("My rank is ", rank, " and my des is ", des)
        comm.Send(local_array, dest = des, tag = 0)
        comm.Recv(local_tmp, source = des)    
        print ("Rank ", rank, " ", step, ": Initial local_array: ", local_array)
        print ("Rank ", rank, " ", step, ": Received local_tmp:", local_tmp)
        local_remain = np.concatenate((local_array, local_tmp), axis=0)
        local_remain.sort()
        
        # the lower-ranked partner keeps the smaller half; the higher-ranked keeps the larger half
        if (rank < des):
            local_array = local_remain[0:int(N/size)]
        else:
            local_array = local_remain[int(N/size):2 * int(N/size)]
        print ("Rank ", rank, " ", step, ": Retained portions: ", local_array)
comm.Gather(local_array, final_sorted)

if (rank  == 0):
    print (final_sorted)


Overwriting codes/mpi4py/transposition.py

In [1]:
!chmod 755 codes/mpi4py/transposition.py
!module load gcc/5.3.0 openmpi/1.10.3; mpirun -np 4 codes/mpi4py/transposition.py


--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries, 
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with 
--mca mpi_cuda_support 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------
[ 9  9  9  2  2  1 15  2  0  5  1 14 14  2  5  6]
Step:  0
My rank is  0  and my des is  1
Rank  0   0 : Initial local_array:  [2 9 9 9]
Rank  0   0 : Received local_tmp: [ 1  2  2 15]
Rank  0   0 : Retained portions:  [1 2 2 2]
Step:  1
Step:  2
My rank is  0  and my des is  1
Step:  0
My rank is  1  and my des is  0
Rank  1   0 : Initial local_array:  [ 1  2  2 15]
Rank  1   0 : Received local_tmp: [2 9 9 9]
Rank  1   0 : Retained portions:  [ 9  9  9 15]
Step:  1
My rank is  1  and my des is  2
Rank  1   1 : Initial local_array:  [ 9  9  9 15]
Rank  1   1 : Received local_tmp: [0 1 2 5]
Rank  1   1 : Retained portions:  [0 1 2 5]
Step:  2
My rank is  1  and my des is  0
Step:  0
My rank is  2  and my des is  3
Rank  2   0 : Initial local_array:  [ 0  1  5 14]
Rank  2   0 : Received local_tmp: [ 2  5  6 14]
Rank  2   0 : Retained portions:  [0 1 2 5]
Step:  1
My rank is  2  and my des is  1
Rank  2   1 : Initial local_array:  [0 1 2 5]
Rank  2   1 : Received local_tmp: [ 9  9  9 15]
Rank  2   1 : Retained portions:  [ 9  9  9 15]
Step:  2
My rank is  2  and my des is  3
Rank  2   2 : Initial local_array:  [ 9  9  9 15]
Rank  2   2 : Received local_tmp: [ 5  6 14 14]
Step:  0
My rank is  3  and my des is  2
Rank  3   0 : Initial local_array:  [ 2  5  6 14]
Rank  3   0 : Received local_tmp: [ 0  1  5 14]
Rank  3   0 : Retained portions:  [ 5  6 14 14]
Step:  1
Step:  2
My rank is  3  and my des is  2
Rank  3   2 : Initial local_array:  [ 5  6 14 14]
Rank  3   2 : Received local_tmp: [ 9  9  9 15]
Rank  3   2 : Retained portions:  [ 9 14 14 15]
Step:  3
Rank  0   2 : Initial local_array:  [1 2 2 2]
Rank  0   2 : Received local_tmp: [0 1 2 5]
Rank  0   2 : Retained portions:  [0 1 1 2]
Step:  3
[ 0  1  1  2  2  2  2  5  5  6  9  9  9 14 14 15]
Rank  1   2 : Initial local_array:  [0 1 2 5]
Rank  1   2 : Received local_tmp: [1 2 2 2]
Rank  1   2 : Retained portions:  [2 2 2 5]
Step:  3
My rank is  1  and my des is  2
Rank  1   3 : Initial local_array:  [2 2 2 5]
Rank  1   3 : Received local_tmp: [5 6 9 9]
Rank  1   3 : Retained portions:  [2 2 2 5]
Rank  2   2 : Retained portions:  [5 6 9 9]
Step:  3
My rank is  2  and my des is  1
Rank  2   3 : Initial local_array:  [5 6 9 9]
Rank  2   3 : Received local_tmp: [2 2 2 5]
Rank  2   3 : Retained portions:  [5 6 9 9]
[node0305.palmetto.clemson.edu:22617] 3 more processes have sent help message help-mpi-common-cuda.txt / dlopen failed
[node0305.palmetto.clemson.edu:22617] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Merge Sort

  • A classical sequential sorting algorithm using a divide-and-conquer approach

  • The unsorted list is first divided in half. Each half is again divided in two; this continues until individual numbers are obtained.

  • Pairs of numbers are then combined (merged) into sorted lists of two numbers.

  • Pairs of these lists are merged into sorted lists of four numbers, then eight numbers, and so on.

  • This continues until a single fully sorted list is obtained (a short sequential sketch follows below).

Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003.
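
For reference, a minimal sequential sketch of the recursion described above (illustrative only; the MPI version below parallelizes the merging steps across ranks):

def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):         # repeatedly take the smaller head element
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]            # append whatever remains in either list

def merge_sort(a):
    if len(a) <= 1:                                 # an individual number is already sorted
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([3, 14, 8, 8, 2, 10, 5, 7]))       # [2, 3, 5, 7, 8, 8, 10, 14]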

In [5]:
%%writefile codes/mpi4py/merge.py
#!/usr/bin/env python
# merge.py
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank(); size = comm.Get_size(); status = MPI.Status();
N = 16
unsorted = np.zeros(N, dtype="int")
final_sorted = np.zeros(N, dtype="int")
local_array = np.zeros(int(N / size), dtype="int")
local_tmp = np.zeros(int(N / size), dtype="int")
local_remain = np.zeros(2 * int(N / size), dtype="int")

if rank == 0:
    unsorted = np.random.randint(low=0,high=N,size=N)
    print (unsorted)
comm.Scatter(unsorted, local_array, root = 0)

local_array.sort()
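# tree reduction: in each step the upper half of the active ranks send their sorted
# blocks to partners in the lower half, which merge them; rank 0 ends with the full list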

step = size // 2   # integer division: step is also used as a rank offset in Send/Recv
print ("Rank: ", rank)
while (step >= 1):
    if (rank >= step and rank < step * 2):
        # upper half of the active ranks: send the sorted block down to the partner
        comm.Send(local_array, dest = rank - step, tag = 0)
    elif (rank < step):
        # lower half: receive the partner's block and merge it with the local one
        local_tmp = np.zeros(local_array.size, dtype="int")
        local_remain = np.zeros(2 * local_array.size, dtype="int")
        comm.Recv(local_tmp, source = rank + step, tag = 0)
        i = 0 #local_array counter
        j = 0 # local_tmp counter
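        # two-way merge of the local block and the received block into local_remain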
        for k in range (0, 2 * local_array.size):
            if (i >= local_array.size):
                local_remain[k] = local_tmp[j]
                j += 1
            elif (j >= local_array.size):
                local_remain[k] = local_array[i]
                i += 1
            elif (local_array[i] > local_tmp[j]):
                local_remain[k] = local_tmp[j]
                j += 1
            else:
                local_remain[k] = local_array[i]
                i += 1        
        print ("Step: ", step)
        print ("local array: ", local_array)
        print ("local tmp: ", local_tmp)
        print ("local remain: ", local_remain)
        # the merged block becomes this rank's local_array for the next step
        local_array = local_remain
    step = step // 2
    
# after the final step only rank 0 holds the fully merged list
if (rank == 0):
    print (local_array)


Writing codes/mpi4py/merge.py

In [5]:
!chmod 755 codes/mpi4py/merge.py
!module load gcc/5.3.0 openmpi/1.10.3; mpirun -np 4 codes/mpi4py/merge.py


[ 3 14  8  8  8  2  8 10 14  5  3 10  7 11  7 10]
Rank:  0
Step:  2.0
local array:  [ 3  8  8 14]
Rank:  1
Step:  2.0
Rank:  2
Rank:  3
local tmp:  [ 3  5 10 14]
local remain:  [ 3  3  5  8  8 10 14 14]
Step:  1.0
local array:  [ 3  3  5  8  8 10 14 14]
local tmp:  [ 2  7  7  8  8 10 10 11]
local array:  [ 2  8  8 10]
local tmp:  [ 7  7 10 11]
local remain:  [ 2  7  7  8  8 10 10 11]
local remain:  [ 2  3  3  5  7  7  8  8  8  8 10 10 10 11 14 14]
[ 2  3  3  5  7  7  8  8  8  8 10 10 10 11 14 14]

In [ ]: