
Install pp: pip install pp

Web page: http://www.parallelpython.com/


pp is one of the many libraries available for parallel programming.

The default Python interpreter (CPython) was designed with simplicity in mind and relies on a thread-safety mechanism, the so-called “GIL” (Global Interpreter Lock). To prevent conflicts between threads, the GIL lets only one thread execute Python bytecode at a time, so CPU-bound code effectively runs serially even when it is split across several threads.

Depending on the application, the two common approaches to parallel programming are to run code in multiple threads or in multiple processes. If we submit “jobs” to different threads, those jobs can be pictured as “sub-tasks” of a single process, and the threads will usually have access to the same memory areas (i.e., shared memory). This approach can easily lead to conflicts if synchronization is improper, for example, if two threads write to the same memory location at the same time.
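To make the shared-memory point concrete, here is a minimal sketch using only the standard `threading` module (it is not part of pp). The function name `run_counter` and its parameters are made up for this illustration. Several threads update one shared counter, and a lock keeps the updates from interleaving.

```python
import threading


def run_counter(n_threads=4, increments=10000):
    """Increment one shared counter from several threads, guarded by a lock."""
    state = {"count": 0}      # shared memory: every thread sees this dict
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Without the lock, the read-modify-write below could interleave
            # between threads and some updates could be lost.
            with lock:
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print(run_counter())
```

With the lock in place the final count is always `n_threads * increments`. Separate processes avoid this class of problem because each one works on its own copy of the data.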

Threads run into the GIL problem, i.e., for my needs they are not efficient :(
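A rough way to see the effect is to time the same pure-Python loop run twice in a row and then in two threads. This is only a sketch: the function names are invented here, and the timings depend on the machine and the interpreter. On a standard CPython build the threaded run usually takes about as long as the serial one, because only one thread executes bytecode at a time.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every step."""
    while n > 0:
        n -= 1


def time_serial(n, repeats):
    """Run count_down `repeats` times one after another and time it."""
    start = time.time()
    for _ in range(repeats):
        count_down(n)
    return time.time() - start


def time_threaded(n, repeats):
    """Run count_down in `repeats` threads at once and time it."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    n = 2000000
    print("serial:   %.2f s" % time_serial(n, 2))
    print("threaded: %.2f s" % time_threaded(n, 2))
```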

Parallel Python uses the available cores by running jobs in separate worker processes, i.e., it can run on 2, 3, ..., n CPUs.

For more advanced needs, please also check joblib: http://pythonhosted.org/joblib/

Brute-force approach from a bash script for 4 CPUs (8 inputs, run in two batches of 4):

python my_code.py input1 &

python my_code.py input2 &

python my_code.py input3 &

python my_code.py input4 &

wait

python my_code.py input5 &

python my_code.py input6 &

python my_code.py input7 &

python my_code.py input8 &

wait
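The same batching idea can be written in Python with the standard `subprocess` module. This is a sketch, not part of pp: `chunked` and `run_in_batches` are names invented here, and `my_code.py` with `input1` to `input8` are the placeholders from the script above.

```python
import subprocess
import sys


def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_batches(commands, batch_size=4):
    """Start `batch_size` commands at once, wait for all of them, then go on."""
    return_codes = []
    for batch in chunked(commands, batch_size):
        procs = [subprocess.Popen(cmd) for cmd in batch]   # like `cmd &`
        return_codes.extend(p.wait() for p in procs)       # like `wait`
    return return_codes


if __name__ == "__main__":
    # Placeholder script and input names, as in the bash version.
    inputs = ["input%d" % i for i in range(1, 9)]
    commands = [[sys.executable, "my_code.py", name] for name in inputs]
    print(run_in_batches(commands, batch_size=4))
```

Like the bash version, this waits for the slowest job of each batch before starting the next batch, so some cores may sit idle. A job server that hands out work as soon as a worker is free avoids that.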

Quick start:

Start the pp execution server:

import pp

job_server = pp.Server()

Submit jobs for parallel execution:

f1 = job_server.submit(func1, args1, depfuncs1, modules1)

f2 = job_server.submit(func1, args2, depfuncs1, modules1)

f3 = job_server.submit(func2, args3, depfuncs2, modules2)

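where:

func1 = the function to execute

args1 = tuple with the arguments passed to func1

depfuncs1 = tuple with the functions that func1 calls (its dependencies)

modules1 = tuple with the names of the modules that func1 needs imported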

Retrieve the results (each call waits until that job has finished):

r1=f1()

r2=f2()

r3=f3()



In [3]:

    
##################################
#  Example 1 Parallel python
#################################
import pp
import math, sys, time

def sum_all(n):
    """Return the sum of the integers from 1 to n - 1."""
    return sum(xrange(1, n))

# Set up the parallelization

# Tuple of all Parallel Python servers (nodes) to connect to.
# It can stay empty when everything runs on a single computer.
ppservers = ()

# Edit the number of CPUs (local workers) here
ncpus = 4

# Or use the default, which autodetects the number of CPUs:
#job_server = pp.Server()

job_server = pp.Server(ncpus, ppservers=ppservers)

print "Starting pp with", job_server.get_ncpus(), "workers"

# Start the timer
start_time = time.time()

# The following submits 10 jobs and then retrieves the results.
# The inputs can be listed explicitly:
#Numeros = (1000000, 1001000, 1000200, 1003000, 1004000, 1005000, 1006000, 1007000, 10000, 120000)
# or generated from a range:

Numeros = list(range(10000000, 10000010))

print Numeros

# Submit one job per input; each job() call below waits for that job's result
jobs = [(n, job_server.submit(sum_all, (n,))) for n in Numeros]
for n, job in jobs:
    print "Sum of Numbers", n, "is", job()

print "Time elapsed: ", time.time() - start_time, "s"
job_server.print_stats()

Starting pp with 4 workers
[10000000, 10000001, 10000002, 10000003, 10000004, 10000005, 10000006, 10000007, 10000008, 10000009]
Sum of Numbers 10000000 is 49999995000000
Sum of Numbers 10000001 is 50000005000000
Sum of Numbers 10000002 is 50000015000001
Sum of Numbers 10000003 is 50000025000003
Sum of Numbers 10000004 is 50000035000006
Sum of Numbers 10000005 is 50000045000010
Sum of Numbers 10000006 is 50000055000015
Sum of Numbers 10000007 is 50000065000021
Sum of Numbers 10000008 is 50000075000028
Sum of Numbers 10000009 is 50000085000036
Time elapsed:  3.4311311245 s
Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
        10 |        100.00 |      12.1383 |     1.213835 | local
Time elapsed since server creation 3.43221092224
0 active tasks, 4 cores
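The sums in this output can be cross-checked without pp, since the sum of the integers from 1 to n - 1 has the closed form n(n - 1)/2. The helper name `sum_below` is made up for this check. The two reported values are copied from the output above.

```python
def sum_below(n):
    """Closed form for 1 + 2 + ... + (n - 1), the value sum_all(n) returns."""
    return n * (n - 1) // 2


# First and last values copied from the output above
reported = {10000000: 49999995000000, 10000009: 50000085000036}

for n in sorted(reported):
    assert sum_below(n) == reported[n]
    print("%d -> %d" % (n, sum_below(n)))
```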



In [6]:

    
##################################
#  Example 2 Parallel python
#################################
import pp
import math, sys, time

# Primality test used by sum_primes
def isprime(n):
    """Returns True if n is prime and False otherwise"""
    if not isinstance(n, int):
        raise TypeError("argument passed to isprime is not of 'int' type")
    if n < 2:
        return False
    if n == 2:
        return True
    # Trial division: it is enough to test divisors up to sqrt(n)
    limit = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= limit:
        if n % i == 0:
            return False
        i += 1
    return True

def sum_primes(n):
    """Calculates sum of all primes below given integer n"""
    return sum([x for x in xrange(2,n) if isprime(x)])

start_time = time.time()

# tuple of all parallel python servers to connect with
ppservers = ()

# Edit the number of CPUs here
#ncpus = 1
#job_server = pp.Server(ncpus, ppservers=ppservers)

# Or autodetect the number of CPUs
job_server = pp.Server(ncpus='autodetect', ppservers=ppservers)


print "Starting pp with", job_server.get_ncpus(), "workers"


#Numeros = (1,0,1,100000, 100100, 100200, 100300, 100400, 100500, 100600, 100700,10000,120000)
Numeros = range(1000000, 1000010)

# Submit one job per input; sum_primes depends on isprime and on the math module
jobs = [(n, job_server.submit(sum_primes, (n,), (isprime,), ("math",))) for n in Numeros]
for n, job in jobs:
    print "Sum of primes below", n, "is", job()

print "Time elapsed: ", time.time() - start_time, "s"
job_server.print_stats()

Starting pp with 4 workers
Sum of primes below 1000000 is 37550402023
Sum of primes below 1000001 is 37550402023
Sum of primes below 1000002 is 37550402023
Sum of primes below 1000003 is 37550402023
Sum of primes below 1000004 is 37551402026
Sum of primes below 1000005 is 37551402026
Sum of primes below 1000006 is 37551402026
Sum of primes below 1000007 is 37551402026
Sum of primes below 1000008 is 37551402026
Sum of primes below 1000009 is 37551402026
Time elapsed:  56.0738520622 s
Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
        10 |        100.00 |     200.5862 |    20.058619 | local
Time elapsed since server creation 55.8320801258
0 active tasks, 4 cores
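The statistics give a rough idea of how well the 4 workers were used. The sketch below assumes that "job time sum" is the summed run time of the individual jobs, as its label suggests. Dividing it by the elapsed wall-clock time then estimates how many workers were busy on average. The function name is made up here, and the figures are copied from the two outputs in this notebook.

```python
def average_busy_workers(job_time_sum, elapsed):
    """Summed job time divided by wall-clock time: average concurrency."""
    return job_time_sum / float(elapsed)


# (job time sum, time elapsed) copied from the outputs of both examples
runs = {
    "Example 1": (12.1383, 3.4311311245),
    "Example 2": (200.5862, 56.0738520622),
}

for name in sorted(runs):
    job_time_sum, elapsed = runs[name]
    busy = average_busy_workers(job_time_sum, elapsed)
    print("%s: about %.2f of 4 workers busy on average" % (name, busy))
```

Both runs come out a little above 3.5, which is close to the 4 available cores. With 10 jobs on 4 workers, the last round of jobs cannot keep every worker occupied, which is one likely reason the figure stays below 4.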
