Compiling Python Modules to Native Parallel Modules Using Pythran and OpenMP Annotations

  • Serge Guelton, École Normale Supérieure, Paris, France & Télécom Bretagne, Plouzané, France
  • Pierrick Brunet, Télécom Bretagne, Plouzané, France
  • Mehdi Amini, SILKAN INC., Los Altos, USA

Get it - PyHPC release!
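
Pythran is available from PyPI; a minimal installation sketch (assuming a pip-based setup, not executed in this notebook):

In [ ]:
%%sh
pip install --user pythran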


In [2]:
from IPython.display import IFrame
IFrame("http://pythonhosted.org/pythran/", 1000, 500)


Out[2]:
(inline frame rendering http://pythonhosted.org/pythran/)

Introduction to Pythran


In [3]:
%%sh
pythran --help


usage: pythran [-h] [-o OUTPUT_FILE] [-E] [-e] [-f flag] [-v] [-p pass] [-m machine] [-I include_dir] [-L ldflags] [-D macro_definition] [-O level] [-g] input_file

pythran: a python to C++ compiler

positional arguments:
  input_file           the pythran module to compile, either a .py or a .cpp file

optional arguments:
  -h, --help           show this help message and exit
  -o OUTPUT_FILE       path to generated file
  -E                   only run the translator, do not compile
  -e                   similar to -E, but does not generate python glue
  -f flag              any compiler switch relevant to the underlying C++ compiler
  -v                   be verbose
  -p pass              any pythran optimization to apply before code generation
  -m machine           any machine flag relevant to the underlying C++ compiler
  -I include_dir       any include dir relevant to the underlying C++ compiler
  -L ldflags           any search dir relevant to the linker
  -D macro_definition  any macro definition relevant to the underlying C++ compiler
  -O level             any optimization level relevant to the underlying C++ compiler
  -g                   any debug level relevant to the underlying C++ compiler

It's a megablast!

Step 1: Using Pythran to Turn Python Code into C++ Metaprograms


In [4]:
%%file hello_world.py
def hello(s="world"):
    print "hello", s


Overwriting hello_world.py

In [5]:
%%sh
pythran -e hello_world.py -o hello_world.hpp
head -n 20 hello_world.hpp


#include <pythran/pythran.h>
namespace __pythran_hello_world
{
  ;
  ;
  struct hello
  {
    typedef void callable;
    template <typename argument_type0 = core::string>
    struct type
    {
      typedef typename assignable<typename std::remove_cv<typename std::remove_reference<decltype(__builtin__::None)>::type>::type>::type result_type;
    }  
    ;
    template <typename argument_type0 = core::string>
    typename type<argument_type0>::result_type operator()(argument_type0 const & s= core::string("world")) const
    ;
  }  ;
  template <typename argument_type0 >
  typename hello::type<argument_type0>::result_type hello::operator()(argument_type0 const & s) const

In [6]:
%%file hello_world.cpp
#include "hello_world.hpp"
#include <iostream>

using namespace __pythran_hello_world;

int main(int argc, char *argv[])
{
    if (argc == 1)
        hello()();
    else
    {
        std::string msg(argv[1]);
        hello()(msg);
    }
 return 0;
}


Overwriting hello_world.cpp

In [7]:
%%sh
INCLUDES="-I $HOME/.local/lib/python2.7/site-packages -I $HOME/.local/lib/python2.7/site-packages/pythran -I $HOME/.local/lib/python2.7/site-packages/pythran/pythonic++"
clang++ -std=c++11 $INCLUDES hello_world.cpp -o hello_world

In [8]:
%%sh
./hello_world
./hello_world donald


hello world 
hello donald 

Step 2: Using Pythran to Turn Python Code into Native Modules


In [9]:
%%sh
pythran hello_world.py


CRITICAL I am in trouble. Your input file does not seem to match Pythran's constraints...
E: Pythran spec error: no pythran specification

In [10]:
%%sh
sed -i '1 i #pythran export hello(str)' hello_world.py
head -n 3 hello_world.py


#pythran export hello(str)
def hello(s="world"):
    print "hello", s

In [11]:
%%sh
pythran hello_world.py
ls hello_world.so
python -c "import hello_world as hw ; hw.hello('PyHPC')"


hello_world.so
hello PyHPC 
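
The #pythran export comment is the only annotation Pythran needs: it names the function and the argument types to expose to Python. As a sketch (assuming Pythran accepts several export lines for the same function, and using an illustrative file name), the no-argument call relying on the default can be exposed as well:

In [ ]:
%%file hello_overloads.py
#pythran export hello()
#pythran export hello(str)
def hello(s="world"):
    print "hello", s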

Step 3: Running NumPy Code


In [12]:
%%file arc_distance.py
#pythran export arc_distance(float[], float[], float[], float[])
import numpy as np
def arc_distance(theta_1, phi_1, theta_2, phi_2):
    """Calculates the element-wise arc distance between the points (theta_1, phi_1) and (theta_2, phi_2)."""
    temp = (np.sin((theta_2 - theta_1) / 2) ** 2
            + np.cos(theta_1) * np.cos(theta_2) * np.sin((phi_2 - phi_1) / 2) ** 2)
    distance_matrix = 2 * (np.arctan2(np.sqrt(temp), np.sqrt(1 - temp)))
    return distance_matrix


Overwriting arc_distance.py

In [13]:
%pylab inline
import timeit


Populating the interactive namespace from numpy and matplotlib

In [14]:
n = 10000000
t0, p0, t1, p1 = random.random(n), random.random(n), random.random(n), random.random(n)

In [15]:
import arc_distance as ad
%timeit ad.arc_distance(t0, p0, t1, p1)


1 loops, best of 3: 2.15 s per loop

In [16]:
%%sh
pythran arc_distance.py -O3 -o pythran_arc_distance.so
pythran arc_distance.py -O3 -fopenmp -o pythran_fast_arc_distance.so

In [17]:
import pythran_arc_distance as pad
%timeit pad.arc_distance(t0, p0, t1, p1)


1 loops, best of 3: 1.72 s per loop

In [18]:
import pythran_fast_arc_distance as pfad
%timeit pfad.arc_distance(t0, p0, t1, p1)


1 loops, best of 3: 304 ms per loop
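
As a sanity check (a sketch, not part of the original runs), the compiled modules can be compared against the pure NumPy version:

In [ ]:
# hypothetical correctness check: both native builds should match pure NumPy
print np.allclose(ad.arc_distance(t0, p0, t1, p1), pad.arc_distance(t0, p0, t1, p1))
print np.allclose(ad.arc_distance(t0, p0, t1, p1), pfad.arc_distance(t0, p0, t1, p1))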

An Introduction to OpenMP with Pythran


In [19]:
%%file parallel_hello.py
#pythran export parallel_hello()
def parallel_hello():
    #omp parallel
    print "hello"


Overwriting parallel_hello.py

In [20]:
import parallel_hello as ph
ph.parallel_hello()


hello

In [21]:
%%sh
pythran parallel_hello.py -o sequential_hello.so
pythran parallel_hello.py -fopenmp -o real_parallel_hello.so

In [22]:
%%sh
python -c 'import sequential_hello as sh ; sh.parallel_hello()'
echo "*********"
python -c 'import real_parallel_hello as ph ; ph.parallel_hello()'


hello 
*********
hello hellohello 
 

hello 
hello 
hello 
hello 
hello 
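
How many "hello" lines the OpenMP build prints depends on the thread count; it can be pinned with the standard OMP_NUM_THREADS environment variable, e.g. (a sketch, not run above):

In [ ]:
%%sh
OMP_NUM_THREADS=2 python -c 'import real_parallel_hello as ph ; ph.parallel_hello()'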

In [23]:
%load_ext pythranmagic

In [24]:
%%pythran
#pythran export nthreads()
def nthreads():
    import omp
    m = omp.get_num_threads()
    #omp parallel shared(n)
    #omp master
    n = omp.get_num_threads()
    return m, n

In [25]:
nthreads()


Out[25]:
(1, 8)
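
Outside of any parallel region only the master thread runs, hence the 1; inside the omp parallel block the whole team is active, hence the 8 threads reported on this machine.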

Parallelizing Loops with Pythran and OpenMP


In [26]:
%%pythran
#pythran export pi(int)
def pi(n):
    s, step = 0., 1. / (1 + n)
    for i in range(n):
        x = (i - .5) * step
        s += 4. / (1 + x ** 2)
    return step * s

In [27]:
n = 1000000
%timeit pi(n)


100 loops, best of 3: 10 ms per loop

In [28]:
%%pythran
#pythran export pi(int)
def pi(n):
    s, step = 0., 1. / (1 + n)
    #omp parallel for reduction(+:s)
    for i in range(n):
        x = (i - .5) * step
        s += 4. / (1 + x ** 2)
    return step * s

In [29]:
%timeit pi(n)


1000 loops, best of 3: 1.39 ms per loop
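
The reduction(+:s) clause gives each thread its own partial sum and combines them when the loop ends, so the concurrent accumulation into s stays race-free.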

Parallelizing with Tasks


In [30]:
import numpy as np
def nested_loop(a):
    n = a.shape[0]
    b = np.zeros((n,n))
    for i in range(n):
        for j in range(i):
            b[i,j] = np.cos(i * j)
    return b

In [31]:
n = 400
r = random.random(n)
%timeit nested_loop(r)


1 loops, best of 3: 206 ms per loop

In [32]:
%%pythran
#pythran export nested_loop(float [])
import numpy as np
def nested_loop(a):
    n = a.shape[0]
    b = np.zeros((n,n))
    #omp parallel for
    for i in range(n):
        for j in range(i):
            #omp task
            b[i,j] = np.cos(i * j)
    return b

In [33]:
%timeit nested_loop(r)


10 loops, best of 3: 26.9 ms per loop

In [34]:
%%pythran
#pythran export nested_loop(float [])
import numpy as np
def nested_loop(a):
    n = a.shape[0]
    b = np.zeros((n,n))
    #omp parallel
    #omp single
    for i in range(n):
        for j in range(i):
            #omp task
            b[i,j] = np.cos(i * j)
    return b

In [35]:
%timeit nested_loop(r)


10 loops, best of 3: 15.9 ms per loop
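
The first version nests task creation inside an omp parallel for worksharing loop; this second one follows the canonical OpenMP tasking pattern, where a single thread spawns all the tasks while the rest of the team executes them.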

Parallelizing Complex Reductions


In [36]:
import numpy as np
def minmax(arr):
    n, m = arr.shape
    mini, maxi = np.inf, -np.inf
    for i in range(n):
        for j in range(m):
            mini = min(mini, arr[i, j])
            maxi = max(maxi, arr[i, j])
    return mini, maxi

In [37]:
n = 1000
r = random.random((n,n))
minmax(r)


Out[37]:
(5.6717835938968619e-07, 0.99999986590219125)

In [38]:
%timeit minmax(r)


1 loops, best of 3: 680 ms per loop

In [39]:
%%pythran
import numpy as np
#pythran export minmax(float[][])
def minmax(arr):
    n, m = arr.shape
    mini, maxi = np.inf, -np.inf
    #omp parallel private(lmini, lmaxi)
    if 1:
        lmini, lmaxi = np.inf, -np.inf
        #omp for
        for i in range(n):
            for j in range(m):
                lmini = min(lmini, arr[i, j])
                lmaxi = max(lmaxi, arr[i, j])
        #omp critical
        if 1:
            mini = min(lmini, mini)
            maxi = max(lmaxi, maxi)
    return mini, maxi
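
Python has no bare block statement, so the if 1: bodies provide the structured blocks the omp parallel and omp critical pragmas require; each thread reduces into its private lmini/lmaxi copies and merges them into the shared result inside the critical section.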

In [40]:
minmax(r)


Out[40]:
(5.671783593896862e-07, 0.9999998659021913)

In [41]:
%timeit minmax(r)


1000 loops, best of 3: 224 µs per loop

Integration with Iterators


In [43]:
%%pythran
#pythran export enum(str list)
def enum(l):
    out = 0
    #omp parallel for
    for i, v in enumerate(l):
        if len(v) == i:
            #omp atomic
            out += 1
    return out

In [44]:
enum(['Say', 'Hello', 'to', 'PyHPC', 'from', 'Denver'])


Out[44]:
2
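
Only 'to' (index 2) and 'from' (index 4) have a length equal to their index, hence the result of 2; the omp atomic pragma keeps the concurrent increments of out race-free.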

Validation

  • C. Wang, S. Chandrasekaran and B. M. Chapman, "An OpenMP 3.1 Validation Test Suite", in Proceedings of the 8th International Workshop on OpenMP (IWOMP 2012)

In [45]:
IFrame("http://numfocus.github.io/python-benchmarks/", 1000, 1000)


Out[45]:
(inline frame rendering http://numfocus.github.io/python-benchmarks/)

In [46]:
%%pythran bye.py
#pythran export bye(int)
import math
def bye(n=3):
    radii   = [i * n for i in [1, 3, 6]]
    ranges  = [list(range(-r, r+1)) for r in radii]
    squares = [[(x, y) for x in rnge for y in rnge] for rnge in ranges]
    circles = [[(x, y) for x, y in sqrpoints if math.hypot(x, y) <= radius]
               for sqrpoints, radius in zip(squares, radii)]
    m = {(x, y): ' ' for x, y in squares[-1]}
    for x,y in circles[-1]:
        m[x,y] = '*'
    for x,y in circles[-1]:
        if x>0: m[(x,y)] = '.'
    for x,y in circles[-2]:
        m[(x,y+3*n)] = '*'
        m[(x,y-3*n)] = '.'
    for x,y in circles[-3]:
        m[(x,y+3*n)] = '.'
        m[(x,y-3*n)] = '*'
    return '\n'.join(''.join(m[(x,y)] for x in reversed(ranges[-1])) for y in ranges[-1])

In [47]:
print bye(5)


                              .                              
                       .............**                       
                    ..................***                    
                 .......................****                 
                .........................****                
              ............................*****              
            ...............................******            
           ................................*******           
          ..................................*******          
         ...................................********         
        ......................*..............********        
       ....................*******...........*********       
      ....................*********..........**********      
      ....................*********..........**********      
     .....................*********..........***********     
     ....................***********..........**********     
    ......................*********..........************    
   .......................*********..........*************   
   .......................*********..........*************   
   ........................*******...........*************   
  ............................*..............**************  
  ..........................................***************  
  ..........................................***************  
 ..........................................***************** 
 ..........................................***************** 
 .........................................****************** 
 ........................................******************* 
 .......................................******************** 
 .....................................********************** 
 ...................................************************ 
...............................******************************
 ........................*********************************** 
 ......................************************************* 
 ....................*************************************** 
 ...................**************************************** 
 ..................***************************************** 
 .................****************************************** 
 .................****************************************** 
  ...............******************************************  
  ...............******************************************  
  ..............**************.****************************  
   .............***********.......************************   
   .............**********.........***********************   
   .............**********.........***********************   
    ............**********.........**********************    
     ..........**********...........********************     
     ...........**********.........*********************     
      ..........**********.........********************      
      ..........**********.........********************      
       .........***********.......********************       
        ........**************.**********************        
         ........***********************************         
          .......**********************************          
           .......********************************           
            ......*******************************            
              .....****************************              
                ....*************************                
                 ....***********************                 
                    ...******************                    
                       ..*************                       
                              *                              

In [ ]: