In [1]:
name = "2019-10-08-speedy-python"
title = "Speeding up python using tools from the standard library"
tags = "basics, optimisation, numpy, multiprocessing"
author = "Callum Rollo"

In [2]:
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

To demonstrate Python's performance, we'll use a short function:


In [3]:
import math
import numpy as np
from datetime import datetime

In [4]:
def cart2pol(x, y):
    r = np.sqrt(x**2 + y**2)    # radial distance from the origin
    phi = np.arctan2(y, x)      # angle from the positive x-axis
    return r, phi

As the name suggests, cart2pol converts a pair of Cartesian coordinates [x, y] to polar coordinates [r, phi].


In [5]:
from IPython.core.display import Image 
Image(url='https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Polar_to_cartesian.svg/1024px-Polar_to_cartesian.svg.png',width=400)


Out[5]:

In [6]:
x = 3
y = 4
r, phi = cart2pol(x, y)
print(r, phi)


5.0 0.9272952180016122

All well and good. However, what if we want to convert a list of Cartesian coordinates to polar coordinates?

We could loop through both lists and perform the conversion for each x-y pair:


In [7]:
def cart2pol_list(list_x, list_y):
    # Prepare empty arrays for the r and phi values
    r = np.empty(len(list_x))
    phi = np.empty(len(list_x))
    
    # Loop through the lists of x and y, calculating the r and phi values
    for i in range(len(list_x)):
        r[i] = np.sqrt(list_x[i]**2 + list_y[i]**2)
        phi[i] = np.arctan2(list_y[i], list_x[i])
    
    return r, phi

In [8]:
x_list = np.sin(np.arange(0,2*np.pi,0.1))
y_list = np.cos(np.arange(0,2*np.pi,0.1))

These coordinates make a circle centered at [0, 0].


In [9]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(6,6))
ax.scatter(x_list, y_list)


Out[9]:
<matplotlib.collections.PathCollection at 0x7fedf547b8d0>

In [10]:
r_list, phi_list = cart2pol_list(x_list, y_list)
print(r_list)
print(phi_list)


[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1.57079633  1.47079633  1.37079633  1.27079633  1.17079633  1.07079633
  0.97079633  0.87079633  0.77079633  0.67079633  0.57079633  0.47079633
  0.37079633  0.27079633  0.17079633  0.07079633 -0.02920367 -0.12920367
 -0.22920367 -0.32920367 -0.42920367 -0.52920367 -0.62920367 -0.72920367
 -0.82920367 -0.92920367 -1.02920367 -1.12920367 -1.22920367 -1.32920367
 -1.42920367 -1.52920367 -1.62920367 -1.72920367 -1.82920367 -1.92920367
 -2.02920367 -2.12920367 -2.22920367 -2.32920367 -2.42920367 -2.52920367
 -2.62920367 -2.72920367 -2.82920367 -2.92920367 -3.02920367 -3.12920367
  3.05398163  2.95398163  2.85398163  2.75398163  2.65398163  2.55398163
  2.45398163  2.35398163  2.25398163  2.15398163  2.05398163  1.95398163
  1.85398163  1.75398163  1.65398163]

This is a bit time-consuming to type out, though. Surely there is a better way to make our functions work for lists of inputs?

Step forward np.vectorize


In [11]:
cart2pol_vec = np.vectorize(cart2pol)

In [12]:
r_list_vec, phi_list_vec = cart2pol_vec(x_list, y_list)

Like magic! We can assure ourselves that these two methods produce the same answers:


In [13]:
print(r_list == r_list_vec)
print(phi_list == phi_list_vec)


[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True]
[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True]
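
Strictly speaking, the elementwise == comparison only passes here because both methods perform exactly the same floating-point operations. When two approaches could differ by rounding error, np.allclose is the more robust check. A minimal sketch:

print(np.allclose(r_list, r_list_vec))      # True
print(np.allclose(phi_list, phi_list_vec))  # True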

But how do they perform?

We can use IPython's %timeit magic to test this:


In [14]:
%timeit cart2pol_list(x_list, y_list)


272 µs ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [15]:
%timeit cart2pol_vec(x_list, y_list)


161 µs ± 9.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It is significantly faster, both to write and at runtime, to use np.vectorize rather than manually looping through lists.
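
A final note on np.vectorize: the NumPy documentation is upfront that it is provided for convenience rather than performance, as it is essentially a for loop under the hood. Because np.sqrt and np.arctan2 are NumPy ufuncs that operate on whole arrays, our original cart2pol already accepts arrays directly, and calling it that way is typically faster still:

r_direct, phi_direct = cart2pol(x_list, y_list)  # the ufuncs broadcast over the whole arrays
print(np.allclose(r_direct, r_list))  # True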


Multiprocessing

Another important consideration when code becomes computationally intensive is multiprocessing. Python normally runs in a single process on one core, so you won't feel the full benefit of your quad-core or greater machine; watch a system monitor while a heavy computation runs and you will see only one core working hard. To demonstrate the effect of multiprocessing we'll need some heftier maths:


In [16]:
def do_maths(start=0, num=10):
    pos = start
    big = 1000 * 1000
    ave = 0
    while pos < num:
        pos += 1
        val = math.sqrt((pos - big) * (pos - big))
        ave += val / num

    return int(ave)

In [17]:
t0 = datetime.now()

do_maths(num=30000000)

dt = datetime.now() - t0
print("Done in {:,.2f} sec.".format(dt.total_seconds()))


Done in 8.93 sec.

In [18]:
import multiprocessing

In [19]:
t0 = datetime.now()

pool = multiprocessing.Pool()
processor_count = multiprocessing.cpu_count()
# processor_count = 2  # we can tell Python to use a specific number of processes if desired

print(f"Computing with {processor_count} processor(s)")
tasks = []
for n in range(1, processor_count + 1):
    task = pool.apply_async(do_maths, (30000000 * (n - 1) / processor_count,
                                      30000000 * n / processor_count))
    
    tasks.append(task)

pool.close()
pool.join()

dt = datetime.now() - t0
print("Done in {:,.2f} sec.".format(dt.total_seconds()))


Computing with 4 processor(s)
Done in 4.05 sec.

Note that you can recover the results stored in the task list with get(). The results come back in the same order in which the tasks were submitted.


In [20]:
for t in tasks:
    print(t.get())


2883333
5125000
5916666
6312500

The structure of a multiprocessing call is:


In [21]:
pool = multiprocessing.Pool() # Make a pool ready to receive tasks
results = [] # empty list for results
for n in range(1, processor_count + 1): # loop to submit a number of tasks
    result = pool.apply_async(function, (arg1, arg2)) # submit a task: a function and a tuple of its arguments
    results.append(result) # keep a handle to this task's pending result

pool.close() # tell the pool that no more tasks are coming
pool.join() # wait for all the tasks to finish

for t in results:
    t.get() # retrieve your results; you could print or assign each result to a variable


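On Python 3 the pool can also be used as a context manager, which tidies up the pool for you. Below is a minimal runnable sketch of the same pattern; square is a stand-in task invented for this illustration, and it is best run as a script so the worker processes can import it:

import multiprocessing

def square(x):  # stand-in task: any picklable function works
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        tasks = [pool.apply_async(square, (n,)) for n in range(4)]
        print([t.get() for t in tasks])  # [0, 1, 4, 9]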

Why can't we multithread in Python?

If you have experience with other programming languages, you may wonder why we can't assign tasks to multiple threads to speed up execution. We are prevented from doing this by the Global Interpreter Lock (GIL). This is a lock on the interpreter which ensures that only one thread can be in a state of execution at any one time. It is essential to protect Python's reference-counting system, which keeps track of all of the objects in memory.

To get around this lock we spawn several processes, each of which has its own instance of the interpreter and its own allocated memory, so they cannot block one another or cause mischief with references. There's a great summary of the GIL on the Real Python website here.

tl;dr multithreading won't speed up your compute-heavy calculations, as only one thread can execute at any one time. Use multiprocessing instead.
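
You can see the GIL in action by repeating the do_maths experiment with threads instead of processes. A sketch using the standard library's concurrent.futures (timings are illustrative and will vary by machine); on CPU-bound work like this, the threaded version takes about as long as the plain single-core run:

from concurrent.futures import ThreadPoolExecutor

t0 = datetime.now()
with ThreadPoolExecutor(max_workers=4) as executor:
    # Split the work as before, but across threads that all share one GIL
    futures = [executor.submit(do_maths, 7500000 * i, 7500000 * (i + 1))
               for i in range(4)]
    results = [f.result() for f in futures]
dt = datetime.now() - t0
print("Done in {:,.2f} sec.".format(dt.total_seconds()))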

Multiprocessing example adapted from Talk Python To Me Training: async techniques


In [22]:
HTML(html)


Out[22]:

This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.