Code Optimization and Multiprocessing

Writing Python code is one thing - writing efficient Python code is quite another. Optimizing your code takes time; if optimizing takes longer than the time it saves at runtime, the effort is a bad investment. However, if you run a specific piece of code on a daily basis, it might well be worth spending more time on optimization.

Code Optimization

You can optimize the runtime of your code by identifying bottlenecks: the passages of your code that dominate its runtime. Consider the following example, in which the integral over a function is computed using the quadrature rule (https://en.wikipedia.org/wiki/Numerical_integration#Quadrature_rules_based_on_interpolating_functions):


In [1]:
import time

def square(x):
    return x**2

def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""

    # calculate individual elements
    integral_elements = [func(a)/2.]
    for k in range(1, n):
        integral_elements.append(func(a+k*float(b-a)/n))
    integral_elements.append(func(b)/2.)
    
    # sum up elements
    result = 0
    for element in integral_elements:
        result += element
        
    # normalize result and return
    return float(b-a)/n*result
    
start = time.time()
print quadrature(square, 0, 1)    
print 'this took', time.time()-start, 's'


0.333333333333
this took 3.80633187294 s

The code works and returns the correct result - but it takes a while. This is partly because we cut the integration interval into 10000000 segments; a smaller number would work, too, but we need this many iterations to make the impact of the different implementations below clearly visible.
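Before changing anything, it can help to profile the code and see where the time actually goes. A minimal sketch using the standard-library cProfile module (the exact figures will differ from machine to machine):

import cProfile

# profile a single call to see which parts dominate the runtime;
# the numbers shown will differ from machine to machine
cProfile.run('quadrature(square, 0, 1)', sort='cumulative')

The output lists, among other things, how often square is called and how much time is spent inside it compared to the surrounding loop.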

Let's see how the runtime changes by changing the code.

Lists

Lists are convenient but can be expensive: every time an element is appended, Python checks whether there is still room in the block of memory currently reserved for the list. If there is not, it allocates a larger block (reserving extra space proportional to the current length) and moves the list there - for large lists, these repeated reallocations and copies become costly.
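The over-allocation can be made visible with sys.getsizeof, which reports the memory footprint of an object; a small sketch (the exact sizes are implementation-dependent):

import sys

data = []
last_size = sys.getsizeof(data)
for i in range(100):
    data.append(i)
    size = sys.getsizeof(data)
    if size != last_size:
        # the size only jumps occasionally: Python reserved extra space in advance
        print('length %d: %d bytes' % (len(data), size))
        last_size = size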

In this case, the list is not necessary at all. Instead of collecting integral_elements, we can accumulate the result in a simple float variable on the fly, which also saves us the second for loop:


In [2]:
def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""

    result = 0
    
    # calculate individual elements and sum them up
    result += func(a)/2.
    for k in range(1, n):
        result += func(a+k*float(b-a)/n)
    result += func(b)/2.
           
    # normalize result and return
    return float(b-a)/n*result
    
start = time.time()
print quadrature(square, 0, 1)    
print 'this took', time.time()-start, 's'


0.333333333333
this took 3.17584586143 s

This is only a little bit faster, since the code still builds a large list: in Python 2, range returns a full list of all counters. Instead of the for loop, let's try a while loop that counts with a plain integer:


In [3]:
def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""

    result = 0
    k = 1
    
    # calculate individual elements and sum them up
    result += func(a)/2.
    while k < n:
        result += func(a+k*float(b-a)/n)
        k += 1
    result += func(b)/2.
           
    # normalize result and return
    return float(b-a)/n*result
    
start = time.time()
print quadrature(square, 0, 1)    
print 'this took', time.time()-start, 's'


0.333333333333
this took 3.14508199692 s

Again, the gain is marginal - at this point the Python-level loop itself, not the list, is the main cost.
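In Python 2, another way to keep the for loop but avoid building the list of counters is xrange, which yields one counter at a time (in Python 3, range already behaves this way). A sketch of the same function:

def quadrature(func, a, b, n=10000000):
    """ quadrature rule, looping with xrange so that no list of counters is built """

    # end points enter with half weight
    result = func(a)/2. + func(b)/2.

    # interior points; xrange yields the counters one at a time (Python 2)
    for k in xrange(1, n):
        result += func(a+k*float(b-a)/n)

    # normalize result and return
    return float(b-a)/n*result

The gain over the while loop is again small, since the loop body still runs in the Python interpreter.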

Numpy Arrays

A good approach to saving runtime is to use numpy functionality wherever possible: it replaces explicit Python loops with operations on whole arrays.


In [4]:
import numpy as np

def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""

    result = 0
        
    # calculate individual elements and sum them up
    result += func(a)/2.
    steps = a + np.arange(1, n, 1)*float(b-a)/n
    result += np.sum(func(steps))
    result += func(b)/2.
           
    # normalize result and return
    return float(b-a)/n*result
    
start = time.time()
print quadrature(square, 0, 1)    
print 'this took', time.time()-start, 's'


0.333333333333
this took 0.158165931702 s

Using numpy makes a huge difference: the loop over the array elements runs in compiled C code rather than in the Python interpreter. Note that this only works because square accepts numpy arrays - the ** operator acts element-wise on an array.
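If your integrand does not accept arrays (for example because it branches on a scalar value), numpy.vectorize can wrap it so that the array-based version above still works; note, however, that the wrapper loops in Python internally and does not restore the C-level speed. A minimal sketch with a made-up function piecewise:

def piecewise(x):
    # scalar-only function: the 'if' would fail on a whole array
    if x < 0.5:
        return x**2
    else:
        return x

piecewise_v = np.vectorize(piecewise)   # accepts and returns arrays

steps = np.linspace(0., 1., 11)
print(piecewise_v(steps))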

Numpy and Scipy Functions

The most important rule for optimizing your code is this: check whether what you want to do is already implemented in numpy, scipy, or some other module. The people writing these modules usually know what they are doing and provide very efficient implementations. For instance, quadrature integration is already available as part of scipy.integrate:


In [5]:
from scipy.integrate import quad

start = time.time()
print quad(square, 0, 1)    
print 'this took', time.time()-start, 's'


(0.33333333333333337, 3.700743415417189e-15)
this took 0.000202894210815 s

Using scipy.integrate.quad is again significantly faster. This is partly because it does not chop the interval into 10000000 segments. Instead, it uses an adaptive algorithm that estimates (and returns) the uncertainty of the integral and stops iterating once that uncertainty falls below a threshold.
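The tolerances of that adaptive algorithm can be set with the epsabs and epsrel keyword arguments of quad; a quick sketch:

from scipy.integrate import quad

# loosen the relative tolerance; the second return value is the error estimate
result, error = quad(square, 0, 1, epsrel=1e-6)
print('integral: %.12f, estimated error: %g' % (result, error))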

General Guideline to Optimize your Code

Follow these rules to make your code run efficiently:

  1. whatever you want to do, check if there is already an existing function available from numpy, scipy, or some other module
  2. minimize the use of lists; use tuples or, better, numpy arrays instead
  3. use numpy functions on arrays - they are designed exactly for that and usually run much faster than loops over lists (see the sketch below)
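As a small illustration of rules 2 and 3, consider summing one million random numbers once with a Python loop over a list and once with the corresponding numpy function (the exact timings depend on your machine):

import time
import numpy as np

values = np.random.rand(1000000)

# summing with a Python loop over a list
start = time.time()
total = 0.
for v in values.tolist():
    total += v
print('python loop: %.4f s' % (time.time()-start))

# summing with the corresponding numpy function
start = time.time()
total = np.sum(values)
print('numpy sum:   %.4f s' % (time.time()-start))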

Multiprocessing

You will often have to deal with problems that require running the exact same code for a number of different input parameters. If each run (i.e., each set of input parameters) is executed sequentially, the runtime of the entire program is the sum of the individual runtimes:


In [6]:
import time
import numpy as np
from scipy.integrate import quad

def task(x):
    """ this simulates a complicated tasks that takes input x and returns some float based on x"""
    time.sleep(1) # simulate that calculating the result takes 1 sec
    return x**2

# an array with input parameters
input = np.random.rand(10)

start = time.time()
results = []
for x in input:
    results.append(task(x))

print results
print 'this took', time.time()-start, 's'


[0.07366964610160269, 0.18143295552251407, 0.62976092819710328, 0.49838579302847913, 0.97411898814538789, 0.1049180110098413, 0.0011164597668119032, 0.31091003837959097, 0.69263899078246582, 0.030961469574162265]
this took 10.010764122 s

The results generated by the different tasks do not depend on each other, i.e., they could in principle all be calculated at the same time. This can be achieved with the multiprocessing module:


In [7]:
from multiprocessing import Pool

pool = Pool()   # create a pool object

start = time.time()

results = pool.map(task, input)   # map the function 'task' on the input data

print results
print 'this took', time.time()-start, 's'


[0.07366964610160269, 0.18143295552251407, 0.62976092819710328, 0.49838579302847913, 0.97411898814538789, 0.1049180110098413, 0.0011164597668119032, 0.31091003837959097, 0.69263899078246582, 0.030961469574162265]
this took 2.00300788879 s

What happens is that the input parameters are pooled and the task function is applied to each of them by a pool of worker processes. By default, Pool() starts as many worker processes as your machine has CPU cores, and the inputs are distributed among them. The individual tasks run in no particular order, but the results list has the same order as the input array.
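The pool size can also be set explicitly; multiprocessing.cpu_count() reports how many cores are available. A minimal sketch, reusing the task function and input array from above:

from multiprocessing import Pool, cpu_count

print('this machine reports %d CPU cores' % cpu_count())

pool = Pool(processes=4)          # use exactly 4 worker processes
results = pool.map(task, input)
pool.close()                      # no more work will be submitted to this pool
pool.join()                       # wait until all workers have finished

print(results)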

The real magic is that the whole script takes only about 2 seconds instead of 10. This is possible because the work is distributed over several worker processes, each running on its own CPU core, so that different tasks run in parallel and the full power of your CPU (or CPUs) is exploited.

Not everything benefits from multiprocessing. For instance, if every task has to access a file on your hard drive, multiprocessing will most likely not improve the runtime of your script. The reason is simply that the hard drive can only serve one read/write request at a time, so all processes end up waiting for each other.