Writing Python code is one thing - writing efficient code is quite another. Optimizing your code takes time; if the optimization takes longer than the runtime it saves, it is a bad investment. However, if you run that specific piece of code on a daily basis, it might be worth spending more time on optimization.
You can optimize the runtime of your code by identifying bottlenecks: the passages of your code that dominate its runtime. Consider the following example, in which the integral over a function is determined using the quadrature rule (https://en.wikipedia.org/wiki/Numerical_integration#Quadrature_rules_based_on_interpolating_functions):
In [1]:
import time

def square(x):
    return x**2

def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""
    # calculate individual elements
    integral_elements = [func(a)/2.]
    for k in range(1, n):
        integral_elements.append(func(a+k*float(b-a)/n))
    integral_elements.append(func(b)/2.)
    # sum up elements
    result = 0
    for element in integral_elements:
        result += element
    # normalize result and return
    return float(b-a)/n*result

start = time.time()
print quadrature(square, 0, 1)
print 'this took', time.time()-start, 's'
The code works and returns the correct result - but it takes a while to do so. This is in part because we cut the integration interval into 10000000 segments; a smaller number would work, too, but we need this many iterations to make the impact of the different implementations clearly visible.
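To locate the bottleneck more systematically than with time.time() stopwatches, you can profile the function; here is a minimal sketch using the built-in cProfile module (the exact report depends on your machine):
import cProfile
# profile the quadrature call; the report lists how much time is spent
# in each function, revealing where the bottleneck sits
cProfile.run('quadrature(square, 0, 1)')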
Let's see how the runtime changes by changing the code.
Lists are useful but memory-hungry: whenever something is appended to a list, Python checks whether there is still room at the list's current location in memory. If the reserved space runs out, the whole list is moved to a new location, over-allocating some extra room for future elements - for large lists, this repeated copying becomes expensive.
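You can observe this over-allocation directly; a minimal sketch using sys.getsizeof (the exact sizes depend on your Python version and platform):
import sys

lst = []
for i in range(10):
    lst.append(i)
    # the allocated size grows in steps, not with every append,
    # because Python reserves extra room for future elements
    print('%2d elements use %3d bytes' % (len(lst), sys.getsizeof(lst)))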
In this case, using a list is not necessary at all. Instead of collecting integral_elements, we can accumulate the sum in the result variable (a simple float) on the fly, which also saves us the second for loop:
In [2]:
def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""
    result = 0
    # calculate individual elements and sum them up
    result += func(a)/2.
    for k in range(1, n):
        result += func(a+k*float(b-a)/n)
    result += func(b)/2.
    # normalize result and return
    return float(b-a)/n*result

start = time.time()
print quadrature(square, 0, 1)
print 'this took', time.time()-start, 's'
This is only a little bit faster, since there is still a list involved: range creates a full list of indices. Instead of using the for loop, let's try a while loop that counts with a plain integer:
In [3]:
def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""
    result = 0
    k = 1
    # calculate individual elements and sum them up
    result += func(a)/2.
    while k < n:
        result += func(a+k*float(b-a)/n)
        k += 1
    result += func(b)/2.
    # normalize result and return
    return float(b-a)/n*result

start = time.time()
print quadrature(square, 0, 1)
print 'this took', time.time()-start, 's'
This avoids building a list, but a Python-level loop is still slow. We can get rid of the loop entirely by turning the sample points into a numpy array and letting numpy do the summing:
In [4]:
import numpy as np

def quadrature(func, a, b, n=10000000):
    """ use the quadrature rule to determine the integral over the function from a to b"""
    result = 0
    # calculate individual elements and sum them up
    result += func(a)/2.
    steps = a + np.arange(1, n, 1)*float(b-a)/n
    result += np.sum(func(steps))
    result += func(b)/2.
    # normalize result and return
    return float(b-a)/n*result

start = time.time()
print quadrature(square, 0, 1)
print 'this took', time.time()-start, 's'
Using numpy makes a huge difference, since numpy calls libraries written in C, which are much faster than pure Python.
The most important rule for optimizing your code is the following: check whether what you want to do is already implemented in numpy, scipy, or some other module. The people writing these modules usually know what they are doing and provide very efficient implementations. For instance, quadrature integration is already available as part of scipy.integrate:
In [5]:
from scipy.integrate import quad
start = time.time()
print quad(square, 0, 1)
print 'this took', time.time()-start, 's'
Using scipy.integrate.quad is again significantly faster. This is in part because it does not split the interval into 10000000 segments. Instead, it uses an adaptive algorithm that estimates (and outputs) the uncertainty of the integral and stops iterating once that uncertainty drops below some threshold.
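A minimal sketch of how to use that: quad returns the integral together with an estimate of the absolute error, and the epsabs and epsrel keyword arguments set the accuracy at which it stops refining (the tolerances chosen here are just for illustration):
# quad returns (integral, estimated absolute error)
result, abserr = quad(square, 0, 1, epsabs=1e-10, epsrel=1e-10)
print('integral: %.10f +- %.2g' % (result, abserr))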
Use the following rules in your coding to make your code run efficiently:
- check whether what you want to do is already implemented in numpy, scipy, or some other module
- avoid lists where possible; use numpy arrays (better) instead
- use numpy functions on arrays - they are especially designed for that and usually run faster than list functions (a short comparison is sketched below)
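As a quick illustration of the last two rules, here is a minimal sketch that compares summing a plain Python list in a loop with np.sum on a numpy array (the exact timings depend on your machine):
import time
import numpy as np

data = list(range(10000000))
arr = np.arange(10000000)

# summing with a plain Python loop over a list
start = time.time()
total = 0
for x in data:
    total += x
print('loop over list: %.3f s' % (time.time()-start))

# summing the numpy array with a numpy function
start = time.time()
total = np.sum(arr)
print('np.sum on array: %.3f s' % (time.time()-start))
Optimizing individual statements is not the only option. If your code repeats the same independent task many times, you can also run those tasks in parallel. The following example simulates such a task: each call takes one second, and it is applied to ten input values one after the other: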
In [6]:
import time
import numpy as np
from scipy.integrate import quad

def task(x):
    """ this simulates a complicated task that takes input x and returns some float based on x"""
    time.sleep(1)  # simulate that calculating the result takes 1 sec
    return x**2

# an array with input parameters
input = np.random.rand(10)

start = time.time()
results = []
for x in input:
    results.append(task(x))
print results
print 'this took', time.time()-start, 's'
The results generated by the different tasks do not rely on each other, i.e., they could in principle all be calculated at the same time. This can be realized using the multiprocessing module:
In [7]:
from multiprocessing import Pool
pool = Pool() # create a pool object
start = time.time()
results = pool.map(task, input) # map the function 'task' on the input data
print results
print 'this took', time.time()-start, 's'
What happens is that the input parameters are pooled and task is applied to each of them by a pool of worker processes. By default, the number of worker processes equals the number of CPU cores of your machine, not the number of input values. The processes run in no specific order, but the results list has the same order as the input array.
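If you want to control the parallelism yourself, you can pass the number of worker processes to Pool explicitly; a minimal sketch, reusing the task function and input array from above (on Windows, this needs to live under an if __name__ == '__main__': guard):
import multiprocessing

# report how many workers the default Pool() would start
print('this machine has %d CPU cores' % multiprocessing.cpu_count())

# explicitly request four worker processes
pool = multiprocessing.Pool(processes=4)
results = pool.map(task, input)
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the workers to finish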
The real magic is that the whole script only takes about 2 seconds instead of 10. This is possible because the work is distributed over several processes that run in parallel on different CPU cores, so the full computational power of your machine is exploited instead of just a single core.
Not everything benefits from being run in parallel, though. For instance, if each task mostly reads from or writes to files on your hard drive, multiprocessing will most likely not improve the runtime of your script. The reason is simply that your hard drive has only one read/write head, which all processes have to share.