In [ ]:
import random
import numpy as np
from matplotlib import pyplot as plt
It can be hard to guess which code will run faster just by looking at it, because the interactions between software and hardware are extremely complex. The best way to optimize code is to use profilers to identify bottlenecks, and then address those specific problems. Let's give it a whirl.
timeit
We will begin our experience with profilers by using the time and timeit commands. time can be run on a program of any size, but it only returns coarse information on how long the whole thing took to run.
There are a lot of small optimizations that can add up to a lot of time in real-world software. Let's look at a few of the non-obvious ones.
Problem 1a
What is the best way to join a bunch of strings into a larger string? There are several ways of doing this, but some are clearly superior to others. Let's use timeit to test things out.
Below, in each of the cells after string_list is defined, put a new code snippet using the following three methods for building a string:
--Use the builtin + operator to add strings together in an iterative way
--Use the join method, as in "".join(list)
--Iteratively add the strings from the list together using "%s %s" string composition
Which method do you guess will be fastest? Now test it out and see if you're right!
In [ ]:
string_list = ['the ', 'quick ', 'brown ', 'fox ', 'jumped ', 'over ', 'the ', 'lazy ', 'dog']
In [ ]:
%%timeit
output = ""
# complete
In [ ]:
%%timeit
# complete
In [ ]:
%%timeit
output = ""
# complete
Interesting! So it appears that the join method was the fastest, by a factor of four or so. Good to keep that in mind for future use of strings!
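For reference, the three approaches can be sketched as follows, using the timeit module directly rather than the %%timeit magic (the "%s%s" pattern is written without the space here so all three build the identical string):

```python
import timeit

string_list = ['the ', 'quick ', 'brown ', 'fox ', 'jumped ', 'over ', 'the ', 'lazy ', 'dog']

def concat_plus(strings):
    # Iterative + concatenation: builds a new string on every step
    output = ""
    for s in strings:
        output += s
    return output

def concat_join(strings):
    # Single pass: join allocates the final string once
    return "".join(strings)

def concat_format(strings):
    # Iterative "%s%s" string composition
    output = ""
    for s in strings:
        output = "%s%s" % (output, s)
    return output

for fn in (concat_plus, concat_join, concat_format):
    t = timeit.timeit(lambda: fn(string_list), number=10000)
    print("%s: %.4f s" % (fn.__name__, t))
```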
Problem 1b
What about building big lists or list-like structures (like numpy arrays)? We now know how to construct lists in a variety of ways, so let's see which is fastest. Make a list of ascending perfect squares (i.e. 1, 4, 9, ...) for the first 1 million integers. Use these methods:
--Iteratively appending x**2 values on to an empty list
--A for loop with the builtin python range command
--A for loop with the numpy arange command
--Using the numpy arange command directly, and then taking the square of the array
--Using map to apply a lambda squaring function to a numpy array constructed with numpy arange
Which method do you guess will be fastest? Now test it out and see if you're right!
In [ ]:
%%timeit
output = []
# complete
In [ ]:
%%timeit
# complete
In [ ]:
%%timeit
# complete
In [ ]:
%%timeit
# complete
In [ ]:
%%timeit
map(lambda x:# complete
Whoa! We were able to see a >100x efficiency increase just by switching these methods slightly! Numpy arrays are awesome, but I'm sort of surprised that the lambda function won compared to native numpy. (One caveat: in Python 3, map returns a lazy iterator, so unless you consume its output, e.g. by wrapping it in list(), you are mostly timing the construction of the iterator rather than the squaring itself.)
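For concreteness, three of the five methods might look like the following sketch (with a smaller n so it runs quickly outside %%timeit; the two for-loop variants are analogous to the first method):

```python
import numpy as np

n = 100_000  # smaller than the 1 million in the problem, to keep the sketch quick

# Method: iteratively appending x**2 values to an empty list
squares_append = []
for x in range(1, n + 1):
    squares_append.append(x**2)

# Method: square a numpy arange directly (vectorized).
# int64 dtype is used so large squares don't overflow on platforms
# where the default integer is 32-bit.
squares_vector = np.arange(1, n + 1, dtype=np.int64)**2

# Method: map a lambda over a numpy arange. Note the list() call,
# which forces the lazy map iterator to actually do the work.
squares_map = list(map(lambda x: x**2, np.arange(1, n + 1, dtype=np.int64)))
```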
cProfile
and line_profiler
Problem 2a
OK, so what about larger programs? Here is a sorting algorithm that I wrote, which contains some inefficiencies. But it is hard to know which of them causes the biggest problems (some actually aren't that big of a deal in the long run). Let's see if we can speed it up. First, copy this code into a file called sort.py. Read through the code to make sure you understand it. Then run it with the time command (e.g. time python sort.py), and write down the total time it took to run.
In [ ]:
# Sort version1
import random

def create_random_list(n_elements):
    """
    Create a list made up of random elements in random order
    """
    random_list = []
    for i in range(n_elements):
        random_list.append(random.random())
    return random_list

def find_minimum_index(random_list):
    """
    Find the index of the minimum value in the list
    """
    # current minimum
    min_value = 1
    i = 0
    # Find minimum in list
    for element in random_list:
        if element < min_value:
            min_value = element
    # Find value that matches minimum
    for element in random_list:
        if element == min_value:
            return i
        i += 1

def sort_list(random_list):
    """
    Sort a list into ascending order
    """
    output_list = []
    for _ in range(len(random_list)):
        i = find_minimum_index(random_list)
        minimum = random_list[i]
        output_list.append(minimum)
        del random_list[i]
    return output_list

if __name__ == '__main__':
    l = create_random_list(10000)
    o = sort_list(l)
Problem 2b
OK, now try running the cProfile module on it to produce some profiling statistics. You can do this by running:
python -m cProfile -o sort.prof sort.py
This will produce an output profile file called sort.prof. You can do a variety of things with sort.prof, but you'll need a few extra programs. First, install pyprof2html with pip install pyprof2html. Then, try:
pyprof2html sort.prof
This will produce an html directory; open the enclosed index.html file to bring it up in your browser. You can see, function by function, what is taking the most time! You can click on the column headers to change how the table is sorted.
Problem 2c
But there are also graphical ways of representing these data effectively. Download snakeviz, another means of viewing your profile data, with pip install snakeviz. Then open up the same file with snakeviz:
snakeviz sort.prof
This should bring up another graphical interface for analyzing the profile data. Switch to icicle mode, and explore the information a bit. Try to figure out where the "hot" sections of the code are. Namely, what is the most expensive function that is running in terms of time?
Problem 2d
OK, so if that's the most expensive function, we had better speed it up. We can investigate line by line how slow/fast things are, but we need another package for that, called line_profiler. Go ahead and install it with pip install line_profiler.
Go back to the source code file, and add a @profile decorator directly above the slow function. line_profiler installs a script called kernprof on your $PATH, which is used with the following format at the command line:
kernprof -l -v your_script your_script_args
Start up kernprof and we'll look at the slow function in our sort program! See if you can find where the slowdown is, based on the amount of time spent on a particular line. Can you fix this line to not be so inefficient?
Hint: Remember the ways we discussed for optimizing code: in-place operations, string concatenations, vectorizing loops, list comprehensions, range vs. arange, lambda functions, etc.
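As one illustration of the kind of fix this process can suggest (a sketch; your profile may point somewhere else first): the two hand-written scans in find_minimum_index can be collapsed into builtins that loop in C rather than in Python:

```python
def find_minimum_index(random_list):
    # min() scans the list once in C; index() then locates that value
    return random_list.index(min(random_list))
```

The behavior matches the original for lists of distinct values, which is effectively guaranteed here since the entries come from random.random().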
Problem 2e
Great! Now repeat these steps to improve your code:
1) Run code with cProfile
2) Record total time it took to run in this iteration.
3) Load up the profiler information in snakeviz or pyprof2html
4) Look for "hot" functions
5) Run kernprof with line_profiler to identify individual lines that may be slow
6) Make a modification to the code trying to address the problem
7) Go back to (1) until you're satisfied.
You should be able to iterate on this until none of your own functions appear in the top 20 list of hot functions; the remaining entries are associated with loading numpy and such. Once that is the case, more overhead is being spent on loading the data than on your actual code--try increasing the number of elements in the sorting array.
Problem 2f
Here is a good test. Make a new code file where you swap out all of the sorting machinery and just run python's native list.sort() function. Profile this, look at the total time spent running, and see how it compares with our version. Notice any differences? What about if you use the np.sort() function?
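A minimal way to set up that comparison is sketched below, using the timeit module; sorted() is used instead of list.sort() so the input isn't consumed between repetitions:

```python
import random
import timeit

import numpy as np

data = [random.random() for _ in range(10000)]

# Time the builtin sort vs. numpy's sort on identical input
t_builtin = timeit.timeit(lambda: sorted(data), number=20)
t_numpy = timeit.timeit(lambda: np.sort(data), number=20)

print("builtin sorted(): %.4f s" % t_builtin)
print("numpy np.sort():  %.4f s" % t_numpy)
```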
Problem 2g
Look at the memory consumption of your optimized code versus the real list.sort() and numpy.sort() functions. You can do this by using memory_profiler. You'll need to install it first with:
pip install memory_profiler
Now you can look at the line-by-line memory consumption of a function, just like you did with line_profiler and kernprof. Again, put the @profile decorator just before the function you want to profile, but this time run:
python -m memory_profiler program.py
Run this on your optimized code, and then on the true python list.sort() and numpy.sort(), and see which takes up the most memory. Why do you think that is?
Problem 3a
Below I have included the MovingGalaxy and Universe code, the challenge problem from Tuesday's session. First, glance over it to make sure you know what it's doing, for the most part. Then profile it and optimize it using the algorithm described above. What is the slowest general part of the runtime?
Hint: If you comment that out, do things speed up?
In [ ]:
from matplotlib import pyplot as plt
import numpy as np
import random

class Galaxy():
    """
    Galaxy class for simply representing a galaxy.
    """
    def __init__(self, total_mass, cold_gas_mass, stellar_mass, age=0):
        self.total_mass = total_mass
        self.cold_gas_mass = cold_gas_mass
        self.stellar_mass = stellar_mass
        self.age = age
        self.SFR = 0
        self.color = 'red'

    def __repr__(self):
        return "Galaxy (m_total = %.1g; m_cold = %.1g; m_stars = %.1g; age = %.1g; SFR = %0.2f)" % \
                (self.total_mass, self.cold_gas_mass, self.stellar_mass, self.age, self.SFR)

class EvolvingGalaxy(Galaxy):
    """
    Galaxy class for representing a galaxy that can evolve over time.
    """
    def current_state(self):
        """
        Return a tuple of the galaxy's total_mass, cold_gas_mass, stellar_mass, age, and SFR
        """
        return (self.total_mass, self.cold_gas_mass, self.stellar_mass, self.age, self.SFR)

    def calculate_star_formation_rate(self):
        """
        Calculate the star formation rate by taking a random number between 0 and 1
        normalized by the galaxy total mass / 1e12;
        Also updates the galaxy's color to blue if SFR > 0.01, otherwise color = red
        """
        self.SFR = random.random() * (self.total_mass / 1e12)
        if self.SFR > 0.01:
            self.color = 'blue'
        else:
            self.color = 'red'

    def accrete_gas_from_IGM(self, time):
        """
        Allow the galaxy to accrete cold gas from the IGM at a variable rate normalized to
        the galaxy's mass
        """
        cold_gas_accreted = random.random() * 0.1 * time * (self.total_mass / 1e12)
        self.cold_gas_mass += cold_gas_accreted
        self.total_mass += cold_gas_accreted

    def form_stars(self, time):
        """
        Form stars according to the current star formation rate and time available
        If insufficient cold gas, then shut off star formation
        """
        if self.cold_gas_mass > self.SFR * time:
            self.cold_gas_mass -= self.SFR * time
            self.stellar_mass += self.SFR * time
        else:
            self.SFR = 0
            self.color = 'red'

    def evolve(self, time):
        """
        Evolve this galaxy forward for a period time
        """
        if random.random() < 0.01:
            self.calculate_star_formation_rate()
        self.accrete_gas_from_IGM(time)
        self.form_stars(time)
        self.age += time

class MovingGalaxy(EvolvingGalaxy):
    """
    This galaxy can move over time in the x,y plane
    """
    def __init__(self, total_mass, cold_gas_mass, stellar_mass, x_position, y_position, x_velocity, y_velocity, idnum, age=0):
        # Use super() to activate the superclass's __init__
        super().__init__(total_mass, cold_gas_mass, stellar_mass)
        self.x_position = x_position
        self.y_position = y_position
        self.x_velocity = x_velocity
        self.y_velocity = y_velocity
        self.idnum = idnum

    def __repr__(self):
        return "Galaxy %i (x = %.0f; y = %.0f)" % (self.idnum, self.x_position, self.y_position)

    def move(self, time):
        """
        Move the galaxy at its current velocity for a period time
        """
        self.x_position += self.x_velocity * time
        self.y_position += self.y_velocity * time

    def calculate_momentum(self):
        """
        Return a tuple of the x and y components of the galaxy's momentum
        """
        return (self.total_mass * self.x_velocity, self.total_mass * self.y_velocity)

    def evolve(self, time):
        self.move(time)
        super().evolve(time)

def distance(galaxy1, galaxy2):
    """
    Calculate the distance between two galaxies in the x,y plane
    """
    x_diff = galaxy1.x_position - galaxy2.x_position
    y_diff = galaxy1.y_position - galaxy2.y_position
    return (x_diff**2 + y_diff**2)**0.5

class Universe():
    """
    A Universe of galaxies that evolve, move, and merge over time
    """
    def __init__(self):
        self.xrange = (0,100)
        self.yrange = (0,100)
        self.galaxies = []
        self.added_galaxies = []
        self.removed_galaxies = []
        self.time = 0

    def __repr__(self):
        out = 'Universe: t=%.2g\n' % self.time
        for galaxy in self.galaxies:
            out = "%s%s\n" % (out, galaxy)
        return out

    def add_galaxy(self, galaxy=None):
        if galaxy is None:
            stellar_mass = 10**(4*random.random()) * 1e6
            cold_gas_mass = 10**(4*random.random()) * 1e6
            total_mass = (cold_gas_mass + stellar_mass)*1e2
            galaxy = MovingGalaxy(total_mass,
                                  cold_gas_mass,
                                  stellar_mass,
                                  x_position=random.random()*100,
                                  y_position=random.random()*100,
                                  x_velocity=random.uniform(-1,1)*1e-7,
                                  y_velocity=random.uniform(-1,1)*1e-7,
                                  idnum=len(self.galaxies))
        self.galaxies.append(galaxy)

    def remove_galaxy(self, galaxy):
        if galaxy in self.galaxies:
            del self.galaxies[self.galaxies.index(galaxy)]

    def evolve(self, time):
        for galaxy in self.galaxies:
            galaxy.evolve(time)
            galaxy.x_position %= 100
            galaxy.y_position %= 100
        self.check_for_mergers()
        for galaxy in self.removed_galaxies:
            self.remove_galaxy(galaxy)
        for galaxy in self.added_galaxies:
            self.add_galaxy(galaxy)
        self.removed_galaxies = []
        self.added_galaxies = []
        self.time += time

    def merge_galaxies(self, galaxy1, galaxy2):
        print('Merging:\n%s\n%s' % (galaxy1, galaxy2))
        x_mom1, y_mom1 = galaxy1.calculate_momentum()
        x_mom2, y_mom2 = galaxy2.calculate_momentum()
        new_total_mass = galaxy1.total_mass + galaxy2.total_mass
        new_galaxy = MovingGalaxy(total_mass = new_total_mass,
                                  cold_gas_mass = galaxy1.cold_gas_mass + galaxy2.cold_gas_mass,
                                  stellar_mass = galaxy1.stellar_mass + galaxy2.stellar_mass,
                                  x_position = galaxy1.x_position,
                                  y_position = galaxy1.y_position,
                                  x_velocity = (x_mom1 + x_mom2) / new_total_mass,
                                  y_velocity = (y_mom1 + y_mom2) / new_total_mass,
                                  idnum = galaxy1.idnum)
        self.added_galaxies.append(new_galaxy)
        self.removed_galaxies.append(galaxy1)
        self.removed_galaxies.append(galaxy2)

    def check_for_mergers(self):
        for i, galaxy1 in enumerate(self.galaxies):
            for j, galaxy2 in enumerate(self.galaxies[i+1:]):
                if distance(galaxy1, galaxy2) <= 2:
                    self.merge_galaxies(galaxy1, galaxy2)

    def plot_state(self, frame_id):
        plt.clf()
        x = [galaxy.x_position for galaxy in self.galaxies]
        y = [galaxy.y_position for galaxy in self.galaxies]
        color = [galaxy.color for galaxy in self.galaxies]
        size = [galaxy.total_mass / 1e9 for galaxy in self.galaxies]
        plt.scatter(x, y, color=color, s=size)
        plt.xlim(self.xrange)
        plt.ylim(self.yrange)
        plt.savefig('frame%04i.png' % frame_id)

if __name__ == '__main__':
    uni = Universe()
    n_timesteps = 2e2
    n_galaxies = 25
    for i in range(n_galaxies):
        uni.add_galaxy()
    for i in range(int(n_timesteps)):
        uni.evolve(2e9/n_timesteps)
        uni.plot_state(i)
Problem 3b
Great! So how are we going to address this slowdown? Instead of generating one plot at a time and later packaging them all into a single movie, why not try doing it all at once using the matplotlib animation module: http://matplotlib.org/api/animation_api.html . See if you can figure out how to do it!
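One possible shape for that is sketched below. This is a standalone sketch rather than a drop-in patch: it random-walks a set of points where the real code would call uni.evolve(), and the Agg backend is assumed so it runs headless.

```python
import random

import matplotlib
matplotlib.use('Agg')  # off-screen rendering; assumes no interactive display
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation

# Stand-in state: in the real code these would come from uni.galaxies
x = [random.random() * 100 for _ in range(25)]
y = [random.random() * 100 for _ in range(25)]

fig, ax = plt.subplots()
scat = ax.scatter(x, y)
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)

def update(frame):
    # Advance the state (the real version would call uni.evolve(dt) here)
    # and update the existing scatter artist in place, instead of
    # clearing and redrawing the whole figure every frame
    for i in range(len(x)):
        x[i] = (x[i] + random.uniform(-1, 1)) % 100
        y[i] = (y[i] + random.uniform(-1, 1)) % 100
    scat.set_offsets(list(zip(x, y)))
    return (scat,)

anim = FuncAnimation(fig, update, frames=50, blit=True)
# To write the movie out in one step (requires ffmpeg or pillow):
# anim.save('universe.mp4', fps=15)
```

Updating a single artist per frame avoids the repeated figure teardown and file I/O of the plt.clf()/plt.savefig() loop in plot_state.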