Profiling and Optimizing


By C Hummels (Caltech)


In [ ]:
import random
import numpy as np
from matplotlib import pyplot as plt

It can be hard to guess which code will run faster just by looking at it, because the interactions between software and hardware can be extremely complex. The best way to optimize code is to use profilers to identify the bottlenecks and then address those bottlenecks directly. Let's give it a whirl.

Problem 1) Using timeit

We will begin our experience with profilers by using the time and timeit commands. time can be run on a program of any size, but it only returns coarse information on how long the whole thing took to run; timeit, by contrast, runs a small snippet of code many times over to gather reliable timing statistics.
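
For a quick illustration of the distinction (a minimal sketch, not part of the problems below): the %timeit magic times a small Python statement from inside the notebook, while the shell's time command wraps an entire program run.

In [ ]:
# %timeit runs this single statement many times and reports timing statistics
%timeit sum(range(1000))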

There are a lot of small optimizations that can add up to a lot of time in real-world software. Let's look at a few of the non-obvious ones.

Problem 1a

What is the best way to join a bunch of strings into a larger string? There are several ways of doing this, but some are clearly superior to others. Let's use timeit to test things out.

Below, in each of the cells after the string_list is defined, put a new code snippet using the following three methods for building a string:

--Use the built-in + operator to add strings together in an iterative way

--Use the join method, as in "".join(list).

--Iteratively add the strings from the list together using "%s %s" string formatting.

Guess which method you think will be fastest? Now test it out and see if you're right!


In [ ]:
string_list = ['the ', 'quick ', 'brown ', 'fox ', 'jumped ', 'over ', 'the ', 'lazy ', 'dog']

In [ ]:
%%timeit
output = ""
# complete

In [ ]:
%%timeit
# complete

In [ ]:
%%timeit
output = ""
# complete
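
For reference, here is one possible way to fill in the three cells above (a sketch; your exact version may differ):

# first cell: iterative concatenation with the + operator
output = ""
for word in string_list:
    output += word

# second cell: the join method
output = "".join(string_list)

# third cell: iterative "%s %s" string formatting, as in the problem statement
output = ""
for word in string_list:
    output = "%s %s" % (output, word)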

Interesting! So it appears that the join method was the fastest by a factor of four or so. Good to keep that in mind for future use of strings!

Problem 1b

What about building big lists or list-like structures (like numpy arrays)? We now know how to construct lists in a variety of ways, so let's see which is fastest. Make a list of ascending perfect squares (i.e. 1, 4, 9, ...) for the first 1 million integers. Use these methods:

--Iteratively appending x**2 values onto an empty list

--A for loop with the built-in Python range command

--A for loop with the numpy arange command

--Use the numpy arange command directly, and then take the square of it

--Use map to map a lambda squaring function to a numpy array constructed with numpy arange

Guess which method you think will be fastest? Now test it out and see if you're right!


In [ ]:
%%timeit
output = []
# complete

In [ ]:
%%timeit
# complete

In [ ]:
%%timeit
# complete

In [ ]:
%%timeit
# complete

In [ ]:
%%timeit
map(lambda x:# complete
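
For reference, here is one possible reading of the five methods above (a sketch to fill in the cells; your version may differ):

# first cell: iteratively appending squares onto an empty list
output = []
for i in range(1000000):
    output.append(i**2)

# second cell: a comprehension looping over the built-in range
output = [i**2 for i in range(1000000)]

# third cell: a comprehension looping over numpy's arange
output = [i**2 for i in np.arange(1000000)]

# fourth cell: numpy arange directly, squaring the whole array at once
output = np.arange(1000000)**2

# fifth cell: map a lambda squaring function over an arange array
map(lambda x: x**2, np.arange(1000000))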

Whoa! We were able to see a >100x efficiency increase just by switching these methods slightly! Numpy arrays are awesome. One caveat: if the map/lambda version appears to beat native numpy, be suspicious; in Python 3, map returns a lazy iterator, so that cell never actually computes the squares. Wrap the map call in list() and the vectorized numpy expression should come out on top.

Problem 2) Deeper profiling with cProfile and line_profiler

Problem 2a

OK, so what about larger programs? Here is a sorting algorithm that I wrote, which has some inefficiencies, but it is hard to tell by inspection which ones are causing the biggest slowdowns (some actually aren't that big of a deal in the long run). Let's see if we can speed it up. First, copy this code into a file called sort.py. Read through the code to make sure you understand it. Then run it with the time command, and write down the total time it took to run.


In [ ]:
# Sort version1

import random

def create_random_list(n_elements):
    """
    Create a list made up of random elements in random order
    """
    random_list = []
    for i in range(n_elements):
        random_list.append(random.random())
    return random_list

def find_minimum_index(random_list):
    """
    Find the index of the minimum value in the list
    """
    # current minimum
    min_value = 1
    i = 0

    # Find minimum in list
    for element in random_list:
        if element < min_value:
            min_value = element

    # Find value that matches minimum
    for element in random_list:
        if element == min_value:
            return i
        i += 1

def sort_list(random_list):
    """
    Sort a list into ascending order
    """
    output_list = []
    for _ in range(len(random_list)):
        i = find_minimum_index(random_list)
        minimum = random_list[i]
        output_list.append(minimum)
        del random_list[i]
    return output_list

if __name__ == '__main__':
    l = create_random_list(10000)
    o = sort_list(l)
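
Once the code is saved as sort.py, timing the whole run from the shell looks like:

time python sort.py

The "real" number reported is the total wall-clock time.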

Problem 2b

OK, now try running the cProfile module with it in order to produce some profiling statistics. You can do this by running:

python -m cProfile -o sort.prof sort.py

This will produce an output profile file called sort.prof. You can do a variety of things with sort.prof, but you'll need a few programs to do this. First, install pyprof2html with: pip install pyprof2html. Then, try:

pyprof2html sort.prof

This will produce an html directory; open up the enclosed index.html file in your browser. You can see, function by function, what is taking the most time! Click on the column headers to change how the table is sorted.

Problem 2c

There are also graphical ways of representing these data effectively. Install snakeviz, another tool for viewing profile data, with pip install snakeviz, and then open up the same file with snakeviz:

snakeviz sort.prof

This should bring up another graphical interface for analyzing the profile data. Switch to icicle mode, and explore the information a bit. Try to figure out where the "hot" sections of the code are. Namely, what is the most expensive function that is running in terms of time?

Problem 2d

OK, so if that's the most expensive, we better speed it up. We can investigate line-by-line how slow/fast things are, but we need another package for that called line_profiler. Go ahead and install this with pip install line_profiler.

Go back to the source code file and add a @profile line directly above the slow function. line_profiler installs an executable called kernprof (kernprof.py in older versions) on your $PATH, which is used with the following format at the command line:

kernprof -v -l your_script your_script_args

Start up kernprof and we'll look at the slow function in our sort program! See if you can find where the slowdown is, based on the amount of time spent on a particular line. Can you fix this line to not be so inefficient?
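
For example, if the profile points you at find_minimum_index, the decoration looks something like this (a sketch; decorate whichever function your own profile flags as slow). Note that the profile name is injected by kernprof at run time, so there is nothing extra to import:

# in sort.py
@profile
def find_minimum_index(random_list):
    ...

and then, at the command line:

kernprof -v -l sort.py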

Hint: Remember the ways we discussed for optimizing code: In-place operations, string concatenations, vectorizing loops, list comprehensions, range vs arange, lambda functions, etc.
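
To make the hint concrete, here is one possible direction (a sketch, not necessarily the only slow spot you will find): the original find_minimum_index walks the list twice, once to find the minimum value and once to find its index, and the second pass can be eliminated by tracking the index during the first pass.

def find_minimum_index(random_list):
    """
    Find the index of the minimum value in the list in a single pass
    """
    min_value = random_list[0]   # start from the first element rather than assuming values < 1
    min_index = 0
    for i, element in enumerate(random_list):
        if element < min_value:
            min_value = element
            min_index = i
    return min_index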

Problem 2e

Great! Now repeat these steps to improve your code:

1) Run code with cProfile

2) Record total time it took to run in this iteration.

3) Load up the profiler information in snakeviz or pyprof2html

4) Look for "hot" functions

5) Run kernprof with line_profiler to identify individual lines that may be slow

6) Make a modification to the code trying to address the problem

7) Go back to (1) until you're satisfied.

You should be able to iterate on this until none of the functions you wrote appears in the top 20 list of hot functions; the remaining entries will be associated with importing numpy and similar overhead. If that is the case, more time is being spent on startup overhead than on your actual code, so try increasing the number of elements in the sorting array.

Problem 2f

Here is a good test. Make a new code file where you swap out all of the sorting machinery and just run Python's built-in list.sort() method. Profile this, look at the total time spent running, and see how it compares with our version. Notice any differences? What about if you use the np.sort() function?
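
A sketch of what such a comparison file might look like (sort_builtin.py is just a placeholder name):

# sort_builtin.py -- same random data, sorted with the built-ins instead
import random
import numpy as np

def create_random_list(n_elements):
    return [random.random() for _ in range(n_elements)]

if __name__ == '__main__':
    l = create_random_list(10000)
    l.sort()                  # Python's built-in Timsort, implemented in C
    a = np.array(create_random_list(10000))
    sorted_a = np.sort(a)     # numpy's sort; returns a sorted copy of the array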

Problem 2g

Look at the memory consumption of your optimized code versus the real list.sort() and numpy.sort() functions. You can do this by using the memory_profiler. You'll need to download it first with:

pip install memory_profiler

Now you can look at the line-by-line memory consumption of a function, just like you did with line_profiler and kernprof. Again, you have to put the @profile decorator directly above the function you want to profile, but then you run:

python -m memory_profiler program.py

Run this on your optimized code, and then on the built-in Python list.sort() and on numpy.sort(), and see which takes up the most memory. Why do you think that is?
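
As with line_profiler, the profile decorator name is supplied for you when the script is run under the profiler, so the setup looks something like this (a sketch using the sort.py names from above):

# in sort.py (or your comparison file): decorate the function whose memory you want tracked
@profile
def sort_list(random_list):
    ...

and then run:

python -m memory_profiler sort.py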

Problem 3) Profiling real code

Problem 3a

Below I have included the Moving Galaxy and Universe code, the challenge problem from Tuesday's session. First, glance over it to make sure you know roughly what it's doing. Then profile it and optimize it using the workflow described above. What is the slowest part of the runtime overall?

Hint: If you comment that out, do things speed up?


In [ ]:
from matplotlib import pyplot as plt
import numpy as np
import random

class Galaxy():
    """
    Galaxy class for simply representing a galaxy.
    """
    def __init__(self, total_mass, cold_gas_mass, stellar_mass, age=0):
        self.total_mass = total_mass
        self.cold_gas_mass = cold_gas_mass
        self.stellar_mass = stellar_mass
        self.age = age
        self.SFR = 0
        self.color = 'red'
        
    def __repr__(self):
        return "Galaxy (m_total = %.1g; m_cold = %.1g; m_stars = %.1g; age = %.1g; SFR = %0.2f)" % \
                (self.total_mass, self.cold_gas_mass, self.stellar_mass, self.age, self.SFR)
        
class EvolvingGalaxy(Galaxy):
    """
    Galaxy class for representing a galaxy that can evolve over time.
    """
    def current_state(self):
        """
        Return a tuple of the galaxy's total_mass, cold_gas_mass, stellar_mass, age, and SFR
        """
        return (self.total_mass, self.cold_gas_mass, self.stellar_mass, self.age, self.SFR)
    
    def calculate_star_formation_rate(self):
        """
        Calculate the star formation rate by taking a random number between 0 and 1 
        normalized by the galaxy total mass / 1e12; 
        
        Also updates the galaxy's color to blue if SFR > 0.01, otherwise color = red
        """
        self.SFR = random.random() * (self.total_mass / 1e12)
        if self.SFR > 0.01: 
            self.color = 'blue'
        else:
            self.color = 'red'
            
    def accrete_gas_from_IGM(self, time):
        """
        Allow the galaxy to accrete cold gas from the IGM at a variable rate normalized to
        the galaxy's mass
        """
        cold_gas_accreted = random.random() * 0.1 * time * (self.total_mass / 1e12)
        self.cold_gas_mass += cold_gas_accreted
        self.total_mass += cold_gas_accreted
        
    def form_stars(self, time):
        """
        Form stars according to the current star formation rate and time available
        If there is not enough cold gas available, then shut off star formation
        """
        if self.cold_gas_mass > self.SFR * time:
            self.cold_gas_mass -= self.SFR * time
            self.stellar_mass += self.SFR * time
        else:
            self.SFR = 0
            self.color = 'red'
            
    def evolve(self, time):
        """
        Evolve this galaxy forward for a period of time
        """
        if random.random() < 0.01:
            self.calculate_star_formation_rate()
        self.accrete_gas_from_IGM(time)
        self.form_stars(time)
        self.age += time       
        
class MovingGalaxy(EvolvingGalaxy):
    """
    This galaxy can move over time in the x,y plane
    """
    def __init__(self, total_mass, cold_gas_mass, stellar_mass, x_position, y_position, x_velocity, y_velocity, idnum, age=0):
        
        # Use super() to call the parent class's __init__ and set up the shared attributes
        super().__init__(total_mass, cold_gas_mass, stellar_mass)
        
        self.x_position = x_position
        self.y_position = y_position
        self.x_velocity = x_velocity
        self.y_velocity = y_velocity
        self.idnum = idnum
        
    def __repr__(self):
        return "Galaxy %i (x = %.0f; y = %.0f)" % (self.idnum, self.x_position, self.y_position)
        
    def move(self, time):
        """
        Move the galaxy in the x,y plane according to its velocity over a period of time
        """
        self.x_position += self.x_velocity * time
        self.y_position += self.y_velocity * time
        
    def calculate_momentum(self):
        return (self.total_mass * self.x_velocity, self.total_mass * self.y_velocity)

    def evolve(self, time):
        self.move(time)
        super().evolve(time)
        
def distance(galaxy1, galaxy2):
    x_diff = galaxy1.x_position - galaxy2.x_position
    y_diff = galaxy1.y_position - galaxy2.y_position
    return (x_diff**2 + y_diff**2)**0.5

class Universe():
    """
    A periodic 2D universe containing galaxies that can move, evolve, and merge
    """
    def __init__(self):
        self.xrange = (0,100)
        self.yrange = (0,100)
        self.galaxies = []
        self.added_galaxies = []
        self.removed_galaxies = []
        self.time = 0
    
    def __repr__(self):
        out = 'Universe: t=%.2g\n' % self.time
        for galaxy in self.galaxies:
            out = "%s%s\n" % (out, galaxy)
        return out
        
    def add_galaxy(self, galaxy=None):
        if galaxy is None:
            stellar_mass = 10**(4*random.random()) * 1e6
            cold_gas_mass = 10**(4*random.random()) * 1e6
            total_mass = (cold_gas_mass + stellar_mass)*1e2
            galaxy = MovingGalaxy(total_mass,
                                  cold_gas_mass,
                                  stellar_mass,
                                  x_position=random.random()*100,
                                  y_position=random.random()*100,
                                  x_velocity=random.uniform(-1,1)*1e-7,
                                  y_velocity=random.uniform(-1,1)*1e-7,
                                  idnum=len(self.galaxies))
        self.galaxies.append(galaxy)
        
    def remove_galaxy(self, galaxy):
        if galaxy in self.galaxies:
            del self.galaxies[self.galaxies.index(galaxy)]
        
    def evolve(self, time):
        for galaxy in self.galaxies:
            galaxy.evolve(time)
            galaxy.x_position %= 100
            galaxy.y_position %= 100
        self.check_for_mergers()
        for galaxy in self.removed_galaxies:
            self.remove_galaxy(galaxy)
        for galaxy in self.added_galaxies:
            self.add_galaxy(galaxy)
        self.removed_galaxies = []
        self.added_galaxies = []
        self.time += time
            
    def merge_galaxies(self, galaxy1, galaxy2):
        print('Merging:\n%s\n%s' % (galaxy1, galaxy2))
        x_mom1, y_mom1 = galaxy1.calculate_momentum()
        x_mom2, y_mom2 = galaxy2.calculate_momentum()
        new_total_mass = galaxy1.total_mass + galaxy2.total_mass
        new_galaxy = MovingGalaxy(total_mass = new_total_mass,
                                  cold_gas_mass = galaxy1.cold_gas_mass + galaxy2.cold_gas_mass,
                                  stellar_mass = galaxy1.stellar_mass + galaxy2.stellar_mass,
                                  x_position = galaxy1.x_position,
                                  y_position = galaxy1.y_position,
                                  x_velocity = (x_mom1 + x_mom2) / new_total_mass,
                                  y_velocity = (y_mom1 + y_mom2) / new_total_mass,
                                  idnum = galaxy1.idnum)
        self.added_galaxies.append(new_galaxy)
        self.removed_galaxies.append(galaxy1)
        self.removed_galaxies.append(galaxy2)
        
    def check_for_mergers(self):
        for i, galaxy1 in enumerate(self.galaxies):
            for j, galaxy2 in enumerate(self.galaxies[i+1:]):
                if distance(galaxy1, galaxy2) <= 2:
                    self.merge_galaxies(galaxy1, galaxy2)
                
    def plot_state(self, frame_id):
        plt.clf()
        x = [galaxy.x_position for galaxy in self.galaxies]
        y = [galaxy.y_position for galaxy in self.galaxies]
        color = [galaxy.color for galaxy in self.galaxies]
        size = [galaxy.total_mass / 1e9 for galaxy in self.galaxies]
        plt.scatter(x,y, color=color, s=size)
        plt.xlim(self.xrange)
        plt.ylim(self.yrange)
        plt.savefig('frame%04i.png' % frame_id)

if __name__ == '__main__':
    uni = Universe()
    n_timesteps = 2e2
    n_galaxies = 25
    for i in range(n_galaxies):
        uni.add_galaxy()

    for i in range(int(n_timesteps)):
        uni.evolve(2e9/n_timesteps)
        uni.plot_state(i)
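
To profile it, copy the code into a file (say universe.py; the name is just a placeholder) and run the same workflow as before, for example:

python -m cProfile -o universe.prof universe.py
snakeviz universe.prof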

Problem 3b

Great! So how are we going to address this slowdown? Instead of generating one plot at a time and later packaging them all into a single movie, why not try doing it all at once using the matplotlib animation module (http://matplotlib.org/api/animation_api.html)? See if you can figure out how to do it!
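
A minimal sketch of what that might look like, assuming the Universe class defined above (FuncAnimation calls an update function once per frame and writes the movie in one pass; saving to mp4 requires ffmpeg):

from matplotlib import animation, pyplot as plt

uni = Universe()
for _ in range(25):
    uni.add_galaxy()

fig, ax = plt.subplots()

def update(frame):
    # advance the universe by one timestep, then redraw the scatter plot
    uni.evolve(2e9 / 200)
    ax.clear()
    ax.set_xlim(uni.xrange)
    ax.set_ylim(uni.yrange)
    x = [g.x_position for g in uni.galaxies]
    y = [g.y_position for g in uni.galaxies]
    colors = [g.color for g in uni.galaxies]
    sizes = [g.total_mass / 1e9 for g in uni.galaxies]
    ax.scatter(x, y, color=colors, s=sizes)

anim = animation.FuncAnimation(fig, update, frames=200, interval=50)
anim.save('universe.mp4')   # or plt.show() for an interactive window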