Functions That We Will Use for the Exercise

Euler showed that $\sum_{i=1}^{\infty} \frac{1}{i^2} = \frac{\pi^2}{6}$, so truncating the sum at $n$ terms gives an approximation of $\pi$:

$$\pi \approx \sqrt{6 \sum_{i=1}^{n} \frac{1}{i^2}}$$

For example, if $n=5$, then

$$\pi \approx \sqrt{6 \left( \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} \right)} = 2.9634$$

Whereas if $n=100$, then $\pi \approx 3.1321$, as the output of the next cell shows.


In [1]:
import numpy as np
from numpy import pi

def approx_pi(n=1000000):
    s = 0.0
    for k in range(1, n+1):
        s += 1.0 / k**2
    return (6.0 * s) ** 0.5

print approx_pi(100)
print approx_pi(1000)
print approx_pi(10000)
print approx_pi(100000)
print approx_pi(1000000)


3.13207653181
3.14063805621
3.14149716395
3.14158310433
3.14159169866

Timing Function Calls in IPython

IPython provides a magic function to compute the time taken to execute a function.


In [2]:
def f1(n):
    s = 0.0
    for k in range(1, n+1):
        s += k
    return s

%timeit f1(100000)


100 loops, best of 3: 10.7 ms per loop

In [3]:
def f2(n):
    s = 0.0
    for k in xrange(1, n+1):
        s += k
    return s

%timeit f2(100000)


100 loops, best of 3: 8.43 ms per loop

In [4]:
n = 100000
%timeit sum(range(1, n+1))


100 loops, best of 3: 2.01 ms per loop

In [5]:
n = 100000
a = np.array(range(1, n+1), dtype=float)
# Note: this times Python's built-in sum(), not np.sum()
%timeit sum(a)


100 loops, best of 3: 9.73 ms per loop

In [6]:
%timeit approx_pi(10000)


1000 loops, best of 3: 1.73 ms per loop

In [16]:
%timeit approx_pi(1000000)


1 loops, best of 3: 1.55 s per loop

In [7]:
def pi2(n=1000000):
    s = 0.0
    for k in xrange(1, n+1):
        s += 1.0 / k**2
    return (6.0 * s) ** 0.5

%timeit pi2(1000000)


10 loops, best of 3: 160 ms per loop

Optimizing Code

Optimizing a program is an attempt to improve its performance, usually its execution time. Before attempting to optimize your program, be sure that it actually requires optimizing: benchmark it and assess its performance first.

Donald Knuth famously remarked:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Premature optimization describes the situation where a programmer lets performance considerations drive the design of a program too early. The result is often a design that is less clean than it could have been, because the code is complicated by the optimization and the programmer is distracted by the focus on performance.

Optimization can be carried out at different levels:

  • Design Level: At the highest level, the choice of an efficient algorithm yields the largest performance gains
  • Source Code Level: Avoiding poor-quality code improves performance, although some optimizations come at the cost of readability and maintainability. Compilers can perform some, but not all, source-level optimizations automatically when appropriate switches are given on the command line (a Python source-level rewrite is sketched after this list)
  • Run Level: Just-in-time (JIT) compilers and assemblers may perform run-time optimizations beyond the capability of static compilers by dynamically adjusting to the actual input or other run-time factors
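
As a concrete source-code-level example in the spirit of this notebook, the explicit loop in approx_pi() can be replaced by a vectorized NumPy expression. This is a sketch only; the name approx_pi_np is not part of the original notebook, and it assumes numpy has been imported as np, as in cell In [1].

def approx_pi_np(n=1000000):
    # Vectorized Basel sum: no explicit Python-level loop
    k = np.arange(1, n + 1, dtype=float)
    return np.sqrt(6.0 * np.sum(1.0 / k**2))

It can be timed with %timeit in the same way as the cells above and compared with pi2().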

Before beginning to optimize a program, do the following:

  • Ensure that it uses an efficient algorithm appropriate to the circumstances under which it is being developed and will be used
  • Thoroughly test and debug the program, preferably with a suite of automated tests (a minimal sketch follows this list)
  • Benchmark the performance of the program on a wide variety of input data, using a suitable profiling mechanism
  • From the profiling data, identify the parts of the program that, when optimized, will substantially improve its performance
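
A minimal sketch of such an automated test and crude benchmark for approx_pi() from cell In [1]; the function names and the 1e-3 tolerance are illustrative choices, not part of the original notebook.

import math
import timeit

def test_approx_pi():
    # Correctness check: the approximation should be close to math.pi for large n
    assert abs(approx_pi(1000000) - math.pi) < 1e-3

def benchmark_approx_pi():
    # Time approx_pi() for a range of input sizes
    for n in (10**3, 10**4, 10**5):
        t = timeit.timeit(lambda: approx_pi(n), number=3)
        print 'n = %7d: %.4f s for 3 runs' % (n, t)

test_approx_pi()
benchmark_approx_pi()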

Profiling Python Programs

Profiling is the process by which it is possible to gather statistics regarding the number of times a function is called, the amount of time the program spends in each function and other similar data. Analysis of profiling data helps in identifying which parts of the program need to be optimized.

Python provides two modules for profiling Python programs.

  • cProfile is a C extension with only moderate overhead, making it suitable for profiling long-running programs
  • profile is a pure Python module which is slower than cProfile, but offers the advantage that it can be extended using inheritance

Both profile and cProfile export the same interface, so they are interchangeable. That is, replacing profile with cProfile, or vice versa does not require any changes to the program being profiled.
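
For example, the pure Python profiler can be dropped in with nothing more than a change of module name; a minimal sketch, assuming approx_pi() from above is already defined:

import profile

# Same interface as cProfile.run(); only the module differs
profile.run('approx_pi(100000)')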

A program can be profiled in one of two ways:

  • Including profiling code in a driver program
  • Profiling from the command line

Including Profiling Code in a Driver Program


In [8]:
# filename: calc_pi.py

def recip_square(i):
    return 1.0 / i**2

def approx_pi(n=10000000):
    s = 0
    for k in range(1, n+1):
        s += recip_square(k)
    return (6 * s) ** 0.5

# Driver Program. The lines appearing below could also be written in a separate file
# If driver program is separate, it must import the file containing the above two functions
if __name__ == '__main__':
    # Import Python modules required for profiling and statistics gathering
    import cProfile

    cProfile.run('approx_pi()')
#    cProfile.runctx("approx_pi()", globals(), locals(), "Profile.prof")

#    s = pstats.Stats("Profile.prof")
#    s.strip_dirs().sort_stats("time").print_stats()


         10000004 function calls in 4.174 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    1.701    0.000    1.701    0.000 <ipython-input-8-f194cb5b1112>:3(recip_square)
        1    2.346    2.346    4.174    4.174 <ipython-input-8-f194cb5b1112>:6(approx_pi)
        1    0.000    0.000    4.174    4.174 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.126    0.126    0.126    0.126 {range}


If the results of a profile are to be saved in a file for subsequent processing, the name of the file must be supplied as the second argument to cProfile.run().


In [2]:
if __name__ == '__main__':
    cProfile.run('approx_pi()', 'approx_pi.prof')

Profiling from the Command Line

The above results could also be achieved by the following commands from the command line:

$ python -m cProfile calc_pi.py

To write profile data to a file named calc_pi.prof, use the following command:

$ python -m cProfile -o calc_pi.prof calc_pi.py

Statistics of Profiling Data

The advantage of saving profiling data in a file is that it can be processed flexibly afterwards. The pstats module provides the Stats class, whose methods make it easy to process and report profiling statistics.


In [3]:
if __name__ == '__main__':
    import cProfile, pstats

    cProfile.run('approx_pi()', 'calc_pi.prof')
    p = pstats.Stats('calc_pi.prof')
    p.strip_dirs().sort_stats('time').print_stats()


Tue Feb 24 11:15:24 2015    calc_pi.prof

         10000004 function calls in 18.588 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000   16.431    0.000   16.431    0.000 <ipython-input-1-f194cb5b1112>:3(recip_square)
        1    2.030    2.030   18.588   18.588 <ipython-input-1-f194cb5b1112>:6(approx_pi)
        1    0.127    0.127    0.127    0.127 {range}
        1    0.000    0.000   18.588   18.588 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Some useful methods in Stats are listed below:

  • strip_dirs() removes extraneous path information from all module names
  • sort_stats() sorts the entries according to one of several criteria, such as function name (name), time spent within each function (time), cumulative time spent in each function (cumulative), filename (file) or the number of times each function is called (calls)
  • print_stats() prints the processed statistics
  • print_callers() prints, for each function in the profiled data, a list of the functions that called it (see the sketch below)
  • print_callees() prints a list of the functions called by the indicated function

To use these methods, you must first create an object of type Stats, initialising it with the name of the file containing the saved profiling data. The example above strips path data, sorts by the time spent in each function, and prints the statistics. The statistics can also be printed directly from the command line, provided the -o option is not used:

$ python -m cProfile -s time calc_pi.py
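
The caller/callee views listed above can be produced from the same saved profile; a minimal sketch, assuming calc_pi.prof was written by the earlier run:

import pstats

p = pstats.Stats('calc_pi.prof')
p.strip_dirs().sort_stats('time')
p.print_callers('recip_square')   # which functions called recip_square()
p.print_callees('approx_pi')      # which functions approx_pi() called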

Timing Execution of Python Programs

Profiling reports not only the time spent in each function, but also the total time taken by the program. If you only want the execution time, without the profiling data, you can use the timeit module. It provides the Timer class, which executes a specified statement and reports its execution time; the number of times the statement is executed can also be specified.


In [9]:
# filename: timeit_pi.py

from timeit import Timer

t = Timer('approx_pi(100000)', 'from __main__ import approx_pi')
reps = 5
print 'Execution time = %.2f s' % (sum(t.repeat(repeat=reps, number=1)) / reps)


Execution time = 0.04 s

In [10]:
%timeit approx_pi(100000)


10 loops, best of 3: 28.3 ms per loop

The IPython console provides the handy %timeit magic command, which does the same job as above:

In [10]: from calc_pi import approx_pi
In [11]: %timeit approx_pi()
1 loops, best of 3: 2.62 s per loop

But if you are not using IPython, it is best to use the timeit module as described above.
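
The timeit module can also be invoked directly from the command line; a sketch assuming the functions are saved in calc_pi.py in the current directory:

$ python -m timeit -s 'from calc_pi import approx_pi' 'approx_pi(100000)'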

Further Reading

  • Program optimization
  • Just in time (JIT) compilers
  • LLVM
  • Numba
  • Julia language

Questions

  • Why are Python programs slow compared to programs written in C/C++ and Fortran?
  • What are the different strategies to improve performance of Python programs?
  • What are the best programming languages for numerical computations when it comes to performance?
  • What are the issues of writing functions in C/C++ and calling them in Python programs?
  • What are the goals of Cython project?
  • What is LLVM?
  • What are the goals of Numba project?
  • Compare the Cython and Numba projects and the way they approach optimizing Python programs