In [1]:
import numpy as np
from numpy import pi

def approx_pi(n=1000000):
    s = 0.0
    for k in range(1, n+1):
        s += 1.0 / k**2
    return (6.0 * s) ** 0.5

print approx_pi(100)
print approx_pi(1000)
print approx_pi(10000)
print approx_pi(100000)
print approx_pi(1000000)
In [2]:
def f1(n):
    s = 0.0
    for k in range(1, n+1):
        s += k
    return s
%timeit f1(100000)
In [3]:
def f2(n):
    s = 0.0
    for k in xrange(1, n+1):
        s += k
    return s
%timeit f2(100000)
In [4]:
n = 100000
%timeit sum(range(1, n+1))
In [5]:
n = 100000
a = np.arange(1, n+1, dtype=float)
%timeit sum(a)
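The same comparison can be made in a plain script with the standard-library timeit module (covered later in this document). The sketch below, which assumes Python 3's timeit.timeit with its globals parameter, times the pure-Python sum against NumPy's vectorized a.sum():

```python
import timeit
import numpy as np

n = 100000
a = np.arange(1, n + 1, dtype=float)

# Time 100 repetitions of each summation; timeit.timeit returns total seconds.
t_python = timeit.timeit('sum(range(1, n + 1))', globals={'n': n}, number=100)
t_numpy = timeit.timeit('a.sum()', globals={'a': a}, number=100)
print('python sum: %.4f s  numpy sum: %.4f s' % (t_python, t_numpy))
```

The vectorized a.sum() runs in a single C loop over the array, so it is typically one to two orders of magnitude faster than summing in a Python-level loop.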
In [6]:
%timeit approx_pi(10000)
In [16]:
%timeit approx_pi(1000000)
In [7]:
def pi2(n=1000000):
    s = 0.0
    for k in xrange(1, n+1):
        s += 1.0 / k**2
    return (6.0 * s) ** 0.5
%timeit pi2(1000000)
Optimizing a program is an attempt to improve its performance, usually in terms of execution time. Before attempting to optimize your program, make sure that it actually needs optimizing: benchmark it and assess its performance first.
Donald Knuth is said to have made the following statement:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Premature optimization is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a program. The result can be a design that is not as clean as it could have been, because it is complicated by the optimization and the programmer is distracted by the focus on performance.
Optimization can be carried out at different levels:
Before beginning to optimize a program, do the following:
Profiling is the process by which it is possible to gather statistics regarding the number of times a function is called, the amount of time the program spends in each function and other similar data. Analysis of profiling data helps in identifying which parts of the program need to be optimized.
Python provides two modules for profiling Python programs: profile and cProfile. Both export the same interface, so they are interchangeable; replacing profile with cProfile, or vice versa, requires no changes to the program being profiled.
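A minimal sketch of this interchangeability: the import line below is the only thing that changes between the two modules. The helper profile_report is a hypothetical name introduced here; it captures the statistics into a string via pstats rather than printing them directly (Python 3 syntax):

```python
import io
import pstats

# The only line that changes when switching profilers is this import:
import cProfile as profile   # or simply: import profile

def profile_report(statement):
    """Run `statement` under the profiler and return the report as a string."""
    prof = profile.Profile()
    prof.run(statement)
    stream = io.StringIO()
    pstats.Stats(prof, stream=stream).print_stats()
    return stream.getvalue()

report = profile_report('sum(range(1000))')
print(report)
```

cProfile is implemented in C and adds far less overhead, so it is usually the one to prefer; profile, being pure Python, is easier to extend.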
A program can be profiled in one of two ways:
In [8]:
# filename: calc_pi.py
def recip_square(i):
    return 1.0 / i**2

def approx_pi(n=10000000):
    s = 0
    for k in range(1, n+1):
        s += recip_square(k)
    return (6 * s) ** 0.5

# Driver program. The lines below could also be placed in a separate file;
# if the driver is separate, it must import the file containing the two functions above.
if __name__ == '__main__':
    # Import the Python module required for profiling and statistics gathering
    import cProfile
    cProfile.run('approx_pi()')
    # cProfile.runctx("approx_pi()", globals(), locals(), "Profile.prof")
    # s = pstats.Stats("Profile.prof")
    # s.strip_dirs().sort_stats("time").print_stats()
If the results of a profile are to be saved in a file for subsequent processing, the name of the file must be supplied as the second argument to cProfile.run().
In [2]:
if __name__ == '__main__':
    cProfile.run('approx_pi()', 'approx_pi.prof')
The above results could also be achieved by the following commands from the command line:
$ python -m cProfile calc_pi.py
To write profile data to a file named calc_pi.prof, use the following command:
$ python -m cProfile -o calc_pi.prof calc_pi.py
The advantage of saving profiling data in a file is that it can be processed in a flexible way. Profiling statistics can be processed using the class Stats in the pstats module, which provides several methods for working with the saved statistics.
In [3]:
if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run('approx_pi()', 'calc_pi.prof')
    p = pstats.Stats('calc_pi.prof')
    p.strip_dirs().sort_stats('time').print_stats()
Some useful methods in Stats are listed below:

strip_dirs() removes extraneous path information from all module names
sort_stats() sorts the lines according to one of several criteria, such as function name (name), time spent within each function (time), cumulative time spent in each function (cumulative), filename (file) or the number of times each function is called (calls)
print_stats() prints the processed statistics
print_callers() prints, for each function in the profiled database, a list of the functions that called it
print_callees() prints a list of all functions that were called by the indicated function

To use these methods, you must first create an object of type Stats, initialising it with the name of the file containing the saved profiling data. The example above strips path data, sorts by time spent in each function and prints the statistics. The statistics can also be printed from the command line, provided the -o option is not used:
$ python -m cProfile -s time calc_pi.py
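As a sketch of the caller/callee views on the calc_pi example, the snippet below is self-contained: it writes the profile to a temporary file named callers_demo.prof (an arbitrary name chosen here) and captures the reports through the stream argument of Stats:

```python
import cProfile
import io
import os
import pstats

def recip_square(i):
    return 1.0 / i**2

def approx_pi(n=100000):
    s = 0.0
    for k in range(1, n + 1):
        s += recip_square(k)
    return (6.0 * s) ** 0.5

# Profile approx_pi() and save the raw data to a file.
cProfile.runctx('approx_pi()', globals(), locals(), 'callers_demo.prof')

buf = io.StringIO()
p = pstats.Stats('callers_demo.prof', stream=buf)
p.strip_dirs().print_callers('recip_square')  # which functions call recip_square?
p.print_callees('approx_pi')                  # which functions does approx_pi call?
print(buf.getvalue())

os.remove('callers_demo.prof')  # clean up the temporary profile file
```

The caller report shows that recip_square is invoked only from approx_pi, along with the count and time attributable to that call site; the callee report shows the reverse relationship.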
Profiling gives not only the time spent in each function, but also the total time taken by the program. If you want only the execution times, without the profiling data, use the timeit module. It contains the class Timer, which can execute a specified statement and report its execution time; you can also specify the number of times the statement is to be executed.
In [9]:
# filename: timeit_pi.py
from timeit import Timer
t = Timer('approx_pi(100000)', 'from __main__ import approx_pi')
reps = 5
print 'Execution time = %.2f s' % (sum(t.repeat(repeat=reps, number=1)) / reps)
In [10]:
%timeit approx_pi(100000)
The IPython console provides a handy magic command, %timeit, that does the same job as above:

In [10]: from calc_pi import approx_pi
In [11]: %timeit approx_pi()
1 loops, best of 3: 2.62 s per loop
But if you are not using IPython, it is best to use the timeit
module as described above.
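For quick one-off measurements outside IPython, the module-level timeit.timeit function is an even simpler alternative to the Timer class: it runs a statement a given number of times and returns the total elapsed seconds (a minimal sketch):

```python
import timeit

# Run the statement 1000 times; timeit.timeit returns the total time in
# seconds, so dividing by `number` gives the per-call time, much as
# %timeit reports it.
number = 1000
total = timeit.timeit('sum(range(1000))', number=number)
print('%.3g s per call' % (total / number))
```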