This is based on the previous dojo-20150131-memoization notebook.
In [1]:
from __future__ import print_function
We start with a function that calculates Fibonacci numbers. Its code closely mirrors the mathematical definition of the Fibonacci sequence.
In [2]:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
It is correct. Check the output in the following cell.
In [3]:
[fib(i) for i in range(10)]
Out[3]:
However, it takes an exponential amount of time to calculate Fibonacci numbers. That is not so bad for small values of n, but exponential running times get big quickly.
In [4]:
n = 33
In [5]:
%timeit fib(n)
fib(n)
Out[5]:
The problem is that the naïve function recalculates the same Fibonacci numbers over and over. Let's add some code to keep track of how many times the function is called for each input value.
In [6]:
from collections import Counter
In [7]:
def fib(n):
    global c
    c[n] += 1
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
In [8]:
c = Counter()
fib(n)
Out[8]:
In [9]:
for i in sorted(c, reverse=True):
    print(i, c[i])
print()
print(sum(c.values()))
No wonder it is so slow. By the way, do you see a pattern in the numbers?
Here's another way of sorting and iterating through the counts.
In [10]:
for item in sorted(c.items(), key=(lambda x: x[0]), reverse=True):
    print(item)
print()
print(sum(c.values()))
The above output has parentheses because item is a tuple. We can avoid the parentheses by explicitly indexing the elements of the tuple, as in the next cell.
In [11]:
for item in sorted(c.items(), key=(lambda x: x[0]), reverse=True):
    print(item[0], item[1])
print()
print(sum(c.values()))
Another way to avoid the parentheses, without explicitly indexing each element of the tuple, is to use a leading '*' to unpack the tuple into separate arguments to the print function.
In [12]:
for item in sorted(c.items(), key=(lambda x: x[0]), reverse=True):
    print(*item)
print()
print(sum(c.values()))
We can make the function quicker by calculating each Fibonacci number only once. When a previously calculated value is needed again, we just return the cached result.
In [13]:
def fib(n):
    global c
    global cache
    c[n] += 1
    if n in cache:
        return cache[n]
    if n == 0:
        f = 0
    elif n == 1:
        f = 1
    else:
        f = fib(n-1) + fib(n-2)
    cache[n] = f
    return f
In [14]:
# just to make them global
c = None
cache = None
In [15]:
def foo(n):
    global c
    global cache
    c = Counter()
    cache = {}
    fib(n)
%timeit foo(n)
That is over 40,000 times faster than the naïve version. Let's look at the details.
In [16]:
c = Counter()
cache = {}
fib(n)
for i in sorted(c.keys(), reverse=True):
    print(i, cache[i], c[i])
print()
print(sum(c.values()))
Now let's clean up the code. First, let's remove the counting stuff, since we know the answer for that now.
In [17]:
def fib(n):
    global cache
    if n in cache:
        return cache[n]
    if n == 0:
        f = 0
    elif n == 1:
        f = 1
    else:
        f = fib(n-1) + fib(n-2)
    cache[n] = f
    return f
In [18]:
def foo(n):
    global cache
    cache = {}
    fib(n)
%timeit foo(n)
Taking out the diagnostic counting code made it faster yet. The caching code makes the fib() function ugly, and I don't like needing to remember to initialize the cache, so let's move the caching into a separate memoize() function. Note that memoize() doesn't know anything about the fib() function except that it takes a single argument, so memoize() can be applied to many single-argument functions.
In [19]:
def memoize(f):
    results = {}
    def helper(x):
        if x not in results:
            results[x] = f(x)
        return results[x]
    return helper
In [20]:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
In [21]:
fib = memoize(fib)
In [22]:
def foo(n):
    def fib(n):
        if n == 0:
            return 0
        elif n == 1:
            return 1
        else:
            return fib(n-1) + fib(n-2)
    fib = memoize(fib)
    %timeit -n 1 fib(n)
In [23]:
for i in range(20):
    foo(n)
That yielded yet another big speed increase. It's now over a million times faster than the original function.
In [24]:
%timeit fib(n)
That's faster yet, but perhaps deceptively so. %timeit ran fib(n) many times, but the results cache was initialized only once, so every run after the first was little more than a dictionary lookup.
Next, let's use the syntactic sugar of a function decorator to eliminate the fib = memoize(fib) statement. The @memoize decorator can be used on many single-argument functions.
In [25]:
@memoize
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
In [26]:
%timeit fib(n)
That yielded about the same speed. The code is now both simple and fast.
How would one measure the speed of starting with an empty cache each time?
What are the limitations of memoize() and @memoize?
How would you work around those limitations?
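Here is one possible sketch of an answer to the last two questions (illustrative only; memoize_args and binomial are names made up here, not part of the notebook). memoize() as written accepts exactly one argument, and that argument must be hashable, since it becomes a dictionary key. Caching on the tuple of all positional arguments relaxes the first limitation:

def memoize_args(f):
    # hypothetical generalization: cache on all positional arguments
    results = {}
    def helper(*args):
        if args not in results:
            results[args] = f(*args)
        return results[args]
    return helper

@memoize_args
def binomial(n, k):
    # a two-argument recursive function, just to exercise the decorator
    if k == 0 or k == n:
        return 1
    return binomial(n - 1, k - 1) + binomial(n - 1, k)

Keyword arguments and unhashable arguments (such as lists) would still need more work.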
In [27]:
# By the way, this computer is modest.
!lscpu
After the 2015-02-23 CohPy presentation of the above, it was suggested to use the .get() method of dictionaries to avoid the if statement inside helper(). The point is to make the code simpler and more elegant. Unfortunately, the .get() method does not change the dictionary, so its return value has to be saved separately. That new memoize function would look like the following. It does succeed in getting rid of the if statement, but the 'results[x] = results.get(x, f(x))' line looks ugly. Even worse, Python evaluates a call's arguments before the call itself runs, so f(x) is always called, defeating the memoization.
In [28]:
def memoize(f):
    results = {}
    def helper(x):
        results[x] = results.get(x, f(x))
        return results[x]
    return helper
In [29]:
@memoize
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
In [30]:
%timeit fib(n)
fib(n)
Out[30]:
Using the .setdefault() method instead of the .get() method looks better, but suffers the same flaw of defeating the memoization by always calling f(x).
In [31]:
def memoize(f):
    results = {}
    def helper(x):
        return results.setdefault(x, f(x))
    return helper
In [32]:
@memoize
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
In [33]:
%timeit fib(n)
fib(n)
Out[33]:
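A small sketch makes those wasted calls from eager argument evaluation visible (count_calls is a made-up helper for illustration, not part of the notebook):

def count_calls(f):
    # wrap f so we can count how often it actually runs
    def wrapper(x):
        wrapper.call_count += 1
        return f(x)
    wrapper.call_count = 0
    return wrapper

counted = count_calls(lambda x: x * x)

results = {}
for _ in range(3):
    # counted(5) is evaluated on every iteration,
    # even though 5 is already in results after the first one
    results.setdefault(5, counted(5))

print(counted.call_count)  # prints 3, not 1: the cache saved us nothing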
My memoize decorator is fun and good for exploring closures and decorators. For real projects, use functools.lru_cache instead of my memoize decorator.
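As a rough sketch of what that looks like (assuming Python 3.2 or later, where functools.lru_cache is available): maxsize=None keeps every result, much like the results dictionary above, and multiple arguments are handled for free.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

print(fib(33))           # 3524578
print(fib.cache_info())  # hit/miss statistics come for free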