In the section on loops we introduced the range function, and said that you should think about it as creating a list of numbers. In Python 2.X this is exactly what it does. In Python 3.X this is not what it does. Instead it creates the numbers one at a time. The difference in speed and memory usage is enormous for very large lists - examples are given here and here.
We can recreate one of the examples from Meuer's slides in detail:
In [1]:
def naivesum_list(N):
"""
Naively sum the first N integers
"""
A = 0
for i in list(range(N + 1)):
A += i
return A
We will now see how much memory this uses:
In [2]:
%load_ext memory_profiler
In [3]:
%memit naivesum_list(10**4)
In [4]:
%memit naivesum_list(10**5)
In [5]:
%memit naivesum_list(10**6)
In [6]:
%memit naivesum_list(10**7)
In [7]:
%memit naivesum_list(10**8)
We see that the memory usage is growing very rapidly - as the list gets large it's growing as $N$.
Instead we can use the range function that yields one integer at a time:
In [8]:
def naivesum(N):
"""
Naively sum the first N integers
"""
A = 0
for i in range(N + 1):
A += i
return A
In [9]:
%memit naivesum(10**4)
In [10]:
%memit naivesum(10**5)
In [11]:
%memit naivesum(10**6)
In [12]:
%memit naivesum(10**7)
In [13]:
%memit naivesum(10**8)
We see that the memory usage is unchanged with $N$, making it practical to run much larger calculations.
The range function is returning an iterator here. This is an object - a general thing - that represents a stream, or a sequence, of data. The iterator knows how to create the first element of the stream, and it knows how to get the next element. It does not, in general, need to know all of the elements at once.
As we've seen above this can save a lot of memory. It can also save time: the code does not need to construct all of the members of the sequence before starting, and it's quite possible you don't need all of them (think about the "Shortest published mathematical paper" exercise).
An iterator such as range is very useful, and there's a lot more useful ways to work with iterators in the itertools module. These functions that return iterators, such as range, are called generators, and it's useful to be able to make your own.
Let's look at an example: finding all primes less than $N$ that can be written in the form $4 k - 1$, where $k$ is an integer.
We're going to need to calculate all prime numbers less than or equal to $N$. We could write a function that returns all these numbers as a list. However, if $N$ gets large then this will be expensive, both in time and memory. As we only need one number at a time, we can use a generator.
In [14]:
def all_primes(N):
"""
Return all primes less than or equal to N.
Parameters
----------
N : int
Maximum number
Returns
-------
prime : generator
Prime numbers
"""
primes = []
for n in range(2, N+1):
is_n_prime = True
for p in primes:
if n%p == 0:
is_n_prime = False
break
if is_n_prime:
primes.append(n)
yield n
This code needs careful examination. First it defines the list of all prime numbers that it currently knows, primes (which is initially empty). Then it loops through all integers $n$ from $2$ to $N$ (ignoring $1$ as we know it's not prime).
Inside this loop it initially assumes that $n$ is prime. It then checks if any of the known primes exactly divides $n$ (n%p == 0 checks if $n \bmod p = 0$). As soon as it finds such a prime divisor it knows that $n$ is not prime it resets the assumption with this new knowledge, then breaks out of the loop. This statement stops the for p in primes loop early, as we don't need to look at later primes.
If no known prime ever divides $n$ then at the end of the for p in primes loop we will still have is_n_prime being True. In this case we must have $n$ being prime, so we add it to the list of known primes and return it.
It is precisely this point which makes the code above define a generator. We return the value of the prime number found
yield keyword, not the return keyword, andIt is the use of the yield keyword that makes this function a generator.
This means that only the latest prime number is stored for return.
To use the iterator within a loop, we code it in the same way as with the range function:
In [15]:
print("All prime numbers less than or equal to 20:")
for p in all_primes(20):
print(p)
To see what the generator is actually doing, we can step through it one call at a time using the built in next function:
In [16]:
a = all_primes(10)
In [17]:
next(a)
Out[17]:
In [18]:
next(a)
Out[18]:
In [19]:
next(a)
Out[19]:
In [20]:
next(a)
Out[20]:
In [21]:
next(a)
So, when the generator gets to the end of its iteration it raises an exception. As seen in previous sections, we could surround the next call with a try block to capture the StopIteration so that we can continue after it finishes. This is effectively what the for loop is doing.
We can now find all primes (less than or equal to 100, for example) that have the form $4 k - 1$ using
In [ ]:
for p in all_primes(100):
if (1+p)%4 == 0:
print("The prime {} is 4 * {} - 1.".format(p, int((1+p)/4)))
A twin prime is a pair $(p_1, p_2)$ such that both $p_1$ and $p_2$ are prime and $p_2 = p_1 + 2$.
Write a generator that returns twin primes. You can use the generators above, and may want to look at the itertools module together with its recipes, particularly the pairwise recipe.
In the section on classes we defined a Monomial class to represent a polynomial with leading coefficient $1$. As the $N+1$ monomials $1, x, x^2, \dots, x^N$ form a basis for the vector space of polynomials of order $N$, $\mathbb{P}^N$, we can use the Monomial class to return this basis.
Define a generator that will iterate through this basis of $\mathbb{P}^N$ and test it on $\mathbb{P}^3$.
An alternative basis is given by the monomials
\begin{align} p_0(x) &= 1, \\ p_1(x) &= 1-x, \\ p_2(x) &= (1-x)(2-x), \\ \dots & \quad \dots, \\ p_N(x) &= \prod_{n=1}^N (n-x). \end{align}Define a generator that will iterate through this basis of $\mathbb{P}^N$ and test it on $\mathbb{P}^4$.