The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
In [1]:
from functools import lru_cache
In [2]:
def gen_collatz(n):
yield n
while n > 1:
if n % 2 == 0:
n //= 2
else:
n = 3 * n + 1
yield n
In [3]:
list(gen_collatz(13))
Out[3]:
In [4]:
len(list(gen_collatz(13)))
Out[4]:
In [5]:
def foo(n):
max_chain_len = 0
max_chain_n = 0
for i in range(1, n):
if len(list(gen_collatz(i))) > max_chain_len:
max_chain_len = len(list(gen_collatz(i)))
max_chain_n = i
return max_chain_n, max_chain_len
In [6]:
n = 10**6
%time known_good_output = foo(n)
print('The Collatz sequence for %s is %s long.' % known_good_output)
In [7]:
@lru_cache(maxsize=None)
def collatz(n):
x = [n]
if n > 1:
if n % 2 == 0:
n //= 2
else:
n = 3 * n + 1
x.extend(collatz(n))
return x
In [8]:
list(gen_collatz(13))
Out[8]:
In [9]:
collatz(13)
Out[9]:
In [10]:
def foo(n):
max_chain_len = 0
max_chain_n = 0
for i in range(1, n):
if len(collatz(i)) > max_chain_len:
max_chain_len = len(collatz(i))
max_chain_n = i
return max_chain_n, max_chain_len
In [11]:
n = 10**6
%timeit foo(n)
assert known_good_output == foo(n)
Only the lengths of the Collatz chains are wanted, not the numbers in the Collatz chains, so len_collatz is created to return only the lengths of the Collatz chains. len_collatz is also recursive to take advantage of lru_cache.
len_collatz is faster than collatz, but not as general as collatz.
In [12]:
@lru_cache(maxsize=None)
def len_collatz(n):
len_chain = 1
if n > 1:
if n % 2 == 0:
n //= 2
else:
n = 3 * n + 1
len_chain += len(collatz(n))
return len_chain
In [13]:
len_collatz(13)
Out[13]:
In [14]:
def foo(n):
max_chain_len = 0
max_chain_n = 0
for i in range(1, n):
if len_collatz(i) > max_chain_len:
max_chain_len = len_collatz(i)
max_chain_n = i
return max_chain_n, max_chain_len
In [15]:
n = 10**6
%timeit foo(n)
assert known_good_output == foo(n)
Cell 2 is very readable. Cell 5 has good readability. Together they pound through with brute force and produce the correct result slowly.
Cell 7 is slightly less readable than cell 2, but uses caching for a big speed increase, especially when generating many Collatz chains. Cell 10 is slightly more readable than cell 5. Together, cells 7 and 10 are much faster than cells 2 and 5.
Cell 12 has good readability, but is less general than cell 7. Cell 14 is more readable than cell 10. Together, cells 12 and 14 are faster than cells 7 and 10 but not hugely so.
If one wants only one Collatz chain, caching would not help, so use gen_collatz of cell 2 because it is most readable and is probably faster than collatz of cell 7 for a single Collatz chain.
collatz of cell 7 is readable, general, and fast. It is probably the best choice if doing work with many Collatz chains.
len_collatz of cell 12 is useful only for when
Otherwise collatz of cell 7's readability and generality make it a better choice.
collatz of cell 7 is probably the the best overall choice because: