Project Euler

Longest Collatz sequence

Problem 14

The following iterative sequence is defined for the set of positive integers:

n → n/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.


In [1]:
from functools import lru_cache

In [2]:
def gen_collatz(n):
    yield n
    while n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        yield n

In [3]:
list(gen_collatz(13))


Out[3]:
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

In [4]:
len(list(gen_collatz(13)))


Out[4]:
10

In [5]:
def foo(n):
    max_chain_len = 0
    max_chain_n = 0
    for i in range(1, n):
        if len(list(gen_collatz(i))) > max_chain_len:
            max_chain_len = len(list(gen_collatz(i)))
            max_chain_n = i
    return max_chain_n, max_chain_len

In [6]:
n = 10**6
%time known_good_output = foo(n)
print('The Collatz sequence for %s is %s long.' % known_good_output)


CPU times: user 50.5 s, sys: 0 ns, total: 50.5 s
Wall time: 50.6 s
The Collatz sequence for 837799 is 525 long.

The subchains are calculated over and over again, so use caching to avoid recalculating subchains. collatz is changed to be recursive to take advantage of lru_cache.

Using cache was inspired by Kristian.


In [7]:
@lru_cache(maxsize=None)
def collatz(n):
    x = [n]
    if n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        x.extend(collatz(n))
    return x

In [8]:
list(gen_collatz(13))


Out[8]:
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

In [9]:
collatz(13)


Out[9]:
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

In [10]:
def foo(n):
    max_chain_len = 0
    max_chain_n = 0
    for i in range(1, n):
        if len(collatz(i)) > max_chain_len:
            max_chain_len = len(collatz(i))
            max_chain_n = i
    return max_chain_n, max_chain_len

In [11]:
n = 10**6
%timeit foo(n)
assert known_good_output == foo(n)


The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 1.64 s per loop

Only the lengths of the Collatz chains are wanted, not the numbers in the Collatz chains, so len_collatz is created to return only the lengths of the Collatz chains. len_collatz is also recursive to take advantage of lru_cache.

len_collatz is faster than collatz, but not as general as collatz.


In [12]:
@lru_cache(maxsize=None)
def len_collatz(n):
    len_chain = 1
    if n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        len_chain += len(collatz(n))
    return len_chain

In [13]:
len_collatz(13)


Out[13]:
10

In [14]:
def foo(n):
    max_chain_len = 0
    max_chain_n = 0
    for i in range(1, n):
        if len_collatz(i) > max_chain_len:
            max_chain_len = len_collatz(i)
            max_chain_n = i
    return max_chain_n, max_chain_len

In [15]:
n = 10**6
%timeit foo(n)
assert known_good_output == foo(n)


1 loop, best of 3: 1.45 s per loop

Review

Cell 2 is very readable. Cell 5 has good readability. Together they pound through with brute force and produce the correct result slowly.

Cell 7 is slightly less readable than cell 2, but uses caching for a big speed increase, especially when generating many Collatz chains. Cell 10 is slightly more readable than cell 5. Together, cells 7 and 10 are much faster than cells 2 and 5.

Cell 12 has good readability, but is less general than cell 7. Cell 14 is more readable than cell 10. Together, cells 12 and 14 are faster than cells 7 and 10 but not hugely so.

Summary

If one wants only one Collatz chain, caching would not help, so use gen_collatz of cell 2 because it is most readable and is probably faster than collatz of cell 7 for a single Collatz chain.

collatz of cell 7 is readable, general, and fast. It is probably the best choice if doing work with many Collatz chains.

len_collatz of cell 12 is useful only for when

  • one wants to know only the lengths of many Collatz chains
  • one wants fastest speed

Otherwise collatz of cell 7's readability and generality make it a better choice.


collatz of cell 7 is probably the the best overall choice because:

  1. It has good readability.
  2. It is fast for making many Collatz chains.
    • It is probably slower than gen_collatz of cell 2 if one is making few chains. However, if one is making few chains, then speed is likely not critical.
  3. It is general.