Consider this puzzle from The Riddler:

> The misanthropes are coming. Suppose there is a row of some number, N, of houses in a new, initially empty development. Misanthropes are moving into the development one at a time and selecting a house at random from those that have nobody in them and nobody living next door. They keep on coming until no acceptable houses remain. At most, one out of two houses will be occupied; at least one out of three houses will be. But what’s the expected fraction of occupied houses as the development gets larger, that is, as N goes to infinity?

To make sure we understand the problem, let's try a simple example, with *N*=4 houses. We will represent the originally empty row of four houses by four dots:

```
....
```

Now the first person chooses one of the four houses (which are all acceptable). We'll indicate an occupied house by a `1`, so the four equiprobable choices are:

```
1...
.1..
..1.
...1
```

When a house is occupied, any adjacent houses become unacceptable. We'll indicate that with a `0`:

```
10..
010.
.010
..01
```

In all four cases, a second occupant has a place to move in, but then there is no place for a third occupant.
The occupancy is 2 in all cases, and thus the *expected occupancy* is 2. The occupancy fraction, or *density*, is 2/*N* = 2/4 = 1/2.
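We can double-check this arithmetic by brute force, enumerating every possible sequence of moves. (This `expected_occupancy` helper is my own addition for checking; it is not used in the rest of the analysis.)

```python
from fractions import Fraction

def expected_occupancy(houses):
    """Exact expected final occupancy of a row of houses, by brute force.
    `houses` is a tuple of booleans, True meaning occupied."""
    n = len(houses)
    # A house is acceptable if it and both of its neighbors are empty:
    acceptable = [i for i in range(n)
                  if not houses[i]
                  and not (i > 0 and houses[i - 1])
                  and not (i < n - 1 and houses[i + 1])]
    if not acceptable:
        return Fraction(sum(houses))
    # Average over the equiprobable choices for the next occupant:
    return sum(expected_occupancy(houses[:i] + (True,) + houses[i + 1:])
               for i in acceptable) / len(acceptable)

expected_occupancy((False,) * 4)  # → Fraction(2, 1): expected occupancy 2
```

Brute force is only feasible for small rows, but it agrees with the hand analysis above.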

With *N*=7 houses, there are 7 equiprobable choices for the first occupant:

```
10.....
010....
.010...
..010..
...010.
....010
.....01
```

Now we'll add something new: the lengths of the **runs** of consecutive acceptable houses to the left and right of the chosen house:

```
10..... runs = (0, 5)
010.... runs = (0, 4)
.010... runs = (1, 3)
..010.. runs = (2, 2)
...010. runs = (3, 1)
....010 runs = (4, 0)
.....01 runs = (5, 0)
```

This gives me a key hint as to how to analyze the problem. I'll define `occ(n)` to be the expected number of occupied houses in a row of `n` houses. So:

```
10..... runs = (0, 5); occupancy = occ(0) + 1 + occ(5)
010.... runs = (0, 4); occupancy = occ(0) + 1 + occ(4)
.010... runs = (1, 3); occupancy = occ(1) + 1 + occ(3)
..010.. runs = (2, 2); occupancy = occ(2) + 1 + occ(2)
...010. runs = (3, 1); occupancy = occ(3) + 1 + occ(1)
....010 runs = (4, 0); occupancy = occ(4) + 1 + occ(0)
.....01 runs = (5, 0); occupancy = occ(5) + 1 + occ(0)
```

So we can say that `occ(n)` is:

- 0 when `n` is 0 (because no houses means no occupants),
- 1 when `n` is 1 (because one isolated acceptable house has one occupant),
- else the mean, over the `n` choices for the first occupied house, of the sum of that house plus the occupancy of the runs to the left and right.
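For instance, working the recurrence by hand for a row of 3 houses: the first occupant's three choices give runs (0, 1), (0, 0), and (1, 0), so:

```python
# The three equiprobable occupancies for n = 3:
#   occ(0) + 1 + occ(1) = 2    (first occupant takes the left end house)
#   occ(0) + 1 + occ(0) = 1    (first occupant takes the middle house)
#   occ(1) + 1 + occ(0) = 2    (first occupant takes the right end house)
occ3 = (2 + 1 + 2) / 3         # the mean, i.e. occ(3) = 5/3
assert abs(occ3 - 5/3) < 1e-12
```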

We can implement that:

```python
from statistics import mean

def occ(n):
    "The expected occupancy for a row of n houses (under misanthrope rules)."
    return (0 if n == 0 else
            1 if n == 1 else
            mean(occ(L) + 1 + occ(R)
                 for (L, R) in runs(n)))

def runs(n):
    """A list [(L, R), ...] where the i-th tuple contains the lengths of the runs
    of acceptable houses to the left and right of house i."""
    return [(max(0, i - 1), max(0, n - i - 2))
            for i in range(n)]

def density(n): return occ(n) / n
```

Let's check that `occ(4)` is 2, as we computed it should be:

```python
occ(4)  # → 2
```

And that `runs(7)` is what we described above:

```python
runs(7)  # → [(0, 5), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]
```

Let's check on `n = 7`:

```python
occ(7)
```

```python
density(7)
```

That seems reasonable, but I don't know for sure that it is correct.

The computation of `occ(n)` makes multiple calls to `occ(n-1)`, `occ(n-2)`, etc. To avoid re-computing the same calls over and over, we will modify `occ` to save previous results using dynamic programming. I could implement that in one line with the decorator `@functools.lru_cache`, but instead I will explicitly manage a list, `cache`, such that `cache[n]` holds `occ(n)`:

```python
def occ(n, cache=[0, 1]):
    "The expected occupancy for a row of n houses (under misanthrope rules)."
    # Store occ(i) in cache[i] for all as-yet-uncomputed values of i up to n:
    for i in range(len(cache), n + 1):
        cache.append(mean(cache[L] + 1 + cache[R]
                          for (L, R) in runs(i)))
    return cache[n]
```
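For comparison, here is a sketch of the `@functools.lru_cache` alternative mentioned above (the name `occ_recursive` is mine; this version is not used in the rest of the notebook):

```python
from functools import lru_cache
from statistics import mean

def runs(n):
    "The (left, right) run lengths for each choice of first house."
    return [(max(0, i - 1), max(0, n - i - 2)) for i in range(n)]

@lru_cache(maxsize=None)
def occ_recursive(n):
    "Expected occupancy for a row of n houses, memoized by lru_cache."
    return (0 if n == 0 else
            1 if n == 1 else
            mean(occ_recursive(L) + 1 + occ_recursive(R)
                 for (L, R) in runs(n)))
```

One reason to prefer the explicit `cache` list: the recursive version can run into Python's recursion limit for `n` in the thousands, while the loop fills the cache bottom-up.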

Let's make sure this new version gets the same results as the old version:

```python
occ(4) == 2  # → True
```

```python
density(7)
```

Let's make sure the caching makes computation faster the second time:

```python
%time occ(2000)
```

```python
%time occ(2000)
```

```python
%matplotlib inline
import matplotlib.pyplot as plt

def plot_density(ns):
    "Plot density(n) for each n in the list of numbers ns."
    plt.xlabel('n houses'); plt.ylabel('density(n)')
    plt.plot(ns, [density(n) for n in ns], 's-')
    return density(ns[-1])
```

```python
plot_density(range(1, 100))
```

Let's separately look at the first few values of `n`:

```python
plot_density(range(1, 11))
```

And at a wider range:

```python
plot_density(range(100, 4000, 50))
```

The density is going down, and the curve is almost but not quite flat.

The puzzle is to figure out the limit of `density(n)` as `n` goes to infinity. The plot above makes it look like 0.432+something, but we can't answer the question just by plotting; we'll need to switch modes from *computational* thinking to *mathematical* thinking.

At this point I started playing around with `density(n)`, looking at various values, differences of values, ratios of values, and ratios of differences of values, hoping to achieve some mathematical insight. Mostly I got dead ends. But eventually I hit on something promising. I looked at the difference between density values (using the function `diff`), and particularly the difference as you double `n`:

```python
def diff(n, m): return density(n) - density(m)

diff(100, 200)
```

And compared that to the difference when you double `n` again:

```python
diff(200, 400)
```

Hmm—I noticed that the first difference is just about twice as much as the second. Let's check:

```python
diff(100, 200) / diff(200, 400)
```

Not just *close* to twice as much, it is *exactly* twice as much (to the precision of floating point numbers). Let's try other starting values for `n`:

```python
n = 500; diff(n, 2*n) / diff(2*n, 4*n)
```

```python
n = 1000; diff(n, 2*n) / diff(2*n, 4*n)
```

OK, I'm convinced this is real!

Now, what mathematical function behaves like this? I figured out that *f*(*n*) = 1/*n* does. The ratio of the differences would be:

(1/*n* − 1/2*n*) / (1/2*n* − 1/4*n*)

Multiplying top and bottom by *n* you get:

(1 − 1/2) / (1/2 − 1/4) = (1/2) / (1/4) = 2

If the function 1/*n* fits the pattern, then so does any *affine transformation* of 1/*n*, because we are taking the ratio of differences. So that means a density function of the form

```
density(n) = A + B / n
```

would fit the pattern. I can try a `curve_fit` to estimate the parameters *A* and *B*:

```python
from scipy.optimize import curve_fit

Ns = list(range(100, 10001, 100))

def f(n, A, B): return A + B / n

((A, B), covariance) = curve_fit(f=f, xdata=Ns, ydata=[density(n) for n in Ns])
covariance
```

The `curve_fit` function returns a sequence of parameter values, and a covariance matrix. The fact that all the numbers in the covariance matrix are really small indicates that the parameters are a really good fit. Here are the parameters, `A` and `B`:

```python
A, B
```

We can plug them into a function that estimates the density:

```python
def estimated_density(n): return A + B / n
```

And we can test how close this function is to the true `density` function:

```python
max(abs(density(n) - estimated_density(n))
    for n in range(200, 4000))
```

That says that, for all values of `n` from 200 to 4,000, `density(n)` and `estimated_density(n)` agree at least through the first 15 decimal places!

We now have a plausible answer to the puzzle:

lim_{n → ∞} density(n) ≅ A = 0.43233235838169343

This answer is empirically strong (15 decimal places of accuracy) but theoretically weak: we don't have a **proof**, and we don't have an explanation for **why** the density function has this form. We need some more mathematical thinking.

I didn't have any ideas, so I looked to see if anyone else had written something about the number 0.43233235838169343. I tried several searches and found two interesting formulas:

- Search: `[0.4323323]`. Formula: `sinh(1) / exp(1)` (Page)
- Search: `[0.432332358]`. Formula: `0.5(1-e^(-2))` (Page)

I can verify that the two formulas are equivalent, and that they are indeed equal to `A` to at least 15 decimal places:

```python
from math import sinh, exp, e

S = sinh(1) / exp(1)
E = 0.5 * (1 - e ** (-2))
assert S == E
S, E, A
```

So I now have a suspicion that

lim_{n → ∞} density(n) = (1 - e^{-2}) / 2

but I still have no proof, nor any intuition as to why this is so.

I reported my results to Anne Paulson and John Lamping, who had originally related the problem to me, and the next day John wrote back with the following:

I got a derivation of the formula! Suppose that each house has a different "attractiveness", and that when it is a misanthrope's turn to pick a house, they consider the houses in order, from most attractive to least, and pick the first house not next to any other house. If the attractivenesses are chosen independently, this process gives the same probabilities as each misanthrope picking from the available houses randomly.

To be more concrete, let the attractivenesses range from 0 to 1 with uniform probability, with lower numbers being considered earlier, and hence more attractive. (This makes the math come out easier.)

Given the attractiveness, a, of a house, we can compute the chance that it will end up getting picked. It will get picked if and only if neither house on either side gets considered earlier and gets picked. Let's consider one side, and assume the houses, and their attractivenesses, are labeled a, b, c, d, e, f, g, ... with the house we are considering being a. There are an infinite number of cases where b won't get considered before a and picked. Here are the first few:

a < b (a is considered before b)

a > b > c < d (Someone considered c first among these 4 and picked it. A later person considered b before a, but rejected it because c had already been chosen.)

a > b > c > d > e < f (Someone considered e first among these 6, and picked it. A later person considered d, but rejected it because e was chosen, then considered c, which they picked. Still later, someone considered b, which they rejected because c had been chosen.)

We can write down the probabilities of these cases as a function of the attractiveness of a:

a < b: The chance that b is greater than a: 1 - a.

a > b > c < d: Let y = a - c, so y is between 0 and a. The probability is the integral over the possible y of the chance that b is between a and c, times the chance that d is greater than c, or integral from 0 to a of y * (1 - a + y) dy.

a > b > c > d > e < f: Let y = a - e. The probability is the integral, from 0 to a, of (the chance that b, c, and d are all between a and e, and ordered right, which is y^3 / 3!) times (the chance that f is greater than e, which is (1 - a + y)).

If you work out the definite integrals, you get

a < b: 1 - a

a > b > c < d: a^2 / 2 - a^3 / 3!

a > b > c > d > e < f: a^4 / 4! - a^5 / 5!

Add them all up, and you have 1 - a + a^2 / 2 - a^3 / 3! + a ^4 / 4! ... the Taylor expansion for e^-a.

Now, there will be a house at a if both adjacent houses are not picked earlier, so the chance is the square of the chance for one side: e^(-2a). Integrate that from 0 to 1, and you get 1/2 (1 - e^-2).
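John's algebra can be spot-checked numerically with nothing but the standard library; this is only a sanity check, not part of the proof:

```python
from math import exp, factorial

# 1. The case probabilities 1 - a + a^2/2! - a^3/3! + ... sum to e^(-a):
a = 0.37  # an arbitrary attractiveness in [0, 1]
series = sum((-a) ** k / factorial(k) for k in range(30))
assert abs(series - exp(-a)) < 1e-12

# 2. Integrating e^(-2a) over [0, 1] (here with the trapezoid rule)
#    gives 1/2 (1 - e^-2), the claimed limiting density:
steps = 100_000
h = 1 / steps
trapezoid = h * ((exp(0) + exp(-2)) / 2
                 + sum(exp(-2 * i * h) for i in range(1, steps)))
assert abs(trapezoid - 0.5 * (1 - exp(-2))) < 1e-8
```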

You can compare John Lamping's solution to the solutions by Jim Ferry and Andrew Mascioli.

It is clear that different write-ups of this problem display different styles of thinking. I'll attempt to name and describe them:

- **Mathematical Publication Style**: This style uses sophisticated mathematics (e.g. generating functions, differentiation, asymptotic analysis, and manipulation of summations). It defines new abstractions without necessarily trying to motivate them first, it is terse and formal, and it gets us to the conclusion in a way that is clearly correct, but does not describe all the steps of how the author came up with the ideas. (Ferry and Mascioli)
- **Mathematical Exploration Style**: Like the Mathematical Publication Style, but with more explanatory prose. (Lamping)
- **Computational Thinking**: A more concrete style; tends to use programs with specific values of `n` rather than creating a proof for all values of `n`; produces tables or plots to help guide intuition; verifies results with tests or alternative implementations. (Norvig)

In this specific puzzle, my steps were:

- Analyze the problem and implement code to solve it for small values of `n`.
- Plot results and examine them for insight.
- Play with the results and get a guess at a partial solution, `density(n) ≅ A + B / n`.
- Solve numerically for `A` and `B`.
- Do a search with the numeric value of `A` to find a formula with that value.
- Given that formula, let Lamping figure out how it corresponds to the problem.

This is mostly computational thinking, with a little mathematical thinking thrown in.

Is our implementation of `occ(n)` correct? I think it is, but I can anticipate some objections and answer them:

**In `occ(n)`, is it ok to start from all empty houses, rather than considering layouts of partially-occupied houses?** Yes, I think it is ok, because the problem states that initially all houses are empty, and each choice of a house breaks the street up into runs of acceptable houses, flanked by unacceptable houses. If we get the computation right for a run of `n` acceptable houses, then we can get the whole answer right. A key point is that the chosen first house breaks the row of houses into 2 runs of *acceptable* houses, not 2 runs of *unoccupied* houses. If it were unoccupied houses, then we would have to also keep track of whether there were occupied houses to the right and/or left of the runs. By considering runs of acceptable houses, everything is clean and simple.

**In `occ(7)`, if the first house chosen is 2, that breaks the street up into runs of 1 and 3 acceptable houses. There is only one way to occupy the 1 house, but there are several ways to occupy the 3 houses. Shouldn't the average give more weight to the 3 houses, since there are more possibilities there?** No, I don't think so. We are calculating occupancy, and there is a specific number (5/3) which is the expected occupancy of 3 houses; it doesn't matter if there is one combination or a million combinations that contribute to that expected value; all that matters is what the expected value is.

A simulation can add credence to our implementation of `density(n)`, for two reasons:

- The code for the simulation can have a more direct correspondence to the problem statement.
- When two very different implementations get the same result, that is evidence supporting both of them.

The simulation will start with an empty set of occupied houses. Following Lamping's suggestion, we sample *n* houses (to get a random ordering) and go through them, occupying just the ones that have no occupied neighbor:

```python
import random

def simulate(n):
    "Simulate moving in to houses, and return a sorted list of occupied houses."
    occupied = set()
    for house in random.sample(range(n), n):
        if (house - 1) not in occupied and (house + 1) not in occupied:
            occupied.add(house)
    return sorted(occupied)

def simulated_density(n, repeat=10000):
    "Estimate density by simulation, repeated `repeat` times."
    return mean(len(simulate(n)) / n
                for _ in range(repeat))
```

Let's compare the simulated density to the exact `density` function and the `estimated_density` function:

```python
print(' n  simul  density  estimated')
for n in (25, 50, 100, 200, 400):
    print('{:3} {:.3} {:.3} {:.3}'
          .format(n, simulated_density(n), density(n), estimated_density(n)))
```

We got perfect agreement (at least to 3 decimal places), suggesting that either our three implementations are all correct, or we've made mistakes in all three.

The `simulate` function can also give us insights when we look at the results it produces:

```python
simulate(7)
```

We can gather the results of many simulations into a `Counter`, which tracks how many times it has seen each result:

```python
from collections import Counter

Counter(tuple(simulate(7)) for _ in range(10000))
```

Finally, some tests:

```python
def test():
    assert occ(0) == 0
    assert occ(1) == occ(2) == 1
    assert occ(3) == 5/3
    assert density(3) == occ(3) / 3
    assert density(100) == occ(100) / 100
    assert runs(3) == [(0, 1), (0, 0), (1, 0)]
    assert runs(7) == [(0, 5), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]
    for n in (3, 7, 10, 20, 100, 101, 200, 201):
        for repeat in range(500):
            assert_valid(simulate(n), n)
    return 'ok'

def assert_valid(occupied, n):
    """Assert that, in this collection of occupied houses, no house is adjacent to an
    occupied house, and every unoccupied position is adjacent to an occupied house."""
    occupied = set(occupied)  # coerce to set
    for house in range(n):
        if house in occupied:
            assert (house - 1) not in occupied and (house + 1) not in occupied
        else:
            assert (house - 1) in occupied or (house + 1) in occupied

test()  # → 'ok'
```

I'm happy with my result:

lim_{n → ∞}density(n) = (1 - e^{-2}) / 2 ≅ 0.43233235838169365

I got the fun of working on the puzzle, the satisfaction of seeing John Lamping work out a proof, and the awe of seeing the mathematically sophisticated solutions of Jim Ferry and Andrew Mascioli, solutions that I know I could never come up with, but that I could get near to by coming at the problem with a different style of thinking.