The Puzzle of the Misanthropic Neighbors

Consider this puzzle from The Riddler:

The misanthropes are coming. Suppose there is a row of some number, N, of houses in a new, initially empty development. Misanthropes are moving into the development one at a time and selecting a house at random from those that have nobody in them and nobody living next door. They keep on coming until no acceptable houses remain. At most, one out of two houses will be occupied; at least one out of three houses will be. But what’s the expected fraction of occupied houses as the development gets larger, that is, as N goes to infinity?

Consider N=4 Houses

N houses

To make sure we understand the problem, let's try a simple example, with N=4 houses. We will represent the originally empty row of four houses by four dots:

 ....

Now the first person chooses one of the four houses (which are all acceptable). We'll indicate an occupied house by a 1, so the four equiprobable choices are:

 1...
 .1..
 ..1.
 ...1

When a house is occupied, any adjacent houses become unacceptable. We'll indicate that with a 0:

 10..
 010.
 .010
 ..01

In all four cases, a second occupant has a place to move in, but then there is no place for a third occupant. The occupancy is 2 in all cases, and thus the expected occpancy is 2. The occupancy fraction, or density, is 2/N = 2/4 = 1/2.

Consider N=7 Houses

With N=7 houses, there are 7 equiprobable choices for the first occupant:

 10.....
 010....
 .010...
 ..010..
 ...010.
 ....010
 .....01

Now we'll add something new: the lengths of the runs of consecutive acceptable houses to the left and right of the chosen house:

 10..... runs = (0, 5)
 010.... runs = (0, 4)
 .010... runs = (1, 3)
 ..010.. runs = (2, 2)
 ...010. runs = (3, 1)
 ....010 runs = (4, 0)
 .....01 runs = (5, 0)

Defining `occ(n)`

This gives me a key hint as to how to analyze the problem. I'll define occ(n) to be the expected number of occupied houses in a row of n houses. So:

 10..... runs = (0, 5); occupancy = occ(0) + 1 + occ(5)
 010.... runs = (0, 4); occupancy = occ(0) + 1 + occ(4)
 .010... runs = (1, 3); occupancy = occ(1) + 1 + occ(3)
 ..010.. runs = (2, 2); occupancy = occ(2) + 1 + occ(2)
 ...010. runs = (3, 1); occupancy = occ(3) + 1 + occ(1)
 ....010 runs = (4, 0); occupancy = occ(4) + 1 + occ(0)
 .....01 runs = (5, 0); occupancy = occ(5) + 1 + occ(0)

So we can say that occ(n) is:

0 when n is 0 (because no houses means no occupants),
1 when n is 1 (because one isolated acceptable house has one occupant),
else the mean, over the n choices for the first occupied house, of the sum of that house plus the occupancy of the runs to the left and right.

We can implement that:



In [1]:

    
from statistics import mean

def occ(n):
    "The expected occupancy for a row of n houses (under misanthrope rules)."
    return (0 if n == 0 else
            1 if n == 1 else
            mean(occ(L) + 1 + occ(R)
                 for (L, R) in runs(n)))

def runs(n):
    """A list [(L, R), ...] where the i-th tuple contains the lengths of the runs
    of acceptable houses to the left and right of house i."""
    return [(max(0, i - 1), max(0, n - i - 2))
            for i in range(n)]

def density(n): return occ(n) / n

Let's check that occ(4) is 2, as we computed it should be:



In [30]:

    
occ(4)









    Out[30]:





2.0

And that runs(7) is what we described above:



In [3]:

    
runs(7)









    Out[3]:





[(0, 5), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]

Let's check on n = 7:



In [4]:

    
occ(7)









    Out[4]:





3.3238095238095235



In [5]:

    
density(7)









    Out[5]:





0.47482993197278905

That seems reasonable, but I don't know for sure that it is correct.

Dynamic Programming Version of `occ`

The computation of occ(n) makes multiple calls to occ(n-1), occ(n-2), etc. To avoid re-computing the same calls over and over, we will modify occ to save previous results using dynamic programming. I could implement that in one line with the decorator @functools.lru_cache, but instead I will explicitly manage a list, cache, such that cache[n] holds occ(n):



In [6]:

    
def occ(n, cache=[0, 1]):
    "The expected occupancy for a row of n houses (under misanthrope rules)."
    # Store occ(i) in cache[i] for all as-yet-uncomputed values of i up to n:
    for i in range(len(cache), n+1):
        cache.append(mean(cache[L] + 1 + cache[R]
                          for (L, R) in runs(i)))
    return cache[n]

Let's make sure this new version gets the same results as the old version:



In [32]:

    
occ(4) == 2









    Out[32]:





True



In [33]:

    
density(7)









    Out[33]:





0.47482993197278905

Let's make sure the caching makes computation faster the second time:



In [9]:

    
%time occ(2000)









    



CPU times: user 6.23 s, sys: 115 ms, total: 6.34 s
Wall time: 6.97 s






    Out[9]:





864.9617138385324



In [10]:

    
%time occ(2000)









    



CPU times: user 5 µs, sys: 0 ns, total: 5 µs
Wall time: 9.06 µs






    Out[10]:





864.9617138385324

Plotting `density(n)`

To get a feel for the limit of density(n), start by drawing a plot over some small values of n:



In [11]:

    
%matplotlib inline
import matplotlib.pyplot as plt

def plot_density(ns):
    "Plot density(n) for each n in the list of numbers ns."
    plt.xlabel('n houses'); plt.ylabel('density(n)')
    plt.plot(ns, [density(n) for n in ns], 's-')
    return density(ns[-1])



In [12]:

    
plot_density(range(1, 100))









    Out[12]:





0.4353323288377046

There is something funny going on with the first few values of n. Let's separately look at the first few:



In [13]:

    
plot_density(range(1, 11))









    Out[13]:





0.46203174603174607

And at a wider range:



In [14]:

    
plot_density(range(100, 4000, 50))









    Out[14]:





0.43240754751464183

The density is going down, and the curve is almost but not quite flat.

lim _{n → ∞} density(n)

Thes puzzle is to figure out the limit of density(n) as n goes to infinity. The plot above makes it look like 0.432+something, but we can't answer the question just by plotting; we'll need to switch modes from computational thinking to mathematical thinking.

At this point I started playing around with density(n), looking at various values, differences of values, ratios of values, and ratios of differences of values, hoping to achieve some mathematical insight. Mostly I got dead ends. But eventually I hit on something promising. I looked at the difference between density values (using the function diff), and particularly the difference as you double n:



In [15]:

    
def diff(n, m): return density(n) - density(m)

diff(100, 200)









    Out[15]:





0.0014849853757253895

And compared that to the difference when you double n again:



In [16]:

    
diff(200, 400)









    Out[16]:





0.0007424926878626947

Hmm—I noticed that the first difference is just about twice as much as the second. Let's check:



In [17]:

    
diff(100, 200) / diff(200, 400)









    Out[17]:





2.0

Wow—not only is it close to twice as much, it is exactly twice as much (to the precision of floating point numbers). Let's try other starting values for n:



In [18]:

    
n = 500; diff(n, 2*n) / diff(2*n, 4*n)









    Out[18]:





2.0



In [19]:

    
n = 1000; diff(n, 2*n) / diff(2*n, 4*n)









    Out[19]:





1.9999999999992524

OK, I'm convinced this is real!

Now, what mathematical function behaves like this? I figured out that f(n) = (1 / n) does. The ratio of the differences would be:

$$\frac{f(n) - f(2n)}{f(2n) - f(4n)} = \frac{1/n - 1/(2n)}{1/(2n) - 1 / (4n)}\;\;$$

Multiplying top and bottom by n you get:

$$\frac{1 - 1/2}{1/2 - 1 /4} = \frac{1/2}{1/4} = 2\;\;$$

If the function (1 / n) fits the pattern, then so does any affine transformation of (1 / n), because we are taking the ratio of differences. So that means a density function of the form

density(n) = A + B / n

would fit the patterm. I can try a curve_fit to estimate the parameters A and B:



In [20]:

    
from scipy.optimize import curve_fit

Ns = list(range(100, 10001, 100))

def f(n, A, B): return A + B / n

((A, B), covariance) = curve_fit(f=f, xdata=Ns, ydata=[density(n) for n in Ns])

covariance









    Out[20]:





array([[  7.64810902e-34,  -2.42654280e-31],
       [ -2.42654280e-31,   4.67778766e-28]])

The curve_fit function returns a sequence of parameter values, and a covariance matrix. The fact that all the numbers in the covariance matrix are really small indicates that the parameters are a really good fit. Here are the parameters, A and B:



In [21]:

    
A, B









    Out[21]:





(0.43233235838169343, 0.29699707514520357)

We can plug them into a function that estimates the density:



In [22]:

    
def estimated_density(n): return A + B / n

And we can test how close this function is to the true density function:



In [23]:

    
max(abs(density(n) - estimated_density(n))
    for n in range(200, 4000))









    Out[23]:





3.8857805861880479e-16

That says that, for all values of n from 200 to 4,000, density(n) and estimated_density(n) agree at least through the first 15 decimal places!

We now have a plausible answer to the puzzle:

lim_n→∞ density(n) ≅ A = 0.43233235838169343

Why?

Theis answer is empirically strong (15 decimal places of accuracy) but theoretically weak: we don't have a proof, and we don't have an explanation for why the density function has this form. We need some more mathematical thinking.

I didn't have any ideas, so I looked to see if anyone else had written something about the number 0.43233235838169343. I tried several searches and found two interesting formulas:

Search: [0.4323323] Formula: sinh(1) / exp(1) (Page)
Search: [0.432332358] Formula: 0.5(1-e^(-2)) (Page)

I can verify that the two formulas are equivalent, and that they are indeed equal to A to at least 15 decimal places:



In [24]:

    
from math import sinh, exp, e

S = sinh(1) / exp(1)
E = 0.5 * (1 - e ** (-2))

assert S == E

S, E, A









    Out[24]:





(0.43233235838169365, 0.43233235838169365, 0.43233235838169343)

So I now have a suspicion that

lim _{n → ∞} density(n) = (1 - e^-2) / 2

but I still have no proof, nor any intuition as to why this is so.

John Lamping to the Rescue

I reported my results to Anne Paulson and John Lamping, who had originally related the problem to me, and the next day John wrote back with the following:

I got a derivation of the formula!
Suppose that each house has a different "attractiveness", and that when it is a misanthrope's turn to pick a house, they consider the houses in order, from most attractive to least, and pick the first house not next to any other house. If the attractiveness are chosen independently, this process gives the same probabilities as each misanthrope picking from the available houses randomly.
To be more concrete, let the attractivenesses range from 0 to 1 with uniform probability, with lower numbers being being considered earlier, and hence more attractive. (This makes the math come out easier.)
Given the attractiveness, a, of a house, we can compute the chance that it will end up getting picked. It will get picked if and only if neither house on either side gets considered earlier and gets picked. Lets consider one side, and assume the houses, and their attractivenesses, are labeled a, b, c, d, e, f, g, ... with the house we are considering being a. There are an infinite number of cases where b won't get considered before a and picked. Here are the first few:
a < b (a is considered before b)
a > b > c < d (Someone considered c first among these 4 and picked it. A later person considered b before a, but rejected it because c had already been chosen.)
a > b > c > d > e < f (Someone considered e first among these 6, and picked it. A later person considered d, but rejected it because e was chosen, then considered c, which they picked. Still later, someone considered b, which they rejected because c had been chosen.)
We can write down the probabilities of these cases as a function of the attractiveness of a:
a < b: The chance that b is greater than a: 1 - a.
a > b > c < d: Let y = a - c, so y is between 0 and a. The probability is the integral over the possible y of the chance that b is between a and c, times the chance that d is greater than c, or integral from 0 to a of y * (1 - a + y) dy.
a > b > c > d > e < f: Let y = a - e. The probability is the integral of (the chance that b, c, and d are all between a and e, and ordered right, which is y^3 / 3!, times the chance that f is greater than e, which is (1 - a + y))
If you work out the definite integrals, you get
a < b: 1 - a
a > b > c < d: a^2 / 2 - a^3 / 3!
a > b > c > d > e < f: a^4 / 4! - a^5 / 5!
Add them all up, and you have 1 - a + a^2 / 2 - a^3 / 3! + a ^4 / 4! ... the Taylor expansion for e^-a.
Now, there will be a house at a if both adjacent houses are not picked earlier, so the chance is the square of the chance for one side: e^(-2a). Integrate that from 0 to 1, and you get 1/2 (1 - e^-2).

You can compare John Lamping's solution to the solutions by Jim Ferry and Andrew Mascioli.

Styles of Thinking

It is clear that different write-ups of this problem display different styles of thinking. I'll attempt to name and describe them:

Mathematical Publication Style: This style uses sophisticated mathematics (e.g. generating functions, differentiation, asymptotic analysis, and manipulation of summations). It defines new abstractions without necessarily trying to motivate them first, it is terse and formal, and it gets us to the conclusion in a way that is clearly correct, but does not describe all the steps of how the author came up with the ideas. (Ferry and Mascioli)
Mathematical Exploration Style: Like the Mathematical Publication Style, but with more explanatory prose. (Lamping)
Computational Thinking: A more concrete style; tends to use programs with specific values of n rather than creating a proof for all values of n; produces tables or plots to help guide intuition; verifies results with tests or alternative implementations. (Norvig)

In this specific puzzle, my steps were:

Analyze the problem and implement code to solve it for small values of n.
Plot results and examine them for insight.
Play with the results and get a guess at a partial solution density(n) ≅ A + B / n.
Solve numerically for A and B.
Do a search with the numeric value of A to find a formula with that value.
Given that formula, let Lamping figure out how it corresponds to the problem.

This is mostly computational thinking, with a little mathematical thrown in.

Validation by Anticipating Objections

Is our implementation of occ(n) correct? I think it is, but I can anticipate some objections and answer them:

In occ(n), is it ok to start from all empty houses, rather than considering layouts of partially-occupied houses? Yes, I think it is ok, because the problem states that initially all houses are empty, and each choice of a house breaks the street up into runs of acceptable houses, flanked by unacceptable houses. If we get the computation right for a run of n acceptable houses, then we can get the whole answer right. A key point is that the chosen first house breaks the row of houses into 2 runs of acceptable houses, not 2 runs of unoccupied houses. If it were unoccupied houses, then we would have to also keep track of whether there were occupied houses to the right and/or left of the runs. By considering runs of acceptable houses, eveything is clean and simple.
In occ(7), if the first house chosen is 2, that breaks the street up into runs of 1 and 3 acceptable houses. There is only one way to occupy the 1 house, but there are several ways to occupy the 3 houses. Shouldn't the average give more weight to the 3 houses, since there are more possibilities there? No, I don't think so. We are caclulating occupancy, and there is a specific number (5/3) which is the expected occupancy of 3 houses; it doesn't matter if there is one combination or a million combinations that contribute to that expected value, all that matters is what the expected value is.

Validation by Simulation

A simulation can add credence to our implementation of density(n), for two reasons:

The code for the simulation can have a more direct corresponance to the problem statement.
When two very different implementations get the same result, that is evidence supporting both of them.

The simulation will start with an empty set of occupied houses. Following Lamping's suggestion, we sample n houses (to get a random ordering) and go through them, occupying just the ones that have no neighbor"



In [25]:

    
import random

def simulate(n):
    "Simulate moving in to houses, and return a sorted tuple of occupied houses."
    occupied = set()
    for house in random.sample(range(n), n):
        if (house - 1) not in occupied and (house + 1) not in occupied:
            occupied.add(house)
    return sorted(occupied)

def simulated_density(n, repeat=10000):
    "Estimate density by simulation, repeated `repeat` times."
    return mean(len(simulate(n)) / n 
                for _ in range(repeat))

Let's see if the simulation returns results that match the actual density function and the estimated_density function:



In [26]:

    
print('  n   simul    density  estimated')
for n in (25, 50, 100, 200, 400):
    print('{:3}   {:.3}    {:.3}    {:.3}'
          .format(n, simulated_density(n), density(n), estimated_density(n)))









    



  n   simul    density  estimated
 25   0.444    0.444    0.444
 50   0.438    0.438    0.438
100   0.435    0.435    0.435
200   0.434    0.434    0.434
400   0.433    0.433    0.433

We got perfect agreement (at least to 3 decimal places), suggesting that either our three implementations are all correct, or we've made mistakes in all three.

The simulate function can also give us insights when we look at the results it produces:



In [27]:

    
simulate(7)









    Out[27]:





[0, 2, 4, 6]

Let's repeat that multiple times, and store the results in a Counter, which tracks how many times it has seen each result:



In [28]:

    
from collections import Counter

Counter(tuple(simulate(7)) for _ in range(10000))









    Out[28]:





Counter({(0, 2, 4, 6): 3303,
         (0, 2, 5): 1255,
         (0, 3, 5): 959,
         (0, 3, 6): 783,
         (1, 3, 5): 1426,
         (1, 3, 6): 974,
         (1, 4, 6): 1300})

That says that about 1/3 of the time, things work out so that the 4 even-numbered houses are occupied. But if anybody ever chooses an odd-numbered house, then we are destined to have 3 houses occupied (in one of 6 different ways, of which (1, 3 5) is the most common, probably because it is the only one that has three chances of getting started with an odd-numbered house).

Verification by Test

Another way to gain more confidence in the code is to run a test suite:



In [29]:

    
def test():
    assert occ(0) == 0
    assert occ(1) == occ(2) == 1
    assert occ(3) == 5/3
    assert density(3) == occ(3) / 3
    assert density(100) == occ(100) / 100
    assert runs(3) == [(0, 1), (0, 0), (1, 0)]
    assert runs(7) == [(0, 5), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]
    for n in (3, 7, 10, 20, 100, 101, 200, 201):
        for repeat in range(500):
            assert_valid(simulate(n), n)         
    return 'ok'

def assert_valid(occupied, n):
    """Assert that, in this collection of occupied houses, no house is adjacent to an
    occupied house, and every unoccupied position is adjacent to an occupied house."""
    occupied = set(occupied) # coerce to set
    for house in range(n):
        if house in occupied:
            assert (house - 1) not in occupied and (house + 1) not in occupied
        else:
            assert (house - 1) in occupied or (house + 1) in occupied

test()









    Out[29]:





'ok'

Conclusion

I'm happy with my result:

lim _{n → ∞} density(n) = (1 - e^-2) / 2 ≅ 0.43233235838169365

I got the fun of working on the puzzle, the satisfaction of seeing John Lamping work out a proof, and the awe of seeing the mathematically sophisticated solutions of Jim Ferry and Andrew Mascioli, solutions that I know I could never come up with, but that I could get near to by coming at the problem with a different style of thinking.