Jimmy noted last week that he and Noah got slotted into the same pod for daily check-ins three days in a row. That made me wonder how likely that was, and I thought it'd be a good stats problem.
I tried tackling this using an analytical/combinatorial solution and didn't get to a reasonable answer (even with help from Amelia and Lois). So I decided to Monte Carlo it and see what the result was.
Question: what is the probability that *any* two people end up in the same check-in pod at least three days in a row?
My (very naive) prediction: I wasn't sure at all how likely this situation would be, other than the fact that it happened once (with what Katie and Jen assured us were random selections). So I'll guess that the probability is ~50%.
In [38]:
import numpy as np
In [39]:
# Parameters for this problem
N = 1000 # Number of trials to run
n_people = 15 # Number of individual people to schedule
n_groups = 5 # Number of groups per day
n_days = 5 # Number of days with scheduled groups
n_inarow = 3 # Number of days in a row in which two people share the same pod (target)
In [40]:
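# Set up the initial schedule
people = range(n_people)
# Pod size; a person's pod on a given day is (their position in that day's row)//modval
modval = n_people//n_groups
# One row per day, each row holding the people 0..n_people-1 in order (shuffled later)
sched = np.tile(np.arange(n_people),(n_days,1))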
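To make the position-to-pod mapping concrete, here is a minimal sketch in plain Python. The helper `pod_of` and the ordering in `example_day` are made up for illustration and are not part of the notebook; `pod_size` plays the role of `modval`.

```python
n_people, n_groups = 15, 5
pod_size = n_people // n_groups  # the notebook calls this modval

def pod_of(person, day_order):
    """Pod index of `person`, given one day's ordering of all people."""
    return day_order.index(person) // pod_size

# Unshuffled day: people 0-2 form pod 0, people 3-5 form pod 1, and so on
unshuffled = list(range(n_people))
print([pod_of(p, unshuffled) for p in range(n_people)])

# A made-up shuffled day (hypothetical ordering, for illustration only)
example_day = [4, 11, 0, 7, 2, 9, 14, 1, 5, 3, 12, 6, 10, 8, 13]
print(pod_of(4, example_day), pod_of(0, example_day), pod_of(7, example_day))
```

In the made-up ordering, persons 4 and 0 sit in the first three slots and so share pod 0, while person 7 is in the fourth slot and lands in pod 1.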
In [41]:
# A couple of useful print functions for later on
def breakline():
    print("------------------------------------------")

def print_example(s):
    # s has one row per day; its transpose has one row per schedule slot
    breakline()
    header = ["Day {:d}\t".format(x+1) for x in range(n_days)]
    print(' '.join(header))
    ndays = s.shape[0]
    for idx,row in enumerate(s.T):
        # Draw a separator at the start of each pod
        if idx % modval == 0:
            breakline()
        templine = "{}\t"*ndays
        print(templine.format(*row))
    breakline()
In [42]:
# The actual Monte Carlo loop
def find_match(sched):
    # Return (person1, person2, first_day, last_day) for the first pair found
    # sharing a pod n_inarow days in a row, or None if there is no such pair.
    # pods[day, person] = pod index of that person on that day
    pods = np.argsort(sched,axis=1)//modval
    for person1 in range(n_people):
        # Each unordered pair only needs checking once
        for person2 in range(person1+1,n_people):
            same = pods[:,person1] == pods[:,person2]
            # Look over each sliding window of n_inarow days for a match
            for j in range(n_days - n_inarow + 1):
                if same[j:j+n_inarow].all():
                    return (person1,person2,j+1,j+n_inarow)
    return None

def run_trials(N,verbose=False):
    foundone = 0
    successful_example,params = None,None
    for i in range(N):
        # Randomly shuffle the groups in place for each day
        for day in sched:
            np.random.shuffle(day)
        # Stop at the first pair that matches in this trial
        match = find_match(sched)
        if match is not None:
            # Mark that this trial was successful
            foundone += 1
            # Copy the schedule; a slice is only a view that later shuffles overwrite
            successful_example,params = sched.copy(),match
            if verbose:
                print("\nPersons {} and {} on Days {}-{}".format(*match))
                print_example(sched)
    return foundone,successful_example,params
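As an independent cross-check of the loop above, here is a minimal sketch of the same experiment using only the standard library. The names `trial_has_streak` and `estimate` are my own and not part of the notebook, and the sketch assumes the pods are a uniformly random split into equal-sized groups each day, which is what shuffling each day's row amounts to.

```python
import random
from itertools import combinations

def trial_has_streak(n_people=15, n_groups=5, n_days=5, n_inarow=3, rng=random):
    """Simulate one week; True if any pair shares a pod n_inarow days running."""
    pod_size = n_people // n_groups
    pods = []  # pods[d][p] = pod index of person p on day d
    for _ in range(n_days):
        order = list(range(n_people))
        rng.shuffle(order)
        day = [0] * n_people
        for position, person in enumerate(order):
            day[person] = position // pod_size
        pods.append(day)
    for a, b in combinations(range(n_people), 2):
        streak = 0  # consecutive shared days ending at the current day
        for day in pods:
            streak = streak + 1 if day[a] == day[b] else 0
            if streak >= n_inarow:
                return True
    return False

def estimate(n_trials=2000, seed=0, **kwargs):
    """Fraction of simulated weeks that contain at least one streak."""
    rng = random.Random(seed)
    hits = sum(trial_has_streak(rng=rng, **kwargs) for _ in range(n_trials))
    return hits / float(n_trials)

print("{:.1f}%".format(100 * estimate()))
```

Tracking a running streak per pair avoids re-scanning every sliding window, and seeding `random.Random` makes a run repeatable. The printed percentage should agree with `run_trials` to within Monte Carlo noise.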
In [43]:
# Run trials and report result
successes,example,params = run_trials(N,verbose=False)
if example is not None:
    print("Example of a successful trial:")
    print("\nPersons {} and {} are in the same pod on Days {}-{}".format(*params))
    print_example(example)
print("\n{:.1f}% of the time, two people are "
      "in the same group at least {} days in a row.\n".format(
          successes/float(N)*100.,n_inarow))
That is shockingly close to my naive prediction. I'd still really love to get a proper mathematical justification of this result, but it is at least consistent with what we observed: 1 success in a sample of 1 week.
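Short of a proper derivation, a rough analytical estimate is possible. The sketch below is a heuristic, not a proof. It computes the exact chance that one *specific* pair has a three-day streak, multiplies by the number of pairs to get the expected number of streaking pairs, and then treats that count as roughly Poisson. The Poisson step is an assumption, since different pairs are not independent. The function names are mine, not part of the notebook.

```python
from fractions import Fraction
from itertools import product
from math import exp

def has_run(days, length):
    """True if the 0/1 sequence `days` contains `length` ones in a row."""
    streak = 0
    for together in days:
        streak = streak + 1 if together else 0
        if streak >= length:
            return True
    return False

def pair_streak_prob(n_people=15, n_groups=5, n_days=5, n_inarow=3):
    """Exact chance that one specific pair shares a pod n_inarow days running."""
    pod_size = n_people // n_groups
    # Given my slot, my partner is equally likely to be in any of the other
    # n_people-1 slots, and pod_size-1 of those are in my pod.
    p = Fraction(pod_size - 1, n_people - 1)
    total = Fraction(0)
    # Days are independent, so sum over every together/apart pattern
    for days in product([0, 1], repeat=n_days):
        if has_run(days, n_inarow):
            k = sum(days)
            total += p**k * (1 - p)**(n_days - k)
    return total

n_people = 15
n_pairs = n_people * (n_people - 1) // 2
pair_prob = pair_streak_prob()
expected_pairs = n_pairs * pair_prob

# Heuristic step (assumption): number of streaking pairs is roughly Poisson
poisson_estimate = 1 - exp(-float(expected_pairs))
print(pair_prob, expected_pairs, round(poisson_estimate, 3))
```

For the parameters used here, a specific pair shares a pod on any given day with probability 2/14 = 1/7, and the per-pair streak probability works out to 19/2401, or about 0.8%. With 105 pairs, the expected number of streaking pairs is 285/343, about 0.83. That is also an upper bound on the probability of seeing at least one streak. The Poisson heuristic turns it into an estimate of roughly 56%, which can be compared against the simulated percentage.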