Coupon Collector's Problem

Let us say that we have an infinte supply of balls of 10 colors in an urn. The question is how many times on average should we draw from the urn before we have drawn all 10 colors. This problem is also known as the Coupon Collector's problem. The issue there being, how long it will take for a coupon collector to collect the complete set.

The question came up when I was recently working on how many gas molecules should impinge on a surface before all the vacant sites are to be filled up under the assumption that the sticking coeffecient is 1 if the site is vacant and 0 otherwise, and the molecules reach the surface one after the other.


In [1]:
import numpy as np

In [2]:
def number_of_draws_for_all_colors(ncolors = 10):
    cnt = 0
    colors_drawn_so_far = []
    while len(set(colors_drawn_so_far)) != ncolors:
        colors_drawn_so_far.append(np.random.randint(ncolors, size=1)[0])
        cnt += 1

    return cnt

In [5]:
for ncolors in [10, 100, 1000]:
    ndraws = []
    for itrial in range(1000):
        ndraws.append(number_of_draws_for_all_colors(ncolors = ncolors))
    print 'ncolors: %6.6i; avg_draws_to_all_colors: %4.2f' % (ncolors, np.mean(ndraws)/ncolors)


ncolors: 000010; avg_draws_to_all_colors: 2.92
ncolors: 000100; avg_draws_to_all_colors: 5.18
ncolors: 001000; avg_draws_to_all_colors: 7.50

And the answer seems to be that it gets harder and harder as the number of the colored balls in the urn increases. The exact answer for the problem is $nH_{n}$, where H is the Harmonic number.

The code above takes forever to finish for ncolors >= 1E4. And I guess the code can be optimized further.