Randomization

In the previous chapter, we saw how randomization eliminates selection bias. Let's explain what we mean by randomization, describe several ways we might want to randomly assign treatments, and discuss the components other than the assignment that can be randomized.

Randomization refers to using "a known, well-understood probabilistic scheme" to assign treatments to units (Oehlert, 2010). Randomization "ensures that assignment to the treatment group is statistically independent of all observed or unobserved variables" (Gerber and Green, 2012).

Simple Random Assignment

With simple random assignment, every unit has the same probability of being assigned to a particular treatment group. The probability can be anything greater than zero and less than one. It determines the expected, but not the exact, number of units in each group. For example, assuming a single treatment group and a single control group, if the probability is 0.75, about 75% of units will be assigned to the treatment group.

Let's imagine we have 10 units to which we assign a treatment with 0.5 probability. Will our groups be balanced? That is, will we have 5 units in the treatment group and 5 units in the control group? Let's find out.


In [1]:
import numpy as np

n, p = 10, 0.5
np.random.binomial(n, p)


Out[1]:
3

This counts the number of successes—think of "success" as being assigned to the treatment group—in 10 independent trials, where success occurs 50% of the time.

Each time you run the cell above, you'll get a different result—it's not always 5! This is a drawback of simple random assignment.

[Y]ou could flip a coin to assign each of 10 [units] to the treatment condition, but there is only a 24.6% chance of ending up with exactly 5 [units] in treatment and 5 in control (Gerber and Green, 2012).
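
The 24.6% figure can be checked directly from the binomial probability mass function. The following is a minimal sketch using only the standard library; the variable names are illustrative and do not come from Gerber and Green.

```python
from math import comb

n_units, p_treat, k = 10, 0.5, 5  # 10 units, fair coin, exactly 5 in treatment

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
prob_balanced = comb(n_units, k) * p_treat**k * (1 - p_treat) ** (n_units - k)
print(round(prob_balanced, 3))  # 0.246
```

Of the 1,024 equally likely sequences of 10 coin flips, 252 contain exactly 5 heads, which gives 252 / 1024 ≈ 0.246.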

So that others may reproduce our assignments, we can use a random seed. This is highly recommended, though, in practice, we won't use np.random.binomial(). (Note: I'll always use 42 as the seed.)


In [2]:
np.random.seed(42)
np.random.binomial(n, p)


Out[2]:
4
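
Note that np.random.binomial(n, p) only reports how many units land in the treatment group, not which ones. One way to produce the assignments themselves is to draw an independent 0/1 indicator for each unit. The sketch below assumes NumPy's Generator API (np.random.default_rng); the names rng, assignment, and treated_units are illustrative.

```python
import numpy as np

n_units, p_treat = 10, 0.5
rng = np.random.default_rng(42)  # seeded generator for reproducibility

# one independent Bernoulli(p) draw per unit: 1 = treatment, 0 = control
assignment = rng.binomial(1, p_treat, size=n_units)

# indices of the units assigned to treatment
treated_units = np.flatnonzero(assignment)
print(assignment, len(treated_units))
```

As with the count above, the number of treated units will vary from seed to seed and need not equal 5.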

Complete Random Assignment

If, instead, we'd like to assign exactly $m$ of $N$ units to the treatment group, we can use complete random assignment. Here, as before, each unit has an identical probability of being assigned to the treatment group (now $m/N$). Gerber and Green describe three ways to implement complete random assignment:

  • randomly select units until there are $m$ of them in the treatment group
  • enumerate all of the possible ways to select $m$ of $N$ units and randomly select one of those allocations
  • randomly order the $N$ units and select the first $m$

Let's show examples for the second and third approaches.

Enumerate

There are

$$\binom{N}{m} = \frac{N!}{m!(N - m)!} = \frac{10!}{5!5!} = 252$$

possible ways to select 5 of 10 units.

We can enumerate these combinations using the itertools module.


In [3]:
from math import factorial

# use integer division so the count is an int (252), not a float (252.0)
possible_combinations = factorial(10) // (factorial(5) * factorial(10 - 5))

In [4]:
import random
from itertools import combinations

# enumerate the possible ways to select m of N units
enumerated = list(combinations(range(10), 5))

# randomly select one of those allocations
random.seed(42)
select = random.randrange(len(enumerated))  # integer index in [0, 251]
treatment = enumerated[select]
print(list(treatment))


[1, 3, 4, 5, 8]

Here, using the seed of 42, units 1, 3, 4, 5, and 8 get assigned to the treatment group.

Randomly Order


In [5]:
units = list(range(10))
random.seed(42)
random.shuffle(units)
units[:5]


Out[5]:
[7, 3, 2, 8, 5]

Here, using the same seed, units 7, 3, 2, 8, and 5 get assigned to the treatment group.
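
For completeness, the first approach listed earlier (randomly select units until there are $m$ of them in the treatment group) might be sketched as follows. This loop is an illustration rather than code from Gerber and Green; the names remaining, treatment_group, and control_group are made up for the example.

```python
import random

N, m = 10, 5
random.seed(42)

remaining = list(range(N))
treatment_group = []

# draw one unit at a time, without replacement, until m are selected
while len(treatment_group) < m:
    pick = random.randrange(len(remaining))
    treatment_group.append(remaining.pop(pick))

# whatever is left over forms the control group
control_group = remaining
print(sorted(treatment_group), sorted(control_group))
```

In practice, random.sample(range(N), m) draws $m$ units without replacement in a single call.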

With either simple or complete random assignment, units are assigned to treatment with equal probability. Thus, the treatment and control groups are random subsets of all units in the sample. This means that treatment status is statistically independent of potential outcomes (Gerber and Green, 2012). In addition, any features that might be associated with the outcome of interest are, in expectation, equally distributed between the groups. In other words, knowing a unit's treatment assignment tells us nothing about its potential outcomes or background characteristics.
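
A small simulation can make this concrete. The sketch below repeats complete random assignment many times and checks two things: how often each unit ends up in treatment, and the average difference in a covariate between the two groups. The covariate values are invented for illustration, and the number of simulations is an arbitrary choice.

```python
import numpy as np

N, m, n_sims = 10, 5, 10_000
rng = np.random.default_rng(42)
covariate = np.arange(N, dtype=float)  # hypothetical pre-treatment feature

treated_counts = np.zeros(N)
diffs = np.empty(n_sims)
for i in range(n_sims):
    order = rng.permutation(N)         # randomly order the units
    treated = np.zeros(N, dtype=bool)
    treated[order[:m]] = True          # the first m go to treatment
    treated_counts += treated
    # difference in covariate means, treatment minus control
    diffs[i] = covariate[treated].mean() - covariate[~treated].mean()

print(treated_counts / n_sims)  # each entry should be close to m / N = 0.5
print(diffs.mean())             # should be close to 0
```

Any single assignment can still be imbalanced on the covariate; the balance holds on average across assignments.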

Randomizing Other Things

In addition to randomizing treatment assignment, many other components of an experiment can be randomized. This is study-dependent, but we provide some examples below.

  • if responses are measured using multiple instruments, randomize the instruments to be used to measure individual units
  • randomize the order in which treatments are applied
  • randomize the order in which units are measured
  • if responses are survey-based and the order of the questions does not matter (i.e., one question does not depend on another), randomize the order of questions
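
As one illustration of the last item, question order could be randomized independently for each respondent. This is a minimal sketch; the question labels and respondent IDs are hypothetical.

```python
import random

questions = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical survey items
respondents = ["r1", "r2", "r3"]      # hypothetical respondent IDs

rng = random.Random(42)  # dedicated seeded generator

# sample with k=len(questions) returns a shuffled copy and leaves the original list intact
question_order = {r: rng.sample(questions, k=len(questions)) for r in respondents}

for r, order in question_order.items():
    print(r, order)
```

The same pattern applies to randomizing the order in which units are measured or treatments are applied.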

In effect, we want to design (e.g., by using blocking, which will be described in a subsequent chapter) for any factors that might cause a change in the response and randomize everything else (Oehlert, 2010).

