You are nursing a red knot, Paulina, back to health after finding her injured in your backyard. Normally, the knot is a migratory bird, travelling back and forth between its breeding grounds in the Arctic and Tierra del Fuego in South America. During her stay with you, you notice that she spends entire days either in front of the air conditioner (A) or next to the warmer window (W). A data scientist at heart, you begin logging her chosen resting place every day until her departure. These are your notes:
| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Location | A | A | A | W | W | W | W | A | A | A | W | W |
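After releasing Paulina the peep back into the wild, you feel an emptiness in your heart, so you turn to modeling to fill the void. You're interested in modeling the mini-migratory patterns you observed in your knot. You start with the following assumptions:

1. Paulina moves at most once per day.
2. Where she rests each day depends only on where she was the day before.
3. The way she makes that decision does not change from day to day.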
We should always evaluate our assumptions for plausibility. Which of these assumptions seem reasonable based on your observations? Which seem a little strong?
SOLUTION:
Empirically, Paulina indeed did seem to limit herself to moving at most once per day.
The other two are unknowable given our data!
Let's just roll with the assumptions above for now. They will lead to a simpler model. We have to start somewhere! With the assumptions above, our understanding of Paulina is reduced to four possible events: what are they?
SOLUTION:
On any given day, Paulina either stays at $A$, moves from $A$ to $W$, stays at $W$, or moves from $W$ to $A$.
Which assumption specifically allowed you to say this?
What events do we NOT have to worry about because of these assumptions?
Now we write down a probability statement for these events:
Paulina stays or moves each day with probabilities $p_{AA}$, $p_{WW}$, $p_{AW}$, and $p_{WA}$, where $p_{ij}$ is the probability that she is at location $j$ today given that she was at location $i$ the day before.
What restrictions do our assumptions place on the probabilities?
SOLUTION: They do not change from day to day. Also, if Paulina is at $A$, she can only stay at $A$ or move to $W$, so $p_{AA} + p_{AW} = 1$; likewise, $p_{WW} + p_{WA} = 1$.
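These restrictions can be collected in a $2 \times 2$ transition matrix. This is a standard notation for this kind of model, introduced here only for convenience. Rows are indexed by yesterday's location and columns by today's, both in the order $A, W$:

$$P = \begin{pmatrix} p_{AA} & p_{AW} \\ p_{WA} & p_{WW} \end{pmatrix} = \begin{pmatrix} p_{AA} & 1 - p_{AA} \\ 1 - p_{WW} & p_{WW} \end{pmatrix}$$

Each row sums to 1, so only two of the four probabilities are free. We can take them to be $p_{AA}$ and $p_{WW}$.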
A natural thing to want to do at this point is estimate our parameters $p_{AA}, p_{WW}, p_{AW}, p_{WA}$. We turn back to our friend, maximum likelihood estimation. This will require us to write down the probability of witnessing our data (given the model):
$\mathbb{P}(\text{A on Day 1 and A on Day 2 and } \dots \text{ and W on Day 11 and W on Day 12})$
So we write down the product:
$\mathbb{P}(\text{A on Day 1}) \times \mathbb{P}(\text{A on Day 2}) \times \dots \times \mathbb{P}(\text{W on Day 11}) \times \mathbb{P}(\text{W on Day 12})$
Wait, what? That's not right. What does the second statement assume?
SOLUTION: That the events are independent!
What can we legitimately write down?
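One answer that needs no independence assumption is the chain rule of probability. It factors any joint probability into successive conditional probabilities:

$$\mathbb{P}(\text{A on Day 1}) \times \mathbb{P}(\text{A on Day 2} \mid \text{A on Day 1}) \times \mathbb{P}(\text{A on Day 3} \mid \text{A on Days 1 and 2}) \times \cdots \times \mathbb{P}(\text{W on Day 12} \mid \text{locations on Days 1 through 11})$$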
Our assumption that Paulina only makes decisions based on where she was the day before allows us to further simplify.
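Under this assumption, each factor conditions only on the previous day's location, so the probability of the data becomes:

$$\mathbb{P}(\text{A on Day 1}) \times \mathbb{P}(\text{A on Day 2} \mid \text{A on Day 1}) \times \mathbb{P}(\text{A on Day 3} \mid \text{A on Day 2}) \times \cdots \times \mathbb{P}(\text{W on Day 12} \mid \text{W on Day 11})$$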
Furthermore, the assumption that the decision mechanism doesn't change with time lets us drop the time index from most of this statement!
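With the time index dropped, a factor depends only on yesterday's and today's locations. For example, $\mathbb{P}(\text{A on Day 2} \mid \text{A on Day 1})$ and $\mathbb{P}(\text{A on Day 9} \mid \text{A on Day 8})$ are the same number. Grouping the 11 day-to-day factors by type (counts read off the table) gives:

$$\mathbb{P}(\text{A on Day 1}) \times \mathbb{P}(A \text{ today} \mid A \text{ yesterday})^4 \times \mathbb{P}(W \text{ today} \mid A \text{ yesterday})^2 \times \mathbb{P}(W \text{ today} \mid W \text{ yesterday})^4 \times \mathbb{P}(A \text{ today} \mid W \text{ yesterday})$$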
The only problem that remains is Day 1, since it doesn't have a "yesterday". For now, we'll assume it away and let $\mathbb{P}(\text{A on Day 1}) = 1$. We can justify this in one of two ways. We can imagine a world where we have complete control over where we place Paulina on Day 1. Alternatively, we can say that where Paulina goes on Day 1 doesn't determine how she makes the rest of her decisions.
Ok, now we can finally put our parameters into this probability model. Rewrite the above with $p_{AA}, p_{WW}, p_{AW}, p_{WA}$.
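Counting transitions in the table turns the product into $p_{AA}^4 \, p_{AW}^2 \, p_{WW}^4 \, p_{WA}$. The minimal Python sketch below checks those counts mechanically. It also evaluates the likelihood using $p_{AW} = 1 - p_{AA}$, $p_{WA} = 1 - p_{WW}$, and the Day 1 convention above. The names `log`, `transitions`, and `likelihood` are illustrative choices, not part of the original discussion.

```python
from collections import Counter

# Paulina's log from the table (A = air conditioner, W = window)
log = ["A", "A", "A", "W", "W", "W", "W", "A", "A", "A", "W", "W"]

# Pair each day with the next: 12 days give 11 (yesterday, today) transitions
transitions = Counter(zip(log, log[1:]))
print(dict(transitions))
# → {('A', 'A'): 4, ('A', 'W'): 2, ('W', 'W'): 4, ('W', 'A'): 1}

def likelihood(p_aa, p_ww):
    """Probability of the observed log under the model, taking P(A on Day 1) = 1."""
    p = {
        ("A", "A"): p_aa,
        ("A", "W"): 1 - p_aa,  # from A she either stays or moves to W
        ("W", "W"): p_ww,
        ("W", "A"): 1 - p_ww,  # from W she either stays or moves to A
    }
    result = 1.0
    for pair, n in transitions.items():
        result *= p[pair] ** n  # one factor per observed transition
    return result
```

For example, `likelihood(2/3, 4/5)` is larger than `likelihood(0.5, 0.5)`, which equals $0.5^{11}$.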
Whew, now we have a probability statement involving our parameters of interest (if our model is meaningful). The maximum likelihood strategy has us view this probability as a function of the parameters: the likelihood. Taking the log, and substituting $p_{AW} = 1 - p_{AA}$ and $p_{WA} = 1 - p_{WW}$:
$$l(p_{AA}, p_{WW}) = 4\log(p_{AA}) + 2\log(1-p_{AA}) + 4\log(p_{WW}) + \log(1-p_{WW})$$

Talk with the people around you to find a way to maximize the log-likelihood. You should get $\hat{p}_{AA} = \frac{2}{3}$ and $\hat{p}_{WW} = \frac{4}{5}$.
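One way to do it: the log-likelihood splits into a piece involving only $p_{AA}$ and a piece involving only $p_{WW}$. Each piece can therefore be maximized separately by setting its derivative to zero. Each piece is concave on $(0, 1)$, so the critical point is the maximum.

$$\frac{\partial l}{\partial p_{AA}} = \frac{4}{p_{AA}} - \frac{2}{1-p_{AA}} = 0 \implies 4(1-p_{AA}) = 2p_{AA} \implies \hat{p}_{AA} = \frac{4}{6} = \frac{2}{3}$$

$$\frac{\partial l}{\partial p_{WW}} = \frac{4}{p_{WW}} - \frac{1}{1-p_{WW}} = 0 \implies 4(1-p_{WW}) = p_{WW} \implies \hat{p}_{WW} = \frac{4}{5}$$

In both cases, the estimate is the fraction of departures from a location that were "stays".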
The takeaway message from today's discussion shouldn't only be the maximum likelihood calculation at the very end, though this is the first time we have seen MLE work on dependent events! We also want to emphasize a human-decision portion of data science: notice that the choice of model was ours, along with the assumptions that made it work. You have the freedom to be creative with your probability models; MLE is merely a guiding principle for fitting your model to the real world. With that said, what would be a reasonable thing to do now that we have a fitted model?
SOLUTION: Validate the model! Test any assumption that can be empirically tested! Decide whether or not you can sleep at night with the other assumptions.
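Here is a minimal Python sketch of one possible check, chosen here for illustration and not prescribed by the discussion. It simulates many 12-day logs from the fitted model and asks how often they show at least as many moves as Paulina's 3. Every simulated path starts at A, mirroring the Day 1 convention. The function names and the seed are hypothetical choices. There are only 11 observed transitions, and the same data are used both to fit and to check. Treat this as a rough sanity check rather than a formal test.

```python
import random

log = ["A", "A", "A", "W", "W", "W", "W", "A", "A", "A", "W", "W"]
P_AA_HAT, P_WW_HAT = 2 / 3, 4 / 5  # fitted values from the MLE

def count_moves(seq):
    """Number of days on which the location differs from the day before."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

def simulate(n_days, p_aa, p_ww, rng, start="A"):
    """Draw one n_days-long path from the fitted two-location model."""
    path = [start]
    for _ in range(n_days - 1):
        stay = p_aa if path[-1] == "A" else p_ww
        if rng.random() < stay:
            path.append(path[-1])  # stay put
        else:
            path.append("W" if path[-1] == "A" else "A")  # move
    return path

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = count_moves(log)  # 3 moves in Paulina's log
sims = [count_moves(simulate(len(log), P_AA_HAT, P_WW_HAT, rng)) for _ in range(10_000)]
# Fraction of simulated 12-day logs with at least as many moves as observed
frac = sum(m >= observed for m in sims) / len(sims)
print(observed, frac)
```

A fraction very close to 0 or 1 would suggest the fitted model moves Paulina around much more or much less than she actually did. This check says nothing about assumptions the data cannot test, such as whether her decision rule stays fixed over time.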