It is well known that once we learn something, we tend to forget some of it as time passes.
Murdock (1961) presented subjects with a set of memory items (i.e., words or letters) and asked them to recall the items after six different retention intervals: $t=1,3,6,9,12,18$ (in seconds). He recorded the proportion recalled at each retention interval (based on 100 independent trials for each $t$). These data were (respectively)
$$ y = 0.94,\ 0.77,\ 0.40,\ 0.26,\ 0.24,\ 0.16 $$

Our goal: fit a mathematical model that predicts the proportion recalled $y$ as a function of the retention interval $t$.
In [0]:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([1, 3, 6, 9, 12, 18])
Y = np.array([0.94, 0.77, 0.40, 0.26, 0.24, 0.16])
plt.plot(T, Y, 'o')
plt.xlabel('Retention interval (sec.)')
plt.ylabel('Proportion recalled')
plt.show()
Some things to notice:

- the proportion recalled decreases as the retention interval grows;
- the decline is steep at first and then levels off.

Two candidate models:

- a power function: $y = at^b$
- an exponential function: $y = ae^{bt}$

Which one should we use? Taking logs gives us a diagnostic: under the power model, $\ln y = \ln a + b \ln t$, so the data should look linear on log-log axes; under the exponential model, $\ln y = \ln a + bt$, so the data should look linear on semi-log axes.
In [0]:
# check power function model: if y = a*t**b, then ln(y) is linear in ln(t)
plt.plot(np.log(T), np.log(Y), 'o')
plt.xlabel(r'$\ln t$')
plt.ylabel(r'$\ln y$')
plt.show()
In [0]:
# check exponential model: if y = a*exp(b*t), then ln(y) is linear in t
plt.plot(T, np.log(Y), 'o')
plt.xlabel('$t$')
plt.ylabel(r'$\ln y$')
plt.show()
Both are reasonably linear, but neither is a perfect fit!
At this point, our best bet is to find, for each model, the parameter values that best fit the observed data $y$. We will use maximum likelihood estimation.
First, let's cast our data as the number of items recalled correctly on $n=100$ trials.
In [0]:
X = 100*Y  # number of successes out of n = 100 trials at each retention interval
print(X)
Let's assume each of these 100 trials is independent of the others, and consider a trial a success if the item is correctly recalled.
Then the probability of correctly recalling $x$ items is:

$$ f(x \mid \theta) = \binom{100}{x}\,\theta^x(1-\theta)^{100-x} $$

The critical parameter here is $\theta$, the probability of success on any one trial. How do we determine $\theta$?
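Before answering that, a quick numerical illustration of this pmf (the value $\theta = 0.9$ below is arbitrary, chosen only for illustration):

In [0]:
from scipy.stats import binom
binom.pmf(94, 100, 0.9)  # f(94 | theta = 0.9), roughly 0.06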
Let's assume that the probability of recall is governed by a power function. That is, assume

$$ \theta(t) = at^b $$

for constants $a, b$.
Then we can write $$ f(x\mid a,b) = \binom{100}{x}(at^b)^x(1-at^b)^{100-x} $$
which we cast as a likelihood
$$ L(a,b \mid x) = \binom{100}{x}(at^b)^x(1-at^b)^{100-x} $$

Note that the formula above is for a single observation $x$. But we have six observations, one for each retention interval!
If we assume each is independent of the others, then we can multiply the likelihoods:
$$ L\bigl(a,b \mid x=(x_1,\dots,x_6)\bigr) = \prod_{i=1}^{6} L(a,b \mid x_i) $$

Products of small probabilities are numerically awkward, so we work with the log-likelihood instead. Thus we have
$$ \ln L = \ln\Biggl(\prod_{i=1}^{6} L(a,b\mid x_i)\Biggr) $$

But since logs turn products into sums, we can write
$$ \ln L = \sum_{i=1}^{6} \ln L(a,b\mid x_i) = \sum_{i=1}^{6} \Biggl(\ln \binom{100}{x_i} + x_i\ln\bigl(at_i^b\bigr) + (100-x_i)\ln\bigl(1-at_i^b\bigr)\Biggr) $$

Notes:

- the term $\ln \binom{100}{x_i}$ does not depend on $a$ or $b$, so we can drop it when optimizing (see the exercises below);
- numerical routines minimize by convention, so in code we minimize the negative log-likelihood.
In [0]:
def nllP(pars):
    # negative log-likelihood for the power model theta(t) = a*t**b
    a, b = pars
    tmp1 = X*np.log(a*T**b)              # x_i * ln(theta_i)
    tmp2 = (100 - X)*np.log(1 - a*T**b)  # (100 - x_i) * ln(1 - theta_i)
    return -np.sum(tmp1 + tmp2)          # binomial coefficient dropped (constant)
In [0]:
# check some examples
a = 0.9
b = -0.4
pars = np.array([a,b])
nllP(pars)
Out[0]:
In [0]:
from scipy.optimize import minimize

# random starting values: a in (0, 1), b negative (recall decreases with t)
a_init = np.random.uniform()
b_init = -np.random.uniform()
inits = np.array([a_init, b_init])

mleP = minimize(nllP, inits, method="nelder-mead")
print(mleP)
In [0]:
def power(t, pars):
    # power model: theta(t) = a*t**b
    a, b = pars
    return a*t**b

fitPars = mleP.x
print(f"a={fitPars[0]:.3f}, b={fitPars[1]:.3f}")

# overlay the fitted power function on the observed data
x = np.linspace(0.5, 18, 100)
plt.plot(T, Y, 'o')
plt.plot(x, power(x, fitPars))
plt.xlabel('Retention interval (sec.)')
plt.ylabel('Proportion recalled')
plt.show()
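So far we have fit only the power model, even though both candidates looked plausible in the linearity checks. Here is a parallel sketch for the exponential model, assuming the form $\theta(t) = ae^{bt}$ implied by the semi-log plot (the helper name `nllE` and the starting values are ours):

In [0]:
# negative log-likelihood for the exponential model theta(t) = a*exp(b*t)
def nllE(pars):
    a, b = pars
    theta = a*np.exp(b*T)
    return -np.sum(X*np.log(theta) + (100 - X)*np.log(1 - theta))

mleE = minimize(nllE, np.array([0.9, -0.1]), method="nelder-mead")
print(mleE.x)

Comparing the minimized negative log-likelihoods of the two models gives a rough sense of which fits these data better.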
Often, the "power law of forgetting" is written as $f(t) = at^{-b}$ (e.g., Wixted, 1990); the explicit negative exponent reinforces the idea that forgetting is decay. What does this change do to the likelihood function? Use Python to compute MLEs for $a$ and $b$ given the observed data $y$ above.
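A minimal sketch of one way to start (the helper name `nllP_neg` is ours; only the sign of the exponent changes, so the fitted $b$ should come out roughly equal to the negative of the earlier estimate):

In [0]:
# same binomial likelihood, but with theta(t) = a*t**(-b)
def nllP_neg(pars):
    a, b = pars
    theta = a*T**(-b)
    return -np.sum(X*np.log(theta) + (100 - X)*np.log(1 - theta))

mle_neg = minimize(nllP_neg, np.array([0.9, 0.4]), method="nelder-mead")
print(mle_neg.x)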
Demonstrate (either through computation or a mathematical proof) that we can safely ignore the $\binom{100}{x_i}$ term in the likelihood function.
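A computational sketch of one such demonstration (the helper name `nllP_full` is ours; `scipy.special.gammaln` evaluates $\ln\binom{100}{x_i}$ stably): because the binomial coefficients do not involve $a$ or $b$, including them only shifts the objective by a constant, leaving the minimizer unchanged.

In [0]:
from scipy.special import gammaln

# negative log-likelihood including the (constant) binomial coefficients
def nllP_full(pars):
    a, b = pars
    theta = a*T**b
    log_comb = gammaln(101) - gammaln(X + 1) - gammaln(100 - X + 1)
    return -np.sum(log_comb + X*np.log(theta) + (100 - X)*np.log(1 - theta))

mle_full = minimize(nllP_full, inits, method="nelder-mead")
print(mle_full.x, mleP.x)  # the two estimates should agree up to optimizer tolerance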
Rubin and Baddeley (1989) measured the proportion of participants who correctly recalled details from a past colloquium talk as a function of time in years. The data below are approximately equal to what they originally found:
| time (years) | proportion recalled |
|---|---|
| 0.05 | 0.38 |
| 0.25 | 0.26 |
| 0.30 | 0.22 |
| 0.60 | 0.20 |
| 0.95 | 0.11 |
| 1.3 | 0.07 |
| 1.4 | 0.16 |
| 1.6 | 0.10 |
| 1.8 | 0.08 |
| 2.5 | 0.05 |
| 2.7 | 0.01 |
For simplicity, assume there were 100 participants. Construct a reasonable model of forgetting for these data, and estimate its parameters.
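A minimal sketch of one reasonable approach (the variable names are ours; this reuses the binomial power-law machinery from above, assuming 100 independent participants at each time point):

In [0]:
T2 = np.array([0.05, 0.25, 0.30, 0.60, 0.95, 1.3, 1.4, 1.6, 1.8, 2.5, 2.7])
Y2 = np.array([0.38, 0.26, 0.22, 0.20, 0.11, 0.07, 0.16, 0.10, 0.08, 0.05, 0.01])
X2 = 100*Y2  # counts of correct recalls out of 100 participants

# negative log-likelihood for theta(t) = a*t**b on the colloquium data
def nll2(pars):
    a, b = pars
    theta = a*T2**b
    return -np.sum(X2*np.log(theta) + (100 - X2)*np.log(1 - theta))

mle2 = minimize(nll2, np.array([0.2, -0.4]), method="nelder-mead")
print(mle2.x)

As with the Murdock data, checking log-log and semi-log plots first would help justify choosing a power model over an exponential one.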