Here I've provided a sample sanitized dataset that can be used to develop a general model of memory decay and performance over time. The primary aim would be to predict the likely accuracy of a test of a given type, for a given user, at a given (future) moment in time. That matters because it lets you schedule reviews at the optimal time for learning.
A second and in some ways more challenging aim would be to summarize the ability of a given user, the strength of a given memory, or the difficulty of a given item with a stable, interpretable metric. That requirement might, for example, lead you away from out-of-the-box machine learning techniques like neural nets and random forests, and towards more scientifically constrained and carefully structured models.
This is left as an exercise for the interested reader, but I'll offer some additional pointers:
Which of these or additional constraints your model satisfies, its accuracy, and its generalizability all depend on the ultimate purpose. What's the output, who will use it, and for what? Giving me an optimal review time, showing me how well I understand something, and advancing cognitive science are quite different applications, each with different requirements for the underlying analysis.
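To make that concrete, here's a minimal sketch of one constrained, interpretable option: an exponential forgetting curve, where recall probability decays toward a guessing floor at a rate set by the memory's strength. Everything here is hypothetical and illustrative (the function name, the half-life parameterization, and the example values are mine, not fitted to the data), though the 0.15 floor echoes the one used by the benchmark model below.

import numpy as np

def p_recall(elapsed_days, half_life, floor=0.15):
    """Recall probability after `elapsed_days`, for a memory whose
    strength is expressed as a half-life in days. Decays exponentially
    toward a guessing floor. Illustrative only, not fitted."""
    decay = np.exp(-np.log(2) * elapsed_days / half_life)
    return floor + (1 - floor) * decay

# e.g. a memory with a 3-day half-life, tested a week after study:
print(p_recall(7.0, 3.0))  # ~0.32

A model like this has few parameters, each with a direct interpretation (a half-life per memory, a guessing floor), which is exactly the property that makes a stable per-user or per-item summary possible.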
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style='white', palette='Set2')
path = '/Users/iainharlow/googledrive/GitHub/ds30/'  # local project directory; adjust to your setup
In [2]:
sample = pd.read_csv('cerego_sample.csv.zip')
sample[:10]
Out[2]:
Let's take a look at the data, which constitute a simplified and sanitized sample from Cerego's database. We have rows of events, each either a study (the first time a learner encounters an item) or a review of some type (when the learner tests themselves on something they've previously learned). We have five meaningful columns:
After having a go at predicting the results yourself, you might like to compare them to a simple benchmark model, below. You can adapt this simple code to check your own output in the same format. Be sure to reserve an unseen test sample to evaluate your model (in testing the model below, I ran it against ~4 million recent reviews).
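One quick way to reserve that test sample is a random holdout like the sketch below; for forecasting future performance, a chronological split (train on earlier events, test on later ones) is arguably the stricter choice. The 80/20 fraction and the seed are arbitrary.

# Hold out a random 20% of events as an unseen test set.
# For forecasting, a chronological split would be stricter.
rng = np.random.RandomState(42)
mask = rng.rand(len(sample)) < 0.8
train, test = sample[mask], sample[~mask]
print(len(train), len(test))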
In [3]:
'''
Calibration: plot the observed accuracy against the predicted accuracy.
The benchmark model here was run against a larger sample of around
4 million recent reviews, and has a floor predicted accuracy of 0.15.
'''
data = pd.read_csv('benchmark.csv.zip')

# Binomial standard error of the observed accuracy in each bin:
# sqrt(p*(1-p)/n), with p = result and n = reviews.
data['sd'] = np.sqrt(data.result*(1-data.result)/data.reviews)

sns.set(style='white', palette='Set2')
fig = plt.figure(figsize=(10,10))

# Dashed identity line: perfect calibration.
diag = np.arange(0, 1, 0.01)
plt.plot(diag, diag, lw=1, color='k', linestyle='--')

# Approximate 95% confidence band around the observed accuracy.
plt.plot(data.predicted, data.result-1.96*data.sd, lw=3, color='#65C2A5', alpha=0.5, linestyle='-')
plt.plot(data.predicted, data.result+1.96*data.sd, lw=3, color='#65C2A5', alpha=0.5, linestyle='-')
plt.plot(data.predicted, data.result, lw=2, color='#65C2A5')
plt.xlabel('Predicted accuracy')
plt.ylabel('Observed accuracy')
Out[3]:
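If you'd like a single number to go with the calibration plot, one option is a review-weighted mean absolute calibration gap. This sketch assumes the same `data` frame as above, with its `predicted`, `result`, and `reviews` columns.

# Review-weighted average gap between observed and predicted accuracy.
gap = np.abs(data.result - data.predicted)
calibration_error = (gap * data.reviews).sum() / data.reviews.sum()
print('weighted calibration error:', calibration_error)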
In [4]:
'''
We can calculate some quick gut-check accuracy metrics,
treating a predicted accuracy above the cutoff as "predicted correct".
'''
cutoff = 0.5
data['correct'] = data.reviews*data.result  # number of correct reviews per bin

TP = data[data.predicted > cutoff].correct.sum()  # true positives
P = data.correct.sum()  # all positives (correct reviews)
TN = data[data.predicted < cutoff].reviews.sum() - data[data.predicted < cutoff].correct.sum()  # true negatives
N = data.reviews.sum() - data.correct.sum()  # all negatives (incorrect reviews)

sensitivity = TP/P
specificity = TN/N
accuracy = (TP+TN)/(P+N)
print('accuracy:', accuracy)
print('sensitivity:', sensitivity)
print('specificity:', specificity)
In [5]:
'''
Plot a ROC curve and calculate the AUC.
'''
from sklearn import metrics

fig = plt.figure(figsize=(10,10))
rocdata = pd.read_csv('benchmark_fpr_tpr.csv.zip')  # contains our false positive and true positive rates
fpr = rocdata.fpr
tpr = rocdata.tpr
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)

# Dashed diagonal: chance performance.
diag = np.arange(0, 1, 0.01)
plt.plot(diag, diag, lw=1, color='k', linestyle='--')
plt.plot(fpr, tpr, lw=2, color='#65C2A5')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
Out[5]:
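The cutoff metrics and the ROC both treat this as a classification problem; since the model actually outputs probabilities, it can also be worth scoring those directly. The sketch below computes a Brier score from the binned benchmark data, treating every review in a bin as sharing that bin's predicted value.

# Brier score: mean squared gap between the predicted probability
# and the 0/1 outcome of each review, reconstructed from the bins.
p, r, n = data.predicted, data.result, data.reviews
brier = (n * (r*(p - 1)**2 + (1 - r)*p**2)).sum() / n.sum()
print('Brier score:', brier)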