In traditional list-learning free recall experiments, remembering is often cast as a binary operation: either an item is recalled or it isn't. This allows for a straightforward matching between the presented and recalled stimuli. However, characterizing and evaluating memory in more realistic contexts (e.g., telling a story to a friend about a recent vacation) is much more nuanced. Real-world recall is continuous, rather than binary. Further, the specific words used to describe an experience may vary considerably across participants. To handle this new data regime, we extended classic methods developed for free-recall list-learning experiments to accommodate naturalistic stimuli. Specifically, we provide a more flexible 'matching function', which quantifies the similarity between stimuli and verbal responses in a continuous manner.
In the tutorial below, we will describe our new analysis approach and demonstrate how to perform the analyses using quail. To get started, let's load in the example data:
In [1]:
import quail
import numpy as np
import seaborn as sns
from scipy.spatial.distance import cdist
%matplotlib inline
egg = quail.load_example_data(dataset='naturalistic')
The example data used in this tutorial are based on an open dataset from Chen et al., 2017, in which 17 participants viewed and then verbally recounted an episode of the BBC series Sherlock. We fit a topic model to hand-annotated text descriptions of the episode and used the model to transform the video annotations and each subject's recall transcription into topic vectors. We then used a Hidden Markov Model (HMM) to segment the video and recall topic vectors into an (optimal) number of events. The result was a matrix of topic vectors representing the "events" in the video and a list of matrices of topic vectors representing each participant's recall "events". We created an egg from these vector representations of the stimulus and verbal recall, where the topic vectors were passed to quail as stimulus features. Let's take a closer look at the egg:
In [2]:
egg.info()
Here, the egg's pres field consists of 34 stimulus events (the number of video segments determined by our HMM). Each stimulus event is represented by a dictionary containing the label of the video segment (item) and a topic vector representing that event (topics).
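To make this structure concrete, here is a minimal sketch that mimics one presented list using plain Python dictionaries and synthetic data. It assumes each event is a dictionary with an 'item' label and a 'topics' vector, as described above. The labels (event_0, event_1, ...) and the random topic vectors, normalized to sum to 1 as topic proportions typically do, are hypothetical placeholders and not the Sherlock data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_events, n_topics = 34, 100  # sizes taken from the example dataset described above

# Hypothetical stand-in for one presented list: one dict per video event
pres_events = []
for i in range(n_events):
    weights = rng.random(n_topics)
    pres_events.append({'item': f'event_{i}', 'topics': weights / weights.sum()})

# Stack the topic vectors into an events x topics matrix
pres_mtx = np.array([event['topics'] for event in pres_events])
print(pres_mtx.shape)  # (34, 100)
```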
In [3]:
# The label of each stimulus event...
egg.get_pres_items().head()
Out[3]:
In [4]:
# ...and their corresponding topic vectors
egg.get_pres_features().head()
Out[4]:
In [5]:
# a closer look at one of the dictionaries
egg.get_pres_features()[0][0][0]
Out[5]:
As you can see above, the dictionary contains a topics key, which holds a 100D topic vector representing a stimulus event, and a temporal key, which describes the serial position of the stimulus. The rec field contains the recall events for each subject, which are likewise represented by a label ('item') and a topic vector ('topics').
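Subjects recall different numbers of events, so their recall lists differ in length. The sketch below shows one way to handle this. It assumes shorter lists are padded with empty dictionaries, which the if len(x) filter used later in this tutorial would drop. The two subjects and their topic vectors are synthetic, and random_event is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics = 100

def random_event(label):
    """Hypothetical recall event: a label plus a normalized topic vector."""
    w = rng.random(n_topics)
    return {'item': label, 'topics': w / w.sum()}

# Two synthetic subjects who recalled different numbers of events
recalls = [[random_event(f'rec_{j}') for j in range(n)] for n in (5, 3)]

# Pad every list to the same length with empty dicts so they fit in one table
max_len = max(len(r) for r in recalls)
padded = [r + [{}] * (max_len - len(r)) for r in recalls]

# Drop the padding again before building one subject's recall matrix
rec_mtx = np.array([x['topics'] for x in padded[1] if len(x)])
print(rec_mtx.shape)  # (3, 100)
```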
In [6]:
# The temporal position of each recall event...
egg.get_rec_items().head()
Out[6]:
In [7]:
# ...and their corresponding topic vectors
egg.get_rec_features().head()
Out[7]:
As summarized above, quail supports the analysis of naturalistic stimuli by providing a more flexible way to match presented stimuli and recall responses. The matching function can be set using the match keyword argument in egg.analyze. There are three options: 'exact', 'best', and 'smooth'. If match='exact', the recall item must be identical to the stimulus to constitute a recall. This is the traditional approach for free recall experiments (either a subject accurately recalled the stimulus item, or did not), but it is not particularly useful with naturalistic data. For the naturalistic options, quail computes a similarity matrix comparing every recall event to every stimulus event. If match='best', each recall event is matched to the single presented stimulus it most resembles, and that stimulus is labeled as recalled. If match='smooth', a weighted average across all stimulus events is computed for each recall event, where the weights are derived from the similarity between the stimulus and the recall event. To illustrate this further, let's step through the analysis. First, let's create a matrix representing the presented stimulus, where each row is an 'event' and each column is a topic dimension:
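The three strategies can be sketched with NumPy on a toy similarity matrix, with presented events as rows and recall events as columns. This illustrates the ideas only and is not quail's internal code. In particular, the 'smooth' weighting used here is an assumption made for the example: negative similarities are clipped to zero and each recall event's weights are normalized to sum to 1.

```python
import numpy as np

# Toy similarity matrix: 4 presented events (rows) x 3 recall events (columns)
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.3, 0.1],
                [0.1, 0.8, 0.2],
                [0.0, 0.1, 0.7]])

# 'exact': only identical labels count as recalled (binary)
pres_labels = ['a', 'b', 'c', 'd']
rec_labels = ['a', 'c', 'x']
exact = np.array([[p == r for r in rec_labels] for p in pres_labels], dtype=float)

# 'best': each recall event is matched to its single most similar presented event
best_idx = np.argmax(sim, axis=0)  # one presented-event index per recall event
best = np.zeros_like(sim)
best[best_idx, np.arange(sim.shape[1])] = 1.0

# 'smooth' (assumed weighting): spread each recall event over all presented events
weights = np.clip(sim, 0, None)
smooth = weights / weights.sum(axis=0, keepdims=True)

print(best_idx)  # [0 2 3]
```

With 'best', every column of best holds a single 1. With 'smooth', each column is a graded distribution over presented events.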
In [8]:
pres_mtx = np.array([x['topics'] for x in egg.get_pres_features('topics').iloc[0, :].values])
sns.heatmap(pres_mtx, vmin=0, vmax=1)
Out[8]:
We'll also create a matrix representing the recalled events for a single subject:
In [9]:
rec_mtx = np.array([x['topics'] for x in egg.get_rec_features('topics').iloc[12, :].values if len(x)])
sns.heatmap(rec_mtx, vmin=0, vmax=1)
Out[9]:
To measure similarity between pres_mtx and rec_mtx along this feature dimension, we can use the cdist function from scipy. In this example, we will use correlation distance (1 minus the Pearson correlation), so subtracting it from 1 gives the correlation between each presented event and each recalled event:
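As a sanity check on what 1 - cdist(pres_mtx, rec_mtx, 'correlation') contains, the sketch below computes the same presented-by-recalled Pearson correlation matrix with NumPy alone by z-scoring each row. The random matrices are placeholders for pres_mtx and rec_mtx, and correlation_similarity is a hypothetical helper name.

```python
import numpy as np

def correlation_similarity(a, b):
    """Pearson correlation between every row of a and every row of b."""
    a_z = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b_z = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return a_z @ b_z.T / a.shape[1]

rng = np.random.default_rng(2)
pres = rng.random((34, 100))  # placeholder for pres_mtx
rec = rng.random((20, 100))   # placeholder for rec_mtx

match = correlation_similarity(pres, rec)
print(match.shape)  # (34, 20)

# Each entry agrees with NumPy's own Pearson correlation
assert np.isclose(match[0, 0], np.corrcoef(pres[0], rec[0])[0, 1])
```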
In [10]:
match_mtx = 1 - cdist(pres_mtx, rec_mtx, 'correlation')
sns.heatmap(match_mtx, vmin=0, vmax=1)
Out[10]:
This matrix quantifies the match between each presented event and each recall event. The light stripe along the diagonal suggests that this particular subject recalled most of the events in order: the highest correlations fall roughly along the diagonal.
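The diagonal can also be checked numerically. The following rough sketch is not a quail analysis: it takes each recall event's best-matching presented event and correlates those positions with recall order, where values near 1 indicate forward, in-order recall. The toy matrix has a deliberate diagonal so the result is easy to verify. With real data, match_mtx would take its place.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pres, n_rec = 10, 10

# Toy similarity matrix with a strong diagonal plus small noise
toy = np.eye(n_pres, n_rec) + 0.1 * rng.random((n_pres, n_rec))

best_idx = np.argmax(toy, axis=0)  # matched presented event for each recall event
order_r = np.corrcoef(np.arange(n_rec), best_idx)[0, 1]
print(best_idx)  # [0 1 2 3 4 5 6 7 8 9]
print(round(order_r, 3))  # 1.0
```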
If match='best', each recall event is mapped to the single stimulus event with the most similar feature vector:
In [11]:
np.argmax(match_mtx, 0)
Out[11]:
Note that once the data are distilled into this form, many of the classic list-learning analyses (such as probability of first recall, serial position curve, and lag-conditional response probability curve) can be performed. To do this using quail, simply set match='best', choose a distance function (euclidean by default), and select the features that you would like to use (e.g., features=['topics']).
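Once each recall event has a best-matching presented event, a serial position curve reduces to the proportion of subjects who recalled each position. This minimal sketch assumes the best-match indices are already computed per subject, and it counts a position as recalled if any recall event mapped to it. The subjects and indices are made up, and quail's own implementation may differ in details.

```python
import numpy as np

n_pres = 5

# Hypothetical best-match indices (argmax over presented events) for three subjects
best_matches = [np.array([0, 1, 1, 3]),
                np.array([0, 2, 4]),
                np.array([1, 3])]

# Binary recall matrix: subjects x presented positions
recalled = np.zeros((len(best_matches), n_pres))
for subj, idx in enumerate(best_matches):
    recalled[subj, idx] = 1  # repeated matches to one position still count once

# Serial position curve: proportion of subjects recalling each position
spc = recalled.mean(axis=0)
print(spc)
```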
In [12]:
spc = egg.analyze(analysis='spc', match='best', distance='correlation', features=['topics'])
spc.get_data().head()
Out[12]:
Each stimulus event is assigned a binary value for each recall event: it was either matched or it was not. To plot the result:
In [13]:
spc.plot()
Out[13]:
If match='smooth', quail computes a weighted average across all stimulus events for each recall event, where the weights are derived from the similarity between the stimulus and recall events.
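The sketch below shows how graded credit can add up to a continuous curve. The weighting scheme is an assumption for illustration and is not quail's exact computation. Each recall event becomes a weight vector over presented events: similarities are clipped at zero and normalized to sum to 1. Averaging these vectors over recall events, then over subjects, yields one score per presented position. The similarity matrices are synthetic, and smooth_recall_weights is a hypothetical helper.

```python
import numpy as np

def smooth_recall_weights(sim):
    """Assumed scheme: for each recall event (column), turn the similarities
    to all presented events (rows) into weights that sum to 1, then average
    those weights over recall events."""
    w = np.clip(sim, 0, None)
    w = w / w.sum(axis=0, keepdims=True)
    return w.mean(axis=1)  # one graded score per presented event

rng = np.random.default_rng(4)
# Synthetic similarity matrices (presented x recalled) for three subjects
sims = [rng.random((6, n_rec)) for n_rec in (4, 5, 3)]

# Average the graded scores across subjects to get a continuous curve
curve = np.mean([smooth_recall_weights(s) for s in sims], axis=0)
print(curve.round(3))
```

Unlike the binary curve from match='best', every presented position receives some credit here, in proportion to how closely it resembles what was recalled.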
In [14]:
spc = egg.analyze(analysis='spc', match='smooth', distance='correlation', features=['topics'])
spc.data.head()
Out[14]:
In [15]:
spc.plot()
Out[15]:
The distance argument sets the distance function quail will use to compute similarity between stimulus and recall events. We support any distance metric included in scipy.spatial.distance.cdist:
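Switching the metric changes what counts as similar. For instance, Pearson correlation equals cosine similarity after each vector is centered on its own mean, so the two metrics can disagree sharply. (In scipy, the 'cosine' and 'correlation' distances are 1 minus these similarities.) The small NumPy check below uses illustrative vectors only.

```python
import numpy as np

def cosine_similarity(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def correlation_similarity(u, v):
    # Pearson correlation == cosine similarity of the mean-centered vectors
    return cosine_similarity(u - u.mean(), v - v.mean())

u = np.array([1.0, 2.0, 3.0])
v = np.array([3.0, 2.0, 1.0])

print(round(cosine_similarity(u, v), 3))       # 0.714
print(round(correlation_similarity(u, v), 3))  # -1.0
```

Both vectors are all-positive, so their cosine similarity is fairly high. Their patterns run in opposite directions, however, which correlation scores as perfectly anti-correlated.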
In [16]:
spc = egg.analyze(analysis='spc', match='smooth', distance='cosine', features=['topics'])
spc.plot()
Out[16]:
The features argument tells quail which features to consider when computing distance. This can be a single feature passed as a string, multiple features passed as a list, or all available features (features=None; default).
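The three accepted forms of the features argument can all be normalized to a list of feature keys. The sketch below uses a hypothetical helper, select_features, and assumes that features=None means every key except 'item'. It illustrates the idea and is not quail's actual code. The example event mirrors the topics and temporal keys described earlier, with a placeholder vector.

```python
import numpy as np

def select_features(event, features=None):
    """Hypothetical helper: normalize the features argument to a list of keys."""
    if features is None:
        return [k for k in event if k != 'item']  # all available features
    if isinstance(features, str):
        return [features]  # a single feature name
    return list(features)  # already a list of names

event = {'item': 'event_0', 'topics': np.ones(100) / 100, 'temporal': 0}

print(select_features(event))                          # ['topics', 'temporal']
print(select_features(event, 'topics'))                # ['topics']
print(select_features(event, ['topics', 'temporal']))  # ['topics', 'temporal']
```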