This tutorial covers the basics of the Egg data object, the essential quail data structure that contains all the data you need to run analyses and plot the results. An egg is made up of two primary pieces of data:
pres data - stimuli/features that were presented to a subject
rec data - stimuli/features that were recalled by the subject
You cannot create an egg without both of these components. Additionally, there are a few optional fields:
dist_funcs dictionary - this field allows you to control the distance functions for each of the stimulus features. For more on this, see the fingerprint tutorial.
meta dictionary - this is an optional field that allows you to store custom metadata about the dataset, such as the date collected, the experiment version, etc.
There are also a few other fields and functions that make organizing and modifying eggs easier (discussed at the bottom). Now, let's dive in and create an egg from scratch.
In [ ]:
import quail
%matplotlib inline
The pres data structure
The first piece of an egg is the pres data, or in other words the stimuli that were presented to the subject. For a single subject's data, the input is a list of lists, where each inner list contains the words presented to the subject during a particular study block. Let's create a fake dataset of one subject who saw two encoding lists:
In [ ]:
presented_words = [['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
In [ ]:
recalled_words = [['bat', 'cat', 'goat', 'hat'],['animal', 'horse', 'zoo']]
We now have the two components necessary to build an egg, so let's do that and then take a look at the result.
In [ ]:
egg = quail.Egg(pres=presented_words, rec=recalled_words)
That's it! We've created our first egg. We can use the info method to get a quick snapshot of it:
In [ ]:
egg.info()
Now, let's take a closer look at how the egg is structured. First, we will check out the pres field:
In [ ]:
egg.get_pres_items()
As you can see above, the pres field was turned into a multi-index pandas DataFrame organized by subject and by list. This is how the pres data is stored within an egg, which will make more sense when we consider larger datasets with more subjects. Next, let's take a look at the rec data:
In [ ]:
egg.get_rec_items()
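The recall table has one row per list and one column per recall position, so a list with fewer recalled words than columns has to be filled out somehow. The snippet below is a minimal pure-Python sketch of that padding idea; `pad_recalls` is a hypothetical helper written for this illustration, not part of quail, and it does not reproduce how quail builds its DataFrame internally.

```python
import math


def pad_recalls(recalled, n_columns):
    """Pad each recall list with NaN so every row has n_columns entries.

    Hypothetical helper for illustration only; not a quail function.
    """
    return [row + [float('nan')] * (n_columns - len(row)) for row in recalled]


recalled_words = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]

# Four words were presented per list, so pad every row to four columns.
padded = pad_recalls(recalled_words, n_columns=4)
for row in padded:
    print(row)

# The second list only has three recalls, so its last cell is NaN.
print(math.isnan(padded[1][3]))
```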
The rec data is also stored as a DataFrame. Notice that if the number of recalled words is smaller than the number of presented words, the remaining columns are filled with a NaN value. Now, let's create an egg with two subjects' data and take a look at the result.
In [ ]:
# presented words
sub1_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
sub2_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
# recalled words
sub1_recalled=[['bat', 'cat', 'goat', 'hat'],['animal', 'horse', 'zoo']]
sub2_recalled=[['cat', 'goat', 'bat', 'hat'],['horse', 'zebra', 'zoo', 'animal']]
# combine subject data
presented_words = [sub1_presented, sub2_presented]
recalled_words = [sub1_recalled, sub2_recalled]
# create Egg
multisubject_egg = quail.Egg(pres=presented_words, rec=recalled_words)
multisubject_egg.info()
As you can see above, to create an egg with more than one subject's data, all you do is create a list of subjects. Let's see how the pres data is organized in the egg with more than one subject:
In [ ]:
multisubject_egg.get_pres_items()
This looks identical to the single-subject data, but now we have two unique subject identifiers in the DataFrame. The rec data is set up in the same way:
In [ ]:
multisubject_egg.get_rec_items()
As you add more subjects, they are simply appended to the bottom of the DataFrame, each with a unique subject identifier.
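To make the subject-by-list organization concrete, here is a small pure-Python sketch that flattens the nested input into rows keyed by a (subject, list) pair, which is roughly the shape a two-level index gives you. The helper `index_rows` is hypothetical, and the zero-based numbering is an assumption made for this illustration rather than a statement about quail's internals.

```python
def index_rows(data):
    """Flatten subject -> list -> items nesting into {(subject, list): items}.

    Hypothetical helper for illustration; indices are assumed to be zero-based.
    """
    return {
        (subj, lst): items
        for subj, subject_lists in enumerate(data)
        for lst, items in enumerate(subject_lists)
    }


sub1_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
sub2_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]

rows = index_rows([sub1_presented, sub2_presented])
for key, items in rows.items():
    print(key, items)
```

Each additional subject simply contributes more (subject, list) keys at the end, mirroring how rows are appended to the bottom of the table.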
In [ ]:
cat_features = {
'item': 'cat',
'category': 'animal',
'word_length': 3,
'starting_letter': 'c',
}
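Writing one dictionary per word by hand gets tedious for long lists. As a sketch, features that can be derived from the word itself (its length and first letter) could be generated by a small helper, with only the category supplied by hand. `make_features` is a hypothetical function for this illustration and is not part of quail.

```python
def make_features(word, category):
    """Build a feature dictionary for one word.

    Hypothetical helper: word_length and starting_letter are derived from
    the word, while category must be supplied by the caller.
    """
    return {
        'item': word,
        'category': category,
        'word_length': len(word),
        'starting_letter': word[0],
    }


cat_features = make_features('cat', 'animal')
print(cat_features)

# A whole study list can then be built from (word, category) pairs.
study_list = [make_features(w, c) for w, c in [('cat', 'animal'), ('hat', 'object')]]
```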
Let's try creating an egg with additional stimulus features:
In [ ]:
# presentation features
presented_words = [
    [
        {'item': 'cat', 'category': 'animal', 'word_length': 3, 'starting_letter': 'c'},
        {'item': 'bat', 'category': 'object', 'word_length': 3, 'starting_letter': 'b'},
        {'item': 'hat', 'category': 'object', 'word_length': 3, 'starting_letter': 'h'},
        {'item': 'goat', 'category': 'animal', 'word_length': 4, 'starting_letter': 'g'},
    ],
    [
        {'item': 'zoo', 'category': 'place', 'word_length': 3, 'starting_letter': 'z'},
        {'item': 'donkey', 'category': 'animal', 'word_length': 6, 'starting_letter': 'd'},
        {'item': 'zebra', 'category': 'animal', 'word_length': 5, 'starting_letter': 'z'},
        {'item': 'horse', 'category': 'animal', 'word_length': 5, 'starting_letter': 'h'},
    ],
]
# recall features
recalled_words = [
    [
        {'item': 'bat', 'category': 'object', 'word_length': 3, 'starting_letter': 'b'},
        {'item': 'cat', 'category': 'animal', 'word_length': 3, 'starting_letter': 'c'},
        {'item': 'goat', 'category': 'animal', 'word_length': 4, 'starting_letter': 'g'},
        {'item': 'hat', 'category': 'object', 'word_length': 3, 'starting_letter': 'h'},
    ],
    [
        {'item': 'donkey', 'category': 'animal', 'word_length': 6, 'starting_letter': 'd'},
        {'item': 'horse', 'category': 'animal', 'word_length': 5, 'starting_letter': 'h'},
        {'item': 'zoo', 'category': 'place', 'word_length': 3, 'starting_letter': 'z'},
    ],
]
# create egg object
egg = quail.Egg(pres=presented_words, rec=recalled_words)
Like before, you can use the get_pres_items method to retrieve the presented items:
In [ ]:
egg.get_pres_items()
The stimulus features can be accessed by calling the get_pres_features method:
In [ ]:
egg.get_pres_features()
As described in the fingerprint tutorial, the features data structure is used to estimate how subjects cluster their recall responses with respect to the features of the encoded stimuli. Briefly, these estimates are derived by computing the similarity of neighboring recalled words along each feature dimension. For example, if you recall "dog", and then the next word you recall is "cat", your clustering by category score would increase because the two recalled words are in the same category. Similarly, if after you recall "cat" you recall the word "can", your clustering by starting letter score would increase, since both words share the first letter "c". This logic can be extended to any number of feature dimensions.
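The intuition in the paragraph above can be sketched in a few lines of plain Python: walk through neighboring recalls and count how often they share a value on a given feature. This is a deliberately simplified stand-in, assuming a plain proportion of matching neighbors; it is not the computation quail performs (see the fingerprint tutorial for that), and `adjacent_match_rate` is a hypothetical name.

```python
def adjacent_match_rate(recalls, feature):
    """Proportion of neighboring recall pairs that share a value on `feature`.

    Simplified illustration only; not quail's clustering score.
    """
    pairs = list(zip(recalls, recalls[1:]))
    if not pairs:
        return 0.0
    matches = sum(a[feature] == b[feature] for a, b in pairs)
    return matches / len(pairs)


# The recall sequence from the text: "dog", then "cat", then "can".
recalls = [
    {'item': 'dog', 'category': 'animal', 'starting_letter': 'd'},
    {'item': 'cat', 'category': 'animal', 'starting_letter': 'c'},
    {'item': 'can', 'category': 'object', 'starting_letter': 'c'},
]

# dog -> cat share a category; cat -> can share a starting letter.
category_rate = adjacent_match_rate(recalls, 'category')
letter_rate = adjacent_match_rate(recalls, 'starting_letter')
print(category_rate, letter_rate)
```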
Similarity between stimuli can be computed in a number of ways. By default, the distance function for all textual features (like category or starting letter) is binary. In other words, if the words are in the same category (cat, dog), their similarity would be 1, whereas if they are in different categories (cat, can), their similarity would be 0. For numerical features (such as word length), similarity between words is computed by default using Euclidean distance. However, the point of this digression is that you can define your own distance functions by passing a dist_funcs dictionary to the Egg class. This could cover all feature dimensions, or only a subset. Let's see an example:
In [ ]:
dist_funcs = {
'word_length' : lambda x,y: (x-y)**2
}
egg = quail.Egg(pres=presented_words, rec=recalled_words, dist_funcs=dist_funcs)
In the example code above, similarity between words for the word_length feature dimension will now be computed using this custom distance function, while all other feature dimensions will be set to the default.
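To see what swapping the function changes, the sketch below compares the custom squared difference with plain-Python versions of the two defaults as they are described in the text (a binary match for text features, Euclidean distance for numbers). The names `binary_match` and `euclidean` are hypothetical stand-ins written for this illustration, not quail's internal functions.

```python
import math


def binary_match(x, y):
    """1 when the two values are identical, 0 otherwise (as described above)."""
    return int(x == y)


def euclidean(x, y):
    """Euclidean distance between two scalar feature values."""
    return math.sqrt((x - y) ** 2)


# The custom function from the example: squared difference.
squared_difference = lambda x, y: (x - y) ** 2

# 'cat' (3 letters) versus 'donkey' (6 letters).
print(euclidean(3, 6))           # default-style distance for word_length
print(squared_difference(3, 6))  # custom distance penalizes large gaps more

# Category comparison for text features.
print(binary_match('animal', 'animal'), binary_match('animal', 'place'))
```

Squaring grows faster than the plain difference, so with the custom function a large gap in word length counts for proportionally more than a small one.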
Lastly, we can add metadata to the egg. We added this field to help researchers keep their eggs organized by attaching custom metadata to the egg object. The data is added by passing the meta keyword argument when creating the egg:
In [ ]:
meta = {
'Researcher' : 'Andy Heusser',
'Study' : 'Egg Tutorial'
}
egg = quail.Egg(pres=presented_words, rec=recalled_words, meta=meta)
egg.info()
Adding listgroup and subjgroup to an egg
While the listgroup and subjgroup arguments can be used within the analyze function, they can also be attached directly to the egg, allowing you to save condition labels for easy organization and data sharing. To do this, simply pass one or both of the arguments when creating the egg:
In [ ]:
# presented words
sub1_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
sub2_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
# recalled words
sub1_recalled=[['bat', 'cat', 'goat', 'hat'],['animal', 'horse', 'zoo']]
sub2_recalled=[['cat', 'goat', 'bat', 'hat'],['horse', 'zebra', 'zoo', 'animal']]
# combine subject data
presented_words = [sub1_presented, sub2_presented]
recalled_words = [sub1_recalled, sub2_recalled]
# create Egg
multisubject_egg = quail.Egg(pres=presented_words,rec=recalled_words, subjgroup=['condition1', 'condition2'],
listgroup=['early','late'])
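Group labels like these are useful because results can later be summarized per label. As a rough sketch of that idea in plain Python, the snippet below computes the proportion of presented words recalled on each list and then averages by list label. The helper `mean_by_group` is hypothetical, and this illustrates grouping in general under that assumption; it is not how quail's analyze function is implemented.

```python
from collections import defaultdict


def mean_by_group(values, labels):
    """Average `values` within each label (hypothetical helper)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]

# Proportion of presented words that were recalled, one value per list.
accuracy = [len(set(r) & set(p)) / len(p) for p, r in zip(presented, recalled)]

# Label the first list 'early' and the second 'late', then average by label.
by_list = mean_by_group(accuracy, ['early', 'late'])
print(by_list)
```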
Saving an egg
Once you have created your egg, you can save it for later use or to share with colleagues. To do this, call the save method with a filepath:
multisubject_egg.save('myegg')
To load this egg later, call the load function with the path of the egg:
egg = quail.load('myegg')
In [ ]:
# subject 1 data
sub1_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
sub1_recalled=[['bat', 'cat', 'goat', 'hat'],['animal', 'horse', 'zoo']]
# create subject 1 egg
subject1_egg = quail.Egg(pres=sub1_presented, rec=sub1_recalled)
# subject 2 data
sub2_presented=[['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]
sub2_recalled=[['cat', 'goat', 'bat', 'hat'],['horse', 'zebra', 'zoo', 'animal']]
# create subject 2 egg
subject2_egg = quail.Egg(pres=sub2_presented, rec=sub2_recalled)
In [ ]:
# combine the two single-subject eggs into one egg
stacked_eggs = quail.stack_eggs([subject1_egg, subject2_egg])
stacked_eggs.get_pres_items()
In [ ]:
# keep only the requested subjects and lists from the stacked egg
cracked_egg = quail.crack_egg(stacked_eggs, subjects=[1], lists=[0])
cracked_egg.get_pres_items()
Alternatively, you can use the crack method, which does the same thing:
In [ ]:
stacked_eggs.crack(subjects=[0,1], lists=[1]).get_pres_items()
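Conceptually, cracking amounts to picking subjects and lists out of the nested data. The plain-Python sketch below shows that selection on nested lists; `crack_nested` is a hypothetical helper, and treating the `subjects` and `lists` values as zero-based positions is an assumption of this illustration rather than a documented guarantee about quail.

```python
def crack_nested(data, subjects=None, lists=None):
    """Return only the requested subjects and lists from nested data.

    Hypothetical helper; None means "keep everything" on that level.
    """
    if subjects is None:
        subjects = range(len(data))
    cracked = []
    for s in subjects:
        keep = range(len(data[s])) if lists is None else lists
        cracked.append([data[s][l] for l in keep])
    return cracked


sub1_recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]
sub2_recalled = [['cat', 'goat', 'bat', 'hat'], ['horse', 'zebra', 'zoo', 'animal']]
recalled = [sub1_recalled, sub2_recalled]

# Second subject, first list only.
one_list = crack_nested(recalled, subjects=[1], lists=[0])
print(one_list)

# Both subjects, second list only.
second_lists = crack_nested(recalled, subjects=[0, 1], lists=[1])
print(second_lists)
```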