This notebook prepares for the first-level analysis: setting up and examining a General Linear Model that produces response amplitudes for each stimulus class. For this, the data must be preprocessed. We will use GLMdenoise, a MATLAB toolbox, so the actual analysis will be run outside of this notebook (it takes a while and a lot of resources anyway, so this is for the best).
The first thing we need to do is create our design matrix. Our design matrix needs to be in the format time by conditions (where time is in TRs), with a 1 at each condition's onset. It will be exceedingly sparse, since each condition only shows up once per run (when we move this into MATLAB, we will make it a sparse matrix). We will have one design matrix per run.
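As a toy sketch of that format (the sizes and onsets here are made up for illustration, not taken from the experiment), a run with 6 TRs and 3 conditions would look like this:

In [ ]:
import numpy as np

n_TRs_toy, n_conditions_toy = 6, 3                 # hypothetical sizes
toy_design = np.zeros((n_TRs_toy, n_conditions_toy))  # time (TRs) by conditions
toy_design[2, 0] = 1                               # condition 0 onsets at TR 2
toy_design[4, 1] = 1                               # condition 1 onsets at TR 4
toy_design[5, 2] = 1                               # condition 2 onsets at TR 5
# each column sums to 1: every condition appears exactly once per run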
In [8]:
import sys
sys.path.append('..')
import sfp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nibabel as nib
import os
%matplotlib inline
from bids.layout import BIDSLayout
In [26]:
layout = BIDSLayout("/mnt/Acadia/Projects/spatial_frequency_preferences/BIDS/", validate=False)
# This contains all the design information
tsv_df = pd.read_csv(layout.get(suffix='events', subject='wlsubj001', session='01', run=1)[0].path, sep='\t', na_filter=False)
nii_file = layout.get(suffix='bold', subject='wlsubj001', session='01', run=1)[0].path
# We need to load in the nifti file so we can determine how many TRs there were
nii = nib.load(nii_file)
We grab the number of TRs from our nifti file.
In [27]:
n_TRs = nii.shape[3]
print("Num TRs: %d" % n_TRs)
In [28]:
tsv_df.head()
Out[28]: (first few rows of the events DataFrame; the columns we use below are onset, trial_type, and note)
We also have the onset times of all of our stimuli, as well as an identifying index we can use to look up each stimulus's information. We constructed our stimuli in classes, so that each class contains stimuli with all the same frequency parameters ($w_r$ / $w_a$ or $w_x$ / $w_y$) and different phases. When constructing the events tsv file, we also saved an index for the stimulus class, which we call trial_type.
For constructing the design matrix, we don't care about the individual stimuli that only differ in their phase: our events are the presentations of new stimulus classes / trial_types, so we drop all the other stimuli.
In [29]:
design_df = tsv_df.drop_duplicates('trial_type', keep='first')
# Rows whose note field is anything other than "n/a" are blank trials. We're not modeling them here, so we drop them
design_df = design_df[design_df.note == "n/a"]
Now we need to convert these to TR times. First we find the onset times of TRs, in seconds, relative to the first TR. We then create a giant matrix where each row is a different stimulus onset time and then, in each column, subtract a TR onset time (so this matrix will be num_conditions x num_TRs). If we then round this difference-in-time matrix and look for the 0s, we've found the TR onset closest to each stimulus's onset. Note that this won't make sense for a lot of entries; some of them start almost exactly halfway through a TR. But, because of how we defined our experiment, our class transitions should happen right around a TR onset (if the timings of the scanner and the stimulus computer were perfect, they would be exactly the same; as it is, they probably differ by a few milliseconds), and so this will work.
In [31]:
TR = layout.get_metadata(nii_file)['RepetitionTime']
TR_times = [TR * i for i in range(n_TRs)]
In [32]:
stim_times = design_df['onset'].values
# repeat each stimulus onset time across a row, giving a (num_conditions, num_TRs) array
stim_times = np.expand_dims(stim_times, 1)
stim_times = np.repeat(stim_times, n_TRs, 1)
In [33]:
# round the (stimulus onset - TR onset) differences; a 0 marks the matching TR
time_from_TR = np.round(stim_times - TR_times)
# the column index of each 0 is the TR at which that stimulus class onsets
design_df['Onset time (TR)'] = np.where(time_from_TR == 0)[1]
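A quick sanity check we could add at this point (my addition, not part of the original notebook): every stimulus onset should round to exactly one TR onset, that is, each row of time_from_TR should contain exactly one 0:

In [ ]:
# added check: each stimulus should match exactly one TR onset
assert ((time_from_TR == 0).sum(axis=1) == 1).all()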
And we create our design matrix, iterating through our design_df and putting a 1 where each class shows up in a TR.
In [34]:
# trial_type values are 0-indexed, so the number of conditions is the max value plus 1
design_matrix = np.zeros((n_TRs, design_df.trial_type.max()+1))
for i, row in design_df.iterrows():
    row = row[['Onset time (TR)', 'trial_type']].astype(int)
    design_matrix[row['Onset time (TR)'], row['trial_type']] = 1
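Since each column contains a single 1, a sparse representation (which, as noted above, is what we'll use once this is in MATLAB) is very compact. A sketch of the equivalent conversion in Python, assuming scipy is available (the actual export to .tsv is handled by sfp.design_matrices, not this snippet):

In [ ]:
from scipy import sparse

# store only the nonzero entries instead of the full dense array
sparse_design = sparse.csr_matrix(design_matrix)
print("%d nonzero entries out of %d" % (sparse_design.nnz, design_matrix.size))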
To make sure things work correctly, we look at our axis sums: each class (axis 1) should show up exactly once, and each TR (axis 0) should have 0 or 1 classes in it.
In [35]:
print("Each entry represents one of our %d classes:" % design_matrix.shape[1])
print(design_matrix.sum(0))
print("Each entry represents one of our %d TRs:" % design_matrix.shape[0])
print(design_matrix.sum(1))
And now we can look at our design matrix for this run!
In [36]:
sfp.design_matrices.plot_design_matrix(design_matrix, 'Design matrix for run 1')
The function sfp.design_matrices.create_all_design_matrices creates the design matrices for multiple runs and saves them as .tsv files so they can be read into MATLAB. In order to create them efficiently, run sfp/design_matrices.py from the command line (see its help string for details as to how).
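For example, to see the available options (the exact flags are documented in the help string; this assumes you're running from the repository root):

In [ ]:
!python sfp/design_matrices.py --help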
Actually running the first-level analysis requires MATLAB and should be done on the cluster, since it requires Kendrick Kay's GLMdenoise package and uses a lot of memory. After you've finished getting the results, examined the $R^2$ values to make sure they make sense, and arranged them into a pandas DataFrame, you're ready for the next notebook, where we analyze these results.