BART trial

This is some initial exploration and analysis related to the BART trials. I hope the comments make sense :). Since this is an IPython notebook I've intermixed bash commands with the Python code; I hope this is easy to follow.


In [1]:
from __future__ import absolute_import, division, print_function
import numpy as np
import numpy.linalg as npl
import matplotlib.pyplot as plt
import nibabel as nib
import pandas as pd # new
import os # new
# the last line below (%matplotlib inline) is IPython-notebook-specific, don't include it in regular python code
%matplotlib inline

Quickly (some rationale for the additions):

  • pandas: is good because it has DataFrame structures similar to R data.frames (I've already made some CSV files using this library)
  • os: is good for file navigation (instead of trying to mix $\texttt{bash}$ into $\texttt{ipython}$)
    • $\texttt{os.chdir}$ $\Leftrightarrow$ $\texttt{cd}$ in $\texttt{bash}$
    • $\texttt{os.listdir}$ $\Leftrightarrow$ $\texttt{ls}$ -> usually I do $\texttt{np.array(os.listdir(...))}$ if the directory is large

Layout of File System, also exploring $\texttt{os}$ library

Below I've shown where my data files are currently located. You may notice that the subject numbering skips some values.


In [2]:
# setting locations of data directories, make sure to change these for your setup (keeping paths in one place is a smart idea in general when dealing with a file system)
location_of_data="/Users/BenjaminLeRoy/Desktop/1.Fall2015/Stat 159/project/data/ds009/" 
location_of_subject001=os.path.join(location_of_data,"sub001/")
location_of_simuli="/Users/BenjaminLeRoy/Desktop/test/4d_fmri/"
location_of_present_3d="/Users/BenjaminLeRoy/Desktop/1.Fall2015/Stat 159/project/python_code"
location_of_processed_data="/Users/BenjaminLeRoy/Desktop/1.Fall2015/Stat 159/project/processed_data/"

Folders in the large Data Directory (ds009)


In [3]:
os.chdir(location_of_data)
np.array(os.listdir(location_of_data))
# some subject numbers don't exist (probably removed due to errors mentioned in the paper)


Out[3]:
array(['.DS_Store', 'sub001', 'sub002', 'sub003', 'sub004', 'sub005',
       'sub006', 'sub007', 'sub009', 'sub010', 'sub011', 'sub012',
       'sub013', 'sub014', 'sub016', 'sub017', 'sub018', 'sub020',
       'sub021', 'sub023', 'sub024', 'sub025', 'sub026', 'sub028', 'sub029'], 
      dtype='|S9')
  • Ignore the '.DS_Store'; we can take it out if we ever want to deal with this as a list (see "Creating Data Frames" and the sketch below)
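A minimal sketch of that filtering (just an illustration using the variables defined above):

# keep only the subject folders, dropping '.DS_Store' and anything similar
subject_dirs = [name for name in os.listdir(location_of_data) if name.startswith("sub")]
print(subject_dirs[:5])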

Examining 1 subject's data

These are the folders in the sub001 directory:


In [4]:
os.chdir(location_of_subject001)
np.array(os.listdir(location_of_subject001))


Out[4]:
array(['.DS_Store', 'anatomy', 'behav', 'BOLD', 'model'], 
      dtype='|S9')

File Structure for Individual Subject

I've tried to help you visualize the data by using the tree command (try it in your own terminal); you need your working directory to be "ds009/sub001".


In [5]:
#Run these in a terminal after entering the sub001 folder:
#  1)
#  tree BOLD/task001_run001
#  2)
#  tree anatomy/
#  3)
#  tree behav/task001_run001
#  4)
#  tree model/model001/onsets/task001_run001
#  5)
#  tree model/model002/onsets/task001_run001
# 1) BOLD/task001_run001
# ├── QA
# │   ├── DVARS.png
# │   ├── QA_report.pdf
# │   ├── confound.txt
# │   ├── dvars.txt
# │   ├── fd.png
# │   ├── fd.txt
# │   ├── mad.png
# │   ├── maskmean.png
# │   ├── meanpsd.png
# │   ├── qadata.csv
# │   ├── spike.png
# │   ├── voxcv.png
# │   ├── voxmean.png
# │   └── voxsfnr.png
# ├── bold.nii
# └── bold.nii.gz
#
# 2) anatomy
# ├── highres001.nii.gz
# ├── highres001_brain.nii.gz
# ├── highres001_brain_mask.nii.gz
# ├── inplane001.nii.gz
# ├── inplane001_brain.nii.gz
# └── inplane001_brain_mask.nii.gz
#
# 3) behav/task001_run001
# └── behavdata.txt
#
# 4) model/model001/onsets/task001_run001
# ├── cond001.txt
# ├── cond002.txt
# └── cond003.txt
#
# 5) model/model002/onsets/task001_run001
# ├── cond001.txt
# ├── cond002.txt
# └── cond003.txt

Commentary on the above structure:

  1. Explore for yourself some of the following:
    1. the model001 and model002 onset files seem to be the same (I don't know why that might be; a quick check is sketched after this list)
    2. the BOLD directory ("BOLD/QA") also contains a lot of the images the authors produced for this individual (maybe try to reproduce them?)
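As a quick check on point A (this is a minimal sketch I'm adding here, not something from the original files), one could load the same onset file from both models and compare:

# compare cond001.txt between model001 and model002 for sub001
m1 = np.loadtxt(os.path.join(location_of_subject001,
                             "model/model001/onsets/task001_run001/cond001.txt"))
m2 = np.loadtxt(os.path.join(location_of_subject001,
                             "model/model002/onsets/task001_run001/cond001.txt"))
print(np.array_equal(m1, m2))  # True would back up the "they look the same" guess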

Exploring the behavdata.txt

I've already created files that combine all the behavioral data into CSV files, and we can load these CSV files with pandas. Below is a view of the data:
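If you want to peek at the raw per-subject file first, a minimal sketch (assuming behavdata.txt is tab-delimited, which is what pd.read_table expects by default; the CSV loaded below was built by combining files like this into one row per subject):

# read one subject's raw behavioral file
raw_behav = pd.read_table(os.path.join(location_of_subject001,
                                       "behav/task001_run001/behavdata.txt"))
raw_behav.head()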


In [5]:
os.chdir(location_of_processed_data)
behav=pd.read_table("task001_run001_model_data_frame.csv",sep=",")
behav.head(5)


Out[5]:
  Unnamed: 0  NumPumps  AmtWon  NumExpl  NumTRs
0     sub001      5.33   46.00        8     239
1     sub002      6.17   39.75       15     281
2     sub003      5.36   42.00       10     232
3     sub004      5.75   42.50       12     270
4     sub005      5.72   44.00       11     271

For BART, we just have a few columns in the behavioral data, and they all make a good amount of sense. Feel free to check the data dictionary if you can't guess what they mean now (or read the PDF files).

I will make comments about $\texttt{NumExpl}$ and $\texttt{NumTRs}$ later so try to figure these out at least :)

Loading in All the Libraries (and additional programs)

I've also imported

  • events2neural, which was written in class
  • present_3d, a function I have already created (see the example later)

In [6]:
#location of my stimuli.py file
os.chdir(location_of_simuli)
from stimuli import events2neural

In [7]:
#locating my Image_Visualizing.py file
os.chdir(location_of_present_3d)
from Image_Visualizing import present_3d

Now for the actual loading of Files and a little Analysis

There are some other observations below that might be interesting to look into.


In [8]:
os.chdir(os.path.join(location_of_subject001,"BOLD/task001_run001"))
img=nib.load("bold.nii")
data=img.get_data()
# data.shape # (64, 64, 34, 245)

In [9]:
# just a single 3d image to showcase the present_3d function
three_d_image=data[...,0]

In [10]:
# use of my specialized function
full=present_3d(three_d_image)
plt.imshow(full,cmap="gray",interpolation="nearest")
plt.colorbar()


Out[10]:
<matplotlib.colorbar.Colorbar instance at 0x10ccb7fc8>

Data Exploration

1) Is there a major problem in the beginning of the data?
*we will come back and comment on this later


In [11]:
# cut from middle of the brain
test=data[32,32,15,:] # random voxel
plt.plot(test) # doesn't look like there are problems at the beginning


Out[11]:
[<matplotlib.lines.Line2D at 0x112213b10>]
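A slightly more global check (my own addition; a minimal sketch using the data array loaded above) is to plot the mean signal over all voxels at each volume; a sharp jump in the first few volumes would point to a startup problem:

# mean over all voxels for each time point (volume)
vol_means = data.reshape(-1, data.shape[-1]).mean(axis=0)
plt.plot(vol_means)
plt.xlabel("Time, by TR")
plt.ylabel("mean signal over all voxels")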

2) Looking at the conditions / different types of events in the scans


In [12]:
# model (condition data) (will be used to create on/off switches)
os.chdir(os.path.join(location_of_subject001,"model/model001/onsets/task001_run001"))
cond1=np.loadtxt("cond001.txt")
cond2=np.loadtxt("cond002.txt")
cond3=np.loadtxt("cond003.txt")

In [13]:
# Looking at the first condition file to understand the values
cond1[:10,:]


Out[13]:
array([[  0.0671,   2.    ,   1.    ],
       [  2.1251,   2.    ,   1.    ],
       [  3.7681,   2.    ,   1.    ],
       [  5.6601,   2.    ,   1.    ],
       [  7.8673,   2.    ,   1.    ],
       [  9.3443,   2.    ,   1.    ],
       [ 19.7831,   2.    ,   1.    ],
       [ 22.0402,   2.    ,   1.    ],
       [ 23.5837,   2.    ,   1.    ],
       [ 25.1434,   2.    ,   1.    ]])

If you remember, there are 3 different types of conditions for the BART trial: regular, before pop, and before save.

  • We already know how many times the first person popped the balloon (see above): 8. So... I'd bet money that we could figure out which condition that is, and the regular condition should probably be the largest one. In the first draft of this I included some more analysis, but the reasoning is pretty straightforward, so let's use it.
  • In the rest of my analysis (not included here) we saw different lengths of time between events, and the paper says so as well. This is slightly annoying, but we can deal with it because we have the start values.

In [14]:
for i in [cond1,cond2,cond3]:
    print(i.shape)


(164, 3)
(8, 3)
(28, 3)
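As a quick check of the guess above (a minimal sketch, my addition), the number of rows in cond2 should match the NumExpl value for sub001 in the behavioral data:

# cond2 has 8 rows and sub001 exploded the balloon 8 times (NumExpl),
# so cond002.txt presumably marks the "before pop" events, and cond1,
# the largest, is presumably the regular condition
print(behav["NumExpl"][0] == cond2.shape[0])  # expect True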

We should notice that the $\texttt{NumTRs}$ in the behavior file (239) is different from the time dimension of the data (245).

I've talked to Jarrod and he thinks the researchers just cut out the first 6 volumes, which makes sense as a general practice. I didn't see any note of it anywhere, but Jarrod suggests looking for supplementary documents from the paper.


In [15]:
print(str(len(data[0,0,0,:]))+ " is not equal to " + str(behav["NumTRs"][0])) # not the same


245 is not equal to 239

3) Looking at the condition data in different ways

Problem with the dimensions of the fMRI data and NumTRs


In [16]:
events=events2neural("cond001.txt",2,239) # 1s are non special events
events=np.abs(events-1) # switching 0,1 to 1,0

In [17]:
data_cut=data[...,6:]
# data_cut.shape (64, 64, 34, 239)

We can now approach the rest of the analysis, since the dimensions are the same :)

Visualizing when the 3 conditions happen:

and checking that using only the event data will separate condition 1 from conditions 2 and 3


In [18]:
x=np.hstack((cond1[:,0],cond2[:,0],cond3[:,0]))

# specifying which condition they come from (0,1,2)
y=np.zeros((cond1.shape[0]+cond2.shape[0]+cond3.shape[0],))
y[cond1.shape[0]:]=1
y[(cond1.shape[0]+cond2.shape[0]):]+=1

xy=zip(x,y)
xy_sorted=sorted(xy,key= lambda pair:pair[0]) #sorted by x values
x_s,y_s=zip(*xy_sorted) # unzipping
x_s_array=np.array(x_s)
desired=x_s_array[1:]-x_s_array[:-1]
# time from each event onset to the next one (time delay)


# setting up color coding for 3 different types of condition
dictionary_color={0.:"red",1.:"blue",2.:"green"}
colors=[dictionary_color[elem] for elem in y_s[:-1]]

#plot
plt.scatter(x_s_array[:-1]/2,desired,color=colors,label="starts of stimuli")
plt.plot(events*10,label="event neural stimuli function")
#plt.plot(events*4) # use a smaller scaling if the 10x line is hard to see
plt.xlabel("Time, by TR")
plt.ylabel("length of time to the next value")
plt.xlim(0-5,239+5)
plt.legend(loc='lower right', shadow=True,fontsize="smaller")
plt.title("Just Checking")


Out[18]:
<matplotlib.text.Text at 0x11222c550>