To display matplotlib/seaborn figures inline, run this line:
In [1]:
%matplotlib inline
Next, import any packages or modules that you'll use in the notebook:
In [2]:
# For data structures
import pandas as pd
# For operating system functionality
import os.path as op
import os
# Stats stuff
from scipy import stats
import numpy as np
# For plotting
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
Set up seaborn with a larger font and a white background:
In [3]:
sns.set(context='poster', style='whitegrid')
Python is an object-oriented language, meaning that there are classes (e.g., Pandas DataFrame) which (1) describe the contents of objects from that class (e.g., data, columns, datatype), and (2) define operations (i.e., methods) which an object from that class can perform (e.g., append(), groupby()).
A Pandas DataFrame is a 2D labeled data structure that can have columns of different types (e.g., string, integer). You can specify indices (row labels) and column labels as arguments. You can make DataFrames from a variety of different inputs, including dicts of lists or Series, lists of dicts, 2D NumPy arrays, and other DataFrames.
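As a small illustration of a few such inputs, the sketch below builds DataFrames from a dict of lists, a list of dicts, and a 2D NumPy array. The variable names and toy values are made up for this example and are not part of the tutorial data.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels
df_from_dict = pd.DataFrame({'subid': ['subj1', 'subj2'],
                             'value': [3, 5]})

# From a list of dicts: each dict becomes one row
df_from_rows = pd.DataFrame([{'subid': 'subj1', 'value': 3},
                             {'subid': 'subj2', 'value': 5}])

# From a 2D NumPy array, with explicit row (index) and column labels
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]),
                             index=['row1', 'row2'],
                             columns=['a', 'b'])

# The first two constructions describe the same table
print(df_from_dict.equals(df_from_rows))
# Label-based lookup: row 'row2', column 'b'
print(df_from_array.loc['row2', 'b'])
```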
Here we will create a new instance of the DataFrame class, and assign this object to the variable data. This object will be initialized with the default parameters unless otherwise specified, and will inherit the methods and attributes of the DataFrame class. See the pandas DataFrame documentation for the parameters, attributes, and methods of the DataFrame class.
In [4]:
data = pd.DataFrame(columns=['subid', 'cond', 'value'])
data
Out[4]:
In [5]:
new_row = {'subid': 'subj1',
'cond': 'place',
'value': 3}
data = data.append([new_row])
In [6]:
data.head()
Out[6]:
In [7]:
# Make a dict of Series
new_rows = {'subid': pd.Series(['subj1']),
'cond': pd.Series(['face', 'place', 'face']),
'value': pd.Series([5, 3, 7])}
# Create new DF
new_data = pd.DataFrame(new_rows)
# Add to old DF
data = data.append(new_data, ignore_index=True)
In [8]:
data.head()
Out[8]:
In [9]:
# Fill missing subject ids; .loc avoids chained assignment
data.loc[data.subid.isnull(), 'subid'] = 'subj2'
In [10]:
data.head()
Out[10]:
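Filling in missing values can be written in more than one way. The sketch below, on a made-up toy frame (the name toy and its values are assumptions for illustration), compares a boolean-mask assignment through .loc with fillna() on the column. Both should give the same result here.

```python
import numpy as np
import pandas as pd

# Toy frame with two missing subject ids
toy = pd.DataFrame({'subid': ['subj1', np.nan, np.nan],
                    'value': [5, 3, 7]})

# Option 1: label-based assignment with a boolean row mask
filled_loc = toy.copy()
filled_loc.loc[filled_loc['subid'].isnull(), 'subid'] = 'subj2'

# Option 2: fillna on the column
filled_fillna = toy.copy()
filled_fillna['subid'] = filled_fillna['subid'].fillna('subj2')

print(filled_loc)
print(filled_loc.equals(filled_fillna))
```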
Now, we can access the attributes of the DataFrame object:
In [11]:
print('Data shape: ' + str(data.shape))
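Besides shape, a DataFrame exposes other attributes such as columns and dtypes. A minimal sketch on a made-up toy frame (toy is a hypothetical name, not the tutorial's data):

```python
import pandas as pd

toy = pd.DataFrame({'subid': ['subj1', 'subj2', 'subj2'],
                    'cond': ['place', 'face', 'place'],
                    'value': [3, 5, 7]})

# Attributes are accessed without parentheses (they are not methods)
print('Data shape: ' + str(toy.shape))       # (number of rows, number of columns)
print('Columns: ' + str(list(toy.columns)))  # column labels
print(toy.dtypes)                            # one dtype per column
```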
In [12]:
data2 = pd.DataFrame(columns=['subid', 'group', 'gender'])
new_row = {'subid': 'subj1',
'group': 'control',
'gender': 'female'}
data2 = data2.append([new_row])
data2.head()
Out[12]:
Merging uses relational terminology similar to that of SQL. There are 3 main cases of joining DataFrame objects: one-to-one, many-to-one, and many-to-many joins.
First, try merging using the intersection of keys from both frames:
In [13]:
pd.merge(data, data2, on=['subid'], how='inner')
Out[13]:
Next, merge using the union of keys from both frames:
In [14]:
pd.merge(data, data2, on=['subid'], how='outer')
Out[14]:
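To make the difference between the two merges concrete, here is a sketch on two made-up frames (left, right, and their contents are assumptions for illustration). The inner merge keeps only keys present in both frames, while the outer merge keeps every key and fills the gaps with NaN. The indicator=True argument adds a _merge column recording where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'subid': ['subj1', 'subj2'],
                     'value': [3, 5]})
right = pd.DataFrame({'subid': ['subj1', 'subj3'],
                      'group': ['control', 'patient']})

# Intersection of keys: only subj1 appears in both frames
inner = pd.merge(left, right, on='subid', how='inner')

# Union of keys: subj1, subj2, and subj3 all appear
outer = pd.merge(left, right, on='subid', how='outer', indicator=True)

print(inner)
print(outer)
```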
In [15]:
pd.concat([data, data2], ignore_index=True)
Out[15]:
Or do the same thing using the append() method:
In [16]:
data.append(data2, ignore_index=True)
Out[16]:
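A note for readers on recent pandas: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer installs pd.concat() is the way to stack frames. A minimal sketch with made-up frames a and b (hypothetical names), showing that columns missing from one frame are filled with NaN:

```python
import pandas as pd

a = pd.DataFrame({'subid': ['subj1'], 'value': [3]})
b = pd.DataFrame({'subid': ['subj1'], 'group': ['control']})

# Stack rows; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([a, b], ignore_index=True)
print(stacked)

# Adding a single dict as a row: wrap it in a one-row DataFrame first
new_row = {'subid': 'subj2', 'value': 7}
with_row = pd.concat([a, pd.DataFrame([new_row])], ignore_index=True)
print(with_row)
```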
In [17]:
data_file = 'objfam_groupcat_euc.csv'
username = os.getlogin()
data_dir = op.join('/Users', username, 'Dropbox/Code/tutorial/')
In [18]:
data_dir
Out[18]:
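The path above assumes a macOS-style /Users/<username> home directory, and os.getlogin() can fail when there is no controlling terminal. One possible alternative, sketched here under the assumption that the data lives in the same Dropbox/Code/tutorial folder under your home directory, is to let op.expanduser() find the home directory (data_dir_alt and data_path are hypothetical names):

```python
import os.path as op

# '~' expands to the current user's home directory on most platforms
home_dir = op.expanduser('~')

# Join path components individually so the separator matches the OS
data_dir_alt = op.join(home_dir, 'Dropbox', 'Code', 'tutorial')
data_path = op.join(data_dir_alt, 'objfam_groupcat_euc.csv')

# Check that the file is really there before trying to read it
print(data_path)
print(op.exists(data_path))
```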
In [19]:
df = pd.read_csv(op.join(data_dir, data_file))
In [20]:
df.head(n=3)
Out[20]:
In [21]:
df.describe()
Out[21]:
In [22]:
df.iloc[3]
Out[22]:
In [23]:
df.query('Subject == 3 & EuclidDist > 500')
Out[23]:
In [24]:
df[(df.Subject == 3) & (df.EuclidDist > 500)]
Out[24]:
In [25]:
df[(df['Subject'] == 3) & (df['EuclidDist'] > 500)]
Out[25]:
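The three selections above should return the same rows. A quick way to convince yourself is to run them on a small made-up frame (toy and its values are assumptions for illustration) and compare the results:

```python
import pandas as pd

toy = pd.DataFrame({'Subject': [3, 3, 4, 4],
                    'EuclidDist': [450.0, 620.0, 510.0, 300.0]})

# 1. Query string
by_query = toy.query('Subject == 3 & EuclidDist > 500')

# 2. Boolean mask with attribute access
by_attr = toy[(toy.Subject == 3) & (toy.EuclidDist > 500)]

# 3. Boolean mask with key access (works for any column name)
by_key = toy[(toy['Subject'] == 3) & (toy['EuclidDist'] > 500)]

print(by_query)
print(by_query.equals(by_attr) and by_attr.equals(by_key))
```

In the mask versions, the parentheses around each comparison are required because & binds more tightly than == and > in Python.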
In [26]:
df_identical = df.query('Morph == 1')
df_identical.head()
Out[26]:
In [27]:
df_resp_old = df[df.Response > 3]
df_resp_old.head()
Out[27]:
In [28]:
select_prototypes = [7, 3]
df_select = df[df.Prototype.isin(select_prototypes)]
In [29]:
df_select.head()
Out[29]:
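As a sketch of how isin() behaves, the toy example below (made-up values, hypothetical names) keeps the rows whose Prototype is in the list, and uses ~ to invert the mask and keep everything else:

```python
import pandas as pd

toy = pd.DataFrame({'Prototype': [1, 3, 7, 9],
                    'RT': [0.8, 1.1, 0.9, 1.3]})
select_prototypes = [7, 3]

# Rows whose Prototype is one of the listed values (row order is preserved)
toy_select = toy[toy.Prototype.isin(select_prototypes)]

# Inverted mask: rows whose Prototype is NOT in the list
toy_rest = toy[~toy.Prototype.isin(select_prototypes)]

print(toy_select)
print(toy_rest)
```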
Grouping follows a split-apply-combine pattern: split the data into groups based on some criteria (e.g., subject id), apply a function to each group independently (e.g., mean, median), and combine the results into a data structure.
In [30]:
grouped = df.groupby(['Subject', 'Morph'], sort=False).mean().reset_index()
grouped.head()
Out[30]:
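To see split-apply-combine on numbers small enough to check by hand, here is a sketch on a made-up frame (toy and its values are assumptions). It selects the Response column before taking the mean; on recent pandas versions, calling mean() on a whole grouped frame may raise an error if the frame contains non-numeric columns, so selecting columns (or passing numeric_only=True) is a safer habit.

```python
import pandas as pd

toy = pd.DataFrame({'Subject':  [1, 1, 1, 2, 2, 2],
                    'Morph':    [1, 1, 2, 1, 2, 2],
                    'Response': [1, 3, 4, 2, 5, 3]})

# Split by (Subject, Morph), apply mean to Response, combine into a flat frame.
# sort=False keeps groups in order of first appearance.
grouped = (toy.groupby(['Subject', 'Morph'], sort=False)['Response']
              .mean()
              .reset_index())
print(grouped)
```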
In [31]:
table = pd.pivot_table(df, values='RT',
index=['Subject', 'Morph'],
columns=['Response'],
aggfunc=np.median)
In [32]:
table.head()
Out[32]:
In [33]:
table.stack()
Out[33]:
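The relationship between pivot_table() and stack() is easier to see on a tiny made-up frame (toy and its values are assumptions for illustration). The pivot table spreads Response values across columns; stack() folds those columns back into the row index, giving a long-form Series.

```python
import pandas as pd

toy = pd.DataFrame({'Subject':  [1, 1, 1, 2, 2],
                    'Response': [1, 1, 2, 1, 2],
                    'RT':       [0.5, 0.7, 0.9, 0.6, 1.0]})

# Wide form: one row per Subject, one column per Response, median RT in cells
table = pd.pivot_table(toy, values='RT',
                       index='Subject',
                       columns='Response',
                       aggfunc='median')
print(table)

# Long form: a Series indexed by (Subject, Response)
long_form = table.stack()
print(long_form)
```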
In [34]:
grouped = df.groupby(['Subject', 'Morph']).mean().reset_index()
sns.lmplot(x='Morph', y='Response',
ci=68, data=grouped,
x_estimator=np.mean,
color='dodgerblue')
Out[34]:
In [35]:
sns.distplot(df.EuclidDist, color='darkviolet')
Out[35]:
In [36]:
grouped = df.groupby(["Subject",
"Run",
"Morph"]).Response.mean().reset_index()
sns.tsplot(data=grouped, time='Run',
           condition='Morph',
           unit='Subject',
           value='Response',
           # np.nanmean ignores NaNs; stats.nanmean is gone from newer SciPy
           estimator=np.nanmean)
Out[36]:
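If tsplot is not available in your seaborn version (it was dropped from later releases), the quantities it summarizes can be computed directly with pandas. The sketch below, on made-up data, computes the mean across subjects per run together with the standard error of the mean; a 68% interval, as used with ci=68 earlier, corresponds to roughly plus or minus one standard error.

```python
import pandas as pd

toy = pd.DataFrame({'Subject':  [1, 1, 2, 2, 3, 3],
                    'Run':      [1, 2, 1, 2, 1, 2],
                    'Response': [2.0, 3.0, 4.0, 3.0, 3.0, 6.0]})

# One row per run: mean across subjects and standard error of that mean
summary = (toy.groupby('Run')['Response']
              .agg(['mean', 'sem'])
              .reset_index())
print(summary)
```

The resulting columns could then be passed to plt.errorbar(summary['Run'], summary['mean'], yerr=summary['sem']) to draw the time course.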
In [37]:
sns.corrplot(df)
Out[37]:
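corrplot is also missing from later seaborn releases. The underlying correlation matrix is available from pandas itself, as in this sketch on synthetic data (the columns x, y, and z are made up: y is built to track x closely, z is independent noise):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
x = rng.normal(size=200)
toy = pd.DataFrame({'x': x,
                    'y': 2 * x + rng.normal(scale=0.1, size=200),
                    'z': rng.normal(size=200)})

# Pairwise Pearson correlations between all numeric columns
corr = toy.corr()
print(corr.round(2))
```

The matrix can then be drawn with sns.heatmap(corr, annot=True).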
In [38]:
from IPython.html.widgets import interact, interactive, fixed
from IPython.html import widgets
from IPython.display import clear_output, display, HTML
In [39]:
def plot_bysubid(subid):
    # Plot mean Response by Morph level for a single subject
    sns.lmplot(x='Morph', y='Response',
               ci=68, data=df[df.Subject == subid],
               x_estimator=np.mean,
               color='dodgerblue')
    plt.ylim(1, 5)
In [40]:
i = interact(plot_bysubid,
subid=widgets.FloatSliderWidget(min=3,
max=25,
step=1,
value=5))
In [41]:
%load_ext rpy2.ipython
In [42]:
%R -i df
In [43]:
%%R
print(str(df))
In [44]:
%%R
# Factor categorical var
df$Subject = factor(df$Subject)
# Load in libraries
require(lme4)
require(lmerTest)
# Run analysis
rs1 = lmer(Response~scale(EuclidDist) + (1|Subject), data=df)
rs2 = lmer(Response~scale(EuclidDist) + (1 + scale(EuclidDist)|Subject), data=df)
print(anova(rs1, rs2))
print(summary(rs2))
In [45]:
import nibabel as nb
from nibabel import load
In [46]:
fmri_file = 'smoothed_timeseries.nii.gz'
fmri_filepath = op.join(data_dir, fmri_file)
In [47]:
fmri_data = load(fmri_filepath)
func_arr = fmri_data.get_data()
In [48]:
# Two equivalent ways to get the array dimensions
print(np.shape(func_arr))
print(func_arr.shape)
In [49]:
TR = 20
plt.imshow(func_arr[:,:,15, TR])
Out[49]:
In [50]:
# Show every 5th axial slice at the chosen TR
slices = range(1, np.shape(func_arr)[2], 5)
num_subplots = len(slices)
f, axes = plt.subplots(1, num_subplots)
# slice_idx avoids shadowing the built-in slice
for ax, slice_idx in zip(axes, slices):
    ax.imshow(func_arr[:, :, slice_idx, TR])
    ax.set_axis_off()
plt.tight_layout()
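The indexing used above can be tried without any imaging data. This sketch uses a small synthetic 4D array (func_toy, with made-up dimensions x, y, z, time) to show that fixing the slice and timepoint indices leaves a 2D image, which is what imshow expects:

```python
import numpy as np

# Synthetic "timeseries": 4 x 5 voxels in-plane, 6 slices, 3 timepoints
func_toy = np.arange(4 * 5 * 6 * 3).reshape(4, 5, 6, 3)

TR = 2   # timepoint index
z = 1    # slice index

# Fixing the 3rd and 4th axes leaves a 2D (x, y) image
one_slice = func_toy[:, :, z, TR]
print(one_slice.shape)

# Every 2nd slice starting from slice 1, all at the same timepoint
slice_indices = list(range(1, func_toy.shape[2], 2))
slice_stack = [func_toy[:, :, s, TR] for s in slice_indices]
print(slice_indices, len(slice_stack))
```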
In [51]:
mask_file = 'Bilat-Hippocampus.nii.gz'
mask_filepath = op.join(data_dir, mask_file)
In [52]:
mask_data = load(mask_filepath)
mask_arr = mask_data.get_data().astype(bool)
In [53]:
# Number of voxels inside the mask
num_voxels = mask_arr.sum()
print('# Voxels = ' + str(num_voxels))
# Mask dimensions
mask_dim = mask_arr.shape
print('Mask Dim = ' + str(mask_dim))
In [54]:
plt.imshow(mask_arr[:,:,8])
Out[54]:
In [55]:
fmri_file = 'zstat1.nii.gz'
fmri_filepath = op.join(data_dir, fmri_file)
fmri_data = load(fmri_filepath)
func_arr = fmri_data.get_data()
In [56]:
func_arr.shape
Out[56]:
In [57]:
func_masked = func_arr[mask_arr]
func_masked.shape
Out[57]:
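Indexing with a boolean mask of the same shape returns a 1D array holding only the values where the mask is True. A sketch on synthetic arrays (stat_toy, mask_toy, and ts_toy are made-up stand-ins for the statistic map, the mask, and a timeseries):

```python
import numpy as np

# Synthetic 3D "statistic map" and a mask selecting two voxels
stat_toy = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
mask_toy = np.zeros((2, 3, 4), dtype=bool)
mask_toy[0, 1, 2] = True
mask_toy[1, 2, 3] = True

# 3D array indexed by a 3D mask: one value per masked voxel
masked = stat_toy[mask_toy]
print(masked.shape, masked)

# 4D array indexed by the same 3D mask: one timeseries per masked voxel
ts_toy = np.random.RandomState(0).normal(size=(2, 3, 4, 10))
masked_ts = ts_toy[mask_toy]
print(masked_ts.shape)
```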
In [58]:
sns.distplot(func_masked, color='dodgerblue')
plt.xlabel('zstat')
plt.vlines(x=0, ymin=0, ymax=.25, linestyles='dashed')
Out[58]:
In [59]:
sns.puppyplot()
Out[59]: