This notebook covers the concepts underlying design efficiency.
In order to examine the factors that affect efficiency, we need to be able to generate experimental designs that vary in their timing and correlation between regressors. Let's first create a function that can generate such designs for us.
In [20]:
import os
import numpy
%matplotlib inline
import sys
sys.path.insert(0,'../utils')
from mkdesign import create_design_singlecondition
import matplotlib.pyplot as plt
#from spm_hrf import spm_hrf
from nipy.modalities.fmri.hemodynamic_models import spm_hrf,compute_regressor
tr=1.0
# the "blockiness" argument controls how block-y the design is
# from 1( pure block) to 0 (pure random)
d,design=create_design_singlecondition(blockiness=0.95)
regressor,_=compute_regressor(design,
'spm',numpy.arange(0,len(d)))
plt.axis([0,400,-0.2,1.2])
plt.plot(d)
plt.plot(regressor,color='red')
Out[20]:
Now that we have our design, let's generate some synthetic data. We will generate AR1 noise to add to the data; this is not a perfect model of the autocorrelation in fMRI, but it's at least a start towards realistic noise.
In [21]:
from statsmodels.tsa.arima_process import arma_generate_sample
ar1_noise=arma_generate_sample([1,0.3],[1,0.],len(regressor))
beta=4
y=regressor.T*beta + ar1_noise
print y.shape
In [22]:
plt.plot(y.T)
Out[22]:
Now let's fit the general linear model to these data. We will ignore serial autocorrelation for now.
In [23]:
X=numpy.vstack((regressor.T,numpy.ones(y.shape))).T
plt.imshow(X,interpolation='nearest',cmap='gray')
plt.axis('auto')
Out[23]:
In [24]:
beta_hat=numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y.T)
y_est=X.dot(beta_hat)
plt.plot(y.T,color='blue')
plt.plot(y_est,color='red',linewidth=2)
print X.shape
Now let's make a function to repeatedly generate data and fit the model.
In [25]:
def efficiency_older(X,c=None):
if not c==None:
c=numpy.ones((X.shape[1]))
else:
c=numpy.array(c)
return 1./c.dot(numpy.linalg.inv(X.T.dot(X))).dot(c)
def efficiency(X,c=None):
""" remove the intercept"""
if not c==None:
c=numpy.ones((X.shape[1]))
else:
c=numpy.array(c)
return 1./numpy.trace((numpy.linalg.inv(X[:,:-1].T.dot(X[:,:-1]))))
Now let's write a simulation that creates datasets with varying levels of blockiness, runs the previous function 1000 times for each level, and plots mean efficiency. Note that we don't actually need to run it 1000 times for blockiness=1, since that design is exactly the same each time.
In [26]:
nruns=100
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_blockiness=numpy.zeros(len(blockiness_vals))
for b in range(len(blockiness_vals)):
eff=numpy.zeros(nruns)
for i in range(nruns):
d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
eff[i]=efficiency(X,c=[1,0])
meaneff_blockiness[b]=numpy.mean(eff)
In [8]:
plt.plot(blockiness_vals,meaneff_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')
Out[8]:
In [9]:
X.shape
Out[9]:
Now let's do a similar simulation looking at the effects of varying block length between 10 seconds and 120 seconds (in steps of 10). since blockiness is 1.0 here, we only need one run per block length.
In [27]:
blocklenvals=numpy.arange(10,120,1)
meaneff_blocklen=numpy.zeros(len(blocklenvals))
sims=[]
for b in range(len(blocklenvals)):
d_sim,design_sim=create_design_singlecondition(blocklength=blocklenvals[b],blockiness=1.)
regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
sims.append(numpy.mean(regressor_sim))
meaneff_blocklen[b]=efficiency(X,c=[1,0])
In [28]:
plt.plot(blocklenvals,meaneff_blocklen)
plt.xlabel('block length')
plt.ylabel('efficiency')
Out[28]:
Now let's look at the effects of correlation between regressors. We first need to create a function to generate a design with two conditions where we can control the correlation between them.
In [29]:
from mkdesign import create_design_twocondition
d,des1,des2=create_design_twocondition(correlation=1.0)
regressor1,_=compute_regressor(des1,'spm',numpy.arange(0,d.shape[0]))
regressor2,_=compute_regressor(des2,'spm',numpy.arange(0,d.shape[0]))
X=numpy.vstack((regressor1.T,regressor2.T,numpy.ones(y.shape))).T
In [32]:
nruns=100
corr_vals_intended=numpy.arange(-1,1.1,0.1)
corr_vals=numpy.zeros(len(corr_vals_intended))
meaneff_corr=numpy.zeros(len(corr_vals))
sumx=numpy.zeros(len(corr_vals))
for b in range(len(corr_vals_intended)):
eff=numpy.zeros(nruns)
corrs=numpy.zeros(nruns)
for i in range(nruns):
d_sim,des1_sim,des2_sim=create_design_twocondition(correlation=corr_vals_intended[b])
regressor1_sim,_=compute_regressor(des1_sim,'spm',numpy.arange(0,d_sim.shape[0]))
regressor2_sim,_=compute_regressor(des2_sim,'spm',numpy.arange(0,d_sim.shape[0]))
X=numpy.vstack((regressor1_sim.T,regressor2_sim.T,numpy.ones(y.shape))).T
# use contrast of first regressor
eff[i]=efficiency(X,c=[1,0,0])
corrs[i]=numpy.corrcoef(X.T)[0,1]
corr_vals[b]=numpy.mean(corrs)
sumx[b]=numpy.sum(X[:,0])
meaneff_corr[b]=numpy.mean(eff)
In [33]:
plt.plot(corr_vals,meaneff_corr)
plt.xlabel('mean correlation between regressors')
plt.ylabel('efficiency')
Out[33]:
Now let's look at efficiency of estimation of the shape of the HRF, rather than detection of the activation effect. This requires that we use a finite impulse response (FIR) model.
In [34]:
d,design=create_design_singlecondition(blockiness=0.0)
regressor,_=compute_regressor(design,'fir',numpy.arange(0,len(d)),fir_delays=numpy.arange(0,16))
plt.imshow(regressor[:50,:],interpolation='nearest',cmap='gray')
Out[34]:
Now let's simulate the FIR model, and estimate the variance of the fits.
In [40]:
nruns=100
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_fit_blockiness=numpy.zeros(len(blockiness_vals))
meancorr=[]
for b in range(len(blockiness_vals)):
eff=numpy.zeros(nruns)
cc=numpy.zeros(nruns)
for i in range(nruns):
d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
regressor_sim,_=compute_regressor(design_sim,'fir',
numpy.arange(0,len(d_sim)),fir_delays=numpy.arange(0,16))
X=numpy.vstack((regressor_sim.T,numpy.ones(regressor_sim.shape[0]))).T
eff[i]=efficiency(X)
cc[i]=numpy.corrcoef(X.T)[0,1]
meaneff_fit_blockiness[b]=numpy.mean(eff)
meancorr.append(numpy.mean(cc))
In [37]:
plt.plot(blockiness_vals,meaneff_fit_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')
Out[37]:
In [41]:
plt.plot(blockiness_vals,meancorr)
Out[41]:
In [ ]:
__Exercise:__ write a function to generate random designs, and then do this a large number of times, each time estimating the efficiency. Then plot the histogram of efficiencies.
In [ ]: