This notebook covers the concepts underlying design efficiency.

In order to examine the factors that affect efficiency, we need to be able to generate experimental designs that vary in their timing and in the correlation between regressors. We will use a helper function, create_design_singlecondition (from the utils directory), that generates such designs for us.


In [1]:
import os
import numpy
%matplotlib inline
import sys
sys.path.insert(0,'../utils')
from mkdesign import create_design_singlecondition
import matplotlib.pyplot as plt
from nipy.modalities.fmri.hemodynamic_models import spm_hrf,compute_regressor

# the "blockiness" argument controls how block-y the design is
# from 1( pure block) to 0 (pure random)
d,design=create_design_singlecondition(blockiness=0.9)
plt.axis([0,400,-0.2,1.2])
plt.plot(d)

tr=1.0


regressor,_=compute_regressor(design,'spm',numpy.arange(0,len(d)))
plt.plot(regressor,color='red')


Out[1]:
[<matplotlib.lines.Line2D at 0x7f2bb83d6b90>]

Now that we have our design, let's generate some synthetic data. We will generate AR1 noise to add to the data; this is not a perfect model of the autocorrelation in fMRI, but it's at least a start towards realistic noise.
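
To make the noise model concrete, the sketch below generates AR(1) noise directly from its defining recursion, x[t] = phi*x[t-1] + e[t]; it is equivalent in spirit to the arma_generate_sample call in the next cell (ar1_sample is just an illustrative helper, not part of the utils package).


In [ ]:
# a minimal sketch of AR(1) noise via its defining recursion
def ar1_sample(phi,n,sigma=1.0):
    e=numpy.random.normal(scale=sigma,size=n)
    x=numpy.zeros(n)
    x[0]=e[0]
    for t in range(1,n):
        x[t]=phi*x[t-1]+e[t]
    return x

plt.plot(ar1_sample(-0.3,400))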


In [2]:
from statsmodels.tsa.arima_process import arma_generate_sample

# note: statsmodels expects AR lag-polynomial coefficients, so ar=[1,0.3]
# corresponds to an AR(1) coefficient of -0.3
ar1_noise=arma_generate_sample([1,0.3],[1,0.],len(regressor))
beta=4
y=regressor.T*beta + ar1_noise
print(y.shape)


(1, 400)

In [3]:
plt.plot(y.T)


Out[3]:
[<matplotlib.lines.Line2D at 0x7f2bb4ccba90>]

Now let's fit the general linear model to these data, ignoring serial autocorrelation for now. The ordinary least squares estimates are beta_hat = inv(X'X) X'y, computed directly below.


In [4]:
X=numpy.vstack((regressor.T,numpy.ones(y.shape))).T
plt.imshow(X,interpolation='nearest',cmap='gray')
plt.axis('auto')


Out[4]:
(-0.5, 1.5, 400.0, -50.0)

In [5]:
beta_hat=numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y.T)
y_est=X.dot(beta_hat)
plt.plot(y.T,color='blue')
plt.plot(y_est,color='red',linewidth=2)
print(X.shape)


(400, 2)
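
As an aside, explicitly inverting X'X is fine for illustration, but numpy.linalg.lstsq is the more numerically stable way to obtain the same ordinary least squares estimates. A quick check, using the X and y defined above:


In [ ]:
# the same OLS estimates via a numerically stable least-squares solver
beta_hat_lstsq,_,_,_=numpy.linalg.lstsq(X,y.T)
print(numpy.allclose(beta_hat,beta_hat_lstsq))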

Now let's create a function that computes the efficiency of a design for a given contrast c, defined as eff = 1/(c' inv(X'X) c): higher values mean that the contrast can be estimated with lower variance.


In [6]:
def efficiency(X,c=None):
    """efficiency for contrast c: 1/(c' * inv(X'X) * c)
    defaults to a contrast across all columns if c is not given"""
    if c is None:
        c=numpy.ones((X.shape[1]))
    else:
        c=numpy.array(c)
    return 1./c.dot(numpy.linalg.inv(X.T.dot(X))).dot(c)
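
For example, we can apply it to the design matrix built above; the contrast [1,0] picks out the task regressor and ignores the intercept:


In [ ]:
# efficiency of the current design for the task-vs-baseline contrast
print(efficiency(X,c=[1,0]))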

Now let's write a simulation that generates designs with varying levels of blockiness, computes the efficiency of 1000 random designs at each level, and plots the mean efficiency. Note that we don't strictly need 1000 runs for blockiness=1, since that design is exactly the same each time; the code below treats all levels uniformly for simplicity.


In [7]:
nruns=1000
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_blockiness=numpy.zeros(len(blockiness_vals))

for b in range(len(blockiness_vals)):
    eff=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
        regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
        X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
        eff[i]=efficiency(X,c=[1,0])
    meaneff_blockiness[b]=numpy.mean(eff)

In [8]:
plt.plot(blockiness_vals,meaneff_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')


Out[8]:
<matplotlib.text.Text at 0x7f2bb4b9c750>

In [9]:
X.shape


Out[9]:
(400, 2)

Now let's do a similar simulation looking at the effect of varying block length from 10 to 115 seconds (in steps of 5). Since blockiness is 1.0 here, the design is deterministic and we only need one run per block length.


In [10]:
blocklenvals=numpy.arange(10,120,5)
meaneff_blocklen=numpy.zeros(len(blocklenvals))

for b in range(len(blocklenvals)):
    d_sim,design_sim=create_design_singlecondition(blocklength=blocklenvals[b],blockiness=1.)
    regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
    X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
    meaneff_blocklen[b]=efficiency(X,c=[1,0])

In [11]:
plt.plot(blocklenvals,meaneff_blocklen)
plt.xlabel('block length')
plt.ylabel('efficiency')


Out[11]:
<matplotlib.text.Text at 0x7f2bb4ab44d0>

Now let's look at the effects of correlation between regressors. We will use a second helper function, create_design_twocondition, which generates a design with two conditions whose correlation we can control.


In [12]:
from mkdesign import create_design_twocondition

d,des1,des2=create_design_twocondition(correlation=1.0)
regressor1,_=compute_regressor(des1,'spm',numpy.arange(0,d.shape[0]))
regressor2,_=compute_regressor(des2,'spm',numpy.arange(0,d.shape[0]))

X=numpy.vstack((regressor1.T,regressor2.T,numpy.ones(y.shape))).T
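
As a sanity check, with correlation=1.0 the two convolved regressors should be (nearly) perfectly correlated:


In [ ]:
# realized correlation between the two convolved regressors
print(numpy.corrcoef(regressor1.T,regressor2.T)[0,1])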

In [14]:
nruns=1000
corr_vals=numpy.arange(0,1.1,0.1)
meaneff_corr=numpy.zeros(len(corr_vals))

for b in range(len(corr_vals)):
    eff=numpy.zeros(nruns)
    corrs=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,des1_sim,des2_sim=create_design_twocondition(correlation=corr_vals[b])
        regressor1_sim,_=compute_regressor(des1_sim,'spm',numpy.arange(0,d_sim.shape[0]))
        regressor2_sim,_=compute_regressor(des2_sim,'spm',numpy.arange(0,d_sim.shape[0]))
        X=numpy.vstack((regressor1_sim.T,regressor2_sim.T,numpy.ones(y.shape))).T
        # use contrast of first regressor
        eff[i]=efficiency(X,c=[1,0,0])
        corrs[i]=numpy.corrcoef(X.T)[0,1]
    # record the realized mean correlation for plotting, rather than
    # the nominal value passed to the design generator
    corr_vals[b]=numpy.mean(corrs)
    meaneff_corr[b]=numpy.mean(eff)

In [16]:
plt.plot(corr_vals,meaneff_corr)
plt.xlabel('mean correlation between regressors')
plt.ylabel('efficiency')


Out[16]:
<matplotlib.text.Text at 0x7f2bb49e9fd0>

Now let's look at the efficiency of estimating the shape of the hemodynamic response, rather than detecting the activation effect. This requires a finite impulse response (FIR) model, which estimates a separate parameter for each post-stimulus time point rather than assuming a canonical HRF shape.
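
The FIR design matrix contains one column per post-stimulus time bin, each of which is simply the onset vector shifted by that delay. Here is a minimal sketch of the idea, using a hypothetical set of onsets (compute_regressor does the equivalent for us in the next cell):


In [ ]:
# building a FIR design matrix by hand: column d is the onset
# vector shifted down by d time bins (illustrative onsets only)
onsets=numpy.zeros(50)
onsets[[5,20,35]]=1
fir_X=numpy.zeros((50,16))
for delay in range(16):
    fir_X[delay:,delay]=onsets[:50-delay]
plt.imshow(fir_X,interpolation='nearest',cmap='gray')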


In [17]:
d,design=create_design_singlecondition(blockiness=0.0)
regressor,_=compute_regressor(design,'fir',numpy.arange(0,len(d)),fir_delays=numpy.arange(0,16))
plt.imshow(regressor[:50,:],interpolation='nearest',cmap='gray')


Out[17]:
<matplotlib.image.AxesImage at 0x7f2bb478be10>

Now let's simulate the FIR model across blockiness levels, summarizing estimation efficiency as the inverse of the trace of the parameter covariance matrix (i.e. the sum of the variances of all the FIR parameter estimates), excluding the intercept.


In [19]:
def efficiency_trace(X):
    """overall estimation efficiency: inverse of the trace of the
    parameter covariance matrix, excluding the intercept (last column)"""
    return 1./numpy.trace((numpy.linalg.inv(X[:,:-1].T.dot(X[:,:-1]))))

nruns=1000
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_fit_blockiness=numpy.zeros(len(blockiness_vals))

for b in range(len(blockiness_vals)):
    eff=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
        regressor_sim,_=compute_regressor(design_sim,'fir',
                            numpy.arange(0,len(d_sim)),fir_delays=numpy.arange(0,16))
        X=numpy.vstack((regressor_sim.T,numpy.ones(regressor_sim.shape[0]))).T
        eff[i]=efficiency_trace(X)
        
    meaneff_fit_blockiness[b]=numpy.mean(eff)

In [20]:
plt.plot(blockiness_vals,meaneff_fit_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')


Out[20]:
<matplotlib.text.Text at 0x7f2bb479be50>

Exercise: write a function to generate random designs, then run it a large number of times, estimating the efficiency of each design, and plot the histogram of efficiencies. A sketch of one possible approach appears after the empty cell below.


In [ ]:
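
A sketch of one possible solution follows; create_random_design is a hypothetical helper, written here to produce the same (onsets, durations, amplitudes) condition format that compute_regressor has consumed throughout this notebook:


In [ ]:
# generate many random event-related designs and plot the
# distribution of their efficiencies
def create_random_design(n=400,nevents=40):
    # hypothetical helper: nevents one-second events at random onsets
    onsets=numpy.sort(numpy.random.choice(range(n-20),nevents,replace=False))
    return numpy.vstack((onsets,numpy.ones(nevents),numpy.ones(nevents)))

eff_rand=numpy.zeros(1000)
for i in range(1000):
    design_rand=create_random_design()
    regressor_rand,_=compute_regressor(design_rand,'spm',numpy.arange(0,400))
    X_rand=numpy.vstack((regressor_rand.T,numpy.ones(regressor_rand.shape[0]))).T
    eff_rand[i]=efficiency(X_rand,c=[1,0])

plt.hist(eff_rand,bins=50)
plt.xlabel('efficiency')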