This notebook covers the concepts underlying design efficiency.

To examine the factors that affect efficiency, we need to be able to generate experimental designs that vary in their timing and in the correlation between regressors. We'll start with a helper function (from the utils directory) that generates such designs for us.


In [20]:
import os
import numpy
%matplotlib inline
import sys
sys.path.insert(0,'../utils')
from mkdesign import create_design_singlecondition
import matplotlib.pyplot as plt
#from spm_hrf import spm_hrf
from nipy.modalities.fmri.hemodynamic_models import spm_hrf,compute_regressor

tr=1.0

# the "blockiness" argument controls how block-y the design is
# from 1( pure block) to 0 (pure random)
d,design=create_design_singlecondition(blockiness=0.95)
regressor,_=compute_regressor(design,
                    'spm',numpy.arange(0,len(d)))
plt.axis([0,400,-0.2,1.2])
plt.plot(d)
plt.plot(regressor,color='red')


Out[20]:
[<matplotlib.lines.Line2D at 0x7f29de2a8d10>]

Now that we have our design, let's generate some synthetic data. We will generate AR1 noise to add to the data; this is not a perfect model of the autocorrelation in fMRI, but it's at least a start towards realistic noise.


In [21]:
from statsmodels.tsa.arima_process import arma_generate_sample

# generate AR(1) noise (arma_generate_sample expects the full lag
# polynomials, so ar=[1,0.3] and ma=[1,0.])
ar1_noise=arma_generate_sample([1,0.3],[1,0.],len(regressor))
# scale the regressor by the true effect size and add the noise
beta=4
y=regressor.T*beta + ar1_noise
print(y.shape)


(1, 400)
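As an aside, statsmodels uses the lag-polynomial sign convention: ar=[1, -phi] corresponds to an AR(1) process y[t] = phi*y[t-1] + eps[t], so ar=[1, 0.3] above implies phi = -0.3. Here is a minimal sketch of the same kind of process generated by direct recursion (phi, eps, and noise_check are illustrative names, not notebook helpers):

In [ ]:
# direct AR(1) recursion: y[t] = phi*y[t-1] + eps[t]
# with ar=[1, 0.3] above, the implied coefficient is phi = -0.3
phi = -0.3
eps = numpy.random.randn(len(regressor))
noise_check = numpy.zeros(len(regressor))
for t in range(1, len(noise_check)):
    noise_check[t] = phi * noise_check[t - 1] + eps[t]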

In [22]:
plt.plot(y.T)


Out[22]:
[<matplotlib.lines.Line2D at 0x7f29dda48dd0>]

Now let's fit the general linear model to these data. We will ignore serial autocorrelation for now.
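For reference, the ordinary least squares estimate is beta_hat = (X'X)^-1 X'y, which we will compute directly with numpy below.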


In [23]:
X=numpy.vstack((regressor.T,numpy.ones(y.shape))).T
plt.imshow(X,interpolation='nearest',cmap='gray')
plt.axis('auto')


Out[23]:
(-0.5, 1.5, 400.0, -50.0)

In [24]:
# ordinary least squares: beta_hat = (X'X)^-1 X'y
beta_hat=numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y.T)
y_est=X.dot(beta_hat)
plt.plot(y.T,color='blue')
plt.plot(y_est,color='red',linewidth=2)
print(X.shape)


(400, 2)

Now let's define functions to compute the efficiency of a design.
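The first function implements the standard definition for a single contrast c, 1/(c'(X'X)^-1 c); the second instead sums the estimation variances of all regressors except the intercept (the trace of the inverse of X'X, after dropping the last column, assumed to be the intercept) and returns the inverse of that sum.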


In [25]:
def efficiency_older(X,c=None):
    """ efficiency for a single contrast c: 1/(c'(X'X)^-1 c) """
    if c is None:
        c=numpy.ones((X.shape[1]))
    else:
        c=numpy.array(c)
    return 1./c.dot(numpy.linalg.inv(X.T.dot(X))).dot(c)

def efficiency(X,c=None):
    """ efficiency summed over regressors, excluding the intercept
    (assumed to be the last column); c is accepted for consistency
    but is not used """
    if c is None:
        c=numpy.ones((X.shape[1]))
    else:
        c=numpy.array(c)
    return 1./numpy.trace((numpy.linalg.inv(X[:,:-1].T.dot(X[:,:-1]))))
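As a quick sanity check, we can apply both measures to the two-column design matrix X from the model fit above (this is just a sketch; c=[1, 0] selects the task regressor):

In [ ]:
# compare the contrast-based and trace-based efficiency measures
print(efficiency_older(X, c=[1, 0]))
print(efficiency(X, c=[1, 0]))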

Now let's write a simulation that creates designs with varying levels of blockiness, computes the efficiency of 100 random designs at each level, and plots the mean efficiency. Note that we don't actually need multiple runs for blockiness=1, since that design is exactly the same each time.


In [26]:
nruns=100
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_blockiness=numpy.zeros(len(blockiness_vals))

for b in range(len(blockiness_vals)):
    eff=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
        regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
        X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
        eff[i]=efficiency(X,c=[1,0])
    meaneff_blockiness[b]=numpy.mean(eff)

In [8]:
plt.plot(blockiness_vals,meaneff_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')


Out[8]:
<matplotlib.text.Text at 0x7f29dde205d0>

In [9]:
X.shape


Out[9]:
(400, 2)

Now let's do a similar simulation looking at the effects of varying block length from 10 to 119 seconds, in steps of 1 second. Since blockiness is 1.0 here, we only need one run per block length.


In [27]:
blocklenvals=numpy.arange(10,120,1)
meaneff_blocklen=numpy.zeros(len(blocklenvals))
sims=[]
for b in range(len(blocklenvals)):
    d_sim,design_sim=create_design_singlecondition(blocklength=blocklenvals[b],blockiness=1.)
    regressor_sim,_=compute_regressor(design_sim,'spm',numpy.arange(0,len(d_sim)))
    X=numpy.vstack((regressor_sim.T,numpy.ones(y.shape))).T
    sims.append(numpy.mean(regressor_sim))
    meaneff_blocklen[b]=efficiency(X,c=[1,0])

In [28]:
plt.plot(blocklenvals,meaneff_blocklen)
plt.xlabel('block length')
plt.ylabel('efficiency')


Out[28]:
<matplotlib.text.Text at 0x7f29dda58f90>

Now let's look at the effects of correlation between regressors. We will use a function that generates a design with two conditions, where we can control the correlation between them.
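For intuition: with two standardized regressors correlated at r, the variance of each estimate is inflated by a factor of 1/(1 - r^2), so efficiency for a single-regressor contrast should fall off as the correlation grows in magnitude.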


In [29]:
from mkdesign import create_design_twocondition

d,des1,des2=create_design_twocondition(correlation=1.0)
regressor1,_=compute_regressor(des1,'spm',numpy.arange(0,d.shape[0]))
regressor2,_=compute_regressor(des2,'spm',numpy.arange(0,d.shape[0]))

X=numpy.vstack((regressor1.T,regressor2.T,numpy.ones(y.shape))).T

In [32]:
nruns=100
corr_vals_intended=numpy.arange(-1,1.1,0.1)

corr_vals=numpy.zeros(len(corr_vals_intended))

meaneff_corr=numpy.zeros(len(corr_vals))
sumx=numpy.zeros(len(corr_vals))

for b in range(len(corr_vals_intended)):
    eff=numpy.zeros(nruns)
    corrs=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,des1_sim,des2_sim=create_design_twocondition(correlation=corr_vals_intended[b])
        regressor1_sim,_=compute_regressor(des1_sim,'spm',numpy.arange(0,d_sim.shape[0]))
        regressor2_sim,_=compute_regressor(des2_sim,'spm',numpy.arange(0,d_sim.shape[0]))
        X=numpy.vstack((regressor1_sim.T,regressor2_sim.T,numpy.ones(y.shape))).T
        # use contrast of first regressor
        eff[i]=efficiency(X,c=[1,0,0])
        corrs[i]=numpy.corrcoef(X.T)[0,1]
    corr_vals[b]=numpy.mean(corrs)
    sumx[b]=numpy.sum(X[:,0])
    meaneff_corr[b]=numpy.mean(eff)

In [33]:
plt.plot(corr_vals,meaneff_corr)
plt.xlabel('mean correlation between regressors')
plt.ylabel('efficiency')


Out[33]:
<matplotlib.text.Text at 0x7f29dd92a490>

Now let's look at efficiency of estimation of the shape of the HRF, rather than detection of the activation effect. This requires that we use a finite impulse response (FIR) model.
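In an FIR model, each post-stimulus delay (here 0 through 15 TRs) gets its own regressor, a copy of the stimulus onset vector shifted by that delay; the estimated coefficients then trace out the HRF shape without assuming a canonical form.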


In [34]:
d,design=create_design_singlecondition(blockiness=0.0)
regressor,_=compute_regressor(design,'fir',numpy.arange(0,len(d)),fir_delays=numpy.arange(0,16))
plt.imshow(regressor[:50,:],interpolation='nearest',cmap='gray')


Out[34]:
<matplotlib.image.AxesImage at 0x7f29dd5b7f50>

Now let's simulate designs under the FIR model and compute the efficiency of the HRF shape estimates across blockiness levels.


In [40]:
nruns=100
blockiness_vals=numpy.arange(0,1.1,0.1)
meaneff_fit_blockiness=numpy.zeros(len(blockiness_vals))
meancorr=[]
for b in range(len(blockiness_vals)):
    eff=numpy.zeros(nruns)
    cc=numpy.zeros(nruns)
    for i in range(nruns):
        d_sim,design_sim=create_design_singlecondition(blockiness=blockiness_vals[b])
        regressor_sim,_=compute_regressor(design_sim,'fir',
                            numpy.arange(0,len(d_sim)),fir_delays=numpy.arange(0,16))
        X=numpy.vstack((regressor_sim.T,numpy.ones(regressor_sim.shape[0]))).T
        eff[i]=efficiency(X)
        cc[i]=numpy.corrcoef(X.T)[0,1]
        
    meaneff_fit_blockiness[b]=numpy.mean(eff)
    meancorr.append(numpy.mean(cc))

In [37]:
plt.plot(blockiness_vals,meaneff_fit_blockiness)
plt.xlabel('blockiness')
plt.ylabel('efficiency')


Out[37]:
<matplotlib.text.Text at 0x7f29ddee3fd0>

In [41]:
plt.plot(blockiness_vals,meancorr)


Out[41]:
[<matplotlib.lines.Line2D at 0x7f29dd87a910>]

__Exercise:__ Write a function to generate random designs, run it a large number of times, estimating the efficiency of each design, and then plot the histogram of efficiencies.
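One possible approach, as a minimal sketch reusing the helpers above (nreps, effs, and the _r names are illustrative; this assumes blockiness=0.0 yields a new random design on each call):

In [ ]:
# estimate the distribution of efficiencies across random designs
nreps = 1000
effs = numpy.zeros(nreps)
for i in range(nreps):
    d_r, design_r = create_design_singlecondition(blockiness=0.0)
    reg_r, _ = compute_regressor(design_r, 'spm', numpy.arange(0, len(d_r)))
    X_r = numpy.vstack((reg_r.T, numpy.ones(reg_r.shape[0]))).T
    effs[i] = efficiency(X_r, c=[1, 0])
plt.hist(effs, bins=50)
plt.xlabel('efficiency')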
