Neurodesign comparison of design generators

In this notebook, we will compare 3 methods to generate an experimental design:

  • a design optimised using the genetic algorithm
  • a design optimised using simulations
  • a randomly drawn design

We will do so using simulations: what is the resulting observed power when we simulate experiments according to the three designs.


In [22]:
from neurodesign import optimisation,experiment
import matplotlib.pyplot as plt
from scipy.stats import t
import seaborn as sns
import pandas as pd
import numpy as np

%matplotlib inline
%load_ext rpy2.ipython

cycles = 1000
sims = 5000


The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

Optimise designs

First we define the experiment. We will optimise an experiment with a TR of 2 seconds and 250 trials of 0.5 seconds each. There are 4 stimulus types, and we are interested in the shared effect of the first and second stimulus versus baseline, as well as the difference between the first and the fourth stimulus. We assume an autoregressive temporal autocorrelation of 0.3.

We sample ITI's from a truncated exponential distribution with minimum 0.3 seconds and maximum 4 seconds, and the mean is 1 second.


In [2]:
# define the experiment
EXP = experiment(
    TR=2,
    n_trials=450,
    P = [0.25,0.25,0.25],
    C = [[1,0,0],[0,1,0],[0,0,1],[1,0,-1]],
    n_stimuli = 3,
    rho = 0.3,
    resolution=0.1,
    stim_duration=1,
    ITImodel = "exponential",
    ITImin = 0.3,
    ITImean = 1,
    ITImax=4
    )


/Users/Joke/anaconda/lib/python2.7/site-packages/neurodesign/neurodesign.py:410: UserWarning: Warning: the resolution is adjusted to be a multiple of the TR.  New resolution: 0.100000
  warnings.warn("Warning: the resolution is adjusted to be a multiple of the TR.  New resolution: %f"%self.resolution)
/Users/Joke/anaconda/lib/python2.7/site-packages/neurodesign/neurodesign.py:560: RuntimeWarning: divide by zero encountered in log
  res = (h - 1) * np.log(s) + h * np.log(l) - l * s - np.log(gamma(h))

In [3]:
POP_Max = optimisation(
    experiment=EXP,
    weights=[0,0.5,0.25,0.25],
    preruncycles = cycles,
    cycles = 2,
    optimisation='GA'
    )

POP_Max.optimise()


100% |########################################################################|
100% |########################################################################|
Out[3]:
<neurodesign.neurodesign.optimisation at 0x110fd2510>

In [4]:
EXP.FeMax = POP_Max.exp.FeMax
EXP.FdMax = POP_Max.exp.FdMax

Below we define two populations of designs. We will optimise one using the genetic algorithm, and the other using randomly drawn designs.

We optimise for statistical power (weights = [0,1,0,0]). We run 100 cycles.


In [5]:
POP_GA = optimisation(
    experiment=EXP,
    weights=[0,0.5,0.25,0.25],
    preruncycles = 2,
    cycles = cycles,
    seed=1,
    outdes=5,
    I=10,
    folder='/tmp/',
    optimisation='GA'
    )

POP_RN = optimisation(
    experiment=EXP,
    weights=[0,0.5,0.25,0.25],
    preruncycles = 2,
    cycles = cycles,
    seed=100,
    outdes=5,
    I=50,
    G=10,
    folder='/tmp/',
    optimisation='simulation'
    )

In [6]:
POP_GA.optimise()


100% |########################################################################|
Out[6]:
<neurodesign.neurodesign.optimisation at 0x11c2638d0>

In [7]:
POP_RN.optimise()


100% |########################################################################|
Out[7]:
<neurodesign.neurodesign.optimisation at 0x11c263990>

Below, we show how the efficiency scores improve over cycles for both algorithms, although the Genetic Algorithm clearly improves faster and reaches a higher plateau.


In [8]:
plt.plot(POP_GA.optima,label='Genetic Algorithm')
plt.plot(POP_RN.optima,label='Simulation')
plt.legend()
plt.savefig("output/test_scores.pdf")


Below, we repeat the random design generator, but we search only 100 designs and one generation. As such, this is a random design.


In [9]:
# 1 gen
POP_JO = optimisation(
    experiment=EXP,
    weights=[0,0.5,0.25,0.25],
    preruncycles = 1,
    cycles = 1,
    seed=1,
    outdes=5,
    G=100,
    folder='/tmp/',
    optimisation='simulation'
    )
POP_JO.optimise()


100% |########################################################################|
Out[9]:
<neurodesign.neurodesign.optimisation at 0x11cb4fa90>

In [10]:
#collect scores and take average
scores = [x.F for x in POP_JO.designs]

median_idx = np.where(scores == np.median(scores))[0][0]
rnd_median = POP_JO.designs[median_idx]

# get PI
BTI_l = np.percentile(scores,5)
BTI_u = np.percentile(scores,95)

In [11]:
print("Optimisation score - random: %s \n\
Optimisation score - genetic algorithm: %s \n\
Optimisation score - simulation (90 percent PI): %s-%s"%(POP_RN.optima[::-1][0],
    POP_GA.optima[::-1][0],BTI_l,BTI_u))


Optimisation score - random: 0.7933121936748979 
Optimisation score - genetic algorithm: 0.867326561918992 
Optimisation score - simulation (90 percent PI): 0.607035712415793-0.6984508300540047

Let's look at the resulting experimental designs.


In [12]:
des = np.array([POP_GA.bestdesign.Xconv,POP_RN.bestdesign.Xconv,rnd_median.Xconv])
labels = ['Genetic Algorithm','Simulation','Median random design']
plt.figure(figsize=(10,7))
for ind,label in enumerate(labels):
    plt.subplot(3,1,ind+1)
    plt.plot(des[ind,:,:])
    plt.title(label)
    plt.tick_params(axis = 'x',which = 'both', bottom = 'off', labelbottom='off')

plt.savefig("output/designs.pdf")



In [13]:
des = np.array([POP_GA.bestdesign.Xconv,POP_RN.bestdesign.Xconv]+[x.Xconv for x in POP_JO.designs])

Simulate data

We continue with the best designs from the two algorithms and the random design. Below, we simulate data in one voxel that is significantly related to the task. We assume beta values of (0.5, 0, -0.5).


In [23]:
# create datatables 
tp = des.shape[1]
Y = np.zeros([tp,sims,des.shape[0]])

for i in range(sims):
    rnd = np.random.normal(0,1,tp)
    for lb in range(Y.shape[2]):
        Y[:,i,lb] = np.dot(des[lb,:,:],np.array([0.5,0,-0.5]))+rnd

ids = [0,1,median_idx]

In [24]:
plt.plot(Y[:,1:3,1])


Out[24]:
[<matplotlib.lines.Line2D at 0x11d287e50>,
 <matplotlib.lines.Line2D at 0x11d287950>]

We analyse the data using R below.


In [25]:
%%R -i des,Y,sims,ids -o tvals_main,tvals_diff,pows
tvals_main <- array(NA,dim=c(sims,3))
tvals_diff <- array(NA,dim=c(sims,3))
pows <- array(NA,dim=c(dim(Y)[3],2))

threshold <- qt(0.95,df=(dim(des)[2]-2))

i = 1
for (method in 1:dim(Y)[3]){
    ts_main <- c()
    ts_diff <- c()
    for (sim in 1:sims){
        dif <- des[method,,1]-des[method,,2]
        fit_main <- lm(Y[,sim,method]~des[method,,])
        fit_diff <- lm(Y[,sim,method]~dif)
        ts_main[sim] <- summary(fit_main)$coef[2,3]
        ts_diff[sim] <- summary(fit_diff)$coef[2,3]
        }
    if ((method-1) %in% ids){
        tvals_main[,i] <- ts_main
        tvals_diff[,i] <- ts_diff
        i <- i+1
    }
    pows[method,1] <- mean(ts_main>threshold)
    pows[method,2] <- mean(ts_diff>threshold)
}

This is what the distributions for the two contrasts look like.


In [26]:
nms = ['Main effect','Contrast effect']
plt.figure(figsize=(18,4))
for idx,tv in enumerate([tvals_main,tvals_diff]):
    plt.subplot(1,2,idx+1)
    for idy,method in enumerate(labels):
        sns.distplot(tv[:,idy],label=method)
    plt.title(nms[idx])
plt.legend()
plt.savefig("output/distributions.pdf")



In [27]:
pows.shape


Out[27]:
(103, 2)

Observed power


In [31]:
# We're assuming a single threshold on a single test, a representative simplification.
threshold = t.ppf(0.95,des.shape[1]-2)
nms = ['main effect','contrast effect']
out = {label:[] for label in labels}
for idx in range(2):
    for idy,method in enumerate(labels):
        if idy < 2:
            print("The power for the %s with %s: %f"%(nms[idx],method,pows[idy,idx]))
    med = np.percentile(pows[2:,idx],50)
    ll = np.percentile(pows[2:,idx],5)
    ul = np.percentile(pows[2:,idx],95)                
    print("The median for the %s with a randomly drawn design: %f"%(nms[idx],med))
    print("The 90 percent PI for the %s with a randomly drawn design: %f-%f"%(nms[idx],
                  ll,ul))


The power for the main effect with Genetic Algorithm: 0.449800
The power for the main effect with Simulation: 0.395600
The median for the main effect with a randomly drawn design: 0.261400
The 90 percent PI for the main effect with a randomly drawn design: 0.221400-0.312000
The power for the contrast effect with Genetic Algorithm: 0.733400
The power for the contrast effect with Simulation: 0.748800
The median for the contrast effect with a randomly drawn design: 0.358400
The 90 percent PI for the contrast effect with a randomly drawn design: 0.211400-0.758400