Figure: How large should effect sizes be in neuroimaging to have sufficient power?

Specification of alternative

In a brain map in an MNI template with a smoothness (FWHM) of 3 times the voxel size, there is one active region with voxelwise effect size D. The (spatial) extent of the region is relatively small (<200 voxels). We want to know how large D should be in order to have 80% power to detect the region using voxelwise FWE thresholding based on Random Field Theory. "Detecting the region" means that the maximum statistic in the activated area exceeds the significance threshold.

Strategy

  1. Compute the voxelwise threshold for the specified smoothness and volume
    • FweThres = 5.12
  2. Define the alternative hypothesis, so that the omnibus power is 80%
  3. How large should the maximum statistic in a (small) region be to exceed the voxelwise threshold with 0.8 power?
    • muMax = 4.00
  4. How does this voxel statistic translate to Cohen's D for a given sample size?
    • See Figure

In [1]:
%matplotlib inline
from __future__ import division
import os
import nibabel as nib
import numpy as np
from neuropower import peakdistribution
import scipy.integrate as integrate
import pandas as pd
import matplotlib.pyplot as plt
import palettable.colorbrewer as cb

if 'FSLDIR' not in os.environ:
    raise Exception('This notebook requires that FSL is installed and the FSLDIR environment variable is set')

1. What is the voxelwise threshold?


In [2]:
# From smoothness + mask to ReselCount

FWHM = 3
ReselSize = FWHM**3
MNI_mask = nib.load(os.path.join(os.getenv('FSLDIR'),'data/standard/MNI152_T1_2mm_brain_mask.nii.gz')).get_data()
Volume = np.sum(MNI_mask)
ReselCount = Volume/ReselSize

print("ReselSize: "+str(ReselSize))
print("Volume: "+str(Volume))
print("ReselCount: "+str(ReselCount))
print("------------")

# From ReselCount to FWE threshold

FweThres_cmd = 'ptoz 0.05 -g %s' %ReselCount
FweThres = os.popen(FweThres_cmd).read()

print("FWE voxelwise GRF threshold: "+str(FweThres))


ReselSize: 27
Volume: 228483
ReselCount: 8462.33333333
------------
FWE voxelwise GRF threshold: 5.123062
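As an optional cross-check (not part of the original pipeline), the same voxelwise threshold can be approximated directly in Python from the expected Euler characteristic of a 3D Gaussian random field, using the standard 3D EC density and the resel count printed above:

import numpy as np
from scipy import optimize

resel_count = 8462.33  # from the cell above

def expected_ec(z, resels=resel_count):
    # 3D EC density of a unit-variance Gaussian field:
    # rho_3(z) = (4 ln 2)^(3/2) / (2 pi)^2 * (z^2 - 1) * exp(-z^2 / 2)
    rho3 = (4*np.log(2))**1.5 / (2*np.pi)**2 * (z**2 - 1) * np.exp(-z**2/2)
    return resels * rho3

# Solve E[EC](z) = 0.05 for z; this should land close to the ptoz value (~5.12).
z_fwe = optimize.brentq(lambda z: expected_ec(z) - 0.05, 2, 10)
print("Approximate GRF threshold: %.4f" % z_fwe)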

2. Definition of alternative

Detect 1 region

We define a 'success' as a situation in which the maximum in the active field exceeds the threshold.


In [3]:
Power = 0.8

3. How large should the statistic in a field be to exceed the threshold with power 0.80?

We quantify this by computing the expected local maximum in the field (which is a null field elevated by the value D). We use the distribution of local maxima of Cheng & Schwartzman to relate the effect size to the power.
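In symbols (a restatement of what the next cell computes numerically): with FWE threshold $u$ and the Cheng & Schwartzman peak density $f_{\mathrm{peak}}$ for a unit-variance smooth Gaussian field, the power to detect a peak whose mean is elevated by $\mu$ is

$$\mathrm{Power}(\mu) \;=\; P(\text{peak} > u \mid \mu) \;=\; 1 - \int_{-\infty}^{\,u-\mu} f_{\mathrm{peak}}(x)\,dx,$$

and the cell below searches for the smallest $\mu$ on a grid for which this exceeds 0.80.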


In [4]:
muRange = np.arange(1.8,5,0.01)
muSingle = []
for muMax in muRange:
    # what is the power to detect a maximum
    power = 1-integrate.quad(lambda x:peakdistribution.peakdens3D(x,1),-20,float(FweThres)-muMax)[0]
    if power>Power:
        muSingle.append(muMax)
        break
print("The power is sufficient for one region if mu equals: "+str(muSingle[0]))


The power is sufficient for one region if mu equals: 4.0

Extra analysis: power 0.90


In [5]:
muRange = np.arange(1.8,5,0.01)
muNinety = []
for muMax in muRange:
    # what is the power to detect a maximum
    power = 1-integrate.quad(lambda x:peakdistribution.peakdens3D(x,1),-20,float(FweThres)-muMax)[0]
    if power>0.90:
        muNinety.append(muMax)
        break
print("The power is sufficient for one region if mu equals: "+str(muNinety[0]))


The power is sufficient for one region if mu equals: 4.33
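The two cells above differ only in the target power; a small helper (a refactoring sketch, not part of the original notebook) could factor out the grid search:

def required_mu(target_power, thresh, mu_grid=np.arange(1.8, 5, 0.01)):
    # return the smallest mu on the grid whose peak-level power exceeds target_power
    for mu in mu_grid:
        power = 1 - integrate.quad(lambda x: peakdistribution.peakdens3D(x, 1),
                                   -20, float(thresh) - mu)[0]
        if power > target_power:
            return mu
    return None

# required_mu(0.80, FweThres)  -> 4.00
# required_mu(0.90, FweThres)  -> 4.33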

4. From the required voxel statistic to Cohen's D for a given sample size
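For a one-sample group analysis, the expected voxelwise statistic is approximately $D\sqrt{n}$ (treating the t statistic as Gaussian, a reasonable approximation for the sample sizes considered here), so the smallest effect size detectable with the target power at sample size $n$ is

$$D \;=\; \frac{\mu_{\max}}{\sqrt{n}},$$

which is how deltaSingle and deltaNinety are computed in the cells below.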


In [6]:
# Read in data
Data = pd.read_csv("../SampleSize/neurosynth_study_data.txt",sep=" ",header=None,names=['year','n'])
Data['source']='Tal'
Data=Data[Data.year!=1997] #remove year with 1 entry
David = pd.read_csv("../SampleSize/david_sampsizedata.txt",sep=" ",header=None,names=['year','n'])
David['source']='David'
Data=Data.append(David)

In [7]:
# add detectable effect
Data['deltaSingle']=muSingle[0]/np.sqrt(Data['n'])
Data['deltaNinety']=muNinety[0]/np.sqrt(Data['n'])

In [8]:
# add jitter for figure
stdev = 0.01*(max(Data.year)-min(Data.year))
Data['year_jitter'] = Data.year+np.random.randn(len(Data))*stdev

In [9]:
# Compute medians per year (for smoother)
Medians = pd.DataFrame({'year':
                     np.arange(start=np.min(Data.year),stop=np.max(Data.year)+1),
                      'TalMdSS':'nan',
                     'DavidMdSS':'nan',
                      'TalMdDSingle':'nan',
                     'DavidMdDSingle':'nan',
                      'TalMdDNinety':'nan',
                     'DavidMdDNinety':'nan',
                        'MdSS':'nan',
                        'DSingle':'nan',
                        'DNinety':'nan'
                       })

In [10]:
for yearInd in (range(len(Medians))):
    # Compute medians for Tal's data
    yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year==Medians.year[yearInd])])
    Medians.TalMdSS[yearInd] = np.median(Data.n[yearBoolTal])
    Medians.TalMdDSingle[yearInd] = np.median(Data.deltaSingle[yearBoolTal])
    Medians.TalMdDNinety[yearInd] = np.median(Data.deltaNinety[yearBoolTal])
    # Compute medians for David's data
    yearBoolDavid = np.array([a and b for a,b in zip(Data.source=="David",Data.year==Medians.year[yearInd])])
    Medians.DavidMdSS[yearInd] = np.median(Data.n[yearBoolDavid])
    Medians.DavidMdDSingle[yearInd] = np.median(Data.deltaSingle[yearBoolDavid])
    Medians.DavidMdDNinety[yearInd] = np.median(Data.deltaNinety[yearBoolDavid])
    # Compute medians for all data
    yearBool = np.array(Data.year==Medians.year[yearInd])
    Medians.MdSS[yearInd] = np.median(Data.n[yearBool])
    Medians.DSingle[yearInd] = np.median(Data.deltaSingle[yearBool])   
    Medians.DNinety[yearInd] = np.median(Data.deltaNinety[yearBool])


/Users/Joke/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/Users/Joke/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
(The SettingWithCopyWarning is repeated once for each of the nine assignment lines in the cell above.)
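The SettingWithCopyWarnings come from the chained-indexing pattern Medians.X[yearInd] = ...; the medians do get filled in here (see the next cell's output), but an equivalent computation (a sketch of an alternative, with a slightly different column layout) can be written with groupby, avoiding both the warnings and the explicit loop:

# medians per year within each source, and over all data
per_source = (Data.groupby(['source', 'year'])[['n', 'deltaSingle', 'deltaNinety']]
                  .median()
                  .unstack('source'))
overall = Data.groupby('year')[['n', 'deltaSingle', 'deltaNinety']].median()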

In [11]:
Medians[0:5]


Out[11]:
   DNinety  DSingle  DavidMdDNinety  DavidMdDSingle  DavidMdSS  MdSS  TalMdDNinety  TalMdDSingle  TalMdSS    year
0  1.36927  1.26491         1.36927         1.26491         10    10           NaN           NaN      NaN  1996.0
1  1.25324  1.15772         1.25324         1.15772         12    12           NaN           NaN      NaN  1997.0
2  1.22544  1.13205         1.22544         1.13205       12.5  12.5           NaN           NaN      NaN  1998.0
3  1.48711  1.37377         1.48711         1.37377        8.5   8.5           NaN           NaN      NaN  1999.0
4  1.33741  1.23548         1.33741         1.23548       10.5  10.5           NaN           NaN      NaN  2000.0

In [12]:
# add logscale

Medians['MdSSLog'] = [np.log(x) for x in Medians.MdSS]
Medians['TalMdSSLog'] = [np.log(x) for x in Medians.TalMdSS]
Medians['DavidMdSSLog'] = [np.log(x) for x in Medians.DavidMdSS]

Data['nLog']= [np.log(x) for x in Data.n]

The figure per list (Neurosynth/Tal vs. David et al.)


In [13]:
twocol = cb.qualitative.Paired_12.mpl_colors

fig,axs = plt.subplots(1,2,figsize=(12,5))
fig.subplots_adjust(hspace=.5,wspace=.3)
axs=axs.ravel()
axs[0].plot(Data.year_jitter[Data.source=="Tal"],Data['nLog'][Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[0].plot(Data.year_jitter[Data.source=="David"],Data['nLog'][Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[0].plot(Medians.year,Medians.TalMdSSLog,color=twocol[1],lw=3,label="Neurosynth")
axs[0].plot(Medians.year,Medians.DavidMdSSLog,color=twocol[3],lw=3,label="David et al.")
axs[0].set_xlim([1993,2016])
axs[0].set_ylim([0,8])
axs[0].set_xlabel("Year")
axs[0].set_ylabel("Median Sample Size")
axs[0].legend(loc="upper left",frameon=False)
#labels=[1,5,10,20,50,150,500,1000,3000]
labels=[1,4,16,64,256,1024,3000]
axs[0].set_yticks(np.log(labels))
axs[0].set_yticklabels(labels)


axs[1].plot(Data.year_jitter[Data.source=="Tal"],Data.deltaSingle[Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[1].plot(Data.year_jitter[Data.source=="David"],Data.deltaSingle[Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[1].plot(Medians.year,Medians.TalMdDSingle,color=twocol[1],lw=3,label="Neurosynth")
axs[1].plot(Medians.year,Medians.DavidMdDSingle,color=twocol[3],lw=3,label="David et al.")
axs[1].set_xlim([1993,2016])
axs[1].set_ylim([0,3])
axs[1].set_xlabel("Year")
axs[1].set_ylabel("Effect Size with 80% power")
axs[1].legend(loc="upper right",frameon=False)
plt.savefig('Figure1.svg',dpi=600)
plt.show()


Print median sample size and minimally detectable effect size (80% power) for the Neurosynth data


In [14]:
Medians.loc[Medians.year>2010, lambda df: ['year', 'TalMdSS', 'TalMdDSingle']]


Out[14]:
      year  TalMdSS  TalMdDSingle
15  2011.0       20      0.894427
16  2012.0       20      0.894427
17  2013.0       21      0.872872
18  2014.0       22      0.852803
19  2015.0     28.5      0.749355

Compute median of sample sizes over last 5 years, for use in correlation simulation notebook.


In [15]:
yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year>2010)])
print('Median sample size (2011-2015):',np.median(Data.n[yearBoolTal]))


('Median sample size (2011-2015):', 22.0)

Compute number of single-group studies with sample sizes over 100


In [16]:
for year in range(2011,2016):
    bigstudyBoolTal = np.array([a and b and c for a,b,c in zip(Data.source=="Tal",Data.n>100,Data.year==year)])
    yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year==year)])


    print(year,np.sum(yearBoolTal),np.sum(bigstudyBoolTal))


(2011, 85, 0)
(2012, 110, 8)
(2013, 131, 8)
(2014, 126, 8)
(2015, 96, 17)
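The same counts can also be obtained with plain DataFrame filtering (an equivalent sketch, not the original code):

tal = Data[Data.source == "Tal"]
for yr in range(2011, 2016):
    this_year = tal[tal.year == yr]
    print(yr, len(this_year), int((this_year.n > 100).sum()))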

Load neurosynth group data and compute median group size for 2015


In [17]:
groupData = pd.read_csv("../SampleSize/neurosynth_group_data.txt",sep=" ",header=None,names=['year','n'])
yearBoolGroup=np.array(groupData.year==2015) # use 2015 explicitly rather than the loop variable left over from the previous cell
print('%d groups found in 2015 (over %d studies)'%(np.sum(yearBoolGroup),np.unique(groupData[yearBoolGroup].index).shape[0]))
print('median group size in 2015: %f'%np.median(groupData[yearBoolGroup].n))


199 groups found in 2015 (over 93 studies)
median group size in 2015: 19.000000

Figure for 90% power


In [20]:
twocol = cb.qualitative.Paired_12.mpl_colors

fig,axs = plt.subplots(1,2,figsize=(12,5))
fig.subplots_adjust(hspace=.5,wspace=.3)
axs=axs.ravel()
axs[0].plot(Data.year_jitter[Data.source=="Tal"],Data['nLog'][Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[0].plot(Data.year_jitter[Data.source=="David"],Data['nLog'][Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[0].plot(Medians.year,Medians.TalMdSSLog,color=twocol[1],lw=3,label="Neurosynth")
axs[0].plot(Medians.year,Medians.DavidMdSSLog,color=twocol[3],lw=3,label="David et al.")
axs[0].set_xlim([1993,2016])
axs[0].set_ylim([0,8])
axs[0].set_xlabel("Year")
axs[0].set_ylabel("Median Sample Size")
axs[0].legend(loc="upper left",frameon=False)
#labels=[1,5,10,20,50,150,500,1000,3000]
labels=[1,4,16,64,256,1024,3000]
axs[0].set_yticks(np.log(labels))
axs[0].set_yticklabels(labels)


axs[1].plot(Data.year_jitter[Data.source=="Tal"],Data.deltaNinety[Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[1].plot(Data.year_jitter[Data.source=="David"],Data.deltaNinety[Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[1].plot(Medians.year,Medians.TalMdDSingle,color="grey",lw=3,label="80% power")
axs[1].plot(Medians.year,Medians.DavidMdDSingle,color="grey",lw=3,label="")
axs[1].plot(Medians.year,Medians.TalMdDNinety,color=twocol[1],lw=3,label="Neurosynth, 90% power")
axs[1].plot(Medians.year,Medians.DavidMdDNinety,color=twocol[3],lw=3,label="David et al., 90% power")
axs[1].set_xlim([1993,2016])
axs[1].set_ylim([0,3])
axs[1].set_xlabel("Year")
axs[1].set_ylabel("Effect Size with 80% or 90% power")
axs[1].legend(loc="upper right",frameon=False)
plt.savefig('Figure_8090.svg',dpi=600)
plt.show()