Figure: How large should effect sizes be in neuroimaging to have sufficient power?

Specification of alternative

In a brain map in an MNI template with a smoothness (FWHM) of 3 times the voxel size, there is one active region with voxelwise effect size D. The (spatial) extent of the region is relatively small (<200 voxels). We want to know how large D should be in order to have 80% power to detect the region using voxelwise FWE thresholding based on Random Field Theory. "Detecting the region" means that the maximum statistic in the activated area exceeds the significance threshold.

Strategy

  1. Compute the voxelwise threshold for the specified smoothness and volume
    • FweThres = 5.12
  2. Define the alternative hypothesis, so that the omnibus power is 80%
  3. How large should the maximum statistic in a (small) region be to exceed the voxelwise threshold with 0.8 power?
    • muMax = 4.00
  4. How does this voxel statistic translate to Cohen's D for a given sample size?
    • See Figure

In [1]:
%matplotlib inline
from __future__ import division
import os
import nibabel as nib
import numpy as np
from neuropower import peakdistribution
import scipy.integrate as integrate
import pandas as pd
import matplotlib.pyplot as plt
import palettable.colorbrewer as cb

if 'FSLDIR' not in os.environ:
    raise Exception('This notebook requires that FSL is installed and the FSLDIR environment variable is set')

1. What is the voxelwise threshold?


In [2]:
# From smoothness + mask to ReselCount

FWHM = 3
ReselSize = FWHM**3
MNI_mask = nib.load(os.path.join(os.getenv('FSLDIR'),'data/standard/MNI152_T1_2mm_brain_mask.nii.gz')).get_data()
Volume = np.sum(MNI_mask)
ReselCount = Volume/ReselSize

print("ReselSize: "+str(ReselSize))
print("Volume: "+str(Volume))
print("ReselCount: "+str(ReselCount))
print("------------")

# From ReselCount to FWE threshold

FweThres_cmd = 'ptoz 0.05 -g %s' %ReselCount
FweThres = os.popen(FweThres_cmd).read()

print("FWE voxelwise GRF threshold: "+str(FweThres))


ReselSize: 27
Volume: 228483
ReselCount: 8462.33333333
------------
FWE voxelwise GRF threshold: 5.123062
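As an optional cross-check (not part of the original pipeline), the same voxelwise threshold can be approximated directly in Python from the expected Euler characteristic of a 3D Gaussian random field, using the standard 3D EC density and the resel count printed above:

import numpy as np
from scipy import optimize

resel_count = 8462.33  # from the cell above

def expected_ec(z, resels=resel_count):
    # 3D EC density of a unit-variance Gaussian field:
    # rho_3(z) = (4 ln 2)^(3/2) / (2 pi)^2 * (z^2 - 1) * exp(-z^2 / 2)
    rho3 = (4*np.log(2))**1.5 / (2*np.pi)**2 * (z**2 - 1) * np.exp(-z**2/2)
    return resels * rho3

# Solve E[EC](z) = 0.05 for z; this should land close to the ptoz value (~5.12).
z_fwe = optimize.brentq(lambda z: expected_ec(z) - 0.05, 2, 10)
print("Approximate GRF threshold: %.4f" % z_fwe)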

2. Definition of alternative

Detect 1 region

We define a 'success' as a situation in which the maximum in the active field exceeds the threshold.


In [3]:
Power = 0.8

3. How large should the statistic in a field be to exceed the threshold with power 0.80?

We quantify this by computing the expected local maximum in the field (which is a null field elevated by the value D). We use the distribution of local maxima of Cheng & Schwartzman to relate the effect size to the power.
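In symbols (a restatement of what the next cell computes numerically): with FWE threshold $u$ and the Cheng & Schwartzman peak density $f_{\mathrm{peak}}$ for a unit-variance smooth Gaussian field, the power to detect a peak whose mean is elevated by $\mu$ is

$$\mathrm{Power}(\mu) \;=\; P(\text{peak} > u \mid \mu) \;=\; 1 - \int_{-\infty}^{\,u-\mu} f_{\mathrm{peak}}(x)\,dx,$$

and the cell below searches for the smallest $\mu$ on a grid for which this exceeds 0.80.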


In [4]:
muRange = np.arange(1.8,5,0.01)
muSingle = []
for muMax in muRange:
    # what is the power to detect a maximum
    power = 1-integrate.quad(lambda x:peakdistribution.peakdens3D(x,1),-20,float(FweThres)-muMax)[0]
    if power>Power:
        muSingle.append(muMax)
        break
print("The power is sufficient for one region if mu equals: "+str(muSingle[0]))


The power is sufficient for one region if mu equals: 4.0

Extra analysis: power 0.90


In [5]:
muRange = np.arange(1.8,5,0.01)
muNinety = []
for muMax in muRange:
    # what is the power to detect a maximum
    power = 1-integrate.quad(lambda x:peakdistribution.peakdens3D(x,1),-20,float(FweThres)-muMax)[0]
    if power>0.90:
        muNinety.append(muMax)
        break
print("The power is sufficient for one region if mu equals: "+str(muNinety[0]))


The power is sufficient for one region if mu equals: 4.33
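The two cells above differ only in the target power; a small helper (a refactoring sketch, not part of the original notebook) could factor out the grid search:

def required_mu(target_power, thresh, mu_grid=np.arange(1.8, 5, 0.01)):
    # return the smallest mu on the grid whose peak-level power exceeds target_power
    for mu in mu_grid:
        power = 1 - integrate.quad(lambda x: peakdistribution.peakdens3D(x, 1),
                                   -20, float(thresh) - mu)[0]
        if power > target_power:
            return mu
    return None

# required_mu(0.80, FweThres)  -> 4.00
# required_mu(0.90, FweThres)  -> 4.33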

4. From the required voxel statistic to Cohen's D for a given sample size
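For a one-sample group analysis, the expected voxelwise statistic is approximately $D\sqrt{n}$ (treating the t statistic as Gaussian, a reasonable approximation for the sample sizes considered here), so the smallest effect size detectable with the target power at sample size $n$ is

$$D \;=\; \frac{\mu_{\max}}{\sqrt{n}},$$

which is how deltaSingle and deltaNinety are computed in the cells below.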


In [6]:
# Read in data
Data = pd.read_csv("../SampleSize/neurosynth_study_data.txt",sep=" ",header=None,names=['year','n'])
Data['source']='Tal'
Data=Data[Data.year!=1997] #remove year with 1 entry
David = pd.read_csv("../SampleSize/david_sampsizedata.txt",sep=" ",header=None,names=['year','n'])
David['source']='David'
Data=Data.append(David)

In [7]:
# add detectable effect
Data['deltaSingle']=muSingle[0]/np.sqrt(Data['n'])
Data['deltaNinety']=muNinety[0]/np.sqrt(Data['n'])

In [8]:
# add jitter for figure
stdev = 0.01*(max(Data.year)-min(Data.year))
Data['year_jitter'] = Data.year+np.random.randn(len(Data))*stdev

In [9]:
# Compute medians per year (for smoother)
Medians = pd.DataFrame({'year':
                     np.arange(start=np.min(Data.year),stop=np.max(Data.year)+1),
                      'TalMdSS':'nan',
                     'DavidMdSS':'nan',
                      'TalMdDSingle':'nan',
                     'DavidMdDSingle':'nan',
                      'TalMdDNinety':'nan',
                     'DavidMdDNinety':'nan',
                        'MdSS':'nan',
                        'DSingle':'nan',
                        'DNinety':'nan'
                       })

In [10]:
for yearInd in (range(len(Medians))):
    # Compute medians for Tal's data
    yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year==Medians.year[yearInd])])
    Medians.TalMdSS[yearInd] = np.median(Data.n[yearBoolTal])
    Medians.TalMdDSingle[yearInd] = np.median(Data.deltaSingle[yearBoolTal])
    Medians.TalMdDNinety[yearInd] = np.median(Data.deltaNinety[yearBoolTal])
    # Compute medians for David's data
    yearBoolDavid = np.array([a and b for a,b in zip(Data.source=="David",Data.year==Medians.year[yearInd])])
    Medians.DavidMdSS[yearInd] = np.median(Data.n[yearBoolDavid])
    Medians.DavidMdDSingle[yearInd] = np.median(Data.deltaSingle[yearBoolDavid])
    Medians.DavidMdDNinety[yearInd] = np.median(Data.deltaNinety[yearBoolDavid])
    # Compute medians for all data
    yearBool = np.array(Data.year==Medians.year[yearInd])
    Medians.MdSS[yearInd] = np.median(Data.n[yearBool])
    Medians.DSingle[yearInd] = np.median(Data.deltaSingle[yearBool])   
    Medians.DNinety[yearInd] = np.median(Data.deltaNinety[yearBool])


/Users/Joke/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/Users/Joke/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
(The SettingWithCopyWarning is repeated once for each of the nine assignment lines in the cell above.)
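The SettingWithCopyWarnings come from the chained-indexing pattern Medians.X[yearInd] = ...; the medians do get filled in here (see the next cell's output), but an equivalent computation (a sketch of an alternative, with a slightly different column layout) can be written with groupby, avoiding both the warnings and the explicit loop:

# medians per year within each source, and over all data
per_source = (Data.groupby(['source', 'year'])[['n', 'deltaSingle', 'deltaNinety']]
                  .median()
                  .unstack('source'))
overall = Data.groupby('year')[['n', 'deltaSingle', 'deltaNinety']].median()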

In [11]:
Medians[0:5]


Out[11]:
   DNinety  DSingle  DavidMdDNinety  DavidMdDSingle  DavidMdSS  MdSS  TalMdDNinety  TalMdDSingle  TalMdSS    year
0  1.36927  1.26491         1.36927         1.26491         10    10           NaN           NaN      NaN  1996.0
1  1.25324  1.15772         1.25324         1.15772         12    12           NaN           NaN      NaN  1997.0
2  1.22544  1.13205         1.22544         1.13205       12.5  12.5           NaN           NaN      NaN  1998.0
3  1.48711  1.37377         1.48711         1.37377        8.5   8.5           NaN           NaN      NaN  1999.0
4  1.33741  1.23548         1.33741         1.23548       10.5  10.5           NaN           NaN      NaN  2000.0

In [12]:
# add logscale

Medians['MdSSLog'] = [np.log(x) for x in Medians.MdSS]
Medians['TalMdSSLog'] = [np.log(x) for x in Medians.TalMdSS]
Medians['DavidMdSSLog'] = [np.log(x) for x in Medians.DavidMdSS]

Data['nLog']= [np.log(x) for x in Data.n]

The figure per list (Neurosynth/Tal vs. David et al.)


In [13]:
twocol = cb.qualitative.Paired_12.mpl_colors

fig,axs = plt.subplots(1,2,figsize=(12,5))
fig.subplots_adjust(hspace=.5,wspace=.3)
axs=axs.ravel()
axs[0].plot(Data.year_jitter[Data.source=="Tal"],Data['nLog'][Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[0].plot(Data.year_jitter[Data.source=="David"],Data['nLog'][Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[0].plot(Medians.year,Medians.TalMdSSLog,color=twocol[1],lw=3,label="Neurosynth")
axs[0].plot(Medians.year,Medians.DavidMdSSLog,color=twocol[3],lw=3,label="David et al.")
axs[0].set_xlim([1993,2016])
axs[0].set_ylim([0,8])
axs[0].set_xlabel("Year")
axs[0].set_ylabel("Median Sample Size")
axs[0].legend(loc="upper left",frameon=False)
#labels=[1,5,10,20,50,150,500,1000,3000]
labels=[1,4,16,64,256,1024,3000]
axs[0].set_yticks(np.log(labels))
axs[0].set_yticklabels(labels)


axs[1].plot(Data.year_jitter[Data.source=="Tal"],Data.deltaSingle[Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[1].plot(Data.year_jitter[Data.source=="David"],Data.deltaSingle[Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[1].plot(Medians.year,Medians.TalMdDSingle,color=twocol[1],lw=3,label="Neurosynth")
axs[1].plot(Medians.year,Medians.DavidMdDSingle,color=twocol[3],lw=3,label="David et al.")
axs[1].set_xlim([1993,2016])
axs[1].set_ylim([0,3])
axs[1].set_xlabel("Year")
axs[1].set_ylabel("Effect Size with 80% power")
axs[1].legend(loc="upper right",frameon=False)
plt.savefig('Figure1.svg',dpi=600)
plt.show()


Print median sample size and minimally detectable effect size (80% power) for the Neurosynth data


In [14]:
Medians.loc[Medians.year>2010, lambda df: ['year', 'TalMdSS', 'TalMdDSingle']]


Out[14]:
      year  TalMdSS  TalMdDSingle
15  2011.0       20      0.894427
16  2012.0       20      0.894427
17  2013.0       21      0.872872
18  2014.0       22      0.852803
19  2015.0     28.5      0.749355

Compute median of sample sizes over last 5 years, for use in correlation simulation notebook.


In [15]:
yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year>2010)])
print('Median sample size (2011-2015):',np.median(Data.n[yearBoolTal]))


('Median sample size (2011-2015):', 22.0)

Compute number of single-group studies with sample sizes over 100


In [16]:
for year in range(2011,2016):
    bigstudyBoolTal = np.array([a and b and c for a,b,c in zip(Data.source=="Tal",Data.n>100,Data.year==year)])
    yearBoolTal = np.array([a and b for a,b in zip(Data.source=="Tal",Data.year==year)])


    print(year,np.sum(yearBoolTal),np.sum(bigstudyBoolTal))


(2011, 85, 0)
(2012, 110, 8)
(2013, 131, 8)
(2014, 126, 8)
(2015, 96, 17)
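The same counts can also be obtained with plain DataFrame filtering (an equivalent sketch, not the original code):

tal = Data[Data.source == "Tal"]
for yr in range(2011, 2016):
    this_year = tal[tal.year == yr]
    print(yr, len(this_year), int((this_year.n > 100).sum()))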

Load neurosynth group data and compute median group size for 2015


In [17]:
groupData = pd.read_csv("../SampleSize/neurosynth_group_data.txt",sep=" ",header=None,names=['year','n'])
yearBoolGroup=np.array(groupData.year==2015) # use 2015 explicitly rather than the loop variable left over from the previous cell
print('%d groups found in 2015 (over %d studies)'%(np.sum(yearBoolGroup),np.unique(groupData[yearBoolGroup].index).shape[0]))
print('median group size in 2015: %f'%np.median(groupData[yearBoolGroup].n))


199 groups found in 2015 (over 93 studies)
median group size in 2015: 19.000000

Figure for 90% power


In [20]:
twocol = cb.qualitative.Paired_12.mpl_colors

fig,axs = plt.subplots(1,2,figsize=(12,5))
fig.subplots_adjust(hspace=.5,wspace=.3)
axs=axs.ravel()
axs[0].plot(Data.year_jitter[Data.source=="Tal"],Data['nLog'][Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[0].plot(Data.year_jitter[Data.source=="David"],Data['nLog'][Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[0].plot(Medians.year,Medians.TalMdSSLog,color=twocol[1],lw=3,label="Neurosynth")
axs[0].plot(Medians.year,Medians.DavidMdSSLog,color=twocol[3],lw=3,label="David et al.")
axs[0].set_xlim([1993,2016])
axs[0].set_ylim([0,8])
axs[0].set_xlabel("Year")
axs[0].set_ylabel("Median Sample Size")
axs[0].legend(loc="upper left",frameon=False)
#labels=[1,5,10,20,50,150,500,1000,3000]
labels=[1,4,16,64,256,1024,3000]
axs[0].set_yticks(np.log(labels))
axs[0].set_yticklabels(labels)


axs[1].plot(Data.year_jitter[Data.source=="Tal"],Data.deltaNinety[Data.source=="Tal"],"r.",color=twocol[0],alpha=0.5,label="")
axs[1].plot(Data.year_jitter[Data.source=="David"],Data.deltaNinety[Data.source=="David"],"r.",color=twocol[2],alpha=0.5,label="")
axs[1].plot(Medians.year,Medians.TalMdDSingle,color="grey",lw=3,label="80% power")
axs[1].plot(Medians.year,Medians.DavidMdDSingle,color="grey",lw=3,label="")
axs[1].plot(Medians.year,Medians.TalMdDNinety,color=twocol[1],lw=3,label="Neurosynth, 90% power")
axs[1].plot(Medians.year,Medians.DavidMdDNinety,color=twocol[3],lw=3,label="David et al., 90% power")
axs[1].set_xlim([1993,2016])
axs[1].set_ylim([0,3])
axs[1].set_xlabel("Year")
axs[1].set_ylabel("Effect Size with 80% or 90% power")
axs[1].legend(loc="upper right",frameon=False)
plt.savefig('Figure_8090.svg',dpi=600)
plt.show()