Example notebook: Structure with pop assignments

This notebook shows how to use the ipyrad.analysis toolkit to generate structure input files that use population information.

Required software


In [1]:
# conda install ipyrad -c ipyrad
# conda install structure clumpp -c ipyrad
# conda install toytree -c eaton-lab

In [2]:
import ipyrad.analysis as ipa
import toyplot

Create a structure analysis object

If you include a 'mapfile' then we will use locus information to subsample just a single SNP from each locus so that the resulting data file will meet the expectations of structure that SNPs are "unlinked". If you create multiple replicates files using different random seeds then different SNPs will be selected in each rep.


In [3]:
s = ipa.structure(
        name="test", 
        workdir="analysis-structure",
        data="analysis-ipyrad/ped_min10_outfiles/ped_min10.str",
        mapfile="analysis-ipyrad/ped_min10_outfiles/ped_min10.snps.map",
    )

Set params for the structure analysis

These values are used to generate the "mainparams" and "extraparams" files for structure.


In [4]:
## set run parameters (you probably want to run >10X this long)
s.mainparams.burnin = 1000
s.mainparams.numreps = 5000

## tell structure to expect popdata & popflag
s.mainparams.popdata = 1
s.mainparams.popflag = 1

## print all mainparams
s.mainparams


Out[4]:
burnin             1000                
extracols          0                   
label              1                   
locdata            0                   
mapdistances       0                   
markernames        0                   
markovphase        0                   
missing            -9                  
notambiguous       -999                
numreps            5000                
onerowperind       0                   
phased             0                   
phaseinfo          0                   
phenotype          0                   
ploidy             2                   
popdata            1                   
popflag            1                   
recessivealleles   0                   

In [5]:
## tell structure to use popinfo
s.extraparams.usepopinfo = 1

## print all other extraparams
s.extraparams


Out[5]:
admburnin           500                 
alpha               1.0                 
alphamax            10.0                
alphapriora         1.0                 
alphapriorb         2.0                 
alphapropsd         0.025               
ancestdist          0                   
ancestpint          0.9                 
computeprob         1                   
echodata            0                   
fpriormean          0.01                
fpriorsd            0.05                
freqscorr           1                   
gensback            2                   
inferalpha          1                   
inferlambda         0                   
intermedsave        0                   
lambda_             1.0                 
linkage             0                   
locispop            0                   
locprior            0                   
locpriorinit        1.0                 
log10rmax           1.0                 
log10rmin           -4.0                
log10rpropsd        0.1                 
log10rstart         -2.0                
maxlocprior         20.0                
metrofreq           10                  
migrprior           0.01                
noadmix             0                   
numboxes            1000                
onefst              0                   
pfrompopflagonly    0                   
popalphas           0                   
popspecificlambda   0                   
printlambda         1                   
printlikes          0                   
printnet            1                   
printqhat           0                   
printqsum           1                   
randomize           0                   
reporthitrate       0                   
seed                12345               
sitebysite          0                   
startatpopinfo      0                   
unifprioralpha      1                   
updatefreq          10000               
usepopinfo          1                   

By default the 'header' of the str file is empty


In [6]:
s.header


Out[6]:
labels popdata popflag locdata phenotype
0 29154_superba
1 30556_thamno
2 30686_cyathophylla
3 32082_przewalskii
4 33413_thamno
5 33588_przewalskii
6 35236_rex
7 35855_rex
8 38362_rex
9 39618_rex
10 40578_rex
11 41478_cyathophylloides
12 41954_cyathophylloides

You can fill it in by filling the header attribute lists

popdata is the a priori population assignment of an individual to a population. Assignments should be non-zero integers (e.g., 1, 2, 3). Zero is reserved to mean that there is no a priori assignment. popflag indicates whether or not to use the population assignment in the analysis (1) or to leave it to be inferred (0). So in the example below seven samples have assigned populations (popflag=1), and six samples will have their population assignments inferred (popflag=0). The popdata information will only be used for the seven individuals with assigned pops.


In [7]:
## assign popdata and popflag by entering a list of values
s.popdata = [1, 3, 1, 2, 3, 2, 3, 3, 3, 3, 3, 1, 1]
s.popflag = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

## print the header information
s.header


Out[7]:
labels popdata popflag locdata phenotype
0 29154_superba 1 1
1 30556_thamno 3 1
2 30686_cyathophylla 1 0
3 32082_przewalskii 2 0
4 33413_thamno 3 0
5 33588_przewalskii 2 0
6 35236_rex 3 0
7 35855_rex 3 0
8 38362_rex 3 1
9 39618_rex 3 1
10 40578_rex 3 1
11 41478_cyathophylloides 1 1
12 41954_cyathophylloides 1 1

Write the input files and run STRUCTURE

This will write the .str file (and subsample SNPs if you included a mapfile) with the header information included, and it will write a mainparams and extraparams file with the parameter settings that we entered above.


In [8]:
## writes three input files and then call structure yourself
s.write_structure_files(kpop=3)


Out[8]:
('/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.mainparams.txt',
 '/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.extraparams.txt',
 '/home/deren/Documents/ipyrad/tests/analysis-structure/tmp-test-3-1.strfile.txt')

Or, submit jobs directly to the cluster

If you start an ipcluster instance (see other tutorials) you can submit structure jobs directly to the cluster and easily collect the results, like below.


In [9]:
## connect to a running ipcluster instance
import ipyparallel as ipp
ipyclient = ipp.Client()
print "connected to {} cores".format(len(ipyclient))


connected to 40 cores

In [10]:
## submit job replicates to ipyclient
s.run(kpop=3, nreps=3, ipyclient=ipyclient)


submitted 3 structure jobs [test-K-3]

In [11]:
## wait for jobs to finish
ipyclient.wait()


Out[11]:
True

Collect results and plot


In [12]:
## get table of summarized results
table = s.get_clumpp_table(3)


mean scores across 3 replicates.

In [13]:
## reorder table by membership in groups
table.sort_values(by=[0, 1, 2], inplace=True)

In [14]:
## build barplot
canvas = toyplot.Canvas(width=500, height=250)
axes = canvas.cartesian(bounds=("10%", "90%", "10%", "45%"))
axes.bars(table)

## add labels to x-axis
ticklabels = [i for i in table.index.tolist()]
axes.x.ticks.locator = toyplot.locator.Explicit(labels=ticklabels)
axes.x.ticks.labels.angle = -60
axes.x.ticks.show = True
axes.x.ticks.labels.offset = 10
axes.x.ticks.labels.style = {"font-size": "12px"}


30556_thamno33413_thamno35236_rex35855_rex38362_rex39618_rex40578_rex32082_przewalskii33588_przewalskii29154_superba30686_cyathophylla41478_cyathophylloides41954_cyathophylloides0.00.51.0