Single-step Bayesian Regression (Incomplete Genomic Data)

In [48]:
using JWAS,JWAS.Datasets,DataFrames,CSV, LinearAlgebra

In [49]:
phenofile  = Datasets.dataset("example","phenotypes_ssbr.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);


The delimiter in pedigree.txt is ','.
Pedigree informatin:
#individuals: 12
#sires:       4
#dams:        5
#founders:    3

In [51]:
first(phenotypes,5)


Out[51]:

5 rows × 9 columns

IDy1y2y3x1x2x3damweights
StringFloat64Float64Float64Float64Int64StringString⍰Float64
1a1-0.063.58-1.180.92mmissing1.0
2a2-0.64.90.880.31fmissing1.0
3a3-2.073.190.730.72fmissing1.0
4a4-2.636.97-0.830.61ma21.0
5a52.313.5-1.520.42ma21.0
Single-trait Single-step Bayesian Regression (Incomplete Genomic Data)

In [52]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [53]:
model1 = build_model(model_equation1);

In [54]:
set_covariate(model1,"x1");

In [55]:
set_random(model1,"x2");
set_random(model1,"ID dam",pedigree);

In [56]:
add_genotypes(model1,genofile,separator=',');


The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.

In [57]:
out1=runMCMC(model1,phenotypes,methods="RR-BLUP",single_step_analysis=true,pedigree=pedigree);


Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
The number of observations with both phenotype and pedigree information used in the analysis is 8.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
calculating A inverse
  0.000082 seconds (205 allocations: 16.031 KiB)
imputing missing genotypes
  0.195236 seconds (190 allocations: 23.781 KiB, 99.87% gc time)
completed imputing genotypes
Missing values are found in independent variables: dam.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.496268



A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                                     RR-BLUP
                            incomplete genomic data
                       (i.e., single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                    false
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [1.008]
random effect variances (y1:ID,y1:dam): [1.008 0.0; 0.0 1.008]
random effect variances (y1:ϵ): [1.0080000162124634]
residual variances:                           1.008
genetic variances (genomic):                  1.008
marker effect variances:                      0.496
π                                               0.0

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
random effect variances:                      5.000
polygenic effect variances:                   6.000
marker effect variances:                      4.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_pi.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.J.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.


The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [58]:
keys(out1)


Out[58]:
Base.KeySet for a Dict{Any,Any} with 6 entries. Keys:
  "marker effects"
  "EBV_y1"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "marker effects variance"

In [59]:
out1["EBV_y1"]


Out[59]:

7 rows × 3 columns

IDEBVPEV
AnyAnyAny
1a10.3082666.46324
2a3-0.9927176.05282
3a4-0.3996137.90331
4a50.6714014.17104
5a60.03883377.78589
6a7-1.417566.00083
7a8-0.1665788.37988

In [60]:
out1["marker effects"]


Out[60]:

5 rows × 5 columns

TraitMarker_IDEstimateStd_ErrorModel_Frequency
AnyAnyAnyAnyAny
1y1m1-0.1127870.5359231.0
2y1m2-0.2564120.6705661.0
3y1m30.2721580.6073861.0
4y1m4-0.2752050.5638261.0
5y1m5-0.001330830.5350761.0
Multi-trait Single-step Bayesian Regression (Incomplete Genomic Data)

In [61]:
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";

In [62]:
model2 = build_model(model_equation2);

In [63]:
set_covariate(model2,"x1");

In [64]:
set_random(model2,"x2");
set_random(model2,"ID dam",pedigree);


x2 is not found in model equation 1.
dam is not found in model equation 2.
dam is not found in model equation 3.

In [65]:
add_genotypes(model2,genofile,separator=',');


The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.

In [66]:
out2=runMCMC(model2,phenotypes,methods="BayesC",estimatePi=true,single_step_analysis=true,pedigree=pedigree);


Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
The number of observations with both phenotype and pedigree information used in the analysis is 8.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
calculating A inverse
  0.000049 seconds (205 allocations: 16.031 KiB)
imputing missing genotypes
  0.124649 seconds (190 allocations: 23.781 KiB, 99.87% gc time)
completed imputing genotypes
Missing values are found in independent variables: dam.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.496268  0.0       0.0     
 0.0       0.431625  0.0     
 0.0       0.0       0.114775



A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
x2              factor       random               2
x1*x3           interaction  fixed                2
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                                      BayesC
                            incomplete genomic data
                       (i.e., single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                     true
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y2:x2,y3:x2):
 0.876  0.0  
 0.0    0.233
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 1.008  0.0    0.0    0.0  
 0.0    0.876  0.0    0.0  
 0.0    0.0    0.233  0.0  
 0.0    0.0    0.0    1.008
random effect variances (y1:ϵ,y2:ϵ,y3:ϵ):
 1.008f0  0.0f0    0.0f0  
 0.0f0    0.876f0  0.0f0  
 0.0f0    0.0f0    0.233f0
residual variances:           
 1.008  0.0    0.0  
 0.0    0.876  0.0  
 0.0    0.0    0.233
genetic variances (polygenic):
 1.008  0.0    0.0    0.0  
 0.0    0.876  0.0    0.0  
 0.0    0.0    0.233  0.0  
 0.0    0.0    0.0    1.008
genetic variances (genomic):  
 1.008  0.0    0.0  
 0.0    0.876  0.0  
 0.0    0.0    0.233
marker effect variances:      
 0.496  0.0    0.0  
 0.0    0.432  0.0  
 0.0    0.0    0.115

Π: (Y(yes):included; N(no):excluded)

["y1", "y2", "y3"]         probability
["Y", "Y", "N"]                 0.0
["N", "N", "N"]                 0.0
["Y", "N", "N"]                 0.0
["N", "Y", "Y"]                 0.0
["Y", "N", "Y"]                 0.0
["N", "N", "Y"]                 0.0
["Y", "Y", "Y"]                 1.0
["N", "Y", "N"]                 0.0

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
random effect variances:                      7.000
polygenic effect variances:                   8.000
marker effect variances:                      7.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y3.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_pi.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.J.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.J.txt is created to save MCMC samples for y2:J.
The file MCMC_samples_y3.J.txt is created to save MCMC samples for y3:J.
The file MCMC_samples_y1.ϵ.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.ϵ.txt is created to save MCMC samples for y2:ϵ.
The file MCMC_samples_y3.ϵ.txt is created to save MCMC samples for y3:ϵ.
The file MCMC_samples_y2.x2_y3.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ_y2.ϵ_y3.ϵ_variances.txt is created to save MCMC samples for y1:ϵ_y2:ϵ_y3:ϵ_variances.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y3.txt already exists!!! It is overwritten by the new output.
running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:01

The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [67]:
keys(out2)


Out[67]:
Base.KeySet for a Dict{Any,Any} with 9 entries. Keys:
  "marker effects"
  "EBV_y2"
  "EBV_y1"
  "Pi"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "EBV_y3"
  "marker effects variance"

In [68]:
out1["location parameters"]


Out[68]:

37 rows × 5 columns

TraitEffectLevelEstimateStd_Error
AnyAnyAnyAnyAny
1y1interceptintercept-3.572662.6059
2y1x1*x3x1 * m-4.199066.89503
3y1x1*x3x1 * f0.4797280.560252
4y1x22-0.05295230.916848
5y1x21-0.07127170.674382
6y1x3m5.862293.02209
7y1x3f2.341393.67728
8y1IDa20.060830.990601
9y1IDa10.1370810.945244
10y1IDa3-0.4695581.20428
11y1IDa7-0.4792491.10237
12y1IDa4-0.04082580.96728
13y1IDa6-0.08030281.06643
14y1IDa90.08850071.02335
15y1IDa50.1383240.989212
16y1IDa10-0.4516770.993673
17y1IDa12-0.1082761.06282
18y1IDa11-0.1917311.16108
19y1IDa80.01238020.881637
20y1dama2-0.07858360.907202
21y1dama10.2057051.08132
22y1dama3-0.4715631.34062
23y1dama7-0.7400430.951426
24y1dama40.06067741.08693
25y1dama60.006983661.27143
26y1dama90.1654791.12386
27y1dama50.04035781.21389
28y1dama10-0.2265811.18702
29y1dama120.04427611.1841
30y1dama11-0.1175481.27627
&vellip&vellip&vellip&vellip&vellip&vellip