Single-step Bayesian Regression (Incomplete Genomic Data)



In [48]:

    
using JWAS,JWAS.Datasets,DataFrames,CSV, LinearAlgebra



In [49]:

    
phenofile  = Datasets.dataset("example","phenotypes_ssbr.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);

The delimiter in pedigree.txt is ','.
Pedigree informatin:
#individuals: 12
#sires:       4
#dams:        5
#founders:    3



In [51]:

    
first(phenotypes,5)









    Out[51]:




5 rows × 9 columns
     ID      y1       y2       y3       x1       x2     x3      dam      weights
     String  Float64  Float64  Float64  Float64  Int64  String  String⍰  Float64
1    a1      -0.06    3.58     -1.18    0.9      2      m       missing  1.0
2    a2      -0.6     4.9      0.88     0.3      1      f       missing  1.0
3    a3      -2.07    3.19     0.73     0.7      2      f       missing  1.0
4    a4      -2.63    6.97     -0.83    0.6      1      m       a2       1.0
5    a5      2.31     3.5      -1.52    0.4      2      m       a2       1.0

Single-trait Single-step Bayesian Regression (Incomplete Genomic Data)



In [52]:

    
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";



In [53]:

    
model1 = build_model(model_equation1);



In [54]:

    
set_covariate(model1,"x1");



In [55]:

    
set_random(model1,"x2");
set_random(model1,"ID dam",pedigree);



In [56]:

    
add_genotypes(model1,genofile,separator=',');

The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.



In [57]:

    
out1=runMCMC(model1,phenotypes,methods="RR-BLUP",single_step_analysis=true,pedigree=pedigree);

Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
The number of observations with both phenotype and pedigree information used in the analysis is 8.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
calculating A inverse
  0.000082 seconds (205 allocations: 16.031 KiB)
imputing missing genotypes
  0.195236 seconds (190 allocations: 23.781 KiB, 99.87% gc time)
completed imputing genotypes
Missing values are found in independent variables: dam.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.496268



A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                                     RR-BLUP
                            incomplete genomic data
                       (i.e., single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                    false
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [1.008]
random effect variances (y1:ID,y1:dam): [1.008 0.0; 0.0 1.008]
random effect variances (y1:ϵ): [1.0080000162124634]
residual variances:                           1.008
genetic variances (genomic):                  1.008
marker effect variances:                      0.496
π                                               0.0

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
random effect variances:                      5.000
polygenic effect variances:                   6.000
marker effect variances:                      4.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_pi.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.J.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.


The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.
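
The log above reports a genomic genetic variance of 1.008, π = 0.0, and a prior mean of 0.496268 for the marker effect variance. A commonly used scaling is σ²α = σ²g / ((1 − π) · Σ 2p(1 − p)), where p is the allele frequency of each marker. The sketch below assumes that formula (the output does not state it) and backs out the implied Σ 2p(1 − p) for the 5 markers. The helper `marker_variance_prior` is hypothetical and not part of JWAS.

```julia
# Sketch only: assumes prior mean = σ²g / ((1 - π) * Σ 2p(1-p)).
# `marker_variance_prior` is a hypothetical helper, not a JWAS function.
marker_variance_prior(σ2g, pi_excl, p) =
    σ2g / ((1 - pi_excl) * sum(2 .* p .* (1 .- p)))

# Values copied from the log above
σ2g        = 1.008      # genetic variance (genomic)
pi_excl    = 0.0        # π, proportion of markers with zero effect
prior_mean = 0.496268   # reported mean of the marker effect variance prior

# Implied Σ 2p(1-p) under the assumed formula (about 2.031)
sum2pq = σ2g / ((1 - pi_excl) * prior_mean)
println(round(sum2pq, digits=3))

# Illustration with made-up allele frequencies (not taken from genotypes.txt)
p_demo = fill(0.5, 4)   # Σ 2p(1-p) = 2.0
println(marker_variance_prior(1.0, 0.0, p_demo))
```

If the assumption holds, a larger π spreads the same genetic variance over fewer markers, so the per-marker prior variance grows.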



In [58]:

    
keys(out1)









    Out[58]:





Base.KeySet for a Dict{Any,Any} with 6 entries. Keys:
  "marker effects"
  "EBV_y1"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "marker effects variance"



In [59]:

    
out1["EBV_y1"]









    Out[59]:




7 rows × 3 columns
     ID   EBV        PEV
     Any  Any        Any
1    a1   0.308266   6.46324
2    a3   -0.992717  6.05282
3    a4   -0.399613  7.90331
4    a5   0.671401   4.17104
5    a6   0.0388337  7.78589
6    a7   -1.41756   6.00083
7    a8   -0.166578  8.37988



In [60]:

    
out1["marker effects"]









    Out[60]:




5 rows × 5 columns
     Trait  Marker_ID  Estimate     Std_Error  Model_Frequency
     Any    Any        Any          Any        Any
1    y1     m1         -0.112787    0.535923   1.0
2    y1     m2         -0.256412    0.670566   1.0
3    y1     m3         0.272158     0.607386   1.0
4    y1     m4         -0.275205    0.563826   1.0
5    y1     m5         -0.00133083  0.535076   1.0
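
The Estimate and Std_Error columns summarize the saved MCMC samples. Assuming they are the posterior mean and standard deviation of each chain (the output does not say so explicitly), a summary of this kind can be sketched as follows. The run above used chain_length 100 and burnin 0, so discarding early samples is worth considering. The helper `posterior_summary` and the sample values are hypothetical, for illustration only.

```julia
using Statistics

# Hypothetical helper: summarize one parameter's MCMC chain after dropping
# the first `burnin` samples. Not part of JWAS.
function posterior_summary(samples::AbstractVector{<:Real}; burnin::Integer=0)
    0 <= burnin < length(samples) ||
        throw(ArgumentError("burnin must be between 0 and length(samples) - 1"))
    kept = samples[burnin+1:end]
    return (mean = mean(kept), sd = std(kept), n = length(kept))
end

# Synthetic chain for illustration only (not JWAS output)
chain = [0.9, 1.4, 1.1, 1.0, 0.8, 1.2]
s = posterior_summary(chain; burnin=2)
println(s)
```

The same function could be applied to a column read from one of the MCMC_samples_*.txt files listed in the log, once the file layout has been checked.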

Multi-trait Single-step Bayesian Regression (Incomplete Genomic Data)



In [61]:

    
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";



In [62]:

    
model2 = build_model(model_equation2);



In [63]:

    
set_covariate(model2,"x1");



In [64]:

    
set_random(model2,"x2");
set_random(model2,"ID dam",pedigree);

x2 is not found in model equation 1.
dam is not found in model equation 2.
dam is not found in model equation 3.



In [65]:

    
add_genotypes(model2,genofile,separator=',');

The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.



In [66]:

    
out2=runMCMC(model2,phenotypes,methods="BayesC",estimatePi=true,single_step_analysis=true,pedigree=pedigree);

Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
The number of observations with both phenotype and pedigree information used in the analysis is 8.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
calculating A inverse
  0.000049 seconds (205 allocations: 16.031 KiB)
imputing missing genotypes
  0.124649 seconds (190 allocations: 23.781 KiB, 99.87% gc time)
completed imputing genotypes
Missing values are found in independent variables: dam.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.496268  0.0       0.0     
 0.0       0.431625  0.0     
 0.0       0.0       0.114775



A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
x2              factor       random               2
x1*x3           interaction  fixed                2
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                                      BayesC
                            incomplete genomic data
                       (i.e., single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                     true
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y2:x2,y3:x2):
 0.876  0.0  
 0.0    0.233
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 1.008  0.0    0.0    0.0  
 0.0    0.876  0.0    0.0  
 0.0    0.0    0.233  0.0  
 0.0    0.0    0.0    1.008
random effect variances (y1:ϵ,y2:ϵ,y3:ϵ):
 1.008f0  0.0f0    0.0f0  
 0.0f0    0.876f0  0.0f0  
 0.0f0    0.0f0    0.233f0
residual variances:           
 1.008  0.0    0.0  
 0.0    0.876  0.0  
 0.0    0.0    0.233
genetic variances (polygenic):
 1.008  0.0    0.0    0.0  
 0.0    0.876  0.0    0.0  
 0.0    0.0    0.233  0.0  
 0.0    0.0    0.0    1.008
genetic variances (genomic):  
 1.008  0.0    0.0  
 0.0    0.876  0.0  
 0.0    0.0    0.233
marker effect variances:      
 0.496  0.0    0.0  
 0.0    0.432  0.0  
 0.0    0.0    0.115

Π: (Y(yes):included; N(no):excluded)

["y1", "y2", "y3"]         probability
["Y", "Y", "N"]                 0.0
["N", "N", "N"]                 0.0
["Y", "N", "N"]                 0.0
["N", "Y", "Y"]                 0.0
["Y", "N", "Y"]                 0.0
["N", "N", "Y"]                 0.0
["Y", "Y", "Y"]                 1.0
["N", "Y", "N"]                 0.0

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
random effect variances:                      7.000
polygenic effect variances:                   8.000
marker effect variances:                      7.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y3.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_pi.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.J.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.J.txt is created to save MCMC samples for y2:J.
The file MCMC_samples_y3.J.txt is created to save MCMC samples for y3:J.
The file MCMC_samples_y1.ϵ.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.ϵ.txt is created to save MCMC samples for y2:ϵ.
The file MCMC_samples_y3.ϵ.txt is created to save MCMC samples for y3:ϵ.
The file MCMC_samples_y2.x2_y3.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ϵ_y2.ϵ_y3.ϵ_variances.txt is created to save MCMC samples for y1:ϵ_y2:ϵ_y3:ϵ_variances.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y3.txt already exists!!! It is overwritten by the new output.

running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:01

The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.
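
The Π table in the log lists every include/exclude pattern of a marker across the three traits (2³ = 8 patterns) and, because Π was not provided, places all prior probability on the pattern in which the marker affects every trait. The enumeration can be sketched in plain Julia as below. The helper `inclusion_prior` is hypothetical and only mirrors the table; it is not how JWAS builds Π internally.

```julia
# Sketch: enumerate all Y/N inclusion patterns for a set of traits and put
# the whole prior mass on the "all included" pattern, as in the Π table above.
# `inclusion_prior` is a hypothetical helper, not a JWAS function.
function inclusion_prior(traits::Vector{String})
    n = length(traits)
    prior = Dict{Vector{String},Float64}()
    for bits in Iterators.product(fill(("Y", "N"), n)...)
        pattern = collect(bits)                        # e.g. ["Y", "N", "Y"]
        prior[pattern] = all(==("Y"), pattern) ? 1.0 : 0.0
    end
    return prior
end

prior = inclusion_prior(["y1", "y2", "y3"])
println(length(prior))            # number of patterns
println(prior[["Y", "Y", "Y"]])   # prior mass on "affects all traits"
```

With estimatePi=true, as in the call above, these probabilities are sampled during the chain rather than held at their starting values.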



In [67]:

    
keys(out2)









    Out[67]:





Base.KeySet for a Dict{Any,Any} with 9 entries. Keys:
  "marker effects"
  "EBV_y2"
  "EBV_y1"
  "Pi"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "EBV_y3"
  "marker effects variance"



In [68]:

    
out1["location parameters"]









    Out[68]:




37 rows × 5 columns
     Trait  Effect     Level      Estimate    Std_Error
     Any    Any        Any        Any         Any
1    y1     intercept  intercept  -3.57266    2.6059
2    y1     x1*x3      x1 * m     -4.19906    6.89503
3    y1     x1*x3      x1 * f     0.479728    0.560252
4    y1     x2         2          -0.0529523  0.916848
5    y1     x2         1          -0.0712717  0.674382
6    y1     x3         m          5.86229     3.02209
7    y1     x3         f          2.34139     3.67728
8    y1     ID         a2         0.06083     0.990601
9    y1     ID         a1         0.137081    0.945244
10   y1     ID         a3         -0.469558   1.20428
11   y1     ID         a7         -0.479249   1.10237
12   y1     ID         a4         -0.0408258  0.96728
13   y1     ID         a6         -0.0803028  1.06643
14   y1     ID         a9         0.0885007   1.02335
15   y1     ID         a5         0.138324    0.989212
16   y1     ID         a10        -0.451677   0.993673
17   y1     ID         a12        -0.108276   1.06282
18   y1     ID         a11        -0.191731   1.16108
19   y1     ID         a8         0.0123802   0.881637
20   y1     dam        a2         -0.0785836  0.907202
21   y1     dam        a1         0.205705    1.08132
22   y1     dam        a3         -0.471563   1.34062
23   y1     dam        a7         -0.740043   0.951426
24   y1     dam        a4         0.0606774   1.08693
25   y1     dam        a6         0.00698366  1.27143
26   y1     dam        a9         0.165479    1.12386
27   y1     dam        a5         0.0403578   1.21389
28   y1     dam        a10        -0.226581   1.18702
29   y1     dam        a12        0.0442761   1.1841
30   y1     dam        a11        -0.117548   1.27627
⋮    ⋮      ⋮          ⋮          ⋮           ⋮
