Bayesian Linear Additive Genetic Model

In [21]:
using JWAS,JWAS.Datasets,DataFrames,CSV

In [22]:
phenofile  = Datasets.dataset("example","phenotypes.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);


The delimiter in pedigree.txt is ','.
Pedigree informatin:
#individuals: 12
#sires:       4
#dams:        5
#founders:    3

In [23]:
first(phenotypes,5)


Out[23]:

5 rows × 9 columns

IDy1y2y3x1x2x3damweights
StringFloat64⍰Float64⍰Float64⍰Float64Float64StringString⍰Float64
1a1-0.063.58-1.180.92.0mmissing1.0
2a3-2.073.191.00.72.0fmissing1.0
3a4-2.636.97-0.830.61.0ma21.0
4a52.31missing-1.520.42.0ma21.0
5a7missingmissingmissing0.10.1ma31.0
Univariate Linear Additive Genetic Model

In [24]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [25]:
model1 = build_model(model_equation1);

In [26]:
set_covariate(model1,"x1");

In [27]:
set_random(model1,"x2");
set_random(model1,"ID dam",pedigree);

In [28]:
out1=runMCMC(model1,phenotypes);


Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for all traits included in the model for individual a7 in the row 5 are missing. This record is deleted.
The number of observations with both phenotype and pedigree information used in the analysis is 4.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Missing values are found in independent variables: dam.

A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12

MCMC Information:

methods                   conventional (no markers)
chain_length                                    100
burnin                                            0
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [1.253]
random effect variances (y1:ID,y1:dam): [2.507 0.0; 0.0 2.507]
residual variances:                           1.253

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
polygenic effect variances:                   6.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_genetic_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_heritability.txt already exists!!! It is overwritten by the new output.
running MCMC for conventional (no markers)...100%|██████| Time: 0:00:00

The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [29]:
keys(out1)


Out[29]:
Base.KeySet for a Dict{Any,Any} with 6 entries. Keys:
  "EBV_y1"
  "heritability"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "genetic_variance"

In [30]:
out1["polygenic effects covariance matrix"]


Out[30]:

4 rows × 3 columns

CovarianceEstimateStd_Error
AnyAnyAny
1y1:ID_y1:ID4.00023.48931
2y1:ID_y1:dam0.3261462.61556
3y1:dam_y1:ID0.3261462.61556
4y1:dam_y1:dam3.824784.50704
Multivariate Linear Additive Genetic Model

In [31]:
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";

In [32]:
model2 = build_model(model_equation2);

In [33]:
set_covariate(model2,"x1");

In [34]:
set_random(model2,"x2");
set_random(model2,"ID dam",pedigree);


x2 is not found in model equation 1.
dam is not found in model equation 2.
dam is not found in model equation 3.

In [35]:
out2=runMCMC(model2,phenotypes);


Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for all traits included in the model for individual a7 in the row 5 are missing. This record is deleted.
The number of observations with both phenotype and pedigree information used in the analysis is 4.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Missing values are found in independent variables: dam.

A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
x2              factor       random               2
x1*x3           interaction  fixed                2

MCMC Information:

methods                   conventional (no markers)
chain_length                                    100
burnin                                            0
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y2:x2,y3:x2):
 1.081  0.0  
 0.0    0.316
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 2.507  0.0    0.0    0.0  
 0.0    2.161  0.0    0.0  
 0.0    0.0    0.632  0.0  
 0.0    0.0    0.0    2.507
residual variances:           
 1.253  0.0    0.0  
 0.0    1.081  0.0  
 0.0    0.0    0.316
genetic variances (polygenic):
 2.507  0.0    0.0    0.0  
 0.0    2.161  0.0    0.0  
 0.0    0.0    0.632  0.0  
 0.0    0.0    0.0    2.507

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
polygenic effect variances:                   8.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.x2_y3.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y3.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_genetic_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_heritability.txt already exists!!! It is overwritten by the new output.
running MCMC for conventional (no markers)...100%|██████| Time: 0:00:00

The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [36]:
keys(out2)


Out[36]:
Base.KeySet for a Dict{Any,Any} with 8 entries. Keys:
  "EBV_y2"
  "EBV_y1"
  "heritability"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "EBV_y3"
  "genetic_variance"

In [37]:
out2["location parameters"]


Out[37]:

64 rows × 5 columns

TraitEffectLevelEstimateStd_Error
AnyAnyAnyAnyAny
1y1interceptintercept-13.25528.67802
2y1x1x110.63747.5289
3y1x3m5.638656.78181
4y1x3f3.881347.0109
5y1IDa20.6257270.948119
6y1IDa1-0.1028211.36501
7y1IDa30.1180261.17951
8y1IDa7-0.008133421.25624
9y1IDa40.3912111.36237
10y1IDa6-0.1751391.30038
11y1IDa90.3222731.75786
12y1IDa50.9596720.905914
13y1IDa100.3068781.2159
14y1IDa120.3616321.53531
15y1IDa110.3640751.63428
16y1IDa80.08035291.54358
17y1dama20.5955871.26182
18y1dama10.103561.59261
19y1dama30.01351361.35755
20y1dama7-0.07953041.84925
21y1dama40.4887611.66216
22y1dama6-0.1612321.45572
23y1dama90.1426371.93583
24y1dama50.7171941.68037
25y1dama100.4398731.79305
26y1dama120.4543391.92791
27y1dama110.1978571.68761
28y1dama80.2100021.53829
29y2interceptintercept8.399691.68005
30y2x1x1-8.383492.75414
&vellip&vellip&vellip&vellip&vellip&vellip

In [38]:
out2["EBV_y3"]


Out[38]:

12 rows × 3 columns

IDEBVPEV
AnyAnyAny
1a2-0.1804060.305792
2a1-0.2747320.321981
3a30.1979550.317888
4a7-0.03334570.313894
5a4-0.2287290.275388
6a6-0.007383970.239737
7a9-0.1251050.342197
8a5-0.4019510.245787
9a10-0.342160.393678
10a12-0.2141990.304896
11a11-0.1894370.34082
12a8-0.1032720.319101

In [39]:
out2["polygenic effects covariance matrix"]


Out[39]:

16 rows × 3 columns

CovarianceEstimateStd_Error
AnyAnyAny
1y1:ID_y1:ID1.805571.00455
2y1:ID_y2:ID0.09006070.830421
3y1:ID_y3:ID0.0199250.355726
4y1:ID_y1:dam0.4097640.653957
5y2:ID_y1:ID0.09006070.830421
6y2:ID_y2:ID2.005411.89668
7y2:ID_y3:ID-0.06853860.466511
8y2:ID_y1:dam-0.1471210.927323
9y3:ID_y1:ID0.01992490.355726
10y3:ID_y2:ID-0.06853860.466511
11y3:ID_y3:ID0.4758730.301468
12y3:ID_y1:dam-0.1129910.327674
13y1:dam_y1:ID0.4097640.653957
14y1:dam_y2:ID-0.1471210.927323
15y1:dam_y3:ID-0.1129910.327674
16y1:dam_y1:dam1.975171.06663

In [40]:
out2["genetic_variance"]


Out[40]:

9 rows × 3 columns

CovarianceEstimateStd_Error
AnyAnyAny
1y1_y13.105132.4985
2y1_y2-0.2505411.02221
3y1_y3-0.06046720.414207
4y2_y1-0.2505411.02221
5y2_y21.103821.07214
6y2_y3-0.04615740.27032
7y3_y1-0.06046720.414207
8y3_y2-0.04615740.27032
9y3_y30.2648210.180307