Bayesian Linear Mixed Models (Genomic Data)



In [1]:

    
using JWAS,JWAS.Datasets,DataFrames,CSV



In [2]:

    
phenofile  = Datasets.dataset("example","phenotypes.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);









    



The delimiter in pedigree.txt is ','.
Pedigree informatin:
#individuals: 12
#sires:       4
#dams:        5
#founders:    3



In [3]:

    
first(phenotypes,5)









    Out[3]:




5 rows × 9 columns
ID y1 y2 y3 x1 x2 x3 dam weights
String Float64⍰ Float64⍰ Float64⍰ Float64 Float64 String String⍰ Float64
1 a1 -0.06 3.58 -1.18 0.9 2.0 m missing 1.0
2 a3 -2.07 3.19 1.0 0.7 2.0 f missing 1.0
3 a4 -2.63 6.97 -0.83 0.6 1.0 m a2 1.0
4 a5 2.31 missing -1.52 0.4 2.0 m a2 1.0
5 a7 missing missing missing 0.1 0.1 m a3 1.0

Univariate Linear Mixed Model (Genomic data)



In [4]:

    
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";



In [5]:

    
model1 = build_model(model_equation1);



In [6]:

    
set_covariate(model1,"x1");



In [7]:

    
set_random(model1,"x2");
set_random(model1,"ID dam",pedigree);



In [8]:

    
add_genotypes(model1,genofile,separator=',');









    



The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.



In [9]:

    
out1=runMCMC(model1,phenotypes,methods="BayesC",estimatePi=true);









    



Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for all traits included in the model for individual a7 in the row 5 are missing. This record is deleted.
The number of observations with both genotypes and phenotypes used in the analysis is 4.
The number of observations with both phenotype and pedigree information used in the analysis is 4.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Missing values are found in independent variables: dam.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.617255



A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12

MCMC Information:

methods                                      BayesC
                              complete genomic data
                   (i.e., non-single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                     true
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):              [1.253]
random effect variances (y1:ID,y1:dam): [1.253 0.0; 0.0 1.253]
residual variances:                           1.253
genetic variances (genomic):                  1.253
marker effect variances:                      0.617
π                                               0.0

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
polygenic effect variances:                   6.000
marker effect variances:                      4.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt is created to save MCMC samples for marker_effects_y1.
The file MCMC_samples_marker_effects_variances.txt is created to save MCMC samples for marker_effects_variances.
The file MCMC_samples_pi.txt is created to save MCMC samples for pi.
The file MCMC_samples_y1.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_genetic_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_heritability.txt already exists!!! It is overwritten by the new output.






    



running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:02






    



The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [10]:

    
keys(out1)









    Out[10]:





Base.KeySet for a Dict{Any,Any} with 9 entries. Keys:
  "marker effects"
  "EBV_y1"
  "Pi"
  "heritability"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "genetic_variance"
  "marker effects variance"



In [11]:

    
out1["EBV_y1"]









    Out[11]:




7 rows × 3 columns
ID EBV PEV
Any Any Any
1 a1 0.446483 1.80448
2 a3 0.246589 1.93395
3 a4 -1.20395 1.3241
4 a5 -0.334794 1.48462
5 a6 0.239163 2.05027
6 a7 0.532515 2.21422
7 a8 -0.431805 1.82995



In [12]:

    
out1["Pi"]









    Out[12]:




1 rows × 3 columns
π Estimate Std_Error
Any Any Any
1 π 0.427833 0.279688

Multivariate Linear Mixed Model (Genomic data)



In [13]:

    
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";



In [14]:

    
model2 = build_model(model_equation2);



In [15]:

    
set_covariate(model2,"x1");



In [16]:

    
set_random(model2,"x2");
set_random(model2,"ID dam",pedigree);









    



x2 is not found in model equation 1.
dam is not found in model equation 2.
dam is not found in model equation 3.



In [17]:

    
add_genotypes(model2,genofile,separator=',');









    



The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.



In [18]:

    
out2=runMCMC(model2,phenotypes,methods="BayesC",estimatePi=true);









    



Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for all traits included in the model for individual a7 in the row 5 are missing. This record is deleted.
The number of observations with both genotypes and phenotypes used in the analysis is 4.
The number of observations with both phenotype and pedigree information used in the analysis is 4.
Prior information for genomic variance is not provided and is generated from the data.
Prior information for residual variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Prior information for random effect variance is not provided and is generated from the data.
Missing values are found in independent variables: dam.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.617255  0.0       0.0     
 0.0       0.532118  0.0     
 0.0       0.0       0.155597



A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
x2              factor       random               2
x1*x3           interaction  fixed                2

MCMC Information:

methods                                      BayesC
                              complete genomic data
                   (i.e., non-single-step analysis)
chain_length                                    100
burnin                                            0
estimatePi                                     true
estimateScale                                 false
starting_value                                false
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y2:x2,y3:x2):
 1.081  0.0  
 0.0    0.316
random effect variances (y1:ID,y2:ID,y3:ID,y1:dam):
 1.253  0.0    0.0    0.0  
 0.0    1.081  0.0    0.0  
 0.0    0.0    0.316  0.0  
 0.0    0.0    0.0    1.253
residual variances:           
 1.253  0.0    0.0  
 0.0    1.081  0.0  
 0.0    0.0    0.316
genetic variances (polygenic):
 1.253  0.0    0.0    0.0  
 0.0    1.081  0.0    0.0  
 0.0    0.0    0.316  0.0  
 0.0    0.0    0.0    1.253
genetic variances (genomic):  
 1.253  0.0    0.0  
 0.0    1.081  0.0  
 0.0    0.0    0.316
marker effect variances:      
 0.617  0.0    0.0  
 0.0    0.532  0.0  
 0.0    0.0    0.156

Π: (Y(yes):included; N(no):excluded)

["y1", "y2", "y3"]         probability
["Y", "Y", "N"]                 0.0
["N", "N", "N"]                 0.0
["Y", "N", "N"]                 0.0
["N", "Y", "Y"]                 0.0
["Y", "N", "Y"]                 0.0
["N", "N", "Y"]                 0.0
["Y", "Y", "Y"]                 1.0
["N", "Y", "N"]                 0.0

Degree of freedom for hyper-parameters:

residual variances:                           7.000
random effect variances:                      6.000
polygenic effect variances:                   8.000
marker effect variances:                      7.000



The file MCMC_samples_residual_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_polygenic_effects_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_marker_effects_y2.txt is created to save MCMC samples for marker_effects_y2.
The file MCMC_samples_marker_effects_y3.txt is created to save MCMC samples for marker_effects_y3.
The file MCMC_samples_marker_effects_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_pi.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y2.x2_y3.x2_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_y1.ID_y2.ID_y3.ID_y1.dam_variances.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y1.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y2.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_EBV_y3.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_genetic_variance.txt already exists!!! It is overwritten by the new output.
The file MCMC_samples_heritability.txt already exists!!! It is overwritten by the new output.






    



running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:03






    



The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.



In [19]:

    
keys(out2)









    Out[19]:





Base.KeySet for a Dict{Any,Any} with 11 entries. Keys:
  "location parameters"
  "polygenic effects covariance matrix"
  "EBV_y3"
  "genetic_variance"
  "EBV_y2"
  "EBV_y1"
  "marker effects"
  "heritability"
  "residual variance"
  "Pi"
  "marker effects variance"



In [20]:

    
out2["Pi"]









    Out[20]:




8 rows × 3 columns
π Estimate Std_Error
Any Any Any
1 [1.0, 1.0, 0.0] 0.136707 0.118103
2 [0.0, 0.0, 0.0] 0.135537 0.110178
3 [1.0, 0.0, 0.0] 0.147073 0.118746
4 [0.0, 1.0, 1.0] 0.110508 0.0945137
5 [1.0, 0.0, 1.0] 0.108752 0.117394
6 [0.0, 0.0, 1.0] 0.109641 0.10215
7 [1.0, 1.0, 1.0] 0.129829 0.117483
8 [0.0, 1.0, 0.0] 0.121955 0.106232



In [21]:

    
out2["EBV_y2"]









    Out[21]:




7 rows × 3 columns
ID EBV PEV
Any Any Any
1 a1 -1.15508 0.718204
2 a3 0.657493 2.40265
3 a4 -0.586396 1.05399
4 a5 -0.65314 1.99496
5 a6 -0.164568 1.64079
6 a7 -0.723695 1.48565
7 a8 -0.814014 2.1087

	ID	y1	y2	y3	x1	x2	x3	dam	weights
	String	Float64⍰	Float64⍰	Float64⍰	Float64	Float64	String	String⍰	Float64
1	a1	-0.06	3.58	-1.18	0.9	2.0	m	missing	1.0
2	a3	-2.07	3.19	1.0	0.7	2.0	f	missing	1.0
3	a4	-2.63	6.97	-0.83	0.6	1.0	m	a2	1.0
4	a5	2.31	missing	-1.52	0.4	2.0	m	a2	1.0
5	a7	missing	missing	missing	0.1	0.1	m	a3	1.0

	ID	EBV	PEV
	Any	Any	Any
1	a1	0.446483	1.80448
2	a3	0.246589	1.93395
3	a4	-1.20395	1.3241
4	a5	-0.334794	1.48462
5	a6	0.239163	2.05027
6	a7	0.532515	2.21422
7	a8	-0.431805	1.82995

	π	Estimate	Std_Error
	Any	Any	Any
1	[1.0, 1.0, 0.0]	0.136707	0.118103
2	[0.0, 0.0, 0.0]	0.135537	0.110178
3	[1.0, 0.0, 0.0]	0.147073	0.118746
4	[0.0, 1.0, 1.0]	0.110508	0.0945137
5	[1.0, 0.0, 1.0]	0.108752	0.117394
6	[0.0, 0.0, 1.0]	0.109641	0.10215
7	[1.0, 1.0, 1.0]	0.129829	0.117483
8	[0.0, 1.0, 0.0]	0.121955	0.106232

	ID	EBV	PEV
	Any	Any	Any
1	a1	-1.15508	0.718204
2	a3	0.657493	2.40265
3	a4	-0.586396	1.05399
4	a5	-0.65314	1.99496
5	a6	-0.164568	1.64079
6	a7	-0.723695	1.48565
7	a8	-0.814014	2.1087