In [1]:
using JWAS,JWAS.Datasets,DataFrames,CSV


┌ Info: Precompiling JWAS [c9a035f4-d403-5e6b-8649-6be755bc4798]
└ @ Base loading.jl:1273

In [2]:
phenofile  = Datasets.dataset("example","phenotypes.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true,missingstrings=["NA"])
pedigree   = get_pedigree(pedfile,separator=",",header=true);


The delimiter in pedigree.txt is ','.
Pedigree informatin:
#individuals: 12
#sires:       4
#dams:        5
#founders:    3

In [3]:
first(phenotypes,5)


Out[3]:

5 rows × 9 columns

IDy1y2y3x1x2x3damweights
StringFloat64⍰Float64⍰Float64⍰Float64Float64StringString⍰Float64
1a1-0.063.58-1.180.92.0mmissing0.125
2a3-2.073.191.00.72.0fmissing0.125
3a4-2.636.97-0.830.61.0ma21.0
4a52.31missing-1.520.42.0ma20.5
5a7missingmissingmissing0.10.1ma30.25
Univariate Linear Mixed Model (Genomic data)

In [4]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [5]:
R      = 1.0
model1 = build_model(model_equation1,R);

In [6]:
set_covariate(model1,"x1");

In [7]:
G1 = 1.0
G2 = [1.0 0.5
      0.5 1.0]
set_random(model1,"x2",G1);
set_random(model1,"ID dam",pedigree,G2);

In [8]:
G3 =1.0
add_genotypes(model1,genofile,G3,separator=',');


The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.

In [9]:
outputMCMCsamples(model1,"x2")
out1=runMCMC(model1,phenotypes,methods="BayesC",estimatePi=true,chain_length=5000,output_samples_frequency=100);


Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for all traits included in the model for individual a7 in the row 5 are missing. This record is deleted.
The number of observations with both genotypes and phenotypes used in the analysis is 4.
The number of observations with both phenotype and pedigree information used in the analysis is 4.
Missing values are found in independent variables: dam.

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.492462



A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12

MCMC Information:

methods                                      BayesC
                              complete genomic data
                   (i.e., non-single-step analysis)
chain_length                                   5000
burnin                                            0
estimatePi                                     true
estimateScale                                 false
starting_value                                false
printout_frequency                             5001
output_samples_frequency                        100
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:x2):                [1.0]
random effect variances (y1:ID,y1:dam):   [1.0 0.5; 0.5 1.0]
residual variances:                           1.000
genetic variances (genomic):                  1.000
marker effect variances:                      0.492
π                                               0.0

Degree of freedom for hyper-parameters:

residual variances:                           4.000
random effect variances:                      5.000
polygenic effect variances:                   6.000
marker effect variances:                      4.000



The file MCMC_samples_residual_variance.txt is created to save MCMC samples for residual_variance.
The file MCMC_samples_polygenic_effects_variance.txt is created to save MCMC samples for polygenic_effects_variance.
The file MCMC_samples_marker_effects_y1.txt is created to save MCMC samples for marker_effects_y1.
The file MCMC_samples_marker_effects_variances.txt is created to save MCMC samples for marker_effects_variances.
The file MCMC_samples_pi.txt is created to save MCMC samples for pi.
The file MCMC_samples_y1.x2.txt is created to save MCMC samples for y1:x2.
The file MCMC_samples_y1.x2_variances.txt is created to save MCMC samples for y1:x2_variances.
The file MCMC_samples_y1.ID_y1.dam_variances.txt is created to save MCMC samples for y1:ID_y1:dam_variances.
The file MCMC_samples_EBV_y1.txt is created to save MCMC samples for EBV_y1.
The file MCMC_samples_genetic_variance.txt is created to save MCMC samples for genetic_variance.
The file MCMC_samples_heritability.txt is created to save MCMC samples for heritability.
running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:03

The version of Julia and Platform in use:

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.


GWAS


In [10]:
marker_effects_file="MCMC_samples_marker_effects_y1.txt"


Out[10]:
"MCMC_samples_marker_effects_y1.txt"

In [11]:
GWAS(marker_effects_file,header=true)


Compute the model frequency for each marker (the probability the marker is included in the model).
Out[11]:

5 rows × 2 columns

marker_IDmodelfrequency
Abstract…Float64
1m10.58
2m20.68
3m30.58
4m40.58
5m50.56

In [15]:
map_file = Datasets.dataset("example","map.txt")
out=GWAS(marker_effects_file,map_file,model1,header=true,window_size="1 Mb",threshold=0.001)


Compute the posterior probability of association of the genomic window that explains more than 0.001 of the total genetic variance.
Out[15]:
(3×13 DataFrame. Omitted printing of 6 columns
│ Row │ trait │ window │ chr    │ wStart  │ wEnd    │ start_SNP │ end_SNP │
│     │ Int64Int64StringInt64Int64Int64Int64   │
├─────┼───────┼────────┼────────┼─────────┼─────────┼───────────┼─────────┤
│ 1   │ 1     │ 3      │ 2      │ 0       │ 1000000 │ 70350     │ 101135  │
│ 2   │ 1     │ 1      │ 1      │ 0       │ 1000000 │ 16977     │ 434311  │
│ 3   │ 1     │ 2      │ 1      │ 1000000 │ 2000000 │ 1025513   │ 1025513 │,)

In [ ]: