Take RNA-Seq counts and get differentially expressed genes

Install packages



In [ ]:

    
#Install needed packages
source("https://bioconductor.org/biocLite.R")
biocLite("edgeR")
biocLite("compcodeR")
biocLite("EBSeq")
biocLite("DESeq2")
install.packages("heatmaply")
install.packages("manhattanly")
#devtools::install_github('hadley/ggplot2')
install.packages("RColorBrewer")









    



Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
  version of R; see http://bioconductor.org/install
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'edgeR'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'compcodeR'
Warning message in install.packages(pkgs = doing, lib = lib, ...):
"installation of package 'compcodeR' had non-zero exit status"Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'EBSeq'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'

Load required libraries



In [1]:

    
#load packages
library("edgeR")
library("EBSeq")
library("DESeq2")
library("plotly")
library("compcodeR")
library("heatmaply")
library("manhattanly")
library("reshape2")









    



Loading required package: limma
Loading required package: blockmodeling
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: testthat
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:limma':

    plotMA

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min


Attaching package: 'S4Vectors'

The following object is masked from 'package:testthat':

    compare

The following object is masked from 'package:gplots':

    space

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: ggplot2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:IRanges':

    slice

The following object is masked from 'package:S4Vectors':

    rename

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout







    



Error in library("compcodeR"): there is no package called 'compcodeR'
Traceback:

1. library("compcodeR")
2. stop(txt, domain = NA)

Load count matrix from RSEM output



In [12]:

    
count.matrix = read.table("wt_treat_counts_genes.matrix")
sample.annot = data.frame(condition = c(1, 2))
rownames(sample.annot) = colnames(count.matrix[,1:2])
info.parameters = list(dataset = "DE_test", uID = "11111")
#Save read count data for a condition to get it in the right
# format for differential expression R package
cpd = compData(count.matrix = count.matrix[,1:2],
sample.annotations = sample.annot,
info.parameters = info.parameters)
saveRDS(cpd, "wt_treat.rds")
head(count.matrix)









    





wt_mesc.genes.results prdm14_mesc.genes.results mir290_mesc.genes.results syk1_mesc.genes.results

	0610005C13Rik   9.00   4.00   6.00  16.00
	0610007P14Rik 336.00 314.00 266.00 292.00
	0610009B22Rik 149.95 113.59 165.36 141.44
	0610009L18Rik   7.00   4.00   9.00   4.00
	0610009O20Rik 666.96 516.00 635.99 578.99
	0610010B08Rik 181.08 185.48 314.89 147.58

Run differential expression analysis, showing EBSeq w/ default parameters



In [5]:

    
runDiffExp(data.file = "wt_treat.rds", result.extent = "EBSeq",
           Rmdfunction = "EBSeq.createRmd",output.directory = ".", 
           norm.method = "median", norm.path="True")
#look up all different diff exp R possible runs w/ help(runDiffExp)
#runDiffExp(data.file = "wt_prdm14.rds", result.extent = "DESeq2",Rmdfunction = "DESeq2.createRmd",output.directory = ".", fit.type="mean", test="Wald")









    




processing file: tempcode.Rmd






    



  |................................                                 |  50%
  ordinary text without R code

  |.................................................................| 100%
label: unnamed-chunk-1 (with options) 
List of 6
 $ echo   : logi TRUE
 $ eval   : logi TRUE
 $ include: logi TRUE
 $ message: logi TRUE
 $ error  : logi TRUE
 $ warning: logi TRUE








    



output file: tempcode.md







    




TRUE

Suck in diff exp results and FPKM RSEM output



In [13]:

    
out_EB = readRDS("wt_treat_EBSeq.rds")
rsem_fpkm_matrix=data.matrix(read.table("diff_exp_fpkm.txt",header=TRUE,row.names=1,sep="\t"))
head(rsem_fpkm_matrix,20)
eb_results = 1-unlist(out_EB@result.table["posterior.DE"])
#adding .000001 pseudocount
fold_change = log2(rsem_fpkm_matrix[,2]/(rsem_fpkm_matrix[,1]))
eb_de_out = data.frame(P=eb_results[is.finite(eb_results)],fold_change=fold_change[is.finite(eb_results)], gene=rownames(out_EB@result.table)[is.finite(eb_results)])
colnames(eb_de_out) = c("P","EFFECTSIZE","SNP")
#out_DESeq2 = readRDS("wt_prdm14_DESeq2.rds")









    





wt prdm14 mir290 syk1

	0610005C13Rik  0.63  0.31  0.45  1.15
	0610007P14Rik 22.21 23.05 18.14 19.69
	0610009B22Rik 15.03 12.64 17.10 14.46
	0610009L18Rik  0.91  0.58  1.21  0.53
	0610009O20Rik 21.36 18.36 21.02 18.92
	0610010B08Rik  3.05  3.47  5.47  2.54
	0610010F05Rik  8.83  7.68 12.47  6.22
	0610010K14Rik 73.21 63.44 66.81 63.51
	0610012G03Rik 12.38 12.05 12.45  9.93
	0610030E20Rik  3.46  1.45  3.83  2.30
	0610031O16Rik  0.00  0.00  0.00  0.00
	0610037L13Rik 13.11  9.32 14.40 10.66
	0610038B21Rik  0.33  0.12  0.36  0.40
	0610039H22Rik  0.99  0.49  0.45  0.56
	0610039K10Rik  0.66  0.65  1.72  0.70
	0610040B10Rik  1.13  0.42  0.97  0.58
	0610040F04Rik  2.63  1.24  2.57  2.07
	0610040J01Rik  0.41  0.46  0.26  0.46
	0610043K17Rik  0.13  0.44  0.14  0.00
	1010001N08Rik  0.36  0.00  0.00  0.12

Make volcano plot of results

Note this is very heavy so it might get laggy!



In [ ]:

    
embed_notebook(volcanoly(eb_de_out,highlight="Gapdh",snp="SNP",annotation1 = "EFFECTSIZE",annotation2="P",
                         title="EBSeq DE Analysis of \n WT vs Treatment",
                         ylab="log10 of 1-Posterior Probability of DE", xlab="Log2 fold change"))

	wt_mesc.genes.results	prdm14_mesc.genes.results	mir290_mesc.genes.results	syk1_mesc.genes.results
0610005C13Rik	9.00	4.00	6.00	16.00
0610007P14Rik	336.00	314.00	266.00	292.00
0610009B22Rik	149.95	113.59	165.36	141.44
0610009L18Rik	7.00	4.00	9.00	4.00
0610009O20Rik	666.96	516.00	635.99	578.99
0610010B08Rik	181.08	185.48	314.89	147.58

	wt	prdm14	mir290	syk1
0610005C13Rik	0.63	0.31	0.45	1.15
0610007P14Rik	22.21	23.05	18.14	19.69
0610009B22Rik	15.03	12.64	17.10	14.46
0610009L18Rik	0.91	0.58	1.21	0.53
0610009O20Rik	21.36	18.36	21.02	18.92
0610010B08Rik	3.05	3.47	5.47	2.54
0610010F05Rik	8.83	7.68	12.47	6.22
0610010K14Rik	73.21	63.44	66.81	63.51
0610012G03Rik	12.38	12.05	12.45	9.93
0610030E20Rik	3.46	1.45	3.83	2.30
0610031O16Rik	0.00	0.00	0.00	0.00
0610037L13Rik	13.11	9.32	14.40	10.66
0610038B21Rik	0.33	0.12	0.36	0.40
0610039H22Rik	0.99	0.49	0.45	0.56
0610039K10Rik	0.66	0.65	1.72	0.70
0610040B10Rik	1.13	0.42	0.97	0.58
0610040F04Rik	2.63	1.24	2.57	2.07
0610040J01Rik	0.41	0.46	0.26	0.46
0610043K17Rik	0.13	0.44	0.14	0.00
1010001N08Rik	0.36	0.00	0.00	0.12