Take RNA-Seq counts and get differentially expressed genes

Install packages


In [ ]:
#Install needed packages
source("https://bioconductor.org/biocLite.R")
biocLite("edgeR")
biocLite("compcodeR")
biocLite("EBSeq")
biocLite("DESeq2")
install.packages("heatmaply")
install.packages("manhattanly")
#devtools::install_github('hadley/ggplot2')
install.packages("RColorBrewer")


Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
  version of R; see http://bioconductor.org/install
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'edgeR'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'compcodeR'
Warning message in install.packages(pkgs = doing, lib = lib, ...):
"installation of package 'compcodeR' had non-zero exit status"Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) 'EBSeq'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'XML'

Load required libraries


In [1]:
#load packages
library("edgeR")
library("EBSeq")
library("DESeq2")
library("plotly")
library("compcodeR")
library("heatmaply")
library("manhattanly")
library("reshape2")


Loading required package: limma
Loading required package: blockmodeling
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: testthat
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:limma':

    plotMA

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min


Attaching package: 'S4Vectors'

The following object is masked from 'package:testthat':

    compare

The following object is masked from 'package:gplots':

    space

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: ggplot2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:IRanges':

    slice

The following object is masked from 'package:S4Vectors':

    rename

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Error in library("compcodeR"): there is no package called 'compcodeR'
Traceback:

1. library("compcodeR")
2. stop(txt, domain = NA)

Load count matrix from RSEM output


In [12]:
count.matrix = read.table("wt_treat_counts_genes.matrix")
sample.annot = data.frame(condition = c(1, 2))
rownames(sample.annot) = colnames(count.matrix[,1:2])
info.parameters = list(dataset = "DE_test", uID = "11111")
#Save read count data for a condition to get it in the right
# format for differential expression R package
cpd = compData(count.matrix = count.matrix[,1:2],
sample.annotations = sample.annot,
info.parameters = info.parameters)
saveRDS(cpd, "wt_treat.rds")
head(count.matrix)


wt_mesc.genes.resultsprdm14_mesc.genes.resultsmir290_mesc.genes.resultssyk1_mesc.genes.results
0610005C13Rik 9.00 4.00 6.00 16.00
0610007P14Rik336.00314.00266.00292.00
0610009B22Rik149.95113.59165.36141.44
0610009L18Rik 7.00 4.00 9.00 4.00
0610009O20Rik666.96516.00635.99578.99
0610010B08Rik181.08185.48314.89147.58

Run differential expression analysis, showing EBSeq w/ default parameters


In [5]:
runDiffExp(data.file = "wt_treat.rds", result.extent = "EBSeq",
           Rmdfunction = "EBSeq.createRmd",output.directory = ".", 
           norm.method = "median", norm.path="True")
#look up all different diff exp R possible runs w/ help(runDiffExp)
#runDiffExp(data.file = "wt_prdm14.rds", result.extent = "DESeq2",Rmdfunction = "DESeq2.createRmd",output.directory = ".", fit.type="mean", test="Wald")



processing file: tempcode.Rmd
  |................................                                 |  50%
  ordinary text without R code

  |.................................................................| 100%
label: unnamed-chunk-1 (with options) 
List of 6
 $ echo   : logi TRUE
 $ eval   : logi TRUE
 $ include: logi TRUE
 $ message: logi TRUE
 $ error  : logi TRUE
 $ warning: logi TRUE


output file: tempcode.md

TRUE

Suck in diff exp results and FPKM RSEM output


In [13]:
out_EB = readRDS("wt_treat_EBSeq.rds")
rsem_fpkm_matrix=data.matrix(read.table("diff_exp_fpkm.txt",header=TRUE,row.names=1,sep="\t"))
head(rsem_fpkm_matrix,20)
eb_results = 1-unlist(out_EB@result.table["posterior.DE"])
#adding .000001 pseudocount
fold_change = log2(rsem_fpkm_matrix[,2]/(rsem_fpkm_matrix[,1]))
eb_de_out = data.frame(P=eb_results[is.finite(eb_results)],fold_change=fold_change[is.finite(eb_results)], gene=rownames(out_EB@result.table)[is.finite(eb_results)])
colnames(eb_de_out) = c("P","EFFECTSIZE","SNP")
#out_DESeq2 = readRDS("wt_prdm14_DESeq2.rds")


wtprdm14mir290syk1
0610005C13Rik 0.63 0.31 0.45 1.15
0610007P14Rik22.2123.0518.1419.69
0610009B22Rik15.0312.6417.1014.46
0610009L18Rik 0.91 0.58 1.21 0.53
0610009O20Rik21.3618.3621.0218.92
0610010B08Rik 3.05 3.47 5.47 2.54
0610010F05Rik 8.83 7.6812.47 6.22
0610010K14Rik73.2163.4466.8163.51
0610012G03Rik12.3812.0512.45 9.93
0610030E20Rik 3.46 1.45 3.83 2.30
0610031O16Rik 0.00 0.00 0.00 0.00
0610037L13Rik13.11 9.3214.4010.66
0610038B21Rik 0.33 0.12 0.36 0.40
0610039H22Rik 0.99 0.49 0.45 0.56
0610039K10Rik 0.66 0.65 1.72 0.70
0610040B10Rik 1.13 0.42 0.97 0.58
0610040F04Rik 2.63 1.24 2.57 2.07
0610040J01Rik 0.41 0.46 0.26 0.46
0610043K17Rik 0.13 0.44 0.14 0.00
1010001N08Rik 0.36 0.00 0.00 0.12

Make volcano plot of results

Note this is very heavy so it might get laggy!


In [ ]:
embed_notebook(volcanoly(eb_de_out,highlight="Gapdh",snp="SNP",annotation1 = "EFFECTSIZE",annotation2="P",
                         title="EBSeq DE Analysis of \n WT vs Treatment",
                         ylab="log10 of 1-Posterior Probability of DE", xlab="Log2 fold change"))