Riyue Bao, Ph.D. Center for Research Informatics, The University of Chicago. November 13, 2016
The workshop materials are available on GitHub and are licensed under LGPLv3.
<img src='ipynb_data/assets/RNAseq.workflow.png', title = 'RNAseq workflow', width = 1000, height = 1000>
The test datasets used in this workshop are from Fog et al. 2015. Loss of PRDM11 promotes MYC-driven lymphomagenesis. Blood, 125(8):1272-81 http://www.bloodjournal.org/content/125/8/1272.long?sso-checked=true
For this workshop, the machine you are using has everything pre-compiled and pre-installed. It is ready for analysis.
In the future, if you'd like to use the pipeline on your own machines, download the analysis pipeline from GitHub and follow the instructions to install it.
{bash}
git clone https://github.com/cribioinfo/CRI-Workshop-AMIA-2016-RNAseq.git
Detailed documentation of the pipeline can be found in the GitHub README and wiki.
Click the [New] button at the top of the notebook. In the dropdown menu, click [Terminal].
BigDataScript & Perl
{bash}
##-- commands
pwd
cd ~/CRI-Workshop-AMIA-2016-RNAseq
ls -al
ls -al notebook_ext
ls -al pipeline
{bash}
##-- commands
pwd
cd ~/CRI-Workshop-AMIA-2016-RNAseq/pipeline/test
./Build_RNAseq.DLBC.sh &
##-- START Thu Oct 27 15:57:38 UTC 2016 Running ../Build_RNAseq.pl
##-- START Thu Oct 27 15:57:38 UTC 2016 Running Submit_RNAseq.DLBC.sh
##-- running ... 3 ~ 4 minutes
##-- END Thu Oct 27 16:01:25 UTC 2016
/home/ubuntu/data/rnaseq/fullset
/home/ubuntu/dev/rnaseq/subset
Unless pointed out otherwise, all commands shown apply to the test datasets only. Do not use them directly on other datasets. Refer to the pipeline documentation for instructions on how to set up the pipeline for your own projects.
``` fastqc --extract -o $out.dir -t 2 --nogroup $r1.fastq.gz ```
<img src='ipynb_data/assets/Figure1.png', title = 'Figure1', width = 600, height = 600>
``` java -Xmx4G -jar trimmomatic-0.36.jar SE -threads 4 -phred33 $r1.fastq.gz $r1.trim.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:5 TRAILING:5 MINLEN:36 SLIDINGWINDOW:4:15 ```
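The `SLIDINGWINDOW:4:15` step in the command above scans each read from the 5' end and cuts it once the average quality within a 4-base window drops below 15. The Python sketch below is a simplified illustration of that idea, not Trimmomatic's actual implementation. The function names `phred33_scores` and `sliding_window_keep` are hypothetical, the example read is made up, and the exact cut position within a failing window may differ from the tool.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_keep(scores, window=4, min_avg=15):
    """Return how many leading bases survive a simplified sliding-window trim."""
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_avg:
            # Cut at the start of the first window whose mean is too low.
            return start
    # No window failed (or the read is shorter than one window): keep it all.
    return len(scores)


# Made-up read: ten high-quality bases ("I" = Q40) followed by six poor ones ("#" = Q2).
quality = "I" * 10 + "#" * 6
keep = sliding_window_keep(phred33_scores(quality))
print(keep, "of", len(quality), "bases kept")
```

In this toy read the low-quality tail is removed as soon as it pulls a window's mean below the threshold, which is why a few good bases next to the tail are lost as well.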
<img src='ipynb_data/assets/Figure2.png', title = 'Figure2', width = 600, height = 600>
``` STAR --runMode alignReads --genomeLoad NoSharedMemory --outFileNamePrefix $out.prefix --readFilesCommand zcat --genomeDir $refgenome.dir --readFilesIn $r1.trim.fastq.gz --runThreadN 2 --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSAMtype BAM SortedByCoordinate ```
<img src='ipynb_data/assets/Figure3.png', title = 'Figure3', width = 750, height = 1000>
Goal
Method
bedtools, version 2.26.0
A scaling factor can be supplied to normalize coverage across samples by library size.
```
java -Xmx4G -jar picard.jar CollectRnaSeqMetrics I=$sample.bam O=$out.rnaseq_metrics REF_FLAT=$refgeneanno.refflat.txt RIBOSOMAL_INTERVALS=$refgeneanno.rRNA.interval_list STRAND=$strandness CHART=$out.rnaseq_metrics.pdf METRIC_ACCUMULATION_LEVEL=SAMPLE VALIDATION_STRINGENCY=LENIENT
java -Xmx4G -jar picard.jar CollectMultipleMetrics I=$sample.bam O=$out R=$refgenome.fa PROGRAM=CollectAlignmentSummaryMetrics PROGRAM=CollectInsertSizeMetrics PROGRAM=QualityScoreDistribution PROGRAM=MeanQualityByCycle VALIDATION_STRINGENCY=LENIENT
```
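Picard writes its metrics as tab-delimited text: a line starting with `## METRICS CLASS`, then a header row, then one or more value rows. A minimal Python sketch for pulling a few columns from the `CollectRnaSeqMetrics` output, for example to compare ribosomal, intronic, and intergenic fractions across samples, might look like the following. The function name `parse_picard_metrics` is hypothetical, the numbers in the example text are invented for illustration, and the sketch reads only the first value row.

```python
def parse_picard_metrics(text):
    """Return the first value row of a Picard metrics file as a dict of strings."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[index + 1].split("\t")
            values = lines[index + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")


# Invented example content; a real file has many more columns.
example = "\n".join([
    "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics",
    "\t".join(["PCT_RIBOSOMAL_BASES", "PCT_INTRONIC_BASES", "PCT_INTERGENIC_BASES"]),
    "\t".join(["0.02", "0.10", "0.05"]),
])

metrics = parse_picard_metrics(example)
for column in ("PCT_RIBOSOMAL_BASES", "PCT_INTRONIC_BASES", "PCT_INTERGENIC_BASES"):
    print(column, float(metrics[column]))
```

For a real run, the text would come from reading the file written to `O=$out.rnaseq_metrics`.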
``` infer_experiment.py -i $sample.bam -r $refgeneanno.bed12 -s 200000 > $sample.infer_experiment ```
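The report written by `infer_experiment.py` lists the fraction of reads explained by each strand configuration. As a rough sketch, the Python below reads those fractions and suggests a value for the featureCounts `-s` option (0 unstranded, 1 stranded, 2 reversely stranded). The function names and the 0.8 cutoff are assumptions chosen for illustration, and the example report uses made-up numbers.

```python
import re


def parse_infer_experiment(text):
    """Map each strand pattern in the report to its fraction of reads."""
    fractions = {}
    for line in text.splitlines():
        match = re.search(r'explained by "([^"]+)":\s*([\d.]+)', line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
    return fractions


def guess_featurecounts_strand(fractions, cutoff=0.8):
    """Suggest a featureCounts -s code; the cutoff is an arbitrary assumption."""
    # Single-end and paired-end reports use different pattern labels.
    sense = fractions.get("++,--", 0.0) + fractions.get("1++,1--,2+-,2-+", 0.0)
    antisense = fractions.get("+-,-+", 0.0) + fractions.get("1+-,1-+,2++,2--", 0.0)
    if sense >= cutoff:
        return 1
    if antisense >= cutoff:
        return 2
    return 0


# Made-up single-end report with reads split evenly between both patterns.
report = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "++,--": 0.4903
Fraction of reads explained by "+-,-+": 0.4925
"""

fractions = parse_infer_experiment(report)
print(fractions, "-> featureCounts -s", guess_featurecounts_strand(fractions))
```

An even split like the one in this example is what an unstranded library is expected to produce.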
```
bedtools genomecov -bga -split -ibam $sample.bam -scale $scale.factor > $sample.cov.bedgraph
bedGraphToBigWig $sample.cov.bedgraph $refgenome.chrom.size $sample.cov.bigwig
```
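The value passed to `-scale` multiplies every coverage value in the output. One common convention, assumed here and not necessarily the one this pipeline uses, is to scale each sample to one million mapped reads so that tracks from libraries of different sizes are comparable. The function name `per_million_scale` and the read counts below are hypothetical.

```python
def per_million_scale(mapped_reads):
    """Scale factor that normalizes coverage to one million mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


# Hypothetical mapped-read counts per sample.
mapped = {"S1": 2_000_000, "S2": 4_000_000}
scale = {sample: per_million_scale(count) for sample, count in mapped.items()}

for sample, factor in scale.items():
    print(sample, factor)
```

With these invented counts, the deeper library (S2) receives the smaller factor, so equal raw coverage in S2 contributes less to the normalized track than in S1.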
In [1]:
from IPython.display import IFrame
IFrame('ipynb_data/assets/multiqc_report.html', width=1000, height=700)
<img src='ipynb_data/assets/Figure4.png', title = 'Figure4', width = 600, height = 600>
Which sample (S1-4) has the most severe contamination from genomic DNA?
<img src='ipynb_data/assets/Figure5.png', title = 'Figure5', width = 600, height = 600>
Which sample (S1-4) shows the least effective rRNA depletion during library preparation? (Refer to the previous figure.)
<img src='ipynb_data/assets/Figure6.png', title = 'Figure6', width = 600, height = 600>
``` featureCounts -s 0 -F GTF -t exon -g gene_id -Q 255 -J --primary -a $refgeneanno.gtf -T 2 -G $refgenome.fa -o $sample.star.featurecounts.raw_counts.txt $sample.star.merged.bam ```
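The featureCounts output file is tab-delimited: a comment line starting with `#`, a header row (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, then one column per BAM file), and one row per gene. A small Python sketch for loading the per-gene counts of a single-sample run could look like this. The function name `parse_featurecounts` is hypothetical, and the gene identifiers and counts in the example are invented.

```python
import csv


def parse_featurecounts(text):
    """Return {gene_id: count} from single-sample featureCounts output text."""
    rows = (line for line in text.splitlines() if line and not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    next(reader)  # skip the header row
    # Column 7 (index 6) holds the counts for the first BAM file.
    return {fields[0]: int(fields[6]) for fields in reader}


# Invented example with two genes.
example = "\n".join([
    "# Program:featureCounts",
    "\t".join(["Geneid", "Chr", "Start", "End", "Strand", "Length", "sample.bam"]),
    "\t".join(["geneA", "chr1", "100", "500", "+", "401", "12"]),
    "\t".join(["geneB", "chr2", "900", "1500", "-", "601", "0"]),
])

counts = parse_featurecounts(example)
print(counts)
```

For a real run, the text would come from reading the file given to `-o`.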
<img src='ipynb_data/assets/Figure7.png', title = 'Figure7', width = 800, height = 580>
In [2]:
from IPython.display import IFrame
IFrame('ipynb_data/assets/Run_RNAseq.bds.20161021_204005_581.report.html', width=1000, height=700)
<img src='ipynb_data/assets/Figure8.png', title = 'Figure8', width = 700, height = 600>
<img src='ipynb_data/assets/Figure9-1.png', title = 'Figure9-1', width = 800, height = 600> <img src='ipynb_data/assets/Figure9-2.png', title = 'Figure9-2', width = 800, height = 600> <img src='ipynb_data/assets/Figure9-3.png', title = 'Figure9-3', width = 800, height = 600>