This IPython notebook file demonstrates some of the functionality available in poretools and presents an example run report for a recent R7 chemistry nanopore run.
Some of the examples call out to R for plotting, so to make this work in the notebook we need to load the rpy2.ipython extension.
In [53]:
%load_ext rpy2.ipython
poretools can run on an individual FAST5 file, a directory containing FAST5 files, or a tar archive of FAST5 files. Here we set up a directory variable for use in the rest of the tutorial; the shell commands below refer to it as $directory. You could change it and run the same commands on your own data.
In [18]:
directory='/mnt/borage/nick/nanopore/data/Flowcell6/downloads'
In [65]:
!find $directory -maxdepth 1 -name "*.fast5" | wc -l
There are 60,196 FAST5 files in the directory.
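If you prefer to stay in Python, here is a minimal sketch of the same count. count_fast5 is a hypothetical helper, not part of poretools. Unlike the find command, it counts only regular files, so a directory whose name happens to end in .fast5 would not be included.

```python
import os


def count_fast5(directory):
    """Count regular files ending in .fast5 directly inside directory (no recursion),
    roughly like: find $directory -maxdepth 1 -name "*.fast5" | wc -l"""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries
                   if entry.name.endswith(".fast5") and entry.is_file())
```

For a directory of ordinary FAST5 files, count_fast5(directory) should agree with the find pipeline above.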
poretools has a number of different subcommands. Running poretools with no parameters gives us a brief list of them (and complies with Torsten's first rule: print something if no parameters are supplied).
In [21]:
!poretools
We can get more information if we run poretools with the -h (help) option.
In [22]:
!poretools -h
Let's start with a simple one, the stats command, which gives us some basic statistics about our reads. The -q option stops poretools from outputting any warning messages.
In [68]:
!poretools stats -q $directory
Why do we have 104,969 reads from 60,196 FAST5 files? Because forward, reverse and two-direction (2D) reads are all counted separately, so a single FAST5 file can contribute up to three reads.
In [69]:
!poretools stats -q --type fwd $directory
We have 53,914 forward reads in our total dataset.
In [71]:
!poretools stats -q --type rev $directory
In [70]:
!poretools stats -q --type 2D $directory
We have 21,433 two-direction reads, which is about 40% of the reads which have been base-called and about 72% of the reads that have a detectable complement strand.
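The percentages above can be reproduced with a little arithmetic. This sketch assumes, as the stats output suggests, that the total is exactly the sum of the forward, reverse and 2D counts. The reverse count is therefore derived here rather than copied from the rev output, which is not quoted above. The forward count is taken as the number of base-called reads.

```python
# Read counts quoted from the poretools stats output above
total_reads = 104_969   # all reads (no --type filter)
fwd_reads = 53_914      # --type fwd
twod_reads = 21_433     # --type 2D

# Assumption: total = forward + reverse + 2D, so the reverse count is the remainder
rev_reads = total_reads - fwd_reads - twod_reads

# 2D reads as a share of base-called (forward) reads and of reads with a complement strand
pct_of_basecalled = 100 * twod_reads / fwd_reads
pct_of_complement = 100 * twod_reads / rev_reads

print(f"reverse reads: {rev_reads:,}")
print(f"2D / base-called: {pct_of_basecalled:.1f}%")
print(f"2D / with complement: {pct_of_complement:.1f}%")
```

With these numbers the derived reverse count is 29,622, which gives roughly 40% and 72%, matching the figures in the text.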
In [73]:
!poretools readstats -q $directory > readstats.txt
In [ ]:
!wc -l readstats.txt
readstats gives you one line for every FAST5 file in your dataset. The columns are:
In [75]:
!head -10 readstats.txt
One useful plot you can easily make from the readstats output is the number of events in each forward read against the number in the corresponding reverse read. Ideally every read would have similar counts on both strands. This would indicate that the hairpin is correctly attached and that the strand translocation rate is controlled by the enzyme.
In [79]:
%R stats=read.table("readstats.txt", sep="\t")
%R stats=subset(stats, V4 < 20000 & V5 < 20000)
In [77]:
%R smoothScatter(stats$V4,stats$V5)
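Here is a rough numerical companion to the plot, sketched in Python. load_event_counts reproduces the read.table and subset calls by keeping columns V4 and V5 when both are below 20,000. similar_fraction reports the share of reads whose two event counts lie within a chosen tolerance of each other. Both helpers are hypothetical, and the 20% tolerance is an arbitrary assumption. Treating V4 and V5 as the forward and reverse event counts follows the description above rather than a documented column layout.

```python
import csv


def load_event_counts(path="readstats.txt", max_events=20000):
    """Return (V4, V5) pairs from readstats output, filtered like the R subset() call."""
    pairs = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            try:
                # R's V4 and V5 are the 4th and 5th columns (0-based indices 3 and 4)
                v4, v5 = float(row[3]), float(row[4])
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric fields such as NA
            if v4 < max_events and v5 < max_events:
                pairs.append((v4, v5))
    return pairs


def similar_fraction(pairs, tolerance=0.2):
    """Hypothetical summary: fraction of reads whose two counts differ by at most
    `tolerance` times the larger count (reads with zero events on both are not similar)."""
    if not pairs:
        return 0.0
    similar = sum(1 for a, b in pairs
                  if max(a, b) > 0 and abs(a - b) <= tolerance * max(a, b))
    return similar / len(pairs)
```

A value of similar_fraction(load_event_counts()) close to 1 would correspond to most points lying near the diagonal of the smoothScatter plot.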
In [ ]:
!poretools winner -q --type 2D $directory > winner.fasta
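This uses only the FASTA header of the read in winner.fasta. sed keeps the last field of the first header line (the path to the read's FAST5 file), and xargs passes it to poretools squiggle to generate a squiggle plot: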
In [126]:
!head -1 winner.fasta | sed 's/>.* //' | xargs poretools squiggle --saveas png --num-facets 12
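The same header handling can be written in Python. This sketch mimics the head and sed steps: fast5_path_from_header keeps the text after the last space of the first header line, which the pipeline above relies on being the read's FAST5 path. Both function names are hypothetical. Like the sed expression, this approach breaks if the path itself contains spaces.

```python
def first_header(fasta_path="winner.fasta"):
    """Equivalent of `head -1 winner.fasta`: return the first line of the file."""
    with open(fasta_path) as handle:
        return handle.readline()


def fast5_path_from_header(header):
    """Mimic sed 's/>.* //' on a FASTA header: keep only the text after the last space."""
    line = header.rstrip("\n")
    if line.startswith(">") and " " in line:
        return line.rsplit(" ", 1)[1]
    # sed leaves the line unchanged when the pattern does not match
    return line
```

Usage: fast5_path_from_header(first_header("winner.fasta")) returns the path that xargs hands to poretools squiggle.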
In [127]:
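from IPython.display import Image
# last field of the first FASTA header: the FAST5 path passed to poretools squiggle above
fast5=!head -1 winner.fasta | sed 's/>.* //'
# the squiggle plot is expected at that path with .png appended
Image(fast5[0] + '.png')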
In [128]:
!poretools yield_plot -q --theme-bw --saveas yield_plot.png --plot-type reads $directory
In [ ]:
!poretools yield_plot -q --theme-bw --saveas yield_plot.pdf --plot-type reads $directory
In [86]:
Image("yield_plot.png")
In [87]:
!poretools hist -q --theme-bw --min-length 1000 --max-length 40000 --saveas hist.png $directory
In [129]:
!poretools hist -q --theme-bw --min-length 1000 --max-length 40000 --saveas hist.pdf $directory
In [88]:
Image("hist.png")
In [89]:
!poretools nucdist -q $directory
In [90]:
!poretools qualdist -q $directory