In [1]:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
import porekit
%matplotlib inline
In [2]:
df = pd.read_hdf("../examples/data/ru9_meta.h5", "meta")
In [3]:
porekit.plots.read_length_distribution(df);
This is a histogram showing the distribution of read length. In this case it's the max of template and complement length. This plots ignores a small part of the longest reads in order to be more readable.
In [4]:
porekit.plots.reads_vs_time(df);
In [5]:
porekit.plots.yield_curves(df);
This plot shows the sequence yields in Megabases over time.
In [6]:
porekit.plots.template_vs_complement(df);
In the standard 2D library preparation, a "hairpin" is attached to one end of double stranded DNA. Then, when the strand goes through the nanopore, first one strand translocates, then the hairpin and finally the complement. Because template and complement both carry the same information, they can be used to improve accuracy of the basecalling.
However, not all molecules have a hairpin attached, not all have a complement strand, and in most cases, the template and complement length does not match completely. This can be seen in the plot above, where most data points are on a diagonal with template and complement length being almost the same. There are more points under the diagonal than above it, and there is a solid line at the bottom, showing reads with no complement.
In [7]:
porekit.plots.occupancy(df);
This shows the occupancy of pores over time. In General, pores break over time, which is a major factor in limiting the total yield over the lifetime of a flowcell.
The squiggle_dots function takes a Fast5 File and outputs a plot of all event means as dots on a graph. This way of plotting event data does a better job at characterizing a long read than the traditional "squiggle" plot. In this example there is a marked difference between the traces of the template and the complement, as segmented by the detected hairpin section.
In [8]:
fast5 = porekit.Fast5File(df.iloc[1002].absolute_filename)
porekit.plots.squiggle_dots(fast5)
fast5.close()
The plots inside porekit.plots are designed to work best inside the Jupyter notebook when exploring nanopore data interactively, and showing nanopore data as published notebooks or presentations. This is why they use colors and a wide aspect ratio.
But the plots can be customized somewhat using standard matplotlib. Every plot function returns a figure and an axis object:
In [9]:
f, ax = porekit.plots.read_length_distribution(df)
f.suptitle("Hello World");
f.set_figwidth(6)
Sometimes you want to subdivide a figure into multiple plots. You can do it like this:
In [10]:
f, axes = plt.subplots(1,2)
f.set_figwidth(14)
ax1, ax2 = axes
porekit.plots.read_length_distribution(df, ax=ax1);
porekit.plots.yield_curves(df, ax=ax2);
If you want to go beyond those relatively simple customizations, you may want to just copy and paste some code from porekit/plots.py and go from there. The plots are relatively simple overall.