Barcode_split API Usage

Running Jupyter notebook

To run pycoQC interactively in Jupyter, you need to install Jupyter manually. If pycoQC is installed in a virtual environment, install Jupyter in that same environment.

pip3 install notebook

Launch the notebook from a shell terminal:

jupyter notebook

If it does not start automatically, open the following URL in your favorite web browser: http://localhost:8888/tree

From the Jupyter homepage, you can navigate to the directory you want to work in and create a new Python3 notebook.

Imports


In [3]:
# Run cell with Ctrl + Enter

# Import main pycoQC module
from pycoQC.Barcode_split import Barcode_split

# Import helper functions from pycoQC
from pycoQC.common import jhelp, head, ls

Running Barcode_split


In [2]:
jhelp(Barcode_split)


Barcode_split (summary_file, barcode_file, output_dir, output_unclassified, min_barcode_percent, verbose, quiet)

Parse an Albacore sequencing_summary.txt file and split it per barcode. By default, data for low frequency barcodes and unclassified reads are not written to the output directory.


  • summary_file (required) [str]

Path to a sequencing_summary generated by Albacore 1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+ (guppy_basecaller). One can also pass multiple space separated file paths or a UNIX style regex matching multiple files

  • barcode_file (default: "") [str]

Path to the barcode_file generated by Guppy 2.1.3+ (guppy_barcoder) or Deepbinner 0.2.0+. This is not a required file. One can also pass multiple space separated file paths or a UNIX style regex matching multiple files

  • output_dir (default: "") [str]

Folder in which to write the split barcode data

  • output_unclassified (default: False) [bool]

If True, unclassified reads are also written to a file. By default they are skipped

  • min_barcode_percent (default: 0.1) [float]

Minimum percentage of total reads that a barcode needs for its reads to be written to a file.

  • verbose (default: False) [bool]

Increase verbosity

  • quiet (default: False) [bool]

Reduce verbosity
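
The summary_file and barcode_file descriptions mention passing several space separated paths or a pattern matching multiple files. As a rough illustration of what such an argument can resolve to, the sketch below expands a space separated string of paths and shell-style wildcards with Python's standard glob module. The helper name expand_file_arg is hypothetical and this is not pycoQC's own implementation, which may resolve its arguments differently.

```python
import glob

def expand_file_arg(arg):
    """Expand a space separated string of paths and shell-style wildcards.

    Hypothetical helper for illustration only, not part of pycoQC.
    Paths that themselves contain spaces are not handled by this sketch.
    """
    paths = []
    for pattern in arg.split():
        # glob.glob returns an empty list when nothing matches the pattern
        for path in sorted(glob.glob(pattern)):
            # Keep each file once, even if several patterns match it
            if path not in paths:
                paths.append(path)
    return paths
```

An empty result from such a helper would be a hint that the pattern or the working directory is wrong before any parsing starts.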

Basic usage


In [4]:
Barcode_split (
    summary_file="./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz",
    output_unclassified=True,
    output_dir="./results/")


Import data from sequencing summary file(s) and cleanup
	Read files and import in a dataframe
Check input data files
Parse data files
Merge data
	Cleanup missing barcodes values
	Cleaning up low frequency barcodes
Split data per barcode
	Processing data for Barcode barcode02
	Processing data for Barcode barcode07
	Processing data for Barcode barcode08
	Processing data for Barcode barcode09
	Processing data for Barcode barcode10
	Processing data for Barcode barcode11
	Processing data for Barcode barcode12
	Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
barcode02          2  False
barcode07          1  False
barcode08         30  False
barcode09       9945   True
barcode10      12644   True
barcode11      13594   True
barcode12       9813   True
unclassified    3971   True
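
The Write column above is consistent with the default min_barcode_percent of 0.1: the counts sum to 50000 reads, so 0.1 percent corresponds to 50 reads, and barcode08 with 30 reads falls below that. The sketch below reproduces this arithmetic under two assumptions that the output does not confirm: that the percentage is taken over all reads, including unclassified ones, and that a barcode exactly at the threshold is kept. The function name write_flags is hypothetical and is not part of pycoQC.

```python
def write_flags(counts, min_barcode_percent=0.1, output_unclassified=False):
    """Return {barcode: bool} saying whether each barcode would be written.

    Illustrative sketch only, not pycoQC's implementation.
    """
    total = sum(counts.values())
    # Minimum number of reads a barcode needs, as a percentage of all reads
    min_reads = total * min_barcode_percent / 100
    flags = {}
    for barcode, n_reads in counts.items():
        if barcode == "unclassified":
            # Assumption: unclassified reads depend only on the flag
            flags[barcode] = output_unclassified
        else:
            flags[barcode] = n_reads >= min_reads
    return flags

# Counts copied from the "Barcode Counts" table above
counts = {
    "barcode02": 2, "barcode07": 1, "barcode08": 30,
    "barcode09": 9945, "barcode10": 12644, "barcode11": 13594,
    "barcode12": 9813, "unclassified": 3971,
}
flags = write_flags(counts, output_unclassified=True)
```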

With externally provided barcodes


In [5]:
Barcode_split (
    summary_file="./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz",
    barcode_file="./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz",
    output_dir="./results/")


Import data from sequencing summary file(s) and cleanup
	Read files and import in a dataframe
Check input data files
Parse data files
Merge data
	Cleanup missing barcodes values
	Cleaning up low frequency barcodes
Split data per barcode
	Processing data for Barcode 1
	Processing data for Barcode 2
	Processing data for Barcode 3
	Processing data for Barcode 4
	Processing data for Barcode 5
	Processing data for Barcode 6
	Processing data for Barcode 7
	Processing data for Barcode 8
	Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
1                534   True
2                206   True
3                562   True
4                579   True
5                590   True
6                655   True
7                271   True
8                378   True
unclassified     224  False
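
The "Merge data" step in the log combines the sequencing summary with the external barcode file. A minimal sketch of that idea follows, assuming both tables share a read_id column, that the barcode file stores its calls in a barcode_arrangement column, and that reads absent from the barcode file are labelled unclassified. The sample rows and the attach_barcodes helper are illustrative and are not taken from pycoQC.

```python
def attach_barcodes(summary_rows, barcode_rows):
    """Add a "barcode" field to each summary row, matched on read_id.

    Illustrative sketch only; rows are plain dictionaries.
    """
    # Index barcode calls by read identifier for fast lookup
    barcode_by_read = {
        row["read_id"]: row["barcode_arrangement"] for row in barcode_rows
    }
    merged = []
    for row in summary_rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # Assumption: reads without a barcode call count as unclassified
        new_row["barcode"] = barcode_by_read.get(row["read_id"], "unclassified")
        merged.append(new_row)
    return merged

# Made-up rows for demonstration
summary_rows = [
    {"read_id": "read-a", "sequence_length_template": 1200},
    {"read_id": "read-b", "sequence_length_template": 850},
]
barcode_rows = [{"read_id": "read-a", "barcode_arrangement": "3"}]
merged = attach_barcodes(summary_rows, barcode_rows)
```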

If no barcode information is found, an error is raised


In [6]:
Barcode_split (
    summary_file="./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz",
   output_dir="./results/")


Import data from sequencing summary file(s) and cleanup
	Read files and import in a dataframe
Check input data files
Parse data files
Merge data
---------------------------------------------------------------------------
pycoQCError                               Traceback (most recent call last)
<ipython-input-6-2f136ce0e4dc> in <module>
      1 Barcode_split (
      2     summary_file="./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz",
----> 3    output_dir="./results/")

~/Programming/Packages/pycoQC/pycoQC/Barcode_split.py in Barcode_split(summary_file, barcode_file, output_dir, output_unclassified, min_barcode_percent, verbose, quiet)
     75     df = df.rename(columns={"barcode_arrangement":"barcode"})
     76     if not "barcode" in df:
---> 77         raise pycoQCError ("No barcode information found in provided file(s)")
     78 
     79     # Quick data cleanup

pycoQCError: No barcode information found in provided file(s)
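
The traceback shows that the error is raised when no barcode column is found after a barcode_arrangement column, if any, has been renamed to barcode. A quick look at a file header can flag this before calling Barcode_split. The sketch below assumes a tab-separated file with a single header line, optionally gzip-compressed. The helper has_barcode_column is hypothetical and is not part of pycoQC.

```python
import gzip

def has_barcode_column(path, sep="\t"):
    """Return True if the file header contains a barcode column.

    Illustrative pre-check only, not pycoQC's own validation.
    """
    # Open gzip-compressed and plain-text files the same way
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().rstrip("\r\n").split(sep)
    return "barcode_arrangement" in header or "barcode" in header
```

If this returns False for every summary file, a separate barcode_file is needed, as in the previous example.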