Barcode_split CLI Usage

Activate virtual environment


In [1]:
# Using virtualenvwrapper here, but Conda works as well
workon pycoQC


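The comment above mentions Conda as an alternative. Here is a minimal sketch of a helper that tries virtualenvwrapper's `workon` first and falls back to `conda activate`. It assumes the environment is called `pycoQC` under both tools, and the function name `activate_pycoqc` is hypothetical:

```shell
# Hypothetical helper: activate the pycoQC environment with
# virtualenvwrapper if available, otherwise fall back to Conda.
# The default environment name "pycoQC" is an assumption.
activate_pycoqc() {
    local env_name="${1:-pycoQC}"
    if command -v workon >/dev/null 2>&1; then
        workon "$env_name"
    elif command -v conda >/dev/null 2>&1; then
        # In a non-interactive shell, `conda activate` only works once
        # conda's shell hook has been loaded (e.g. after `conda init`).
        conda activate "$env_name"
    else
        echo "Neither virtualenvwrapper nor conda was found" >&2
        return 1
    fi
}
```

After activation, `command -v Barcode_split` should print the path of the entry point if pycoQC is installed in that environment.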

Getting help


In [2]:
Barcode_split -h


usage: Barcode_split [-h] [--version] --summary_file
                     [SUMMARY_FILE [SUMMARY_FILE ...]]
                     [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
                     [--output_dir OUTPUT_DIR] [--output_unclassified]
                     [--min_barcode_percent MIN_BARCODE_PERCENT] [-v | -q]

Barcode_split is a simple tool to split sequencing summary report in per
barcodes

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --summary_file [SUMMARY_FILE [SUMMARY_FILE ...]], -f [SUMMARY_FILE [SUMMARY_FILE ...]]
                        Path to a sequencing_summary generated by Albacore
                        1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+
                        (guppy_basecaller). One can also pass multiple space
                        separated file paths or a UNIX style regex matching
                        multiple files
  --barcode_file [BARCODE_FILE [BARCODE_FILE ...]], -b [BARCODE_FILE [BARCODE_FILE ...]]
                        Path to the barcode_file generated by Guppy 2.1.3+
                        (guppy_barcoder) or Deepbinner 0.2.0+. One can also
                        pass multiple space separated file paths or a UNIX
                        style regex matching multiple files
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Folder where to output split barcode data (default:
                        current dir
  --output_unclassified, -u
                        If given, unclassified barcodes are also written in a
                        file. By default they are skiped
  --min_barcode_percent MIN_BARCODE_PERCENT, -p MIN_BARCODE_PERCENT
                        Minimal percent of total reads to retain barcode
                        label. If below, the barcode value is set as
                        `unclassified` (default: 0.1)
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity
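Together, `--min_barcode_percent` and `--output_unclassified` decide which barcodes get their own output file. The sketch below applies that rule to a list of `label count` pairs. It assumes the percentage is computed against the total number of reads, unclassified ones included, and that a barcode is kept when it reaches the threshold. The function `barcode_write_table` is hypothetical and illustrates the rule only. It is not pycoQC's implementation:

```shell
# Sketch of the --min_barcode_percent / --output_unclassified rule.
# Reads "label count" pairs on stdin and prints, tab-separated:
# label, count, percent of all reads, and whether the label would be written.
# Assumption: percent = 100 * count / total, kept when percent >= threshold.
barcode_write_table() {
    local min_pct="${1:-0.1}"          # mirrors the CLI default of 0.1
    local keep_unclassified="${2:-0}"  # 1 mimics passing -u
    awk -v min="$min_pct" -v keep_u="$keep_unclassified" '
        { label[NR] = $1; count[NR] = $2; total += $2 }
        END {
            if (total == 0) exit
            for (i = 1; i <= NR; i++) {
                pct = 100 * count[i] / total
                if (label[i] == "unclassified")
                    keep = (keep_u == 1)   # only written when -u is given
                else
                    keep = (pct >= min)    # low-frequency barcodes are dropped
                printf "%s\t%d\t%.3f\t%s\n", label[i], count[i], pct, (keep ? "True" : "False")
            }
        }'
}
```

Fed with the counts from the basic usage example below (50,000 reads in total, so the default 0.1 % threshold corresponds to 50 reads), the function marks only barcode09 to barcode12 as `True`. This agrees with the `Write` column that Barcode_split reports.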

Usage examples

Basic usage


In [3]:
Barcode_split \
    -f './data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz' \
    -o "./results/"


Import data from sequencing summary file(s) and cleanup
	Read files and import in a dataframe
Check input data files
Parse data files
Merge data
	Cleanup missing barcodes values
	Cleaning up low frequency barcodes
Split data per barcode
	Processing data for Barcode barcode02
	Processing data for Barcode barcode07
	Processing data for Barcode barcode08
	Processing data for Barcode barcode09
	Processing data for Barcode barcode10
	Processing data for Barcode barcode11
	Processing data for Barcode barcode12
	Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
barcode02          2  False
barcode07          1  False
barcode08         30  False
barcode09       9945   True
barcode10      12644   True
barcode11      13594   True
barcode12       9813   True
unclassified    3971  False
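In the basic example, the barcode assignments are already stored in the sequencing summary file. Conceptually, splitting that file means grouping its rows by the barcode column. Here is a minimal awk sketch, assuming a tab-separated summary whose barcode column is named `barcode_arrangement` (the column Guppy writes when barcoding is enabled). The function `split_by_barcode` and the output file names are hypothetical. Unlike Barcode_split, it applies no frequency threshold:

```shell
# Illustrative only: split a (possibly gzipped) tab-separated sequencing
# summary into one file per value of its barcode column, repeating the
# header in each output. Column name and output file names are assumptions.
split_by_barcode() {
    local summary="$1"
    local out_dir="${2:-.}"
    local column="${3:-barcode_arrangement}"
    mkdir -p "$out_dir" || return 1
    # gzip -cdf decompresses gzip input and passes plain text through unchanged
    gzip -cdf "$summary" | awk -F'\t' -v col="$column" -v dir="$out_dir" '
        NR == 1 {
            # Locate the barcode column in the header line
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column " col " not found" > "/dev/stderr"; exit 1 }
            header = $0
            next
        }
        {
            bc = ($idx == "" ? "unclassified" : $idx)
            out = dir "/" bc "_sequencing_summary.txt"
            # First write to a file truncates it; later prints append
            if (!(out in seen)) { print header > out; seen[out] = 1 }
            print > out
        }'
}
```

For example, `split_by_barcode './data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz' ./split/` would write one file per barcode into `./split/`. awk keeps one file handle open per barcode, which is fine for the handful of barcodes in a typical kit.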

With externally provided barcodes


In [4]:
Barcode_split \
    -f "./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz" \
    -b "./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz" \
    -o "./results/" \
    -v


General info
	package_name: pycoQC
	package_version: 2.5.0.17
	timestamp: 2020-01-09 16:57:57.525774

Runtime options
	quiet: False
	verbose: True
	min_barcode_percent: 0.1
	output_unclassified: False
	output_dir: ./results/
	barcode_file: ['./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz']
	summary_file: ['./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz']

Import data from sequencing summary file(s) and cleanup
	Read files and import in a dataframe
Check input data files
		Sequencing summary files found: ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
		Barcode files found: ./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz
Parse data files
	Parse summary files
		3,999 reads found in initial file
	Parse barcode files
		Found valid Deepbinner barcode file
		3,775 reads with barcodes assigned
Merge data
	Cleanup missing barcodes values
	Cleaning up low frequency barcodes
Split data per barcode
	Processing data for Barcode 1
	Processing data for Barcode 2
	Processing data for Barcode 3
	Processing data for Barcode 4
	Processing data for Barcode 5
	Processing data for Barcode 6
	Processing data for Barcode 7
	Processing data for Barcode 8
	Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
1                534   True
2                206   True
3                562   True
4                579   True
5                590   True
6                655   True
7                271   True
8                378   True
unclassified     224  False
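When barcodes come from a separate file, the summary and barcode tables must be joined on the read ID. Reads without a barcode become `unclassified`. In this run, 3,775 of the 3,999 reads received a barcode, and the remaining 224 match the `unclassified` count above.

Here is a sketch of that left join. It assumes tab-separated inputs with a header line, a `read_id` column in the summary, and a barcode file whose first two columns are the read ID and the barcode. It also assumes that `none` marks unassigned reads. The function `attach_barcodes` is hypothetical and not part of pycoQC:

```shell
# Illustrative left join: append a "barcode" column to every read of a
# sequencing summary using a separate barcode file (read ID, barcode).
# Assumptions: tab-separated files with headers, a "read_id" column in the
# summary, read ID then barcode in the barcode file, "none" = unassigned.
attach_barcodes() {
    local summary="$1" barcodes="$2" tmp status
    tmp=$(mktemp) || return 1
    # Decompress the barcode file (gzip -cdf also passes plain text through)
    gzip -cdf "$barcodes" > "$tmp"
    gzip -cdf "$summary" | awk -F'\t' -v OFS='\t' '
        # First input (barcode file): remember read ID -> barcode, skip header
        FNR == NR { if (FNR > 1) bc[$1] = $2; next }
        # Second input (summary, on stdin): find the read_id column
        FNR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "read_id") rid = i
            if (!rid) { print "read_id column not found" > "/dev/stderr"; exit 1 }
            print $0, "barcode"
            next
        }
        {
            b = (($rid) in bc) ? bc[$rid] : ""
            # Missing or unassigned reads are labelled unclassified
            if (b == "" || b == "none") b = "unclassified"
            print $0, b
        }' "$tmp" -
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

The output keeps every summary row, so counting the values of the last column gives per-barcode totals before any `--min_barcode_percent` filtering.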