This notebook creates the BED file provided in
First, we download 2 FastQ from EBI FTP, its reference genome and its genbank annotation. Then, we map the read onto the reference using BWA and obtain a BAM file. The BAM file is converted to a BED, which is going to be one input file for our analysis. Finally, we use the coverage tool from Sequana project (i) with the standalone (sequana_coverage).
At the bottom of the page, we also added an example of an application where we show that some SNPs could be discarded or assigned a negative score when they appear next or within a ROIs.
Versions used:
In [1]:
%pylab inline
matplotlib.rcParams['figure.figsize'] = [10,7]
Populating the interactive namespace from numpy and matplotlib
We will use bioservices (ENA service) or you can download the file from our github: https://github.com/sequana/resources/tree/master/coverage
In [2]:
from bioservices import ENA
ena = ENA()
data = ena.get_data("FN433596", frmt="fasta")
with open("FN433596.fasta", "wb") as fh:
fh.write(data)
In [3]:
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR036/ERR036019/ERR036019_1.fastq.gz
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR036/ERR036019/ERR036019_2.fastq.gz
--2018-07-23 21:06:06-- ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR036/ERR036019/ERR036019_1.fastq.gz
=> ‘ERR036019_1.fastq.gz’
Resolving ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)... 193.62.192.7
Connecting to ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)|193.62.192.7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /vol1/fastq/ERR036/ERR036019 ... done.
==> SIZE ERR036019_1.fastq.gz ... 450249800
==> PASV ... done. ==> RETR ERR036019_1.fastq.gz ... done.
Length: 450249800 (429M) (unauthoritative)
ERR036019_1.fastq.g 100%[===================>] 429.39M 2.72MB/s in 2m 57s
2018-07-23 21:09:06 (2.43 MB/s) - ‘ERR036019_1.fastq.gz’ saved [450249800]
--2018-07-23 21:09:07-- ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR036/ERR036019/ERR036019_2.fastq.gz
=> ‘ERR036019_2.fastq.gz’
Resolving ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)... 193.62.192.7
Connecting to ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)|193.62.192.7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /vol1/fastq/ERR036/ERR036019 ... done.
==> SIZE ERR036019_2.fastq.gz ... 453230181
==> PASV ... done. ==> RETR ERR036019_2.fastq.gz ... done.
Length: 453230181 (432M) (unauthoritative)
ERR036019_2.fastq.g 100%[===================>] 432.23M 2.17MB/s in 4m 48s
2018-07-23 21:13:56 (1.50 MB/s) - ‘ERR036019_2.fastq.gz’ saved [453230181]
In [4]:
!sequana_mapping --file1 ERR036019_1.fastq.gz --file2 ERR036019_2.fastq.gz --reference FN433596.fasta --thread 4
Theoretical Depth of Coverage : 453.4815540169755
[bwa_index] Pack FASTA... 0.02 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.77 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.29 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index FN433596.fasta
[main] Real time: 1.313 sec; CPU: 1.103 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 533334 sequences (40000050 bp)...
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (12, 263542, 7, 3)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (199, 742, 7139)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 21019)
[M::mem_pestat] mean and std.dev: (2815.92, 3264.22)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 27959)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 270, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.23, 85.24)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 15.956 CPU sec, 3.696 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (12, 261456, 5, 5)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (290, 2689, 5030)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 14510)
[M::mem_pestat] mean and std.dev: (2672.92, 2663.40)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 19250)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 270, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.39, 85.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 16.611 CPU sec, 4.210 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 262913, 11, 4)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (195, 3032, 6413)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 18849)
[M::mem_pestat] mean and std.dev: (3750.18, 3580.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 25067)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 270, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 571)
[M::mem_pestat] mean and std.dev: (283.77, 85.72)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 688)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (4160, 5432, 7108)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13004)
[M::mem_pestat] mean and std.dev: (5365.00, 1649.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 15952)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 533334 reads in 17.220 CPU sec, 4.352 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 262148, 10, 4)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (147, 211, 5199)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15303)
[M::mem_pestat] mean and std.dev: (1948.12, 2919.26)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20355)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 571)
[M::mem_pestat] mean and std.dev: (283.59, 86.05)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 688)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (3887, 6051, 8233)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16925)
[M::mem_pestat] mean and std.dev: (5760.40, 2472.97)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 21271)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 533334 reads in 17.482 CPU sec, 4.430 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (15, 249800, 5, 12)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (97, 194, 2590)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7576)
[M::mem_pestat] mean and std.dev: (1517.79, 2285.91)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10661)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.49, 85.33)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2850, 6661, 7555)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16965)
[M::mem_pestat] mean and std.dev: (5401.25, 2910.62)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 21670)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 533334 reads in 18.418 CPU sec, 4.686 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (18, 234575, 4, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (90, 311, 2582)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7566)
[M::mem_pestat] mean and std.dev: (1066.29, 1671.97)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10058)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.75, 85.67)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.535 CPU sec, 5.186 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (23, 253814, 25, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (118, 153, 1572)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4480)
[M::mem_pestat] mean and std.dev: (337.26, 485.60)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5934)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 335)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 567)
[M::mem_pestat] mean and std.dev: (282.32, 85.25)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 683)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (5250, 7975, 8103)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13809)
[M::mem_pestat] mean and std.dev: (6606.24, 2108.87)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 16662)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 533334 reads in 20.184 CPU sec, 5.090 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (19, 262156, 5, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (119, 243, 6146)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 18200)
[M::mem_pestat] mean and std.dev: (2169.21, 2979.61)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 24227)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.74, 85.91)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 17.447 CPU sec, 4.437 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (11, 257591, 6, 6)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (161, 246, 2625)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7553)
[M::mem_pestat] mean and std.dev: (1501.80, 1910.33)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10017)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.60, 85.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.594 CPU sec, 4.946 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (13, 261049, 7, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (73, 148, 2659)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7831)
[M::mem_pestat] mean and std.dev: (905.00, 1966.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10417)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.63, 85.64)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 18.872 CPU sec, 4.786 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (16, 260110, 6, 9)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (145, 264, 5740)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16930)
[M::mem_pestat] mean and std.dev: (2052.06, 3001.08)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 22525)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.67, 85.59)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 18.589 CPU sec, 4.722 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 260774, 6, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (139, 184, 596)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1510)
[M::mem_pestat] mean and std.dev: (201.29, 142.44)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1967)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 571)
[M::mem_pestat] mean and std.dev: (283.01, 85.95)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 688)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 18.391 CPU sec, 4.647 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (12, 260127, 1, 4)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (89, 148, 244)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 554)
[M::mem_pestat] mean and std.dev: (132.18, 74.34)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 709)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.74, 85.67)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.699 CPU sec, 4.990 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (15, 260651, 6, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (140, 266, 5232)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15416)
[M::mem_pestat] mean and std.dev: (2286.60, 3406.85)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20508)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.52, 85.70)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.693 CPU sec, 5.336 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (15, 253638, 2, 5)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (69, 132, 1019)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2919)
[M::mem_pestat] mean and std.dev: (367.85, 675.48)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 3869)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.81, 85.73)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 22.451 CPU sec, 5.704 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 255691, 2, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (122, 214, 3940)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 11576)
[M::mem_pestat] mean and std.dev: (2069.82, 3214.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 15394)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.92, 85.61)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.844 CPU sec, 5.275 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (14, 260515, 5, 3)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (96, 165, 6936)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 20616)
[M::mem_pestat] mean and std.dev: (2685.43, 3640.89)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 27456)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.80, 85.51)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 18.975 CPU sec, 4.799 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (11, 253986, 0, 5)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (98, 204, 6020)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 17864)
[M::mem_pestat] mean and std.dev: (1914.27, 2606.95)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 23786)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.01, 85.60)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 17.300 CPU sec, 4.846 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (12, 239773, 7, 2)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (67, 166, 325)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 841)
[M::mem_pestat] mean and std.dev: (140.50, 94.67)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1099)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.60, 85.32)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.718 CPU sec, 4.990 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (14, 231524, 6, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (49, 150, 1320)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 3862)
[M::mem_pestat] mean and std.dev: (306.27, 450.31)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5133)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 268, 335)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 567)
[M::mem_pestat] mean and std.dev: (281.80, 85.31)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 683)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.110 CPU sec, 4.840 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (16, 250013, 8, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (149, 218, 3504)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 10214)
[M::mem_pestat] mean and std.dev: (1721.56, 2411.48)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 13569)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.06, 85.63)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.357 CPU sec, 5.139 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (17, 241966, 2, 4)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (107, 354, 7813)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 23225)
[M::mem_pestat] mean and std.dev: (3108.29, 3710.96)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 30931)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.92, 85.62)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 21.647 CPU sec, 5.487 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (13, 235428, 6, 4)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (183, 1511, 7648)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 22578)
[M::mem_pestat] mean and std.dev: (3058.92, 3333.41)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 30043)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.94, 85.62)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.347 CPU sec, 5.151 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (9, 252065, 5, 5)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 570)
[M::mem_pestat] mean and std.dev: (282.92, 85.79)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 687)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 533334 reads in 19.562 CPU sec, 4.948 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (20, 257596, 8, 13)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (117, 450, 6068)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 17970)
[M::mem_pestat] mean and std.dev: (2859.00, 3378.66)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 23921)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (282.94, 85.47)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (346, 3491, 8599)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 25105)
[M::mem_pestat] mean and std.dev: (4238.31, 3750.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 33358)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 533334 reads in 18.019 CPU sec, 4.554 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (15, 254097, 5, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (260, 367, 4200)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 12080)
[M::mem_pestat] mean and std.dev: (2115.53, 2747.45)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 16020)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 335)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 567)
[M::mem_pestat] mean and std.dev: (282.40, 85.31)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 683)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 18.750 CPU sec, 4.748 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (15, 245013, 6, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (173, 203, 3665)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 10649)
[M::mem_pestat] mean and std.dev: (1574.07, 2112.14)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 14141)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 269, 335)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 567)
[M::mem_pestat] mean and std.dev: (282.22, 85.17)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 683)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.210 CPU sec, 5.143 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (13, 256626, 8, 6)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (175, 3961, 9141)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 27073)
[M::mem_pestat] mean and std.dev: (4283.92, 4104.67)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 36039)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.12, 85.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 19.530 CPU sec, 4.939 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (19, 259092, 6, 11)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (201, 345, 4664)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13590)
[M::mem_pestat] mean and std.dev: (2057.05, 2675.98)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 18053)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 269, 336)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 568)
[M::mem_pestat] mean and std.dev: (283.12, 85.27)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 684)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2927, 3847, 8704)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 20258)
[M::mem_pestat] mean and std.dev: (4920.55, 2774.25)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 26035)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 533334 reads in 19.344 CPU sec, 4.887 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (22, 252180, 5, 8)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (123, 359, 5325)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15729)
[M::mem_pestat] mean and std.dev: (2466.86, 3058.24)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20931)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (220, 270, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 571)
[M::mem_pestat] mean and std.dev: (283.63, 85.91)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 688)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 20.847 CPU sec, 5.275 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (10, 221083, 1, 6)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (54, 123, 273)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 711)
[M::mem_pestat] mean and std.dev: (103.88, 74.23)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 930)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (218, 267, 333)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 563)
[M::mem_pestat] mean and std.dev: (280.45, 84.59)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 678)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 24.288 CPU sec, 6.110 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (12, 220685, 3, 6)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (172, 273, 1618)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4510)
[M::mem_pestat] mean and std.dev: (335.90, 434.60)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5956)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (218, 267, 334)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 566)
[M::mem_pestat] mean and std.dev: (280.82, 85.16)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 682)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 24.615 CPU sec, 6.193 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (18, 219768, 2, 1)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (100, 229, 331)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 793)
[M::mem_pestat] mean and std.dev: (163.50, 97.73)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1024)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 268, 335)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 567)
[M::mem_pestat] mean and std.dev: (281.51, 85.19)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 683)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 533334 reads in 25.366 CPU sec, 6.399 real sec
[M::process] read 533334 sequences (40000050 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (20, 220269, 5, 10)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (75, 225, 305)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 765)
[M::mem_pestat] mean and std.dev: (192.00, 132.87)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 995)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (219, 268, 334)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 564)
[M::mem_pestat] mean and std.dev: (281.31, 85.04)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 679)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (1267, 5766, 8436)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 22774)
[M::mem_pestat] mean and std.dev: (5220.00, 3134.74)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 29943)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 533334 reads in 25.836 CPU sec, 6.511 real sec
[M::process] read 267172 sequences (20037900 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (9, 110850, 4, 2)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (217, 266, 332)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 562)
[M::mem_pestat] mean and std.dev: (279.95, 84.46)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 677)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 267172 reads in 12.230 CPU sec, 3.099 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -M -t 4 -R @RG\tID:1\tSM:1\tPL:illumina -T 30 FN433596.fasta ERR036019_1.fastq.gz ERR036019_2.fastq.gz
[main] Real time: 414.684 sec; CPU: 740.304 sec
[bam_sort_core] merging from 4 files and 4 in-memory blocks...
In [5]:
!bedtools genomecov -d -ibam FN433596.fasta.sorted.bam > FN433596.bed
In [ ]:
!samtools view -q 35 -o FN433596.filtered.bam FN433596.fasta.sorted.bam
!samtools depth -d 20000 FN433596.fasta.sorted.bam FN433596.filtered.bam -aa > FN433596.filtered.bed
In [ ]:
from sequana.snpeff import download_fasta_and_genbank
download_fasta_and_genbank("FN433596", "FN433596", fasta=False)
In [3]:
from sequana import GenomeCov
# load the BED and genbank file
b = GenomeCov("FN433596.bed", "FN433596.gbk")
b.compute_gc_content("FN433596.fasta")
In [4]:
# load the first chromosome and compute running median and get z-scores
chromosome = b.chr_list[0]
chromosome.run(30001, circular=True)
Out[4]:
<sequana.bedtools.ChromosomeCovMultiChunk at 0x7f955e45bba8>
In [5]:
# 5Mb data points. So, this may take a while. Note that only 1Mb are used for plotting.
chromosome.plot_coverage()
In [8]:
#!sequana_coverage --input FN433596.bed --reference FN433596.fasta --genbank FN433596.gbk -w 80001
!sequana_coverage --input FN433596.bed --reference FN433596.fasta -w 80001 -o
INFO [sequana]: Reading FN433596.bed. This may take time depending on your input file
INFO [sequana]: Scanning input file (chunk of 5000000 rows)
INFO [sequana]: Computing GC content
WARNING [sequana]: There is only one chromosome. Selected automatically.
INFO [sequana]: Computing some metrics
INFO [sequana]:
Genome length: 3043210
Sequencing depth (DOC): 447.81
Sequencing depth (median): 453.00
Breadth of coverage (BOC) (percent): 98.46
Genome coverage standard deviation: 84.12
Genome coverage coefficient variation: 0.19
INFO [sequana]: Using running median (w=80001)
INFO [sequana]: Number of mixture models 2
INFO [sequana]: Fitted central distribution (first chunk): mu=1.002, sigma=0.075, pi=0.95
INFO [sequana]: Searching for ROIs (threshold=[-4,4] ; double =[-2.0,2.0])
INFO [sequana]: Number of ROIs found: 664
INFO [sequana]: - below average: 561
INFO [sequana]: - above average: 103
INFO [sequana]: Computing extra metrics
INFO [sequana]: Evenness: 0.9372
INFO [sequana]: Centralness (3 sigma): 0.955
INFO [sequana]: Centralness (4 sigma): 0.967
INFO [sequana]: Creating report in report. Please wait
INFO [sequana]: sub sample data for plotting the coverage
Computing 2D histogram. Please wait
INFO [sequana]: =========================
INFO [sequana]: Creating multiqc report
[WARNING] multiqc : MultiQC Version v1.4 now available!
[INFO ] multiqc : This is MultiQC v1.0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Report title: Sequana multi summary
[INFO ] multiqc : Searching '.'
[INFO ] multiqc : Only using modules sequana_coverage
Searching 35 files.. [####################################] 100%
[INFO ] coverage : Found 1 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
You can now open the report ./report/multiqc_report.html
In [9]:
!cp report/coverage_reports/*/rois.csv .
In the directory, we provide a file called ERR036019.filter.vcf that contains a VCF file generated with the Variant Calling pipeline described here http://sequana.readthedocs.io/en/master/pipeline_variant_calling.html
Here below, we read the VCF, the coverage and the ROIs found by the sequana_coverage tool. For a few examples, we plot the coverage, the ROIS (blue areas) and the SNPs found. We show that SNPs we low quality also coincide with ROIs. They may be further discarded and confirmed to be poor quality SNPs.
In [10]:
# The file ERR036019.filter.vcf was generated with sequana pipeline variantcalling
# version 0.6.2 using sequanix interface. The config file and pipeline
# can be found in ./pipeline directory
# The rois.csv file was generated by the sequana_coverage command here above
# in
import pandas as pd
roi = pd.read_csv("rois.csv")
from sequana.freebayes_vcf_filter import VCF_freebayes as VCF
vcf = VCF("ERR036019.filter.vcf")
variants = [v for v in vcf]
positions = [v.POS for v in variants]
quality = [v.QUAL for v in variants] # freebayes score
M = max(quality)
scores = [x/M*100 for x in quality] # normalise
depths = [chromosome.df['cov'].iloc[pos] for pos in positions]
In [11]:
def plot_variant(x1=526000, x2=538000):
clf()
chromosome.plot_coverage()
low = roi[roi.mean_cov<roi.mean_rm]
high = roi[roi.mean_cov>roi.mean_rm]
chr = chromosome
highT = (chr.thresholds.high * chr.best_gaussian["sigma"] +
chr.best_gaussian["mu"]) * chr.df["rm"]
lowT = (chr.thresholds.low * chr.best_gaussian["sigma"] +
chr.best_gaussian["mu"]) * chr.df["rm"]
for k,v in low.iterrows():
Y1 = chr.df['cov'].iloc[v.start:v.end]
Y2 = lowT.iloc[v.start:v.end]
Y1 = Y1.combine(Y2, max) *0
if v.start > x1 and v.end<x2:
fill_between(range(v.start, v.end), Y1, Y2, alpha=0.6, color="blue")
for k,v in high.iterrows():
Y1 = highT.iloc[v.start:v.end]
Y2 = chr.df['cov'].iloc[v.start:v.end]
Y2 = Y2.combine(Y1,max)
if v.start > x1 and v.end<x2:
fill_between(range(v.start, v.end), y1=Y1,y2=Y2 ,alpha=0.6, color="orange")
scatter(positions, depths, c=scores, s=150)
colorbar()
xlim([x1,x2])
In [12]:
plot_variant()
Here, the color in the RHS indicates the quality of the SNPs
In [13]:
plot_variant(1913000,1923000)
In [14]:
plot_variant(947000, 952000)
In [ ]:
Content source: sequana/resources
Similar notebooks: