Pacbio seq reads were mapped to the current version of the reference genome (galGal4).
Important details:
In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
from glob import glob
import os
In [3]:
!cd .. && make pacbio_galGal4
There are 11,666,999 reads in the input files.
There are 627,636 (5.37%) unmapped reads to galGal4.
Reference for using samtools to count reads:
In [6]:
print "Total: ",
! samtools view -c ../outputs/pacbio/galGal4.pacbio.sorted.bam
In [7]:
print "Unmapped ",
! samtools view -c -f 4 ../outputs/pacbio/galGal4.pacbio.sorted.bam
In [2]:
total_N = !grep -F -o N ../outputs/reference/galGal4.fa |wc -l
total_bases = !grep -v ">" ../outputs/reference/galGal4.fa |wc -c
print float(total_N[0]) / float(total_bases[0])
In [10]:
!python ~/khmer/sandbox/assemstats3.py 100 ../outputs/pacbio_assembly/Chicken_10Kb20Kb_40X_Filtered_Subreads.fastq > ../outputs/pacbio_assembly/assemstats_output
In [11]:
import pandas as pd
pd.read_table("../outputs/pacbio_assembly/assemstats_output", sep=" ")
Out[11]:
In [12]:
!cat ../outputs/pacbio_assembly/assemstats_output
In [ ]: