The results of three rounds of correction look, on visual inspection, good enough for assembly. However, Newbler does not give good results with them.

Aim:

  • To try the Celera Assembler in various modes and see whether it does any better.

I took the pbasm.spec file used here from the Sprai distribution: http://zombie.cb.k.u-tokyo.ac.jp/sprai/README.html


In [ ]:
!./wgs-8.2/Linux-amd64/bin/runCA -d celera_FC20 -p asm -s pbasm.spec FC20iter2.frg

If the FRG file is built from FASTA plus a fake QUAL file, runCA reports "no overlaps found". The PBcR documentation suggests this points to malformatted input files.

Got this working by converting to FASTQ first and building the FRG with fastqToCA:


In [ ]:
!java -jar convertFastaAndQualToFastq.jar nanocorrect/FC20_iter2_corrected.fasta > nanocorrect/FC20_iter2_corrected.fastq
!fastqToCA -technology sanger -libraryname FC20 -reads nanocorrect/FC20_iter2_corrected.fastq > test.frg
!runCA -d celera_FC20 -p asm -s pbasm.spec test.frg
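For reference, the FASTA-to-FASTQ step can be sketched in plain Python. This is a minimal sketch that assumes a constant placeholder quality is acceptable input for fastqToCA; `FAKE_Q = 40`, `read_fasta` and `fasta_to_fastq` are hypothetical names, and this is not a description of what convertFastaAndQualToFastq.jar does internally.

```python
"""FASTA -> FASTQ with a constant placeholder quality (sketch).

Assumption: every base gets the same Phred score, FAKE_Q. This is a
hypothetical stand-in, not necessarily what convertFastaAndQualToFastq.jar does.
"""
import io
from typing import Iterator, TextIO, Tuple

FAKE_Q = 40  # hypothetical placeholder Phred score


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining wrapped sequence lines.

    Sequence lines that appear before the first header are ignored.
    """
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(src: TextIO, dst: TextIO, q: int = FAKE_Q) -> int:
    """Write 4-line FASTQ records with Phred+33 ("Sanger") qualities.

    Returns the number of records written.
    """
    if not 0 <= q <= 93:
        raise ValueError("Phred+33 encodes scores 0..93 only")
    qchar = chr(q + 33)
    count = 0
    for name, seq in read_fasta(src):
        dst.write(f"@{name}\n{seq}\n+\n{qchar * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    demo_in = io.StringIO(">read1 len=8\nACGT\nACGT\n>read2\nGG\n")
    demo_out = io.StringIO()
    fasta_to_fastq(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

The demo writes `read1` as a single 8-base record with quality string `IIIIIIII` (Q40 in Phred+33), followed by `read2`. On real data the same call would take open file handles instead of `StringIO` objects.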

In [ ]:
[Scaffolds]
TotalScaffolds=99
TotalContigsInScaffolds=99
MeanContigsPerScaffold=1.00
MinContigsPerScaffold=1
MaxContigsPerScaffold=1

TotalBasesInScaffolds=5033903
MeanBasesInScaffolds=50848
MinBasesInScaffolds=6278
MaxBasesInScaffolds=243027
N25ScaffoldBases=164588
N50ScaffoldBases=106886
N75ScaffoldBases=66746
ScaffoldAt1000000=187342
ScaffoldAt2000000=132409
ScaffoldAt3000000=96678
ScaffoldAt4000000=55717
ScaffoldAt5000000=7374
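The N25/N50/N75 and ScaffoldAt* figures can be recomputed from a list of scaffold lengths as a sanity check. This is a minimal sketch under two assumptions that are my reading of the numbers above, not taken from the Celera documentation: NX is relative to the total assembled bases (not a genome size), and ScaffoldAtX is the length of the scaffold spanning cumulative base X when scaffolds are sorted longest first. The function names are hypothetical.

```python
"""Recompute NX and ScaffoldAt-style statistics from scaffold lengths (sketch).

Assumptions: NX is relative to the total assembled bases, and
"ScaffoldAtX" is read as the length of the scaffold spanning cumulative
base X when scaffolds are sorted longest first.
"""
from typing import Iterable, List


def _longest_first(lengths: Iterable[int]) -> List[int]:
    return sorted(lengths, reverse=True)


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Largest length L such that scaffolds of length >= L together hold
    at least `fraction` of all assembled bases (fraction=0.5 gives N50)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = _longest_first(lengths)
    if not ordered:
        raise ValueError("no scaffolds")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # not reached for fraction <= 1


def scaffold_at(lengths: Iterable[int], position: int) -> int:
    """Length of the scaffold containing cumulative base `position` (1-based)."""
    running = 0
    for length in _longest_first(lengths):
        running += length
        if running >= position:
            return length
    raise ValueError("position is beyond the total assembly size")


if __name__ == "__main__":
    toy = [20, 100, 50, 80, 30]  # toy lengths, not the FC20 assembly
    print("N50:", nx(toy, 0.5), "ScaffoldAt150:", scaffold_at(toy, 150))
```

If the assumed definitions match Celera's, feeding in the real scaffold lengths should reproduce N50ScaffoldBases=106886 and the ScaffoldAt* rows above.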

Might it be worth going back to the pre-trimmed iteration (iter1)?


In [ ]:
!java -jar convertFastaAndQualToFastq.jar nanocorrect/FC20_iter1_corrected.fasta > nanocorrect/FC20_iter1_corrected.fastq
!fastqToCA -technology sanger -libraryname FC20 -reads nanocorrect/FC20_iter1_corrected.fastq > FC20_iter1.frg
!runCA -d celera_FC20_iter1 -p asm -s pbasm.spec FC20_iter1.frg

That's worse: 251 scaffolds instead of 99, and the N50 drops from 106,886 to 36,078 bases.


In [ ]:
[Scaffolds]
TotalScaffolds=251
TotalContigsInScaffolds=251
MeanContigsPerScaffold=1.00
MinContigsPerScaffold=1
MaxContigsPerScaffold=1

TotalBasesInScaffolds=5488426
MeanBasesInScaffolds=21866
MinBasesInScaffolds=2997
MaxBasesInScaffolds=152363
N25ScaffoldBases=69791
N50ScaffoldBases=36078
N75ScaffoldBases=16270
ScaffoldAt1000000=76316
ScaffoldAt2000000=55997
ScaffoldAt3000000=28765
ScaffoldAt4000000=17169
ScaffoldAt5000000=8673
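To compare runs without eyeballing, the `[Scaffolds]` blocks can be parsed as INI-style key=value text. This sketch assumes the full qc report (e.g. a hypothetical `celera_FC20/9-terminator/asm.qc`) parses cleanly with Python's `configparser`; `read_section` and `compare` are hypothetical helpers, and only a few of the values above are inlined for the demo.

```python
"""Parse Celera-style [Scaffolds] key=value blocks and compare two runs (sketch).

Assumption: the qc report is INI-like enough for configparser; only a few
of the values shown above are inlined for the demo.
"""
import configparser
from typing import Dict, Iterable, Tuple


def read_section(text: str, section: str = "Scaffolds") -> Dict[str, float]:
    """Return the numeric key=value pairs of one section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep CamelCase keys instead of lower-casing them
    parser.read_string(text)
    return {key: float(value) for key, value in parser[section].items()}


def compare(a: Dict[str, float], b: Dict[str, float],
            keys: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Pair up the selected statistics from two runs."""
    return {k: (a[k], b[k]) for k in keys}


ITER2 = """[Scaffolds]
TotalScaffolds=99
MeanContigsPerScaffold=1.00

TotalBasesInScaffolds=5033903
N50ScaffoldBases=106886
"""

ITER1 = """[Scaffolds]
TotalScaffolds=251
MeanContigsPerScaffold=1.00

TotalBasesInScaffolds=5488426
N50ScaffoldBases=36078
"""

if __name__ == "__main__":
    keys = ["TotalScaffolds", "TotalBasesInScaffolds", "N50ScaffoldBases"]
    rows = compare(read_section(ITER2), read_section(ITER1), keys)
    for key, (i2, i1) in rows.items():
        print(f"{key:25s} iter2={i2:>12,.0f} iter1={i1:>12,.0f}")
```

Disabling interpolation and strict mode keeps `configparser` from tripping over `%` signs or repeated keys, should other sections of the report contain them.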

Try mapping the iter2 contigs back to the E. coli K-12 MG1655 reference (NC_000913) to see where the gaps arise?


In [ ]:
!bwa mem -t16 -x ont2d refs/NC_000913.fna celera_FC20/9-terminator/asm.ctg.fasta | samtools view -bS - | samtools sort -o celera_FC20.sorted.bam -  # assumes samtools 1.3+
!samtools index celera_FC20.sorted.bam

Scaffold gaps seem to correlate fairly well with regions of low coverage or poor correction, so I guess the correction just needs a bit more improvement; perhaps more coverage is required.
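One way to put a number on that correlation is to list the reference stretches not covered by any aligned contig and count how many of them overlap low-depth runs. This sketch assumes 1-based inclusive intervals, contig alignment spans already extracted from the sorted contig BAM (not shown), and per-base read depth of the corrected reads against the reference (a mapping not shown here) as (position, depth) pairs, e.g. columns 2-3 of `samtools depth -a`. The threshold and all function names are hypothetical.

```python
"""Check whether assembly gaps coincide with low read depth (sketch).

Assumptions: intervals are 1-based and inclusive; contig alignment spans on
the reference come from the sorted contig BAM (extraction not shown); depths
are (position, depth) pairs in order, e.g. columns 2-3 of `samtools depth -a`.
"""
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[int, int]


def uncovered(aligned: Iterable[Interval], ref_len: int) -> List[Interval]:
    """Reference stretches not covered by any aligned contig."""
    gaps: List[Interval] = []
    next_free = 1
    for start, end in sorted(aligned):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= ref_len:
        gaps.append((next_free, ref_len))
    return gaps


def low_depth_runs(depths: Iterable[Tuple[int, int]], threshold: int) -> List[Interval]:
    """Maximal runs of consecutive positions with depth below `threshold`."""
    runs: List[Interval] = []
    start: Optional[int] = None
    prev = 0
    for pos, depth in depths:
        if depth < threshold:
            if start is not None and pos != prev + 1:
                runs.append((start, prev))  # a skipped position ends the run
                start = None
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            runs.append((start, prev))
            start = None
    if start is not None:
        runs.append((start, prev))
    return runs


def fraction_of_gaps_in_low_depth(gaps: List[Interval], low: List[Interval]) -> float:
    """Share of gaps overlapping at least one low-depth run (0.0 if no gaps).

    Quadratic pairwise check; fine for a few hundred intervals.
    """
    if not gaps:
        return 0.0

    def overlaps(a: Interval, b: Interval) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    hits = sum(1 for g in gaps if any(overlaps(g, r) for r in low))
    return hits / len(gaps)
```

This only counts overlaps; it is not a statistical test, so a high fraction is suggestive rather than conclusive unless compared against randomly placed intervals of the same sizes.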

