# Notes
- Modify writeScriptHeader
  - Load modules
- "main" starts around line 940
  - wrk is defined on line 1130
  - really starts at line 1336 (call runCA)
## Running the pipeline
### Gatekeeper
### Overlapper (MHAP)
### Correction
-
### Partition
- ~500 MB of RAM
- Parallelizable (199 independent jobs)
- 8 cores
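
A rough sizing sketch for the partition step, using the numbers above. It assumes the ~500 MB figure is per job and that each of the 199 jobs uses 8 cores; the `core_budget` value and the function name are hypothetical, picked only for illustration.

```python
import math

def plan_partition_jobs(n_jobs=199, cores_per_job=8, mem_per_job_mb=500, core_budget=64):
    """Estimate concurrency for independent jobs under a fixed core budget.

    Assumption: memory and core figures apply per job (not confirmed by the notes).
    """
    if core_budget < cores_per_job:
        raise ValueError("core budget is smaller than a single job")
    # How many jobs fit side by side, and how many rounds are needed to run them all.
    concurrent = min(n_jobs, core_budget // cores_per_job)
    waves = math.ceil(n_jobs / concurrent)
    return {
        "concurrent_jobs": concurrent,
        "waves": waves,
        "peak_mem_mb": concurrent * mem_per_job_mb,
        "cores_if_all_at_once": n_jobs * cores_per_job,
    }

plan = plan_partition_jobs()
print(plan)
```

Under these assumptions memory is not the constraint for this step; the number of available cores decides how many rounds are needed.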
PBJelly + Quiver: gap filling MHAP -> low coverage after error correcting
PBcR-MHAP
PBcR-MHAP is based on wgs-assembler (Celera Assembler). They only support SGE and LSF. Running the whole pipeline as a single big job is a poor fit: some steps consume too much memory (gatekeeper -> 1.5 TB), while others (overlapper) could use more CPUs.
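
Since each step has different needs, one option is to submit steps separately with their own script header. The sketch below builds an SGE array-job header with `module load` lines, in the spirit of the "Modify writeScriptHeader" note. It is not the actual `writeScriptHeader` code: the function name, the module name `wgs-assembler`, and the parallel environment name `smp` are assumptions and will differ per site.

```python
def build_sge_header(job_name, n_tasks, slots, modules, pe_name="smp"):
    """Return a batch script header for an SGE array job.

    pe_name and the module names are site-specific placeholders.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",          # job name
        "#$ -cwd",                    # run from the submission directory
        f"#$ -t 1-{n_tasks}",         # array job; task index is in $SGE_TASK_ID
        f"#$ -pe {pe_name} {slots}",  # slots (cores) per task
        "",
    ]
    # Load the software environment before any pipeline command runs.
    lines += [f"module load {m}" for m in modules]
    lines.append("")
    return "\n".join(lines)

header = build_sge_header("partition", 199, 8, ["wgs-assembler"])
print(header)
```

Memory requests are left out on purpose, because the resource name for them (for example `h_vmem`) depends on how the cluster is configured.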
Josh: I just worked on a fungus genome for which we used PacBio (Titus is involved with this). I used PBJelly and then SPAdes and got a really good assembly when paired with our Illumina data from the original sequencing project (PacBio was for genome scaffolding improvement).
Eichler lab: HGAP (using a modified Blasr, but they just added some outputs, no core modifications)
Dazzler - http://dazzlerblog.wordpress.com/ Currently just the aligner; need to figure out how to use it together with other parts of the pipeline. Talk with Jason Chin, he has done that for big genomes.
ectools - https://github.com/jgurtowski/ectools
dbg2olc - http://arxiv.org/abs/1410.2801 http://sites.google.com/site/dbg2olc/ Claims to be superfast.
http://wgs-assembler.sourceforge.net/wiki/index.php/RunCA
dbg2olc
lordec
ec2tools
pacbio2ca