This is the configuration file for our Annai Systems virtual machine.
Link to CGHub manifest describing the contents of its database.
In [4]:
cghub_manifest = 'https://cghub.ucsc.edu/reports/SUMMARY_STATS/LATEST_MANIFEST.tsv'
CGHub download URL.
In [5]:
CGHUB = 'https://cghub.ucsc.edu/cghub/data/analysis/download'
These paths are on the local file system (cellar).
In [6]:
DBSNP = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/variation/dbsnp_138.vcf'
COSMIC = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/variation/cosmic-v67_20131024-GRCh37.vcf'
REFERENCE = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa'
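Because these paths are specific to one machine, it can be worth checking them before starting a long run. The following is a minimal sketch, not part of the original pipeline: `missing_paths` is a hypothetical helper, and the three settings are repeated so the block runs on its own.

```python
import os

# Values copied from the cell above so this block is self-contained.
DBSNP = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/variation/dbsnp_138.vcf'
COSMIC = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/variation/cosmic-v67_20131024-GRCh37.vcf'
REFERENCE = '/mnt/projects/resouces/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa'


def missing_paths(paths):
    """Return the sorted names of settings whose path does not exist.

    `paths` maps a setting name to a file-system path.
    Hypothetical helper, not part of the original pipeline.
    """
    return sorted(name for name, path in paths.items()
                  if not os.path.exists(path))


missing = missing_paths({'DBSNP': DBSNP, 'COSMIC': COSMIC, 'REFERENCE': REFERENCE})
if missing:
    print('Missing reference files: ' + ', '.join(missing))
```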
Path to key for downloading data from CGHub. You need to contact CGHub to get a key.
In [7]:
KEY = '/home/centos/cghub.key'
Path to MuTect. I was getting bugs with version 1.1.4, so make sure you use version 1.1.5.
In [8]:
MUTECT_JAR = '/usr/local/share/java/mutect/muTect-1.1.5.jar'
Path to SomaticIndelDetector jar. This is in a specific version of GATK, so make sure you are using version 2.2-2.
In [9]:
SID_JAR = '/home/centos/projects/pipeline/progs/GenomeAnalysisTK-2.2-2/GenomeAnalysisTK.jar'
Path to cache directory. This is only used in conjunction with GTFuse. This is important because the BAM files can pile up in the cache and eat up all of the space on a hard drive very quickly.
In [10]:
CACHE = '/home/centos/cache/fusecache'
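To keep an eye on how much space the cache is using, something like the following could be run between batches. This is only a sketch: `directory_size_bytes` is a hypothetical helper, not something the pipeline provides, and it simply reports the size rather than deleting anything.

```python
import os

CACHE = '/home/centos/cache/fusecache'  # same value as the cell above


def directory_size_bytes(root):
    """Sum the sizes of all regular files under `root` (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks so linked files are not counted.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished while we were walking the tree.
                continue
    return total


# A missing directory is reported as 0 bytes because os.walk yields nothing.
print('Cache size: {:.1f} GB'.format(directory_size_bytes(CACHE) / 1e9))
```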
Number of processes you want to spawn with the variant calling. I'm taking a quick and dirty pass here and just running a bunch of bash scripts simultaneously. At some point we will probably switch to a scheduler, but we don't have one on our Annai VM currently and this seems to be working OK for now.
In [11]:
NUM_PROCESSES = 16
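One way to cap the number of scripts running at once is a thread pool that shells out to bash. This is a sketch under that assumption and not necessarily how the pipeline launches its scripts; `run_script` and `run_all` are hypothetical names.

```python
import subprocess
from multiprocessing.pool import ThreadPool

NUM_PROCESSES = 16  # same value as the cell above


def run_script(path):
    """Run one bash script and return its exit code."""
    return subprocess.call(['bash', path])


def run_all(script_paths, num_processes=NUM_PROCESSES):
    """Run the scripts with at most `num_processes` running at once.

    Threads are enough here because each worker only waits on a bash
    child process. Exit codes are returned in the order of `script_paths`.
    """
    pool = ThreadPool(num_processes)
    try:
        return pool.map(run_script, script_paths)
    finally:
        pool.close()
        pool.join()
```

A non-zero entry in the returned list marks a script that failed, so the corresponding sample can be rerun.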
Directory to store the data on whatever machine you are running the scripts on (we use a VM).
In [12]:
VM_DIRECTORY = '/home/centos/projects'
Local directory to spit out the bash scripts.
In [13]:
LOCAL_DIRECTORY = '/cellar/users/agross/scripts'
In most situations you can probably get away with using the default java on your system as long as it is version 7 of the JDK.
In [14]:
JAVA = 'java'
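To confirm which Java the `JAVA` setting resolves to, the output of `java -version` can be parsed. The sketch below is an illustration only; `parse_java_major` and `java_major_version` are hypothetical helpers, and the parsing assumes the usual `version "..."` line in that output.

```python
import re
import subprocess

JAVA = 'java'  # same value as the cell above


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Legacy releases report e.g. "1.7.0_79" (major version 7); newer
    ones report e.g. "11.0.2". Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == '1' and second is not None:
        return int(second)
    return int(first)


def java_major_version(java=JAVA):
    """Run `java -version` (which prints to stderr) and parse the result."""
    process = subprocess.Popen([java, '-version'],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    output, _ = process.communicate()
    return parse_java_major(output.decode('utf-8', 'replace'))
```

Calling `java_major_version()` on the VM should return 7 if the default java matches the requirement above.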
The command being run for GTFuse.
In [15]:
GT_FUSE = 'gtfuse -c {} --inactivity-timeout=2'.format(KEY)
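For reference, this is what the template expands to with the key path used in this file. The block only repeats the two settings and splits the result into an argument list; it does not add any arguments beyond those in the string above.

```python
import shlex

KEY = '/home/centos/cghub.key'  # same value as the KEY cell
GT_FUSE = 'gtfuse -c {} --inactivity-timeout=2'.format(KEY)

print(GT_FUSE)
# gtfuse -c /home/centos/cghub.key --inactivity-timeout=2

# Split into an argument list, e.g. for use with the subprocess module.
gt_fuse_argv = shlex.split(GT_FUSE)
```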