Notebook 11/14/2014

Deleted all work from the past few months! Was cleaning up files and removed files in the wrong directory!

Opportunity for a fresh start

Objectives

  1. set up directory structure for study
  2. set up Docker for performing analysis

Directory structure


In [14]:
%%bash
tree -d ../


../
├── analysis
│   ├── bioinf
│   │   ├── genome_purity
│   │   ├── genome_structure
│   │   ├── sequence_homogeneity
│   │   └── sequence_purity
│   └── stats
├── bin
├── data
│   └── RM8375
│       ├── MiSeq
│       │   ├── bam
│       │   ├── fastq
│       │   └── vcf
│       ├── PGM
│       │   ├── bam
│       │   └── vcf
│       └── ref
├── dev
│   └── tmp
├── doc
└── src

22 directories

  1. analysis - code for performing data analysis
    1. bioinf - scripts for running bioinformatic pipelines
    2. stats - code for statistical summary of pipeline results
  2. bin - binaries for bioinformatic algorithms (note: these may be included in the Docker image)
  3. data - raw sequence data
  4. dev - analysis playground
  5. doc - general project documentation
  6. src - source code for bioinformatic algorithms

Github repositories

  • analysis/bioinf
    • genome_structure
    • genome_purity
    • sequence_purity
    • sequence_homogeneity

For each: a README.md file describing the analysis approach, and bash or IPython pipeline scripts


In [9]:
mv sequence* ../analysis/bioinf/

Approach: download the related repositories, create new directories for the three different applications, and transfer the related files to each.

TO DO

  1. set up github repository
  2. look into git-annex for data
  3. code for creating directories for other reference materials - use mkdir -p to create parent directories and {A,B,C} brace expansion for multiple subdirectories
  4. vcf file for base quality score recalibration
  5. work on PGM variant call script
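
A minimal sketch of TO DO item 3, assuming bash (brace expansion is not available in plain sh). It mirrors the data/RM8375 layout shown in the tree above for another reference material. `PROJECT_ROOT` and the ID `RM0000` are hypothetical placeholders, not real paths or material names; by default the sketch writes into a temporary directory so nothing in the project is touched.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: override with the real project root and reference material ID.
PROJECT_ROOT="${PROJECT_ROOT:-$(mktemp -d)}"
RM_ID="${RM_ID:-RM0000}"

# -p creates any missing parent directories and does not fail if they exist.
# Each {A,B,C} group expands to one path per entry; nested groups combine,
# so this single call creates MiSeq/{bam,fastq,vcf}, PGM/{bam,vcf}, and ref.
mkdir -p "${PROJECT_ROOT}/data/${RM_ID}"/{MiSeq/{bam,fastq,vcf},PGM/{bam,vcf},ref}

# List what was created.
find "${PROJECT_ROOT}/data/${RM_ID}" -type d | sort
```

To use it on the real project, run it as something like `PROJECT_ROOT=.. RM_ID=<id> bash make_dirs.sh`, where the script name is also just an example. Re-running is safe because of -p.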
