Datasets for the book

Here we provide links to the datasets used in the book.

Important Notes:

  1. These datasets are hosted on external servers by third parties.
  2. Due to security issues with GitHub, the FTP links are not provided as clickable URLs; you will have to cut and paste them.

Python and the Surrounding Software Ecology

Interfacing with R via rpy2

  • sequence.index Please FTP from this URL (cut and paste)

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/sequence.index
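As an alternative to pasting the link into an FTP client, the file can be fetched from Python. The following is a minimal sketch using only the standard library; the helper names `local_name` and `fetch` are hypothetical and not part of the book's code, and it assumes the third-party server still serves the file at this path.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Return the file name at the end of a URL path."""
    return posixpath.basename(urlparse(url).path)


def fetch(url, dest_dir="."):
    """Download url into dest_dir, skipping files that already exist.

    Hypothetical helper: urllib handles ftp:// URLs with an anonymous login.
    """
    target = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target
```

Calling `fetch("ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/sequence.index")` would save `sequence.index` in the current directory. Skipping files that already exist avoids repeating a large download when a notebook is rerun.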

Next-generation Sequencing (NGS)

Working with modern sequence formats

  • SRR003265.filt.fastq.gz Please FTP from this URL (cut and paste)

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265.filt.fastq.gz

Working with BAM files

  • NA18490_20_exome.bam Please FTP from this URL (cut and paste)

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/exome_alignment/NA18489.chrom20.ILLUMINA.bwa.YRI.exome.20121211.bam

  • NA18490_20_exome.bam.bai Please FTP from this URL (cut and paste)

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/exome_alignment/NA18489.chrom20.ILLUMINA.bwa.YRI.exome.20121211.bam.bai
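The BAM file and its index are saved under shorter local names than the ones used on the server, and the index is only useful next to its BAM file. The sketch below records that mapping and checks that every BAM has an index before downloading; `BAM_FILES`, `missing_indexes`, and `download_all` are hypothetical names introduced here for illustration.

```python
import urllib.request

BASE = ("ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/"
        "exome_alignment/NA18489.chrom20.ILLUMINA.bwa.YRI.exome.20121211.bam")

# Local file names from the list above, mapped to their remote URLs.
BAM_FILES = {
    "NA18490_20_exome.bam": BASE,
    "NA18490_20_exome.bam.bai": BASE + ".bai",
}


def missing_indexes(names):
    """Return the BAM names in `names` that have no matching .bai entry."""
    names = set(names)
    return sorted(n for n in names
                  if n.endswith(".bam") and n + ".bai" not in names)


def download_all(files, retrieve=urllib.request.urlretrieve):
    """Save each URL under its local name (assumes the server is reachable)."""
    for local, url in files.items():
        retrieve(url, local)
```

Running `missing_indexes(BAM_FILES)` before `download_all(BAM_FILES)` catches a forgotten index entry without touching the network.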

Analyzing data in Variant Call Format (VCF)

  • tabix link: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/supporting/vcf_with_sample_level_annotation/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5_extra_anno.20130502.genotypes.vcf.gz

Genomics

Working with high-quality reference genomes

Dealing with low-quality genome references

  • gambiae.fa.gz Please FTP from this URL (cut and paste) ftp://ftp.vectorbase.org/public_data/organism_data/agambiae/Genome/agambiae.CHROMOSOMES-PEST.AgamP3.fa.gz

  • atroparvus.fa.gz

Traversing genome annotations

PopGen

PDB

Parsing mmCIF files with Biopython

Python for Big genomics datasets

Setting the stage for high-performance computing

These are the exact same files as in "Managing datasets with PLINK" above.

Programming with laziness

  • SRR003265_1.filt.fastq.gz Please FTP from this URL (cut and paste): ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265_1.filt.fastq.gz
  • SRR003265_2.filt.fastq.gz Please FTP from this URL (cut and paste): ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18489/sequence_read/SRR003265_2.filt.fastq.gz
