The processing of a "typical" Hi-C experiment of about 200 M reads will occupy around 100 GB of disk space per experiment. After the analysis many of the intermediate files can be compressed or erased, but each experiment/replicate will still occupy at least 50 GB on disk.
The more the better. RAM is especially important for loading matrices at high resolution, but 32 GB of RAM should usually be enough to deal with 50 kb resolution matrices on a human genome.
No limitations here, just time. An 8-core computer should be able to process a single Hi-C experiment (200 M reads, analyzed at 50 kb) in 3-4 days. This includes all the steps: mapping, filtering, normalization and detection of TADs and compartments.
The 3D modeling will depend on the size of the regions to be modeled.
In this course we will use GEM, but any other alternative is just fine.
To install GEM, go to the download page: https://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%202/
and download the i3 version (the other version is for older computers, and you usually won't have to use it).
In [1]:
! wget -O GEM.tbz2 https://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2/download
Uncompress the archive:
In [2]:
! tar -xjvf GEM.tbz2
And copy the needed binaries to somewhere in your PATH, like:
In [ ]:
! sudo cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper /usr/local/bin/
! sudo cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer /usr/local/bin/
In case you do not have root access, just copy the binaries to some path and add this path to your global PATH:
In [ ]:
! mkdir -p ~/bin
! cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper ~/bin/
! cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer* ~/bin/
! echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
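To check that the binaries are reachable, you can run the following in a terminal (a sketch; note that a change to `~/.bashrc` only takes effect in newly opened shells, so we extend PATH for the current session explicitly):

```shell
# Make the ~/bin entry effective in the current shell, then check
# that both GEM binaries can be located on the PATH
export PATH="$PATH:$HOME/bin"
command -v gem-mapper gem-indexer || echo "GEM binaries not found on PATH yet"
```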
Conda (http://conda.pydata.org/docs/index.html) is a package manager, mainly hosting Python programs, that is very useful when no root access is available and the software has complicated dependencies.
To install it, just download the installer from http://conda.pydata.org/miniconda.html
In [ ]:
! wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
And run it with all the default options. The installer will create a miniconda2 folder in your home directory where all the programs you need will be stored (including Python).
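The installer offers to add the miniconda2 binaries to your PATH via `~/.bashrc`; to check that conda is reachable in the current shell, you can run the following sketch (it assumes the default install location `~/miniconda2`):

```shell
# Sketch: make the conda binaries visible in the current shell
# (the default install path ~/miniconda2 is an assumption)
export PATH="$HOME/miniconda2/bin:$PATH"
if command -v conda >/dev/null 2>&1; then
    echo "conda found"
else
    echo "conda not on PATH yet"
fi
```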
With conda you can install the needed dependencies:
In [9]:
! conda install -y scipy # scientific computing in python
! conda install -y numpy # scientific computing in python
! conda install -y matplotlib # to produce plots
! conda install -y jupyter # this notebook :)
! conda install -y -c https://conda.anaconda.org/bcbio pysam # to deal with SAM/BAM files
! conda install -y -c https://conda.anaconda.org/salilab imp # for 3D modeling
! conda install -y pip # yet another python package manager
! conda install -y -c bioconda mcl # for clustering
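Once installed, a quick sanity check is to try importing the main Python dependencies (a sketch; it assumes `python` resolves to the conda-installed interpreter):

```shell
# Report which of the Python dependencies import cleanly
for mod in scipy numpy matplotlib pysam; do
    if python -c "import $mod" 2>/dev/null; then
        echo "$mod OK"
    else
        echo "$mod MISSING"
    fi
done
```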
DSRC is a FASTQ compressor. It is not strictly needed, but we use it because the compressed files are significantly smaller (>30%) than with gzip and, more importantly, access to them can be parallelized, making it much faster than any other alternative.
It can be downloaded from https://github.com/lrog/dsrc
In [13]:
! wget http://sun.aei.polsl.pl/dsrc/download/2.0rc/dsrc
In [18]:
! chmod +x dsrc
And, if you have root access:
In [ ]:
! sudo mv dsrc /usr/local/bin
Otherwise, and as before:
In [ ]:
! mv dsrc ~/bin
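The typical compress/decompress calls look like the following (a sketch: the file names are placeholders, and the `-t` threads flag follows the DSRC2 usage message):

```shell
# Usage sketch (placeholder file names, shown as comments, not run here):
#   dsrc c -t4 reads.fastq reads.dsrc    # compress with 4 threads
#   dsrc d -t4 reads.dsrc reads.fastq    # decompress
if command -v dsrc >/dev/null 2>&1; then
    echo "dsrc found: $(command -v dsrc)"
else
    echo "dsrc not on PATH yet"
fi
```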
For now TADbit is not available through the conda or pip package managers, so to install it we will have to clone the repository and compile the binaries:
In [ ]:
! git clone https://github.com/3DGenomes/TADbit.git
! cd TADbit; python setup.py install
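To confirm the installation worked, try importing the package (a quick check; the top-level Python module of TADbit is `pytadbit`):

```shell
# Check that the freshly compiled TADbit package is importable
python -c "import pytadbit" 2>/dev/null \
    && echo "TADbit importable" \
    || echo "TADbit not importable"
```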