Hardware

Disk space

Processing a "typical" Hi-C experiment of about 200 M reads occupies around 100 GB of disk space. After the analysis, many of the intermediate files can be compressed or erased, but each experiment/replicate will probably still occupy at least 50 GB on disk.
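As a rough illustration of these figures, the sketch below estimates the disk budget for several experiments. It assumes that experiments are processed one after the other, that each one is cleaned up before the next starts, and that usage scales linearly with the number of experiments. The function name `disk_budget_gb` is made up for this example.

```python
# Figures quoted in the text for a ~200 M read experiment.
PEAK_GB_PER_EXPERIMENT = 100  # while the experiment is being processed
KEPT_GB_PER_EXPERIMENT = 50   # after compressing/erasing intermediate files


def disk_budget_gb(n_experiments):
    """Return (peak, final) disk usage in GB for sequential processing.

    Assumption: only one experiment is "in flight" at a time, all the
    previous ones have already been reduced to their final size.
    """
    if n_experiments < 1:
        raise ValueError("n_experiments must be at least 1")
    final = n_experiments * KEPT_GB_PER_EXPERIMENT
    peak = (n_experiments - 1) * KEPT_GB_PER_EXPERIMENT + PEAK_GB_PER_EXPERIMENT
    return peak, final


peak, final = disk_budget_gb(4)
print("4 experiments: peak %d GB, final %d GB" % (peak, final))
```

Processing several experiments in parallel would raise the peak accordingly.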

RAM memory

The more the better. RAM is especially important for loading matrices at high resolution, but 32 GB of RAM is usually enough to deal with 50 kb resolution matrices of a human genome.
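To see why resolution matters so much, here is a back-of-the-envelope estimate of the memory taken by a genome-wide matrix. It is only a sketch under stated assumptions: a human genome of roughly 3.1 Gb, a dense square matrix, and 8 bytes per cell. Actual tools may use sparse storage, smaller number types, or load one chromosome at a time, all of which need far less memory.

```python
HUMAN_GENOME_BP = 3100000000  # approximate size, assumption for this example


def dense_matrix_gb(genome_size_bp, resolution_bp, bytes_per_cell=8):
    """Return (number of bins, size in GB) of a dense genome-wide matrix."""
    # Ceiling division: a partial bin at the end still needs a row/column.
    n_bins = -(-genome_size_bp // resolution_bp)
    size_gb = n_bins ** 2 * bytes_per_cell / 1e9
    return n_bins, size_gb


for resolution in (1000000, 100000, 50000, 10000):
    n_bins, size_gb = dense_matrix_gb(HUMAN_GENOME_BP, resolution)
    print("%8d bp: %7d bins, %10.1f GB" % (resolution, n_bins, size_gb))
```

Under these assumptions a 50 kb matrix takes about 31 GB, while a 10 kb matrix would need roughly 770 GB, because memory grows with the square of the number of bins.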

CPUs

No limitations here, just time. An 8-core computer should be able to process a single Hi-C experiment (200 M reads, analyzed at 50 kb) in 3-4 days. This includes all the steps: mapping, filtering, normalization, and detection of TADs and compartments.

The time needed for 3D modeling will depend on the size of the regions to be modeled.
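The figures in this section can be compared with the local machine using a few standard-library calls. This is a minimal sketch that assumes a Linux system (the `os.sysconf` names used for RAM are not available everywhere). The function name `hardware_report` is made up for this example, and the reference values are simply the ones quoted in the text.

```python
import multiprocessing
import os


def hardware_report(path="."):
    """Return cores, total RAM (GB) and free disk space (GB) under `path`."""
    disk = os.statvfs(path)
    free_disk_gb = disk.f_bavail * disk.f_frsize / 1e9
    # Linux-specific: physical pages times page size gives total RAM.
    ram_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9
    return {
        "cores": multiprocessing.cpu_count(),
        "ram_gb": ram_gb,
        "free_disk_gb": free_disk_gb,
    }


report = hardware_report()
print("cores:     %d (text assumes 8)" % report["cores"])
print("RAM:       %.1f GB (text suggests 32 GB for 50 kb)" % report["ram_gb"])
print("free disk: %.1f GB (about 100 GB per experiment)" % report["free_disk_gb"])
```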

Software

GEM Mapper

In this course we will use GEM, but any other alternative is just fine.

To install GEM, go to the download page: https://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%202/ and download the i3 version (the other version is for older computers, and you usually won't have to use it).


In [1]:
! wget -O GEM.tbz2 https://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2/download


--2017-04-12 15:42:34--  https://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2/download
Resolving sourceforge.net (sourceforge.net)... 216.34.181.60
Connecting to sourceforge.net (sourceforge.net)|216.34.181.60|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://downloads.sourceforge.net/project/gemlibrary/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2?r=&ts=1492004555&use_mirror=vorboss [following]
--2017-04-12 15:42:35--  https://downloads.sourceforge.net/project/gemlibrary/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2?r=&ts=1492004555&use_mirror=vorboss
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 216.34.181.59
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|216.34.181.59|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://vorboss.dl.sourceforge.net/project/gemlibrary/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2 [following]
--2017-04-12 15:42:35--  https://vorboss.dl.sourceforge.net/project/gemlibrary/gem-library/Binary%20pre-release%202/GEM-binaries-Linux-x86_64-core_i3-20121106-022124.tbz2
Resolving vorboss.dl.sourceforge.net (vorboss.dl.sourceforge.net)... 5.10.152.194
Connecting to vorboss.dl.sourceforge.net (vorboss.dl.sourceforge.net)|5.10.152.194|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3999929 (3,8M) [application/octet-stream]
Saving to: ‘GEM.tbz2’

GEM.tbz2            100%[===================>]   3,81M  9,27MB/s    in 0,4s    

2017-04-12 15:42:36 (9,27 MB/s) - ‘GEM.tbz2’ saved [3999929/3999929]

Uncompress the archive:


In [2]:
! tar -xjvf GEM.tbz2


GEM-binaries-Linux-x86_64-core_i3-20121106-022124/
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer.man
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-map-2-map.man
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper.man
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer_fasta2meta+cont
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-2-sam
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-map-2-map
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer_generate
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-2-sam.man
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer_bwt-dna
GEM-binaries-Linux-x86_64-core_i3-20121106-022124/LICENSE

And copy the needed binaries to somewhere in your PATH, like:


In [ ]:
! sudo cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper /usr/local/bin/
! sudo cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer* /usr/local/bin/

In case you do not have root access, just copy the binaries to some path and add this path to your global PATH:


In [ ]:
! mkdir -p ~/bin
! cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-mapper ~/bin/
! cp GEM-binaries-Linux-x86_64-core_i3-20121106-022124/gem-indexer* ~/bin/
! echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
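Whichever option is used, it is worth checking that the binaries are actually found. The helper below is a small sketch that searches the directories listed in PATH. The name `find_on_path` is made up for this example, and it is written by hand so that it runs on both Python 2 and Python 3.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of `program` if it is executable on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


for tool in ("gem-mapper", "gem-indexer"):
    location = find_on_path(tool)
    print("%-12s %s" % (tool, location or "NOT FOUND on PATH"))
```

Note that a change made in ~/.bashrc only applies to shells started afterwards, so the notebook may need to be relaunched from a new terminal before the binaries are found.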

Conda

Conda (http://conda.pydata.org/docs/index.html) is a package manager, mainly hosting Python programs, that is very useful when no root access is available and the software has complicated dependencies.

To install it, just download the installer from http://conda.pydata.org/miniconda.html


In [ ]:
! wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh

And run it with all the default options. The installer will create a miniconda2 folder in your home directory where all the programs that you need will be stored (including python).

Python dependencies

With conda you can install the needed dependencies:


In [9]:
! conda install -y scipy # scientific computing in python
! conda install -y numpy # scientific computing in python
! conda install -y matplotlib # to produce plots
! conda install -y jupyter # this notebook :)
! conda install -y -c https://conda.anaconda.org/bcbio pysam # to deal with SAM/BAM files
! conda install -y -c https://conda.anaconda.org/salilab imp # for 3D modeling
! conda install -y pip # yet another python package manager
! conda install -y -c bioconda mcl # for clustering


Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/fransua/.miniconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    scipy-0.18.1               |      np111py27_0        30.9 MB

The following packages will be UPDATED:

    scipy: 0.17.1-np111py27_1 --> 0.18.1-np111py27_0

Fetching packages ...
scipy-0.18.1-n 100% |################################| Time: 0:00:08   4.05 MB/s
Extracting packages ...
[      COMPLETE      ]|###################################################| 100%
Unlinking packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/fransua/.miniconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    numpy-1.11.2               |           py27_0         6.2 MB

The following packages will be UPDATED:

    numpy: 1.11.1-py27_0 --> 1.11.2-py27_0

Fetching packages ...
numpy-1.11.2-p 100% |################################| Time: 0:00:04   1.50 MB/s
Extracting packages ...
[      COMPLETE      ]|###################################################| 100%
Unlinking packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/fransua/.miniconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    expat-2.1.0                |                0         365 KB
    libxcb-1.12                |                0         1.5 MB
    dbus-1.10.10               |                0         2.4 MB
    libpng-1.6.22              |                0         214 KB
    gstreamer-1.8.0            |                0         2.6 MB
    sip-4.18                   |           py27_0         264 KB
    fontconfig-2.11.1          |                6         405 KB
    gst-plugins-base-1.8.0     |                0         3.1 MB
    qt-5.6.0                   |                0        43.9 MB
    pyqt-5.6.0                 |           py27_0         5.3 MB
    matplotlib-1.5.3           |      np111py27_0         8.2 MB
    ------------------------------------------------------------
                                           Total:        68.2 MB

The following NEW packages will be INSTALLED:

    dbus:             1.10.10-0        
    expat:            2.1.0-0          
    gst-plugins-base: 1.8.0-0          
    gstreamer:        1.8.0-0          
    libxcb:           1.12-0           

The following packages will be UPDATED:

    fontconfig:       2.11.1-5          --> 2.11.1-6         
    libpng:           1.6.17-0          --> 1.6.22-0         
    matplotlib:       1.5.1-np110py27_0 --> 1.5.3-np111py27_0
    pyqt:             4.11.4-py27_1     --> 5.6.0-py27_0     
    qt:               4.8.7-1           --> 5.6.0-0          
    sip:              4.16.9-py27_0     --> 4.18-py27_0      

Fetching packages ...
expat-2.1.0-0. 100% |################################| Time: 0:00:01 372.01 kB/s
libxcb-1.12-0. 100% |################################| Time: 0:00:01 887.07 kB/s
dbus-1.10.10-0 100% |################################| Time: 0:00:03 818.91 kB/s
libpng-1.6.22- 100% |################################| Time: 0:00:00 287.09 kB/s
gstreamer-1.8. 100% |################################| Time: 0:00:03 836.25 kB/s
sip-4.18-py27_ 100% |################################| Time: 0:00:00 286.39 kB/s
fontconfig-2.1 100% |################################| Time: 0:00:01 349.04 kB/s
gst-plugins-ba 100% |################################| Time: 0:00:06 539.21 kB/s
qt-5.6.0-0.tar 100% |################################| Time: 0:00:06   6.97 MB/s
pyqt-5.6.0-py2 100% |################################| Time: 0:00:03   1.76 MB/s
matplotlib-1.5 100% |################################| Time: 0:00:04   1.96 MB/s
Extracting packages ...
[      COMPLETE      ]|###################################################| 100%
Unlinking packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%
Fetching package metadata .........
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at /home/fransua/.miniconda2:
#
pysam                     0.8.4pre0                py27_0    bcbio
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment /home/fransua/.miniconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    numpy-1.10.4               |           py27_2         6.0 MB
    numexpr-2.6.0              |      np110py27_0         351 KB
    scipy-0.17.1               |      np110py27_1        30.1 MB
    scikit-learn-0.17.1        |      np110py27_2         8.6 MB
    ------------------------------------------------------------
                                           Total:        45.0 MB

The following packages will be UPDATED:

    hdf5:         1.8.15.1-3         --> 1.8.16-0          
    numexpr:      2.6.0-np111py27_0  --> 2.6.0-np110py27_0 
    scikit-learn: 0.17.1-np111py27_2 --> 0.17.1-np110py27_2

The following packages will be DOWNGRADED due to dependency conflicts:

    libpng:       1.6.22-0           --> 1.6.17-0          
    numpy:        1.11.2-py27_0      --> 1.10.4-py27_2     
    scipy:        0.18.1-np111py27_0 --> 0.17.1-np110py27_1

Fetching packages ...
numpy-1.10.4-p 100% |################################| Time: 0:00:04   1.39 MB/s
numexpr-2.6.0- 100% |################################| Time: 0:00:01 309.06 kB/s
scipy-0.17.1-n 100% |################################| Time: 0:00:04   7.09 MB/s
scikit-learn-0 100% |################################| Time: 0:00:04   2.09 MB/s
Extracting packages ...
[      COMPLETE      ]|###################################################| 100%
Unlinking packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%
Fetching package metadata .......
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at /home/fransua/.miniconda2:
#
pip                       8.1.2                    py27_0  
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment /home/fransua/.miniconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mcl-14.137                 |                1         6.4 MB  bioconda

The following NEW packages will be INSTALLED:

    mcl: 14.137-1 bioconda

Fetching packages ...
mcl-14.137-1.t 100% |################################| Time: 0:00:04   1.61 MB/s
Extracting packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%
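
Once the installation has finished, a quick way to confirm that the Python packages are usable is to try importing them. The sketch below does this for the modules installed above. The helper name `check_imports` is made up for this example, and `IMP` is the module name under which the imp package is imported. Note that mcl is a command-line program rather than a Python module, so it is not covered here.

```python
import importlib

REQUIRED_MODULES = ["scipy", "numpy", "matplotlib", "pysam", "IMP"]


def check_imports(module_names):
    """Map each module name to None if it imports, else to the error message."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = None
        except Exception as error:  # missing or broken installation
            status[name] = "%s: %s" % (type(error).__name__, error)
    return status


for name, error in sorted(check_imports(REQUIRED_MODULES).items()):
    print("%-12s %s" % (name, "OK" if error is None else error))
```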

DSRC FASTQ compressor

DSRC is a FASTQ compressor. It is not required, but we use it because the resulting files are significantly smaller than with gzip (>30%) and, more importantly, access to them can be parallelized, which makes it much faster than the alternatives.

It can be downloaded from https://github.com/lrog/dsrc


In [13]:
! wget http://sun.aei.polsl.pl/dsrc/download/2.0rc/dsrc


--2016-10-09 18:55:54--  http://sun.aei.polsl.pl/dsrc/download/2.0rc/dsrc
Resolving sun.aei.polsl.pl (sun.aei.polsl.pl)... 157.158.77.3
Connecting to sun.aei.polsl.pl (sun.aei.polsl.pl)|157.158.77.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1761768 (1.7M) [text/plain]
Saving to: ‘dsrc’

dsrc                100%[===================>]   1.68M  1017KB/s    in 1.7s    

2016-10-09 18:55:56 (1017 KB/s) - ‘dsrc’ saved [1761768/1761768]


In [18]:
! chmod +x dsrc

And, if you have root access:


In [ ]:
! sudo mv dsrc /usr/local/bin

Otherwise, and as before:


In [ ]:
! mv dsrc ~/bin

TADbit

For now, TADbit is not available through the conda or pip package managers, so to install it we have to clone the repository and compile the binaries:


In [ ]:
! git clone https://github.com/3DGenomes/TADbit.git
! cd TADbit; python setup.py install