Download the data from NCBI

We will use data from Le Dily et al. (2014), obtained from T47D cells (a cancer cell line), in which Hi-C experiments were conducted under two conditions, before and after hormone treatment, and with two restriction enzymes.

The data can be downloaded from:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53463

Once downloaded, the files must be converted to FASTQ format so that TADbit can read them. This can be done with the fastq-dump program from the SRA Toolkit (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software).


In [ ]:
! fastq-dump SRR398318.sra

Note: alternatively, you can download the FASTQ files directly from http://www.ebi.ac.uk/
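
Whichever route is used to obtain the FASTQ file, it can be worth checking that it is complete before going further. The following is a minimal sketch, not part of TADbit: the function name `count_fastq_records` is hypothetical, and it assumes a plain-text FASTQ with exactly four lines per read (header, sequence, `+` line, qualities) and no trailing blank lines.

```python
from itertools import islice


def count_fastq_records(path):
    """Count reads in a 4-line-per-record FASTQ file, validating each record."""
    n_records = 0
    with open(path) as handle:
        while True:
            # Read the next four lines as one candidate record.
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record:
                break  # clean end of file
            if len(record) < 4:
                raise ValueError(f"truncated record after {n_records} reads")
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record {n_records + 1}")
            if len(seq) != len(qual):
                raise ValueError(
                    f"sequence/quality length mismatch in record {n_records + 1}"
                )
            n_records += 1
    return n_records
```

Calling `count_fastq_records("SRR398318.fastq")` on the converted file would return the number of reads, or raise a `ValueError` if the file was truncated during download or conversion.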

Compression

Internally we use DSRC (Roguski and Deorowicz, 2014), which gives a better compression ratio and, more importantly, faster decompression:


In [ ]:
! dsrc c -t8 SRR398318.fastq SRR398318.fastq.dsrc
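
The same call can be driven from Python, which also makes it easy to run the reverse operation. The sketch below is illustrative only: the helper names `dsrc_command` and `run_dsrc` are hypothetical, and it assumes that DSRC's decompression mode `d` takes the same argument order as the `c` mode used above (mode, thread option, input file, output file).

```python
import shutil
import subprocess


def dsrc_command(mode, source, target, threads=8):
    """Build the DSRC argument list; mode is 'c' (compress) or 'd' (decompress)."""
    if mode not in ("c", "d"):
        raise ValueError("mode must be 'c' or 'd'")
    return ["dsrc", mode, f"-t{threads}", source, target]


def run_dsrc(mode, source, target, threads=8):
    """Run DSRC and return the command used; fail early if it is not installed."""
    cmd = dsrc_command(mode, source, target, threads)
    if shutil.which("dsrc") is None:
        raise FileNotFoundError("dsrc executable not found on PATH")
    # check=True raises CalledProcessError if DSRC exits with an error.
    subprocess.run(cmd, check=True)
    return cmd
```

For instance, `run_dsrc("c", "SRR398318.fastq", "SRR398318.fastq.dsrc")` reproduces the cell above, and `run_dsrc("d", "SRR398318.fastq.dsrc", "SRR398318.roundtrip.fastq")` would restore the reads into a separate (arbitrarily named) file for comparison with the original.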

References

[^](#ref-1) Le Dily, François and Baù, Davide and Pohl, Andy and Vicent, Guillermo P and Serra, François and Soronellas, Daniel and Castellano, Giancarlo and Wright, Roni HG and Ballare, Cecilia and Filion, Guillaume and others. 2014. Distinct structural transitions of chromatin topological domains correlate with coordinated hormone-induced gene regulation.

[^](#ref-2) Roguski, Łukasz and Deorowicz, Sebastian. 2014. DSRC 2: Industry-oriented compression of FASTQ files.