Hi-C data are usually represented as symmetric matrices in a tab separated file, as in the example below:
chrT_001 chrT_002 chrT_003 chrT_004 chrT_005 chrT_006
chrT_001 629 164 88 105 10 35
chrT_002 86 612 175 110 40 29
chrT_003 159 216 437 105 43 73
chrT_004 100 111 146 278 70 42
chrT_005 16 36 26 79 243 39
chrT_006 19 50 53 42 37 224
TADbit allows to load most of this kind of matrices. A Hi-C matrix is loaded as this (note: this example loads human chromosome 19 from [Lieberman-Aiden2009]_ results.):
In [1]:
from pytadbit import Chromosome
# initiate a chromosome object that will store all Hi-C data and analysis
my_chrom = Chromosome(name='chr19', centromere_search=True,
species='Homo sapiens', assembly='NCBI36')
# load Hi-C data
my_chrom.add_experiment('k562', cell_type='wild type', exp_type='Hi-C', identifier='k562',
project='TADbit tutorial',
hic_data="../../scripts/sample_data/HIC_k562_chr19_chr19_100000_obs.txt", resolution=100000)
my_chrom.add_experiment('gm06690', cell_type='cancer', exp_type='Hi-C', identifier='gm06690',
project='TADbit tutorial',
hic_data="../../scripts/sample_data/HIC_gm06690_chr19_chr19_100000_obs.txt", resolution=100000)
In some cases users may found data that are not readable by TADbit. In such case, a parser function can be written and passed as an argument to the add_experiment
function:
In [2]:
def read_strange_file(f_handle):
"""
reads from file handler (already openned)
"""
nums = []
for line in f_handle:
if not line:
continue
# modify the following line to fit your parsing needings
##
values = line.split()
##
# feed the "num" list with the list of values you parsed
nums.append([int(v) for v in values])
return tuple([nums[j][i] for i in xrange(len(nums)) for j in xrange(len(nums))]), len(nums)
And call it as follow:
In [3]:
other_chrom = Chromosome(name='An other chromosome')
other_chrom.add_experiment('First Hi-C experiment', hic_data='../../src/test/data/hESC_chr19-rep1.txt',
parser=read_strange_file, resolution=20000)
Experiments, when loaded, are stored in a special kind of list attached to chromosome objects:
In [4]:
my_chrom.experiments
Out[4]:
A specific Experiment can be accessed either by its name or by its position in :class:pytadbit.chromosome.ExperimentList
:
In [5]:
my_chrom.experiments[0] == my_chrom.experiments["k562"]
Out[5]:
Two Hi-C experiments can be summed up easily, resulting in a new Hi-c experiment contatining the sum of the interaction counts of the summed experiments:
In [6]:
exp = my_chrom.experiments["k562"] + my_chrom.experiments["gm06690"]
print exp
Note the last warning asking to normalize experiments before summing them. Indeed normalizing the sum of different experiment should result in bayesing the final result towards the experiment with more raw Hi-C interaction counts.
The resulting experiment (which default name is the concatenation of the summed experiments) can be added to the experiments of a given chromosome.
In [7]:
my_chrom.add_experiment(exp)
print my_chrom.experiments
In [8]:
exp.view()
Out[8]:
This plot shows the log2 interaction counts, resulting from the given Hi-C experiment, and it can be drawn from individual experiments or from the chromosome object.
In [9]:
my_chrom.visualize([('k562', 'gm06690'), 'k562+gm06690'])
In [10]:
my_chrom.find_tad('k562', n_cpus=8)
my_chrom.find_tad('gm06690', n_cpus=8)
In [11]:
exp = my_chrom.experiments["k562"]
exp.tads
Out[11]:
The "tads" variable returned in this example is a dictionary of TADs, each of each is in turn a new dictionary containing information about the start and end positions of a TAD.
"start" and "end" values correspond respectively to the start and end positions of the given TAD in the chromosome (note that this numbers have to be multiplied by the resolution of the experiment, "exp.resolution"). The "brk" key corresponds to the value of "end", all "brk" together corresponds to all TAD's boundaries.
TADs can also be seen or saved to a file using this write function:
In [12]:
exp.write_tad_borders()
Another way to view TADs is using the matrix visualization:
In [13]:
my_chrom.visualize(exp.name, paint_tads=True)
In this case we can also put them side by side, view a given region, and use nrmalized data instead of the log2 of raw data:
In [14]:
my_chrom.visualize([('k562', 'gm06690')], paint_tads=True, focus=(490,620), normalized=True)
Note that the width of the line is proportional to the score of the TAD border.
Finally TAD bourders can be seen using the density plot summary:
In [15]:
my_chrom.tad_density_plot('k562')
In this plot, each grey-filled-arc represents a TAD. The eight of the TAD is proportional to the relative amount of interactions in this TAD. If this relative amount of interactions is higher than 1 (dark grey TADs) the number of interactions inside the TAD is higher than expected accordin to its size (a black horizontal line, highlights this theoretical threshold).
TAD borders in this plot are marked by colored triangles. The color code, from blue to red, represents how confident is TADbit about the identification of the border.
In [16]:
my_chrom.find_tad(['k562', 'gm06690'], batch_mode=True, n_cpus=8)
print my_chrom.experiments
In [17]:
my_chrom.set_max_tad_size(3000000)
In [18]:
my_chrom.visualize('k562', paint_tads=True)
Another optional inspection performed by TADbit is the presence of centromeric regions. TADbit assumes that the larger gap found in a Hi-C matrix corresponds to the centromere. In the case of this example we set the argument "search_centromere=True", but the default is False, as it may give unexpected results in telocentric chromosomes.
The search for centromere is updated and refined each time a new experiment is linked to a given Chromosome. Typically, TADs calculated by the :func:pytadbit.tadbit.tadbit
function include centromeric regions; if a centromere is found, TADbit will split the TAD containing it into two TADs, one ending before the centromere and one starting after.
In [19]:
my_chrom.save_chromosome("some_path.tdb", force=True)
In [20]:
from pytadbit import load_chromosome
my_chrom = load_chromosome("some_path.tdb")
print my_chrom.experiments
In [21]:
expr = my_chrom.experiments["k562"]
expr.load_hic_data("../../scripts/sample_data/HIC_k562_chr19_chr19_100000_obs.txt")