In [1]:
from pytadbit import load_chromosome
from pytadbit.parsers.hic_parser import load_hic_data_from_bam
In [2]:
crm = load_chromosome('results/fragment/chr3.tdb')
In [3]:
B, PSC = crm.experiments
In [4]:
B, PSC
Load raw and normalized data matrices
In [5]:
base_path = 'results/fragment/{0}_both/03_filtering/valid_reads12_{0}.bam'
bias_path = 'results/fragment/{0}_both/04_normalizing/biases_{0}_both_{1}kb.biases'
reso = 100000
chrname = 'chr3'
cel1 = 'mouse_B'
cel2 = 'mouse_PSC'
In [6]:
hic_data1 = load_hic_data_from_bam(base_path.format(cel1),
resolution=reso,
region='chr3',
biases=bias_path.format(cel1, reso // 1000),
ncpus=8)
hic_data2 = load_hic_data_from_bam(base_path.format(cel2),
resolution=reso,
region='chr3',
biases=bias_path.format(cel2, reso // 1000),
ncpus=8)
In [7]:
B.load_hic_data([hic_data1.get_matrix(focus='chr3')])
B.load_norm_data([hic_data1.get_matrix(focus='chr3', normalized=True)])
PSC.load_hic_data([hic_data2.get_matrix(focus='chr3')])
PSC.load_norm_data([hic_data2.get_matrix(focus='chr3', normalized=True)])
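As a quick sanity check that both the raw and the normalized matrices were loaded, we can print the number of bins in each experiment (a minimal sketch; the size attribute is assumed from TADbit's Experiment objects):
print(B.size, PSC.size)  # number of 100 kb bins loaded for chr3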
We use the best parameters obtained from the optimization process to produce an ensemble of models.
Of the models produced (n_models), we will tell TADbit to keep the ones (n_keep) that best satisfy the imposed restraints.
In [8]:
optimal_params = {'dcutoff': 2.0,
'kbending': 0.0,
'kforce': 5,
'lowfreq': -0.6,
'maxdist': 2000.0,
'reference': 'Stadhouders R, Vidal E, Serra F, Di Stefano B et al. 2018',
'scale': 0.01,
'upfreq': 0.0}
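This dictionary reproduces the values reported by the optimization notebook. If the optimization were rerun in this session, a similar dictionary could be retrieved programmatically; a minimal sketch, assuming TADbit's optimal_imp_parameters and get_best_parameters_dict API (the parameter ranges below are placeholders, not the grid actually explored):
opt = B.optimal_imp_parameters(start=300, end=360, n_models=50, n_keep=25, n_cpus=8,
                               upfreq_range=(-0.3, 0.3, 0.3),
                               lowfreq_range=(-0.9, 0.0, 0.3),
                               maxdist_range=(1500, 2500, 500),
                               dcutoff_range=[2])
optimal_params = opt.get_best_parameters_dict(reference='example reference')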
In [9]:
models_B = B.model_region(start=300, end=360, n_models=400, n_keep=100, n_cpus=8,
config=optimal_params)
In [10]:
models_PSC = PSC.model_region(start=300, end=360, n_models=400, n_keep=100, n_cpus=8,
config=optimal_params)
The ensemble of models has inherited the description from the Chromosome object:
In [11]:
print(models_B.description)
We can still access the experiment object from the 3D models:
In [12]:
print(models_B.experiment)
print(models_PSC.experiment)
We can have a look at the data that was used to define restraints:
In [13]:
models_B.zscore_plot()
In [14]:
models_PSC.zscore_plot()
and also visualize how the IMP objective function (OF) of a stored model improves during the Monte Carlo optimization:
In [15]:
model = models_B[0]
model.objective_function(log=True, smooth=False)
The definition of the "best models" can be changed at any time. Only the best models will be used in the analysis.
Select top 10 models
In [16]:
models_B.define_best_models(10)
print("Lowest 10 IMP OF models:")
print(models_B)
Select top 100 models
In [17]:
models_B.define_best_models(100)
print("Lowest 100 IMP OF models:")
print(models_B)
The ensembles "models_B" and "models_PSC" contain the models generated by the Monte Carlo simulation, ordered by their objective function (OF). The first model in the list is the one that best satisfies the imposed restraints.
To get the data for the lowest IMP OF model in the ensemble, we retrieve model number 0:
In [18]:
model = models_B[0]
print(model)
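Each model behaves like a dictionary; a minimal sketch of how to inspect it, assuming TADbit's IMPmodel keys ('x', 'y' and 'z' hold the particle coordinates, 'objfun' the objective function, and 'rand_init' the random seed used as model identifier):
print(model['rand_init'], model['objfun'])  # identifier and final objective function
print(len(model['x']))                      # number of particles in the model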
We can check the correlation of models_B with the original Hi-C matrix.
In the plot "Real vs modelled data" we should see a positive correlation between the contacts in the models and the interaction frequencies of the pairs of beads in the Hi-C matrix: a high interaction frequency between two loci in the matrix should be reflected by a high proportion of models in which the beads representing those loci are "in contact" (at a distance lower than the cutoff).
In [19]:
models_B.correlate_with_real_data(plot=True, cutoff=2000)
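Besides the plot, correlate_with_real_data returns the correlation itself, so it can also be captured without plotting; a minimal sketch, assuming the returned value holds the correlation statistics:
corr = models_B.correlate_with_real_data(cutoff=2000, plot=False)
print(corr)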
To plot all the models in the ensemble we use the view_models function. By default the centroid (the model closest to the median conformation) is highlighted.
In [27]:
models_PSC.view_models(tool='plot')
We can also plot individual models.
In [28]:
models_PSC.view_models(models=[0], tool='plot')
We can also use Chimera (https://www.cgl.ucsf.edu/chimera/) to visualize the 3D structure:
In [29]:
models_PSC.view_models(models=[0], tool='chimera')
In the Monte Carlo simulation each model is built starting from a random initial conformation, so the models are not aligned in any preferred orientation. We can use the function align_models to rotate and translate the coordinates of the models so that they all follow the same orientation as one of the models in the ensemble. By default the model used as reference is the first one.
In [31]:
models_PSC.align_models(in_place=True)
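Aligned models can then be exported for external viewers; a minimal sketch, assuming TADbit's write_xyz and write_cmm helpers (the output directory is only an example):
models_PSC.write_xyz(directory='results/models_PSC', models=[0])  # plain-text coordinates
models_PSC.write_cmm(directory='results/models_PSC', models=[0])  # Chimera marker file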
With the function deconvolve we obtain a deconvolution analysis of a given group of models. It first clusters the models based on structural comparison (dRMSD); the differential contact map between each possible pair of clusters is then shown in the resulting graph. This allows us to detect common sets of contacts in the ensemble.
In [32]:
models_PSC.deconvolve(fact=0.35, dcutoff=2000, represent_models='best', n_best_clusters=5)
The clustering of the models, by the Markov Cluster Algorithm (MCL) or by Ward's method, can be based on different similarity measures (score, rmsd, drmsd or eqv). By default a score computed as a combination of rmsd, drmsd and eqv is used.
In [37]:
# Cluster models based on structural similarity
models_B.cluster_models(fact=0.95, dcutoff=1000)
print(models_B.clusters)
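The clusters attribute maps each cluster number to the models it contains; a minimal sketch of how to inspect it, assuming it behaves like a dictionary:
for cluster_id in models_B.clusters:
    print(cluster_id, len(models_B.clusters[cluster_id]))  # cluster number and population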
The cluster analysis dendrogram allows us to view the population of each cluster and its OF values.
In [39]:
# Plot the resulting clusters
cl = models_B.cluster_analysis_dendrogram()
In [40]:
# Cluster models based on structural similarity
models_PSC.cluster_models(fact=0.95, dcutoff=1000)
print(models_PSC.clusters)
In [41]:
# Plot the resulting clusters
cl = models_PSC.cluster_analysis_dendrogram()
Model consistency gives a measure of the variability of the particles across a set of models. Particles at the same position across different models are considered consistent if their distance is less than the given cutoff.
In [42]:
# Calculate the consistency plot for all models in the first cluster (cluster 1)
models_B.model_consistency(cluster=1, cutoffs=(1000,1500))
In [43]:
# Calculate the consistency plot for all models in the first cluster (cluster 1)
models_PSC.model_consistency(cluster=1, cutoffs=(1000,1500))
In the modelling we have used a scale of 0.01 nm/bp; this means that we expect 100 bp of chromatin per nm, both within each bead and between two consecutive beads.
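As a quick sanity check of the units (purely illustrative arithmetic):
scale = 0.01                  # nm per bp, from optimal_params
expected_density = 1 / scale  # 100 bp of chromatin per nm, the baseline of the density plot
bead_span = reso * scale      # a 100 kb bead corresponds to ~1000 nm of chromatin fibre
print(expected_density, bead_span)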
In [44]:
# Calculate a DNA density plot
models_B.density_plot(cluster=1)
In [45]:
# Calculate a DNA density plot
models_PSC.density_plot()