Overall layout of a Molecule object.

The Molecule object has the following architecture:

A Molecule is composed of Models (conformations)
A Model is composed of Residues
A Residue is composed of Atoms

The Atom object is the basic component of SBio python; it is a container for one atomic coordinate line in a 3D-coordinates file. For example, when reading a pdb file, each line starting with ‘ATOM’ or ‘HETATM’ is used to create an Atom object.
The Residue object represents a residue in a macromolecule (an amino acid or nucleic acid residue, or sometimes a small molecule). A Residue object is composed of the Atom objects of the atoms that belong to that residue.
The Model object is composed of Residue objects and represents a model (or conformation) of a macromolecule.
The Molecule object is the top-level container for Atom objects. A Molecule contains at least one Model object.

Access

Atom, Residue and Model objects are stored in a Python dict of their parent container, and each one can be accessed as an attribute of its parent. A Model object is normally assigned the key ‘m+number’, such as ‘m1’ or ‘m100’. Suppose we have a Molecule object named ‘mol’; then the first Model object of ‘mol’ is ‘mol.m1’. Residue objects within a model are assigned the key ‘chain id+residue serial’, so residue 1 of chain A has the key ‘A1’ and can be accessed by ‘mol.m1.A1’. The name of an atom in its 3D-coordinates file is used as its key, so the Atom object named ‘CA’ in residue ‘A1’ of the first model can be accessed by ‘mol.m1.A1.CA’. However, some atom names end with a quote; in this case the quote is replaced by ‘_’ (underscore). E.g., the key for “C5’” will be ‘C5_’.
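
The access scheme described above can be illustrated with a small stand-alone sketch. It is not the SBio implementation: the class name `Container` and its `add` method are hypothetical, and the sketch only assumes that children live in a dict and that a trailing quote in a key is replaced by an underscore.

```python
# Hypothetical stand-in for Molecule/Model/Residue; NOT the SBio classes.
class Container:
    """Stores children in a dict and exposes them as attributes."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, key, child):
        # A quote is not valid in a Python attribute name, so it is
        # replaced by an underscore before the key is stored.
        safe_key = key.replace("'", "_")
        self.children[safe_key] = child
        return child

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so real
        # attributes such as `name` are not affected.
        try:
            return self.__dict__["children"][key]
        except KeyError:
            raise AttributeError(key)


mol = Container("test.pdb")
m1 = mol.add("m1", Container("m1"))
a1 = m1.add("A1", Container("A1"))
a1.add("CA", Container("CA"))
a1.add("C5'", Container("C5'"))

print(mol.m1.A1.CA.name)    # attribute access walks the nested dicts
print(mol.m1.A1.C5_.name)   # the atom named C5' is stored under the key C5_
```

Keeping the original atom name on the object while normalising only the dict key means the name can still be written out unchanged.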

Create a Molecule object

Molecule objects can be created from PDB files (support for other formats is still to be implemented), for example:



In [2]:

    
from SBio import *
mol = create_molecule('test.pdb')
mol









    Out[2]:





test.pdb

Navigate through a Molecule object:



In [2]:

    
for m in mol.get_model():
    for r in m.get_residue():
        print(r)









    



test.pdb.m1.A.DT.1
test.pdb.m1.A.DA.2
test.pdb.m2.A.DT.1
test.pdb.m2.A.DA.2

The "get_model", "get_atom" and "get_residue" methods are Python generators, so they can also be used more conveniently like this:



In [3]:

    
atoms=mol.m1.get_atom()
residue=mol.m1.get_residue()
for r in residue:
    print(r)









    



test.pdb.m1.A.DT.1
test.pdb.m1.A.DA.2
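
Because these methods are generators, the returned object can be iterated only once. The sketch below shows this with hypothetical stand-in classes (the `Residue` and `Model` classes here are simplified assumptions, not the SBio ones); only the generator behaviour itself is standard Python.

```python
# Simplified, hypothetical stand-ins; NOT the SBio classes.
class Residue:
    def __init__(self, name, atoms):
        self.name = name
        self.atoms = atoms          # dict: atom name -> (x, y, z)


class Model:
    def __init__(self, residues):
        self.residues = residues    # dict: key such as 'A1' -> Residue

    def get_residue(self):
        # Yield residues lazily, one at a time.
        for key in self.residues:
            yield self.residues[key]

    def get_atom(self):
        # Walk residues first, then the atoms inside each residue.
        for residue in self.get_residue():
            for atom_name in residue.atoms:
                yield residue.name, atom_name


model = Model({
    "A1": Residue("DT", {"P": (0.0, 0.0, 0.0), "OP1": (1.0, 0.0, 0.0)}),
    "A2": Residue("DA", {"P": (5.0, 0.0, 0.0)}),
})

residues = model.get_residue()
first_pass = [r.name for r in residues]
second_pass = [r.name for r in residues]   # empty: the generator is exhausted
atom_list = list(model.get_atom())         # a fresh call gives a fresh generator

print(first_pass, second_pass)
print(atom_list)
```

If the same items are needed more than once, call the method again or store the items in a list.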

Write coordinates to pdb

Both Molecule and Model objects can be written to a pdb file. This provides a way to split a pdb file that contains multiple conformations.



In [4]:

    
mol.write_pdb('mol_new.pdb')       # write all conformations into a single pdb file
for i, m in enumerate(mol.get_model(), start=1):
    name = 'mol_m' + str(i) + '.pdb'
    m.write_pdb(name)  # write one conformation to its own pdb file









    



Successfully write pdb: mol_new.pdb
Successfully write pdb: mol_m1.pdb
Successfully write pdb: mol_m2.pdb

Get information about a molecule

The 'Model' module provides several methods for extracting information about a molecule:

get_atom_num: return the number of atoms in a molecule
get_residue_list: return a list of the residues of a molecule
get_sequence: return the sequence information of a molecule
write_fasta: write sequence information into a fasta file
get_mw: return the molecular weight of a molecule
get_dist_matrix: compute the complete inter-atomic distance matrix
Usage:



In [10]:

    
m1 = mol.m1
print(m1.get_atom_num())
print(m1.get_residue_list())
print(m1.get_sequence('A')) #the sequence of chain A
m1.write_fasta('A', 'test.fasta', comment='test')
m1.get_mw()









    



62
['A1', 'A2']
TA
Writing sequence to test.fasta






    Out[10]:





553.4192210000001

Compute geometry information

The 'Geometry' module contains several methods for the measurement of distance, angle and torsion angle among atoms:



In [6]:

    
a1 = mol.m1.A2.O4_   # the actual name for this atom is "O4'"
a2 = mol.m1.A2.C1_
a3 = mol.m1.A2.N9
a4 = mol.m1.A2.C4
get_distance(a1, a2)









    Out[6]:





1.4165210905595438



In [7]:

    
get_angle(a1, a2, a3)









    Out[7]:





110.19401514913508



In [8]:

    
chi = get_torsion(a1,a2,a3,a4)
print('the CHI torsion angle is {}'.format(chi))









    



the CHI torsion angle is -114.6114797744751
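
For reference, the three measurements can be sketched with plain vector arithmetic on (x, y, z) tuples. This is a minimal illustration under stated assumptions, not the 'Geometry' module itself: the function names are hypothetical, angles are returned in degrees, and the torsion sign follows the common atan2 formulation, which may or may not match the convention used by SBio.

```python
import math


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def distance(p1, p2):
    return norm(sub(p1, p2))


def angle(p1, p2, p3):
    # Angle at p2 between the directions p2->p1 and p2->p3, in degrees.
    u, v = sub(p1, p2), sub(p3, p2)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p1, p2, p3, p4):
    # Signed dihedral angle around the p2-p3 bond, in degrees.
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


print(distance((0, 0, 0), (3, 4, 0)))
print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```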

Compute the interaction between atoms

The 'Interaction' module provides several methods to check the interaction between atoms:

get_hydrogen_bond: check whether a hydrogen bond forms between the given atoms
get_polar_interaction: compute the polar interaction between the given atoms
get_pi_pi_interaction: compute the aromatic pi-pi interaction between the given atom groups



In [11]:

    
a5 = m1.A2.N6
a6 = m1.A2.H61
a7 = m1.A1.O2
print(get_hydrogen_bond(a5, a7, a6))  #arguments order: donor, acceptor, donor_H=None
print(get_polar_interaction(a5, a7))









    



(False, 10.638108149478457, 118.41191960964946)
(False, 10.638108149478457)
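
The returned tuples appear to hold a flag followed by the measured distance and, for the hydrogen bond, an angle. A geometric check of this kind could be sketched as below. Everything here is an assumption made for illustration: the function name, the 3.5 Å distance cutoff, the 120 degree angle cutoff, and the choice of the donor-H-acceptor angle are not taken from SBio.

```python
import math


def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _angle(p1, vertex, p3):
    # Angle at `vertex`, in degrees.
    u = [a - b for a, b in zip(p1, vertex)]
    v = [a - b for a, b in zip(p3, vertex)]
    cos_t = sum(a * b for a, b in zip(u, v)) / (_dist(p1, vertex) * _dist(p3, vertex))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hydrogen_bond_sketch(donor, acceptor, donor_h=None,
                         max_dist=3.5, min_angle=120.0):
    """Return (is_bonded, distance[, angle]) for coordinate tuples.

    Cutoffs are illustrative assumptions, not the values used by SBio.
    """
    d = _dist(donor, acceptor)
    if donor_h is None:
        # Without the hydrogen position only the distance can be tested.
        return d <= max_dist, d
    ang = _angle(donor, donor_h, acceptor)   # donor-H...acceptor angle
    return (d <= max_dist and ang >= min_angle), d, ang


close = hydrogen_bond_sketch((0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (1.0, 0.0, 0.0))
far = hydrogen_bond_sketch((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(close)
print(far)
```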

Structure alignment

The 'Structural_alignment' module provides a function to align a set of molecules. The RMSD values of the structure superposition can be calculated, and the coordinates of the aligned structures can optionally be updated.



In [3]:

    
m1 = mol.m1
m2 = mol.m2
molecule_list = [m1,m2]

residue_range = [[1,2],[1,2]]
sa = Structural_alignment(molecule_list, residue_range, update_coord=False)
sa.run()
print(sa.rmsd)









    



[2.0068643949461528e-15, 0.94370990160247836]
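
The RMSD itself is the square root of the mean squared distance between paired atoms. The sketch below computes it for each coordinate set against the first one. It is deliberately simplified and is not the method used by 'Structural_alignment': it assumes the atoms are already paired, removes only the translation by centering each set on its centroid, and does not search for the optimal rotation, which a full superposition (for example the Kabsch algorithm) would also do. All function names are hypothetical.

```python
import math


def centroid(coords):
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def center(coords):
    # Translate the coordinates so that their centroid is at the origin.
    cx, cy, cz = centroid(coords)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(coords_a, coords_b):
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum((a - b) ** 2
             for p, q in zip(coords_a, coords_b)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]

raw = rmsd(reference, shifted)   # large, because of the translation
rmsd_values = [rmsd(center(reference), center(m)) for m in (reference, shifted)]
print(raw, rmsd_values)
```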

Sequence alignment

The 'Structural_alignment' module is used to parse a multiple sequence alignment in order to map residues between different molecules, i.e., to get the residue serials of the residues that are conserved among the molecules. The conserved residue serials can then be used in the structure alignment.



In [6]:

    
seq = 'D:\\python\\structural bioinformatics_in_python\\PPO-crystal.clustal'
alignment=Seq_align_parser(seq)
alignment.run()
con_res = []
con_res.append(alignment.align_res_mask['O24164|PPOM_TOBAC      '])
con_res.append(alignment.align_res_mask['P56601|PPOX_MYXXA      '])
print(con_res[0])  #conserved residue in 'PPOM_TOBAC'
print(con_res[1])









    



[16, 20, 22, 25, 26, 27, 43, 49, 50, 59, 65, 79, 83, 85, 104, 155, 158, 161, 175, 179, 184, 185, 190, 191, 197, 202, 205, 206, 241, 245, 253, 264, 267, 300, 305, 337, 352, 354, 379, 388, 394, 427, 440, 442, 467, 472]
[12, 16, 18, 21, 22, 23, 39, 45, 46, 55, 61, 75, 79, 81, 101, 147, 150, 153, 167, 171, 176, 177, 182, 183, 189, 194, 197, 198, 229, 233, 241, 251, 254, 279, 284, 314, 328, 330, 352, 361, 367, 400, 413, 415, 440, 445]
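
How conserved columns translate into per-sequence residue serials can be shown with a small sketch. It is an illustration only, not the `Seq_align_parser` implementation: the function name is hypothetical, a column counts as conserved when every sequence has the same non-gap character, and residue numbering is assumed to start at 1 for every sequence.

```python
def conserved_residue_serials(aligned):
    """Map conserved alignment columns to residue serials.

    aligned: dict of sequence name -> gapped sequence ('-' marks a gap);
    all sequences must have the same aligned length.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("aligned sequences must have equal length")

    serial = {n: 0 for n in names}       # running residue count per sequence
    conserved = {n: [] for n in names}
    for col in range(length):
        column = [aligned[n][col] for n in names]
        for n, ch in zip(names, column):
            if ch != "-":
                serial[n] += 1           # gaps do not consume a residue
        if "-" not in column and len(set(column)) == 1:
            for n in names:
                conserved[n].append(serial[n])
    return conserved


alignment = {"seq1": "MK-TAYI", "seq2": "-KLTA-I"}
result = conserved_residue_serials(alignment)
print(result["seq1"])
print(result["seq2"])
```

The two lists always have the same length, and entries at the same index refer to the same alignment column, which is what makes them usable as matching residue ranges.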

Structural analysis for nucleic acids and protein

The 'Structural_analysis' and 'Plot' modules provide methods for simple structural analysis and visualization of nucleic acids and proteins. For example, the backbone torsion angles and the sugar puckering of nucleotides can be computed and plotted (see the examples for more detail).

Standard biomolecular data

The 'Data' module provides standard data for biomolecules, such as the standard names of amino acid and nucleic acid residues, molecular weights, etc.