Overall layout of a Molecule object.

The Molecule object has the following architecture:

  • A Molecule is composed of Models (conformations)
  • A Model is composed of Residues
  • A Residue is composed of Atoms
  • The Atom object is the basic component of SBio python. It holds one atomic coordinate line from a 3D-coordinate file. For example, when a PDB file is read, each line starting with ‘ATOM’ or ‘HETATM’ is used to create an Atom object.
  • The Residue object represents a residue in a macromolecule (an amino acid or nucleic acid residue, or sometimes a small molecule). It is composed of the Atom objects for the atoms that belong to that residue.
  • The Model object is composed of Residue objects and represents one model (conformation) of a macromolecule.
  • The Molecule object is the top-level container for Atom objects. A Molecule contains at least one Model object.

Access

Atom, Residue and Model objects are stored in a Python dict in their parent container, and can be accessed as attributes of that container. A Model object is normally assigned a key of the form ‘m’ plus a number, such as ‘m1’ or ‘m100’. Given a Molecule object named ‘mol’, its first Model object is ‘mol.m1’. Residue objects within a model are assigned a key of the form chain ID plus residue serial: the first residue of chain A has the key ‘A1’ and can be accessed as ‘mol.m1.A1’. An Atom object uses the atom name from the coordinate file as its key, so the atom named ‘CA’ in residue ‘A1’ of the first model can be accessed as ‘mol.m1.A1.CA’. Some atom names end with a quote character, however. In that case the quote is replaced by ‘_’ (underscore). For example, the key for “C5’” is ‘C5_’.
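The attribute-style access described above can be sketched with a small dict-backed container. This is only an illustration of the idea, not SBio's actual implementation: the `Container` class and the `safe_key` helper are hypothetical names introduced here.

```python
class Container:
    """Hypothetical dict-backed node whose children are reachable as attributes."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, key, child):
        # Store the child under an identifier-safe key.
        self.children[safe_key(key)] = child
        return child

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__["children"][key]
        except KeyError:
            raise AttributeError(key) from None


def safe_key(atom_name):
    """Replace the quote character, which is not valid in a Python identifier."""
    return atom_name.replace("'", "_")


# Build a tiny mol -> m1 -> A2 -> O4' hierarchy and walk it by attribute.
mol = Container("mol")
m1 = mol.add("m1", Container("m1"))
a2 = m1.add("A2", Container("A2"))
a2.add("O4'", Container("O4'"))
print(mol.m1.A2.O4_.name)  # prints O4'
```

Keeping the original atom name on the object while sanitizing only the key means the name written in the coordinate file is not lost.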

Create a Molecule object

Molecule objects can be created from PDB files (support for other formats is yet to be implemented), for example:


In [2]:
from SBio import *
mol = create_molecule('test.pdb')
mol


Out[2]:
test.pdb

Navigate through a Molecule object:


In [2]:
for m in mol.get_model():
    for r in m.get_residue():
        print(r)


test.pdb.m1.A.DT.1
test.pdb.m1.A.DA.2
test.pdb.m2.A.DT.1
test.pdb.m2.A.DA.2

"get_model", "get_atom" and "get_residue" are Python generators, so they can conveniently be used like this:


In [3]:
atoms=mol.m1.get_atom()
residue=mol.m1.get_residue()
for r in residue:
    print(r)


test.pdb.m1.A.DT.1
test.pdb.m1.A.DA.2
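One consequence of these methods being generators is that each generator can be iterated only once. The sketch below uses a hypothetical stand-in function (not the SBio method itself) to show the behavior, and how `list()` keeps the items for repeated use.

```python
def get_residue(residues):
    """Stand-in generator that yields the values of a dict, one at a time."""
    for residue in residues.values():
        yield residue


residues = {"A1": "DT", "A2": "DA"}

gen = get_residue(residues)
first_pass = list(gen)   # consumes the generator: ['DT', 'DA']
second_pass = list(gen)  # the generator is exhausted: []

# Call the function again, or keep the list, to iterate a second time.
kept = list(get_residue(residues))
```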

Write coordinates to PDB

Both Molecule and Model objects can be written to a PDB file, which provides a way to split a PDB file that contains multiple conformations.


In [4]:
mol.write_pdb('mol_new.pdb')  # write all conformations into a single pdb file
for i, m in enumerate(mol.get_model(), start=1):
    name = 'mol_m' + str(i) + '.pdb'
    m.write_pdb(name)  # write one conformation to its own pdb file


Successfully write pdb: mol_new.pdb
Successfully write pdb: mol_m1.pdb
Successfully write pdb: mol_m2.pdb
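For readers unfamiliar with multi-model PDB files: each conformation is delimited by MODEL and ENDMDL records. The following is a minimal, SBio-independent sketch of splitting such a file by those records. The `split_models` function is a hypothetical helper, and the sample lines are shortened placeholders rather than valid coordinate records.

```python
def split_models(pdb_text):
    """Return one list of coordinate lines per MODEL ... ENDMDL block.

    Coordinate lines outside a MODEL block are ignored in this sketch.
    """
    models = []
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # the record name occupies columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None and record in ("ATOM", "HETATM"):
            current.append(line)
    return models


sample = "\n".join([
    "MODEL        1",
    "ATOM  first",
    "ENDMDL",
    "MODEL        2",
    "ATOM  second",
    "HETATM third",
    "ENDMDL",
])
models = split_models(sample)
```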

Get information about a molecule

The 'Model' module provides several methods for extracting information about a molecule:

  • get_atom_num: return the number of atoms in a molecule
  • get_residue_list: return a list of the residues in a molecule
  • get_sequence: return the sequence information of a molecule
  • write_fasta: write sequence information into a FASTA file
  • get_mw: return the molecular weight of a molecule
  • get_dist_matrix: compute the complete inter-atomic distance matrix

Usage:

In [10]:
m1 = mol.m1
print(m1.get_atom_num())
print(m1.get_residue_list())
print(m1.get_sequence('A')) #the sequence of chain A
m1.write_fasta('A', 'test.fasta', comment='test')
m1.get_mw()


62
['A1', 'A2']
TA
Writing sequence to test.fasta
Out[10]:
553.4192210000001
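The usage cell above does not exercise get_dist_matrix. As a rough illustration of what a complete inter-atomic distance matrix contains, here is a pure-Python sketch that works on plain coordinate tuples. The `dist_matrix` function is a hypothetical stand-in, and the return type of the SBio method may differ.

```python
import math


def dist_matrix(coords):
    """Return the full symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            matrix[i][j] = matrix[j][i] = d
    return matrix


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
matrix = dist_matrix(coords)
# matrix[0][1] is the distance between the first and second points (5.0 here).
```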

Compute geometry information

The 'Geometry' module contains several methods for measuring distances, angles and torsion angles between atoms:


In [6]:
a1 = mol.m1.A2.O4_   # the actual name for this atom is "O4'"
a2 = mol.m1.A2.C1_
a3 = mol.m1.A2.N9
a4 = mol.m1.A2.C4
get_distance(a1, a2)


Out[6]:
1.4165210905595438

In [7]:
get_angle(a1, a2, a3)


Out[7]:
110.19401514913508

In [8]:
chi = get_torsion(a1,a2,a3,a4)
print('the CHI torsion angle is {}'.format(chi))


the CHI torsion angle is -114.6114797744751
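The three measurements can be written out with basic vector algebra. The sketch below operates on plain (x, y, z) tuples and returns angles in degrees. It is not the code of the 'Geometry' module, and the function names are chosen here for illustration. The torsion follows the usual sign convention: positive when, looking from the second point toward the third, the front bond must be rotated clockwise to eclipse the rear bond.

```python
import math


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def distance(p1, p2):
    return norm(sub(p1, p2))


def angle(p1, p2, p3):
    """Angle at p2 between the directions to p1 and p3, in degrees."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p1, p2, p3, p4):
    """Signed dihedral angle about the p2-p3 axis, in degrees."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = dot(b2, cross(n1, n2))
    x = norm(b2) * dot(n1, n2)
    return math.degrees(math.atan2(y, x))


p1, p2, p3, p4 = (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)
d = distance(p1, p2)
a = angle(p1, p2, p3)
t = torsion(p1, p2, p3, p4)
```

Using `atan2` rather than `acos` for the torsion keeps the sign of the angle, which is why a value such as the negative chi angle above can be reported.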

Compute the interaction between atoms

The 'Interaction' module provides several methods to check the interaction between atoms:

  • get_hydrogen_bond: check whether a hydrogen bond is formed between the given atoms
  • get_polar_interaction: compute the polar interaction between given atoms
  • get_pi_pi_interaction: compute the aromatic pi-pi interaction between given atom groups

In [11]:
a5 = m1.A2.N6
a6 = m1.A2.H61
a7 = m1.A1.O2
print(get_hydrogen_bond(a5, a7, a6))  #arguments order: donor, acceptor, donor_H=None
print(get_polar_interaction(a5, a7))


(False, 10.638108149478457, 118.41191960964946)
(False, 10.638108149478457)
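The tuples printed above appear to hold a flag, a distance and (for the hydrogen bond) an angle. A geometric hydrogen-bond test of that shape can be sketched as follows. The cutoffs (3.5 Å donor-acceptor distance, 120° donor-H-acceptor angle), the choice of angle, and the function name are assumptions made for this example, not values taken from SBio.

```python
import math


def hydrogen_bond(donor, acceptor, donor_h=None, max_dist=3.5, min_angle=120.0):
    """Return (is_bonded, distance, angle) for plain (x, y, z) tuples.

    The angle is measured at the hydrogen and is None when no hydrogen is given.
    The default cutoffs are illustrative assumptions.
    """
    dist = math.dist(donor, acceptor)
    if donor_h is None:
        return dist <= max_dist, dist, None
    u = [d - h for d, h in zip(donor, donor_h)]     # H -> donor
    v = [a - h for a, h in zip(acceptor, donor_h)]  # H -> acceptor
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    ang = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return (dist <= max_dist and ang >= min_angle), dist, ang


near = hydrogen_bond((0.0, 0.0, 0.0), (2.9, 0.0, 0.0), donor_h=(1.0, 0.0, 0.0))
far = hydrogen_bond((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), donor_h=(1.0, 0.0, 0.0))
```

Under these assumptions the first pair passes both tests, while the second fails on distance alone, as does the 10.6 Å pair in the output above.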

Structure alignment

The 'Structural_alignment' module provides functions to align a set of molecules. The RMSD values of the structure superimposition can be calculated, and the coordinates of the aligned structures can optionally be updated.


In [3]:
m1 = mol.m1
m2 = mol.m2
molecule_list = [m1,m2]

residue_range = [[1,2],[1,2]]
sa = Structural_alignment(molecule_list, residue_range, update_coord=False)
sa.run()
print(sa.rmsd)


[2.0068643949461528e-15, 0.94370990160247836]
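For reference, an RMSD is the square root of the mean squared distance between matched atoms. The sketch below computes only that final number for two coordinate lists that are assumed to be already superimposed and in matching order. It does not perform the superposition itself and is not SBio code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, shifted)  # every atom is displaced by 1.0
```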

Sequence alignment

The 'Structural_alignment' module is used to process a multiple sequence alignment in order to map residues between different molecules, i.e., to get the residue serials of the residues conserved among those molecules. The conserved residue serials can then be used in the structure alignment.


In [6]:
seq = 'D:\\python\\structural bioinformatics_in_python\\PPO-crystal.clustal'
alignment=Seq_align_parser(seq)
alignment.run()
con_res = []
con_res.append(alignment.align_res_mask['O24164|PPOM_TOBAC      '])
con_res.append(alignment.align_res_mask['P56601|PPOX_MYXXA      '])
print(con_res[0])  #conserved residue in 'PPOM_TOBAC'
print(con_res[1])


[16, 20, 22, 25, 26, 27, 43, 49, 50, 59, 65, 79, 83, 85, 104, 155, 158, 161, 175, 179, 184, 185, 190, 191, 197, 202, 205, 206, 241, 245, 253, 264, 267, 300, 305, 337, 352, 354, 379, 388, 394, 427, 440, 442, 467, 472]
[12, 16, 18, 21, 22, 23, 39, 45, 46, 55, 61, 75, 79, 81, 101, 147, 150, 153, 167, 171, 176, 177, 182, 183, 189, 194, 197, 198, 229, 233, 241, 251, 254, 279, 284, 314, 328, 330, 352, 361, 367, 400, 413, 415, 440, 445]
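The mapping from alignment columns to residue serials can be illustrated independently of the parser. In this sketch a column counts as conserved when every sequence has the same non-gap residue there, and serials are simply 1-based positions in each ungapped sequence. Both choices, and the `conserved_serials` name, are assumptions for the example. The actual parser may define conservation and numbering differently.

```python
def conserved_serials(aligned):
    """Map fully conserved alignment columns to 1-based residue serials.

    `aligned` maps a sequence name to its gapped sequence ('-' marks a gap).
    """
    names = list(aligned)
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("aligned sequences must have equal length")
    counters = dict.fromkeys(names, 0)
    result = {name: [] for name in names}
    for column in zip(*(aligned[name] for name in names)):
        # Advance the residue counter of every sequence without a gap here.
        for name, symbol in zip(names, column):
            if symbol != "-":
                counters[name] += 1
        if "-" not in column and len(set(column)) == 1:
            for name in names:
                result[name].append(counters[name])
    return result


aligned = {"seq1": "MK-TAYI", "seq2": "-KLTA-I"}
serials = conserved_serials(aligned)
# serials == {'seq1': [2, 3, 4, 6], 'seq2': [1, 3, 4, 5]}
```

The two lists always have the same length, so they can be paired position by position when selecting atoms for a structural alignment.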

Structural analysis for nucleic acids and proteins

The 'Structural_analysis' and 'Plot' modules provide methods for simple structural analysis and visualization of nucleic acids and proteins. For example, the backbone torsion angles and the puckering of the nucleotide sugar can be computed and plotted (see the examples for more detail).

Standard biomolecular data

The 'Data' module provides standard data for biomolecules, such as the standard names of amino acid and nucleic acid residues, molecular weights, etc.