In [1]:
from __future__ import print_function, division, unicode_literals
import oddt
from oddt.shape import usr, usr_similarity
print(oddt.__version__)
We'd like to compare the shape of heroin with other molecules.
ODDT supports three methods of molecular shape comparison: USR, USR-CAT and Electroshape.
USR looks only at the shape of the molecule.
USR-CAT considers the shape and type of atoms.
Electroshape accounts for the shape and charge of atoms.
All three methods share the same API.
We will use USR, because it's the simplest and the fastest.
In [2]:
heroin = oddt.toolkit.readstring('smi',
'CC(=O)Oc1ccc2c3c1O[C@@H]4[C@]35CC[NH+]([C@H](C2)[C@@H]5C=C[C@@H]4OC(=O)C)C')
smiles = ['CC(=O)Oc1ccc2c3c1O[C@@H]4[C@]35CC[NH+]([C@H](C2)[C@@H]5C=C[C@@H]4OC(=O)Cc6cccnc6)C',
'CC(=O)O[C@@H]1C=C[C@@H]2[C@H]3Cc4ccc(c5c4[C@]2([C@H]1O5)CC[NH+]3C)OC',
'C[N+]1(CC[C@@]23c4c5ccc(c4O[C@H]2[C@@H](C=C[C@@H]3[C@@H]1C5)O)OC)C',
'C[NH2+][C@@H]1Cc2ccc(c3c2[C@]4([C@@H]1CC=C([C@H]4O3)OC)C=C)OC',
'CCOC(=O)CNC(=O)O[C@H]1C=C[C@H]2[C@H]3Cc4ccc(c5c4[C@]2([C@H]1O5)CC[NH+]3C)OCOC',
'CC(=O)OC1=CC[C@H]2[C@@H]3Cc4ccc(c5c4[C@@]2([C@@H]1O5)CC[NH+]3C)OC',
'C[NH+]1CC[C@]23c4c5cc(c(c4O[C@H]2[C@H](C=C[C@H]3[C@H]1C5)O)O)c6cc7c8c(c6O)O[C@@H]9[C@]81CC[NH+]([C@H](C7)[C@@H]1C=C[C@@H]9O)C']
molecules = [oddt.toolkit.readstring('smi', smi) for smi in smiles]
To compute the shape using USR we need the molecule's 3D coordinates.
In [3]:
heroin.make3D()
heroin.removeh()
for mol in molecules:
    mol.make3D()
    mol.removeh()
Now we can use the usr function.
In [4]:
usr_heroin = usr(heroin)
usr_heroin
Out[4]:
USR represents shape with 12 descriptors, which summarize the distributions of distances from the atoms to four reference points in the molecule. For more details see Ballester & Richards (2007).
USR-CAT and Electroshape use more descriptors, 60 and 15 respectively.
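To make the 12 numbers less abstract, here is a minimal pure-Python sketch of a USR-style descriptor. It is not ODDT's implementation: the names `usr_sketch`, `_distance` and `_moments` are hypothetical, and the choice of moments (mean, standard deviation, signed cube root of the third central moment) is an assumption that may differ from the normalization ODDT uses. The four reference points follow the usual description of USR: the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one.

```python
import math


def _distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _moments(distances):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = float(len(distances))
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    # Assumption: the cube root keeps the third component in distance units.
    cube_root = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(variance), cube_root]


def usr_sketch(coords):
    """Illustrative 12-number descriptor for a list of (x, y, z) atom positions."""
    n = float(len(coords))
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))

    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    d_centroid = [_distance(c, centroid) for c in coords]
    closest = coords[d_centroid.index(min(d_centroid))]
    farthest = coords[d_centroid.index(max(d_centroid))]

    # Reference point 4: the atom farthest from the previous "farthest" atom.
    d_farthest = [_distance(c, farthest) for c in coords]
    far_from_far = coords[d_farthest.index(max(d_farthest))]

    descriptors = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptors.extend(_moments([_distance(c, ref) for c in coords]))
    return descriptors


# Toy coordinates, not a real molecule.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
toy_descriptors = usr_sketch(toy)
```

Because only distances between points enter the calculation, the descriptor does not change when the whole molecule is translated or rotated, which is why no alignment step is needed before comparing two molecules.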
Let's see how similar it is to a different molecule.
In [5]:
usr_similarity(usr_heroin, usr(molecules[0]))
Out[5]:
The similarity function returns a number in the range (0, 1]: the higher the value, the more similar the molecules, and 1 means that their shape descriptors are identical.
All methods (USR, USR-CAT and Electroshape) use the same similarity function.
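The range (0, 1] follows from the form of the score. The sketch below shows the inverse-distance formula given for USR in Ballester & Richards (2007), one over one plus the mean absolute difference of the descriptors. The function name `usr_similarity_sketch` is hypothetical, and ODDT's `usr_similarity` may differ in detail, for example in how it weights groups of descriptors for USR-CAT.

```python
def usr_similarity_sketch(shape_a, shape_b):
    """Similarity score: 1 / (1 + mean absolute descriptor difference)."""
    if len(shape_a) != len(shape_b):
        raise ValueError("descriptor vectors must have the same length")
    total = sum(abs(a - b) for a, b in zip(shape_a, shape_b))
    mean_abs_diff = total / float(len(shape_a))
    return 1.0 / (1.0 + mean_abs_diff)


# Made-up descriptor vectors, used only to exercise the formula.
same = usr_similarity_sketch([1.0] * 12, [1.0] * 12)       # identical vectors
different = usr_similarity_sketch([0.0] * 12, [1.0] * 12)  # mean difference of 1
```

Identical descriptor vectors give a mean difference of 0 and therefore a score of exactly 1. As the difference grows the score approaches 0 but never reaches it.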
We will find the molecule most similar to heroin.
In [6]:
similar_mols = []
for i, mol in enumerate(molecules):
    sim = usr_similarity(usr_heroin, usr(mol))
    similar_mols.append((i, sim))
In [7]:
similar_mols.sort(key=lambda similarity: similarity[1], reverse=True)
similar_mols
Out[7]:
In [8]:
heroin
Out[8]:
Heroin
In [9]:
idx_most = similar_mols[0][0]
molecules[idx_most]
Out[9]:
The most similar molecule
In [10]:
idx_least = similar_mols[-1][0]
molecules[idx_least]
Out[10]:
The least similar molecule
Similarity between these molecules:
In [11]:
usr_most = usr(molecules[idx_most])
usr_least = usr(molecules[idx_least])
usr_similarity(usr_most, usr_least)
Out[11]: