(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev
For installation instructions, package information, and the tutorial files, see https://github.com/cimm-kzn/CGRtools
NOTE: Run the tutorial sequentially from the start. Running cells out of order will lead to unexpected results.
In [ ]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:
    print('WARNING. Tutorial was tested on 3.1 version of CGRtools')
else:
    print('Welcome!')
In [ ]:
# load data for tutorial
from pickle import load
from traceback import format_exc
with open('molecules.dat', 'rb') as f:
    molecules = load(f) # list of MoleculeContainer objects
with open('reactions.dat', 'rb') as f:
    reactions = load(f) # list of ReactionContainer objects
m2, m3 = molecules[1:3] # molecule
m7 = m3.copy()
m7.standardize()
r1 = reactions[0] # reaction
m5, m6 = r1.reactants[:2]
m8 = m7.substructure([4, 5, 6, 7, 8, 9], as_view=False)
m9 = m6.substructure([5, 6,7, 8], as_view=False) # acid
m10 = r1.products[0].copy()
benzene = m3.substructure([4,5,6,7,8,9], as_view=False)
cgr1 = m7 ^ m8
cgr1.reset_query_marks()
carb = m10.substructure([5,7,8, 2])
m2.reset_query_marks()
from CGRtools.containers import *
from CGRtools import CGRpreparer
preparer = CGRpreparer()
In [ ]:
m7
In [ ]:
m8
In [ ]:
benzene.standardize()
benzene
In [ ]:
# isomorphism operations
print(benzene < m7) # benzene is a substructure of m7
print(benzene > m7) # benzene is not a superstructure of m7
print(benzene <= m7) # benzene is a substructure of (or the same structure as) m7
print(benzene >= m7) # benzene is not a superstructure of (or the same structure as) m7
print(benzene < m8) # benzene is not a proper substructure of m8: the two are equal
print(benzene <= m8)
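The comparison operators used above can be pictured with a small standalone sketch. It is not CGRtools code: `ToyGraph` and its `embeddings` method are hypothetical names, the search is brute force, and the sketch only checks that every bond of the smaller graph is present in the larger one, which is a simplification. It assumes that `<` means "strictly smaller substructure" and `<=` also accepts the same structure, as the comments in the cell above suggest.

```python
from itertools import permutations


class ToyGraph:
    """Hypothetical stand-in for a molecule: element labels plus bond orders."""

    def __init__(self, atoms, bonds):
        self.atoms = dict(atoms)  # atom number -> element symbol
        # (atom, atom) -> bond order, stored with order-independent keys
        self.bonds = {frozenset(pair): order for pair, order in bonds.items()}

    def embeddings(self, other):
        """Lazily yield mappings of self's atoms onto other's atoms (brute force)."""
        mine = sorted(self.atoms)
        for image in permutations(sorted(other.atoms), len(mine)):
            mapping = dict(zip(mine, image))
            if any(self.atoms[a] != other.atoms[b] for a, b in mapping.items()):
                continue  # element labels differ
            if all(other.bonds.get(frozenset(mapping[a] for a in pair)) == order
                   for pair, order in self.bonds.items()):
                yield mapping

    def __le__(self, other):  # substructure of, or same structure as
        return next(self.embeddings(other), None) is not None

    def __lt__(self, other):  # strictly smaller substructure
        return len(self.atoms) < len(other.atoms) and self <= other

    def __ge__(self, other):
        return other <= self

    def __gt__(self, other):
        return other < self


# a three-membered carbon ring and the same ring with one oxygen attached
ring = ToyGraph({1: 'C', 2: 'C', 3: 'C'}, {(1, 2): 1, (2, 3): 1, (1, 3): 1})
substituted = ToyGraph({1: 'C', 2: 'C', 3: 'C', 4: 'O'},
                       {(1, 2): 1, (2, 3): 1, (1, 3): 1, (1, 4): 1})

print(ring < substituted)   # True
print(ring > substituted)   # False
print(ring <= ring)         # True
print(ring < ring)          # False
```

Because `<` is false for two equal structures, `<=` is the operator to use when an exact match should also count as a hit.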
In [ ]:
m5
In [ ]:
m6
Mappings of a substructure onto a structure can be obtained with the substructure.get_substructure_mapping(structure, limit=1) method. The limit argument sets the number of mappings to return; limit=0 returns all possible mappings. The method acts as a generator.
To get the mapping between two isomorphic structures, use the structure1.get_mapping(structure2) method. It returns only one possible mapping of all atoms of the two isomorphic molecules. It was developed to bring the atoms of two MoleculeContainers into the same order (the dictionary returned by this method can be fed directly to the remap function, see above) for some reaction-handling tasks. If the molecules are isomorphic, it works faster than get_substructure_mapping.
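The meaning of limit and the use of a mapping dictionary for renumbering can be illustrated with a toy, standard-library-only sketch. The functions `iter_mappings`, `take` and `remap` below are hypothetical helpers written for this illustration; they are not the CGRtools implementation and ignore bond orders.

```python
from itertools import islice, permutations


def iter_mappings(sub, sup):
    """Lazily yield every mapping of sub's atoms onto sup's atoms (brute force).

    Each graph is a pair (atoms, bonds): atoms is {number: element},
    bonds is a set of frozenset({a, b}) pairs.
    """
    sub_atoms, sub_bonds = sub
    sup_atoms, sup_bonds = sup
    keys = sorted(sub_atoms)
    for image in permutations(sorted(sup_atoms), len(keys)):
        m = dict(zip(keys, image))
        same_elements = all(sub_atoms[k] == sup_atoms[v] for k, v in m.items())
        bonds_present = all(frozenset(m[a] for a in bond) in sup_bonds
                            for bond in sub_bonds)
        if same_elements and bonds_present:
            yield m


def take(mappings, limit=1):
    """limit=0 collects all mappings; otherwise stop after `limit` of them."""
    return list(mappings if limit == 0 else islice(mappings, limit))


def remap(graph, mapping):
    """Renumber the atoms of a graph according to a mapping dictionary."""
    atoms, bonds = graph
    return ({mapping[n]: e for n, e in atoms.items()},
            {frozenset(mapping[a] for a in bond) for bond in bonds})


# C-O fragment and an acetic-acid-like skeleton (hydrogens and bond orders omitted)
fragment = ({1: 'C', 2: 'O'}, {frozenset({1, 2})})
skeleton = ({10: 'C', 11: 'C', 12: 'O', 13: 'O'},
            {frozenset({10, 11}), frozenset({11, 12}), frozenset({11, 13})})

first = take(iter_mappings(fragment, skeleton), limit=1)
everything = take(iter_mappings(fragment, skeleton), limit=0)
renumbered = remap(fragment, first[0])

print(first)        # [{1: 11, 2: 12}]
print(everything)   # [{1: 11, 2: 12}, {1: 11, 2: 13}]
print(renumbered[0])  # {11: 'C', 12: 'O'}
```

Since the mappings are produced lazily, asking for a single mapping stops the search as soon as the first one is found.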
In [ ]:
m5.get_substructure_mapping(m6) # mapping of the m5 substructure onto the m6 superstructure
In [ ]:
for m in m5.get_substructure_mapping(m6, limit=0): # iterate over all possible substructure mappings
    print(m)
In [ ]:
benzene.get_mapping(m8) # mapping of benzene onto m8, which is also benzene
In [ ]:
try: # a molecule cannot be matched against a reaction; a TypeError is raised
    m6 < r1
except TypeError:
    print(format_exc())
In [ ]:
r1.products[0] # see structure in products
In [ ]:
m6 # the substructure used. One can see that they should not match
In [ ]:
any(m6 < m for m in r1.products) # check if any molecule from product side has m6 as substructure
Substructure search is also possible with CGRContainer. The API is the same as for molecules.
Matching a CGR against a CGR and a molecule against a CGR are both possible. Note that only conventional bonds in a CGR can match molecular bonds.
In isomorphism, atoms are considered equal if they have the same charge, multiplicity, and isotope number in both the reactant and product states.
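A minimal sketch of these two rules, under stated assumptions: `ToyCGRAtom`, `ToyCGRBond` and `molecular_bond_matches` are hypothetical names invented for this illustration, and the field names (`p_charge`, `p_order` and so on) are labels chosen here for the product state, not a statement about how CGRtools stores its data. A "conventional" bond is read here as one whose order is the same in the reactant and product states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCGRAtom:
    """Atom with a reactant state and a product state (hypothetical model)."""
    element: str
    isotope: Optional[int] = None
    charge: int = 0                        # reactant state
    p_charge: int = 0                      # product state
    multiplicity: Optional[int] = None     # reactant state
    p_multiplicity: Optional[int] = None   # product state


@dataclass(frozen=True)
class ToyCGRBond:
    """Bond order before and after the reaction; None means no bond."""
    order: Optional[int]
    p_order: Optional[int]


def molecular_bond_matches(order, cgr_bond):
    """A molecular bond matches only a bond that is unchanged in the CGR."""
    return cgr_bond.order == cgr_bond.p_order == order


# dataclass equality compares every field, i.e. both states at once
neutral_n = ToyCGRAtom('N')
protonated_n = ToyCGRAtom('N', charge=0, p_charge=1)  # charge appears in products

unchanged = ToyCGRBond(order=1, p_order=1)
changed = ToyCGRBond(order=1, p_order=2)
formed = ToyCGRBond(order=None, p_order=1)

print(neutral_n == ToyCGRAtom('N'))          # True
print(neutral_n == protonated_n)             # False
print(molecular_bond_matches(1, unchanged))  # True
print(molecular_bond_matches(1, changed))    # False
print(molecular_bond_matches(1, formed))     # False
```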
In [ ]:
decomposed1 = preparer.decompose(cgr1) # let's have a look at reaction corresponding to cgr1
decomposed1
In [ ]:
m8 # this is the substructure we are looking for
In [ ]:
m8 < cgr1
In [ ]:
cgr1 <= cgr1
In [ ]:
# to use QueryContainers, neighbors and hybridization need to be calculated for the molecules
m9.reset_query_marks()
m10.reset_query_marks()
In [ ]:
m9 # acid
In [ ]:
m10 # ester
In [ ]:
carb
In [ ]:
print('m9:', f'{m9:hn}') # all labels were calculated
print('m10:', f'{m10:hn}')
print('carb:', f'{carb:hn}') # notice that one of the oxygen atoms has 2 neighbors. Only an ester can satisfy this restriction.
Molecule isomorphism doesn't take neighbors and hybridization into account.
In [ ]:
carb < m9 # carb is currently a molecule projection. It fits this molecule as well.
In [ ]:
carb < m10 # carb is a substructure of m10
One needs to convert the molecule (or its projection) into a QueryContainer object. In this case, the number of neighbors and hybridization data will be taken into account during substructure search.
The isomorphism API is the same.
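The difference between the two kinds of matching can be sketched at the level of a single atom. This is only an illustration: `ToyAtom` and `atom_matches` are hypothetical names, the hybridization codes are labels chosen for this sketch, and the real search of course compares whole graphs rather than isolated atoms.

```python
from typing import NamedTuple


class ToyAtom(NamedTuple):
    """Hypothetical atom record with the two query marks discussed above."""
    element: str
    neighbors: int      # number of non-hydrogen neighbors
    hybridization: int  # 1 = sp3, 2 = sp2 (codes chosen for this sketch)


def atom_matches(pattern, target, as_query):
    """Compare elements always; compare the marks only in query mode."""
    if pattern.element != target.element:
        return False
    if not as_query:
        return True  # plain molecule matching ignores the marks
    return (pattern.neighbors, pattern.hybridization) == \
           (target.neighbors, target.hybridization)


bridging_o = ToyAtom('O', neighbors=2, hybridization=1)  # pattern cut from an ester
hydroxyl_o = ToyAtom('O', neighbors=1, hybridization=1)  # O-H oxygen of an acid
ester_o = ToyAtom('O', neighbors=2, hybridization=1)     # C-O-C oxygen of an ester

print(atom_matches(bridging_o, hydroxyl_o, as_query=False))  # True
print(atom_matches(bridging_o, hydroxyl_o, as_query=True))   # False
print(atom_matches(bridging_o, ester_o, as_query=True))      # True
```

In query mode the acid oxygen is rejected because it has only one non-hydrogen neighbor, while the ester oxygen still passes.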
In [ ]:
q = QueryContainer(carb) # convert molecule into query
print(q) # this is now visible in the signature of the QueryContainer: one of the oxygens has 2 neighbors.
In [ ]:
q < m9 # now neighbors and hybridization are taken into account.
Acid m9 has a hydroxyl group with one non-hydrogen neighbor. Our query requires an oxygen atom with two non-hydrogen neighbors.
In [ ]:
q < m10 # the ester matches the query.
In [ ]:
m2.reset_query_marks()
m2
In [ ]:
q < m2 # this molecule contains q as a substructure as well. It is an acid.