ChemicalEnvironments
Chemical Environments were created as a way to parse SMIRKS strings and make changes in chemical perception space. In this workbook, we will show you have chemical environments are initiated and used to make changes to a SMIRKS pattern.
Authors
Basic Structure of ChemicalEnvironments
ChemicalEnvironments
are initiated with the following input variables:
SMIRKS Strings
Here we use the word SMIRKS to mean SMARTS patterns with indexed atoms, we are not using Chemical Environments to parse SMIRKS strings that describe reactions.
That means these SMIRKS patterns should not contain multiple molecules ('.'
) or reaction arrows ('>>'
).
Here we will try to explain the SMIRKS patterns used here, but SMARTS and SMIRKS are a complex language.
SMARTS/SMIRKS strings are similar to SMILES strings with increased complexity.
For more details about this language see the Daylight tutorials:
In [1]:
# import necessary functions
from openforcefield.typing.chemistry import environment as env
from openeye import oechem
All Chemical Environments can be initated using SMIRKS strings.
If a ChemicalEnvironment
is initiated with no SMIRKS pattern, it is an empty structure.
However, there are 5 subtypes of ChemicalEnvironments
that match the types of parameters found in the SMIRNOFF format.
If they are initiated with no SMIRKS pattern, their structure matches a generic for that parameter type, for example [*:1]~[*:2]
for a bond (that is any atom connected to any other atom by any bond).
The 5 subtypes are listed below with their expected number of indexed atoms and the corresponding SMIRKS structure:
AtomChemicalEnvironment
"[*:1]"
BondChemicalEnvironment
"[*:1]~[*:2]"
AngleChemicalEnvironment
"[*:1]~[*:2]~[*:3]"
TorsionChemicalEnvironment
"[*:1]~[*:2]~[*:3]~[*:4]"
ImproperChemicalEnvironment
"[*:1]~[*:2](~[*:3])~[*:4]"
Here we show how these are initiated. Note that the generic environment is blank, it has the potential to become a SMIRKS pattern, but currently nothing is stored in it. While the subtypes have the shape described above, but wildcards ('*'
for atoms and '~'
for bonds).
In [2]:
# NBVAL_SKIP
Env = env.ChemicalEnvironment()
atomEnv = env.AtomChemicalEnvironment()
bondEnv = env.BondChemicalEnvironment()
angleEnv = env.AngleChemicalEnvironment()
torsionEnv = env.TorsionChemicalEnvironment()
impropEnv = env.ImproperChemicalEnvironment()
EnvList = [Env, atomEnv, bondEnv, angleEnv, torsionEnv, impropEnv]
names = ['generic', 'Atom','Bond','Angle','Torsion','(Improper']
for idx, Env in enumerate(EnvList):
print("%10s: %s" % (names[idx], Env.asSMIRKS()))
ChemicalEnvironments
from SMIRKS StringsChemicalEnvironments
can be initialized by SMIRKS strings. Here we attempt to show the robustness of this parsing. These patterns are intentionally complicated and therefore can be hard to read by humans. Here are some of the key features we would like to test:
"$ewg1"
to mean "[#7!-1,#8!-1,#16!-1,#9,#17,#35,#53]"
"[#6$(*([#6]=[#8])-,=$ewg2))]"
In this set-up we will show that these SMIRKS patterns are parseable with the OpenEye Toolkits, then create a ChemicalEnvironment
from the SMIRKS string and then print the ChemicalEnvironment
as a SMIRKS string. Note that due to the nature of SMIRKS patterns the ChemicalEnvironment
smirks may not identically match the input SMIRKS. A key difference is that every atom in a ChemicalEnvironment
SMIRKS will be inside square brackets. Also, "blank" bonds, for example in "CCC" will be converted to their literal meaning, single or aromatic.
In [3]:
# NBVAL_SKIP
# define the two replacements strings
replacements = [ ('ewg1', '[#7!-1,#8!-1,#16!-1,#9,#17,#35,#53]'),
('ewg2', '[#7!-1,#8,#16]')]
# define complicated SMIRKS patterns
SMIRKS = ['[#6$(*~[#6]=[#8])$(*-,=$ewg2)]', # complex recursive SMIRKS
'CCC', # SMILES
"[#1:1]-CCC", # simple hybrid
'[#6:1]1(-;!@[#1,#6])=;@[#6]-;@[#6]1', # Complicated ring
'C(O-[#7,#8])CC=[*]', # Hybrid SMIRKS
"[#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]-[#6X2H2;+0:2]-,=,:;!@;!#[$ewg2:3]-[#4:4]", # its just long
"[#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]1=CCCC1", # another ring
]
for smirk in SMIRKS:
qmol = oechem.OEQMol()
tmp_smirks = oechem.OESmartsLexReplace(smirk, replacements)
parseable = env.OEParseSmarts(qmol, tmp_smirks)
print("Input SMIRKS: %s" % smirk)
print("\t parseable by OpenEye Tools: %s" % parseable)
Env = env.ChemicalEnvironment(smirks = smirk, replacements = replacements)
print("\t Chemical Environment SMIRKS: %s\n" % Env.asSMIRKS())
ChemicalEnvironments
Up until now, we have discussed only how to initiate ChemicalEnvironment
s. Now we will explain how they are structured and how to use them to make changes to your SMIRKS pattern (and therefor the fragment you are describing).
To begin with, the overall structure of ChemicalEnvironment
s is similar to how a chemist might think about a fragment.
We use NetworkX graphs to store information about the pieces.
Nodes store information about Atoms and edges (connecting nodes) store information about Bonds.
Both of these sub-structures, Atoms and Bonds store information about the input SMIRKS pattern in a broken down way so it can be easily editted. The words Atoms and Bonds are capitalized as they are classes in and of themselves.
Both Atoms and Bonds have two types of information
This starts to sound complicated, so to try to illustrate how this works, we will use an actual Angle found in the SMIRNOFF99Frosst forcefield.
Here is the SMIRKS String:
"[#6X3,#7:1]~;@[#8;r:2]~;@[#6X3,#7:3]"
Here we will use the selectAtom and selectBond functions to get a specific atom or bond and then print its information. The 'select' methods ( selectAtom() or selectBond() ) takes an argument descriptor which can be used to select a certain atom or type of atom.
Descriptor input option:
In [4]:
smirks = "[#6X3,#7:1]~;@[#8;r:2]~;@[#6X3,#7:3]"
angle = env.ChemicalEnvironment(smirks = smirks)
# get atom1 and print information
atom1 = angle.selectAtom(1)
print("Atom 1: '%s'" % atom1.asSMIRKS())
print("ORTypes")
for (base, decs) in atom1.getORtypes():
print("\tBase: %s" % base)
str_decs = ["'%s'" % d for d in decs]
str_decs = ','.join(str_decs)
print("\tDecorators: [%s]" % str_decs)
print("ANDTypes:", atom1.getANDtypes())
print()
# get bond1 and print information
bond1 = angle.selectBond(1)
print("Bond 1: '%s'" % bond1.asSMIRKS())
print("ORTypes: ", bond1.getORtypes())
print("ANDTypes: ", bond1.getANDtypes())
For both ORtypes and ANDtypes for Atoms and Bonds there are "get" and "set" methods. The set methods completely rewrite that type. There are also methods for add ORtypes and ANDtypes where you add a single entry to the existing list.
Here we will use the set ORtypes to change atom1 to be a trivalent carbon or a divalent nitrogen.
Then we will also add an ORType and ANDType to atom2 so that it could refer to an oxygen ('#8') or trivalent and neutral nitrogen ('#7X3+0') and in one ring ('R1').
"[#6X3,#7X2:1]~;@[#8,#7X3+0;r;R1:2]~;@[#6X3,#7:3]"
In [5]:
# Change atom1's ORtypes with the setORtype method
new_ORtypes = [ ('#6', ['X3']), ('#7', ['X2']) ]
atom1.setORtypes(new_ORtypes)
print("New Atom 1: %s " % atom1.asSMIRKS())
# Change atom2's AND and OR types with the add*type methods
atom2 = angle.selectAtom(2)
atom2.addANDtype('R1')
atom2.addORtype('#7', ['X3', '+0'])
print("New Atom 2: %s" % atom2.asSMIRKS())
print("\nNew SMIRKS: %s" % angle.asSMIRKS())
The addAtom method is used to introduce atoms bound to existing atoms. You can add an empty atom or specify information about the new bond and new atom. Here are the parameters for the addAtom method:
Parameters
-----------
bondToAtom: atom object, required
atom the new atom will be bound to
bondORtypes: list of tuples, optional
strings that will be used for the ORtypes for the new bond
bondANDtypes: list of strings, optional
strings that will be used for the ANDtypes for the new bond
newORtypes: list of strings, optional
strings that will be used for the ORtypes for the new atom
newANDtypes: list of strings, optional
strings that will be used for the ANDtypes for the new atom
newAtomIndex: int, optional
integer label that could be used to index the atom in a SMIRKS string
beyondBeta: boolean, optional
if True, allows bonding beyond beta position
The addAtom
method returns the created atom.
Here we will add an alpha atom (oxygen) to atom 3 that is not in a ring and then a beta atom (hydrogen) bound to the alpha atom.
"[#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]~[#1]"
In [6]:
atom3 = angle.selectAtom(3)
alpha_ORtypes = [('#8', ['X2', 'H1'])]
alpha_ANDtypes = ['R0']
alpha_bondANDtypes = ['!@']
beta_ORtypes = [('#1', [])]
alpha = angle.addAtom(atom3, bondANDtypes = alpha_bondANDtypes,
newORtypes = alpha_ORtypes, newANDtypes = alpha_ANDtypes)
beta = angle.addAtom(alpha, newORtypes = beta_ORtypes)
print("Alpha Atom SMIRKS: %s" % alpha.asSMIRKS())
print("Beta Atom SMIRKS: %s" % beta.asSMIRKS())
print()
print("New overall SMIRKS: %s" % angle.asSMIRKS())
The removeAtom method works how you would expect. It removes the specified atom and the bond connecting it to the fragment.
You cannot remove indexed atoms (if you want to remove their OR and AND decorators you can set them to empty lists).
The other option with the removeAtom
method is to say only remove it if the atom is undecorated. This is done by setting the input variable isEmpty
to True (default is False). When isEmpty
is True, the atom is only removed if it has 1 ORtype and no ANDtypes.
The removeAtom
method returns True if the atom was removed and False if it was not.
As an example, we will remove the hydrogen in the beta position to atom3 that was added above.
"New overall SMIRKS: [#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]"
In [7]:
removed = angle.removeAtom(beta)
print("The hydrogen beta to atom3 was remove: ", removed)
print("Updated SMIRKS string: %s" % angle.asSMIRKS())
ChemicalEnvironment
MethodsThere are a variety of other methods that let you get information about the stored fragment. This includes:
isAlpha
returns a boolean) getAtoms
or getBonds
getIndexedAtoms
or getIndexedBonds
getAlphaAtoms
or getAlphaBonds
getBetaAtoms
or getBetaBonds
getUnindexedAtoms
or getUnindexedBonds
Bond.getOrder
'-,='
) will report the order as 1getValence
and getBondORder
getBond(atom1, atom2)
getNeighbors
Here we will show how each of these method types is used:
In [8]:
# 1. Getting information about an atom or bond in an environment (i.e. isAlpha returns a boolean)
# Check if the alpha atom above is any of the following
print("Above a carbon atom ('%s') was added in the alpha position to atom 3. This atom is ..." % alpha.asSMIRKS())
print("\t Indexed: ", angle.isIndexed(alpha))
print("\t Unindexed: ", angle.isUnindexed(alpha))
print("\t Alpha: ", angle.isAlpha(alpha))
print("\t Beta: ", angle.isBeta(alpha))
# NOTE - These methods can take an atom or a bond as an argument
In [9]:
# 2. Get atoms or bonds in each type of position, for example getIndexedAtoms or getAlphaBonds
# We will print the SMIRKS for each indexed atom:
indexed = angle.getIndexedAtoms()
print("Here are the SMIRKS strings for the Indexed atoms in the example angle:")
for a in indexed:
print("\tAtom %i: '%s'" % (a.index, a.asSMIRKS()))
print()
bonds = angle.getBonds()
print("Here are the SMIRKS strings for ALL bonds in the example angle:")
for b in bonds:
print("\t'%s'" % b.asSMIRKS())
In [10]:
# 3. Report the minimum order of a bond with Bond.getOrder
bond1 = angle.selectBond(1)
print("Bond 1 (between atoms 1 and 2) has a minimum order of %i" % bond1.getOrder())
In [11]:
# 4. Report the valence and bond order around an atom can be reported with getValence and getBondORder
atom3 = angle.selectAtom(3)
print("Atom 3 has a valency of %i" % angle.getValence(atom3))
print("Atom 3 has a minimum bond order of %i" % angle.getBondOrder(atom3))
In [13]:
# 5. Get a bond between two atoms (or determine if the atoms are bonded) with getBond(atom1, atom2)
# Check for bonds between each pair of indexed atoms
atom_pairs = [ (1,2), (2,3), (1,3) ]
for (A,B) in atom_pairs:
atomA = angle.selectAtom(A)
atomB = angle.selectAtom(B)
# check if there is a bond between the two atoms
bond = angle.getBond(atomA, atomB)
if bond is None:
print("There is no bond between Atom %i and Atom %i" % (A, B))
else:
print("The bond between Atom %i and Atom %i has the pattern '%s'" % (A, B, bond.asSMIRKS()))
In [17]:
# 6. Get atoms bound to a specified atom with getNeighbors
# get the neighbors for each indexed atom
for A in [1,2,3]:
atomA = angle.selectAtom(A)
print("Atom %i has the following neighbors" % A)
for a in angle.getNeighbors(atomA):
print("\t '%s' " % a.asSMIRKS())
print()
In [ ]: