Using ChemicalEnvironments

Chemical Environments were created as a way to parse SMIRKS strings and make changes in chemical perception space. In this workbook, we will show you have chemical environments are initiated and used to make changes to a SMIRKS pattern.

Authors

  • Caitlin C. Bannan from Mobley Group at University of California, Irvine

Basic Structure of ChemicalEnvironments

ChemicalEnvironments are initiated with the following input variables:

  • smirks = any SMIRKS string (if None an empty environment is created)
  • label = this could be anything, a number/str/int, stored at ChemicalEnvironment.label
  • replacements = This is a list of two tuples in the form (short, smirks) to substitute short hand in your SMIRKS strings. This is used to check if your input SMIRKS string or created Chemical Environment are Valid.

SMIRKS Strings

Here we use the word SMIRKS to mean SMARTS patterns with indexed atoms, we are not using Chemical Environments to parse SMIRKS strings that describe reactions. That means these SMIRKS patterns should not contain multiple molecules ('.') or reaction arrows ('>>'). Here we will try to explain the SMIRKS patterns used here, but SMARTS and SMIRKS are a complex language. SMARTS/SMIRKS strings are similar to SMILES strings with increased complexity. For more details about this language see the Daylight tutorials:


In [1]:
# import necessary functions
from openforcefield.typing.chemistry import environment as env
from openeye import oechem

Default Chemical Environments

All Chemical Environments can be initated using SMIRKS strings. If a ChemicalEnvironment is initiated with no SMIRKS pattern, it is an empty structure. However, there are 5 subtypes of ChemicalEnvironments that match the types of parameters found in the SMIRNOFF format. If they are initiated with no SMIRKS pattern, their structure matches a generic for that parameter type, for example [*:1]~[*:2] for a bond (that is any atom connected to any other atom by any bond). The 5 subtypes are listed below with their expected number of indexed atoms and the corresponding SMIRKS structure:

  • AtomChemicalEnvironment
    • expects 1 indexed atom
    • default/generic SMIRKS "[*:1]"
  • BondChemicalEnvironment
    • expects 2 indexed atoms
    • default/generic SMIRKS: "[*:1]~[*:2]"
  • AngleChemicalEnvironment
    • expects 3 indexed atoms
    • default/generic SMIRKS: "[*:1]~[*:2]~[*:3]"
  • TorsionChemicalEnvironment
    • expects 4 indexed atoms in a proper dihedral angle
    • default/generic SMIRKS: "[*:1]~[*:2]~[*:3]~[*:4]"
  • ImproperChemicalEnvironment
    • expects 4 indexed atoms in an improper dihedral angle
    • default/generic SMIRKS: "[*:1]~[*:2](~[*:3])~[*:4]"

Here we show how these are initiated. Note that the generic environment is blank, it has the potential to become a SMIRKS pattern, but currently nothing is stored in it. While the subtypes have the shape described above, but wildcards ('*' for atoms and '~' for bonds).


In [2]:
# NBVAL_SKIP
Env = env.ChemicalEnvironment()
atomEnv = env.AtomChemicalEnvironment()
bondEnv = env.BondChemicalEnvironment()
angleEnv = env.AngleChemicalEnvironment()
torsionEnv = env.TorsionChemicalEnvironment()
impropEnv = env.ImproperChemicalEnvironment()

EnvList = [Env, atomEnv, bondEnv, angleEnv, torsionEnv, impropEnv]
names = ['generic', 'Atom','Bond','Angle','Torsion','(Improper']

for idx, Env in enumerate(EnvList):
    print("%10s: %s" % (names[idx], Env.asSMIRKS()))


   generic: 
      Atom: [*:1]
      Bond: [*:1]~[*:2]
     Angle: [*:1]~[*:2]~[*:3]
   Torsion: [*:1]~[*:2]~[*:3]~[*:4]
 (Improper: [*:1]~[*:2](~[*:3])~[*:4]

Initiating ChemicalEnvironments from SMIRKS Strings

ChemicalEnvironments can be initialized by SMIRKS strings. Here we attempt to show the robustness of this parsing. These patterns are intentionally complicated and therefore can be hard to read by humans. Here are some of the key features we would like to test:

  • SMILES strings are SMIRKS strings (i.e. 'CCC' should be stored as 3 atoms bonded in a row).
  • Replacement strings, such as "$ewg1" to mean "[#7!-1,#8!-1,#16!-1,#9,#17,#35,#53]"
  • Complex recursive SMIRKS such as "[#6$(*([#6]=[#8])-,=$ewg2))]"
  • Ring indexing, as in SMILES, SMARTS and SMIKRS use a number after an atom to describe the atoms in a ring, such as "[#6:1]1(-;!@[#1,#6])=;@[#6]-;@[#6]1" to show a cyclopropene ring where atom 1 is in the double bond and is bound to a hydrogen or carbon outside the ring.
  • Hybrid SMIRKS with atomic symbols for the atoms. These do not have to use the square brackets, for example "C(O-[#7,#8])C[C+0]=[*]"

In this set-up we will show that these SMIRKS patterns are parseable with the OpenEye Toolkits, then create a ChemicalEnvironment from the SMIRKS string and then print the ChemicalEnvironment as a SMIRKS string. Note that due to the nature of SMIRKS patterns the ChemicalEnvironment smirks may not identically match the input SMIRKS. A key difference is that every atom in a ChemicalEnvironment SMIRKS will be inside square brackets. Also, "blank" bonds, for example in "CCC" will be converted to their literal meaning, single or aromatic.


In [3]:
# NBVAL_SKIP
# define the two replacements strings
replacements = [ ('ewg1', '[#7!-1,#8!-1,#16!-1,#9,#17,#35,#53]'),
               ('ewg2', '[#7!-1,#8,#16]')]

# define complicated SMIRKS patterns
SMIRKS = ['[#6$(*~[#6]=[#8])$(*-,=$ewg2)]', # complex recursive SMIRKS
         'CCC', # SMILES
          "[#1:1]-CCC", # simple hybrid
         '[#6:1]1(-;!@[#1,#6])=;@[#6]-;@[#6]1', # Complicated ring
         'C(O-[#7,#8])CC=[*]', # Hybrid SMIRKS
         "[#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]-[#6X2H2;+0:2]-,=,:;!@;!#[$ewg2:3]-[#4:4]", # its just long
         "[#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]1=CCCC1", # another ring
         ]

for smirk in SMIRKS:
    qmol = oechem.OEQMol()
    tmp_smirks = oechem.OESmartsLexReplace(smirk, replacements) 
    parseable = env.OEParseSmarts(qmol, tmp_smirks)
    print("Input SMIRKS: %s" % smirk)
    print("\t parseable by OpenEye Tools: %s" % parseable)
    Env = env.ChemicalEnvironment(smirks = smirk, replacements = replacements)
    print("\t Chemical Environment SMIRKS: %s\n" % Env.asSMIRKS())


Input SMIRKS: [#6$(*~[#6]=[#8])$(*-,=$ewg2)]
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [#6](~[#6]=[#8])-,=[$ewg2]

Input SMIRKS: CCC
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [C]-,:[C]-,:[C]

Input SMIRKS: [#1:1]-CCC
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [#1:1]-[C]-,:[C]-,:[C]

Input SMIRKS: [#6:1]1(-;!@[#1,#6])=;@[#6]-;@[#6]1
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [#6:1]1(-;!@[#1,#6])=;@[#6]-;@[#6]1

Input SMIRKS: C(O-[#7,#8])CC=[*]
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [C](-,:[C]-,:[C]=[*])-,:[O]-[#7,#8]

Input SMIRKS: [#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]-[#6X2H2;+0:2]-,=,:;!@;!#[$ewg2:3]-[#4:4]
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [#6X4:1](-[#6X2H2;+0:2]-,=,:;!@;!#[$ewg2:3]-[#4:4])(~[#8]~[#1])~[$ewg1]

Input SMIRKS: [#6$([#6X4](~[$ewg1])(~[#8]~[#1])):1]1=CCCC1
	 parseable by OpenEye Tools: True
	 Chemical Environment SMIRKS: [#6X4:1]1(~[#8]~[#1])(~[$ewg1])=[C]-,:[C]-,:[C]-,:[C]1

Structure of ChemicalEnvironments

Up until now, we have discussed only how to initiate ChemicalEnvironments. Now we will explain how they are structured and how to use them to make changes to your SMIRKS pattern (and therefor the fragment you are describing). To begin with, the overall structure of ChemicalEnvironments is similar to how a chemist might think about a fragment. We use NetworkX graphs to store information about the pieces. Nodes store information about Atoms and edges (connecting nodes) store information about Bonds. Both of these sub-structures, Atoms and Bonds store information about the input SMIRKS pattern in a broken down way so it can be easily editted. The words Atoms and Bonds are capitalized as they are classes in and of themselves.

Both Atoms and Bonds have two types of information

  • ORtypes
    • things that are OR'd together in the SMIRKS string using a comma (',')
    • These have two subtypes:
      • ORbases - typically an atomic number
      • ORdecorators - typically information that might be true for 1 possible atomic number, but not others
  • ANDtypes
    • things that are AND'd together in the SMIRKS string using a semi-colon (';')

This starts to sound complicated, so to try to illustrate how this works, we will use an actual Angle found in the SMIRNOFF99Frosst forcefield.

Here is the SMIRKS String:

"[#6X3,#7:1]~;@[#8;r:2]~;@[#6X3,#7:3]"

  • atom 1 and atom 3
    • ORtypes
      • '#6X3' - a trivalent carbon
        • ORbase = '#6'
        • ORdecorators = ['X3']
      • '#7' is a nitrogen
        • ORbase = '#7'
        • ORdecorators []
    • ANDtypes
      • [] (None)
  • atom 2
    • ORtypes
      • '#8'
        • ORbase = '#8'
        • ORdecorators = []
    • ANDtypes
      • ['r'] it is in a ring
  • bond 1 and 2 are identical
    • ORtypes = None (generic bond ~)
    • ANDtypes = ['@']
      • it is in a ring

Selecting Atoms and Bonds

Here we will use the selectAtom and selectBond functions to get a specific atom or bond and then print its information. The 'select' methods ( selectAtom() or selectBond() ) takes an argument descriptor which can be used to select a certain atom or type of atom.

Descriptor input option:

  • None - returns a random atom
  • int - returns that atom or bond by index
  • 'Indexed' - returns a random indexed atom
  • 'Unindexed' - returns a random non-indexed atom
  • 'Alpha' - returns a random atom alpha to an indexed atom
  • 'Beta' - returns a random atom beta to an indexed atom

In [4]:
smirks = "[#6X3,#7:1]~;@[#8;r:2]~;@[#6X3,#7:3]"
angle = env.ChemicalEnvironment(smirks = smirks)

# get atom1 and print information
atom1 = angle.selectAtom(1)
print("Atom 1: '%s'" % atom1.asSMIRKS())
print("ORTypes")
for (base, decs) in atom1.getORtypes():
    print("\tBase: %s" % base)
    str_decs = ["'%s'" % d for d in decs]
    str_decs = ','.join(str_decs)
    print("\tDecorators: [%s]" % str_decs)
print("ANDTypes:", atom1.getANDtypes())
print()

# get bond1 and print information
bond1 = angle.selectBond(1)
print("Bond 1: '%s'" % bond1.asSMIRKS())
print("ORTypes: ", bond1.getORtypes())
print("ANDTypes: ", bond1.getANDtypes())


Atom 1: '[#6X3,#7:1]'
ORTypes
	Base: #6
	Decorators: ['X3']
	Base: #7
	Decorators: []
ANDTypes: []

Bond 1: '~;@'
ORTypes:  []
ANDTypes:  ['@']

Changing ORtypes and ANDtypes

For both ORtypes and ANDtypes for Atoms and Bonds there are "get" and "set" methods. The set methods completely rewrite that type. There are also methods for add ORtypes and ANDtypes where you add a single entry to the existing list.

Here we will use the set ORtypes to change atom1 to be a trivalent carbon or a divalent nitrogen.

Then we will also add an ORType and ANDType to atom2 so that it could refer to an oxygen ('#8') or trivalent and neutral nitrogen ('#7X3+0') and in one ring ('R1').

Final SMIRKS string: "[#6X3,#7X2:1]~;@[#8,#7X3+0;r;R1:2]~;@[#6X3,#7:3]"


In [5]:
# Change atom1's ORtypes with the setORtype method
new_ORtypes = [ ('#6', ['X3']), ('#7', ['X2']) ]
atom1.setORtypes(new_ORtypes)
print("New Atom 1: %s " % atom1.asSMIRKS())

# Change atom2's AND and OR types with the add*type methods
atom2 = angle.selectAtom(2)
atom2.addANDtype('R1')
atom2.addORtype('#7', ['X3', '+0'])
print("New Atom 2: %s" % atom2.asSMIRKS())

print("\nNew SMIRKS: %s" % angle.asSMIRKS())


New Atom 1: [#6X3,#7X2:1] 
New Atom 2: [#8,#7+0X3;R1:2]

New SMIRKS: [#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]

Adding new Atoms

The addAtom method is used to introduce atoms bound to existing atoms. You can add an empty atom or specify information about the new bond and new atom. Here are the parameters for the addAtom method:

    Parameters
    -----------
    bondToAtom: atom object, required
        atom the new atom will be bound to
    bondORtypes: list of tuples, optional
        strings that will be used for the ORtypes for the new bond
    bondANDtypes: list of strings, optional
        strings that will be used for the ANDtypes for the new bond
    newORtypes: list of strings, optional
        strings that will be used for the ORtypes for the new atom
    newANDtypes: list of strings, optional
        strings that will be used for the ANDtypes for the new atom
    newAtomIndex: int, optional
        integer label that could be used to index the atom in a SMIRKS string
    beyondBeta: boolean, optional
        if True, allows bonding beyond beta position

The addAtom method returns the created atom.

Here we will add an alpha atom (oxygen) to atom 3 that is not in a ring and then a beta atom (hydrogen) bound to the alpha atom.

New SMIRKS pattern: "[#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]~[#1]"


In [6]:
atom3 = angle.selectAtom(3)

alpha_ORtypes = [('#8', ['X2', 'H1'])]
alpha_ANDtypes = ['R0']
alpha_bondANDtypes = ['!@']

beta_ORtypes = [('#1', [])]

alpha = angle.addAtom(atom3, bondANDtypes = alpha_bondANDtypes, 
                      newORtypes = alpha_ORtypes, newANDtypes = alpha_ANDtypes)

beta = angle.addAtom(alpha, newORtypes = beta_ORtypes)

print("Alpha Atom SMIRKS: %s" % alpha.asSMIRKS())
print("Beta Atom SMIRKS: %s" % beta.asSMIRKS())
print()
print("New overall SMIRKS: %s" % angle.asSMIRKS())


Alpha Atom SMIRKS: [#8X2H1;R0]
Beta Atom SMIRKS: [#1]

New overall SMIRKS: [#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]~[#1]

Removing Atoms

The removeAtom method works how you would expect. It removes the specified atom and the bond connecting it to the fragment. You cannot remove indexed atoms (if you want to remove their OR and AND decorators you can set them to empty lists). The other option with the removeAtom method is to say only remove it if the atom is undecorated. This is done by setting the input variable isEmpty to True (default is False). When isEmpty is True, the atom is only removed if it has 1 ORtype and no ANDtypes.

The removeAtom method returns True if the atom was removed and False if it was not.

As an example, we will remove the hydrogen in the beta position to atom3 that was added above.

New SMIRKS pattern: "New overall SMIRKS: [#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]"


In [7]:
removed = angle.removeAtom(beta)
print("The hydrogen beta to atom3 was remove: ", removed)
print("Updated SMIRKS string: %s" % angle.asSMIRKS())


The hydrogen beta to atom3 was remove:  True
Updated SMIRKS string: [#6X3,#7X2:1]~;@[#8,#7+0X3;R1:2]~;@[#6X3,#7:3]~;!@[#8X2H1;R0]

Other ChemicalEnvironment Methods

There are a variety of other methods that let you get information about the stored fragment. This includes:

  1. Getting information about an atom or bond in an environment (i.e. isAlpha returns a boolean)
  • Get atoms or bonds in each type of position:
    • getAtoms or getBonds
      • returns all atoms or bonds
    • getIndexedAtoms or getIndexedBonds
    • getAlphaAtoms or getAlphaBonds
    • getBetaAtoms or getBetaBonds
    • getUnindexedAtoms or getUnindexedBonds
  • Report the minimum order of a bond with Bond.getOrder
    • Note this is the minimum so a bond that is single or double ('-,=') will report the order as 1
  • Report the valence and bond order around an atom can be reported with getValence and getBondORder
  • Get a bond between two atoms (or determine if the atoms are bonded) with getBond(atom1, atom2)
  • Get atoms bound to a specified atom with getNeighbors

Here we will show how each of these method types is used:


In [8]:
# 1. Getting information about an atom or bond in an environment (i.e. isAlpha returns a boolean)
# Check if the alpha atom above is any of the following
print("Above a carbon atom ('%s') was added in the alpha position to atom 3. This atom is ..." % alpha.asSMIRKS())
print("\t Indexed: ", angle.isIndexed(alpha))
print("\t Unindexed: ", angle.isUnindexed(alpha))
print("\t Alpha: ", angle.isAlpha(alpha))
print("\t Beta: ", angle.isBeta(alpha))
# NOTE - These methods can take an atom or a bond as an argument


Above a carbon atom ('[#8X2H1;R0]') was added in the alpha position to atom 3. This atom is ...
	 Indexed:  False
	 Unindexed:  True
	 Alpha:  True
	 Beta:  False

In [9]:
# 2. Get atoms or bonds in each type of position, for example getIndexedAtoms or getAlphaBonds
# We will print the SMIRKS for each indexed atom:
indexed = angle.getIndexedAtoms()
print("Here are the SMIRKS strings for the Indexed atoms in the example angle:")
for a in indexed:
    print("\tAtom %i: '%s'" % (a.index, a.asSMIRKS()))
print()

bonds = angle.getBonds()
print("Here are the SMIRKS strings for ALL bonds in the example angle:")
for b in bonds:
    print("\t'%s'" % b.asSMIRKS())


Here are the SMIRKS strings for the Indexed atoms in the example angle:
	Atom 1: '[#6X3,#7X2:1]'
	Atom 2: '[#8,#7+0X3;R1:2]'
	Atom 3: '[#6X3,#7:3]'

Here are the SMIRKS strings for ALL bonds in the example angle:
	'~;@'
	'~;@'
	'~;!@'

In [10]:
# 3. Report the minimum order of a bond with Bond.getOrder
bond1 = angle.selectBond(1)
print("Bond 1 (between atoms 1 and 2) has a minimum order of %i" % bond1.getOrder())


Bond 1 (between atoms 1 and 2) has a minimum order of 1

In [11]:
# 4. Report the valence and bond order around an atom can be reported with getValence and getBondORder
atom3 = angle.selectAtom(3)
print("Atom 3 has a valency of %i" % angle.getValence(atom3))
print("Atom 3 has a minimum bond order of %i" % angle.getBondOrder(atom3))


Atom 3 has a valency of 2
Atom 3 has a minimum bond order of 2

In [13]:
# 5. Get a bond between two atoms (or determine if the atoms are bonded) with getBond(atom1, atom2)
# Check for bonds between each pair of indexed atoms
atom_pairs = [ (1,2), (2,3), (1,3) ]
for (A,B) in atom_pairs:
    atomA = angle.selectAtom(A)
    atomB = angle.selectAtom(B)

    # check if there is a bond between the two atoms
    bond = angle.getBond(atomA, atomB)
    if bond is None:
        print("There is no bond between Atom %i and Atom %i" % (A, B))
    else:
        print("The bond between Atom %i and Atom %i has the pattern '%s'" % (A, B, bond.asSMIRKS()))


The bond between Atom 1 and Atom 2 has the pattern '~;@'
The bond between Atom 2 and Atom 3 has the pattern '~;@'
There is no bond between Atom 1 and Atom 3

In [17]:
# 6. Get atoms bound to a specified atom with getNeighbors
# get the neighbors for each indexed atom
for A in [1,2,3]:
    atomA = angle.selectAtom(A)
    print("Atom %i has the following neighbors" % A)
    for a in angle.getNeighbors(atomA):
        print("\t '%s' " % a.asSMIRKS())
    print()


Atom 1 has the following neighbors
	 '[#8,#7+0X3;R1:2]' 

Atom 2 has the following neighbors
	 '[#6X3,#7X2:1]' 
	 '[#6X3,#7:3]' 

Atom 3 has the following neighbors
	 '[#8,#7+0X3;R1:2]' 
	 '[#8X2H1;R0]' 


In [ ]: