Test creation of molecules in OpenEye

This is to help with https://github.com/pandegroup/openmm/issues/1443 -- we need to determine whether valid OpenEye molecules can be created in general (irrespective of charge) from elements, connectivity, and bond order. The intention is to update OpenMM's Topology to include bond order, and we need to make sure we have the necessary info in general.

John Chodera currently has a hacked tool for doing something like this at https://github.com/choderalab/openmoltools/blob/master/openmoltools/forcefield_generators.py#L80-L179 which uses antechamber to infer bond orders with the assumption that the molecule is neutral.

Here we will attempt a more general solution.

Initial steps for testing:

  1. Create a molecule from SMILES
  2. Define utility functions to:
    1. Obtain elements, names, connectivity and integer bond orders; store
    2. Create a new molecule from elements, names, connectivity and integer bond orders
  3. Convert back to SMILES and compare to original SMILES

We need to ensure that this works for various bond orders including aromatics, and that it works for molecules of various charges including unusual protonation states.

1. Create a reference molecule


In [33]:
# NBVAL_SKIP
from openeye.oechem import *
mol = OEMol()
#smiles = 'CC(c1ccccc1)C(=O)[O-]'
smiles = 'C[NH2+]CC(c1ccccc1)C(=O)[O-]'
OEParseSmiles(mol, smiles)
OEAddExplicitHydrogens(mol)
OEAssignAromaticFlags(mol)
OETriposAtomNames(mol)

2. Make utility functions


In [34]:
def get_atom_properties(mol):
    """Return atomic numbers, names, connectivity, and bond orders involved in an OE molecule.
    
    Arguments:
    ----------
        mol : OEMol
            OpenEye molecule to process
        
    Returns
    -------
        atomic_numbers : list of ints
            List of integer atomic numbers
        names : list of strings
            List of strings of atom names
        bonds : list of tuples
            List of tuples of atom names involved in bonds in the format [ (bond1_name1, bond1_name2), (bond2_name1, bond2_name2), ...]
        bond_orders : list of ints
            List of integer bond orders corresponding to the bonds in the prior list
    """
    
    atomic_numbers = []
    names = []
    bonds = []
    bond_orders = []
    
    for atom in mol.GetAtoms():
        atomic_numbers.append( atom.GetAtomicNum())
        names.append( atom.GetName())
        
    for bond in mol.GetBonds():
        bgn = bond.GetBgn().GetName()
        end = bond.GetEnd().GetName()
        bonds.append( (bgn, end) )
        bond_orders.append( bond.GetOrder() )
        
    return atomic_numbers, names, bonds, bond_orders


def make_mol_from_properties( atomic_numbers, names, bonds, bond_orders):
    """Use atomic numbers, names, connectivity, and bond orders (i.e. as output by get_atom_properties) to construct and return an OEMol.

    
    Arguments:
    ----------
        atomic_numbers : list of ints
            List of integer atomic numbers
        names : list of strings
            List of strings of atom names
        bonds : list of tuples
            List of tuples of atom names involved in bonds in the format [ (bond1_name1, bond1_name2), (bond2_name1, bond2_name2), ...]
        bond_orders : list of ints
            List of integer bond orders corresponding to the bonds in the prior list
        
    Returns
    -------
        mol : OEMol
            OpenEye molecule created
            
    Notes: Draws loosely on John Chodera's code from https://github.com/choderalab/openmoltools/blob/master/openmoltools/forcefield_generators.py#L80-L179
       
    """
    
    mol = OEMol()
    for atnr, name in zip(atomic_numbers, names):
        oeatom = mol.NewAtom( atnr )
        oeatom.SetName(name)
    oeatoms = { oeatom.GetName() : oeatom for oeatom in mol.GetAtoms() }
    
    for bond, bond_order in zip(bonds, bond_orders):
        mol.NewBond(oeatoms[bond[0]], oeatoms[bond[1]], bond_order)
    
    # Perceive aromaticity
    OEAssignAromaticFlags(mol)
    # Assign formal charges
    OEAssignFormalCharges(mol)
    
    return mol

Check and see whether the output is consistent with the input


In [36]:
# NBVAL_SKIP
atomic_numbers, names, bonds, bond_orders = get_atom_properties(mol)
mol = make_mol_from_properties( atomic_numbers, names, bonds, bond_orders)
smiles = OECreateIsoSmiString(mol)
print(smiles)


C[NH2+]CC(c1ccccc1)C(=O)[O-]

In [ ]: