SAMPL6 host-guest preparation

This notebook prepares small molecule structure files for the SAMPL6 host-guest challenge from source files already present in the source directories. OpenEye toolkits are required, as well as openbabel for the case of oxaliplatin.

The procedure basically is as follows:

  • Load and prepare receptors (hosts) from provided PDB files
  • Prepare receptors for docking (to place guests into binding site)
  • Load guests molecules from SMILES strings
  • Charge guests and generate conformers
  • Dock to hosts
  • Write out results to mol2 and SDF
  • Handle oxaliplatin (a platinum-containing CB8 ligand) separately via openbabel for conformer generation; no charges

Handle imports of tools we're using throughout


In [1]:
from openeye.oechem import *
from openeye.oeomega import *
from openeye.oequacpac import *
from openeye.oedocking import *
import os

Load receptors from provided structure files (see README.md files)

The receptors are loaded from the provided files (PDB for OA and TEMOA; SDF for CB8) and written out so that we have sdf, mol2, and PDB formats all available.


In [2]:
# Specify source directory
source_dir = '.'

# Load host files
ifile = oemolistream(os.path.join(source_dir, 'OctaAcidsAndGuests', 'OA.pdb'))
OA = OEMol()
OEReadMolecule( ifile, OA)
ifile.close()

ifile = oemolistream(os.path.join(source_dir, 'OctaAcidsAndGuests', 'TEMOA.pdb'))
TEMOA = OEMol()
OEReadMolecule( ifile, TEMOA)
ifile.close()

ifile = oemolistream(os.path.join(source_dir, 'CB8AndGuests', 'CB8.sdf'))
CB8 = OEMol()
OEReadMolecule( ifile, CB8)
ifile.close()

# Assign partial charges; this step will take a while and isn't necessary to generate basic files
# (I ran this for about 25minutes and didn't get even the first one to converge on my laptop, so I'm skipping for now)
#for mol in [OA, TEMOA, CB8]:
#    OEAssignCharges(mol, OEAM1BCCCharges())

# Write out host files to alternate formats (mol2, sdf)
for fmt in ['.mol2', '.sdf']:
    ofile = oemolostream(os.path.join(source_dir, 'OctaAcidsAndGuests', 'OA'+fmt))
    OEWriteMolecule( ofile, OA)
    ofile.close()
    ofile = oemolostream(os.path.join(source_dir, 'OctaAcidsAndGuests', 'TEMOA'+fmt))
    OEWriteMolecule( ofile, TEMOA)
    ofile.close()
    
# Add write of CB8
for fmt in ['.pdb', '.mol2']:
    ofile = oemolostream(os.path.join(source_dir, 'CB8AndGuests', 'CB8'+fmt))
    OEWriteMolecule( ofile, CB8)
    ofile.close()

Prepare hosts for docking

To place the guests into the binding site, we will dock into the receptors (hosts) so prep for docking. This step can be somewhat slow (a few minutes).


In [3]:
# Make receptors of the hosts for use in docking; takes about 4 minutes on my Mac
receptors = {}

for (hostname, hostmol) in [('OA', OA),('TEMOA', TEMOA), ('CB8', CB8)]:
    # Start by getting center of mass to use as a hint for where to dock
    com = OEFloatArray(3)
    OEGetCenterOfMass(hostmol, com) #Try octa acid for now
    # Create receptor, as per https://docs.eyesopen.com/toolkits/python/dockingtk/receptor.html#creating-a-receptor
    receptor = OEGraphMol()
    OEMakeReceptor(receptor, hostmol, com[0], com[1], com[2])
    receptors[hostname]=receptor

Load, prepare, dock, and write OA/TEMOA guests

We load each guest from SMILES (assigning protonation states roughly appropriate for neutral pH), use Omega to generate conformers, assign AM1-BCC charges, then dock to the relevant host and write out. We dock to both OA and TEMOA separately (though they use the same frame of reference) and write out "OA-GN" and "TEMOA-GN" files where N is a number from 0 to 7, corresponding to the guest. The molecules contained in these files are the same in both cases, but conformers/poses will be different.

Disclaimer: We have no idea whether these are likely binding modes, good starting poses, etc. We are providing these as a potentially reasonable set of geometries, but use at your own risk.


In [4]:
# Load initial octa acid molecule files, extract smiles
smifilename = os.path.join(source_dir, 'Gibb_SAMPL6_guests.smi')
smifile = open(smifilename, 'r')
text = smifile.readlines()

# Loop over OA, TEMOA and dock guests to each
for hostname in ['OA', 'TEMOA']:

    #Get SMILES, build names
    smiles = [ line.split()[0] for line in text ]
    guest_names = [ ('%s-' % hostname)+'G%s' % i for i in range(len(smiles))]

    #initialize omega
    omega = OEOmega()
    omega.SetMaxConfs(100) #Generate up to 100 conformers since we'll use for docking
    omega.SetIncludeInput(False)
    omega.SetStrictStereo(True) #Refuse to generate conformers if stereochemistry not provided

    #Initialize charge generation
    chargeEngine = OEAM1BCCCharges()

    # Initialize docking
    dock = OEDock()
    dock.Initialize(receptors[hostname])

    # Build OEMols from SMILES
    oemols = []
    for (smi, name) in zip(smiles, guest_names):
        # Generate new OEMol and parse SMILES
        mol = OEMol()
        OEParseSmiles( mol, smi)
        OESetNeutralpHModel(mol)

        # Generate conformers with Omega; keep only best conformer
        status = omega(mol)
        if not status:
            print("Error generating conformers for %s, %s." % (smi, name))
        #print(smi, name, mol.NumAtoms()) #Print debug info -- make sure we're getting protons added as we should

        # Assign AM1-BCC charges
        OEAssignCharges(mol, chargeEngine)

        # Dock to hosts
        dockedMol = OEGraphMol()
        status = dock.DockMultiConformerMolecule(dockedMol, mol) #By default returns only top scoring pose
        sdtag = OEDockMethodGetName(OEDockMethod_Chemgauss4)
        OESetSDScore(dockedMol, dock, sdtag)
        dock.AnnotatePose(dockedMol)

        # Write out docked pose if docking successful, otherwise write out first generated conformer
        if status == OEDockingReturnCode_Success:
            outmol = dockedMol
        else:
            print("Docking failed for %s; storing input pose." % name)
            # Delete excess conformers -- we want to write only one
            for k, conf in enumerate( mol.GetConfs() ):
                if k > 0:
                    mol.DeleteConf(conf)
            outmol = mol

        # Write out
        tripos_mol2_filename = os.path.join(source_dir, 'OctaAcidsAndGuests', name+'.mol2')
        ofile = oemolostream( tripos_mol2_filename )
        OEWriteMolecule( ofile, outmol)
        ofile.close()
        # Add write of SDF
        sdf_filename = os.path.join(source_dir, 'OctaAcidsAndGuests', name+'.sdf')
        ofile = oemolostream( sdf_filename )
        OEWriteMolecule( ofile, outmol)
        ofile.close()

        # Save to oemols
        oemols.append(OEMol(outmol))

        # Clean up residue names in mol2 files that are tleap-incompatible: replace substructure names with valid text.
        infile = open( tripos_mol2_filename, 'r')
        lines = infile.readlines()
        infile.close()
        newlines = [line.replace('<0>', name) for line in lines]
        outfile = open(tripos_mol2_filename, 'w')
        outfile.writelines(newlines)
        outfile.close()

Load, prepare, dock, and write CB8 guests

We load each guest from SMILES (assigning protonation states roughly appropriate for neutral pH), use Omega to generate conformers, assign AM1-BCC charges, then dock to CB8 and write out. Conformer generation for oxaliplatin (a bonus case) to CB8 fails because of the platinum, so a separate procedure is used in an additional cell below for this case.

Disclaimer: We have no idea whether these are likely binding modes, good starting poses, etc. We are providing these as a potentially reasonable set of geometries, but use at your own risk.


In [5]:
# Load initial CB8 molecule files, extract smiles
smifilename = os.path.join(source_dir, 'Isaacs_SAMPL6_guests.smi')
smifile = open(smifilename, 'r')
text = smifile.readlines()



#Get SMILES, build names
smiles = [ line.split()[0] for line in text ]
guest_names = [ 'CB8-G%s' % i for i in range(len(smiles))]

#initialize omega
omega = OEOmega()
omega.SetMaxConfs(100) #Generate up to 100 conformers since we'll use for docking
omega.SetIncludeInput(False)
omega.SetStrictStereo(True) #Refuse to generate conformers if stereochemistry not provided

#Initialize charge generation
chargeEngine = OEAM1BCCCharges()

# Initialize docking
dock = OEDock()
dock.Initialize(receptors['CB8']) 

# Build OEMols from SMILES
oemols = []
for (smi, name) in zip(smiles, guest_names):
    # Generate new OEMol and parse SMILES
    mol = OEMol()
    OEParseSmiles( mol, smi)
    OESetNeutralpHModel(mol)

    # Generate conformers with Omega; keep only best conformer
    status = omega(mol)
    if not status:
        print("Error generating conformers for %s, %s." % (smi, name))
    #print(smi, name, mol.NumAtoms()) #Print debug info -- make sure we're getting protons added as we should

    # Assign AM1-BCC charges
    OEAssignCharges(mol, chargeEngine)

    # Dock to hosts
    dockedMol = OEGraphMol()
    status = dock.DockMultiConformerMolecule(dockedMol, mol) #By default returns only top scoring pose
    sdtag = OEDockMethodGetName(OEDockMethod_Chemgauss4)
    OESetSDScore(dockedMol, dock, sdtag)
    dock.AnnotatePose(dockedMol)

    # Write out docked pose if docking successful, otherwise write out first generated conformer
    if status == OEDockingReturnCode_Success:
        outmol = dockedMol
    else:
        print("Docking failed for %s; storing input pose." % name)
        # Delete excess conformers -- we want to write only one
        for k, conf in enumerate( mol.GetConfs() ):
            if k > 0:
                mol.DeleteConf(conf)
        outmol = mol

    # Write out
    tripos_mol2_filename = os.path.join(source_dir, 'CB8AndGuests', name+'.mol2')
    ofile = oemolostream( tripos_mol2_filename )
    OEWriteMolecule( ofile, outmol)
    ofile.close()
    # Add write of SDF
    sdf_filename = os.path.join(source_dir, 'CB8AndGuests', name+'.sdf')
    ofile = oemolostream( sdf_filename )
    OEWriteMolecule( ofile, outmol)
    ofile.close()

    # Save to oemols
    oemols.append(OEMol(outmol))

    # Clean up residue names in mol2 files that are tleap-incompatible: replace substructure names with valid text.
    infile = open( tripos_mol2_filename, 'r')
    lines = infile.readlines()
    infile.close()
    newlines = [line.replace('<0>', name) for line in lines]
    outfile = open(tripos_mol2_filename, 'w')
    outfile.writelines(newlines)
    outfile.close()


Error generating conformers for O=C(C(O1)=O)O[Pt]21[NH2][C@@H]3CCCC[C@H]3[NH2]2, CB8-G13.
Docking failed for CB8-G13; storing input pose.

Handle conformer generation and docking of oxaliplatin

OpenBabel appears to be able to handle conformer generation for oxaliplatin, so generate its conformer and then use the same docking procedure (though this time skipping charging of the guest).

I am told that the resulting geometry of oxaliplatin is incorrect, which is perhaps not surprising since presumably the platinum poses problems for most of our modeling tools. If participants would like to study this compounds, they are encouraged to carefully examine and prepare a suitable 3D conformation before beginning their studies.


In [6]:
# Oxaliplatin fails, so use openbabel to at least generate a 3D structure of it
import os
file = open('oxaliplatin.smi', 'w')
file.write(smiles[-1]) #Assumes it is last!
file.close()

os.system('obabel -ismi oxaliplatin.smi -omol2 -O tmp.mol2 --gen3d')
istream = oemolistream('tmp.mol2')
mol = OEMol()
OEReadMolecule(istream, mol)
istream.close()
os.system('rm oxaliplatin.smi')
os.system('rm tmp.mol2')

# Can't charge this (no BCCs for Pt) so skip

# Dock to at least place in right region
dockedMol = OEGraphMol()
status = dock.DockMultiConformerMolecule(dockedMol, mol) #By default returns only top scoring pose
sdtag = OEDockMethodGetName(OEDockMethod_Chemgauss4)
OESetSDScore(dockedMol, dock, sdtag)
dock.AnnotatePose(dockedMol)
outmol = dockedMol

# Write out
tripos_mol2_filename = os.path.join(source_dir, 'CB8AndGuests', guest_names[-1]+'.mol2')
ofile = oemolostream( tripos_mol2_filename )
OEWriteMolecule( ofile, outmol)
ofile.close()
# Add write of SDF
sdf_filename = os.path.join(source_dir, 'CB8AndGuests', guest_names[-1]+'.sdf')
ofile = oemolostream( sdf_filename )
OEWriteMolecule( ofile, outmol)
ofile.close()

Final results

We now should have final docked poses for the Octa Acid (OA and TEMOA) and CB8 guests, organized into the relevant directories. I have manually inspected all of the files I generated in the hosts and at least verified that the guests reside roughly inside the hosts without any overlapping atoms. Whether you use these or not is entirely up to you, and the usual disclaimers apply (don't assume we have the right protonation states, etc.).


In [ ]: