Molecules in scikit-chem

scikit-chem is first and foremost a wrapper around rdkit, designed to make it more Pythonic and more intuitive for users familiar with other libraries in the scientific Python stack. The package implements a core Mol class, representing a single molecule. It is a direct subclass of the rdkit.Chem.Mol class:

In [1]:

import rdkit.Chem
import skchem
issubclass(skchem.Mol, rdkit.Chem.Mol)

Out[1]:

True
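Because skchem.Mol subclasses rdkit.Chem.Mol, it inherits the whole rdkit API and can layer its own methods on top. The traceback further down shows skchem turning a freshly parsed rdkit molecule into a skchem one with Mol.from_super. Below is a minimal, rdkit-free sketch of that pattern. BaseMol stands in for rdkit.Chem.Mol and n_atoms is a hypothetical convenience property. Reassigning __class__ works for plain Python classes like these, but whether skchem promotes rdkit objects the same way is an assumption.

```python
class BaseMol:
    """Stand-in for rdkit.Chem.Mol (hypothetical, rdkit-free)."""

    def __init__(self, symbols):
        self._symbols = list(symbols)

    def GetNumAtoms(self):
        return len(self._symbols)


class Mol(BaseMol):
    """Toy wrapper adding a Pythonic layer on top of BaseMol."""

    @classmethod
    def from_super(cls, obj):
        # Re-type an existing base-class instance in place: no data is copied,
        # but the object now also has the subclass's methods.
        obj.__class__ = cls
        return obj

    @property
    def n_atoms(self):
        # Hypothetical Pythonic alias built on an inherited method.
        return self.GetNumAtoms()


m = Mol.from_super(BaseMol(['C', 'C', 'O', 'Cl']))
print(issubclass(Mol, BaseMol))  # True
print(m.n_atoms)                 # 4
```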

As such, it inherits all of the methods available on rdkit.Chem.Mol, for example:

In [2]:

hasattr(skchem.Mol, 'GetAromaticAtoms')

Out[2]:

True

Initializing new molecules

Constructors are provided as classmethods on skchem.Mol, in the same fashion as pandas objects are constructed. For example, to make a pandas.DataFrame from a dictionary, you call:

In [3]:

import pandas as pd
df = pd.DataFrame.from_dict({'a': [10, 20], 'b': [20, 40]}); df

Analogously, to make a skchem.Mol from a SMILES string, you call:

In [4]:

mol = skchem.Mol.from_smiles('CC(=O)Cl'); mol

Out[4]:

<Mol name="None" formula="C2H3ClO" at 0x11dc8f490>
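Classmethod constructors build the object through cls(...), so they return an instance of whichever class they are called on, including subclasses. The Formula class below is hypothetical and only illustrates the pattern; it is not part of skchem.

```python
import re


class Formula:
    """Hypothetical class with pandas-style alternative constructors."""

    def __init__(self, counts):
        self.counts = dict(counts)

    @classmethod
    def from_dict(cls, d):
        return cls(d)

    @classmethod
    def from_string(cls, s):
        # Element symbol (capital + optional lowercase), then an optional count.
        counts = {}
        for symbol, n in re.findall(r'([A-Z][a-z]?)(\d*)', s):
            counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
        return cls(counts)


class LabelledFormula(Formula):
    """Subclass: inherits the constructors unchanged."""


f = LabelledFormula.from_string('C2H3ClO')
print(type(f).__name__)  # LabelledFormula
print(f.counts)          # {'C': 2, 'H': 3, 'Cl': 1, 'O': 1}
```

Calling the constructor on the subclass yields a LabelledFormula, not a Formula, without the subclass redefining anything.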

The available constructors are:

In [5]:

[method for method in skchem.Mol.__dict__ if method.startswith('from_')]

Out[5]:

['from_tplblock',
 'from_molblock',
 'from_molfile',
 'from_binary',
 'from_tplfile',
 'from_mol2block',
 'from_pdbfile',
 'from_pdbblock',
 'from_smiles',
 'from_smarts',
 'from_mol2file',
 'from_inchi']
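Note that skchem.Mol.__dict__ contains only the attributes defined on skchem.Mol itself, so methods inherited from rdkit never show up in a comprehension like the one above, and its order simply follows the class definition. dir() also walks the base classes and returns a sorted list. The classes below are hypothetical and only demonstrate the difference.

```python
class Base:
    @classmethod
    def from_base_format(cls, s):
        return cls()


class Child(Base):
    @classmethod
    def from_smiles(cls, s):
        return cls()


own = sorted(m for m in vars(Child) if m.startswith('from_'))
everything = [m for m in dir(Child) if m.startswith('from_')]
print(own)         # ['from_smiles']
print(everything)  # ['from_base_format', 'from_smiles']
```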

When a molecule fails to parse, a ValueError is raised:

In [6]:

skchem.Mol.from_smiles('NOTSMILES')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-99e03ef822e7> in <module>()
----> 1 skchem.Mol.from_smiles('NOTSMILES')

/Users/rich/projects/scikit-chem/skchem/core/mol.py in constructor(_, in_arg, name, *args, **kwargs)
    419         m = getattr(rdkit.Chem, 'MolFrom' + constructor_name)(in_arg, *args, **kwargs)
    420         if m is None:
--> 421             raise ValueError('Failed to parse molecule, {}'.format(in_arg))
    422         m = Mol.from_super(m)
    423         m.name = name

ValueError: Failed to parse molecule, NOTSMILES
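The traceback shows the mechanism: each from_ constructor looks up the rdkit.Chem function named 'MolFrom' plus the format name, treats a None result as a failure and raises ValueError. Below is an rdkit-free sketch of that factory. It assumes a toy parser (_mol_from_smiles, hypothetical) that only checks characters against a small alphabet, and a hypothetical parse_all helper that collects failures instead of stopping at the first one.

```python
import types


class BaseMol:
    """Stand-in for an rdkit molecule (hypothetical)."""

    def __init__(self, smiles):
        self.smiles = smiles


def _mol_from_smiles(smiles):
    # Toy parser: accepts strings drawn from a small SMILES alphabet and,
    # like rdkit's MolFrom* functions, returns None on failure.
    allowed = set('BCNOPSFIclnops()[]=#+-@/\\0123456789')
    return BaseMol(smiles) if smiles and set(smiles) <= allowed else None


# Stand-in for the rdkit.Chem module: parsers are looked up by name.
backend = types.SimpleNamespace(MolFromSmiles=_mol_from_smiles)


class Mol(BaseMol):
    name = None


def _make_constructor(constructor_name):
    parse = getattr(backend, 'MolFrom' + constructor_name)

    def constructor(cls, in_arg, name=None, *args, **kwargs):
        m = parse(in_arg, *args, **kwargs)
        if m is None:
            raise ValueError('Failed to parse molecule, {}'.format(in_arg))
        m.__class__ = cls  # promote the parsed base object to the wrapper
        m.name = name
        return m

    return classmethod(constructor)


Mol.from_smiles = _make_constructor('Smiles')


def parse_all(smiles_list):
    """Hypothetical helper: parse what we can and collect the failures."""
    parsed, failed = [], []
    for s in smiles_list:
        try:
            parsed.append(Mol.from_smiles(s))
        except ValueError:
            failed.append(s)
    return parsed, failed


mols, bad = parse_all(['CC(=O)Cl', 'NOTSMILES', 'CCO'])
print([m.smiles for m in mols], bad)  # ['CC(=O)Cl', 'CCO'] ['NOTSMILES']
```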

Molecule accessors

Atoms and bonds are accessible as properties:

In [7]:

mol.atoms

Out[7]:

<AtomView values="['C', 'C', 'O', 'Cl']" at 0x11dc9ac88>

In [8]:

mol.bonds

Out[8]:

<BondView values="['C-C', 'C=O', 'C-Cl']" at 0x11dc9abe0>
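Exposing atoms as a property means the view is built on access rather than stored, so it always reflects the molecule's current state. A minimal sketch, assuming a hypothetical AtomView that wraps element symbols rather than full Atom objects:

```python
class AtomView:
    """Hypothetical read-only view over a molecule's atoms (symbols only)."""

    def __init__(self, mol):
        self._mol = mol  # a reference, not a copy

    def __len__(self):
        return len(self._mol._symbols)

    def __iter__(self):
        return iter(self._mol._symbols)

    def __repr__(self):
        return '<AtomView values="{}" at {}>'.format(list(self), hex(id(self)))


class Mol:
    def __init__(self, symbols):
        self._symbols = list(symbols)

    @property
    def atoms(self):
        # Built on each access, so it always reflects the current molecule.
        return AtomView(self)


mol = Mol(['C', 'C', 'O', 'Cl'])
print(mol.atoms)  # <AtomView values="['C', 'C', 'O', 'Cl']" at 0x...>
```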

These are iterable:

In [9]:

[a for a in mol.atoms]

Out[9]:

[<Atom element="C" at 0x11dcfe8a0>,
 <Atom element="C" at 0x11dcfe9e0>,
 <Atom element="O" at 0x11dcfed00>,
 <Atom element="Cl" at 0x11dcfedf0>]

subscriptable:

In [10]:

mol.atoms[3]

Out[10]:

<Atom element="Cl" at 0x11dcfef30>

sliceable:

In [11]:

mol.atoms[:3]

Out[11]:

[<Atom element="C" at 0x11dcfebc0>,
 <Atom element="C" at 0x11de690d0>,
 <Atom element="O" at 0x11de693f0>]

indexable with a list of positions:

In [19]:

mol.atoms[[1, 3]]

Out[19]:

[<Atom element="C" at 0x11de74760>, <Atom element="Cl" at 0x11de7fe40>]

and maskable with a list of booleans:

In [18]:

mol.atoms[[True, False, True, False]]

Out[18]:

[<Atom element="C" at 0x11de74ad0>, <Atom element="O" at 0x11de74f30>]
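The five access patterns above can all be served by a single __getitem__ that dispatches on the key type. This is a sketch with a hypothetical View class, not skchem's implementation. The boolean check has to come before the integer-list branch, because bool is a subclass of int in Python.

```python
class View:
    """Hypothetical sequence view accepting the key types shown above."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self._items[key]                      # sliceable
        if isinstance(key, list):
            # Check for a boolean mask first: bool is a subclass of int,
            # so [True, False] would otherwise be read as positions 1 and 0.
            if key and all(isinstance(k, bool) for k in key):
                if len(key) != len(self._items):
                    raise IndexError('mask length {} != {} items'.format(
                        len(key), len(self._items)))
                return [x for x, keep in zip(self._items, key) if keep]  # maskable
            return [self._items[i] for i in key]         # indexable
        return self._items[key]                          # subscriptable


atoms = View(['C', 'C', 'O', 'Cl'])
print(atoms[3], atoms[:3], atoms[[1, 3]], atoms[[True, False, True, False]])
# Cl ['C', 'C', 'O'] ['C', 'Cl'] ['C', 'O']
```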

Properties on the underlying rdkit objects are accessible through the props attribute:

In [11]:

mol.props['is_reactive'] = 'very!'

In [12]:

mol.atoms[1].props['kind'] = 'electrophilic'
mol.atoms[3].props['leaving group'] = 1
mol.bonds[2].props['bond strength'] = 'strong'

These use rdkit's property functionality internally:

In [13]:

mol.GetProp('is_reactive')

Out[13]:

'very!'
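A props mapping that delegates to the underlying object's getters and setters keeps a single source of truth, which is why mol.GetProp sees a value set through mol.props. Below is a sketch using collections.abc.MutableMapping. It assumes a hypothetical FakeRDObject that mimics the SetProp/GetProp/HasProp/ClearProp/GetPropNames method names found on rdkit objects; rdkit's typed setters and string conversion are not modelled.

```python
from collections.abc import MutableMapping


class FakeRDObject:
    """Hypothetical stand-in mimicking rdkit's property method names."""

    def __init__(self):
        self._store = {}

    def SetProp(self, key, value):
        self._store[key] = value

    def GetProp(self, key):
        return self._store[key]

    def HasProp(self, key):
        return key in self._store

    def ClearProp(self, key):
        del self._store[key]

    def GetPropNames(self):
        return list(self._store)


class PropertyView(MutableMapping):
    """Dict-like view that reads and writes through the owner's methods,
    so the values live only on the underlying object."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, key):
        if not self._owner.HasProp(key):
            raise KeyError(key)
        return self._owner.GetProp(key)

    def __setitem__(self, key, value):
        self._owner.SetProp(key, value)

    def __delitem__(self, key):
        if not self._owner.HasProp(key):
            raise KeyError(key)
        self._owner.ClearProp(key)

    def __iter__(self):
        return iter(self._owner.GetPropNames())

    def __len__(self):
        return len(self._owner.GetPropNames())


obj = FakeRDObject()
props = PropertyView(obj)
props['is_reactive'] = 'very!'
print(obj.GetProp('is_reactive'))  # very!
```

Subclassing MutableMapping supplies get, items, update, pop and membership tests from the five methods defined here.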

The properties of atoms and bonds are also accessible molecule-wide, with a missing value wherever an atom or bond lacks a given property:

In [14]:

mol.atoms.props

Out[14]:

<MolPropertyView values="{'leaving group': [nan, nan, nan, 1.0], 'kind': [None, 'electrophilic', None, None]}" at 0x11daf8390>

In [15]:

mol.bonds.props

Out[15]:

<MolPropertyView values="{'bond strength': [None, None, 'strong']}" at 0x11daf80f0>
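The molecule-wide views gather one column per property name and insert a placeholder where an atom or bond lacks that property. Judging from the outputs above, numeric columns are filled with NaN (and become floats) while other columns are filled with None. The sketch below reproduces that rule, which is inferred from the output rather than taken from skchem's source; collect_props is a hypothetical helper.

```python
import math
from numbers import Number


def collect_props(per_item_props):
    """Hypothetical helper: turn a list of per-atom (or per-bond) property
    dicts into {name: [value for each item]}.

    Fill rule (inferred from the output above, not from skchem's source):
    columns whose present values are all numbers become floats with NaN for
    missing entries; any other column uses None."""
    names = []
    for props in per_item_props:
        for name in props:
            if name not in names:
                names.append(name)  # keep first-seen order
    table = {}
    for name in names:
        present = [p[name] for p in per_item_props if name in p]
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
            table[name] = [float(p[name]) if name in p else math.nan
                           for p in per_item_props]
        else:
            table[name] = [p.get(name) for p in per_item_props]
    return table


atoms = [{}, {'kind': 'electrophilic'}, {}, {'leaving group': 1}]
print(collect_props(atoms))
# {'kind': [None, 'electrophilic', None, None], 'leaving group': [nan, nan, nan, 1.0]}
```

A dict in this shape can be passed directly to pandas.DataFrame; naming the index atom_idx gives a frame laid out like the to_frame() output in the next cell.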

These can be exported as pandas objects, with one row per atom:

In [16]:

mol.atoms.props.to_frame()

Out[16]:

                   kind  leaving group
atom_idx
0                  None            NaN
1         electrophilic            NaN
2                  None            NaN
3                  None            1.0

Export and Serialization

Molecules are exported and serialized in much the same way as they are constructed, again taking inspiration from pandas. Where a DataFrame has methods such as to_csv, a Mol has one to_ method per format:

In [17]:

df.to_csv()

Out[17]:

',a,b\n0,10,20\n1,20,40\n'

In [18]:

mol.to_inchi_key()

Out[18]:

'WETWJCDKMRHUPV-UHFFFAOYSA-N'

The full list of available export formats is:

In [19]:

[method for method in skchem.Mol.__dict__ if method.startswith('to_')]

Out[19]:

['to_inchi',
 'to_json',
 'to_smiles',
 'to_smarts',
 'to_inchi_key',
 'to_binary',
 'to_dict',
 'to_molblock',
 'to_tplfile',
 'to_formula',
 'to_molfile',
 'to_pdbblock',
 'to_tplblock']
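Because every exporter follows the to_<format> naming convention, a generic export function can dispatch on a format name with getattr. Below is a sketch with a hypothetical toy Mol that provides to_dict and to_json, plus a matching from_json to show the round trip between export and construction. The export function is an assumption, not part of skchem.

```python
import json


class Mol:
    """Hypothetical toy molecule with pandas-style to_*/from_* methods."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = list(symbols)

    @classmethod
    def from_json(cls, s):
        d = json.loads(s)
        return cls(d['name'], d['atoms'])

    def to_dict(self):
        return {'name': self.name, 'atoms': list(self.symbols)}

    def to_json(self):
        return json.dumps(self.to_dict())


def export(obj, fmt):
    """Call obj.to_<fmt>(); list the supported formats if fmt is unknown."""
    method = getattr(obj, 'to_' + fmt, None)
    if method is None:
        formats = sorted(m[len('to_'):] for m in dir(obj) if m.startswith('to_'))
        raise ValueError('unknown format {!r}; choose from {}'.format(fmt, formats))
    return method()


mol = Mol('acetyl chloride', ['C', 'C', 'O', 'Cl'])
restored = Mol.from_json(export(mol, 'json'))
print(restored.to_dict() == mol.to_dict())  # True
```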