In [1]:
%matplotlib inline
import skchem
Operations that remove compounds from a collection are implemented as Filters in scikit-chem. They live in the skchem.filters package:
In [19]:
skchem.filters.__all__
Out[19]:
They are used very much like Transformers:
In [20]:
of = skchem.filters.OrganicFilter()
In [21]:
benzene = skchem.Mol.from_smiles('c1ccccc1', name='benzene')
ferrocene = skchem.Mol.from_smiles('[cH-]1cccc1.[cH-]1cccc1.[Fe+2]', name='ferrocene')
norbornane = skchem.Mol.from_smiles('C12CCC(C2)CC1', name='norbornane')
dicyclopentadiene = skchem.Mol.from_smiles('C1C=CC2C1C3CC2C=C3')
ms = [benzene, ferrocene, norbornane, dicyclopentadiene]
In [22]:
of.filter(ms)
Out[22]:
Filters essentially use a predicate function to decide whether to keep or remove instances. The result of this function can be returned using transform:
In [23]:
of.transform(ms)
Out[23]:
In [24]:
issubclass(skchem.filters.Filter, skchem.base.Transformer)
Out[24]:
The predicate function should return None, False or np.nan for a negative result, and anything else for a positive result.
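The convention above can be sketched in plain Python, without scikit-chem. This is only an illustration: `is_positive` and `PredicateFilter` are hypothetical names, not part of skchem, and the sketch takes the rule literally (only None, False and NaN count as negative), which may differ in detail from the library's own handling.

```python
import math


def is_positive(result):
    """Return True if a predicate result counts as 'keep' (hypothetical helper)."""
    if result is None or result is False:
        return False
    # np.nan is an ordinary Python float, so math.isnan covers it
    if isinstance(result, float) and math.isnan(result):
        return False
    return True


class PredicateFilter:
    """Hypothetical stand-in for a predicate-based filter; not skchem's class."""

    def __init__(self, predicate):
        self.predicate = predicate

    def transform(self, items):
        # Return the raw predicate result for every item
        return [self.predicate(item) for item in items]

    def filter(self, items):
        # Return only the original items whose predicate result is positive
        return [item for item in items if is_positive(self.predicate(item))]


names = ['benzene', 'ferrocene', None, 'norbornane']
has_name = PredicateFilter(lambda name: name is not None)

raw = has_name.transform(names)   # one predicate result per input
kept = has_name.filter(names)     # inputs with a positive result
```

Here `transform` exposes what the predicate returned for each input, while `filter` uses those results to decide which inputs survive.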
You can create your own filter by passing a predicate function to the Filter class. For example, perhaps you only want to keep compounds that have a name:
In [25]:
is_named = skchem.filters.Filter(lambda m: m.name is not None)
We carelessly did not set dicyclopentadiene's name previously, so we want this to get filtered out:
In [26]:
is_named.filter(ms)
Out[26]:
It worked!
A common task in cheminformatics is to convert a molecule into something else and, if the conversion fails, simply remove the compound. Examples include standardization, where one might throw away compounds that fail to standardize, and geometry optimization, where one might throw away molecules that fail to converge.
This is similar to, but crucially different from, simple filtering: filtering returns the original compounds rather than the transformed ones. Special Filters called TransformFilters can perform the conversion and the removal in a single method call. To illustrate, we will use the UFF class:
In [27]:
issubclass(skchem.forcefields.UFF, skchem.filters.base.TransformFilter)
Out[27]:
They are instantiated in the same way as normal Transformers and Filters:
In [28]:
uff = skchem.forcefields.UFF()
An example molecule that fails is taken from the NCI DTP Diversity set III:
In [29]:
mol_that_fails = skchem.Mol.from_smiles('C[C@H](CCC(=O)O)[C@H]1CC[C@@]2(C)[C@@H]3C(=O)C[C@H]4C(C)(C)[C@@H](O)CC[C@]4(C)[C@H]3C(=O)C[C@]12C',
name='7524')
In [30]:
skchem.vis.draw(mol_that_fails)
Out[30]:
In [31]:
ms.append(mol_that_fails)
In [32]:
res = uff.filter(ms); res
Out[32]:
In [33]:
skchem.vis.draw(res.iloc[3])  # .ix was removed in pandas 1.0; .iloc selects by position
Out[33]:
In [34]:
res = uff.transform_filter(ms); res
Out[34]:
In [35]:
skchem.vis.draw(res.iloc[3])  # .ix was removed in pandas 1.0; .iloc selects by position
Out[35]:
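The difference between filter and transform_filter can be sketched in plain Python. This is a minimal illustration, not skchem's implementation: `filter_items` and `transform_filter_items` are hypothetical helpers, and a failed conversion is assumed to raise ValueError, which is not necessarily how scikit-chem signals failure.

```python
def filter_items(transform, items):
    """Keep the ORIGINAL items for which the transform succeeds."""
    kept = []
    for item in items:
        try:
            transform(item)
        except ValueError:
            continue  # conversion failed: drop the item
        kept.append(item)
    return kept


def transform_filter_items(transform, items):
    """Keep the TRANSFORMED results, dropping items whose transform fails."""
    results = []
    for item in items:
        try:
            results.append(transform(item))
        except ValueError:
            continue  # conversion failed: drop the item
    return results


raw = ['1.5', 'not a number', '3']

originals = filter_items(float, raw)            # surviving inputs, unchanged
converted = transform_filter_items(float, raw)  # surviving inputs, converted
```

Both calls drop the same failing entry, but only the second returns the converted values. That mirrors why the molecule drawn after transform_filter is the optimized one rather than the input.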