This DeepChem tutorial introduces the Atomic Convolutional Neural Network (ACNN). We'll look at the structure of the model and write a simple program to run atomic convolutions.
ACNNs directly exploit the local three-dimensional structure of molecules to hierarchically learn more complex chemical features by optimizing both the model and featurization simultaneously in an end-to-end fashion.
The atom type convolution makes use of a neighbor-listed distance matrix to extract features encoding local chemical environments from an input representation (Cartesian atomic coordinates) that does not necessarily contain spatial locality. The following methods are used to build the ACNN architecture:
#### Distance matrix

    R = tf.reduce_sum(tf.multiply(D, D), 3)  # D: distance tensor
    R = tf.sqrt(R)                           # R: distance matrix
    return R
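To make the shapes concrete, here is a minimal NumPy sketch of the same reduction for a single molecule. It is an illustration under stated assumptions, not DeepChem's implementation: the helper names `neighbor_list` and `distance_matrix` are hypothetical, the brute-force neighbor search is only suitable for small systems, and the batch dimension implied by the axis index 3 in the snippet above is dropped.

```python
import numpy as np

def neighbor_list(coords, max_neighbors):
    """Indices of the `max_neighbors` nearest atoms to each atom (hypothetical helper)."""
    diff = coords[:, None, :] - coords[None, :, :]       # (N, N, 3) pairwise displacements
    dist = np.sqrt((diff * diff).sum(axis=-1))            # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                        # never pick the atom itself
    return np.argsort(dist, axis=1)[:, :max_neighbors]    # (N, M) nearest-neighbor indices

def distance_matrix(coords, nbrs):
    """Neighbor-listed distance matrix R of shape (N, M)."""
    D = coords[nbrs] - coords[:, None, :]   # distance tensor, shape (N, M, 3)
    return np.sqrt((D * D).sum(axis=-1))    # sum of squares over x, y, z, then sqrt

# Four atoms: one at the origin and one on each axis.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])
nbrs = neighbor_list(coords, max_neighbors=2)
R = distance_matrix(coords, nbrs)
print(R.shape)
```

With N atoms and M neighbors per atom, `R` has shape (N, M), which is the input the atom type convolution expects.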
#### Atom type convolution

The output of the atom type convolution is constructed from the distance matrix R and the atomic number matrix Z. The matrix R is fed into a (1x1) filter with stride 1 and depth Na, where Na is the number of unique atomic numbers (atom types) present in the molecular system. The atom type convolution kernel is a step function that operates on the neighbor distance matrix R.
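One way to read the step-function kernel is as a gate on atomic number: each of the Na output channels keeps the distances to neighbors of one atom type and zeroes the rest. The NumPy sketch below illustrates that reading. The function name `atom_type_convolution` and the toy inputs are assumptions made for this example and are not taken from DeepChem.

```python
import numpy as np

def atom_type_convolution(R, Z_nbrs, atom_types):
    """Split R (N, M) into one channel per atom type, giving E of shape (N, M, Na).

    E[i, j, a] = R[i, j] if neighbor j of atom i has atomic number atom_types[a],
    and 0 otherwise (assumed reading of the step-function kernel).
    """
    types = np.asarray(atom_types)
    mask = Z_nbrs[:, :, None] == types[None, None, :]   # (N, M, Na) boolean gate
    return np.where(mask, R[:, :, None], 0.0)

# Two atoms with two neighbors each; Z_nbrs holds the neighbors' atomic numbers.
R = np.array([[1.0, 2.0],
              [1.5, 2.5]])
Z_nbrs = np.array([[6, 8],
                   [8, 8]])
E = atom_type_convolution(R, Z_nbrs, atom_types=[6, 8])
print(E.shape)
```

Here channel 0 holds distances to carbon neighbors and channel 1 holds distances to oxygen neighbors, so the depth of the output equals the number of atom types.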
#### Radial Pooling layer

Radial pooling is a dimensionality reduction step that down-samples the output of the atom type convolutions. The reduction helps limit overfitting by providing an abstracted representation through feature binning, and it reduces the number of parameters learned. Mathematically, radial pooling layers pool over tensor slices (receptive fields) of size (1xMx1) with stride 1 and a depth of Nr, where Nr is the number of desired radial filters.
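The text above does not specify the form of the radial filters. The sketch below assumes one common choice, a Gaussian centered at `r_s` multiplied by a smooth cosine cutoff at `r_c`, and sums the filtered values over the M neighbors. The names `radial_filter` and `radial_pooling`, the parameter values, and the treatment of zero entries as masked-out neighbors are all assumptions for illustration. Learnable scale and bias terms are omitted.

```python
import numpy as np

def radial_filter(r, r_s, sigma_s, r_c):
    """Gaussian centered at r_s times a cosine cutoff at r_c (assumed filter form)."""
    gaussian = np.exp(-((r - r_s) ** 2) / sigma_s ** 2)
    # Entries with r == 0 are treated as masked-out neighbors and contribute nothing.
    cutoff = np.where((r > 0.0) & (r < r_c), 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)
    return gaussian * cutoff

def radial_pooling(E, radial_params):
    """Pool E (N, M, Na) over the neighbor axis, giving P of shape (N, Na, Nr)."""
    pooled = [radial_filter(E, r_s, sigma_s, r_c).sum(axis=1)   # (N, Na) per filter
              for (r_s, sigma_s, r_c) in radial_params]
    return np.stack(pooled, axis=-1)

# Toy atom-type-convolved input: 2 atoms, 3 neighbors, 2 atom types.
E = np.zeros((2, 3, 2))
E[0, 0, 0] = 1.0
E[0, 1, 1] = 2.0
E[1, 2, 1] = 1.5
params = [(1.0, 1.0, 4.0), (2.0, 1.0, 4.0)]   # (r_s, sigma_s, r_c) for Nr = 2 filters
P = radial_pooling(E, params)
print(P.shape)
```

The neighbor axis of size M disappears, which is the down-sampling described above: each atom is left with Na x Nr features regardless of how many neighbors it has.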
#### Atomistic fully connected network

Atomic Convolution layers are stacked by feeding the flattened (N, Na x Nr) output of the radial pooling layer into the atom type convolution operation. Finally, we feed the tensor row-wise (per-atom) into a fully-connected network. The same fully connected weights and biases are used for each atom in a given molecule.
Now that we have seen the structural overview of ACNNs, we'll look more closely at the model, see how to train it, and see what output to expect.
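Weight sharing across atoms can be sketched as a single matrix multiply applied to every row of the flattened feature matrix. The example below is a hypothetical NumPy illustration with random weights. The layer sizes, the ReLU activation, and the final sum of per-atom outputs into one molecular value are assumptions, not details stated above.

```python
import numpy as np

def atomistic_network(P, weights, biases):
    """Apply the same dense layers to every atom's feature row.

    P has shape (N, Na, Nr) and is flattened to (N, Na * Nr) first.
    Returns one scalar per atom, shape (N,).
    """
    h = P.reshape(P.shape[0], -1)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)              # ReLU hidden layer, shared by all atoms
    return (h @ weights[-1] + biases[-1]).ravel()   # linear output layer

rng = np.random.default_rng(0)
N, Na, Nr, hidden = 5, 2, 3, 4
P = rng.normal(size=(N, Na, Nr))
weights = [rng.normal(size=(Na * Nr, hidden)), rng.normal(size=(hidden, 1))]
biases = [np.zeros(hidden), np.zeros(1)]

per_atom = atomistic_network(P, weights, biases)
total = per_atom.sum()   # assumed readout: sum of per-atom contributions
print(per_atom.shape)
```

Because every atom passes through identical weights, reordering the atoms only reorders the per-atom outputs, and a summed readout does not depend on atom order.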
For training, we will use the publicly available PDBbind dataset. In this example, each row represents a protein-ligand complex, and the following columns are present: a unique complex identifier; the SMILES string of the ligand; the binding affinity (Ki) of the ligand to the protein in the complex; a Python list of all lines in a PDB file for the protein alone; and a Python list of all lines in a ligand file for the ligand alone.
This tutorial and the rest in this sequence are designed to be done in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.
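Since the protein is stored as a list of PDB file lines, and the ACNN consumes Cartesian coordinates, it can help to see how such lines map to a coordinate array. The sketch below reads the fixed-width coordinate and element columns of ATOM/HETATM records. The helper names and the two atom records are made up for illustration and do not come from the dataset. In practice DeepChem's own featurizers would handle this step.

```python
import numpy as np

def pdb_lines_to_coords(pdb_lines):
    """Extract element symbols and Cartesian coordinates from ATOM/HETATM records."""
    elements, coords = [], []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns: x = 31-38, y = 39-46, z = 47-54, element = 77-78
            coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
            elements.append(line[76:78].strip())
    return elements, np.array(coords)

def make_atom_line(serial, name, res, chain, resseq, x, y, z, elem):
    """Build a made-up ATOM record with correct column alignment (illustration only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}{'':4s}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{elem:>2s}")

# Hypothetical stand-in for one row's list of PDB lines.
pdb_lines = [
    make_atom_line(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614, "N"),
    make_atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842, "C"),
    "END",
]
elements, coords = pdb_lines_to_coords(pdb_lines)
print(elements, coords.shape)
```

The resulting (N, 3) array is the kind of coordinate input from which the neighbor-listed distance matrix is built.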
To run DeepChem within Colab, you'll need to run the following cell of installation commands. This will take about 5 minutes to run to completion and install your environment.
In :

    %tensorflow_version 1.x
    !curl -Lo deepchem_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
    import deepchem_installer
    %time deepchem_installer.install(version='2.3.0')
    TensorFlow 1.x selected.
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  3477  100  3477    0     0  15117      0 --:--:-- --:--:-- --:--:-- 15117
    add /root/miniconda/lib/python3.6/site-packages to PYTHONPATH
    python version: 3.6.9
    fetching installer from https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    done
    installing miniconda to /root/miniconda
    done
    installing deepchem
    done
    /usr/local/lib/python3.6/dist-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
      warnings.warn(msg, category=FutureWarning)
    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    deepchem-2.3.0 installation finished!
    CPU times: user 2.78 s, sys: 630 ms, total: 3.41 s
    Wall time: 2min 7s
In :

    import deepchem as dc
    import os
    from deepchem.utils import download_url
In :

    # Download the PDBbind core set into DeepChem's data directory.
    download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/pdbbind_core_df.csv.gz")
    data_dir = dc.utils.get_data_dir()
    dataset_file = os.path.join(data_dir, "pdbbind_core_df.csv.gz")
    # Load the file from disk; the printed type below shows it is a pandas DataFrame.
    raw_dataset = dc.utils.save.load_from_disk(dataset_file)
In :

    print("Type of dataset is: %s" % str(type(raw_dataset)))
    print(raw_dataset[:5])
    #print("Shape of dataset is: %s" % str(raw_dataset.shape))
    Type of dataset is: <class 'pandas.core.frame.DataFrame'>
      pdb_id  ...  label
    0   2d3u  ...   6.92
    1   3cyx  ...   8.00
    2   3uo4  ...   6.52
    3   1p1q  ...   4.89
    4   3ag9  ...   8.05

    [5 rows x 7 columns]
Now that we've seen what our dataset looks like, let's start working with it in Python.
In :

    import numpy as np
    import tensorflow as tf
TODO(rbharath): This tutorial still needs to be fleshed out.
Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:
Starring DeepChem on GitHub helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.
The DeepChem Gitter hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!