In the previous tutorial, we learned some basics of how to load data into DeepChem and how to use the basic DeepChem objects to load and manipulate this data. In this tutorial, you'll put the parts together and learn how to train a basic image classification model in DeepChem. You might ask, why are we bothering to learn this material in DeepChem? Part of the reason is that image processing is an increasingly important part of AI for the life sciences. So learning how to train image processing models will be very useful for using some of the more advanced DeepChem features.
The MNIST dataset contains handwritten digits along with their human annotated labels. The learning challenge for this dataset is to train a model that maps the digit image to its true label. MNIST has been a standard benchmark for machine learning for decades at this point.
This tutorial and the rest in this sequence are designed to be done in Google colab. If you'd like to open this notebook in colab, you can use the following link.
We recommend running this tutorial on Google colab. You'll need to run the following cell of installation commands on Colab to get your environment set up. If you'd rather run the tutorial locally, make sure you don't run these commands (since they'll download and install a new Anaconda python setup)
In [1]:
%tensorflow_version 1.x
!curl -Lo deepchem_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import deepchem_installer
%time deepchem_installer.install(version='2.3.0')
In [0]:
from tensorflow.examples.tutorials.mnist import input_data
In [3]:
# TODO: This is deprecated. Let's replace with a DeepChem native loader for maintainability.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
In [0]:
import deepchem as dc
import tensorflow as tf
from tensorflow.keras.layers import Reshape, Conv2D, Flatten, Dense, Softmax
In [0]:
train = dc.data.NumpyDataset(mnist.train.images, mnist.train.labels)
valid = dc.data.NumpyDataset(mnist.validation.images, mnist.validation.labels)
In [0]:
keras_model = tf.keras.Sequential([
Reshape((28, 28, 1)),
Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu),
Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu),
Flatten(),
Dense(1024, activation=tf.nn.relu),
Dense(10),
Softmax()
])
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
In [7]:
model.fit(train, nb_epoch=2)
Out[7]:
In [8]:
from sklearn.metrics import roc_curve, auc
import numpy as np
print("Validation")
prediction = np.squeeze(model.predict_on_batch(valid.X))
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(10):
fpr[i], tpr[i], thresh = roc_curve(valid.y[:, i], prediction[:, i])
roc_auc[i] = auc(fpr[i], tpr[i])
print("class %s:auc=%s" % (i, roc_auc[i]))
Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.
The DeepChem Gitter hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!