This demo shows how to use transfer learning to leverage the power of a deep convolutional neural network without having to train one yourself. Most people do not train these networks from scratch because of the large data and computational requirements. It is more common to take a network trained on a large dataset (unrelated to our task) and then leverage the representation it learnt, either by fine-tuning it on the new task or by using it as a fixed feature extractor.
This notebook will be doing the latter, using the Inception-v3 model that was trained on the ImageNet Large Scale Visual Recognition Challenge dataset, made up of over 1 million images, where the task was to classify images into 1,000 classes.
We will cover three topics in this demo: visualizing the dataset with unsupervised dimensionality reduction, training a supervised multi-class classifier on top of the image embeddings, and building a binary image classifier interactively with the DeepTeach plugin.
The following video shows DeepTeach in action:
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('7hZ3X37Qwc4')
Out[2]:
We recommend reading the Tensorflow Image Recognition Tutorial before going through this demo.
The notebook cells below use pymldb's Connection class to make REST API calls. You can check out the Using pymldb Tutorial for more details.
In [2]:
from pymldb import Connection
mldb = Connection()
import urllib2, pandas as pd, numpy as np, matplotlib.pyplot as plt
from matplotlib.offsetbox import TextArea, DrawingArea, OffsetImage, AnnotationBbox
from matplotlib._png import read_png
%matplotlib inline
In the Tensorflow Image Recognition Tutorial, you saw how to embed the image of Admiral Grace Hopper using the Inception model. To embed a whole dataset of images, we do the same thing, but within a procedure of type transform. We will not cover this in detail in the notebook, but the complete code is available in the Dataset Builder plugin's code.
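For reference, a minimal sketch of such an embedding procedure could look like the cell below. This is an illustrative sketch only: it assumes an inception function like the one created in the Image Recognition Tutorial, and a hypothetical realestate_images dataset containing one image url per row.
# Illustrative sketch only: assumes an `inception` function (as created in the
# Image Recognition Tutorial) and a hypothetical `realestate_images` dataset
# with one image `url` per row.
print mldb.put("/v1/procedures/embed_realestate_images", {
    "type": "transform",
    "params": {
        "inputData": """
            SELECT inception({url: url}) AS *
            FROM realestate_images
        """,
        "outputDataset": {
            "id": "embedded_images_realestate",
            "type": "embedding"
        },
        "runOnCreation": True
    }
})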
We ship the plugin with 4 pre-assembled datasets: real-estate, recipes, transportation and pets. We start by loading the embeddings of the images for the real-estate dataset:
In [3]:
prefix = "http://public.mldb.ai/datasets/dataset-builder"
print mldb.put("/v1/procedures/embedded_images", {
"type": "import.text",
"params": {
"dataFileUrl": prefix + "/cache/dataset_creator_embedding_realestate.csv.gz",
"outputDataset": {
"id": "embedded_images_realestate",
"type": "embedding"
},
"select": "* EXCLUDING(rowName)",
"named": "rowName",
"runOnCreation": True
}
})
The dataset we just imported has one row per image, and the dense columns are the 2048-dimensional embeddings. We used the second-to-last layer of the network, the pool_3 layer, which is less specialized than the final softmax layer of the network. Since the Inception model was trained on the ImageNet task, the last layer has been trained to perform very well on that specific task, while the previous layers are more abstract representations and are more suitable for transfer learning tasks.
The following query shows the embedding values for 5 rows:
In [4]:
mldb.query("SELECT * FROM embedded_images_realestate ORDER BY rowHash() ASC LIMIT 5")
Out[4]:
The real-estate dataset contains images of different types of buildings. The following query shows the different categories:
In [5]:
mldb.query("""
SELECT count(*) as count
FROM embedded_images_realestate
GROUP BY regex_replace(rowName(), '-[\\d]+', '')
""")
Out[5]:
Here are a few sample images:
[Sample images: condo-13, sand_castle-10, office_building-11, town_house-2]
For the first transfer learning task, we use the rich, abstract embeddings of the images as input to an unsupervised dimensionality reduction algorithm in order to visualize the real-estate dataset.
In the following query, we use the t-SNE algorithm to do dimensionality reduction for our visualization:
In [6]:
print mldb.put("/v1/procedures/tsne", {
"type": "tsne.train",
"params": {
"trainingData": "SELECT * FROM embedded_images_realestate",
"rowOutputDataset": "tsne_embedding",
"numOutputDimensions": 2,
"runOnCreation": True
}
})
The tsne_embedding dataset that the t-SNE procedure generated gives us x and y coordinates for all our images:
In [7]:
mldb.query("SELECT * from tsne_embedding limit 2")
Out[7]:
We can now create a scatter plot of all the images in our dataset, positioning them at the coordinates provided by the t-SNE algorithm:
In [8]:
image_prefix = "http://public.mldb.ai/datasets/dataset-builder/images/realestate_png/"
df = mldb.query("SELECT * from tsne_embedding")
bounds = df.quantile([.05, .95]).T.values
fig = plt.figure(figsize=(18, 15), frameon=False)
ax = fig.add_subplot(111, xlim=bounds[0], ylim=bounds[1])
plt.axis('off')
for x in df.iterrows():
    imagebox = OffsetImage(read_png(urllib2.urlopen(image_prefix + "%s.png" % x[0])), zoom=0.35)
    ax.add_artist(AnnotationBbox(imagebox, xy=(x[1]["x"], x[1]["y"]), xycoords='data', frameon=False))
We can clearly see clusters of similar images in the scatter plot above. Most of the work to make this possible was already done when the convolutional network was trained, which is what allows a simple dimensionality reduction algorithm to get decent results.
The second transfer learning task we will do is to use the image embeddings as features for a supervised multi-class classifier. In this example, we will train a bagged boosted decision tree to do multi-class classification on a subset of the labels present in the real-estate dataset used in the unsupervised learning example.
The following call will generate our training dataset, keeping only the images that are either a castle, an igloo, a cabin, a town house, a condo or a sand castle:
In [9]:
print mldb.put("/v1/procedures/<id>", {
"type": "transform",
"params": {
"inputData": """
SELECT *, regex_replace(rowName(), '-[\\d]+', '') as category
FROM embedded_images_realestate
WHERE regex_replace(rowName(), '-[\\d]+', '') IN
('castle', 'igloo', 'cabin', 'town_house', 'condo', 'sand_castle')
""",
"outputDataset": "training_dataset",
"runOnCreation": True
}
})
We can now take a look at the generated dataset:
In [10]:
mldb.query("select * from training_dataset limit 3")
Out[10]:
We can now use a procedure of type classifier.experiment to train and test our classifier. We will be using 2/3 of our data for training and the rest for testing:
In [11]:
rez = mldb.put("/v1/procedures/<id>", {
    "type": "classifier.experiment",
    "params": {
        "experimentName": "realestate",
        "inputData": """
            SELECT
                {* EXCLUDING(category)} as features,
                category as label
            FROM training_dataset
        """,
        "datasetFolds": [
            {
                "trainingWhere": "rowHash() % 3 < 2",
                "testingWhere": "rowHash() % 3 = 2"
            }
        ],
        "modelFileUrlPattern": "file:///mldb_data/realestate.cls",
        "algorithm": "bbdt",
        "mode": "categorical",
        "outputAccuracyDataset": True,
        "runOnCreation": True
    }
})
runResults = rez.json()["status"]["firstRun"]["status"]["folds"][0]["resultsTest"]
print rez
We can now look at the average classification accuracy for all labels:
In [12]:
pd.DataFrame.from_dict({"weightedStatistics": runResults["weightedStatistics"]})
Out[12]:
Those results are pretty good, and this again shows the quality of the features we are getting from the ConvNet.
We can look at the confusion matrix to give us a better idea of where the mistakes were made:
In [13]:
pd.DataFrame(runResults["confusionMatrix"]).pivot_table(index="actual", columns="predicted", fill_value=0)
Out[13]:
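The experiment also saved the trained model at file:///mldb_data/realestate.cls, as specified above. As a quick sketch of how it could be reused, we can load it back as a function of type classifier and score some rows; the function name realestate_cls below is hypothetical.
# Sketch only: load the model saved by the experiment above as a "classifier"
# function (the name "realestate_cls" is hypothetical) and score a few rows.
print mldb.put("/v1/functions/realestate_cls", {
    "type": "classifier",
    "params": {
        "modelFileUrl": "file:///mldb_data/realestate.cls"
    }
})

mldb.query("""
    SELECT realestate_cls({features: {* EXCLUDING(category)}}) AS *
    FROM training_dataset
    LIMIT 2
""")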
DeepTeach is a concrete example of the powerful things we can do with transfer learning. It essentially uses similarity search to allow a user to very quickly build a binary image classifier by iteratively selecting example images and letting the plugin suggest similar ones.
You can see the plugin in action in the video included at the top of the notebook, or read the blog post explaining how it works.
To play with the plugin yourself, you can install it in your MLDB instance by simply running the next cell:
In [ ]:
print mldb.put('/v1/plugins/deepteach', {
    "type": "python",
    "params": {
        "address": "git://github.com/mldbai/deepteach.git"
    }
})
If you did not change the name of the plugin in the cell above, you can load the plugin's UI here.
NOTE: this only works if you're running this Notebook live, not if you're looking at a static copy on http://docs.mldb.ai. See the documentation for Running MLDB.