In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Keras imports
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
In [ ]:
# Build the model with keras
model = Sequential()
model.add( Dense( units=1, input_dim=2 ) )
model.add( Activation( 'sigmoid' ) )
In [ ]:
# Print the summary
model.summary()
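The summary shows that this model has only 3 trainable parameters: one weight per input, plus a bias. Under the hood, a prediction is just a sigmoid applied to a weighted sum of the two inputs. As a sanity check, here is a minimal sketch that reproduces a prediction by hand from the layer's weights (the input values are arbitrary):
In [ ]:
# Minimal sketch: reproduce the model's forward pass by hand
# (the input values below are arbitrary)
w, b = model.get_weights()  # w has shape (2, 1), b has shape (1,)
x = np.array([[4.2, 1.5]])  # (petal length, petal width)
p_manual = 1. / (1. + np.exp(-(x @ w + b)))  # sigmoid(w.x + b)
print(p_manual)  # should match model.predict(x)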
In [ ]:
# Load data
df = pd.read_csv('./data/setosa/train.csv')
X = df[['petal length (cm)', 'petal width (cm)']].values
y = df['setosa'].values
We then define a function to look at the predictions of the model (which for the moment is untrained).
In [ ]:
def plot_keras_model():
    "Plot the results of the model, along with the data points"
    # Calculate the probability on a mesh
    petal_width_mesh, petal_length_mesh = \
        np.meshgrid( np.linspace(0, 3, 100), np.linspace(0, 8, 100) )
    petal_width_mesh = petal_width_mesh.flatten()
    petal_length_mesh = petal_length_mesh.flatten()
    p = model.predict( np.stack( (petal_length_mesh, petal_width_mesh), axis=1 ) )
    p = p.reshape((100, 100))
    # Plot the probability on the mesh
    plt.clf()
    plt.imshow( p.T, extent=[0, 8, 0, 3], origin='lower',
                vmin=0, vmax=1, cmap='RdBu', aspect='auto', alpha=0.7 )
    # Plot the data points
    plt.scatter( df['petal length (cm)'], df['petal width (cm)'],
                 c=df['setosa'], cmap='RdBu' )
    plt.xlabel('petal length (cm)')
    plt.ylabel('petal width (cm)')
    cb = plt.colorbar()
    cb.set_label('setosa')

plot_keras_model()
Keras will then automatically adjust the weights by trying to minimize a given loss function.
When the network output is a probability, a good loss function is the binary cross-entropy:
$ L(p_{setosa}) = - y\log(p_{setosa}) - (1-y) \log(1-p_{setosa}) $
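To make the formula concrete, here is a minimal NumPy sketch of this loss, averaged over a set of examples (the names y_true and p_pred are illustrative):
In [ ]:
# Minimal sketch: binary cross-entropy in NumPy (illustrative names)
def binary_crossentropy(y_true, p_pred, eps=1e-7):
    "Average binary cross-entropy; eps avoids taking log(0)"
    p_pred = np.clip(p_pred, eps, 1 - eps)
    return np.mean( -y_true*np.log(p_pred) - (1 - y_true)*np.log(1 - p_pred) )

# A confident correct prediction gives a small loss:
binary_crossentropy( np.array([1., 0.]), np.array([0.9, 0.2]) )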
In [ ]:
# Prepare the model for training
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=0.1), metrics=['accuracy'])
In [ ]:
# Train the network
model.fit( X, y, batch_size=16, epochs=20, verbose=1 )
In [ ]:
plot_keras_model()
Just like we did by hand, Keras tried to find the best set of weights for the network.
In order to do so, Keras divided the dataset into batches of 16 examples (16 specimens of iris at a time).
For each batch, it computed how the weights should be changed in order to lower the loss function, and adjusted them by a small amount (proportional to the learning rate).
The process of going through the whole dataset (batch by batch) is called an epoch. The network needs to "see" the dataset several times (i.e. it needs several epochs) in order to find the right weights.
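To illustrate what happens during one epoch, here is a rough NumPy sketch of minibatch gradient descent for this single-neuron model. This is a hand-written illustration of the idea, not Keras's actual internals; w, b and lr are illustrative names:
In [ ]:
# Rough sketch of one epoch of minibatch gradient descent
# for a single sigmoid neuron (illustrative, not Keras internals)
w, b, lr, batch_size = np.zeros(2), 0., 0.1, 16
for i in range(0, len(X), batch_size):
    X_batch, y_batch = X[i:i+batch_size], y[i:i+batch_size]
    p = 1. / (1. + np.exp(-(X_batch @ w + b)))  # forward pass
    # gradient of the binary cross-entropy with respect to w and b
    grad_w = X_batch.T @ (p - y_batch) / len(X_batch)
    grad_b = np.mean(p - y_batch)
    w -= lr * grad_w  # take a small step against the gradient
    b -= lr * grad_b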
The training depends on a number of parameters that have to be "skillfully" chosen by the user:
- the batch size (here 16)
- the learning rate (here 0.1)
- the number of epochs (here 20)

A "bad" choice may result in the network training more slowly, or not properly training at all.
The training is stochastic. This is because:
- the initial weights of the network are chosen at random
- the dataset is shuffled into different random batches at each epoch

As a consequence, running the same training twice will in general give slightly different weights.
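If reproducible results are needed, the random seeds can be fixed before building and training the model. A minimal sketch, assuming a recent TensorFlow-backed Keras where keras.utils.set_random_seed is available (older versions would seed NumPy and the backend separately):
In [ ]:
# Optional: fix the random seeds for reproducible training
# (assumes keras.utils.set_random_seed exists in this Keras version)
from keras.utils import set_random_seed
set_random_seed(0)  # seeds Python, NumPy and the backend in one call
Let us now look at how the trained network performs on the test dataset.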
In [ ]:
df_test = pd.read_csv('./data/setosa/test.csv')
df_test.head(10)
In [ ]:
model.predict( np.array([[4.2, 1.5]]) )
In [ ]:
df_test['probability_setosa_predicted'] = model.predict( df_test[['petal length (cm)', 'petal width (cm)']].values )
In [ ]:
df_test
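Assuming the test file also contains the setosa labels (as the training file does), we can quantify the overall performance with model.evaluate, which returns the loss and the metrics declared at compile time (here the accuracy):
In [ ]:
# Evaluate loss and accuracy on the test set
# (assumes df_test contains a 'setosa' column with the true labels)
X_test = df_test[['petal length (cm)', 'petal width (cm)']].values
y_test = df_test['setosa'].values
model.evaluate( X_test, y_test, verbose=0 )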
Let us now look at how to train a network with multiple layers here.