In [1]:
%pylab inline
pylab.style.use('ggplot')
import numpy as np
import pandas as pd
In [2]:
data_df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')
data_df.head()
Out[2]:
In [4]:
data_df = data_df.assign(species=data_df.species.astype('category'))
In [5]:
data_df.head()
Out[5]:
In [6]:
import seaborn as sns
sns.pairplot(data_df, hue='species')
Out[6]:
In [9]:
# Shuffle the rows, then hold out the first 10 of each species as the test set.
# Note: np.random.shuffle(data_df.values) would only shuffle a copy (with mixed
# dtypes, .values returns a new object array), so we shuffle via sample() instead.
data_df = data_df.sample(frac=1).reset_index(drop=True)
test_df = data_df.groupby(by='species').head(10)
In [10]:
# Training set: every row whose index label is not in the test set
train_df = data_df.loc[data_df.index.difference(test_df.index)]
train_df.species.value_counts()
Out[10]:
In [11]:
import tensorflow as tf
In [12]:
# Helper function to build a feed dict from a DataFrame:
# features are all columns except 'species'; labels are the category codes,
# reshaped to a column vector to match the placeholder shape (None, 1).
def make_feed_dict(X, y, df):
    features = df.drop('species', axis=1).values
    labels = df.species.cat.codes
    labels_2d = np.atleast_2d(labels).T
    return {X: features, y: labels_2d}
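A quick shape check of what this helper will feed in (a small sketch; with 150 Iris rows and 30 held out for testing, the training features should come out as 120 x 4 and the labels as 120 x 1):
In [ ]:
features = train_df.drop('species', axis=1).values
labels_2d = np.atleast_2d(train_df.species.cat.codes).T
features.shape, labels_2d.shape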
In [29]:
tf.reset_default_graph()

# Shape of X: n_training_rows * 4 feature cols
# Shape of y: n_training_rows * 1 label col
with tf.variable_scope('input'):
    X = tf.placeholder(name='X', shape=(None, 4), dtype=np.float64)
    y = tf.placeholder(name='y', shape=(None, 1), dtype=np.int64)

# Shape of w: 4 rows (n_cols from the previous layer) * 3 cols (n_classes)
with tf.variable_scope('hidden'):
    w = tf.get_variable(name='w',
                        shape=(4, 3),
                        initializer=tf.truncated_normal_initializer(),
                        dtype=np.float64)
    # A single scalar bias, broadcast across the 3 output nodes
    b = tf.get_variable(name='b',
                        shape=1,
                        initializer=tf.constant_initializer(1.0),
                        dtype=np.float64)
    hidden = tf.add(tf.matmul(X, w), b, name='hidden')

# The softmax layer calculates the cross-entropy error
# between the softmaxed output of the hidden layer
# and the one-hot encoded training labels
with tf.variable_scope('cross_entropy'):
    # Squeeze y from shape (None, 1) to (None,) so one_hot produces (None, 3),
    # matching the shape of the logits
    one_hot = tf.one_hot(indices=tf.squeeze(y, axis=1), depth=3, name='one_hot')
    x_ent = tf.nn.softmax_cross_entropy_with_logits(
        labels=one_hot,
        logits=hidden,
        name='cross_entropy_error')
    loss_function = tf.reduce_mean(x_ent, name='loss')

with tf.variable_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.02)
    train_op = optimizer.minimize(loss_function)
n_iter = 1000
init_op = tf.global_variables_initializer()
loss_values = np.zeros(n_iter)
from IPython.display import display
from ipywidgets import FloatProgress
progress = FloatProgress(min=0, max=n_iter, description='Running training loop..')
display(progress)
with tf.Session() as sess:
    sess.run(init_op)
    train_dict = make_feed_dict(X, y, train_df)
    for i in range(1, n_iter+1):
        _, current_loss = sess.run([train_op, loss_function], feed_dict=train_dict)
        loss_values[i-1] = current_loss
        progress.value += 1
    progress.bar_style = 'success'  # ipywidgets expects a lowercase bar_style
    progress.description = 'Training complete.'

    # Evaluate with test data
    test_dict = make_feed_dict(X, y, test_df)
    test_preds = tf.argmax(tf.nn.softmax(hidden), axis=1, name='test_preds')
    test_results = sess.run(test_preds, feed_dict=test_dict)
In [30]:
loss_values = pd.Series(loss_values, name='cross_entropy_loss')
ax = loss_values.rolling(window=20).mean().plot()
ax.set(xlabel='Number of iterations', ylabel='Smoothed cross entropy loss')
Out[30]:
In [31]:
# Predicted label codes for the Iris species in the test set
test_results
Out[31]:
In [39]:
# Map the predicted label codes back to species names
predicted_labels = data_df.species.cat.categories[test_results]
actual_labels = test_df.species
labels = pd.unique(data_df.species)
from sklearn.metrics import confusion_matrix
# confusion_matrix expects y_true first, then y_pred:
# rows of cm are the actual species, columns are the predicted species
cm = confusion_matrix(actual_labels.values, predicted_labels, labels=labels)
cm = pd.DataFrame(cm, index=labels, columns=labels)
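As a quick sanity check to go with the heatmap below, overall test accuracy can be read off the confusion matrix as the trace divided by the total count (a small sketch using the cm DataFrame built above; the exact value depends on the run):
In [ ]:
# Correct predictions sit on the diagonal of the confusion matrix
np.trace(cm.values) / cm.values.sum()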
In [40]:
sns.heatmap(cm, annot=True)
Out[40]:
Let's say that at any given time we have a batch of 5 training examples. Each example has 4 features, and there are 3 classes (the same setup as the Iris problem above).
The output layer of 3 nodes will then produce a $5 \times 3$ matrix.
In [63]:
out = np.random.rand(5, 3)
out
Out[63]:
Each row is then softmaxed into a probability distribution:
In [64]:
def naive_softmax(row):
    return np.exp(row) / np.sum(np.exp(row))
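(As an aside, this naive version can overflow for large logits; a common numerically stable variant, sketched below, subtracts the row maximum first, which leaves the result unchanged.)
In [ ]:
def stable_softmax(row):
    # Shifting by the row max does not change the softmax, but keeps np.exp from overflowing
    shifted = row - np.max(row)
    return np.exp(shifted) / np.sum(np.exp(shifted))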
In [65]:
q = np.apply_along_axis(naive_softmax, 1, out)
In [66]:
q
Out[66]:
This is the model's predicted distribution (one row per example).
In [67]:
np.sum(q, axis=1)
Out[67]:
Now, we also have 5 one-hot encoded training labels, corresponding to the 3 classes. This is the true distribution.
In [68]:
p = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])
The cross-entropy error for example 1 is
$-\sum_i p_i \ln(q_i)$
In [73]:
-np.sum(p[0, :] * np.log(q[0, :]))
Out[73]:
Our goal is to build a model whose predicted distribution is as close to the true distribution as possible. In that case, the value of q for the 1st example would be something like:
In [75]:
ideal_q1 = np.array([1 - 2E-5, 1E-5, 1E-5])
And the corresponding cross-entropy error is:
In [76]:
-np.sum(p[0, :] * np.log(ideal_q1))
Out[76]:
In other words, minimizing the cross-entropy error leads to a model that closely approximates the true distribution.
Since we have many training examples, we calculate the cross-entropy error of each example and minimize its mean over the batch, just as we minimize the mean squared error in regression problems.
In [82]:
# Cross-entropy error per example, then averaged over the batch
xent_per_example = -np.sum(p * np.log(q), axis=1)
np.mean(xent_per_example)
Out[82]:
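To close the loop, the same number should come out of the TensorFlow op used in the training graph above, since tf.nn.softmax_cross_entropy_with_logits computes exactly $-\sum_i p_i \ln(q_i)$ on softmaxed logits. A cross-check (a sketch assuming the same TF 1.x session API used above, feeding out as the logits and p as the labels):
In [ ]:
with tf.Session() as sess:
    tf_xent = tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.constant(p, dtype=tf.float64),
        logits=tf.constant(out, dtype=tf.float64))
    # Should match the numpy mean computed above
    print(sess.run(tf.reduce_mean(tf_xent)))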