T81-558: Applications of Deep Neural Networks

Class 4: Classification and Regression

Instructor: Jeff Heaton, School of Engineering and Applied Science, Washington University in St. Louis
For more information visit the class website.

Binary Classification, Classification and Regression

Binary Classification - Classification between two possibilities (positive and negative). Common in medical testing, does the person have the disease (positive) or not (negative).
Classification - Classification between more than 2. The iris dataset (3-way classification).
Regression - Numeric prediction. How many MPG does a car get?

In this class session we will look at some visualizations for all three.

Feature Vector Encoding

These are exactly the same feature vector encoding functions from Class 3. They must be defined for this class as well. For more information, refer to class 3.



In [5]:

    
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df,name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name,x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)

# Encode text values to indexes(i.e. [1],[2],[3] for red,green,blue).
def encode_text_index(df,name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_

# Encode a numeric column as zscores
def encode_numeric_zscore(df,name,mean=None,sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name]-mean)/sd

# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)

# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df,target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)

    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(target_type, '__iter__') else target_type
    
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.int32)
    else:
        # Regression
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.float32)
    
# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

Toolkit: Visualization Functions

This class will introduce 3 different visualizations that can be used with the two different classification type neural networks and regression neural networks.

Confusion Matrix - For any type of classification neural network.
ROC Curve - For binary classification.
Lift Curve - For regression neural networks.

The code used to produce these visualizations is shown here:



In [6]:

    
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Plot a confusion matrix.
# cm is the confusion matrix, names are the names of the classes.
def plot_confusion_matrix(cm, names, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(names))
    plt.xticks(tick_marks, names, rotation=45)
    plt.yticks(tick_marks, names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    

# Plot an ROC. pred - the predictions, y - the expected output.
def plot_roc(pred,y):
    fpr, tpr, _ = roc_curve(y_test, pred)
    roc_auc = auc(fpr, tpr)

    plt.figure()
    plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC)')
    plt.legend(loc="lower right")
    plt.show()
    
# Plot a lift curve.  pred - the predictions, y - the expected output.
def chart_regression(pred,y):
    t = pd.DataFrame({'pred' : pred.flatten(), 'y' : y_test.flatten()})
    t.sort_values(by=['y'],inplace=True)

    a = plt.plot(t['y'].tolist(),label='expected')
    b = plt.plot(t['pred'].tolist(),label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

Binary Classification

Binary classification is used to create a model that classifies between only two classes. These two classes are often called "positive" and "negative". Consider the following program that uses the wcbreast_wdbc dataset to classify if a breast tumor is cancerous (malignant) or not (benign). The iris dataset is not binary, because there are three classes (3 types of iris).



In [7]:

    
import os
import pandas as pd
from sklearn.cross_validation import train_test_split
import tensorflow.contrib.learn as skflow
import numpy as np
from sklearn import metrics

path = "./data/"
    
filename = os.path.join(path,"wcbreast_wdbc.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

# Encode feature vector
df.drop('id',axis=1,inplace=True)
encode_numeric_zscore(df,'mean_radius')
encode_text_index(df,'mean_texture')   
encode_text_index(df,'mean_perimeter')
encode_text_index(df,'mean_area')
encode_text_index(df,'mean_smoothness')
encode_text_index(df,'mean_compactness')
encode_text_index(df,'mean_concavity')
encode_text_index(df,'mean_concave_points')
encode_text_index(df,'mean_symmetry')
encode_text_index(df,'mean_fractal_dimension')
encode_text_index(df,'se_radius')
encode_text_index(df,'se_texture')
encode_text_index(df,'se_perimeter')
encode_text_index(df,'se_area')
encode_text_index(df,'se_smoothness')
encode_text_index(df,'se_compactness')
encode_text_index(df,'se_concavity')
encode_text_index(df,'se_concave_points')
encode_text_index(df,'se_symmetry')
encode_text_index(df,'se_fractal_dimension')
encode_text_index(df,'worst_radius')
encode_text_index(df,'worst_texture')
encode_text_index(df,'worst_perimeter')
encode_text_index(df,'worst_area')
encode_text_index(df,'worst_smoothness')
encode_text_index(df,'worst_compactness')
encode_text_index(df,'worst_concavity')
encode_text_index(df,'worst_concave_points')
encode_text_index(df,'worst_symmetry')
encode_text_index(df,'worst_fractal_dimension')
diagnosis = encode_text_index(df,'diagnosis')
num_classes = len(diagnosis)

# Create x & y for training

# Create the x-side (feature vectors) of the training
x, y = to_xy(df,'diagnosis')
    
# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42) 
    
# Create a deep neural network with 3 hidden layers of 10, 20, 10
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=num_classes,
    steps=10000)

# Early stopping
early_stop = skflow.monitors.ValidationMonitor(x_test, y_test,
    early_stopping_rounds=200, print_steps=50, n_classes=num_classes)
    
# Fit/train neural network
classifier.fit(x_train, y_train, early_stop)

# Measure accuracy
score = metrics.accuracy_score(y, classifier.predict(x))
print("Final accuracy: {}".format(score))









    



/usr/local/lib/python3.4/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:281: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  out.itemset((i, self.y[sample]), 1.0)






    



Step #50, epoch #3, avg. train loss: 2.53698, avg. val loss: 2.35609
Step #100, epoch #7, avg. train loss: 0.55002, avg. val loss: 0.54495
Step #150, epoch #10, avg. train loss: 0.48565, avg. val loss: 0.49846
Step #200, epoch #14, avg. train loss: 0.45980, avg. val loss: 0.49158
Step #250, epoch #17, avg. train loss: 0.43307, avg. val loss: 0.45604
Step #300, epoch #21, avg. train loss: 0.40331, avg. val loss: 0.44432
Step #350, epoch #25, avg. train loss: 0.36997, avg. val loss: 0.44423
Step #400, epoch #28, avg. train loss: 0.36701, avg. val loss: 0.44364
Step #450, epoch #32, avg. train loss: 0.34160, avg. val loss: 0.44327
Step #500, epoch #35, avg. train loss: 0.35113, avg. val loss: 0.44534
Step #550, epoch #39, avg. train loss: 0.32387, avg. val loss: 0.43548
Step #600, epoch #42, avg. train loss: 0.33891, avg. val loss: 0.42804
Final accuracy: 0.8471001757469244






    



Stopping. Best step:
 step 429 with loss 0.3980848491191864

Confusion Matrix

The confusion matrix is a common visualization for both binary and larger classification problems. Often a model will have difficulty differentiating between two classes. For example, a neural network might be really good at telling the difference between cats and dogs, but not so good at telling the difference between dogs and wolves. The following code generates a confusion matrix:



In [8]:

    
import numpy as np

from sklearn import svm, datasets
from sklearn.cross_validation import train_test_split
from sklearn.metrics import confusion_matrix

pred = classifier.predict(x_test)
    
# Compute confusion matrix
cm = confusion_matrix(y_test, pred)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
print(cm)
plt.figure()
plot_confusion_matrix(cm, diagnosis)

# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized confusion matrix')
print(cm_normalized)
plt.figure()
plot_confusion_matrix(cm_normalized, diagnosis, title='Normalized confusion matrix')

plt.show()









    



Confusion matrix, without normalization
[[67 22]
 [ 3 51]]
Normalized confusion matrix
[[ 0.75  0.25]
 [ 0.06  0.94]]

The above two confusion matrixes show the same network. The bottom (normalized) is the type you will normally see. Notice the two labels. The label "B" means benign (no cancer) and the label "M" means malignant (cancer). The left-right (x) axis are the predictions, the top-bottom) are the expected outcomes. A perfect model (that never makes an error) has a dark blue diagonal that runs from top-left to bottom-right.

To read, consider the top-left square. This square indicates "true labeled" of B and also "predicted label" of B. This is good! The prediction matched the truth. The blueness of this box represents how often "B" is classified correct. It is not darkest blue. This is because the square to the right(which is off the perfect diagonal) has some color. This square indicates truth of "B" but prediction of "M". The white square, at the bottom-left, indicates a true of "M" but predicted of "B". The whiteness indicates this rarely happens.

Your conclusion from the above chart is that the model sometimes classifies "B" as "M" (a false negative), but never mis-classifis "M" as "B". Always look for the dark diagonal, this is good!

ROC Curves

ROC curves can be a bit confusing. However, they are very common. It is important to know how to read them. Even their name is confusing. Do not worry about their name, it comes from electrical engineering (EE).

Binary classification is common in medical testing. Often you want to diagnose if someone has a disease. This can lead to two types of errors, know as false positives and false negatives:

False Positive - Your test (neural network) indicated that the patient had the disease; however, the patient did not have the disease.
False Negative - Your test (neural network) indicated that the patient did not have the disease; however, the patient did have the disease.
True Positive - Your test (neural network) correctly identified that the patient had the disease.
True Negative - Your test (neural network) correctly identified that the patient did not have the disease.

Types of errors:

Neural networks classify in terms of probbility of it being positive. However, at what probability do you give a positive result? Is the cutoff 50%? 90%? Where you set this cutoff is called the threshold. Anything above the cutoff is positive, anything below is negative. Setting this cutoff allows the model to be more sensative or specific:

The following shows a more sensitive cutoff:

An ROC curve measures how good a model is regardless of the cutoff. The following shows how to read a ROC chart:

The following code shows an ROC chart for the breast cancer neural network. The area under the curve (AUC) is also an important measure. The larger the AUC, the better.



In [10]:

    
pred = classifier.predict_proba(x_test)
pred = pred[:,1] # Only positive cases
# print(pred[:,1])
plot_roc(pred,y_test)

Classification

We've already seen multi-class classification, with the iris dataset. Confusion matrixes work just fine with 3 classes. The following code generates a confusion matrix for iris.



In [11]:

    
import os
import pandas as pd
from sklearn.cross_validation import train_test_split
import tensorflow.contrib.learn as skflow
import numpy as np

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

# Encode feature vector
encode_numeric_zscore(df,'petal_w')
encode_numeric_zscore(df,'petal_l')
encode_numeric_zscore(df,'sepal_w')
encode_numeric_zscore(df,'sepal_l')
species = encode_text_index(df,"species")
num_classes = len(species)

# Create x & y for training

# Create the x-side (feature vectors) of the training
x, y = to_xy(df,'species')
    
# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=45) 
    # as much as I would like to use 42, it gives a perfect result, and a boring confusion matrix!
    
# Create a deep neural network with 3 hidden layers of 10, 20, 10
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=num_classes,
    steps=10000)

# Early stopping
early_stop = skflow.monitors.ValidationMonitor(x_test, y_test,
    early_stopping_rounds=200, print_steps=50, n_classes=num_classes)
    
# Fit/train neural network
classifier.fit(x_train, y_train, early_stop)









    



/usr/local/lib/python3.4/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:281: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  out.itemset((i, self.y[sample]), 1.0)






    



Step #50, epoch #12, avg. train loss: 0.42043, avg. val loss: 0.43100
Step #100, epoch #25, avg. train loss: 0.09383, avg. val loss: 0.16211
Step #150, epoch #37, avg. train loss: 0.05107, avg. val loss: 0.14559
Step #200, epoch #50, avg. train loss: 0.04197, avg. val loss: 0.16927
Step #250, epoch #62, avg. train loss: 0.02741, avg. val loss: 0.15594
Step #300, epoch #75, avg. train loss: 0.02923, avg. val loss: 0.15539
Step #350, epoch #87, avg. train loss: 0.02136, avg. val loss: 0.16504
Step #400, epoch #100, avg. train loss: 0.02007, avg. val loss: 0.15931
Step #450, epoch #112, avg. train loss: 0.02117, avg. val loss: 0.16234
Step #500, epoch #125, avg. train loss: 0.02170, avg. val loss: 0.15867
Step #550, epoch #137, avg. train loss: 0.01714, avg. val loss: 0.15081
Step #600, epoch #150, avg. train loss: 0.01578, avg. val loss: 0.15566
Step #650, epoch #162, avg. train loss: 0.01815, avg. val loss: 0.16091






    



Stopping. Best step:
 step 464 with loss 0.030616959556937218






    Out[11]:





TensorFlowDNNClassifier(batch_size=32, class_weight=None, clip_gradients=5.0,
            config=None, continue_training=False, dropout=None,
            hidden_units=[10, 20, 10], learning_rate=0.1, n_classes=3,
            optimizer='Adagrad', steps=10000, verbose=1)



In [12]:

    
import numpy as np

from sklearn import svm, datasets
from sklearn.cross_validation import train_test_split
from sklearn.metrics import confusion_matrix



pred = classifier.predict(x_test)
    
# Compute confusion matrix
cm = confusion_matrix(y_test, pred)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
print(cm)
plt.figure()
plot_confusion_matrix(cm, species)

# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized confusion matrix')
print(cm_normalized)
plt.figure()
plot_confusion_matrix(cm_normalized, species, title='Normalized confusion matrix')

plt.show()









    



Confusion matrix, without normalization
[[14  0  0]
 [ 0  9  0]
 [ 0  3 12]]
Normalized confusion matrix
[[ 1.   0.   0. ]
 [ 0.   1.   0. ]
 [ 0.   0.2  0.8]]

See the strong diagonal? Iris is easy. See the light blue near the bottom? Sometimes virginica is confused for versicolor.

Regression

We've already seen regression with the MPG dataset. Regression uses its own set of visualizations, one of the most common is the lift chart. The following code generates a lift chart.



In [13]:

    
import tensorflow.contrib.learn as skflow
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# create feature vector
missing_median(df, 'horsepower')
df.drop('name',1,inplace=True)
encode_numeric_zscore(df, 'horsepower')
encode_numeric_zscore(df, 'weight')
encode_numeric_zscore(df, 'cylinders')
encode_numeric_zscore(df, 'displacement')
encode_numeric_zscore(df, 'acceleration')
encode_text_dummy(df, 'origin')

# Encode to a 2D matrix for training
x,y = to_xy(df,['mpg'])

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)

# Create a deep neural network with 3 hidden layers of 50, 25, 10
regressor = skflow.TensorFlowDNNRegressor(hidden_units=[50, 25, 10], steps=5000)

# Early stopping
early_stop = skflow.monitors.ValidationMonitor(x_test, y_test,
    early_stopping_rounds=200, print_steps=50)

# Fit/train neural network
regressor.fit(x_train, y_train, early_stop)









    



Step #50, epoch #5, avg. train loss: 40.25010, avg. val loss: 35.26281
Step #100, epoch #10, avg. train loss: 6.67342, avg. val loss: 6.36329
Step #150, epoch #15, avg. train loss: 3.44723, avg. val loss: 3.21085
Step #200, epoch #20, avg. train loss: 1.99451, avg. val loss: 1.88279
Step #250, epoch #25, avg. train loss: 1.46245, avg. val loss: 1.39983
Step #300, epoch #30, avg. train loss: 1.22336, avg. val loss: 1.14907






    



Stopping. Best step:
 step 109 with loss 0.603066623210907






    Out[13]:





TensorFlowDNNRegressor(batch_size=32, clip_gradients=5.0, config=None,
            continue_training=False, dropout=None,
            hidden_units=[50, 25, 10], learning_rate=0.1, n_classes=0,
            optimizer='Adagrad', steps=5000, verbose=1)



In [14]:

    
pred = regressor.predict(x_test)

chart_regression(pred,y_test)

To generate a lift chart, perform the following activities:

Sort the data by expected output. Plot the blue line above.
For every point on the x-axis plot the predicted value for that same data point. This is the green line above.
The x-axis is just 0 to 100% of the dataset. The expected always starts low and ends high.
The y-axis is ranged according to the values predicted.

Reading a lift chart:

The expected and predict lines should be close. Notice where one is above the ot other.
The above chart is the most accurate on lower MPG.



In [ ]: