Classifying 1984 US House of Representatives Voting Records by Party Affiliation

Information

Downloaded from the UCI Machine Learning Repository on 13 November 2016. The dataset description is as follows:

  • Data Set: Multivariate
  • Attribute: Categorical
  • Tasks: Classification
  • Instances: 435
  • Attributes: 16
  • Missing Values: Yes
  • Area: Social
  • Date Donated: 1987-04-27

Data Set Information:

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the Congressional Quarterly Almanac (CQA). The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); and voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
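The nine-to-three collapse described above can be written down as a simple lookup table. This is a hypothetical helper for illustration only; the downloaded data file already contains the collapsed y/n/? codes.

```python
# Hypothetical helper: the nine CQA dispositions collapse to the three
# codes (y / n / ?) used in the downloaded data file.
CQA_TO_CODE = {
    'voted for': 'y',
    'paired for': 'y',
    'announced for': 'y',
    'voted against': 'n',
    'paired against': 'n',
    'announced against': 'n',
    'voted present': '?',
    'voted present to avoid conflict of interest': '?',
    'did not vote or otherwise make a position known': '?',
}

def collapse(disposition):
    """Map a raw CQA vote disposition to the dataset's y/n/? encoding."""
    return CQA_TO_CODE[disposition]
```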

Attribute Information:

  1. Class Name: 2 (democrat, republican)
  2. handicapped-infants: 2 (y,n)
  3. water-project-cost-sharing: 2 (y,n)
  4. adoption-of-the-budget-resolution: 2 (y,n)
  5. physician-fee-freeze: 2 (y,n)
  6. el-salvador-aid: 2 (y,n)
  7. religious-groups-in-schools: 2 (y,n)
  8. anti-satellite-test-ban: 2 (y,n)
  9. aid-to-nicaraguan-contras: 2 (y,n)
  10. mx-missile: 2 (y,n)
  11. immigration: 2 (y,n)
  12. synfuels-corporation-cutback: 2 (y,n)
  13. education-spending: 2 (y,n)
  14. superfund-right-to-sue: 2 (y,n)
  15. crime: 2 (y,n)
  16. duty-free-exports: 2 (y,n)
  17. export-administration-act-south-africa: 2 (y,n)

Relevant Papers:

Schlimmer, J. C. (1987). Concept acquisition through representational adjustment. Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine, CA.

Python Package(s) Used


In [1]:
import dill
import json
import numpy as np
import os
import pandas as pd
import requests
import time

In [2]:
import matplotlib.pyplot as plt
# Note: pandas.tools.plotting moved to pandas.plotting in pandas >= 0.20
from pandas.tools.plotting import parallel_coordinates, radviz
import seaborn as sns

In [3]:
# Note: in scikit-learn >= 0.20, cross_validation and grid_search
# were consolidated into sklearn.model_selection
from sklearn.cross_validation import train_test_split
from sklearn.feature_selection import RFECV
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

In [4]:
%matplotlib inline

Data Fetching


In [5]:
# Importing data from web
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/voting-records/house-votes-84.data"

def fetch_data(fname='house-votes-84.data'):
    """
    Helper method to retrieve the ML Repository dataset,
    skipping the download if the file already exists on disk.
    """
    outpath = os.path.abspath(fname)
    if not os.path.exists(outpath):
        response = requests.get(URL)
        with open(outpath, 'wb') as f:
            f.write(response.content)

    return outpath

# Fetch the data if required
DATA = fetch_data()

In [6]:
FEATURES  = [
    "class_name",
    "handicapped_infants",
    "water_project_cost_sharing",
    "adoption_of_the_budget_resolution",
    "physician_fee_freeze",
    "el_salvador_aid",
    "religious_groups_in_schools",
    "anti_satellite_test_ban",
    "aid_to_nicaraguan_contras",
    "mx_missile",
    "immigration",
    "synfuels_corporation_cutback",
    "education_spending",
    "superfund_right_to_sue",
    "crime",
    "duty_free_exports",
    "export_administration_act_south_africa"
]

In [7]:
# Read the data into a DataFrame
df = pd.read_csv(DATA, sep=',', header=None, names=FEATURES)

Data Exploration


In [8]:
df.head()


Out[8]:
class_name handicapped_infants water_project_cost_sharing adoption_of_the_budget_resolution physician_fee_freeze el_salvador_aid religious_groups_in_schools anti_satellite_test_ban aid_to_nicaraguan_contras mx_missile immigration synfuels_corporation_cutback education_spending superfund_right_to_sue crime duty_free_exports export_administration_act_south_africa
0 republican n y n y y y n n n y ? y y y n y
1 republican n y n y y y n n n n n y y y n ?
2 democrat ? y y ? y y n n n n y n y y n n
3 democrat n y y n ? y n n n n y n y n n y
4 democrat y y y n y y n n n n y ? y y y y

In [9]:
# Describe the dataset
print(df.describe())


       class_name handicapped_infants water_project_cost_sharing  \
count         435                 435                        435   
unique          2                   3                          3   
top      democrat                   n                          y   
freq          267                 236                        195   

       adoption_of_the_budget_resolution physician_fee_freeze el_salvador_aid  \
count                                435                  435             435   
unique                                 3                    3               3   
top                                    y                    n               y   
freq                                 253                  247             212   

       religious_groups_in_schools anti_satellite_test_ban  \
count                          435                     435   
unique                           3                       3   
top                              y                       y   
freq                           272                     239   

       aid_to_nicaraguan_contras mx_missile immigration  \
count                        435        435         435   
unique                         3          3           3   
top                            y          y           y   
freq                         242        207         216   

       synfuels_corporation_cutback education_spending superfund_right_to_sue  \
count                           435                435                    435   
unique                            3                  3                      3   
top                               n                  n                      y   
freq                            264                233                    209   

       crime duty_free_exports export_administration_act_south_africa  
count    435               435                                    435  
unique     3                 3                                      3  
top        y                 n                                      y  
freq     248               233                                    269  

In [10]:
# Unique value counts for each column
for i in df.columns:
    print(df[i].value_counts())


democrat      267
republican    168
Name: class_name, dtype: int64
n    236
y    187
?     12
Name: handicapped_infants, dtype: int64
y    195
n    192
?     48
Name: water_project_cost_sharing, dtype: int64
y    253
n    171
?     11
Name: adoption_of_the_budget_resolution, dtype: int64
n    247
y    177
?     11
Name: physician_fee_freeze, dtype: int64
y    212
n    208
?     15
Name: el_salvador_aid, dtype: int64
y    272
n    152
?     11
Name: religious_groups_in_schools, dtype: int64
y    239
n    182
?     14
Name: anti_satellite_test_ban, dtype: int64
y    242
n    178
?     15
Name: aid_to_nicaraguan_contras, dtype: int64
y    207
n    206
?     22
Name: mx_missile, dtype: int64
y    216
n    212
?      7
Name: immigration, dtype: int64
n    264
y    150
?     21
Name: synfuels_corporation_cutback, dtype: int64
n    233
y    171
?     31
Name: education_spending, dtype: int64
y    209
n    201
?     25
Name: superfund_right_to_sue, dtype: int64
y    248
n    170
?     17
Name: crime, dtype: int64
n    233
y    174
?     28
Name: duty_free_exports, dtype: int64
y    269
?    104
n     62
Name: export_administration_act_south_africa, dtype: int64

In [11]:
# Dataset information
print(df.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 435 entries, 0 to 434
Data columns (total 17 columns):
class_name                                435 non-null object
handicapped_infants                       435 non-null object
water_project_cost_sharing                435 non-null object
adoption_of_the_budget_resolution         435 non-null object
physician_fee_freeze                      435 non-null object
el_salvador_aid                           435 non-null object
religious_groups_in_schools               435 non-null object
anti_satellite_test_ban                   435 non-null object
aid_to_nicaraguan_contras                 435 non-null object
mx_missile                                435 non-null object
immigration                               435 non-null object
synfuels_corporation_cutback              435 non-null object
education_spending                        435 non-null object
superfund_right_to_sue                    435 non-null object
crime                                     435 non-null object
duty_free_exports                         435 non-null object
export_administration_act_south_africa    435 non-null object
dtypes: object(17)
memory usage: 57.9+ KB
None

In [12]:
# Check for missing values. pandas reports none because missing votes
# are encoded as the string '?' rather than NaN.
print(df.isnull().sum())


class_name                                0
handicapped_infants                       0
water_project_cost_sharing                0
adoption_of_the_budget_resolution         0
physician_fee_freeze                      0
el_salvador_aid                           0
religious_groups_in_schools               0
anti_satellite_test_ban                   0
aid_to_nicaraguan_contras                 0
mx_missile                                0
immigration                               0
synfuels_corporation_cutback              0
education_spending                        0
superfund_right_to_sue                    0
crime                                     0
duty_free_exports                         0
export_administration_act_south_africa    0
dtype: int64
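pandas reports zero nulls above because the repository encodes missing votes as the literal string '?', not NaN. Those can be counted directly with an element-wise comparison; shown here on a small stand-in frame rather than the downloaded data:

```python
import pandas as pd

# Small stand-in for the voting DataFrame; missing votes appear as '?'.
toy = pd.DataFrame({'crime': ['y', '?', 'n'],
                    'immigration': ['?', '?', 'y']})

# Element-wise comparison, then a column-wise sum of the True values.
missing_per_column = (toy == '?').sum()
print(missing_per_column)
```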

In [13]:
df_2 = df.copy()

In [14]:
# Label encoding: party -> 0/1, votes n/y/? -> 0/1/2
df_2['class_name'] = df_2['class_name'].map({'democrat':0,'republican':1})

for i in df_2.columns[1:]:
    df_2[i] = df_2[i].map({'n': 0,'y': 1,'?':2})

In [15]:
df_2.head()


Out[15]:
class_name handicapped_infants water_project_cost_sharing adoption_of_the_budget_resolution physician_fee_freeze el_salvador_aid religious_groups_in_schools anti_satellite_test_ban aid_to_nicaraguan_contras mx_missile immigration synfuels_corporation_cutback education_spending superfund_right_to_sue crime duty_free_exports export_administration_act_south_africa
0 1 0 1 0 1 1 1 0 0 0 1 2 1 1 1 0 1
1 1 0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 2
2 0 2 1 1 2 1 1 0 0 0 0 1 0 1 1 0 0
3 0 0 1 1 0 2 1 0 0 0 0 1 0 1 0 0 1
4 0 1 1 1 0 1 1 0 0 0 0 1 2 1 1 1 1
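The explicit `.map()` above keeps control over which integer each value receives. The `LabelEncoder` imported earlier would produce equivalent codes, but it orders classes alphabetically, so '?' becomes 0 rather than 2. A sketch on toy values:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['n', 'y', '?', 'y'])

# Classes are sorted alphabetically: '?' -> 0, 'n' -> 1, 'y' -> 2
print(list(le.classes_))  # ['?', 'n', 'y']
print(list(codes))        # [1, 2, 0, 2]
```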

In [16]:
# First pass: frequency counts for each column.

for i in df_2.columns:
    print(i)

    plt.figure(1, figsize=(5, 5), dpi=80)
    # Histogram plot
    plt.subplot(111)
    plt.title("Histogram")
    plt.hist(df_2[i])

    plt.tight_layout()
    plt.show()


class_name
handicapped_infants
water_project_cost_sharing
adoption_of_the_budget_resolution
physician_fee_freeze
el_salvador_aid
religious_groups_in_schools
anti_satellite_test_ban
aid_to_nicaraguan_contras
mx_missile
immigration
synfuels_corporation_cutback
education_spending
superfund_right_to_sue
crime
duty_free_exports
export_administration_act_south_africa

In [17]:
# Pairplot
sns.pairplot(df_2)


Out[17]:
<seaborn.axisgrid.PairGrid at 0x7f93e9ed3940>

In [18]:
# Correlation heatmap
sns.heatmap(df_2.corr())


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f93e55c2278>

In [19]:
# Parallel coordinates plot
plt.figure(figsize=(12,12))
plt.xticks(rotation='vertical')
parallel_coordinates(df_2, 'class_name')
plt.show()



In [20]:
# Radial plot
plt.figure(figsize=(12,12))
radviz(df_2, 'class_name')
plt.show()


Data Extraction

The Bunch-based extraction step (packaging data, target, and feature names into a single object, as scikit-learn's bundled datasets do) is kept aside for later use.
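For later reference, a scikit-learn Bunch is essentially a dict that also allows attribute access. A minimal stand-in (hypothetical, not the library class) shows the idea:

```python
class Bunch(dict):
    """Dict subclass with attribute-style access, mirroring scikit-learn's Bunch."""
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as e:
            raise AttributeError(key) from e

# Package data, target, and feature names into one object.
dataset = Bunch(data=[[0, 1, 0]], target=[1],
                feature_names=['crime', 'immigration', 'mx_missile'])
print(dataset.target == dataset['target'])  # attribute and key access agree
```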

Logistic Regression Classification


In [21]:
df_3 = df_2.copy()
# Drop target column for test-train-split
df_3 = df_3.drop('class_name', axis=1)

In [22]:
# Train-test split, 80/20. Learning curves were not examined.
X_train, X_test, y_train, y_test = train_test_split(df_3, df_2['class_name'], train_size=0.8,
                                                    random_state=1)

In [23]:
# Data not scaled: the values are categorical codes in the 0-2 range.
clf = LogisticRegression()

In [24]:
# Initialize RFECV for feature selection
rfecv = RFECV(estimator=clf, step=1, cv=12, scoring='accuracy')
rfecv.fit(X_train, y_train)

print("Optimal number of features : %d" % rfecv.n_features_)


Optimal number of features : 5
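The selected columns can also be read straight off the fitted selector's boolean `support_` mask rather than retyped by name later. A self-contained sketch on toy data (the column names here are made up):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Toy frame: 'signal' fully determines y; the other columns are noise.
rng = np.random.RandomState(0)
X = pd.DataFrame({'signal':  rng.randint(0, 2, 200),
                  'noise_a': rng.randint(0, 2, 200),
                  'noise_b': rng.randint(0, 3, 200)})
y = X['signal'].values

selector = RFECV(estimator=LogisticRegression(), step=1, cv=5)
selector.fit(X, y)

# support_ is a boolean mask over columns; use it to subselect directly.
kept = X.columns[selector.support_]
X_selected = X[kept]
print(list(kept))
```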

In [25]:
# Plot number of features vs. cross-validation score
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()



In [26]:
# Table of features sorted by RFECV ranking (1 = selected; lower is better).
# The rank-1 features become the inputs for the model.
print("Features sorted: ")
rfecv_ranking_df = pd.DataFrame({'feature':X_train.columns,
                                     'importance':rfecv.ranking_})
rfecv_ranking_df_sorted = rfecv_ranking_df.sort_values(by = 'importance'
                                        , ascending = True)
rfecv_ranking_df_sorted


Features sorted: 
Out[26]:
feature importance
2 adoption_of_the_budget_resolution 1
3 physician_fee_freeze 1
9 immigration 1
10 synfuels_corporation_cutback 1
11 education_spending 1
8 mx_missile 2
4 el_salvador_aid 3
13 crime 4
1 water_project_cost_sharing 5
14 duty_free_exports 6
0 handicapped_infants 7
7 aid_to_nicaraguan_contras 8
15 export_administration_act_south_africa 9
6 anti_satellite_test_ban 10
5 religious_groups_in_schools 11
12 superfund_right_to_sue 12

In [27]:
# Subselect the five rank-1 features from RFECV, then redo the
# train-test split on the reduced frame for GridSearchCV.
df_4 = df_3[['adoption_of_the_budget_resolution','physician_fee_freeze','immigration',
             'synfuels_corporation_cutback','education_spending']]

# Train-test split, 80/20, same random_state as before.
X_train, X_test, y_train, y_test = train_test_split(df_4, df_2['class_name'], train_size=0.8,
                                                   random_state=1)

In [28]:
# GridSearch for optimum parameters.
param_grid_pipeline = {'C':[0.0001,0.001,0.01,0.1,1.0,10,100], 
              'fit_intercept':[True,False],
              'class_weight':['balanced',None],
              'solver':['liblinear','newton-cg','lbfgs','sag']}

In [29]:
grid = GridSearchCV(clf, param_grid_pipeline, cv = 12, n_jobs = -1, verbose=1, scoring = 'accuracy')

In [30]:
grid.fit(X_train, y_train)


Fitting 12 folds for each of 112 candidates, totalling 1344 fits
[Parallel(n_jobs=-1)]: Done 1344 out of 1344 | elapsed:    3.1s finished
Out[30]:
GridSearchCV(cv=12, error_score='raise',
       estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'class_weight': ['balanced', None], 'fit_intercept': [True, False], 'solver': ['liblinear', 'newton-cg', 'lbfgs', 'sag'], 'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100]},
       pre_dispatch='2*n_jobs', refit=True, scoring='accuracy', verbose=1)

In [31]:
grid.best_score_


Out[31]:
0.95402298850574707

In [32]:
grid.best_estimator_.get_params()


Out[32]:
{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'max_iter': 100,
 'multi_class': 'ovr',
 'n_jobs': 1,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'newton-cg',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

In [33]:
# Save model to disk
dill.dump(grid.best_estimator_, open('model_1984cvc_lr', 'wb'))

In [34]:
# Import model from disk
grid = dill.load(open('model_1984cvc_lr', 'rb'))

In [35]:
# Predicted target class
y_pred = grid.predict(X_test)
y_pred


Out[35]:
array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0])

In [36]:
# Predicted target class probabilities
y_pred_proba = grid.predict_proba(X_test)
y_pred_proba


Out[36]:
array([[  8.48882881e-01,   1.51117119e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  1.58871932e-02,   9.84112807e-01],
       [  2.27912244e-01,   7.72087756e-01],
       [  4.27385056e-02,   9.57261494e-01],
       [  6.85272393e-01,   3.14727607e-01],
       [  9.93194418e-01,   6.80558225e-03],
       [  7.53908547e-01,   2.46091453e-01],
       [  4.27385056e-02,   9.57261494e-01],
       [  9.63154393e-01,   3.68456075e-02],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.56658857e-01,   4.33411433e-02],
       [  4.09165533e-03,   9.95908345e-01],
       [  5.88407914e-01,   4.11592086e-01],
       [  7.53908547e-01,   2.46091453e-01],
       [  1.63681067e-01,   8.36318933e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  1.58871932e-02,   9.84112807e-01],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.73781070e-01,   2.62189301e-02],
       [  6.45474973e-01,   3.54525027e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  9.70838114e-01,   2.91618861e-02],
       [  9.97528447e-01,   2.47155260e-03],
       [  5.88407914e-01,   4.11592086e-01],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  9.97528447e-01,   2.47155260e-03],
       [  9.56658857e-01,   4.33411433e-02],
       [  9.93194418e-01,   6.80558225e-03],
       [  9.97528447e-01,   2.47155260e-03],
       [  4.27385056e-02,   9.57261494e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  5.61679242e-01,   4.38320758e-01],
       [  9.56658857e-01,   4.33411433e-02],
       [  9.97142000e-01,   2.85800000e-03],
       [  9.83882413e-01,   1.61175867e-02],
       [  1.49249702e-01,   8.50750298e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.93194418e-01,   6.80558225e-03],
       [  9.56658857e-01,   4.33411433e-02],
       [  9.93194418e-01,   6.80558225e-03],
       [  1.12346452e-02,   9.88765355e-01],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  9.93194418e-01,   6.80558225e-03],
       [  7.98128548e-01,   2.01871452e-01],
       [  4.27385056e-02,   9.57261494e-01],
       [  9.56658857e-01,   4.33411433e-02],
       [  9.56658857e-01,   4.33411433e-02],
       [  4.27385056e-02,   9.57261494e-01],
       [  9.93194418e-01,   6.80558225e-03],
       [  9.93194418e-01,   6.80558225e-03],
       [  9.97528447e-01,   2.47155260e-03],
       [  2.45918841e-01,   7.54081159e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  9.83882413e-01,   1.61175867e-02],
       [  9.97528447e-01,   2.47155260e-03],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  9.93194418e-01,   6.80558225e-03],
       [  9.90358170e-01,   9.64183018e-03],
       [  9.93194418e-01,   6.80558225e-03],
       [  1.49249702e-01,   8.50750298e-01],
       [  5.96505161e-02,   9.40349484e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  4.13734162e-01,   5.86265838e-01],
       [  9.83882413e-01,   1.61175867e-02],
       [  4.27385056e-02,   9.57261494e-01],
       [  6.80188690e-01,   3.19811310e-01],
       [  9.97528447e-01,   2.47155260e-03],
       [  9.83882413e-01,   1.61175867e-02],
       [  1.58871932e-02,   9.84112807e-01],
       [  9.39523443e-01,   6.04765574e-02],
       [  9.56658857e-01,   4.33411433e-02],
       [  1.58871932e-02,   9.84112807e-01],
       [  1.58871932e-02,   9.84112807e-01],
       [  4.27385056e-02,   9.57261494e-01],
       [  1.12346452e-02,   9.88765355e-01],
       [  9.99625398e-01,   3.74601576e-04],
       [  9.56658857e-01,   4.33411433e-02],
       [  4.34721776e-01,   5.65278224e-01],
       [  3.51183893e-01,   6.48816107e-01],
       [  5.37017358e-01,   4.62982642e-01],
       [  4.27385056e-02,   9.57261494e-01],
       [  9.56658857e-01,   4.33411433e-02]])

In [37]:
y_pred_proba_democrat = y_pred_proba[:,0]
y_pred_proba_republican = y_pred_proba[:,1]

In [38]:
# Create dataframe of predicted values and probabilities for party affiliation
df_pred = pd.DataFrame({'class_name':y_test,
                        'class_name_predicted':y_pred,
                        'class_name_prob_democrat':y_pred_proba_democrat,
                        'class_name_prob_republican':y_pred_proba_republican})
df_pred.head()


Out[38]:
class_name class_name_predicted class_name_prob_democrat class_name_prob_republican
348 0 0 0.848883 0.151117
201 0 0 0.983882 0.016118
122 1 1 0.015887 0.984113
407 0 1 0.227912 0.772088
256 1 1 0.042739 0.957261

In [39]:
# Save the test-set predictions and probabilities to CSV.
df_pred.to_csv('1984cvc_jhb.csv')

In [40]:
# Print classification report 
target_names = ['Democrat', 'Republican']
clr = classification_report(y_test, y_pred, target_names=target_names)
print(clr)


             precision    recall  f1-score   support

   Democrat       0.96      0.96      0.96        56
 Republican       0.94      0.94      0.94        31

avg / total       0.95      0.95      0.95        87


In [41]:
# Confusion matrix as a labeled DataFrame
cm = np.array(confusion_matrix(y_test, y_pred, labels=[0,1]))

confusion = pd.DataFrame(cm, index=['Democrat', 'Republican'],
                         columns=['predicted_Democrat','predicted_Republican'])
confusion


Out[41]:
predicted_Democrat predicted_Republican
Democrat 54 2
Republican 2 29

In [42]:
def plot_confusion_matrix(cm, title='Confusion Matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, size = 15)
    plt.colorbar()    
    tick_marks = np.arange(len(target_names))
    plt.xticks(tick_marks, target_names, rotation=0, size = 12)
    plt.yticks(tick_marks, target_names, rotation=90, size = 12)
    plt.tight_layout()
    plt.ylabel('True Label', size = 15)
    plt.xlabel('Predicted Label', size = 15)
    plt.savefig('plot_confusion_matrix')

In [48]:
# Plot confusion matrix
cm = confusion_matrix(y_test, y_pred)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
print(cm)
plt.figure()
plot_confusion_matrix(cm)


Confusion matrix, without normalization
[[54  2]
 [ 2 29]]

In [44]:
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized Confusion Matrix')
print(cm_normalized)
plt.figure()
plot_confusion_matrix(cm_normalized, title='Normalized Confusion Matrix')
plt.savefig('plot_norm_confusion_matrix')
plt.show()


Normalized Confusion Matrix
[[ 0.96  0.04]
 [ 0.06  0.94]]

In [45]:
# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
fpr[1], tpr[1], _ = roc_curve(y_test, y_pred_proba_republican)
roc_auc[1] = auc(fpr[1], tpr[1])
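The area under the curve can also be obtained in one call with `roc_auc_score`, which matches integrating the curve with `auc()`. The standard toy example:

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Integrating the ROC curve and the one-call score agree.
fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr), roc_auc_score(y_true, y_score))  # both 0.75
```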

In [46]:
# Plot of ROC curve for a specific class
def roc_curve_single_class(fpr, tpr, roc_auc):
    plt.figure()
    plt.plot(fpr[1], tpr[1], label='ROC curve (area = %0.2f)' % roc_auc[1])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate', size = 15)
    plt.ylabel('True Positive Rate', size = 15)
    plt.xticks(size = 12)
    plt.yticks(size = 12)
    plt.title('Receiver Operating Characteristic (ROC)', size = 15)
    plt.legend(loc="lower right")
    plt.savefig('plot_roc_curve')
    plt.show()

In [47]:
roc_curve_single_class(fpr, tpr, roc_auc)


