In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Churn_Modelling.csv')
dataset.head()
Out[1]:
Create the matrix of features and the target vector. In this case we exclude columns 1, 2 & 3 ('RowNumber', 'CustomerId' & 'Surname'), since row indices, customer IDs and names carry no predictive signal for our analysis. Column 14, 'Exited', is our target variable.
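For reference, the same selection can be written explicitly with pandas. A minimal sketch, assuming the standard Churn_Modelling.csv header (the encoding helpers below are what the notebook actually uses):
features = dataset.drop(columns=['RowNumber', 'CustomerId', 'Surname', 'Exited'])   # drop identifiers and the target
target = dataset['Exited']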
In [2]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
## Read this for categorical Encoding : http://pbpython.com/categorical-encoding.html
##pd.get_dummies(dataset, columns=["Geography", "Gender"], prefix=["Geography", "Gender"]).head()
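To see what the two encodings produce, a quick sketch (column values assumed from the dataset, where Geography is one of France, Germany or Spain):
lb = LabelEncoder()
geo_codes = lb.fit_transform(dataset['Geography'])     # a single integer column; classes are numbered alphabetically
geo_onehot = pd.get_dummies(dataset['Geography'])      # three 0/1 indicator columns, one per country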
In [3]:
def getXy_1(dataset, target):
    # One-hot encode the categorical columns directly with pandas
    df = pd.get_dummies(dataset, columns=["Geography", "Gender"], prefix=["Geography", "Gender"])
    y = df[target]
    X = df.loc[:, df.columns != target]
    return X, y

def getXy_2(dataset, target):
    # Label-encode the categorical columns to integers first
    lb = LabelEncoder()
    dataset['Gender'] = lb.fit_transform(dataset['Gender'])
    dataset['Geography'] = lb.fit_transform(dataset['Geography'])
    ## One-hot encode the integer codes
    dataset = pd.get_dummies(dataset, columns = ['Geography','Gender'])
    y = dataset[target]
    X = dataset.loc[:, dataset.columns != target]
    return X, y
In [4]:
X, y = getXy_2(dataset, target='Exited')
print("X.columns:", X.columns)
X = X.iloc[:, 3:].values   # drop RowNumber, CustomerId and Surname, keeping the remaining 13 feature columns
y = y.values
y
Out[4]:
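Churn datasets are usually imbalanced, so it is worth checking the class distribution before splitting; a minimal sketch:
print(np.bincount(y))   # counts of class 0 (stayed) vs class 1 (exited)
print(y.mean())         # fraction of churners
If the classes turn out to be heavily skewed, passing stratify=y to train_test_split below keeps the proportions the same in both splits.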
In [5]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
I know you are tired of data preprocessing, but I promise this is the last step. If you look at the data carefully, you will find that it is not on a common scale: some variables take values in the thousands while others are in the tens or ones. We don't want any variable to dominate the others, so let's scale the data.
'StandardScaler' is available in scikit-learn. In the following code we fit it on the training data and transform that data in one step. To keep the scaling consistent, we then use the same fitted scaler to transform the test data; refitting on the test set would apply different scaling parameters and leak information.
In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   # reuse the scaler fitted on the training data; do not refit on the test set
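As a quick sanity check, the scaled training features should now have means near 0 and standard deviations near 1; a small verification sketch:
print(X_train.mean(axis=0).round(2))   # all close to 0 after standardization
print(X_train.std(axis=0).round(2))    # all close to 1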
In [7]:
X_train
Out[7]:
In [8]:
X_train.shape
Out[8]:
In [9]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
In [10]:
model = Sequential()
In [11]:
model.add(Dense(16, input_dim=13))   # 13 input features remain after dropping the identifier columns
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
In [12]:
model.summary()
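The parameter counts in the summary follow directly from the layer sizes: a Dense layer with n inputs and m units has n·m weights plus m biases. Here that gives 13·16 + 16 = 224, 16·16 + 16 = 272, 16·8 + 8 = 136 and 8·1 + 1 = 9, for 641 trainable parameters in total; the Dropout and Activation layers add none.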
In [13]:
# Compiling Neural Network
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
In [14]:
# Fitting our model
model.fit(X_train, y_train, batch_size = 10, epochs = 100)
Out[14]:
In [15]:
## Predicting the test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.4)   # label a customer as churned when the predicted probability exceeds 0.4
y_pred
Out[15]:
In [16]:
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
In [17]:
cm
Out[17]:
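The confusion matrix can be turned into the usual summary metrics directly; a minimal sketch (scikit-learn orders the 2x2 matrix as [[tn, fp], [fn, tp]]):
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of predicted churners, how many actually left
recall = tp / (tp + fn)      # of actual churners, how many we caught
print(accuracy, precision, recall)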