You are provided with the following data: loan_data.csv
This is the historical data that the bank has provided. It has the following columns:

Application Attributes:
years: Number of years the applicant has been employed
ownership: Whether the applicant owns a house or not
income: Annual income of the applicant
age: Age of the applicant

Behavioural Attributes:
grade: Credit grade of the applicant

Outcome Variables:
amount: Amount of loan provided to the applicant
interest: Interest rate charged to the applicant

Target Variable:
default: Whether the applicant has defaulted or not

Let us build some intuition around the loan data.
In [1]:
#Load the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
# Default plot and display settings
%matplotlib inline
plt.rcParams['figure.figsize'] = (16,9)
plt.rcParams['font.size'] = 18
plt.style.use('fivethirtyeight')
pd.set_option('display.float_format', lambda x: '%.2f' % x)
In [3]:
#Load the dataset
df = pd.read_csv("../data/loan_data_clean.csv")
In [4]:
df.head()
Out[4]:
In [5]:
df.shape
Out[5]:
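Before exploring further, it can help to confirm that the loaded frame actually contains the columns listed in the data description. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` and the toy frame are hypothetical, and the expected names are assumed to match the CSV header exactly.

```python
import pandas as pd

# Column names as listed in the data description
# (assumption: the CSV header uses exactly these names)
EXPECTED_COLUMNS = ["years", "ownership", "income", "age",
                    "grade", "amount", "interest", "default"]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]

# Toy frame standing in for the real data (hypothetical values)
toy = pd.DataFrame({
    "years": [2, 10],
    "income": [24000, 81000],
    "age": [33, 41],
    "default": [0, 1],
})

missing = missing_columns(toy)
print(missing)
```

On the real data, `missing_columns(df)` returning an empty list would indicate that every described column is present.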
In [6]:
# Create a crosstab of default and grade
pd.crosstab(df.default, df.grade)
Out[6]:
In [7]:
# Create a crosstab of default and grade - proportion within each default class (each row sums to 1)
pd.crosstab(df.default, df.grade, normalize="index")
Out[7]:
In [8]:
# Create a crosstab of default and grade - proportion of all applicants (the whole table sums to 1)
pd.crosstab(df.default, df.grade, normalize="all")
Out[8]:
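The crosstabs above normalize by row and by the whole table. A third view, the share of defaults within each grade, is often the easiest to read. The sketch below illustrates it on a small toy frame with hypothetical values, and it assumes `default` is coded as 0/1; on the real data the same calls would take `df.default` and `df.grade`.

```python
import pandas as pd

# Toy data standing in for the loan frame
# (hypothetical values; `default` is assumed to be coded 0/1)
toy = pd.DataFrame({
    "grade":   ["A", "A", "A", "A", "B", "B", "C", "C"],
    "default": [0,   0,   0,   1,   0,   1,   1,   1],
})

# Share of each outcome within a grade: every column sums to 1
by_grade = pd.crosstab(toy["default"], toy["grade"], normalize="columns")

# Shortcut for a 0/1 target: the mean of `default` per grade is the default rate
default_rate = toy.groupby("grade")["default"].mean()

print(by_grade)
print(default_rate)
```

With `normalize="columns"`, the row for `default == 1` can be read directly as the default rate of each grade, which matches the grouped mean.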
In [17]:
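# The 'seaborn' style was renamed to 'seaborn-v0_8' in Matplotlib 3.6; use 'seaborn' on older versions
plt.style.use('seaborn-v0_8')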
In [18]:
# Plot the number of applicants in each grade, split by default status
sns.countplot(x="grade", data=df, hue="default", order=['A','B','C','D','E', 'F', 'G'])
Out[18]:
In [10]:
sns.stripplot(x="default", y="age", data=df, jitter=True)
Out[10]:
In [11]:
sns.stripplot(x="default", y="income", data=df, jitter=True)
Out[11]:
In [12]:
# Create the transformed income variable
df['income_log'] = np.log(df.income)
In [13]:
sns.stripplot(x="default", y="income_log", data=df, jitter=True ,alpha=0.5)
Out[13]:
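The log transform is used here because income tends to be strongly right-skewed, which squashes most points into a narrow band of the strip plot. The sketch below shows the effect on synthetic data: the incomes are drawn from a log-normal distribution purely for illustration (the distribution and its parameters are assumptions, not properties of the loan data).

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed incomes (hypothetical; parameters chosen for illustration only)
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10.5, sigma=0.8, size=1000))

# Sample skewness before and after the transform
raw_skew = income.skew()
log_skew = np.log(income).skew()

print(f"skewness of income:      {raw_skew:.2f}")
print(f"skewness of log(income): {log_skew:.2f}")
```

Note that `np.log` is only defined for positive values, so any zero or negative incomes in a real dataset would need handling before the transform.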
In [14]:
#Plot age, years and default
plt.scatter(df.years, df.age, c=df.default, alpha=0.6, cmap=plt.cm.viridis)
Out[14]:
In [30]:
df.plot(kind="scatter", x='age', y='income_log', c=df.default, cmap=plt.cm.viridis)
Out[30]:
In [20]:
sns.stripplot(x="grade", y="age", data=df, hue="default" , jitter=True)
Out[20]:
In [24]:
plt.scatter(x=df.amount, y=df.interest, c=df.default, alpha = 0.5, cmap="viridis")
Out[24]: