```
In [1]:
```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from collections import defaultdict
from helpers.cashflow import calc_monthly_payment, get_monthly_payments, get_compound_curve
from helpers.preprocessing import process_features
from model.model import StatusModel

```
In [2]:
```df_3c = pd.read_csv('data/LoanStats3c_securev1.csv', header=1).iloc[:-2, :]  # column names are on row 1; drop the 2 trailing rows
df_3b = pd.read_csv('data/LoanStats3b_securev1.csv', header=1).iloc[:-2, :]
df_raw = pd.concat((df_3c, df_3b), axis=0)

```
In [3]:
```df_3c.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'annual_inc', 'issue_d', 'loan_status']]

```
Out[3]:
```

The monthly payment is calculated with the `calc_monthly_payment` function in `cashflow.py`, and here we suppose the loan amount was $1.

```
In [4]:
```print("Monthly payment:", calc_monthly_payment(loan_amnt=1, int_rate=0.1920, term=3))

```
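For reference, a fixed monthly payment on an amortizing loan is usually computed with the standard annuity formula. The sketch below assumes that `calc_monthly_payment` follows that formula and that `term` is given in years. Neither assumption is confirmed by this notebook, and `annuity_payment` is a hypothetical stand-in, not the helper's actual code.

```python
def annuity_payment(loan_amnt, int_rate, term):
    """Fixed monthly payment for a fully amortizing loan.

    loan_amnt: principal; int_rate: annual rate as a decimal;
    term: length in years (assumed, to match term=3 above).
    """
    n_months = int(term * 12)
    monthly_rate = int_rate / 12.0
    if monthly_rate == 0:
        # With no interest the principal is simply split evenly.
        return loan_amnt / n_months
    # Standard annuity formula: P * r / (1 - (1 + r)^-n)
    return loan_amnt * monthly_rate / (1 - (1 + monthly_rate) ** -n_months)


payment = annuity_payment(loan_amnt=1, int_rate=0.1920, term=3)
print("Monthly payment per $1 borrowed:", round(payment, 4))
```

Under these assumptions the payment on $1 is a little under 4 cents per month.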
```

This is the payment for one month. For the whole 36 months, we simply have a list of 36 payments.

```
In [5]:
```monthly_payments = np.array(get_monthly_payments(X_int_rate=np.array([0.1920]), date_range_length=36)[0])
print("Cashflow of monthly payments:\n", monthly_payments)

```
In [6]:
```plt.figure(figsize=(18,6))
plt.bar(range(36), monthly_payments, alpha=0.25)
plt.xlim((0,36))
plt.ylim((0, 0.04))
plt.xlabel('Monthly payment by month', fontsize=13)

```
Out[6]:
```

We'll do the time value adjustment first as it's relatively straightforward. We'll assume that payments received are reinvested in an instrument that earns the same interest rate as the loan. This is a very strong assumption, but it is acceptable here because we're only comparing loans against each other (and not, say, comparing loans against stocks).

If the interest rate is 19.20%, then the monthly interest is simply that figure divided by 12. If we had $1, then after one month it would increase by the monthly interest.

```
In [7]:
```print("Amount after 1 month:", (1 + 0.1920 / 12))

```
```

After two months, the amount would be 1.016 compounded again by the monthly interest.

```
In [8]:
```print("Amount after 2 months:", (1 + 0.1920 / 12) ** 2)

```
```

For the whole 36 months, we'll use the `get_compound_curve` function in `cashflow.py`.

```
In [9]:
```compound_curve = np.array(get_compound_curve(X_compound_rate=np.array([0.1920]), date_range_length=36))[0]
print("Compound curve:\n", compound_curve)

```
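The compounding steps above can be sketched in plain Python. This assumes that the payment received at the end of month k is reinvested for the remaining 36 - k months, so the first payment grows the most and the last not at all. The ordering and exact offsets used by `get_compound_curve` are not shown in this notebook, so `compound_growth` is a hypothetical illustration only.

```python
def compound_growth(annual_rate, n_months):
    """Growth factor applied to each monthly payment up to the end of the term.

    Assumes the payment received at the end of month k (k = 1..n_months)
    is reinvested for the remaining n_months - k months.
    """
    monthly_rate = annual_rate / 12.0
    return [(1 + monthly_rate) ** (n_months - k) for k in range(1, n_months + 1)]


curve = compound_growth(0.1920, 36)
print("Growth factor for the first payment:", round(curve[0], 4))
print("Growth factor for the last payment:", curve[-1])
```

If the real helper indexes months differently, the values shift by one compounding period but the shape stays the same.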
In [10]:
```plt.figure(figsize=(18,6))
plt.bar(range(36), compound_curve, alpha=0.25, color='m')
plt.xlim((0,36))
plt.ylim((0, 1.8))
plt.xlabel('Compound adjustment by month', fontsize=13)

```
Out[10]:
```

We now return to the loan that we looked into at the very start.

```
In [11]:
```df_3c.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'annual_inc', 'issue_d', 'loan_status']]

```
Out[11]:
```

This loan was issued in Jan 2014, and the loan status is current. Viewed from Jan 2015, this was 12 months ago. In our model, we assume that should the same loan be issued today, in 12 months' time (Jan 2016) it would have a loan status of current.

The loan status being current means that the probability we would be receiving this payment is 1. If the loan was not current, then we assume that the probability we would receive this payment is given by the schedule at the bottom of the link below:

https://www.lendingclub.com/info/demand-and-credit-profile.action

For example, if this loan has already defaulted, then there is only an 8% chance we would receive this payment. This probability of receiving the payment is the target for a Random Forest Regressor. Before fitting it, we pre-process the data to clean it up and fill in missing values.

```
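As a rough illustration of how the regression target is built, the sketch below maps a loan status to a probability of payment. Only the two values stated above (1 for a current loan, 8% for a defaulted one) are filled in. The status labels and the `payment_probability` helper are assumptions for this example, and every other status would need its value from the Lending Club schedule.

```python
# Only the two values stated in the text are filled in. Other statuses
# must be taken from the Lending Club schedule and are left unmapped here.
STATUS_TO_PAYMENT_PROB = {
    "Current": 1.0,
    "Default": 0.08,
}


def payment_probability(loan_status):
    """Return the assumed probability of payment, or NaN if the status is unmapped."""
    return STATUS_TO_PAYMENT_PROB.get(loan_status, float("nan"))


# The third status is deliberately unmapped to show the NaN fallback.
statuses = ["Current", "Default", "Late (31-120 days)"]
targets = [payment_probability(s) for s in statuses]
print(targets)
```

Returning NaN for unmapped statuses makes missing schedule entries visible instead of silently treating them as 0 or 1.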
In [12]:
```df = process_features(df_raw)

```
```

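The next two cells train a model by hand for a single grade and a single issue month; the `StatusModel` class is used further down for the full 36 months.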

```
In [13]:
```model = RandomForestRegressor
parameters = {'n_estimators': 100,
              'max_depth': 10}
features = ['loan_amnt', 'emp_length', 'monthly_inc', 'dti',
            'fico', 'earliest_cr_line', 'open_acc', 'total_acc',
            'revol_bal', 'revol_util', 'inq_last_6mths',
            'delinq_2yrs', 'pub_rec', 'collect_12mths',
            'last_delinq', 'last_record', 'last_derog',
            'purpose_debt', 'purpose_credit', 'purpose_home',
            'purpose_other', 'purpose_buy', 'purpose_biz',
            'purpose_medic', 'purpose_car', 'purpose_move',
            'purpose_vac', 'purpose_house', 'purpose_wed', 'purpose_energy',
            'home_mortgage', 'home_rent', 'home_own',
            'home_other', 'home_none', 'home_any']
grade_range = ['D']
date_range = ['Jan-2014']

```
In [14]:
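```grade_dict = defaultdict(list)
for grade in grade_range:
    for month in date_range:
        # Select the loans of this grade issued in this month
        df_select = df[(df['grade'].isin([grade]))
                       & (df['issue_d'].isin([month]))]
        X = df_select[features].values
        y = df_select['loan_status'].values
        # Instantiate from the class directly: calling model(**parameters)
        # would fail on a second pass, once `model` holds a fitted instance.
        model = RandomForestRegressor(**parameters)
        model.fit(X, y)
        grade_dict[grade].append(model)
    print(grade, 'training completed...')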

```
```

We now predict the status of our original loan after 12 months.

```
In [15]:
```df_select.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'monthly_inc', 'issue_d', 'loan_status']]

```
Out[15]:
```

```
In [16]:
```X = df_select.iloc[-1:,:][features].values
print("Probability of receiving payment of loan 9199665 after 12 months:", model.predict(X))

```
```

What we've done so far is train our model on all loans of grade D issued in Jan 2014 and, using that model, predict that should loan 9199665 be issued today, there would be a 97.4% chance of receiving the monthly payment in Jan 2016.

To get the probability of payment being received for the whole loan period, we repeat the process for 36 months. We'll be using the `get_expected_payout` function inside `model.py`.

```
In [17]:
```model = StatusModel(model=RandomForestRegressor,
                    parameters={'n_estimators': 100,
                                'max_depth': 10})
model.grade_range = ['D']
# One issue month per month of the 36-month term, newest first
model.date_range = ['Dec-2014', 'Nov-2014', 'Oct-2014',
                    'Sep-2014', 'Aug-2014', 'Jul-2014',
                    'Jun-2014', 'May-2014', 'Apr-2014',
                    'Mar-2014', 'Feb-2014', 'Jan-2014',
                    'Dec-2013', 'Nov-2013', 'Oct-2013',
                    'Sep-2013', 'Aug-2013', 'Jul-2013',
                    'Jun-2013', 'May-2013', 'Apr-2013',
                    'Mar-2013', 'Feb-2013', 'Jan-2013',
                    'Dec-2012', 'Nov-2012', 'Oct-2012',
                    'Sep-2012', 'Aug-2012', 'Jul-2012',
                    'Jun-2012', 'May-2012', 'Apr-2012',
                    'Mar-2012', 'Feb-2012', 'Jan-2012']

```
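The 36 issue-month labels can also be generated rather than typed out. `months_back` is a hypothetical helper, not part of the project, and it assumes the `issue_d` labels always follow the `Mon-YYYY` pattern seen in the list above.

```python
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def months_back(year, month, n_months):
    """List n_months labels such as 'Dec-2014', newest first, starting at (year, month)."""
    labels = []
    for _ in range(n_months):
        labels.append("%s-%d" % (MONTH_ABBR[month - 1], year))
        month -= 1
        if month == 0:
            # Wrap from January back to December of the previous year.
            month, year = 12, year - 1
    return labels


date_range = months_back(2014, 12, 36)
print(date_range[0], date_range[-1], len(date_range))
```

The result could then be assigned to `model.date_range` in place of the literal list.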
In [18]:
```model.train_model(df)

```
In [19]:
```X_sub_grade = df_select.iloc[-1:,:]['sub_grade'].values
expected_payout = np.array(model.get_expected_payout(X, X_sub_grade))[0]
print("Expected payout:\n", expected_payout)

```
In [20]:
```plt.figure(figsize=(18,6))
plt.bar(range(36), expected_payout, alpha=0.25, color='r')
plt.xlim((0,36))
plt.ylim((0, 1.0))
plt.xlabel('Expected payout by month', fontsize=13)

```
Out[20]:
```

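Multiplying the monthly payments, the compound curve and the expected payout month by month gives the expected cashflows, which is what the `get_cashflows.py` function inside `cashflow.py` does.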

```
In [21]:
```expected_cashflows = monthly_payments * compound_curve * expected_payout
print("Expected cashflows:\n", expected_cashflows)

```
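To make the element-wise product concrete, here is a toy three-month version in plain Python. The numbers are made up for illustration and are not taken from the loan data.

```python
# Toy three-month example; the numbers are made up for illustration only.
monthly = [0.5, 0.5, 0.5]      # payment due each month
compound = [1.21, 1.1, 1.0]    # growth factor until the end of the term
payout = [1.0, 0.9, 0.8]       # probability that the payment is received

# Each month's expected cashflow is payment * growth factor * probability.
expected = [m * c * p for m, c, p in zip(monthly, compound, payout)]
print("Expected cashflows:", [round(x, 3) for x in expected])
print("Total:", round(sum(expected), 3))
```

With NumPy arrays of equal length, `monthly * compound * payout` performs the same month-by-month multiplication.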
In [22]:
```plt.figure(figsize=(18,6))
plt.bar(range(36), expected_cashflows, alpha=0.25, color='g')
plt.xlim((0,36))
plt.ylim((0, 1.0))
plt.xlabel('Expected cashflow by month', fontsize=13)

```
Out[22]:
```

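The rate of return is the cube root of the total expected cashflows minus 1, since the loan term is three years. The `calc_IRR` function inside `cashflow.py` can also be used for this calculation.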

```
In [23]:
```rate_of_return = (np.sum(expected_cashflows))**(1/3.) - 1
print("Rate of return:", rate_of_return)

```
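The calculation treats the summed cashflows as the value of $1 after three years and converts that growth into an annual rate. As a sanity check, the sketch below applies the same formula to an idealized case in which every payment is received for certain. It assumes the annuity payment formula and the convention that the payment in month k is reinvested for the remaining 36 - k months; `annualized_return` is a hypothetical name, not a project function.

```python
def annualized_return(cashflows, years):
    """Annualized return on $1 invested, given cashflows valued at the end of the term."""
    return sum(cashflows) ** (1.0 / years) - 1


# Idealized inputs (assumed): every payment is received for certain and is
# reinvested at the loan's own rate until month 36.
r, n = 0.1920 / 12, 36
payment = r / (1 - (1 + r) ** -n)  # annuity payment on $1
cashflows = [payment * (1 + r) ** (n - k) for k in range(1, n + 1)]

certain_return = annualized_return(cashflows, years=3)
print("Return with certain payments:", round(certain_return, 4))
```

In this idealized case the result equals the effective annual rate `(1 + r) ** 12 - 1`, so any expected payout below 1 can only lower the return.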
```

`validation.ipynb` discusses how well the model works.