# Mechanics: How does the model work?

We'll be going through step-by-step how the rate of return is predicted. To start off, let's look at an loan example.

``````

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from collections import defaultdict
from helpers.cashflow import calc_monthly_payment, get_monthly_payments, get_compound_curve
from helpers.preprocessing import process_features
from model.model import StatusModel

``````
``````

In [2]:

df_raw = pd.concat((df_3c, df_3b), axis=0)

``````
``````

/Users/savarin/anaconda/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py:1139: DtypeWarning: Columns (0,19) have mixed types. Specify dtype option on import or set low_memory=False.
/Users/savarin/anaconda/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py:1139: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.

``````

Our training set is the set of 3-year loans issued between Jan 2012 and Dec 2014. The loan below has an interest rate of 19.20%, paid on an amortizing principal.

``````

In [3]:

df_3c.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'annual_inc', 'issue_d', 'loan_status']]

``````
``````

Out[3]:

id
loan_amnt
int_rate
term
annual_inc
issue_d
loan_status

235628
9199665
10000
19.20%
36 months
D3
46000
Jan-2014
Current

``````

## Monthly payments

We can calculate the required monthly payment with the formula in the link below. This is the `calc_monthly_payment` function in `cashflow.py`, and here we suppose the loan amount was \$1.

http://en.wikipedia.org/wiki/Amortization_calculator

``````

In [4]:

print "Monthly payment:", calc_monthly_payment(loan_amnt=1, int_rate=0.1920, term=3)

``````
``````

Monthly payment: 0.0367572181573

``````

This is the payment for one month. For the whole 36 months, we simply have a list of 36 payments.

``````

In [5]:

monthly_payments = np.array(get_monthly_payments(X_int_rate=np.array([0.1920]), date_range_length=36)[0])

print "Cashflow of monthly payments:\n", monthly_payments

``````
``````

Cashflow of monthly payments:
[ 0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722
0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722
0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722
0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722
0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722
0.03675722  0.03675722  0.03675722  0.03675722  0.03675722  0.03675722]

``````

This is the list of payments the borrower would make if there was no default. For the expected rate of return, we would need to adjust these payments by risk of default and by the time value of money.

``````

In [6]:

plt.figure(figsize=(18,6))
plt.bar(xrange(36), monthly_payments, alpha=0.25)
plt.xlim((0,36))
plt.ylim((0, 0.04))
plt.xlabel('Monthly payment by month', fontsize=13)

``````
``````

Out[6]:

<matplotlib.text.Text at 0x1029d1950>

``````

## Compound curve

We'll do the time value adjustment first as it's relatively straightforward. We'll be assuming that payments received are reinvested in a instrument that has the same interest. This is a very strong assumption, but we're able to do this as we're only making comparison between loans (and not, say, comparing loans against stocks).

If the interest rate is 19.20%, then the monthly interest is simply that figure divided by 12. If we had \$1, then after one month it would increase by the monthly interest.

``````

In [7]:

print "Amount after 1 month:", (1 + 0.1920 / 12)

``````
``````

Amount after 1 month: 1.016

``````

After two months, the amount would be 1.016 compounded again by the monthly interest.

``````

In [8]:

print "Amount after 2 months:", (1 + 0.1920 / 12) ** 2

``````
``````

Amount after 2 months: 1.032256

``````

For the whole 36 months, we'll use the `get_compound_curve` function in `cashflow.py`.

``````

In [9]:

compound_curve = np.array(get_compound_curve(X_compound_rate=np.array([0.1920]), date_range_length=36))[0]

print "Compound curve:\n", compound_curve

``````
``````

Compound curve:
[ 1.74292933  1.71548162  1.68846617  1.66187615  1.63570487  1.60994574
1.58459226  1.55963805  1.53507682  1.51090238  1.48710865  1.46368961
1.44063938  1.41795215  1.39562219  1.37364389  1.3520117   1.33072018
1.30976396  1.28913775  1.26883637  1.2488547   1.22918769  1.20983041
1.19077796  1.17202555  1.15356846  1.13540202  1.11752168  1.09992291
1.08260129  1.06555245  1.0487721   1.032256    1.016       1.        ]

``````

In particular, note that there would be no compound adjustment at the final month because we would be at final month or maturity of the loan.

``````

In [10]:

plt.figure(figsize=(18,6))
plt.bar(xrange(36), compound_curve, alpha=0.25, color='m')

plt.xlim((0,36))
plt.ylim((0, 1.8))

``````
``````

Out[10]:

<matplotlib.text.Text at 0x103a4eed0>

``````

## Expected payout

We now return to the loan that we looked into at very start.

``````

In [11]:

df_3c.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'annual_inc', 'issue_d', 'loan_status']]

``````
``````

Out[11]:

id
loan_amnt
int_rate
term
annual_inc
issue_d
loan_status

235628
9199665
10000
19.20%
36 months
D3
46000
Jan-2014
Current

``````

This loan was issued in Jan 2014, and the loan status is current. Viewed from Jan 2015, this was 12 months ago. In our model, we assume that should the same loan be issued today, in 12 months' time (Jan 2016) it would have a loan status of current.

The loan status being current means that the probability we would be receiving this payment is 1. If the loan was not current, then we assume that the probability we would receive this payment is given by the schedule at the bottom of the link below:

https://www.lendingclub.com/info/demand-and-credit-profile.action

For example, if this loan has already defaulted, then there is only an 8% chance we would receive this payment. This probability of payment received would be our target to apply a Random Forest Regressor. Before doing so, we pre-process the data to fill in clean up the data and fill in missing values.

``````

In [12]:

df = process_features(df_raw)

``````
``````

helpers/preprocessing.py:138: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
'Charged Off': 0.})
helpers/preprocessing.py:140: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['int_rate'] = df['int_rate'].map(lambda x: float(str(x).strip('%')) / 100)
helpers/preprocessing.py:142: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['term'] = df['term'].map(lambda x: int(str(x).strip(' months')))
helpers/preprocessing.py:146: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['emp_length'] = df['emp_length'].map(lambda x: '0.5 years' if x == '< 1 year' else x)
helpers/preprocessing.py:147: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['emp_length'] = df['emp_length'].map(lambda x: '10 years' if x == '10+ years' else x)
helpers/preprocessing.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['emp_length'] = df['emp_length'].map(lambda x: '-1 years' if x == 'n/a' else x)
helpers/preprocessing.py:149: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['emp_length'] = df['emp_length'].map(lambda x: float(x.strip(' years')))
helpers/preprocessing.py:154: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['emp_length'] = df['emp_length'].map(lambda x: emp_length_mean if x < 0 else x)
helpers/preprocessing.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['annual_inc'] = df['annual_inc'].map(lambda x: float(x) / 12)
/Users/savarin/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.py:2302: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
**kwargs)
helpers/preprocessing.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['fico_range_low'] = (df['fico_range_low'] + df['fico_range_high']) / 2.
helpers/preprocessing.py:167: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
else x['earliest_cr_line'], axis=1)
helpers/preprocessing.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
'%b-%Y')).days / 30, axis=1)
helpers/preprocessing.py:175: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['revol_util'] = df['revol_util'].map(lambda x: float(str(x).strip('%')) / 100)
helpers/preprocessing.py:180: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['revol_util'] = df['revol_util'].fillna(revol_util_mean)
helpers/preprocessing.py:185: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['last_delinq'] = df['last_delinq'].map(lambda x: -1 if x == 'n/a' else x)
helpers/preprocessing.py:187: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['last_record'] = df['last_record'].map(lambda x: -1 if x == 'n/a' else x)
helpers/preprocessing.py:189: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df['last_derog'] = df['last_derog'].map(lambda x: -1 if x == 'n/a' else x)

``````

Since this loan is of grade D, we train our model on all loans of grade D issued in Jan 2014. This involves going into the details of our `StatusModel` class.

``````

In [13]:

model = RandomForestRegressor
parameters = {'n_estimators':100,
'max_depth':10}

features = ['loan_amnt', 'emp_length', 'monthly_inc', 'dti',
'fico', 'earliest_cr_line', 'open_acc', 'total_acc',
'revol_bal', 'revol_util', 'inq_last_6mths',
'delinq_2yrs', 'pub_rec', 'collect_12mths',
'last_delinq', 'last_record', 'last_derog',
'purpose_debt', 'purpose_credit', 'purpose_home',
'purpose_medic', 'purpose_car', 'purpose_move',
'purpose_vac', 'purpose_house', 'purpose_wed', 'purpose_energy',
'home_mortgage', 'home_rent', 'home_own',
'home_other', 'home_none', 'home_any']

date_range = ['Jan-2014']

``````
``````

In [14]:

for month in date_range:
& (df['issue_d'].isin([month]))]

X = df_select[features].values
y = df_select['loan_status'].values

model = model(**parameters)
model.fit(X, y)

``````
``````

D training completed...

``````

We now predict the status of our original loan after 12 months.

``````

In [15]:

df_select.iloc[-1:,:][['id', 'loan_amnt', 'int_rate', 'term', 'sub_grade', 'monthly_inc', 'issue_d', 'loan_status']]

``````
``````

Out[15]:

id
loan_amnt
int_rate
term
monthly_inc
issue_d
loan_status

235628
9199665
10000
0.192
36
D3
3833.333333
Jan-2014
1

``````
``````

In [16]:

X = df_select.iloc[-1:,:][features].values

print "Probability of receiving payment of loan 9199665 after 12 months:", model.predict(X)

``````
``````

Probability of receiving payment of loan 9199665 after 12 months: [ 0.9735888]

``````

What we've done so far is train our model on all loans of grade D issued in Jan 2014, and using our model, predict that should loan 9199665 be issued today, there would be a 97.4% of receiving the monthly payment in Jan 2016.

To get the probability of payment being received for the whole loan period, we repeat the process for 36 months. We'll be using the `get_expected_payout` function inside `model.py`.

``````

In [17]:

model = StatusModel(model=RandomForestRegressor,
parameters={'n_estimators':100,
'max_depth':10})

model.date_range = ['Dec-2014', 'Nov-2014', 'Oct-2014',
'Sep-2014', 'Aug-2014', 'Jul-2014',
'Jun-2014', 'May-2014', 'Apr-2014',
'Mar-2014', 'Feb-2014', 'Jan-2014',
'Dec-2013', 'Nov-2013', 'Oct-2013',
'Sep-2013', 'Aug-2013', 'Jul-2013',
'Jun-2013', 'May-2013', 'Apr-2013',
'Mar-2013', 'Feb-2013', 'Jan-2013',
'Dec-2012', 'Nov-2012', 'Oct-2012',
'Sep-2012', 'Aug-2012', 'Jul-2012',
'Jun-2012', 'May-2012', 'Apr-2012',
'Mar-2012', 'Feb-2012', 'Jan-2012']

``````
``````

In [18]:

model.train_model(df)

``````
``````

D training completed...

``````
``````

In [19]:

print "Expected payout:\n", expected_payout

``````
``````

Expected payout:
[ 0.99178735  0.98364215  0.97556384  0.96755188  0.95960571  0.95172481
0.94390862  0.93615663  0.92846831  0.92084312  0.91328056  0.90578011
0.89834125  0.89096349  0.88364632  0.87638924  0.86919176  0.8620534
0.85497365  0.84795206  0.84098812  0.83408138  0.82723136  0.8204376
0.81369964  0.80701701  0.80038926  0.79381594  0.78729661  0.78083082
0.77441813  0.7680581   0.76175031  0.75549432  0.74928971  0.74313606]

``````
``````

In [20]:

plt.figure(figsize=(18,6))
plt.bar(xrange(36), expected_payout, alpha=0.25, color='r')

plt.xlim((0,36))
plt.ylim((0, 1.0))
plt.xlabel('Expected payout by month', fontsize=13)

``````
``````

Out[20]:

<matplotlib.text.Text at 0x10b6c5610>

``````

## Rate of return

For the final step, we multiply the monthly payment by the compound adjustment to account for time value, and by the expected payout to account for risk of default. This is also what the `get_cashflows.py` function inside `cashflow.py` does.

``````

In [21]:

expected_cashflows = monthly_payments * compound_curve * expected_payout

print "Expected cashflows:\n", expected_cashflows

``````
``````

Expected cashflows:
[ 0.06353909  0.06202487  0.06054673  0.05910382  0.0576953   0.05632034
0.05497815  0.05366795  0.05238897  0.05114047  0.04992172  0.04873202
0.04757067  0.046437    0.04533034  0.04425006  0.04319552  0.04216611
0.04116124  0.04018031  0.03922276  0.03828803  0.03737557  0.03648486
0.03561538  0.03476661  0.03393808  0.03312929  0.03233977  0.03156907
0.03081674  0.03008233  0.02936543  0.02866561  0.02798247  0.02731561]

``````
``````

In [22]:

plt.figure(figsize=(18,6))
plt.bar(xrange(36), expected_payout, alpha=0.25, color='g')

plt.xlim((0,36))
plt.ylim((0, 1.0))
plt.xlabel('Expected cashflow by month', fontsize=13)

``````
``````

Out[22]:

<matplotlib.text.Text at 0x10e139310>

``````

To get the rate of return, we simply add up all the cashflows, take the cube root, and finally subtract 1. The `calc_IRR` function inside `cashflow.py` can also be used for this calculation.

``````

In [23]:

rate_of_return = (np.sum(expected_cashflows))**(1/3.) - 1

print "Rate of return:", rate_of_return

``````
``````

Rate of return: 0.156624220874

``````

To conclude, our model predicts that loan 9199665 has an expected rate of return of 15.66%, based of the headline Lending Club rate of 19.20%. The next notebook `validation.ipynb` discusses how well the model works.