The Small Business Administration (SBA) is a U.S. government agency that has delivered millions of loans, loan guarantees, contracts, counseling sessions and other forms of assistance to small businesses since 1953.
SBA offers a variety of loan programs for very specific purposes; the most common is the general small business loan program, known as the 7(a) program. SBA does NOT provide grants for starting and expanding a business - it provides guarantees on loans that are distributed by lending institutions, thus reducing the risk to those institutions.
7(a) loans have a maximum loan amount of \$5 million. The average 7(a) loan amount in fiscal year 2015 was \$371,628. The actual interest rate on a 7(a) loan guaranteed by the SBA is negotiated between the applicant and the lender, subject to SBA maximums (a base rate plus an allowable spread), and is not available in the data analyzed here.
SBA can guarantee as much as 85 percent on loans of up to \$150,000 and 75 percent on loans of more than \$150,000. SBA’s maximum exposure amount is \$3,750,000.
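As a quick sketch (a hypothetical helper, not part of the SBA data or this analysis), these guarantee rules can be expressed as:

```python
def sba_guaranteed_amount(loan_amount):
    """Approximate SBA 7(a) guarantee: 85% of loans up to $150,000,
    75% of larger loans, capped at the $3,750,000 maximum exposure."""
    rate = 0.85 if loan_amount <= 150_000 else 0.75
    return min(rate * loan_amount, 3_750_000)

print(sba_guaranteed_amount(100_000))    # 85000.0
print(sba_guaranteed_amount(5_000_000))  # 3750000.0 (capped)
```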
Small businesses play a huge role in the U.S. economy - accounting for more than 50% of all active jobs, and contributing more than 60% of all new jobs since 1995.
SBA provides assistance to small businesses to help them start new operations, expand existing ones, or recover after natural disasters. However, as will be shown below, a significant portion of SBA-backed loans default. This prompts the following questions:
As a part of the "Open Government" policy, SBA has released a loan-level list of all 7(a) loans approved since October 1990 (Fiscal Year 1991 - present). These are released as multiple Excel files. We will first load these using Pandas and then save them for easier/quicker access later.
In [1]:
# Import necessary Python packages
# Data analysis tools
import numpy as np
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
# Plotting tools and figure display options
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import graphviz
sns.set(context = 'poster', style = 'white', font_scale = 1.3)
In [2]:
# Load and concatenate the data (3 parts)
#loans_7a_part1 = pd.read_excel('FOIA - 7(a)(FY1991-FY1999).xlsx', skiprows = 1)
#loans_7a_part2 = pd.read_excel('FOIA - 7(a)(FY2000-FY2009).xlsx', skiprows = 1)
#loans_7a_part3 = pd.read_excel('FOIA - 7(a)(FY2010-Present).xlsx', skiprows = 1)
#loans_7a = pd.concat([loans_7a_part1, loans_7a_part2, loans_7a_part3])
In [3]:
# Save for quicker access later
#loans_7a.to_pickle('loans_7a')
In [4]:
# Read in the raw data
loans_7a = pd.read_pickle('loans_7a')
In [5]:
# Examine the data
loans_7a.columns
Out[5]:
In [6]:
loans_7a.describe()
Out[6]:
In [7]:
# See how many values are missing in each column
loans_7a.isnull().sum()
Out[7]:
We can see that the majority of loans are missing the interest rate; NaicsCode is missing for 202,825 loans.
In [8]:
# Examine the first 5 rows
loans_7a.head(5)
Out[8]:
Our target variable could be Loan Status (for binary classification) or Charged Off Amount (for continuous prediction/regression).
We may choose to limit the data to only:
In [9]:
# See how many loans there are in the data set per each status
loans_7a['LoanStatus'].value_counts()
sns.countplot(x = 'LoanStatus', data = loans_7a)
Out[9]:
It is important to keep in mind that the countplot displayed above includes loans that were paid in full before they were due, while some loans awarded at the same time may still eventually default.
Potentially useful features:
In [10]:
# Remove irrelevant columns
loans_7a.drop(['Program', 'BorrName', 'BorrStreet', 'BorrCity', 'BorrZip', 'BankStreet', 'BankCity', 'BankState',
'BankZip', 'NaicsDescription', 'FranchiseName', 'ProjectCounty', 'ProjectState',
'SBADistrictOffice', 'CongressionalDistrict', 'BankName', 'DeliveryMethod', 'subpgmdesc'], axis = 1, inplace = True)
In [11]:
loans_7a.columns
Out[11]:
In [12]:
# Since initial interest rate is missing in 990758 rows, remove it as well
loans_7a.drop('InitialInterestRate', axis = 1, inplace = True)
Let's check the correlation between the total approval amount and the guaranteed amount.
In [13]:
loans_7a['GrossApproval'].corr(loans_7a['SBAGuaranteedApproval'])
Out[13]:
Since these two quantities are highly correlated, we will drop the gross approval amount.
In [14]:
loans_7a.drop('GrossApproval', axis = 1, inplace = True)
In [15]:
loans_7a['BusinessType'].value_counts()
sns.countplot(x = 'BusinessType', data = loans_7a)
Out[15]:
Most businesses receiving SBA loans are corporations, but a large portion are individuals, too.
Let's examine how many jobs these businesses support.
In [16]:
loans_7a['JobsSupported'].value_counts().head()
Out[16]:
In [17]:
loans_7a['JobsSupported'].value_counts()[0]/len(loans_7a)
Out[17]:
About 36% of businesses are non-employers! Jobs Supported = "Total Jobs Created + Jobs Retained as reported by lender on SBA Loan Application. SBA does not review, audit, or validate these numbers - they are simply self-reported, good faith estimates by the lender."
In [18]:
loans_7a['FranchiseCode'].value_counts().head(10)
Out[18]:
Franchise code is 0 or 1 for non-franchises and a 5-digit code otherwise!
In [19]:
# Percent of franchises
(loans_7a['FranchiseCode'] > 1).sum()/len(loans_7a)*100
Out[19]:
Only 6.4% of approved loans were given to franchises!
In [20]:
# Turn franchise code into a binary variable
def binarizeFranCode(code):
    if code > 1:
        return 1
    else:
        return 0
loans_7a['FranchiseCodeBin'] = loans_7a['FranchiseCode'].apply(binarizeFranCode)
print(loans_7a['FranchiseCodeBin'].value_counts())
loans_7a.drop('FranchiseCode', axis = 1, inplace = True)
The North American Industry Classification System (NAICS) classifies business establishments based on the activities in which they are primarily engaged.
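NAICS codes are hierarchical: the first two digits of the six-digit code identify the broad industry sector. A minimal illustration (the codes shown are standard NAICS examples, not drawn from the loan data):

```python
def naics_sector(code):
    """Return the 2-digit sector prefix of a 6-digit NAICS code."""
    return int(code) // 10_000

print(naics_sector(722110))  # 72 (Accommodation & Food Services)
print(naics_sector(541511))  # 54 (Professional, Scientific, & Technical Services)
```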
In [21]:
loans_7a['NaicsCode'].nunique()
Out[21]:
There are more than 1300 different NAICS values in this dataset!
In [22]:
loans_7a['NaicsCode'].isnull().sum()
Out[22]:
In [23]:
# Remove rows with missing NAICS Codes
loans_7a.dropna(subset = ['NaicsCode'], inplace = True)
In [24]:
loans_7a.reset_index(drop = True, inplace = True)
In [25]:
# Replace CHGOFF with DEFAULT for simplicity
def repCHGOFF_DEFAULT(status):
    if status == 'CHGOFF':
        return 'DEFAULT'
    else:
        return status
loans_7a['LoanStatus'] = loans_7a['LoanStatus'].apply(repCHGOFF_DEFAULT)
As an indicator of the state of the economy at the time a loan is awarded, we will use the S&P 1500 composite index. Source: http://www.econ.yale.edu/~shiller/data.htm
In [26]:
# Load monthly consumer price index and composite S&P index data
cpi_sp = pd.read_excel('ie_data.xls', skiprows = 7)
# Use only values starting in 1990
cpi_sp = cpi_sp[cpi_sp['Date']>1989]
In [27]:
# Keep only the relevant columns: date (year+month), S&P 1500 index and Consumer Price Index (CPI)
cpi_sp = cpi_sp[['Date', 'P', 'CPI']]
In [28]:
# Rename the columns
cpi_sp.columns = ['Date', 'SP', 'CPI']
cpi_sp.head()
Out[28]:
In [29]:
# Change the date format from year.month to datetime (and handle .10 getting shortened to .1)
def split_date(x):
    tmp = str(x).split('.')
    if tmp[1] == '1':
        tmp[1] = '10'
    return tmp
cpi_sp['Date_dt'] = cpi_sp['Date'].apply(split_date)
s = '-'
cpi_sp['Date_dt'] = cpi_sp['Date_dt'].apply(lambda x: s.join(x))
cpi_sp['Date_dt'] = cpi_sp['Date_dt'].apply(lambda x: x + '-01')
cpi_sp.reset_index(drop = True, inplace = True)
cpi_sp.head(10)
Out[29]:
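A quick illustration of the float-truncation issue handled in the cell above: the source spreadsheet stores dates as year.month floats, and October (e.g. 1990.10) loses its trailing zero when converted to a string, while January (1990.01) keeps its zero, so a bare '1' after the decimal point can only mean October:

```python
print(str(1990.10))             # '1990.1'
print(str(1990.10).split('.'))  # ['1990', '1']
print(str(1990.01).split('.'))  # ['1990', '01']
```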
In [30]:
cpi_sp.drop('Date', axis = 1, inplace = True)
# Rename the columns
cpi_sp.columns = ['SP', 'CPI', 'Date']
In [31]:
# Adjust S&P based on CPI, relative to January 2016 US Dollar value
cpi2016 = float(cpi_sp[cpi_sp['Date'] == '2016-01-01']['CPI'])
cpi_sp['CPI_to2016'] = cpi2016/cpi_sp['CPI']
cpi_sp['SP_to2016'] = cpi_sp['SP'].multiply(cpi_sp['CPI_to2016'])
cpi_sp.drop(['CPI', 'SP'], axis = 1, inplace = True)
cpi_sp.reset_index(inplace = True, drop = True)
cpi_sp.head()
Out[31]:
In [32]:
# Plot monthly S&P (adjusted to 2016 dollars)
fig = plt.figure(figsize = (10,5))
axes = fig.add_subplot(1, 1, 1, facecolor ='gainsboro')
axes.plot(pd.to_datetime(cpi_sp['Date']), cpi_sp['SP_to2016'], c = 'k', linewidth = 3)
plt.xlabel('Year', fontsize = 20)
plt.ylabel('S&P 1500', fontsize = 20)
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
plt.tight_layout()
plt.show()
In [33]:
fig.savefig('1_SP1500.png', dpi = 300)
In [34]:
# Rename columns so the CPI/S&P data can be merged on each loan's approval date
cpi_sp.columns = ['ApprovalDate', 'CPI_to2016', 'SP_to2016']
# Merge loan data with CPI + S&P data
# Keep only the date and set it to first of the month in order to use CPI and SP data
loans_7a['ApprovalDate'] = loans_7a['ApprovalDate'].apply(lambda x:x.date())
loans_7a['ApprovalDate'] = loans_7a['ApprovalDate'].apply(lambda x: x.replace(day = 1))
loans_7a['ApprovalDate'] = pd.to_datetime(loans_7a['ApprovalDate'])
cpi_sp['ApprovalDate'] = pd.to_datetime(cpi_sp['ApprovalDate'])
We may be tempted to incorporate changes in the S&P index during the loan pay-off period into the model; however, those future values would not be known at the time a loan is approved, so using them would leak information that is unavailable at prediction time.
In [35]:
loans_7a = pd.merge(loans_7a, cpi_sp, on = 'ApprovalDate', how = 'left')
In [36]:
loans_7a.head()
Out[36]:
In [37]:
# Check for NaNs
loans_7a['SP_to2016'].isnull().sum()
Out[37]:
In [38]:
# Adjust all dollar amounts to 2016 US Dollar value
loans_7a['SBAGuaranteedApprovalAdj'] = loans_7a['SBAGuaranteedApproval'].multiply(loans_7a['CPI_to2016']).astype('float64')
loans_7a['GrossChargeOffAmountAdj'] = loans_7a['GrossChargeOffAmount'].multiply(loans_7a['CPI_to2016']).astype('float64')
In [39]:
# Get a sense of number of each Loan Status per approval year
clrs = ["#4878CF", "#6ACC65", "#D65F5F", "#B47CC7", "#C4AD66", "#77BEDB"]
newPal = dict(PIF = clrs[0], DEFAULT = clrs[2], CANCLD = clrs[1], COMMIT = clrs[3], EXEMPT = clrs[4])
g = sns.factorplot(x ='ApprovalFiscalYear', y = 'SBAGuaranteedApprovalAdj',
hue = 'LoanStatus', hue_order = ['PIF', 'DEFAULT', 'CANCLD', 'COMMIT', 'EXEMPT'],
kind = 'point', data = loans_7a, estimator = np.count_nonzero,
ci = None, size = 5, aspect = 3, palette = newPal, legend = False)
g.set_xticklabels(rotation = 30)
plt.title('7A Loans')
plt.xlabel('Approval Fiscal Year')
plt.ylabel('Number of Loans')
plt.legend(['Paid in Full', 'Defaulted', 'Cancelled', 'Committed', 'Exempt'], loc = 'best', fontsize = 20)
Out[39]:
In [40]:
g.savefig('2_AllLoans_Counts_FY.png', dpi = 300)
We see a clear spike in defaults for loans approved immediately before and during the U.S. financial crisis. This indicates that macroeconomic factors may be important.
In [41]:
# Group data by approval fiscal year
loans_7a_by_FY = loans_7a.groupby('ApprovalFiscalYear')
In [42]:
# Get a sense of the number of each Loan Status per approval year
g = sns.factorplot(x ='ApprovalFiscalYear', y = 'SBAGuaranteedApprovalAdj',
hue = 'LoanStatus', hue_order = ['PIF', 'DEFAULT', 'CANCLD', 'COMMIT', 'EXEMPT'],
kind = 'bar', data = loans_7a, estimator = np.count_nonzero,
ci = None, size = 5, aspect = 3, palette = newPal, legend = False)
g.set_xticklabels(rotation = 30)
plt.title('7A Loans')
plt.xlabel('Approval Fiscal Year')
plt.ylabel('Number of Loans')
plt.legend(['Paid in Full', 'Defaulted', 'Cancelled', 'Committed', 'Exempt'], loc = 'best', fontsize = 20)
Out[42]:
The plot below shows the actual amounts paid in full and charged off each year.
In [43]:
# Group by year and loan status
loans_7a_by_FY_by_LoanStatus = loans_7a.groupby(by = ['ApprovalFiscalYear', 'LoanStatus'], as_index = False).sum()
loans_7a_by_FY_by_LoanStatus.head()
Out[43]:
In [44]:
# Plot amount paid in full and amount charged off per approval year
fig = plt.figure(figsize = (16, 6))
ax = fig.add_subplot(1, 1, 1, facecolor ='gainsboro')
n_groups = loans_7a['ApprovalFiscalYear'].nunique()
sum_pif = loans_7a_by_FY_by_LoanStatus[loans_7a_by_FY_by_LoanStatus['LoanStatus']=='PIF']['SBAGuaranteedApprovalAdj']
sum_choff = loans_7a_by_FY_by_LoanStatus[loans_7a_by_FY_by_LoanStatus['LoanStatus']== 'DEFAULT']['GrossChargeOffAmountAdj']
x = np.arange(1991, 2017)
y = np.array(sum_pif)[:-1]
z = np.array(sum_choff)
ax.bar(x-0.4, y,width=0.5,color='b',align='center')
ax.bar(x, z,width=0.5,color='r',align='center')
plt.xlabel('Approval Fiscal Year', fontsize = 20)
plt.xticks(loans_7a['ApprovalFiscalYear'].unique(), rotation = 30)
plt.ylabel('2016 US Dollars', fontsize = 20)
plt.title('7A Loans', fontsize = 20)
plt.legend(labels = ['Paid in Full', 'Defaulted'], fontsize = 20)
plt.tight_layout()
plt.show()
In [45]:
fig.savefig('3_Completed_per_FY.png', dpi = 300)
The plot above may be too optimistic since it accounts for loans that were paid in full before they were due, and does not account for loans that have been awarded but may default in the future.
In [46]:
# Compute the maturity date and keep only loans whose maturity date has passed
# (the timedelta64[M] cast treats a month as its average length, so maturity dates are approximate)
loans_7a['MaturityDate'] = loans_7a['ApprovalDate'] + loans_7a['TermInMonths'].values.astype("timedelta64[M]")
loans_7a['MaturityYear'] = loans_7a['MaturityDate'].dt.year
loans_7a_matured = loans_7a[loans_7a['MaturityDate'] < '2017-07-01']
loans_7a_matured.reset_index(inplace = True, drop = True)
In [47]:
loans_7a_matured['MaturityYear'].isnull().sum()
Out[47]:
In [48]:
loans_7a_matured.info()
In [49]:
# Plot amount paid in full and amount charged off per approval year when accounting for matured loans only
n_groups = loans_7a_matured['ApprovalFiscalYear'].nunique()
loans_7a_by_AY_by_LoanStatus = loans_7a_matured.groupby(by = ['ApprovalFiscalYear', 'LoanStatus'], as_index = False).sum()
sum_pif = loans_7a_by_AY_by_LoanStatus[loans_7a_by_AY_by_LoanStatus['LoanStatus']=='PIF']['SBAGuaranteedApprovalAdj']
sum_def = loans_7a_by_AY_by_LoanStatus[loans_7a_by_AY_by_LoanStatus['LoanStatus']=='DEFAULT']['GrossChargeOffAmountAdj']
In [50]:
fig = plt.figure(figsize = (16, 6))
ax = fig.add_subplot(1, 1, 1, facecolor = "gainsboro")
sns.set_context('poster', font_scale = 1.2)
x = np.arange(1991, 2017)
y = np.array(sum_pif)[:-1]
z = np.array(sum_def)
ax.bar(x-0.4, y,width=0.5,color='b',align='center')
ax.bar(x, z,width=0.5,color='r',align='center')
plt.xlabel('Approval Fiscal Year')
plt.xticks(loans_7a_matured['ApprovalFiscalYear'].unique(), rotation = 30)
plt.ylabel('2016 US Dollars', fontsize = 20)
plt.title('SBA 7A Loans (Matured)', fontsize = 20)
plt.legend(labels = ['Paid in Full', 'Defaulted'], fontsize = 20)
plt.tight_layout()
plt.show()
In [51]:
fig.savefig('4_Matured_per_FY.png', dpi = 300)
In [52]:
loans_7a_matured['MaturityYear'].unique()
Out[52]:
In [53]:
loans_7a_matured['NaicsCode'].nunique()
Out[53]:
In [54]:
# Missing NAICS codes?
loans_7a_matured['NaicsCode'].isnull().sum()/len(loans_7a_matured)
Out[54]:
In [55]:
code_groups = {'Agriculture': 11, 'Mining' : 21, 'Utilities' : 22,
'Construction' : 23, 'Manufacturing' : [31, 32, 33], 'Wholesale Trade' : 42,
'Retail Trade' : [44, 45], 'Transportation & Warehousing' : [48, 49],
'Information' : 51, 'Finance & Insurance' : 52,
'Real Estate Rental & Leasing' : 53,
'Professional, Scientific, & Technical Services' : 54,
'Management of Companies & Enterprises': 55,
'Administrative/Support/Waste Management' : 56,
'Educational Services' : 61,
'Health Care & Social Assistance' : 62,
'Arts, Entertainment, & Recreation' : 71,
'Accommodation & Food Services' : 72,
'Other Services' : 81,
'Public Administration' : 92}
In [56]:
# Get the "Sector" variable from NAICS codes
# Only the first two digits of NAICS code are relevant to group into 20 sectors
loans_7a_matured['Sector'] = (loans_7a_matured['NaicsCode'].apply(lambda x:x/10000))
loans_7a_matured['Sector'].fillna(0, inplace = True)
loans_7a_matured['Sector'] = (loans_7a_matured['Sector'].apply(int))
# Build a lookup table mapping each 2-digit code to its sector name
list_codes = []
list_cat = []
for cat, codes in code_groups.items():
    if isinstance(codes, int):
        codes = [codes]
    for code in codes:
        list_codes.append(code)
        list_cat.append(cat)
code_cat = pd.DataFrame(list_codes, columns = ['Code'])
code_cat['Sector'] = list_cat
# Rename the 2-digit prefix column to 'Code' so it can be merged with the lookup table
loans_7a_matured.rename(columns = {'Sector': 'Code'}, inplace = True)
loans_7a_matured = pd.merge(loans_7a_matured, code_cat, on = 'Code', how = 'left')
loans_7a_matured.head()
Out[56]:
In [57]:
loans_7a_matured['Sector'].isnull().sum()
Out[57]:
In [58]:
loans_7a_matured.dropna(subset = ['Sector'], inplace = True)
In [59]:
loans_7a_matured.reset_index(inplace = True, drop=True)
In [60]:
# Keep only paid in full or defaulted loans
loans_7a_matured = loans_7a_matured[(loans_7a_matured['LoanStatus']=='PIF') |
(loans_7a_matured['LoanStatus']=='DEFAULT')]
In [61]:
#Group matured loans by industry sector
by_sector = loans_7a_matured.groupby('Sector').sum()
by_sector.head()
Out[61]:
In [62]:
# Look at total guaranteed and total charged off amount per sector
fig = plt.figure(figsize=(16,6))
axes = fig.add_subplot(1, 1, 1, facecolor = 'gainsboro')
axes.plot(by_sector['SBAGuaranteedApprovalAdj'], by_sector['GrossChargeOffAmountAdj'], 'ro')
plt.xlabel('Total SBA Guaranteed Amount per Sector \n(2016 US Dollars)')
plt.ylabel('Total Charged Off Amount per Sector \n(2016 US Dollars)')
plt.tight_layout()
Not surprisingly, industry sectors that receive more help also default on higher amounts!
In [63]:
by_sector.reset_index(inplace = True)
In [64]:
# Look at percent defaulted (in terms of US $) per sector
by_sector['Percent Defaulted'] = (by_sector['GrossChargeOffAmountAdj']/ by_sector['SBAGuaranteedApprovalAdj'])*100
In [65]:
# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(8, 7))
pco = by_sector.sort_values("Percent Defaulted", ascending=False)
# Plot the total % defaulted (in dollars)
sns.set_color_codes("pastel")
sns.barplot(y = "Sector", x = "Percent Defaulted", data=pco, palette="Reds_d")
ax.set_xlabel('Percent Defaulted ($)')
ax.set_ylabel('')
ax.set_title('Industry Sector')
Out[65]:
In [66]:
# Number of loans per sector
by_sector_ct = loans_7a_matured.groupby('Sector').agg(lambda ts: (ts > 0).sum())
by_sector_ct.reset_index(inplace = True)
In [67]:
# Look at percent defaulted per sector (in terms of the number of loans)
by_sector_ct['Percent Defaulted'] = (by_sector_ct['GrossChargeOffAmountAdj']/ by_sector_ct['SBAGuaranteedApprovalAdj'])*100
In [68]:
# Look at percent defaulted per sector (in terms of the total number of loans)
# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(8, 7))
pco = by_sector_ct.sort_values("Percent Defaulted", ascending=False)
# plot the total percent defaulted (in number of loans)
sns.set_color_codes("pastel")
sns.barplot(y="Sector", x= "Percent Defaulted", data=pco, palette="Reds_d")
ax.set_xlabel('Percent Defaulted (# of Loans)')
ax.set_ylabel('')
ax.set_title('Industry Sector')
Out[68]:
In [69]:
f.savefig('Sector.png', dpi = 300)
In [70]:
# Examine state information (borrower location)
borr_states = set(loans_7a_matured['BorrState'].unique())
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
# US territories
borr_states.difference(states)
# Count all the "odd" ones (not one of the 50 states or DC)
for s in borr_states.difference(states):
    print((loans_7a_matured['BorrState'] == s).sum())
# PR has 9834 loans, all others < 1000
print((loans_7a_matured['BorrState']).isnull().sum())
In [71]:
# Number of missing values for states
print((loans_7a_matured['BorrState']).isnull().sum())
# Drop the territories except for PR
for s in {'FM', 'MP', 'GU', 'MH', 'PW', 'AE', 'VI', 'AS', 'TQ'}:
    loans_7a_matured = loans_7a_matured[loans_7a_matured['BorrState'] != s]
# Drop missing values
loans_7a_matured = loans_7a_matured[loans_7a_matured['BorrState'].notnull()]
In [72]:
loans_7a_matured.info()
In [73]:
# One-hot encoding: state, year approved, sector, business type, loan status
loans_7a_matured = pd.concat([loans_7a_matured, pd.get_dummies(loans_7a_matured['BorrState'], drop_first=True)], axis = 1)
loans_7a_matured = pd.concat([loans_7a_matured, pd.get_dummies(loans_7a_matured['Sector'], drop_first=True)], axis = 1)
loans_7a_matured = pd.concat([loans_7a_matured, pd.get_dummies(loans_7a_matured['LoanStatus'], drop_first = True)], axis = 1)
loans_7a_matured = pd.concat([loans_7a_matured, pd.get_dummies(loans_7a_matured['BusinessType'], drop_first = True)], axis = 1)
loans_7a_matured = pd.concat([loans_7a_matured, pd.get_dummies(loans_7a_matured['ApprovalFiscalYear'], drop_first = True)], axis = 1)
loans_7a_matured.reset_index(inplace = True, drop = True)
In [74]:
list(loans_7a_matured.columns)
Out[74]:
In [75]:
# Drop those columns that we won't use in the (binary classification) model
loans_7a_matured.drop(['BorrState', 'SBAGuaranteedApproval', 'ApprovalDate', 'FirstDisbursementDate',
'NaicsCode', 'BusinessType', 'LoanStatus', 'ChargeOffDate', 'GrossChargeOffAmount',
'CPI_to2016','GrossChargeOffAmountAdj', 'MaturityDate','MaturityYear', 'Code', 'Sector', ], axis = 1, inplace = True)
In [76]:
#Save for later use
pd.to_pickle(loans_7a_matured, 'loans_7a_matured')