Pandas Exercises

Time to test your new pandas skills! Use the two csv files in this folder to complete the tasks in bold below!

NOTE: ALL TASKS MUST BE DONE IN ONE LINE OF PANDAS CODE. GOT STUCK? NO PROBLEM! CHECK OUT THE SOLUTIONS LECTURE!

Import pandas and read in the banklist.csv file into a dataframe called banks.


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('./banklist.csv')

Show the head of the dataframe


In [3]:
# CODE HERE
df.head()


Out[3]:
Bank Name City ST CERT Acquiring Institution Closing Date Updated Date
0 Fayette County Bank Saint Elmo IL 1802 United Fidelity Bank, fsb 26-May-17 1-Jun-17
1 Guaranty Bank, (d/b/a BestBank in Georgia & Mi... Milwaukee WI 30003 First-Citizens Bank & Trust Company 5-May-17 1-Jun-17
2 First NBC Bank New Orleans LA 58302 Whitney Bank 28-Apr-17 23-May-17
3 Proficio Bank Cottonwood Heights UT 35495 Cache Valley Bank 3-Mar-17 18-May-17
4 Seaway Bank and Trust Company Chicago IL 19328 State Bank of Texas 27-Jan-17 18-May-17

What are the column names?


In [4]:
# CODE HERE
df.columns


Out[4]:
Index(['Bank Name', 'City', 'ST', 'CERT', 'Acquiring Institution',
       'Closing Date', 'Updated Date'],
      dtype='object')

How many States (ST) are represented in this data set?


In [5]:
# CODE HERE
df['ST'].unique().shape[0]
# or
# len(df['ST'].unique())


Out[5]:
44

Get a list or array of all the states in the data set.


In [6]:
# CODE HERE
df['ST'].unique()


Out[6]:
array(['IL', 'WI', 'LA', 'UT', 'NJ', 'AR', 'GA', 'PA', 'TN', 'WA', 'CO',
       'PR', 'FL', 'MN', 'CA', 'MD', 'OK', 'OH', 'SC', 'VA', 'ID', 'TX',
       'CT', 'AZ', 'NV', 'NC', 'KY', 'MO', 'KS', 'AL', 'MI', 'IN', 'IA',
       'NE', 'MS', 'NM', 'OR', 'NY', 'MA', 'SD', 'WY', 'WV', 'NH', 'HI'], dtype=object)

What are the top 5 states with the most failed banks?


In [7]:
# CODE HERE
df['ST'].value_counts().head()


Out[7]:
GA    93
FL    75
IL    67
CA    41
MN    23
Name: ST, dtype: int64

What are the top 5 acquiring institutions?


In [8]:
# CODE HERE
df['Acquiring Institution'].value_counts().head()


Out[8]:
No Acquirer                            31
State Bank and Trust Company           12
First-Citizens Bank & Trust Company    11
Ameris Bank                            10
U.S. Bank N.A.                          9
Name: Acquiring Institution, dtype: int64

How many banks has the State Bank of Texas acquired? How many of them were actually in Texas?


In [9]:
# CODE HERE
df[df['Acquiring Institution'] == 'State Bank of Texas']


Out[9]:
Bank Name City ST CERT Acquiring Institution Closing Date Updated Date
4 Seaway Bank and Trust Company Chicago IL 19328 State Bank of Texas 27-Jan-17 18-May-17
21 The National Republic Bank of Chicago Chicago IL 916 State Bank of Texas 24-Oct-14 6-Jan-16
450 Millennium State Bank of Texas Dallas TX 57667 State Bank of Texas 2-Jul-09 26-Oct-12

In [10]:
len(df[df['Acquiring Institution'] == 'State Bank of Texas'])


Out[10]:
3

In [11]:
len(df[(df['Acquiring Institution'] == 'State Bank of Texas') & (df['ST'] == 'TX')])


Out[11]:
1

What is the most common city in California for a bank to fail in?


In [12]:
# CODE HERE
df[df['ST'] == 'CA']['City'].value_counts()[0]


Out[12]:
4

How many failed banks don't have the word "Bank" in their name?


In [13]:
# CODE HERE
len(df[~df['Bank Name'].str.contains('Bank')]['Bank Name'].unique())


Out[13]:
14

How many bank names start with the letter 's' ?


In [14]:
# CODE HERE
df['Bank Name'].apply(lambda name: name[0].lower() == 's').value_counts()


Out[14]:
False    498
True      53
Name: Bank Name, dtype: int64

How many CERT values are above 20000 ?


In [15]:
# CODE HERE
sum(df['CERT'] > 20000)


Out[15]:
417

How many bank names consist of just two words? (e.g. "First Bank" , "Bank Georgia" )


In [16]:
# CODE HERE
sum(df['Bank Name'].apply(lambda name: len(name.split()) == 2))


Out[16]:
114

Bonus: How many banks closed in the year 2008? (this is hard because we technically haven't learned about time series with pandas yet! Feel free to skip this one!


In [17]:
# CODE HERE
sum(df['Closing Date'].apply(lambda date: date[-2:] == '08'))


Out[17]:
25

GREAT JOB!