Time to test your new pandas skills! Use the two csv files in this folder to complete the tasks in bold below!
NOTE: ALL TASKS MUST BE DONE IN ONE LINE OF PANDAS CODE. GOT STUCK? NO PROBLEM! CHECK OUT THE SOLUTIONS LECTURE!
Import pandas and read in the banklist.csv file into a dataframe called banks.
In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('./banklist.csv')
Show the head of the dataframe
In [3]:
# CODE HERE
df.head()
Out[3]:
What are the column names?
In [4]:
# CODE HERE
df.columns
Out[4]:
How many States (ST) are represented in this data set?
In [5]:
# CODE HERE
df['ST'].unique().shape[0]
# or
# len(df['ST'].unique())
Out[5]:
Get a list or array of all the states in the data set.
In [6]:
# CODE HERE
df['ST'].unique()
Out[6]:
What are the top 5 states with the most failed banks?
In [7]:
# CODE HERE
df['ST'].value_counts().head()
Out[7]:
What are the top 5 acquiring institutions?
In [8]:
# CODE HERE
df['Acquiring Institution'].value_counts().head()
Out[8]:
How many banks has the State Bank of Texas acquired? How many of them were actually in Texas?
In [9]:
# CODE HERE
df[df['Acquiring Institution'] == 'State Bank of Texas']
Out[9]:
In [10]:
len(df[df['Acquiring Institution'] == 'State Bank of Texas'])
Out[10]:
In [11]:
len(df[(df['Acquiring Institution'] == 'State Bank of Texas') & (df['ST'] == 'TX')])
Out[11]:
What is the most common city in California for a bank to fail in?
In [12]:
# CODE HERE
df[df['ST'] == 'CA']['City'].value_counts()[0]
Out[12]:
How many failed banks don't have the word "Bank" in their name?
In [13]:
# CODE HERE
len(df[~df['Bank Name'].str.contains('Bank')]['Bank Name'].unique())
Out[13]:
How many bank names start with the letter 's' ?
In [14]:
# CODE HERE
df['Bank Name'].apply(lambda name: name[0].lower() == 's').value_counts()
Out[14]:
How many CERT values are above 20000 ?
In [15]:
# CODE HERE
sum(df['CERT'] > 20000)
Out[15]:
How many bank names consist of just two words? (e.g. "First Bank" , "Bank Georgia" )
In [16]:
# CODE HERE
sum(df['Bank Name'].apply(lambda name: len(name.split()) == 2))
Out[16]:
Bonus: How many banks closed in the year 2008? (this is hard because we technically haven't learned about time series with pandas yet! Feel free to skip this one!
In [17]:
# CODE HERE
sum(df['Closing Date'].apply(lambda date: date[-2:] == '08'))
Out[17]: