Building a pandas Cheat Sheet, Part 1
Import pandas with the right name
In [11]:
import pandas as pd
In [12]:
df = pd.read_csv("07-hw-animals.csv")
Set all graphics from matplotlib to display inline
In [13]:
#!pip install matplotlib
In [14]:
import matplotlib.pyplot as plt
%matplotlib inline
# This makes your graphs show up inside the notebook
In [15]:
df
Out[15]:
Display the names of the columns in the csv
In [16]:
df.columns
Out[16]:
Display the first 3 animals.
In [17]:
df.head(3)
Out[17]:
Sort the animals to see the 3 longest animals.
In [18]:
df.sort_values('length', ascending=False).head(3)
Out[18]:
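As a possible shortcut for the sort above, `nlargest` picks the top rows in one call. This is only a sketch: the rows below are made-up stand-ins for the CSV (which is not reproduced here), and only the column names `animal`, `name`, and `length` come from the notebook.

```python
import pandas as pd

# Hypothetical rows standing in for 07-hw-animals.csv
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "name": ["Anne", "Bob", "Devon", "Fontaine"],
    "length": [35, 45, 65, 50],
})

# Sort from longest to shortest, then keep the first three rows
top_sorted = animals.sort_values("length", ascending=False).head(3)

# nlargest returns the three rows with the largest "length" directly
top_nlargest = animals.nlargest(3, "length")
```

Both give the same rows when there are no ties in `length`.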
What are the counts of the different values of the "animal" column? In other words, how many cats and how many dogs?
In [19]:
df['animal'].value_counts()
Out[19]:
Only select the dogs.
In [20]:
is_dog = df['animal'] == 'dog'  # a True/False mask, one value per row (not a dataframe)
df[is_dog]
Out[20]:
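It may help to see what the comparison produces before it is used for filtering. A minimal sketch, assuming made-up rows in place of the CSV (the names `animals`, `dogs_only`, and `dogs_loc` are hypothetical):

```python
import pandas as pd

# Hypothetical rows standing in for 07-hw-animals.csv
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "name": ["Anne", "Bob", "Devon", "Fontaine"],
    "length": [35, 45, 65, 50],
})

# The comparison gives a boolean Series: True where the row is a dog
is_dog = animals["animal"] == "dog"

# Indexing with the mask keeps only the True rows
dogs_only = animals[is_dog]

# .loc with the same mask is an equivalent, more explicit spelling
dogs_loc = animals.loc[is_dog]
```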
Display all of the animals that are longer than 40 cm.
In [21]:
long_animals = df['length'] > 40
df[long_animals]
Out[21]:
In [22]:
# 1 inch = 2.54 cm, so divide the length in cm by 2.54
df['length_inches'] = df['length'] / 2.54
df
Out[22]:
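The cell above changes `df` in place. If you would rather leave the original untouched, `assign` returns a new dataframe with the extra column. A sketch under the assumption of made-up rows (`animals` and `with_inches` are hypothetical names):

```python
import pandas as pd

# Hypothetical rows standing in for 07-hw-animals.csv (lengths in cm)
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "name": ["Anne", "Bob", "Devon", "Fontaine"],
    "length": [35, 45, 65, 50],
})

CM_PER_INCH = 2.54

# assign builds a copy with the new column; animals itself is unchanged
with_inches = animals.assign(length_inches=animals["length"] / CM_PER_INCH)
```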
Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."
In [23]:
# These are True/False masks; use df[cats] and df[dogs] to get the matching rows
cats = df['animal'] == 'cat'
dogs = df['animal'] == 'dog'
Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.
In [24]:
long_animals = df['length_inches'] > 12
df[cats & long_animals]
Out[24]:
In [25]:
df[(df['length_inches'] > 12) & (df['animal'] == 'cat')]
#Amazing!
Out[25]:
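The parentheses in the cell above matter: `&` binds more tightly than `>` and `==`, so without them Python groups the expression differently and the filter typically fails with an error. A small sketch on made-up rows, with `query` shown as one alternative spelling (the 40 cm cutoff and the names `long_cats` and `same_rows` are only for illustration):

```python
import pandas as pd

# Hypothetical rows standing in for 07-hw-animals.csv
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "name": ["Anne", "Bob", "Devon", "Fontaine"],
    "length": [35, 45, 65, 50],
})

# Each condition is wrapped in parentheses, then combined with &
long_cats = animals[(animals["length"] > 40) & (animals["animal"] == "cat")]

# query takes the same conditions as a string and uses plain "and"
same_rows = animals.query("length > 40 and animal == 'cat'")
```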
In [26]:
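# Mean of each numeric column for the cats; numeric_only skips text columns like name
df[cats].mean(numeric_only=True)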
Out[26]:
In [27]:
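# Mean of each numeric column for the dogs; numeric_only skips text columns like name
df[dogs].mean(numeric_only=True)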
Out[27]:
Use groupby to accomplish both of the above tasks at once.
In [28]:
df.groupby('animal').mean(numeric_only=True)
# One row per animal type; numeric_only skips text columns like name
Out[28]:
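To see that groupby really does both filters at once, here is a sketch on made-up rows comparing the per-animal means computed by hand with the grouped result (`cat_mean`, `dog_mean`, and `means` are hypothetical names):

```python
import pandas as pd

# Hypothetical rows standing in for 07-hw-animals.csv
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "name": ["Anne", "Bob", "Devon", "Fontaine"],
    "length": [35, 45, 65, 50],
})

# One filter per animal type, then the mean of the length column
cat_mean = animals.loc[animals["animal"] == "cat", "length"].mean()
dog_mean = animals.loc[animals["animal"] == "dog", "length"].mean()

# groupby splits the rows by animal and computes every group's mean in one call
means = animals.groupby("animal")["length"].mean()
```

Selecting `["length"]` before `.mean()` is another way to keep text columns such as `name` out of the calculation.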
Make a histogram of the length of dogs. I apologize that it is so boring.
In [29]:
df[dogs].plot.hist(y='length_inches')
Out[29]:
Change your graphing style to be something else (anything else!)
In [30]:
plt.style.use('ggplot')  # switch matplotlib to a different built-in style
df[dogs].plot.bar(x='name', y='length_inches')
Out[30]:
Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)
In [31]:
df.plot.barh(x='name', y='length_inches')
#Fontaine is such an annoying name for a dog
Out[31]:
Make a sorted horizontal bar graph of the cats, with the larger cats on top.
In [34]:
# barh draws the first row at the bottom, so sort ascending to put the largest cat on top
df[cats].sort_values('length_inches').plot(kind='barh', x='name', y='length_inches')
#df[df['animal'] == 'cat'].sort_values(by='length').plot(kind='barh', x='name', y='length', legend=False)
Out[34]:
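The sort direction for a horizontal bar graph is easy to get backwards, because the first row of the dataframe is drawn at the bottom. A sketch on made-up cats that reads the labels back off the axis; the `Agg` backend is an assumption so that it also runs outside a notebook, and the names and lengths are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

# Hypothetical cats standing in for the rows in 07-hw-animals.csv
cat_rows = pd.DataFrame({
    "name": ["Anne", "Bob", "Charlie"],
    "length": [35, 45, 32],
})

# Ascending sort: shortest cat first, longest cat last
ax = cat_rows.sort_values("length").plot(kind="barh", x="name", y="length", legend=False)

# Tick labels come back in drawing order, from the bottom of the graph to the top
labels_bottom_to_top = [label.get_text() for label in ax.get_yticklabels()]
```

With the ascending sort, the longest cat is the last label, which is the bar at the top.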