01: Building a pandas Cheat Sheet, Part 1

Use the csv I've attached to answer the following questions:

  • Import pandas with the right name
  • Set all graphics from matplotlib to display inline
  • Read the csv in (it should be UTF-8 already, so you don't have to worry about encoding) and save it with the proper boring name
  • Display the names of the columns in the csv
  • Display the first 3 animals.
  • Sort the animals to see the 3 longest animals.
  • What are the counts of the different values of the "animal" column? a.k.a. how many cats and how many dogs.
  • Only select the dogs.
  • Display all of the animals that are greater than 40 cm.
  • 'length' is the animal's length in cm. Create a new column called inches that is the length in inches.
  • Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."
  • Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.
  • What's the mean length of a cat?
  • What's the mean length of a dog?
  • Use groupby to accomplish both of the above tasks at once.
  • Make a histogram of the length of dogs. I apologize that it is so boring.
  • Change your graphing style to be something else (anything else!)
  • Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)
  • Make a sorted horizontal bar graph of the cats, with the larger cats on top.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline



In [3]:
df=pd.read_csv("/home/sean/Downloads/07-hw-animals.csv")

In [4]:
print(df.columns.values)


['animal' 'name' 'length']
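
The task list also asks for the first 3 animals, which none of the cells here display. A minimal sketch, assuming a DataFrame built inline from the six rows shown in Out[10] as a stand-in for the csv (the variable name first_three is just for illustration):

```python
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# head(n) returns the first n rows in their original order.
first_three = df.head(3)
print(first_three)
```

With the real file, df.head(3) on the dataframe from In [3] should do the same thing.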

In [5]:
df.sort_values(by='length', ascending=False).head(3)


Out[5]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45

In [6]:
df['animal'].value_counts()


Out[6]:
dog    3
cat    3
Name: animal, dtype: int64

In [7]:
df['animal']=='dog'
# this only gives a boolean Series (True/False for each row), not the matching rows


Out[7]:
0    False
1    False
2     True
3     True
4    False
5     True
Name: animal, dtype: bool

In [8]:
dogs=df[df['animal']=='dog']
# using that boolean Series as a filter gives a DataFrame with only the dog rows

In [9]:
dogs


Out[9]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35
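
The "greater than 40 cm" task is not answered by any cell here. It follows the same boolean-filter pattern as the dogs selection, so a minimal sketch might look like this, again assuming an inline stand-in for the csv built from the rows in Out[10] (the name over_40 is hypothetical):

```python
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Compare the numeric column to 40, then use the boolean result as a row filter.
over_40 = df[df["length"] > 40]
print(over_40)
```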

In [10]:
df['inches']=df['length']/2.54
df


Out[10]:
  animal        name  length     inches
0    cat        Anne      35  13.779528
1    cat         Bob      45  17.716535
2    dog  Egglesburg      65  25.590551
3    dog       Devon      50  19.685039
4    cat     Charlie      32  12.598425
5    dog    Fontaine      35  13.779528

In [11]:
cats=df[df['animal']=='cat']

In [12]:
cats[cats['inches']>12]


Out[12]:
  animal     name  length     inches
0    cat     Anne      35  13.779528
1    cat      Bob      45  17.716535
4    cat  Charlie      32  12.598425

In [18]:
df[(df['animal']=='cat') & (df['inches']>12)]


Out[18]:
  animal     name  length     inches
0    cat     Anne      35  13.779528
1    cat      Bob      45  17.716535
4    cat  Charlie      32  12.598425
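
The parentheses in In [18] matter: in Python, & binds more tightly than == and >, so leaving them out changes what gets compared. One way to sidestep that is to name each condition first. A sketch, assuming the same inline stand-in data as Out[10]; is_cat, is_long and long_cats are made-up names:

```python
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})
df["inches"] = df["length"] / 2.54

# Each condition is its own boolean Series, one True/False per row.
is_cat = df["animal"] == "cat"
is_long = df["inches"] > 12

# & combines the two Series row by row; only rows where both are True survive.
long_cats = df[is_cat & is_long]
print(long_cats)
```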

In [19]:
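# numeric_only=True skips the text columns; pandas 2.0+ raises a TypeError without it
df[df['animal']=='dog'].mean(numeric_only=True)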


Out[19]:
length    50.000000
inches    19.685039
dtype: float64

In [20]:
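# numeric_only=True skips the text columns; pandas 2.0+ raises a TypeError without it
df[df['animal']=='cat'].mean(numeric_only=True)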


Out[20]:
length    37.333333
inches    14.698163
dtype: float64

In [23]:
df.groupby('animal').describe()


Out[23]:
                 inches     length
animal
cat    count   3.000000   3.000000
       mean   14.698163  37.333333
       std     2.679866   6.806859
       min    12.598425  32.000000
       25%    13.188976  33.500000
       50%    13.779528  35.000000
       75%    15.748031  40.000000
       max    17.716535  45.000000
dog    count   3.000000   3.000000
       mean   19.685039  50.000000
       std     5.905512  15.000000
       min    13.779528  35.000000
       25%    16.732283  42.500000
       50%    19.685039  50.000000
       75%    22.637795  57.500000
       max    25.590551  65.000000

In [21]:
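# numeric_only=True skips the 'name' column; pandas 2.0+ raises a TypeError without it
df.groupby('animal').mean(numeric_only=True)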


Out[21]:
           length     inches
animal
cat     37.333333  14.698163
dog     50.000000  19.685039
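
If only the mean length is wanted, picking the column before aggregating keeps the text columns out of the calculation entirely. A sketch under the same assumption of an inline stand-in for the csv (mean_length is a hypothetical name):

```python
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Group by animal, select just the length column, then average each group.
mean_length = df.groupby("animal")["length"].mean()
print(mean_length)
```

The result is a Series indexed by animal, so mean_length["cat"] and mean_length["dog"] answer the two mean questions in one go.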

In [24]:
dogs=df[df['animal']=='dog']

In [25]:
dogs['length'].hist()


Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c439ca470>
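
The "change your graphing style" task has no cell of its own. A minimal sketch, assuming the built-in "ggplot" style (any name from plt.style.available should work) and the same inline stand-in data as Out[10]:

```python
import matplotlib
matplotlib.use("Agg")  # only for running outside a notebook; in Jupyter, %matplotlib inline covers this
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# List the style names this matplotlib install knows about.
print(plt.style.available)

# Switch styles; every plot drawn after this line picks up the new look.
plt.style.use("ggplot")

dogs = df[df["animal"] == "dog"]
ax = dogs["length"].hist()
```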

In [26]:
df['length'].plot(kind='bar')


Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c438d4da0>

In [27]:
df.plot(kind='bar', x='name', y='length')


Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c4182b198>

In [29]:
df.plot(kind='barh', x='name', y='length', legend=False)


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c4171e668>

In [33]:
df[df['animal']=='cat'].sort_values(by='length').plot(kind='barh', x='name', y='length')


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c413dec88>
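
Why the ascending sort in In [33] puts the larger cats on top: barh draws the first row at the bottom of the chart and works upward, so the last (largest) row ends up highest. A sketch of the same idea with the sort pulled out into its own step, assuming the inline stand-in data from Out[10] (cats_sorted is a made-up name):

```python
import matplotlib
matplotlib.use("Agg")  # only for running outside a notebook; in Jupyter, %matplotlib inline covers this
import pandas as pd

# Stand-in for the csv: the six rows shown in Out[10] of this notebook.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Ascending sort: shortest cat first, longest cat last.
cats_sorted = df[df["animal"] == "cat"].sort_values(by="length")
print(cats_sorted)

# barh stacks rows from the bottom up, so the last row (the longest cat) is drawn on top.
ax = cats_sorted.plot(kind="barh", x="name", y="length", legend=False)
```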
