01: Building a pandas Cheat Sheet, Part 1
Use the CSV I've attached to answer the following questions.
Import pandas with the right name
In [1]:
# !workon dataanalysis
import pandas as pd
Having matplotlib play nice with virtual environments
The matplotlib library has some issues when you’re using a Python 3 virtual environment. The error looks like this:
RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are Working with Matplotlib in a virtual enviroment see ‘Working with Matplotlib in Virtual environments’ in the Matplotlib FAQ
Luckily it’s an easy fix. Run this line in the terminal:
mkdir -p ~/.matplotlib && echo 'backend: TkAgg' >> ~/.matplotlib/matplotlibrc
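This appends a line to matplotlib's configuration file (~/.matplotlib/matplotlibrc) that sets the backend to TkAgg, which renders figures with the Agg engine and shows them in a Tk window instead of using the Mac OS X backend.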
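To check that the setting took effect, something like the following should work. It is only a quick sanity check, and it assumes matplotlib is importable in the active environment.

```python
import matplotlib

# Path of the matplotlibrc file matplotlib actually loaded
config_path = matplotlib.matplotlib_fname()
# Name of the backend currently selected (e.g. the one set in matplotlibrc)
backend_name = matplotlib.get_backend()

print(config_path)
print(backend_name)
```

If the printed path is not the file edited above, a different matplotlibrc is taking precedence.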
Set all graphics from matplotlib to display inline
In [29]:
import matplotlib.pyplot as plt
# Display matplotlib graphics inline with the notebook as opposed to in a pop-up window
%matplotlib inline
Read the csv in (it should be UTF-8 already so you don't have to worry about encoding), save it with the proper boring name
In [30]:
df = pd.read_csv('07-hw-animals.csv')
In [32]:
df
Out[32]:
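A small sketch of the same read step, for trying things out without the attached file. The CSV text below is a made-up stand-in (the column names animal, name and length match the homework file, the rows are hypothetical).

```python
import io

import pandas as pd

# Hypothetical stand-in for 07-hw-animals.csv; these rows are invented
csv_text = "animal,name,length\ncat,Whiskers,35\ndog,Rex,65\ncat,Tom,32\n"

# read_csv accepts a path or any file-like object; the encoding defaults to UTF-8
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names as a plain list
print(sample.dtypes)         # the type pandas inferred for each column
```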
In [25]:
# Display the names of the columns in the csv
In [31]:
df.columns
Out[31]:
Display the first 3 animals.
In [6]:
df.head(3)
Out[6]:
In [26]:
# Sort the animals to see the 3 longest animals.
In [8]:
df.sort_values('length', ascending=False).head(3)
Out[8]:
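For the cheat sheet: sorting then taking head(3) and calling nlargest should give the same top three. A sketch on invented data (the names and lengths are hypothetical):

```python
import pandas as pd

# Hypothetical animals, same column names as the homework CSV
animals = pd.DataFrame({
    "animal": ["cat", "dog", "cat", "dog"],
    "name": ["Whiskers", "Rex", "Tom", "Fido"],
    "length": [35, 65, 32, 50],
})

# Option 1: sort descending, then keep the first three rows
by_sort = animals.sort_values("length", ascending=False).head(3)

# Option 2: ask directly for the three largest values of the column
by_nlargest = animals.nlargest(3, "length")

print(by_sort)
print(by_nlargest)
```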
In [27]:
# What are the counts of the different values of the "animal" column? a.k.a. how many cats and how many dogs.
# Only select the dogs.
In [10]:
df['animal'].value_counts()
Out[10]:
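The two prompts above are easy to mix up, so here is a sketch that keeps them apart: counting each value of a column versus filtering rows with a boolean mask. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical animals, same column names as the homework CSV
animals = pd.DataFrame({
    "animal": ["cat", "dog", "cat", "dog", "cat"],
    "name": ["Whiskers", "Rex", "Tom", "Fido", "Kit"],
    "length": [35, 65, 32, 50, 25],
})

# Counts per distinct value: one entry for "cat", one for "dog"
counts = animals["animal"].value_counts()

# A comparison gives a boolean Series; value_counts on it counts True/False,
# which is not the same thing as counting cats and dogs by name
is_dog = animals["animal"] == "dog"
true_false_counts = is_dog.value_counts()

# Using the mask to index the frame keeps only the dog rows
only_dogs = animals[is_dog]

print(counts)
print(true_false_counts)
print(only_dogs)
```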
In [28]:
# Display all of the animals that are greater than 40 cm.
In [12]:
df[df['length'] > 40]
Out[12]:
'length' is the animal's length in cm. Create a new column called inches that is the length in inches.
In [46]:
# 1 cm is about 0.3937 in; the column is named 'length (in.)' here rather than 'inches'
length_in = df['length'] * 0.3937
df['length (in.)'] = length_in
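Since an inch is defined as exactly 2.54 cm, dividing by 2.54 is the exact version of multiplying by 0.3937. A sketch comparing the two on made-up lengths (the constant name CM_PER_INCH is my own):

```python
import pandas as pd

CM_PER_INCH = 2.54  # exact by definition

# Hypothetical lengths in cm
animals = pd.DataFrame({"name": ["Whiskers", "Rex"], "length": [35, 65]})

# Rounded multiplier, as used in the cell above
approx_inches = animals["length"] * 0.3937
# Exact conversion
exact_inches = animals["length"] / CM_PER_INCH

animals["length (in.)"] = exact_inches

# Largest disagreement between the two conversions
max_diff = (approx_inches - exact_inches).abs().max()
print(animals)
print(max_diff)
```

For animal-sized lengths the difference is far below anything a ruler would show, so either form is fine.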
Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."
In [14]:
dogs = df[df['animal'] == 'dog']
cats = df[df['animal'] == 'cat']
Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.
In [15]:
cats[cats['length (in.)'] > 12]
Out[15]:
In [16]:
df[(df['length (in.)'] > 12) & (df['animal'] == 'cat')]
Out[16]:
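A sketch of both routes on invented data, to show they select the same rows. Note that a bare comparison only returns True/False values; it has to be used as an index to get rows back, and each condition needs its own parentheses when combined with &.

```python
import pandas as pd

# Hypothetical animals; lengths are in cm, as in the homework CSV
animals = pd.DataFrame({
    "animal": ["cat", "dog", "cat", "dog", "cat"],
    "name": ["Whiskers", "Rex", "Tom", "Fido", "Kit"],
    "length": [35, 65, 32, 50, 25],
})
animals["length (in.)"] = animals["length"] / 2.54

# Route 1: filter the pre-split cats frame
cats = animals[animals["animal"] == "cat"]
long_cats_from_cats = cats[cats["length (in.)"] > 12]

# Route 2: combine both conditions on the full frame
long_cats_from_all = animals[
    (animals["animal"] == "cat") & (animals["length (in.)"] > 12)
]

print(long_cats_from_cats)
print(long_cats_from_all)
```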
What's the mean length of a cat?
In [17]:
# cats.describe() displays all stats for length
In [36]:
cats['length'].mean()
Out[36]:
In [18]:
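# shows the mean of every numeric column, not only length;
# numeric_only=True skips text columns such as name, which recent pandas would otherwise choke on
cats.mean(numeric_only=True)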
Out[18]:
What's the mean length of a dog?
In [37]:
dogs['length'].mean()
Out[37]:
In [39]:
dogs['length'].describe()
Out[39]:
In [19]:
dogs.mean(numeric_only=True)
Out[19]:
Use groupby to accomplish both of the above tasks at once.
In [51]:
df.groupby('animal')['length (in.)'].mean()
Out[51]:
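groupby can also hand back several statistics per group in one go. A sketch on invented data, using agg with a list of statistic names:

```python
import pandas as pd

# Hypothetical animals, same column names as the homework CSV
animals = pd.DataFrame({
    "animal": ["cat", "dog", "cat", "dog", "cat"],
    "name": ["Whiskers", "Rex", "Tom", "Fido", "Kit"],
    "length": [35, 65, 32, 50, 25],
})

# One mean per animal type
mean_by_animal = animals.groupby("animal")["length"].mean()

# Several statistics per animal type; each becomes a column
summary = animals.groupby("animal")["length"].agg(["mean", "min", "max", "count"])

print(mean_by_animal)
print(summary)
```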
Make a histogram of the length of dogs. I apologize that it is so boring.
In [21]:
dogs.plot(kind='hist', y='length (in.)')  # all the same length :/
Out[21]:
Change your graphing style to be something else (anything else!)
In [63]:
df.plot(kind="bar", x="name", y="length", color="red", legend=False)
Out[63]:
In [64]:
df.plot(kind="barh", x="name", y="length", color="red", legend=False)
Out[64]:
In [22]:
dogs
Out[22]:
In [23]:
dogs.plot(kind='bar')
Out[23]:
In [24]:
# dogs.plot(kind='scatter', x='name', y='length (in.)')
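Another reading of "change your graphing style" is switching matplotlib's style sheet rather than the plot kind. A sketch assuming the built-in "ggplot" style and invented data; the Agg line is only there so the sketch runs without a display and can be dropped inside a notebook that already uses %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; skip this inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical animals, same column names as the homework CSV
animals = pd.DataFrame({
    "name": ["Whiskers", "Rex", "Tom"],
    "length": [35, 65, 32],
})

# Names of the style sheets this matplotlib install ships with
print(plt.style.available)

# Apply a style only inside this block, leaving the global default untouched
with plt.style.context("ggplot"):
    ax = animals.plot(kind="barh", x="name", y="length", legend=False)
    ax.set_xlabel("length (cm)")
```

plt.style.use("ggplot") would apply the same style to every later plot in the session instead of just one block.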
Make a horizontal bar graph of the length of the animals, with their name as the label
In [66]:
df.columns
Out[66]:
In [99]:
dogs['name']
Out[99]:
In [65]:
df.plot(kind='barh', x='name', y='length', legend=False)
Out[65]:
Make a sorted horizontal bar graph of the cats, with the larger cats on top.
In [66]:
cats.sort_values('length').plot(kind='barh', x='name', y='length', legend=False)
Out[66]:
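Why an ascending sort puts the larger cats on top: barh draws the first row at the bottom of the axis and works upward. A sketch with invented cats; the Agg line is only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; skip this inside a notebook
import pandas as pd

# Hypothetical cats, lengths in cm
cats = pd.DataFrame({
    "name": ["Whiskers", "Kit", "Tom"],
    "length": [35, 25, 32],
})

# Ascending sort: shortest cat first, longest cat last
ascending = cats.sort_values("length")

# barh places the first row at the bottom, so the last (longest) row ends up on top
ax = ascending.plot(kind="barh", x="name", y="length", legend=False)

# Bar order from bottom to top
print(list(ascending["name"]))
```

If the frame is sorted descending instead, calling ax.invert_yaxis() flips the axis and gives the same picture.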