In [6]:
import pandas as pd



In [7]:
df = pd.read_csv("07-hw-animals.csv")

In [8]:
!pip install matplotlib



In [9]:
%matplotlib inline

In [10]:
df


Out[10]:
  animal        name  length
0    cat        Anne      35
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50
4    cat     Charlie      32
5    dog    Fontaine      35

Display the names of the columns in the csv


In [11]:
df.columns


Out[11]:
Index(['animal', 'name', 'length'], dtype='object')

In [12]:
df.columns.values


Out[12]:
array(['animal', 'name', 'length'], dtype=object)
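If a plain Python list is easier to work with than an Index or a NumPy array, `tolist()` is one option. The sketch below is only an illustration: it assumes the six rows shown in Out[10] and rebuilds them inline (under the hypothetical name `animals`) instead of reading 07-hw-animals.csv.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Convert the column Index to an ordinary list of strings.
column_names = animals.columns.tolist()
print(column_names)
```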

Display the first 3 animals


In [13]:
df[['name']].head(3)


Out[13]:
         name
0        Anne
1         Bob
2  Egglesburg
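The cell above keeps only the name column. If the whole row is wanted for each of the first three animals, calling `head(3)` on the full dataframe should do it. This sketch assumes the rows from Out[10], rebuilt inline under the hypothetical name `animals`.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Double brackets select a one-column DataFrame; head(3) keeps the first 3 rows.
first_three_names = animals[["name"]].head(3)

# Without the column selection, head(3) returns every column for those rows.
first_three_rows = animals.head(3)

print(first_three_names)
print(first_three_rows)
```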

Sort the animals to see the 3 longest animals.


In [14]:
df.sort_values('length', ascending=False).head(3)


Out[14]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45
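A possible shortcut for "sort descending, then take the top 3" is `nlargest`. The sketch below assumes the rows from Out[10], rebuilt inline as `animals` (a hypothetical name), and compares both approaches.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Approach used in the notebook: sort by length, longest first, keep 3 rows.
longest_sorted = animals.sort_values("length", ascending=False).head(3)

# Alternative: nlargest picks the 3 rows with the largest length directly.
longest_nlargest = animals.nlargest(3, "length")

print(longest_nlargest)
```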

What are the counts of the different values of the "animal" column? In other words, how many cats and how many dogs are there?


In [15]:
df['animal'].value_counts()


Out[15]:
cat    3
dog    3
Name: animal, dtype: int64
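If proportions are more useful than raw counts, `value_counts` also accepts `normalize=True`. A minimal sketch, assuming the rows from Out[10] rebuilt inline as the hypothetical `animals`:

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Raw counts per distinct value of the animal column.
counts = animals["animal"].value_counts()

# Same breakdown expressed as a fraction of all rows.
shares = animals["animal"].value_counts(normalize=True)

print(counts)
print(shares)
```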

Only select the dogs.


In [16]:
df[df['animal'] == 'dog']


Out[16]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35

Display all of the animals that are longer than 40 cm.


In [17]:
df[df['length'] > 40]


Out[17]:
  animal        name  length
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50
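It may help to see what the filter does in two steps: the comparison builds a True/False mask, and indexing with that mask keeps only the True rows. The sketch assumes the rows from Out[10], rebuilt inline as the hypothetical `animals`; `query` is shown as one possible alternative spelling.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Step 1: the comparison yields one boolean per row.
is_long = animals["length"] > 40

# Step 2: indexing with the boolean Series keeps the rows marked True.
long_animals = animals[is_long]

# Alternative: express the same condition as a query string.
long_animals_query = animals.query("length > 40")

print(is_long.tolist())
print(long_animals)
```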

'length' is the animal's length in cm. Create a new column called inches that is the length in inches.


In [18]:
# Approximate conversion: 1 cm is about 0.39 in (the exact factor is 1 / 2.54)
df['inches'] = df['length'] * 0.39
df.head()


Out[18]:
  animal        name  length  inches
0    cat        Anne      35   13.65
1    cat         Bob      45   17.55
2    dog  Egglesburg      65   25.35
3    dog       Devon      50   19.50
4    cat     Charlie      32   12.48
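The factor 0.39 is a rounded value. Since one inch is defined as exactly 2.54 cm, dividing by 2.54 gives a slightly more precise column. The sketch below assumes the rows from Out[10], rebuilt inline as the hypothetical `animals`, and uses the made-up column names `inches_approx` and `inches_exact` to compare the two.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

CM_PER_INCH = 2.54  # exact by definition

# Rounded factor used in the notebook.
animals["inches_approx"] = animals["length"] * 0.39

# Exact conversion: centimetres divided by centimetres-per-inch.
animals["inches_exact"] = animals["length"] / CM_PER_INCH

# Largest gap between the two columns, to judge whether the rounding matters.
max_gap = (animals["inches_exact"] - animals["inches_approx"]).abs().max()
print(animals[["name", "inches_approx", "inches_exact"]])
print(max_gap)
```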

Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."


In [19]:
# Keep the matching rows themselves, not just the True/False mask
cats = df[df['animal'] == 'cat']
dogs = df[df['animal'] == 'dog']

Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.


In [20]:
# Filter the cats dataframe directly (rebuilt here so the cell runs on its own)
cats = df[df['animal'] == 'cat']
cats[cats['inches'] > 12]


Out[20]:
  animal     name  length  inches
0    cat     Anne      35   13.65
1    cat      Bob      45   17.55
4    cat  Charlie      32   12.48

In [21]:
df[(df['animal'] == 'cat') & (df['inches'] > 12)]


Out[21]:
  animal     name  length  inches
0    cat     Anne      35   13.65
1    cat      Bob      45   17.55
4    cat  Charlie      32   12.48
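When two conditions are combined, each comparison needs its own parentheses, because `&` binds more tightly than `==` and `>` in Python. The sketch below assumes the rows from Out[10] plus the 0.39 conversion from In [18], rebuilt inline as the hypothetical `animals`, and shows `query` as one way to avoid the parentheses.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})
animals["inches"] = animals["length"] * 0.39  # same rounded factor as In [18]

# Each comparison is wrapped in parentheses before combining with &.
big_cats = animals[(animals["animal"] == "cat") & (animals["inches"] > 12)]

# Alternative: query uses plain "and", so no extra parentheses are needed.
big_cats_query = animals.query("animal == 'cat' and inches > 12")

print(big_cats)
```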

What's the mean length of a cat?


In [22]:
# describe() reports the mean alongside the other summary statistics
cats = df[df['animal'] == 'cat']
cats["length"].describe()


Out[22]:
count     3.000000
mean     37.333333
std       6.806859
min      32.000000
25%      33.500000
50%      35.000000
75%      40.000000
max      45.000000
Name: length, dtype: float64

What's the mean length of a dog?


In [23]:
dogs = df[df['animal'] == 'dog']
dogs["length"].describe()


Out[23]:
count     3.0
mean     50.0
std      15.0
min      35.0
25%      42.5
50%      50.0
75%      57.5
max      65.0
Name: length, dtype: float64

Use groupby to accomplish both of the above tasks at once.


In [24]:
df.groupby('animal')['length'].describe()


Out[24]:
animal       
cat     count     3.000000
        mean     37.333333
        std       6.806859
        min      32.000000
        25%      33.500000
        50%      35.000000
        75%      40.000000
        max      45.000000
dog     count     3.000000
        mean     50.000000
        std      15.000000
        min      35.000000
        25%      42.500000
        50%      50.000000
        75%      57.500000
        max      65.000000
Name: length, dtype: float64
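If only the mean is needed, the full `describe()` output is more than the question asks for. A minimal sketch, assuming the rows from Out[10] rebuilt inline as the hypothetical `animals`: `mean()` gives one number per group, and `agg` can pick out just a few statistics.

```python
import pandas as pd

# Assumption: recreate the rows from Out[10] inline rather than reading the CSV.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# One mean length per animal type.
mean_by_animal = animals.groupby("animal")["length"].mean()

# A small table with only the statistics of interest.
summary = animals.groupby("animal")["length"].agg(["mean", "max"])

print(mean_by_animal)
print(summary)
```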

Make a histogram of the length of dogs. I apologize that it is so boring.


In [31]:
dogs["length"].hist()


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fa3dd68>

Change your graphing style to be something else (anything else!)


In [38]:
# A Series already supplies the values, so no y argument is needed
dogs["length"].plot(kind='pie', labels=dogs['name'], legend=False)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fdb4198>

Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)


In [46]:
df.plot(kind='barh', x='name', y='length', legend=False)


Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x11000c9b0>

Make a sorted horizontal bar graph of the cats, with the larger cats on top.


In [56]:
# barh draws the first row at the bottom, so ascending order puts the largest cat on top
cats_ordered = cats.sort_values(by='length', ascending=True)
print(cats_ordered)
cats_ordered.plot(kind='barh', x='name', y='length', legend=False)


  animal     name  length  inches
4    cat  Charlie      32   12.48
0    cat     Anne      35   13.65
1    cat      Bob      45   17.55
Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x1108c1940>
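The ascending sort can look backwards at first. A horizontal bar chart draws the first row at the bottom of the axis, so the longest cat has to come last in the dataframe to end up on top. The sketch below assumes the three cat rows from the printed output above, rebuilt inline, and selects the non-interactive "Agg" backend (an assumption so it can run outside a notebook; `%matplotlib inline` plays that role here).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed inside the notebook
import pandas as pd

# Assumption: recreate the three cat rows inline rather than filtering the CSV.
cats = pd.DataFrame({
    "name": ["Anne", "Bob", "Charlie"],
    "length": [35, 45, 32],
})

# Shortest first: barh stacks rows from the bottom up, so the longest lands on top.
cats_ordered = cats.sort_values(by="length", ascending=True)
ax = cats_ordered.plot(kind="barh", x="name", y="length", legend=False)

# Bars are stored bottom-to-top in the same order as the rows.
bar_lengths = [float(patch.get_width()) for patch in ax.patches]
print(cats_ordered["name"].tolist())
print(bar_lengths)
```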
