notebook.community




In [6]:

    
import pandas as pd









    



/Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages/matplotlib/__init__.py:1035: UserWarning: Duplicate key in file "/Users/mercybenzaquen/.matplotlib/matplotlibrc", line #2
  (fname, cnt))



In [7]:

    
df = pd.read_csv("07-hw-animals.csv")
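`pd.read_csv` uses the first line of the file as the header row, so the column names come straight from the CSV. Here is a minimal sketch that assumes `07-hw-animals.csv` holds the same rows shown in the Out[10] table. `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the DataFrame displayed in this notebook
csv_text = """animal,name,length
cat,Anne,35
cat,Bob,45
dog,Egglesburg,65
dog,Devon,50
cat,Charlie,32
dog,Fontaine,35
"""

# StringIO lets read_csv parse an in-memory string the same way it parses a file on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (rows, columns)
print(df.dtypes)  # 'length' is parsed as an integer column
```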



In [8]:

    
!pip install matplotlib









    



Requirement already satisfied (use --upgrade to upgrade): matplotlib in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pyparsing!=2.0.0,!=2.0.4,>=1.5.6 in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): cycler in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /Users/mercybenzaquen/.virtualenvs/Homework7/lib/python3.5/site-packages (from python-dateutil->matplotlib)



In [9]:

    
%matplotlib inline



In [10]:

    
df









    Out[10]:






  
    
      
  animal        name  length
0    cat        Anne      35
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50
4    cat     Charlie      32
5    dog    Fontaine      35

Display the names of the columns in the CSV.



In [11]:

    
df.columns









    Out[11]:





Index(['animal', 'name', 'length'], dtype='object')



In [12]:

    
df.columns.values









    Out[12]:





array(['animal', 'name', 'length'], dtype=object)

Display the first 3 animals



In [13]:

    
df[['name']].head(3)
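`df[['name']].head(3)` keeps only the `name` column. If "the first 3 animals" is meant as whole rows, `df.head(3)` or a positional slice with `iloc` returns every column. A small sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

first_three = df.head(3)   # first 3 rows, all columns
same_rows = df.iloc[:3]    # positional slice of the same rows
first_three_names = df['name'].head(3).tolist()  # just the names, as a plain list

print(first_three)
print(first_three_names)
```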

Sort the animals to see the 3 longest animals.



In [14]:

    
df.sort_values('length', ascending=False).head(3)









    Out[14]:






  
    
      
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45
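Sorting the whole frame and taking the head works. pandas also has `DataFrame.nlargest`, which returns the n rows with the largest values in a column directly. A sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# The 3 rows with the largest 'length', already in descending order
longest = df.nlargest(3, 'length')
print(longest)
```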

What are the counts of the different values of the "animal" column? In other words, how many cats and how many dogs are there?



In [15]:

    
df['animal'].value_counts()









    Out[15]:





cat    3
dog    3
Name: animal, dtype: int64
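`value_counts()` counts how often each distinct value appears. The same numbers can be obtained with `groupby(...).size()`, and `normalize=True` turns the counts into proportions. A sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

counts = df['animal'].value_counts()                 # count per distinct value
sizes = df.groupby('animal').size()                  # same counts via groupby
shares = df['animal'].value_counts(normalize=True)   # fraction of rows per value

print(counts)
print(shares)
```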

Only select the dogs.



In [16]:

    
df[df['animal'] == 'dog']









    Out[16]:






  
    
      
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35
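The boolean mask `df['animal'] == 'dog'` marks each row True or False, and indexing with it keeps the True rows. `DataFrame.query` expresses the same filter as a string. A sketch comparing the two on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

dogs_via_mask = df[df['animal'] == 'dog']      # boolean-mask indexing
dogs_via_query = df.query("animal == 'dog'")   # same filter written as a query string

print(dogs_via_query)
```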

Display all of the animals that are longer than 40 cm.



In [17]:

    
df[df['length'] > 40]









    Out[17]:






  
    
      
  animal        name  length
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50
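`.loc` accepts the same boolean mask for the rows and a list of columns at the same time, which is handy when only some columns matter. A sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# Rows where length > 40, restricted to the name and length columns
long_animals = df.loc[df['length'] > 40, ['name', 'length']]
print(long_animals)
```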

'length' is the animal's length in cm. Create a new column called inches that is the length in inches.



In [18]:

    
# 1 cm ≈ 0.39 in (an approximation; exactly, inches = cm / 2.54)
df['inches'] = df['length'] * 0.39
df.head()
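One inch is defined as exactly 2.54 cm, so multiplying by 0.39 slightly underestimates (1 / 2.54 ≈ 0.3937). A sketch that computes both versions side by side on a rebuilt copy of the data (assumed to match the CSV); the `inches_approx` and `inches_exact` column names are illustrative only. For this data the 12-inch cutoff used later selects the same cats either way.

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

CM_PER_INCH = 2.54  # exact by definition

df['inches_approx'] = df['length'] * 0.39                      # factor used in the cell above
df['inches_exact'] = (df['length'] / CM_PER_INCH).round(2)     # exact conversion, rounded to 2 decimals

print(df[['name', 'length', 'inches_approx', 'inches_exact']])
```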

Save the cats to a separate variable called "cats". Save the dogs to a separate variable called "dogs".



In [19]:

    
# Keep the actual rows (DataFrames), not just True/False masks
cats = df[df['animal'] == 'cat']
dogs = df[df['animal'] == 'dog']

Display all of the cats that are longer than 12 inches. First do it using the "cats" variable, then do it using your normal dataframe.



In [20]:

    
# cats already contains only cats, so one condition on its own column is enough
cats[cats['inches'] > 12]



In [21]:

    
df[(df['animal'] == 'cat') & (df['inches'] > 12)]
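Filtering the cats subset and filtering the full dataframe with a combined mask should select the same rows. Here is a quick check, assuming `cats` holds the cat rows as a DataFrame rather than a boolean mask. Each condition needs its own parentheses because Python's `&` binds more tightly than `>` and `==`.

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
df['inches'] = df['length'] * 0.39  # same conversion factor as the notebook

cats = df[df['animal'] == 'cat']

via_cats = cats[cats['inches'] > 12]                          # filter the subset
via_df = df[(df['animal'] == 'cat') & (df['inches'] > 12)]    # combined mask on the full frame

print(via_cats.equals(via_df))
```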

What's the mean length of a cat?



In [22]:

    
cats = df[df['animal'] == 'cat']
cats["length"].describe()









    Out[22]:





count     3.000000
mean     37.333333
std       6.806859
min      32.000000
25%      33.500000
50%      35.000000
75%      40.000000
max      45.000000
Name: length, dtype: float64
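`describe()` reports the mean together with seven other statistics. When only the mean is needed, `.mean()` returns the single number. A sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# Select the cat rows and the length column in one .loc call, then average
cat_mean = df.loc[df['animal'] == 'cat', 'length'].mean()
print(round(cat_mean, 2))
```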

What's the mean length of a dog?



In [23]:

    
dogs = df[df['animal'] == 'dog']
dogs["length"].describe()









    Out[23]:





count     3.0
mean     50.0
std      15.0
min      35.0
25%      42.5
50%      50.0
75%      57.5
max      65.0
Name: length, dtype: float64

Use groupby to accomplish both of the above tasks at once.



In [24]:

    
df.groupby('animal')['length'].describe()









    Out[24]:





animal       
cat     count     3.000000
        mean     37.333333
        std       6.806859
        min      32.000000
        25%      33.500000
        50%      35.000000
        75%      40.000000
        max      45.000000
dog     count     3.000000
        mean     50.000000
        std      15.000000
        min      35.000000
        25%      42.500000
        50%      50.000000
        75%      57.500000
        max      65.000000
Name: length, dtype: float64
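To get only the mean (or a chosen handful of statistics) per group, `agg` takes a list of aggregation names and returns one row per animal. A sketch on a rebuilt copy of the data (assumed to match the CSV):

```python
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# One row per animal, one column per requested statistic
stats = df.groupby('animal')['length'].agg(['mean', 'std', 'count'])
print(stats)
```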

Make a histogram of the length of dogs. I apologize that it is so boring.



In [31]:

    
dogs["length"].hist()









    Out[31]:





<matplotlib.axes._subplots.AxesSubplot at 0x10fa3dd68>
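With only three dogs, the default of 10 bins leaves most bins empty, which is part of why the histogram looks so sparse. Passing an explicit `bins` count and axis labels makes it easier to read. A sketch on a rebuilt copy of the data (assumed to match the CSV). The `matplotlib.use('Agg')` line is only there so the snippet runs outside a notebook, and can be dropped when `%matplotlib inline` is active.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
dogs = df[df['animal'] == 'dog']

fig, ax = plt.subplots()
dogs['length'].hist(bins=3, ax=ax)  # 3 bins instead of the default 10
ax.set_xlabel('length (cm)')
ax.set_ylabel('number of dogs')
```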

Change your graphing style to be something else (anything else!)



In [38]:

    
# Label each slice with the dog's name instead of the row index
dogs["length"].plot(kind='pie', labels=dogs['name'], legend=False)









    Out[38]:





<matplotlib.axes._subplots.AxesSubplot at 0x10fdb4198>
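The pie chart above changes the chart type. If "graphing style" is meant in matplotlib's sense, the look of every plot can be switched with a bundled style sheet such as `ggplot`. Here is a minimal sketch. The choice of `ggplot` is arbitrary, and `plt.style.available` lists the alternatives.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
dogs = df[df['animal'] == 'dog']

print(plt.style.available)  # names of the installed style sheets

# Apply the style only inside this block; plt.style.use('ggplot') would change it for the whole session
with plt.style.context('ggplot'):
    ax = dogs.plot(kind='bar', x='name', y='length', legend=False)
    ax.set_ylabel('length (cm)')
```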

Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)



In [46]:

    
df.plot(kind='barh', x='name', y='length', legend=False)









    Out[46]:





<matplotlib.axes._subplots.AxesSubplot at 0x11000c9b0>
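An equivalent approach plots a Series whose index holds the names. The index values then become the bar labels, so no `x=`/`y=` arguments are needed. A sketch on a rebuilt copy of the data (assumed to match the CSV); the axis label text is an illustrative addition.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# Names move into the index, so they label the bars
ax = df.set_index('name')['length'].plot(kind='barh')
ax.set_xlabel('length (cm)')
```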

Make a sorted horizontal bar graph of the cats, with the larger cats on top.



In [56]:

    
# barh draws the first row at the bottom, so ascending order puts the longest cat on top
cats_ordered = cats.sort_values(by='length', ascending=True)
print(cats_ordered)
cats_ordered.plot(kind='barh', x='name', y='length', legend=False)









    



  animal     name  length  inches
4    cat  Charlie      32   12.48
0    cat     Anne      35   13.65
1    cat      Bob      45   17.55






    Out[56]:





<matplotlib.axes._subplots.AxesSubplot at 0x1108c1940>
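Because `barh` draws the first row at the bottom, the cell above sorts in ascending order to put the largest cat on top. An equivalent sketch sorts in descending order, so the printed table also reads longest first, and then flips the y-axis with `invert_yaxis()`. It uses a rebuilt copy of the data (assumed to match the CSV).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Stand-in for 07-hw-animals.csv, rebuilt from the table shown in this notebook
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
cats = df[df['animal'] == 'cat']

cats_desc = cats.sort_values('length', ascending=False)  # longest cat first
ax = cats_desc.plot(kind='barh', x='name', y='length', legend=False)
ax.invert_yaxis()  # first row (the longest cat) is now drawn at the top
```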



In [ ]:

	animal	name	length	inches
0	cat	Anne	35	13.65
1	cat	Bob	45	17.55
2	dog	Egglesburg	65	25.35
3	dog	Devon	50	19.50
4	cat	Charlie	32	12.48