Lesson 2 Lab: Practical Pandas

This is a quiz given in Roger Peng's Coursera class Computing for Data Analysis.

Sourced from Research Computing MeetUp's Python course.



In [10]:

import pandas as pd
import numpy as np

# Load the ozone data straight from GitHub.
data = pd.read_csv("https://github.com/JamesByers/GA-SEA-DAT2/raw/master/data/ozone.csv")



In [11]:

print(data.head())

   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2
2   12.0    149.0  12.6    74      5    3
3   18.0    313.0  11.5    62      5    4
4    NaN      NaN  14.3    56      5    5
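The NaN in row 4 is how pandas represents a missing value. Here is a minimal sketch, assuming the CSV marks missing entries with the string NA (as R's write.csv does). The inline sample copies the five rows shown above instead of downloading the file. It also shows why Ozone and Solar.R print as floats (41.0) while Temp stays an integer.

```python
import io

import pandas as pd

# Assumed layout: same header as ozone.csv, missing values written as NA.
csv_text = """Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
36,118,8.0,72,5,2
12,149,12.6,74,5,3
18,313,11.5,62,5,4
NA,NA,14.3,56,5,5
"""

df = pd.read_csv(io.StringIO(csv_text))

# "NA" is one of read_csv's default missing-value markers, so it becomes NaN.
# A column holding NaN is stored as float64; gap-free integer columns stay int64.
print(df)
print(df.dtypes)
```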

Print the column names of the dataset to the screen, one column name per line.



In [43]:

data.columns

Out[43]:

Index([u'Ozone', u'Solar.R', u'Wind', u'Temp', u'Month', u'Day'], dtype='object')
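Evaluating data.columns displays an Index on a single line, which does not quite match "one column name per line". One way to get that layout is to loop over the Index. This sketch uses an empty stand-in frame with the same column names so it runs without the download. On the real data, replace df with data.

```python
import pandas as pd

# Hypothetical stand-in: no rows, same column names as the ozone data.
df = pd.DataFrame(columns=["Ozone", "Solar.R", "Wind", "Temp", "Month", "Day"])

# Iterating over the columns Index yields one name at a time.
for name in df.columns:
    print(name)

# Equivalent one-liner: join the names with newlines.
print("\n".join(df.columns))
```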

Extract the first 2 rows of the data frame and print them to the console (here, the console is the Jupyter notebook output). What does the output look like?



In [4]:

print(data.head(2))

   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2

How many observations (i.e. rows) are in this data frame?



In [9]:

# count() tallies non-missing values per column, so only gap-free columns show all 153 rows.
print(data.count())

Ozone      116
Solar.R    146
Wind       153
Temp       153
Month      153
Day        153
dtype: int64
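Because count() skips missing values, it gives the row count only for columns without gaps. A more direct answer uses len() or shape. Here is a small sketch on a made-up three-row frame; on the real data, len(data) is the equivalent.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 3 rows, one missing Ozone value.
df = pd.DataFrame({"Ozone": [41.0, np.nan, 12.0], "Temp": [67, 56, 74]})

print(df.count())   # non-missing values per column: Ozone 2, Temp 3
print(len(df))      # number of rows: 3
print(df.shape)     # (rows, columns): (3, 2)
```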

Extract the last 2 rows of the data frame and print them to the console. What does the output look like?



In [10]:

print(data.tail(2))

     Ozone  Solar.R  Wind  Temp  Month  Day
151   18.0    131.0   8.0    76      9   29
152   20.0    223.0  11.5    68      9   30
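Note that tail(2) keeps the original index labels (151 and 152 above) rather than renumbering the rows. A quick sketch on a hypothetical five-row frame with illustrative values, showing that tail(n) matches positional slicing with iloc:

```python
import pandas as pd

# Hypothetical frame with 5 rows and the default 0-based index.
df = pd.DataFrame({"Ozone": [41.0, 36.0, 12.0, 18.0, 20.0]})

last_two = df.tail(2)
# tail(n) is positional, like iloc[-n:]; both keep the original index labels.
same = df.iloc[-2:]

print(last_two)
print(last_two.index.tolist())  # [3, 4]
```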

What is the value of Ozone in the 47th row?



In [15]:

# .loc selects by label and the index starts at 0, so label 47 is the 48th row.
print(data.loc[47:47, ['Ozone']])

    Ozone
47   37.0
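Because the index starts at 0, label 47 is the 48th row, so the cell above answers a slightly different question than the one asked. The sketch below contrasts label-based .loc with position-based .iloc on a made-up frame where each Ozone value equals its position. On the real data, data['Ozone'].iloc[46] would select the 47th row.

```python
import pandas as pd

# Illustrative frame: index 0..59, and each Ozone value equals its position.
df = pd.DataFrame({"Ozone": [float(i) for i in range(60)]})

# .loc selects by index label: label 47 is the 48th row.
by_label = df.loc[47, "Ozone"]

# .iloc selects by 0-based position: position 46 is the 47th row.
by_position = df["Ozone"].iloc[46]

print(by_label, by_position)  # 47.0 46.0
```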

How many missing values are in the Ozone column of this data frame?



In [17]:

# isnull() marks each missing value True; summing the booleans counts them.
pd.isnull(data['Ozone']).sum()

Out[17]:

37
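This agrees with the count() output earlier: 153 rows minus 116 non-missing Ozone values is 37. Calling isnull() on the whole frame gives the missing counts for every column at once. Here is a sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Illustrative frame: two missing Ozone values, no missing Wind values.
df = pd.DataFrame({
    "Ozone": [41.0, np.nan, 12.0, np.nan],
    "Wind": [7.4, 8.0, 12.6, 11.5],
})

# isnull() (alias isna()) marks missing cells True; sum() counts them per column.
missing = df.isnull().sum()
print(missing)

# The same numbers: row count minus non-missing count.
print(len(df) - df.count())
```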



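In [18]:

# Pitfall: NaN never equals anything, so this is False on every row and count()
# just returns the number of booleans (153); isnull() above is the right tool.
cnt = data['Ozone'] == np.nan
print(cnt.count())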
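An equality test against np.nan cannot find missing values, because of how NaN compares. Here is a short demonstration on a three-value Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([41.0, np.nan, 12.0])

# NaN is not equal to anything, including itself.
print(np.nan == np.nan)     # False

# So an equality test finds no missing values at all...
print((s == np.nan).sum())  # 0

# ...while isnull() finds the one that is there.
print(s.isnull().sum())     # 1
```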

What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA in the file, NaN in pandas) from this calculation.



In [19]:

# mean() skips NaN by default (skipna=True), so missing values are already excluded.
data['Ozone'].mean()

Out[19]:

42.12931034482759
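pandas skips missing values in mean() by default, so excluding them takes no extra step. In R, by contrast, mean() returns NA unless na.rm = TRUE is given. Here is a sketch of the default and the alternative on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 20.0])

print(s.mean())              # 15.0: NaN is skipped by default (skipna=True)
print(s.mean(skipna=False))  # nan: any NaN makes the result NaN
print(s.sum() / s.count())   # 15.0: the same calculation written out
```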

Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of Solar.R in this subset?



In [34]:

newdf = data[(data.Ozone > 31) & (data.Temp > 90)]

# Mean of every column in the subset; the Solar.R row answers the question.
newdf.mean()

Out[34]:

Ozone       89.5
Solar.R    212.8
Wind         5.6
Temp        93.4
Month        8.2
Day         14.5
dtype: float64
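The question asks only about Solar.R, so you can select that column directly instead of averaging every column. Here is a sketch with made-up values; on the real data, the same expression with data in place of df applies.

```python
import pandas as pd

# Illustrative frame; only rows 0 and 3 satisfy both conditions.
df = pd.DataFrame({
    "Ozone": [40.0, 20.0, 97.0, 85.0],
    "Solar.R": [200.0, 150.0, 267.0, 188.0],
    "Temp": [92, 95, 88, 94],
})

# Combine conditions with & (not `and`) and wrap each comparison in parentheses.
mask = (df["Ozone"] > 31) & (df["Temp"] > 90)

# Select the matching rows and the single column in one .loc call, then average.
solar_mean = df.loc[mask, "Solar.R"].mean()
print(solar_mean)  # 194.0
```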

What is the mean of "Temp" when "Month" is equal to 6?



In [ ]:
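One way to answer this uses the same boolean-mask pattern as the Ozone/Temp subset. On the real data, the expression would be data.loc[data['Month'] == 6, 'Temp'].mean(). Here is a sketch with made-up values, plus a groupby variant that gives every month at once:

```python
import pandas as pd

# Illustrative frame; Month 6 rows have Temp 78 and 74.
df = pd.DataFrame({
    "Temp": [67, 72, 78, 74, 82],
    "Month": [5, 5, 6, 6, 7],
})

# Boolean mask on Month, then the mean of Temp in the matching rows.
june_mean = df.loc[df["Month"] == 6, "Temp"].mean()
print(june_mean)  # 76.0

# groupby computes the mean Temp for every month.
print(df.groupby("Month")["Temp"].mean())
```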

What was the maximum ozone value in the month of May (i.e. Month = 5)?



In [ ]:
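The same mask pattern works with max(), which also skips NaN by default. On the real data, the expression would be data.loc[data['Month'] == 5, 'Ozone'].max(). Here is a sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Illustrative frame; the missing May value is ignored by max().
df = pd.DataFrame({
    "Ozone": [41.0, np.nan, 18.0, 97.0],
    "Month": [5, 5, 5, 6],
})

may_max = df.loc[df["Month"] == 5, "Ozone"].max()
print(may_max)  # 41.0
```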

Next Steps

Recommended Resources

Name	Description
Official Pandas Tutorials	Wes & Company's selection of tutorials and lectures
Julia Evans Pandas Cookbook	Great resource with examples from weather, bikes and 311 calls
Learn Pandas Tutorials	A great series of Pandas tutorials from Dave Rojas
Research Computing Python Data PYNBs	A super awesome set of Python notebooks from a meetup-based course exclusively devoted to pandas