In [6]:

    
%matplotlib inline

Working with Data

I want to expand a little on the example from the DataFrames tutorial and demonstrate some more advanced ways to grab, slice, and plot data. This first cell is just a copy/paste job for reading in the data file from the last lesson.



In [7]:

    
def cdf_to_dataframe(netcdf_file, exclude_qc=True):
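    """Takes in the path to a netCDF file and returns a pandas DataFrame
    holding every variable that shares the 'time' dimension.
    Set exclude_qc=False to keep the 'qc_' (quality control) variables.
    """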
    
    # import packages
    from netCDF4 import Dataset
    import pandas as pd
    import datetime
    
    with Dataset(netcdf_file, 'r') as D:
        
        # create an empty dictionary for the netCDF variables
        ncvars = {}
        
        for v in D.variables.keys():
            time_check = (D.variables[v].dimensions 
                          == D.variables['time'].dimensions)
            if exclude_qc:
                qc_check = 'qc_' not in v
                var_check = qc_check and time_check
            else:
                var_check = time_check
                
            if var_check:
                ncvars[v] = D.variables[v][:]
            
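        # build the index while the file is still open; use a new name
        # so the open Dataset D is not shadowed inside the with block
        df = pd.DataFrame(ncvars,
                index = (datetime.datetime.utcfromtimestamp(D.variables['base_time'][:])+
                        pd.to_timedelta(D.variables['time'][:], unit='s')))
        
    return df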

import os
file_path = os.path.abspath('enametC1.b1.20140531.000000.cdf')
DATA = cdf_to_dataframe(file_path)
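
The index construction inside cdf_to_dataframe can be tried on its own, without a netCDF file. The sketch below is only an illustration: the values of base_time and offsets are made up to stand in for the file's 'base_time' and 'time' variables, and it uses pd.to_datetime in place of datetime.utcfromtimestamp.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the file's 'base_time' and 'time' variables
base_time = 1401494400                   # seconds since 1970-01-01 UTC
offsets = np.array([0.0, 60.0, 120.0])   # seconds elapsed since base_time

# A scalar timestamp plus an array of timedeltas gives a DatetimeIndex
index = pd.to_datetime(base_time, unit='s') + pd.to_timedelta(offsets, unit='s')

# Made-up temperatures, indexed by the reconstructed times
frame = pd.DataFrame({'temp_mean': [18.3, 18.4, 18.2]}, index=index)
print(frame)
```

Because the result is a DatetimeIndex, the time-based grouping used later works directly on the DataFrame.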

What can we do from here?

How about we take a simple example. Let's dive further into the temperature data; specifically, let's do the following:

- resample to hourly averages
- for each hour, plot the min, mean, and max
- for each hour, make a boxplot of the values

There are a couple of ways to do this. One is to use the DataFrame.resample() method we saw earlier to bin the data into 1-hour averages, and then compute any other statistics we need separately. Instead, this example demonstrates the groupby() functionality combined with agg(), which computes several statistics for each group in a single call. Here we go:



In [9]:

    
import pandas as pd
import numpy as np

# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
hourly = pd.Grouper(freq='60min')
T = DATA['temp_mean'].groupby(hourly).agg([np.min, np.mean, np.max])
T









    Out[9]:

                              amin       mean       amax
    2014-05-31 00:00:00  18.260000  18.569334  19.040001
    2014-05-31 01:00:00  17.980000  18.088501  18.340000
    2014-05-31 02:00:00  17.540001  17.761333  18.090000
    2014-05-31 03:00:00  17.420000  17.776833  18.120001
    2014-05-31 04:00:00  17.240000  17.559999  18.010000
    2014-05-31 05:00:00  17.389999  17.770334  18.020000
    2014-05-31 06:00:00  17.480000  17.707666  17.950001
    2014-05-31 07:00:00  18.000000  19.100834  20.280001
    2014-05-31 08:00:00  20.320000  20.991501  21.590000
    2014-05-31 09:00:00  20.799999  21.356333  21.930000
    2014-05-31 10:00:00  21.190001  22.094500  22.920000
    2014-05-31 11:00:00  22.590000  22.922501  23.309999
    2014-05-31 12:00:00  22.150000  22.667500  23.290001
    2014-05-31 13:00:00  22.740000  23.334333  23.920000
    2014-05-31 14:00:00  23.320000  23.882833  24.330000
    2014-05-31 15:00:00  23.660000  23.962166  24.459999
    2014-05-31 16:00:00  23.139999  23.638000  24.120001
    2014-05-31 17:00:00  23.049999  23.725500  24.459999
    2014-05-31 18:00:00  23.129999  23.458000  23.740000
    2014-05-31 19:00:00  22.139999  22.750999  23.620001
    2014-05-31 20:00:00  20.340000  21.390333  22.200001
    2014-05-31 21:00:00  19.340000  19.715334  20.420000
    2014-05-31 22:00:00  18.780001  19.034500  19.389999
    2014-05-31 23:00:00  18.030001  18.249500  18.820000

