In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
from netCDF4 import num2date
import netCDF4 as nc
import matplotlib.pyplot as plt
netCDF is a binary storage format popular in Earth Sciences. In this exercis, we simply want to corece the data into Numpy array (2D). Becaue netCDF is a a more sophisticated data format that we're used to, we'll use a dedicated library netCDF4. The main point to remember is that we'll soon have a Numoy array - a familiar object that we know how to handlde.
In [19]:
f = nc.Dataset('../../data/example_data.nc')
# We acess the data through a variables dictionary.
# Let's print th dictionary keys so we know what we're dealing with
print(f.variables.keys())
In [26]:
# use the keys 'Tair' and 'time' to get netcdf data as Numpy arrays
tair = f.variables['Tair'][:] # temperature variable
time = f.variables['time'][:] #time variable
In [21]:
#Let's find our what shape our numpy array is:
tair.shape
Out[21]:
Okay, so that looks good. The netcdf variables dictionary (metadata) told us that the dimensions of this data set were: dimensions(sizes): time(36), y(205), x(275)
It also gave us a a nice summary of our data. The time axis has values that correspond to:
calendar: noleap
dimensions: 1
long_name: time
type_preferred: int
units: days since 0001-01-01
Let's use a slice to take a look at data for the first month:
In [23]:
#Use our go-to 2D array plotting
month1 = tair[0,:,:]
plt.imshow(test, cmap='terrain')
Out[23]:
In [25]:
#And as usual, let's check the type of the data:
type(month1)
Out[25]:
This might be somthing new to you. Masked arrays are arrays that may have missing or invalid entries. The numpy.ma module provides a nearly work-alike replacement for numpy that supports data arrays with masks.
In the lessons we dealt with some numpy arrays that contained NaNs. Masked arrays are nice because we use them almost exactly like we would an normal array. The masked parts of the array are simply invisible to the operations we perform.
Remeber that when our Numpy arrays contained NaNs, we had to resort to using the special methods like np.nanmean() fucntions. With masked arrays we just call regular methods:
In [28]:
print('mean of month one is:', month1.mean())
In [29]:
np.nan
Out[29]:
In [33]:
time[-1]
Out[33]:
Remember the netcdf variable dictionary, which contained metadata, told us that out time had "units: days since 0001-01-01".
This is a great in one way, because days are a linear measurement of time, whereas months and years are not. We will, however, want to work with months and years, and this is where we use the nc.num2date function that we imported.
In [101]:
times = nc.num2date(time, 'days since 0001-01-01', 'noleap')
times
Out[101]:
What we have done is to transform those "days since 0001-01-01" into special object that originate in a module called datetime.
In [31]:
print(type(times[0]))
You don't need to know much about the datetime objects, only that they have some useful attributes and methods:
In [38]:
print("day is:",times[0].day, "month is:",times[0].month, "year is:", times[0].year)
Before operating on the whol data array, we will have a play around with the times array and see if we can find a way to manipulate our data so that we have, for instance, all of the Autumn data in one, all of the Summer data, amd so on. Let's have a look at how the times array looks:
In [73]:
print(times)
In [81]:
#First split array of 36 months into 12 separate arrays, each an individual season
#We do this with the np.split fucntion (note that the output is actually a list of arrays)
seasons = np.array_split(times, 12)
In [84]:
#Next we use slicing on our list of arrays, with a 'step' parameter
#This gathers every fourth seasons, which is what we want
SON = seasons[::4]
DJF = seasons[1::4]
MAM = seasons[2::4]
JJA = seasons[3::4]
print(JJA, type(JJA))
In [92]:
#Now we apply exactly the same steps to the time axis of our temperature data
#Note that because array_split creates lists, we need to manually convert back to arrays
dseasons = np.array_split(tair[:, :, :], 12)
yearlySON = np.array(dseasons[::4])
yearlyDJF = np.array(dseasons[1::4])
yearlyMAM = np.array(dseasons[2::4])
yearlyJJA = np.array(dseasons[3::4])
In [93]:
#Now we can compute the (UNWEIGHTED) Seasonal means
mean_spring = yearlySON.mean(axis=(0,1))
mean_summer = yearlyDJF.mean(axis=(0,1))
mean_autumn = yearlyMAM.mean(axis=(0,1))
mean_winter = yearlyJJA.mean(axis=(0,1))
In [94]:
np.ma.masked_where(mean_summer > 1e3, mean_summer)
Out[94]:
In [95]:
# plot with various axes scales
fig = plt.figure(figsize=(10,8))
is_null = np.isnan(mean_summer)
# summer
ax1 = fig.add_subplot(221)
plt.pcolormesh(np.ma.masked_where(mean_summer > 1e3, mean_summer), vmin=-30, vmax=30, cmap='Spectral_r')
plt.colorbar(extend='both')
plt.tight_layout()
ax1.axes.axis('tight')
# autumn
ax2 = fig.add_subplot(222)
plt.pcolormesh(np.ma.masked_where(mean_summer > 1e3, mean_autumn), vmin=-30, vmax=30, cmap='Spectral_r')
plt.colorbar(extend='both')
ax2.axes.axis('tight')
# winter
ax3 = fig.add_subplot(223)
plt.pcolormesh(np.ma.masked_where(mean_summer > 1e3, mean_spring), vmin=-30, vmax=30, cmap='Spectral_r')
plt.colorbar(extend='both')
plt.tight_layout()
ax3.axes.axis('tight')
# spring
ax4 = fig.add_subplot(224)
plt.pcolormesh(np.ma.masked_where(mean_summer > 1e3, mean_spring), vmin=-30, vmax=30, cmap='Spectral_r')
plt.colorbar(extend='both')
plt.tight_layout()
ax4.axes.axis('tight')
plt.show()
So far we've computed a reasonable approximation to the the seasonal averages. The deficiency in our code is that our averages were unweighted with respect to the number of days in each month, which is variable.
Your task is to rectify this. We have provided nearly all of the tools to do this, you'll juts have to and combine things in a slightly different way.
So, you need to create some new arrays, (similar to e.g. mean_summer, mean_spring), but you need to apply a weighted mean. Once you've created your new arrays, try plotting the arrays for unweighted average , the weighted average, and the difference between the two. You shoud be able to reuse the plot setup from above, just change the name of teh array.
Hints: the np.average() function takes an array of weights as an optional argument.
In [69]:
#Utility dictionary
dpm = {'noleap': [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'365_day': [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'standard': [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'gregorian': [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'proleptic_gregorian': [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'all_leap': [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'366_day': [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
'360_day': [0, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30]}
In [70]:
def leap_year(year, calendar='standard'):
"""Determine if year is a leap year"""
leap = False
if ((calendar in ['standard', 'gregorian',
'proleptic_gregorian', 'julian']) and
(year % 4 == 0)):
leap = True
if ((calendar == 'proleptic_gregorian') and
(year % 100 == 0) and
(year % 400 != 0)):
leap = False
elif ((calendar in ['standard', 'gregorian']) and
(year % 100 == 0) and (year % 400 != 0) and
(year < 1583)):
leap = False
return leap
def get_dpm(time, calendar='standard'):
"""
return a array of days per month corresponding to the months provided in `months`
"""
month_length = np.zeros(len(time), dtype=np.int)
cal_days = dpm[calendar]
for i, dt in enumerate(times):
#print i, dt
month_length[i] = cal_days[dt.month]
if leap_year(dt.year, calendar=calendar):
month_length[i] += 1
return month_length
In [71]:
##Example of how to use the get_dpm fucntion ("get days per month")
dayspermonth = get_dpm(times, calendar='standard')
dayspermonth
Out[71]: