``````

In [2]:

%matplotlib inline

``````

# Toy weather data

Here is an example of how to easily manipulate a toy weather dataset using xarray and other recommended Python libraries:

• Examine a dataset with pandas and seaborn
• Probability of freeze by calendar month
• Monthly averaging
• Calculate monthly anomalies
• Fill missing values with climatology

Shared setup:

``````

In [3]:

import xarray as xr
import numpy as np
import pandas as pd
import seaborn as sns # pandas aware plotting library

np.random.seed(123)

times = pd.date_range('2000-01-01', '2001-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset({'tmin': (('time', 'location'), tmin_values),
'tmax': (('time', 'location'), tmax_values)},
{'time': times, 'location': ['IA', 'IN', 'IL']})

``````

## Examine a dataset with pandas and seaborn

``````

In [40]:

ds

``````
``````

Out[40]:

<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
* time      (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
* location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
tmax      (time, location) float64 12.98 3.31 6.779 0.4479 6.373 4.843 ...
tmin      (time, location) float64 -8.037 -1.788 -3.932 -9.341 -6.558 ...

``````
``````

In [5]:

df = ds.to_dataframe()

``````
``````

In [6]:

``````
``````

Out[6]:

tmax
tmin

location
time

IA
2000-01-01
12.980549
-8.037369

2000-01-02
0.447856
-9.341157

2000-01-03
5.322699
-12.139719

2000-01-04
1.889425
-7.492914

2000-01-05
0.791176
-0.447129

``````
``````

In [7]:

df.describe()

``````
``````

Out[7]:

tmax
tmin

count
2193.000000
2193.000000

mean
20.108232
9.975426

std
11.010569
10.963228

min
-3.506234
-13.395763

25%
9.853905
-0.040347

50%
19.967409
10.060403

75%
30.045588
20.083590

max
43.271148
33.456060

``````
``````

In [8]:

ds.mean(dim='location').to_dataframe().plot()

``````
``````

Out[8]:

<matplotlib.axes._subplots.AxesSubplot at 0x141373ae710>

``````
``````

In [9]:

sns.pairplot(df.reset_index(), vars=ds.data_vars)

``````
``````

Out[9]:

<seaborn.axisgrid.PairGrid at 0x14137318780>

``````

## Probability of freeze by calendar month

``````

In [10]:

freeze = (ds['tmin'] <= 0).groupby('time.month').mean('time')

``````
``````

In [11]:

freeze

``````
``````

Out[11]:

<xarray.DataArray 'tmin' (month: 12, location: 3)>
array([[ 0.9516129 ,  0.88709677,  0.93548387],
[ 0.84210526,  0.71929825,  0.77192982],
[ 0.24193548,  0.12903226,  0.16129032],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.01612903,  0.        ],
[ 0.33333333,  0.35      ,  0.23333333],
[ 0.93548387,  0.85483871,  0.82258065]])
Coordinates:
* location  (location) <U2 'IA' 'IN' 'IL'
* month     (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

``````
``````

In [12]:

freeze.to_pandas().plot()

``````
``````

Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x14138f639b0>

``````

## Monthly averaging

``````

In [13]:

monthly_avg = ds.resample('1MS', dim='time', how='mean')

``````
``````

In [14]:

monthly_avg.sel(location='IA').to_dataframe().plot(style='s-')

``````
``````

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x14138f59860>

``````

Note that `MS` here refers to Month-Start; `M` labels Month-End (the last day of the month).

## Calculate monthly anomalies

In climatology, “anomalies” refer to the difference between observations and typical weather for a particular season. Unlike observations, anomalies should not show any seasonal cycle.

``````

In [15]:

climatology = ds.groupby('time.month').mean('time')

``````
``````

In [16]:

anomalies = ds.groupby('time.month') - climatology

``````
``````

In [17]:

anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()

``````
``````

Out[17]:

<matplotlib.axes._subplots.AxesSubplot at 0x14139021780>

``````

## Fill missing values with climatology

The `fillna()` method on grouped objects lets you easily fill missing values by group:

``````

In [18]:

# throw away the first half of every month
some_missing = ds.tmin.sel(time=ds['time.day'] > 15).reindex_like(ds)

``````
``````

In [19]:

filled = some_missing.groupby('time.month').fillna(climatology.tmin)

``````
``````

In [20]:

both = xr.Dataset({'some_missing': some_missing, 'filled': filled})

``````
``````

In [21]:

both

``````
``````

Out[21]:

<xarray.Dataset>
Dimensions:       (location: 3, time: 731)
Coordinates:
* location      (location) <U2 'IA' 'IN' 'IL'
* time          (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
month         (time) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
Data variables:
filled        (time, location) float64 -5.163 -4.216 -4.681 -5.163 ...
some_missing  (time, location) float64 nan nan nan nan nan nan nan nan ...

``````
``````

In [22]:

df = both.sel(time='2000').mean('location').reset_coords(drop=True).to_dataframe()

``````
``````

In [23]:

df[['filled', 'some_missing']].plot()

``````
``````

Out[23]:

<matplotlib.axes._subplots.AxesSubplot at 0x14139044908>

``````
``````

In [ ]:

``````