In [123]:
import numpy as np
In [124]:
import pandas as pd
In [125]:
import xarray as xr
In [126]:
xr.DataArray(np.random.randn(2, 3))
Out[126]:
In [127]:
data = xr.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
In [128]:
data
Out[128]:
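The list of `(name, labels)` pairs used above is a shorthand. As a minimal sketch, assuming a recent xarray release, the same array can be built with the `dims` and `coords` keywords; the fixed `values` array here is only an illustration so the comparison is reproducible:

```python
import numpy as np
import xarray as xr

# Fixed values instead of random ones, so the two arrays can be compared.
values = np.arange(6.0).reshape(2, 3)

# Explicit form: name the dimensions, then map each name to its labels.
explicit = xr.DataArray(
    values,
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
)

# Shorthand form from the cell above: a list of (name, labels) pairs.
shorthand = xr.DataArray(values, [("x", ["a", "b"]), ("y", [-2, 0, 2])])

# identical() compares values, dimensions, coordinates, name and attributes.
same = explicit.identical(shorthand)
```

The keyword form is longer but tends to be easier to read once an array has more than two dimensions.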
If you supply a pandas Series or DataFrame, metadata is copied directly:
In [129]:
xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
Out[129]:
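To make "copied directly" concrete, the sketch below checks which pieces of a Series carry over. The index name `letter` is a hypothetical label added for illustration; it is not part of the original example:

```python
import pandas as pd
import xarray as xr

series = pd.Series(range(3), index=list("abc"), name="foo")
arr = xr.DataArray(series)

# The Series name becomes the DataArray name, and the index becomes a
# coordinate. An unnamed index gets the default dimension name "dim_0".
unnamed_dims = arr.dims
labels = arr.coords["dim_0"].values.tolist()

# Naming the index first gives the dimension a meaningful name instead.
named_series = series.rename_axis("letter")  # "letter" is a hypothetical name
named_dims = xr.DataArray(named_series).dims
```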
Here are the key properties for a DataArray:
In [130]:
data.values
Out[130]:
In [131]:
data.dims
Out[131]:
In [132]:
data.coords
Out[132]:
In [133]:
len(data.coords)
Out[133]:
In [134]:
data.coords['x']
Out[134]:
In [135]:
data.attrs
Out[135]:
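The properties above can be summarised in one small sketch. It rebuilds `data` with fixed values so the results are predictable, and the `units` attribute is a hypothetical piece of metadata added only for illustration:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.zeros((2, 3)), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)

raw = data.values                                # the underlying numpy array
dim_names = data.dims                            # tuple of dimension names
n_coords = len(data.coords)                      # one coordinate per dimension here
x_labels = data.coords["x"].values.tolist()      # labels along "x"

# attrs is an ordinary dictionary for arbitrary metadata; it starts empty.
data.attrs["units"] = "metres"  # hypothetical attribute
```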
In [136]:
data[[0, 1]]
Out[136]:
In [137]:
data.loc['a':'b']
Out[137]:
In [138]:
data.loc
Out[138]:
In [139]:
data.isel(x=slice(2))
Out[139]:
In [140]:
data.sel(x=['a', 'b'])
Out[140]:
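The four indexing cells above differ in two ways: whether the dimension is picked by position or by name, and whether elements are picked by integer or by label. A minimal sketch, using fixed values in place of the random ones, suggests that all four select the same data here:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.arange(6.0).reshape(2, 3), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)

by_position = data[[0, 1]]                # positional dimension, integer index
by_label = data.loc["a":"b"]              # positional dimension, label index
by_name_position = data.isel(x=slice(2))  # named dimension, integer index
by_name_label = data.sel(x=["a", "b"])    # named dimension, label index

# equals() compares values, dimensions and coordinates.
all_same = all(
    by_position.equals(other)
    for other in (by_label, by_name_position, by_name_label)
)
```

Note that the label slice `'a':'b'` includes both endpoints, unlike an integer slice.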
In [141]:
data
Out[141]:
In [142]:
data + 10
Out[142]:
In [143]:
np.sin(data)
Out[143]:
In [144]:
data.T
Out[144]:
In [145]:
data.sum()
Out[145]:
Aggregation operations can use dimension names instead of axis numbers:
In [146]:
data.mean(dim='x')
Out[146]:
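As a quick check of what the dimension name stands for, the sketch below compares `mean(dim='x')` with the equivalent numpy call on axis 0. The fixed values are an assumption made so the comparison is reproducible:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.arange(6.0).reshape(2, 3), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)

# "x" is the first dimension, so reducing over it matches axis=0 in numpy.
by_name = data.mean(dim="x")
by_axis = data.values.mean(axis=0)

# Only the "y" dimension remains, and its labels are kept.
remaining_dims = by_name.dims
```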
Arithmetic operations broadcast based on dimension name. This means you don’t need to insert dummy dimensions for alignment:
In [147]:
a = xr.DataArray(np.random.randn(3), [data.coords['y']])
In [148]:
b = xr.DataArray(np.random.randn(4), dims='z')
In [149]:
a
Out[149]:
In [150]:
b
Out[150]:
In [151]:
a + b
Out[151]:
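For comparison, the sketch below shows the dummy dimension that plain numpy would need for the same sum. The fixed values are an assumption for reproducibility:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3.0), [("y", [-2, 0, 2])])
b = xr.DataArray(np.arange(4.0), dims="z")

# xarray pairs dimensions by name, so "y" and "z" form an outer sum.
total = a + b

# The numpy equivalent needs an explicit new axis on the first operand.
numpy_total = a.values[:, np.newaxis] + b.values
```

The result has dimensions `('y', 'z')`, in the order in which they first appear in the operands.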
Broadcasting also works when one array lacks a dimension that the other has:
In [152]:
v1 = xr.DataArray(np.random.rand(3, 2, 4), dims=['t', 'y', 'x'])
In [153]:
v2 = xr.DataArray(np.random.rand(2, 4), dims=['y', 'x'])
In [154]:
v1
Out[154]:
In [155]:
v2
Out[155]:
In [156]:
v1 + v2
Out[156]:
It also means that in most cases you do not need to worry about the order of dimensions:
In [157]:
data - data.T
Out[157]:
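A short sketch of why dimension order does not matter here: xarray matches `x` with `x` and `y` with `y` before subtracting, whereas the same expression on the raw numpy arrays fails because the shapes (2, 3) and (3, 2) cannot be broadcast. The fixed values are an assumption for reproducibility:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.arange(6.0).reshape(2, 3), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)

# Dimensions are matched by name, so this is element-wise data - data.
diff = data - data.T

# The raw arrays have incompatible shapes, so numpy raises ValueError.
try:
    data.values - data.values.T
    numpy_ok = True
except ValueError:
    numpy_ok = False
```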
Operations also align based on index labels:
In [173]:
data[:-1]
Out[173]:
In [177]:
data[:1]
Out[177]:
In [176]:
data[:-1] - data[:1]
Out[176]:
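With only two labels along `x`, the slices above happen to cover the same label. A minimal sketch with a hypothetical three-element array makes the alignment easier to see, assuming xarray's default inner join for arithmetic:

```python
import xarray as xr

# Hypothetical array with three labels along "x".
arr = xr.DataArray([10, 20, 30], [("x", ["a", "b", "c"])])

left = arr[:-1]   # labels "a" and "b"
right = arr[1:]   # labels "b" and "c"

# Only the shared label "b" survives, and values are matched by label
# rather than by position, so the result is 20 - 20.
diff = left - right
```

Positional numpy subtraction on the same slices would instead give `[-10, -10]`.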
In [159]:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
In [160]:
labels
Out[160]:
In [185]:
data
Out[185]:
In [184]:
data.groupby(labels).mean('y')
Out[184]:
In [186]:
data.groupby(labels).apply(lambda x: x - x.min())
Out[186]:
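The grouped mean above can be checked by hand on a small array. The sketch below uses fixed values chosen for illustration, so that the two groups give different answers:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.array([[0, 1, 4], [10, 11, 14]]),
    [("x", ["a", "b"]), ("y", [-2, 0, 2])],
)

# One label per position along "y": columns -2 and 2 belong to group "E",
# column 0 belongs to group "F".
labels = xr.DataArray(["E", "F", "E"], [data.coords["y"]], name="labels")

# Average within each group along "y"; the result gains a "labels" dimension.
means = data.groupby(labels).mean(dim="y")

group_e = means.sel(labels="E").values.tolist()
group_f = means.sel(labels="F").values.tolist()
```

For row `a`, group `E` averages 0 and 4, and group `F` contains only the single value 1.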
In [187]:
data.to_series()
Out[187]:
In [188]:
data.to_pandas()
Out[188]:
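The two conversions above produce different pandas shapes. As a minimal sketch with fixed values, `to_series` flattens the array into a Series with one index level per dimension, while `to_pandas` keeps a 2-D array as a DataFrame:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = xr.DataArray(
    np.arange(6.0).reshape(2, 3), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)

# Long format: a MultiIndex with levels "x" and "y", one row per element.
series = data.to_series()

# Wide format: rows labelled by "x", columns labelled by "y".
frame = data.to_pandas()
```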
In [189]:
ds = data.to_dataset(name='foo')
In [190]:
ds
Out[190]:
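A minimal sketch of working with several variables in one Dataset, using fixed values for reproducibility. The second variable name `bar` is hypothetical and only serves the illustration:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.arange(6.0).reshape(2, 3), [("x", ["a", "b"]), ("y", [-2, 0, 2])]
)
ds = data.to_dataset(name="foo")

# Add a second variable that shares the same dimensions and coordinates.
ds["bar"] = ds["foo"] * 2  # "bar" is a hypothetical variable name

# Indexing and arithmetic apply to every variable in the Dataset at once.
subset = ds.sel(x="a")
shifted = ds + 1
```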
Almost everything you can do with DataArray objects also works with Dataset objects, which helps if you prefer to work with multiple variables at once.
Datasets also let you easily read and write netCDF files:
In [191]:
ds.to_netcdf('example.nc')
In [192]:
xr.open_dataset('example.nc')
Out[192]: