Hack Night - Xarray tutorial - Lvl: basic intro

Author: Sai Nudurupati 19Oct17

Material presented here is extensively mined (copied with permission) from the tutorial (https://github.com/geohackweek/tutorial_contents/blob/master/nDarrays/notebooks/ndarrays_intro.ipynb) presented by Joe Hamman (GitHub: jhamman) at Geohackweek (Sep 17) at the University of Washington. I have also referred to the online documentation for xarray (http://xarray.pydata.org/en/stable/).


In [ ]:
# Ignore warnings
import warnings; warnings.simplefilter('ignore')

In [ ]:
%matplotlib inline

N-dimensional NumPy Arrays


In [ ]:
import numpy as np

sst = np.random.random(size=(2, 4, 4))

print(sst)
# print(sst[:, ::1, ::3])  # indexing

In [ ]:
sst.shape

In [ ]:
print(sst[:, ::1, ::3])

Pandas DataFrame


In [ ]:
import pandas as pd

df = pd.read_csv('https://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt',
                 sep=r"\s+", names=['year', '1yr', '5yr'], index_col='year')  # split on runs of whitespace

df.loc[1984: ]
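
The slice `df.loc[1984:]` selects rows by index label, not by row number. As a minimal offline sketch of the same idea, the cell below builds a small hypothetical DataFrame (the values are made up for illustration and are not taken from the NASA file) that reuses the column names from above:

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the NASA file.
toy = pd.DataFrame(
    {"1yr": [0.10, 0.20, 0.30, 0.40], "5yr": [0.15, 0.20, 0.25, 0.30]},
    index=pd.Index([1982, 1983, 1984, 1985], name="year"),
)

# .loc slices by index label, so this keeps the rows labelled 1984 and later.
recent = toy.loc[1984:]
print(recent)
```

Because the index holds years, the slice starts at the row labelled 1984 regardless of where that row sits in the table.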

In [ ]:
df.plot()

The Network Common Data Form (netCDF)

  • collection of self-describing, machine-independent binary data formats and software tools that facilitate the creation, access and sharing of data stored in N-dimensional arrays

Xarray

  • extends some of the core functionality of the Pandas library to N-dimensional arrays
  • can be used for:
    • multidimensional data (e.g. climate data: x, y, z, time)
    • structured data on a regular grid
    • data in netCDF format
  • Two main data structures of xarray
    • DataArray
    • Dataset

DataArray

  • xarray's implementation of a labeled, multi-dimensional array
  • the DataArray has these key properties:
    • data: N-dimensional array (NumPy or dask)
    • dims: dimension names for each axis
    • coords: dictionary-like container of arrays that label each point
    • attrs: ordered dictionary holding metadata
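
The four properties listed above can be seen by building a DataArray by hand. This is only a sketch: the dimension names, coordinate values and attributes below are hypothetical and chosen to mirror the 2 x 4 x 4 `sst` array from the NumPy section.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2 x 4 x 4 field, shaped like the `sst` array above.
values = np.arange(32, dtype=float).reshape(2, 4, 4)

da = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),                      # one name per axis
    coords={
        "time": pd.date_range("2010-01-01", periods=2),
        "lat": [10.0, 20.0, 30.0, 40.0],
        "lon": [100.0, 110.0, 120.0, 130.0],
    },
    attrs={"units": "degC", "long_name": "sea surface temperature"},
)

print(da.dims)                  # dimension names
print(da.coords["lat"].values)  # labels along the 'lat' axis
print(da.attrs["units"])        # metadata
print(type(da.data))            # the underlying array
```

Unlike the bare NumPy array, each axis now has a name and each position along it has a label.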

Dataset

  • xarray's multi-dimensional equivalent of a Pandas DataFrame
  • dict-like container of DataArray objects with aligned dimensions
  • Datasets have these key properties:
    • dims: dictionary mapping from dimension names to the fixed length of each dimension,
    • data_vars: dict-like container of DataArrays corresponding to data variables,
    • coords: dictionary-like container of DataArrays intended to label points used in data_vars
    • attrs: ordered dictionary holding metadata
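
As a rough sketch of how these pieces fit together, the cell below assembles a small Dataset from two variables that share the same dimensions. The variable names and values are hypothetical. It prints `sizes`, which gives the mapping from dimension name to length described above.

```python
import numpy as np
import xarray as xr

# Hypothetical grid and variables, for illustration only.
lat = [10.0, 20.0]
lon = [100.0, 110.0, 120.0]

ds_toy = xr.Dataset(
    data_vars={
        # each entry is (dimension names, array of values)
        "temperature": (("lat", "lon"), np.full((2, 3), 15.0)),
        "pressure": (("lat", "lon"), np.full((2, 3), 1013.0)),
    },
    coords={"lat": lat, "lon": lon},
    attrs={"title": "toy example"},
)

print(ds_toy.sizes)            # dimension name -> length
print(list(ds_toy.data_vars))  # names of the data variables
print(ds_toy["temperature"])   # each variable comes back as a DataArray
```

Both variables are aligned on the same `lat` and `lon` coordinates, which is what lets the Dataset behave like a dict of DataArrays.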

Let's look at some datasets

Import xarray library


In [ ]:
import xarray as xr

Open a dataset


In [ ]:
ds = xr.open_dataset('3B43.20100101.7A.nc')

At this point xarray has only read the file's metadata (dimensions, coordinates and attributes). The data values are loaded lazily, so they are not read into memory until they are actually needed.

Dataset properties


In [ ]:
ds

Extracting DataArrays from a Dataset


In [ ]:
precipitation = ds['pcp']

Indexing this DataArray

the traditional MATLAB/NumPy way (indexing by integer position)


In [ ]:
precipitation[0, 0, 0]

using .loc (label-based indexing)


In [ ]:
ds['pcp'].loc['2010-01-01']
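
`.loc` looks values up by coordinate label (here a date), in the order of the dimensions. The sketch below shows this on a small hypothetical array that stands in for `pcp`, so it runs without the data file. It also shows `.sel`, which performs the same label lookup but names the dimension explicitly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series standing in for the 'pcp' variable.
times = pd.date_range("2010-01-01", periods=3, freq="MS")
toy = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    dims=("time", "latitude"),
    coords={"time": times, "latitude": [-10.0, 10.0]},
)

# .loc looks values up by coordinate label, in dimension order.
by_loc = toy.loc["2010-02-01"]

# .sel does the same lookup but names the dimension explicitly.
by_sel = toy.sel(time="2010-02-01")

print(by_loc.values)
print(by_sel.values)
```

Naming the dimension with `.sel` tends to be easier to read once an array has more than one or two dimensions.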

using .isel (selection by integer position)


In [ ]:
ds['pcp'].isel(time=0, latitude=0, longitude=0)
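
To contrast `.isel` with label-based selection, here is a minimal sketch on a hypothetical 3 x 4 grid (the coordinate values are invented for the example). The same cell is picked three ways: by integer position, by exact coordinate label, and by nearest coordinate label.

```python
import numpy as np
import xarray as xr

# Hypothetical grid, for illustration only.
toy = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("latitude", "longitude"),
    coords={
        "latitude": [-30.0, 0.0, 30.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Integer position: second row, last column.
by_position = toy.isel(latitude=1, longitude=-1)

# Coordinate label: the same cell, addressed by its coordinate values.
by_label = toy.sel(latitude=0.0, longitude=270.0)

# Nearest-neighbour lookup when the requested label is not on the grid.
nearest = toy.sel(latitude=2.0, longitude=265.0, method="nearest")

print(by_position.item(), by_label.item(), nearest.item())
```

All three selections return the same value, since they address the same grid cell.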

Plotting data in 2 dimensions


In [ ]:
map_data = ds['pcp'].sel(time='2010-01-01')

In [ ]:
map_data.plot()

Want to customize it?


In [ ]:
import matplotlib.pyplot as plt
map_data.plot(cmap=plt.cm.Blues)
plt.title('Global Precipitation data')
plt.tight_layout()
plt.show()
