In an earlier post I briefly described the use of the xarray module to easily manipulate medium sized datasets. I then went on to describe a more complex though not necessarily more complicated example of aggregating data from multiple netCDF files and reshaping them as input to easily build an intuitive graphical output. Now for the fun part of building some nice visualization.
The data in question happens to be geophysical, 4-daily mapped remote sensing reflectance uncertainty for a number of spectra bands, loosely ranging from blue to red. At least in my experience, visualizing these data for public consumption can be cumbersome, requiring some significant, and therefore error-prone, coding. Typically, this means a lot of back-and-forth until a useful but also presentable format is achieved. For a different experience, I opted to give GeoViews a try. GeoViews is designed to allow rapid creation of compelling visualizations with very few lines of code. This is quite in contrast with my experience with, e.g., Matplotlib's Basemap toolkit. Note that there are a few dependencies to install, so check the documentation if you wish to follow along.
GeoViews sits on top of HoloViews, a library where the user is directed to annotate his or her data. Holoviews then takes the annotated data to render it with minimum fuss.
A quick note on the data: the data I use here in this post is from my own work on remote sensing uncertainty, and not currently publicly available. However, this data has the same format as SMI (level-3, mapped) data that can be readily downloaded from the Ocean Color Processing Group's level-3 browser. The methodology laid out in this post should therefore be directly applicable to any SMI data you care to download.
So without further ado, I'll jump right in!
Let's begin first with the necessary imports:
In [1]:
import warnings
warnings.filterwarnings("ignore")
import holoviews as hv
import geoviews as gv
from geoviews import feature as gf
from cartopy import crs
import numpy as np
import xarray as xr
from matplotlib import rcParams
import matplotlib.pyplot as pl
import cmocean as cm
import seaborn
import os
import glob
In [2]:
%matplotlib inline
Jupyter notebook by default restricts the size of embedded figures. I don't like that so to change that for some layout freedom:
In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
Finally, I would like some uniformity in text layout across this notebook's graphic output - although for greater flexibility, I'll explore ways of customizing text in some parts of the graphic output.
Holoviews features matplotlib and bokeh support, among others. For my purpose, the matplotlib backend is sufficient
In [4]:
hv.notebook_extension('matplotlib')
In [14]:
dataDir = '/accounts/ekarakoy/disk02/UNCERTAINTIES/Monte-Carlo/DATA/AncillaryMC/Synthesis/8D/'
origName = 'S20031932003200.L3m_*.nc'
fGen = glob.iglob(os.path.join(dataDir,origName))
Reminder: in the last post, I merged Rrs arrays creating in addition to the already existing lat/lon dimensions a third dimension, 'experiment, which indexed all five experiments conducted in this study, at a given band. The names of the new files now feature the spectral band that corresponds to the data they contain. I'm going to quickly organize the file paths into a dictionary with bands as keys so I can easily pass relevant file paths to xarray, and also so it's easy to see what I'm doing. Just so...
In [15]:
fileDict={file.rsplit('_',1)[1].strip('.nc'): file for file in fGen}
fileDict
Out[15]:
The next thing is to identify the relevant key dimensions (kdims) and value dimensions (vdims), which are inputs description semantics that HoloViews (and GeoViews) use to compose graphical output. The key dimensions, kdims determine how data can be indexed and therefore selected. vdims is the data that will be plotted. Given the datasets above all share the same dimension, this is pretty short.
In [16]:
kdims = ['lat', 'lon', 'experiment']
vdims = ['rrs_unc']
I'll load the first file, corresponding to the blue wavelength, 412nm. In the original dataset, rrs_unc is an array with dimensions a 5 x 1556 x 4318 (experiment x lat x lon) array. To avoid waiting too long for the holoviews renderer, I subsample $\frac{1}{2}$ along the lat dimension, and $\frac{1}{5}$ along the lon dimension; this is the purpose of the SubSample function defined below
In [17]:
def SubSample(wvl):
with xr.open_dataset(fileDict[wvl]) as rawEnsemble:
subSample = rawEnsemble.data_vars['rrs_unc'][:,::2,::5].copy()
dsSubSample = subSample.to_dataset()
return dsSubSample
In [18]:
def PlotHists(wvl,ds, rng=(0,1e-3)):
f,ax=pl.subplots(figsize=(16,8))
rrsUnc = ds.to_array()
experiments = rrsUnc.experiment.data.tolist()
cmap = pl.cm.Vega10
for i,exp in enumerate(experiments):
ax.hist(rrsUnc.loc[:,exp,:,:].to_masked_array().compressed(), bins=100, range=rng,
alpha=0.5, color=cmap(i), label=exp, normed=True)
ax.hist(rrsUnc.loc[:,exp,:,:].to_masked_array().compressed(), histtype='step',bins=100,
range=rng, linewidth=3, color='k', normed=True)
ax.legend(loc='best', fontsize=15);
ax.set_xlabel('Rrs uncertainty $sr^{-1}$', fontsize=16)
ax.set_ylabel('freq.', fontsize=16);
ax.set_title('$\lambda = %s nm$' % wvl, fontsize=18)
In [19]:
def PlotLatLonPatterns(wvl, gvDs, exp):
titleString = 'Rrs Unc (%s) - Noise Source: %s' %(wvl, exp)
return hv.Spread(gvDs.select(experiment=exp).aggregate('lat', np.mean, np.std),
group=titleString) + hv.Spread(gvDs.select(experiment=exp).aggregate('lon', np.mean, np.std), group=titleString)
In [ ]:
In [20]:
band = '412'
dsSub = SubSample(band)
Next is to explain to GeoViews how to organize the visualization (via kdims and vdims) what projection to use (here I'll use the plate carrée projection). This will result in a gv.Dataset object wrapping the xarray dataset.
In [21]:
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
Next is to create a collection of images indexed by lon/lat by building a HoloViews HoloMap; basically a fancy dictionary of gv.Image elements. The important point here is that I only specify two of the key dimensions; lat and lon. Let's see what happens to the third 'implied' dimension, experiment. The '*' specifies an overlay, in contrast to a '+' that would create a second frame in the layout. I'll demonstrate the latter further below
The next line specifies options for the Image object. Of potential interest to geoscientists is the use of one of the cmocean colormaps, which are adjusted for uniform luminance to avoid unintended overemphasis of random features.
In [29]:
mycmap = cm.cm.haline
mycmap.set_under('k', alpha=1)
mycmap.set_under('k', alpha=1)
Before plotting, I defined some helper function ("hooks"). These are
In [26]:
def setaxis(plot, element):
fig = plot.handles['fig']
iax = plot.handles['axis']
cax = plot.handles['cax']
cbar = plot.handles['cbar']
fig.set_size_inches(30,15)
ttl = iax.get_title().split(':')[1].lstrip()
iax.set_title(ttl)
iaxPos = iax.axes.get_position().bounds
caxPos = cax.axes.get_position().bounds
#cax.axes.set_position([iaxPos[0]+iaxPos[2]*1.05, 0.15, caxPos[2], 0.75])
if ttl in ['Lt', 'Pressure', 'WindSpeed']:
cax.set_visible(False)
if 'Lt' in ttl or 'All' in ttl:
minBar, maxBar = 1e-4, 1e-3
step = 5e-4
stop = maxBar + step
else:
minBar, maxBar = 1e-6, 1e-4
step = minBar * 10
stop = maxBar + step
cbar.set_clim(minBar, maxBar)
#cbar.set_ticks(np.arange(minBar, stop, step))
#cbar.set_ticklabels(np.arange(minBar, stop, step))
cbar.formatter.set_powerlimits((-2, 2))
cbar.update_ticks()
cbar.draw_all()
In [24]:
def set_cbar_fonts(plot, element):
# handles: 'cax', 'bbox_extra_artists','title','artist','axis','cbar','fig'
cax = plot.handles['cax']
cax.tick_params(labelsize=18)
caxPos = cax.axes.get_position().bounds
cax.axes.set_position([caxPos[0],caxPos[1], caxPos[2], caxPos[3]*0.9])
cax.yaxis.label.set_fontsize(18)
cax.set_ylabel(r'$sr^{-1}$')
exp = cax.yaxis.get_offset_text()
exp.set_size(18)
In [220]:
%%opts Image {+axiswise}[ fontsize=None, colorbar=True, cbar_width=1e-1, logz=False, final_hooks=[setaxis, set_cbar_fonts], cbar_width=2e-2] style(cmap=mycmap ) Overlay[fontsize=22] NdLayout [aspect_weight=0.7 figsize=1000 hspace=False fontsize={'title':24}]
image = gvDataset.to(gv.Image,['lon','lat']) * gf.land(style=dict(facecolor='gray'))
layout = hv.NdLayout(image)
layout.cols(2)
# 4D
Out[220]:
In [27]:
%%opts Image {+axiswise}[ fontsize=None, colorbar=True, cbar_width=1e-1, logz=False, final_hooks=[setaxis, set_cbar_fonts], cbar_width=2e-2] style(cmap=mycmap ) Overlay[fontsize=22] NdLayout [aspect_weight=0.7 figsize=1000 hspace=False fontsize={'title':24}]
image =gvDataset.to(gv.Image,['lon','lat']) * gf.land(style=dict(facecolor='gray'))
layout = hv.NdLayout(image)
layout.cols(2)
#8D
Out[27]:
In [ ]:
PlotHists('412', dsSub)
In [ ]:
%%output fig="png"
PlotLatLonPatterns('412', gvDataset, exp='AllAnc_Lt')
In [ ]:
PlotLatLonPatterns('412', gvDataset, exp='Lt')
In [ ]:
PlotLatLonPatterns('412', gvDataset, exp='Pressure')
In [ ]:
PlotLatLonPatterns('412', gvDataset, exp='WindSpeed')
In [ ]:
dsSub.close()
band='443'
dsSub = SubSample(band)
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
In [ ]:
rrs_unc_hdl = gvDataset.redim(rrs_unc=dict(range=(2e-6, 2e-3))).to(gv.Image, ['lon', 'lat']) * gf.land(style=dict(facecolor='black'))
rrs_unc_hdl
In [ ]:
PlotHists(band, dsSub)
In [ ]:
dsSub.close()
band='490'
dsSub = SubSample(band)
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
In [ ]:
rrs_unc_hdl = gvDataset.redim(rrs_unc=dict(range=(2e-6, 2e-3))).to(gv.Image, ['lon', 'lat']) * gf.land(style=dict(facecolor='black'))
rrs_unc_hdl
In [ ]:
PlotHists(band, dsSub)
In [ ]:
dsSub.close()
band='510'
dsSub = SubSample(band)
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
In [ ]:
rrs_unc_hdl = gvDataset.redim(rrs_unc=dict(range=(2e-6, 2e-3))).to(gv.Image, ['lon', 'lat']) * gf.land(style=dict(facecolor='black'))
rrs_unc_hdl
In [ ]:
PlotHists(band, dsSub)
In [ ]:
dsSub.close()
band='555'
dsSub = SubSample(band)
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
In [ ]:
rrs_unc_hdl = gvDataset.redim(rrs_unc=dict(range=(2e-6, 2e-3))).to(gv.Image, ['lon', 'lat']) * gf.land(style=dict(facecolor='black'))
rrs_unc_hdl
In [ ]:
PlotHists(band, dsSub)
In [ ]:
dsSub.close()
band='670'
dsSub = SubSample(band)
gvDataset = gv.Dataset(dsSub, crs=crs.PlateCarree(),
kdims=kdims, vdims=vdims, group='Rrs uncertainty (%s)' % band)
In [ ]:
rrs_unc_hdl = gvDataset.redim(rrs_unc=dict(range=(2e-6, 2e-3))).to(gv.Image, ['lon', 'lat']) * gf.land(style=dict(facecolor='black'))
rrs_unc_hdl
In [ ]:
PlotHists(band, dsSub)
In [ ]: