Automatic Range Update

This document implements a proof of concept that automatically adjusts the y-range of bokeh plots. It uses Python 2.7 and the packages numpy, netCDF4 and bokeh.

The goal is a visualization similar to the jwebView of the SOCIB THREDDS mooring stations.

Basic Processing


In [1]:
from netCDF4 import Dataset
import numpy as np
from bokeh.io import output_notebook, show, output_file
from bokeh.models import Range1d, CustomJS, PanTool
from bokeh.plotting import figure, ColumnDataSource
from collections import OrderedDict

In [2]:
def get_data_array(data_array):
    """
    Returns the pure data of a NetCDF variable (without mask).
    :param data_array: NetCDF Variable
    :return: plain numpy array; masked entries keep their underlying values
    """
    # read the variable once instead of calling __array__() repeatedly
    values = data_array.__array__()
    if isinstance(values, np.ma.MaskedArray):
        return values.data
    return values
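
To illustrate what dropping the mask means, here is a minimal sketch that uses only numpy. The helper name `strip_mask` and the sample values (including the `-9999.0` placeholder) are assumptions made for this illustration and are not taken from the dataset. It shows that the masked entry keeps its underlying value once the mask is removed, while `filled(np.nan)` would replace it with NaN instead.

```python
import numpy as np

def strip_mask(values):
    """Return the underlying data of a masked array, or a plain array otherwise."""
    if isinstance(values, np.ma.MaskedArray):
        return values.data
    return np.asarray(values)

# hypothetical pressure samples; the second one is masked as invalid
masked = np.ma.masked_array([1013.2, -9999.0, 1012.8], mask=[False, True, False])

plain = strip_mask(masked)        # mask removed, underlying values kept
filled = masked.filled(np.nan)    # alternative: masked entries become NaN
```

Which of the two variants is preferable depends on whether the placeholder values should be visible in the plot; the callback further below already skips NaN entries.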

Now we define the input NetCDF file, which we simply set to the L1 data of the ParcBit station for February 2016.


In [3]:
link = 'http://thredds.socib.es/thredds/dodsC/mooring/weather_station/station_parcbit-scb_met004/L1/2016/dep0002_station-parcbit_scb-met004_L1_2016-02.nc'
root = Dataset(link)
time = get_data_array(root.variables.get('time'))
pre = get_data_array(root.variables.get('AIR_PRE'))

Simple bokeh line graph

We now create a simple bokeh line graph to visualize the air pressure dataset.


In [4]:
output_notebook()
fig_standard = figure(plot_width=600, plot_height=300, tools=["pan, xwheel_zoom, reset"])
fig_standard.line(time, pre)
pan_tool_standard = fig_standard.select(dict(type=PanTool))
pan_tool_standard.dimensions = ["width"]
show(fig_standard)


Out[4]:

<Bokeh Notebook handle for In[4]>

Adjust the y range automatically

When we explore the dataset by panning and zooming along the x-axis, we notice that the y-range stays fixed at the extent of the whole dataset, which is not intuitive.

To change this behaviour, we create a CustomJS callback that sets the y-range to the extent of the data currently visible.


In [5]:
output_file("range_update.html")
fig = figure(plot_width=600, plot_height=300, tools=["pan, xwheel_zoom, reset"])
pan_tool = fig.select(dict(type=PanTool))
pan_tool.dimensions = ["width"]

fig.line(time, pre)

source = ColumnDataSource({'x': time, 'y': pre})

jscode = """
    function isNumeric(n) {
      return !isNaN(parseFloat(n)) && isFinite(n);
    }
    var data = source.get('data');
    var start = yrange.get('start');
    var end = yrange.get('end');
    
    var time_start = xrange.get('start');
    var time_end = xrange.get('end');
    
    var pre_max_old = end;
    var pre_min_old = start;
    
    var time = data['x'];
    var pre = data['y'];
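    // index of the first sample at or beyond the left edge of the x-range
    var t_idx_start = time.indexOf(time.filter(function(st){return st>=time_start})[0]);
    // index of the first sample at or beyond the right edge of the x-range
    var t_idx_end = time.indexOf(time.filter(function(st){return st>=time_end})[0]);
    // indexOf returns -1 when no such sample exists; use the array end instead
    if (t_idx_start < 0) { t_idx_start = time.length; }
    if (t_idx_end < 0) { t_idx_end = time.length; }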
    
    var pre_interval = pre.slice(t_idx_start, t_idx_end);
    pre_interval = pre_interval.filter(function(st){return !isNaN(st)});
    var pre_max = Math.max.apply(null, pre_interval);
    var pre_min = Math.min.apply(null, pre_interval);
    var ten_percent = (pre_max-pre_min)*0.1;
    
    pre_max = pre_max + ten_percent;
    pre_min = pre_min - ten_percent;
    
    if((!isNumeric(pre_max)) || (!isNumeric(pre_min))) {
        pre_max = pre_max_old;
        pre_min = pre_min_old;
    }
    
    yrange.set('start', pre_min);
    yrange.set('end', pre_max);
    source.trigger('change');
    """

fig.y_range.callback = CustomJS(
        args=dict(source=source, yrange=fig.y_range, xrange=fig.x_range), code=jscode)
fig.x_range.callback = CustomJS(
        args=dict(source=source, yrange=fig.y_range, xrange=fig.x_range), code=jscode)
show(fig)


Out[5]:

<Bokeh Notebook handle for In[5]>

We see that callbacks can be used for the automatic range update. My only concern is that it could be slow, since every range change reprocesses the whole dataset client-side to obtain the new range.
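
The range logic of the callback can also be checked offline. The following is a minimal Python sketch of the same steps (select the visible samples, drop non-finite values, pad the extent by ten percent, fall back to the old range if nothing is left). The function name `padded_y_range` and its parameters are assumptions made for this illustration; it is not part of the bokeh API.

```python
import numpy as np

def padded_y_range(time, values, t_start, t_end, old_range, pad=0.1):
    """Return (y_min, y_max) for the samples with t_start <= time < t_end."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    # samples inside the visible x-range
    visible = (time >= t_start) & (time < t_end)
    window = values[visible]
    # ignore NaN and infinite entries
    window = window[np.isfinite(window)]
    if window.size == 0:
        # nothing usable in view: keep the previous range
        return old_range
    lo, hi = window.min(), window.max()
    margin = (hi - lo) * pad
    return (lo - margin, hi + margin)

# hypothetical data: five samples, one of them missing
demo_time = [0, 1, 2, 3, 4]
demo_pre = [10.0, 12.0, float('nan'), 20.0, 30.0]
demo_range = padded_y_range(demo_time, demo_pre, 1, 4, old_range=(0.0, 1.0))
```

Unlike the JavaScript version, this sketch uses a boolean mask instead of index lookups, so it does not require the time array to be sorted.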

Automatic y range update with x-datetime axis

We now add support for converted timestamps (x-axis update). For this purpose, we need to import the datetime, time and pandas packages.

Bokeh can handle pandas timestamp series. Internally, however, it represents them as milliseconds since 1970-01-01 (the Unix epoch). Hence, we also have to convert our data to this representation.
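
As a small illustration of the millisecond representation, the sketch below converts a naive datetime to milliseconds since the epoch and back using only the standard library. The helper names `to_epoch_ms` and `from_epoch_ms` are assumptions made for this example, and the datetimes are treated as timezone-naive.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def to_epoch_ms(dt):
    """Milliseconds elapsed between the epoch and a naive datetime."""
    return (dt - EPOCH).total_seconds() * 1000.0

def from_epoch_ms(ms):
    """Naive datetime that lies the given number of milliseconds after the epoch."""
    return EPOCH + timedelta(milliseconds=ms)

feb_first_ms = to_epoch_ms(datetime(2016, 2, 1))   # start of the month used here
feb_first_s = feb_first_ms / 1000.0                # seconds, as in our time array
```

Dividing by 1000 gives seconds since the epoch, which is the unit of the timestamps read from the NetCDF file.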


In [6]:
from __future__ import division
from datetime import datetime, timedelta
import pandas as pd
import time as timeImport

def get_pandas_timestamp_series(datetime_array):
    # convert each datetime to a pandas Timestamp and collect them in a Series
    return pd.Series([pd.Timestamp(dt) for dt in datetime_array])

def get_str_time(x): return str(x)

def totimestamp(dt, epoch=datetime(1970,1,1)):
    td = dt - epoch
    # return td.total_seconds()
    return (td.microseconds + (td.seconds + td.days * 86400) * 10**6) / 10**6 


date_converted = [datetime.fromtimestamp(ts) for ts in time]
converted_time = get_pandas_timestamp_series(date_converted)

translate_time = converted_time.apply(lambda x: x.to_pydatetime())
converted_time_backward = map(totimestamp, translate_time)

The plotting is now basically the same. We just set x_axis_type to "datetime" and use the backward-converted time as the x column of the custom ColumnDataSource. In addition, the callback divides the x-range values by 1000 to convert bokeh's millisecond representation to our timestamps in seconds.


In [7]:
fig = figure(plot_width=600, plot_height=300, tools=["pan, xwheel_zoom, reset"], x_axis_type="datetime")
pan_tool = fig.select(dict(type=PanTool))
pan_tool.dimensions = ["width"]

fig.line(converted_time, pre)

source = ColumnDataSource({'x': converted_time_backward, 'y': pre})

jscode = """
    function isNumeric(n) {
      return !isNaN(parseFloat(n)) && isFinite(n);
    }
    var data = source.get('data');
    var start = yrange.get('start');   
    var end = yrange.get('end');
    
    var time_start = xrange.get('start')/1000;
    var time_end = xrange.get('end')/1000;
    
    var pre_max_old = end;
    var pre_min_old = start;
    
    var time = data['x'];
    var pre = data['y'];
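    // index of the first sample at or beyond the left edge of the x-range
    var t_idx_start = time.indexOf(time.filter(function(st){return st>=time_start})[0]);
    // index of the first sample at or beyond the right edge of the x-range
    var t_idx_end = time.indexOf(time.filter(function(st){return st>=time_end})[0]);
    // indexOf returns -1 when no such sample exists; use the array end instead
    if (t_idx_start < 0) { t_idx_start = time.length; }
    if (t_idx_end < 0) { t_idx_end = time.length; }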
    
    var pre_interval = pre.slice(t_idx_start, t_idx_end);
    pre_interval = pre_interval.filter(function(st){return !isNaN(st)});
    var pre_max = Math.max.apply(null, pre_interval);
    var pre_min = Math.min.apply(null, pre_interval);
    var ten_percent = (pre_max-pre_min)*0.1;
    
    pre_max = pre_max + ten_percent;
    pre_min = pre_min - ten_percent;
    
    if((!isNumeric(pre_max)) || (!isNumeric(pre_min))) {
        pre_max = pre_max_old;
        pre_min = pre_min_old;
    }
    
    yrange.set('start', pre_min);
    yrange.set('end', pre_max);

    source.trigger('change');
    """

fig.y_range.callback = CustomJS(
        args=dict(source=source, yrange=fig.y_range, xrange=fig.x_range), code=jscode)
fig.x_range.callback = CustomJS(
        args=dict(source=source, yrange=fig.y_range, xrange=fig.x_range), code=jscode)

show(fig)


Out[7]:

<Bokeh Notebook handle for In[7]>

We see that the automatic range update also works with the datetime representation. Since the range update runs on the client side in JavaScript, one way to address a possible performance issue (which I did not experience on my PC) would be to use the JS Array.some function, which stops the search for the index in the time array as soon as one entry is found.
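
A related option, sketched here in Python rather than JavaScript, is a binary search. Note that this is a different technique from the early-stopping linear search described above: it assumes that the time array is sorted in ascending order and then needs only a logarithmic number of comparisons. The function name `visible_slice` is an assumption made for this illustration.

```python
from bisect import bisect_left

def visible_slice(time, t_start, t_end):
    """Return (i_start, i_end) so that time[i_start:i_end] covers t_start <= t < t_end.

    Assumes that time is sorted in ascending order.
    """
    i_start = bisect_left(time, t_start)   # first index with time >= t_start
    i_end = bisect_left(time, t_end)       # first index with time >= t_end
    return i_start, i_end

# hypothetical timestamps in seconds
demo_time = [0, 10, 20, 30, 40]
demo_start, demo_end = visible_slice(demo_time, 5, 35)
demo_window = demo_time[demo_start:demo_end]
```

When the requested range lies completely beyond the data, both indices equal the array length and the resulting slice is empty, so the caller can keep the previous y-range.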

If we assume a constant time interval, we can calculate the index very quickly from the time differences. The same approach can also be used to estimate the approximate index position in the array and so reduce the search time.
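
A minimal sketch of this index calculation, assuming that sample i has the timestamp t0 + i * step. The function name `estimate_index` and its parameters are hypothetical; for data with gaps or irregular sampling the result is only a starting point for a local search.

```python
import math

def estimate_index(t, t0, step, n):
    """Index of the first sample with timestamp >= t, clamped to the range 0..n.

    Assumes n samples with a constant spacing: time[i] == t0 + i * step.
    """
    idx = int(math.ceil((t - t0) / float(step)))
    return max(0, min(n, idx))

# hypothetical series: 5 samples, one every 10 seconds, starting at 0
idx_inside = estimate_index(25, t0=0, step=10, n=5)
idx_before = estimate_index(-5, t0=0, step=10, n=5)
idx_after = estimate_index(100, t0=0, step=10, n=5)
```

The calculation takes constant time regardless of the length of the dataset, which is what makes it attractive for the client-side callback.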