Bokeh for Smarties

Creating interactive plots for the web

In this notebook, we will walk through the Bokeh plotting software, and specifically focus on creating a service to deliver interactive widgets to users on the Internet.

In the first part of the notebook I will demonstrate the basics of Bokeh plotting. The experience is very similar to matplotlib's pyplot, but there are differences in the way plots are created that I'll note as we go along.


By J Guillochon (Harvard)

What is Bokeh, and why should I use it?

Bokeh is a Python package that generates interactive plots for notebooks and web browsers. Rendering is done in the browser by a JavaScript library (BokehJS), but you mostly write Python to generate plots (some things, such as custom callbacks, still need a little JavaScript).

Bokeh is great for sharing your datasets in an accessible way with your peers. You can use Bokeh in one of two ways:

  • To generate standalone plots where all of the data is embedded in an HTML file.
  • To create interfaces for data that are served dynamically to the user.

Questions for the audience:

  • Do you have a personal website where you can upload HTML files?
  • Do you have a personal server where you run your own applications?
  • Have you ever used an interactive plotting package before?
  • Have you ever programmed in JavaScript before?

Bokeh alternatives

Bokeh is certainly not the only option for interactive web plots (there is an ever-growing list of competitors, many of which are also free). These include (full list here: https://alternativeto.net/software/bokeh/?license=free):

  • D3.js
  • Matplotlib
  • Plotly
  • Google Charts

Installing Bokeh

Please install bokeh using conda install bokeh.

Note: if you are using Jupyter Lab, you need to install the Bokeh extension: jupyter labextension install jupyterlab_bokeh
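
Some keyword names used in this notebook (for example plot_width and the callback argument of widgets) differ between Bokeh releases, so it can help to check which version was installed. A quick check, not specific to this notebook:

```python
import bokeh

# Print the installed Bokeh version. If it is much newer than the one this
# notebook was written for, some cells may emit deprecation warnings or
# need small keyword changes.
print(bokeh.__version__)
```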

Basic structure of Bokeh

At first we're going to do all our work in Jupyter. To have Bokeh render its plots inline in the notebook, we call output_notebook().


In [55]:
import numpy as np
# from six.moves import zip

from bokeh.plotting import figure, show, output_notebook, output_file, reset_output

reset_output()
output_notebook()

# Disable "retina" line below if your monitor doesn't support it.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'



Let's download some sample data. You only need to run this cell once.


In [ ]:
# Import the submodule explicitly: a bare "import bokeh" does not load it.
import bokeh.sampledata

bokeh.sampledata.download()

Now let's generate some random data to play with.


In [48]:
N = 400

x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 3.0

Let's make a scatter plot of this data. We're going to turn on many of the interactive tools in this first example. Take a few minutes to play with each of the tools!


In [49]:
TOOLS="hover,crosshair,pan,wheel_zoom,box_zoom,reset,tap,save,box_select,poly_select,lasso_select"

colors2 = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)]
p1 = figure(width=500, height=300, tools=TOOLS)
p1.scatter(x, y, radius=radii, fill_color=colors2, fill_alpha=0.6, line_color=None)

show(p1)
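
The colors in the cell above are plain '#rrggbb' strings built with Python's % formatting. As a small sketch of what that expression does (rgb_to_hex is a hypothetical helper name, not part of Bokeh):

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a '#rrggbb' color string."""
    # %02x writes each channel as two lowercase hexadecimal digits.
    return "#%02x%02x%02x" % (int(r), int(g), int(b))


# x and y lie in [0, 100), so red = 50 + 2*x stays below 250
# and green = 30 + 2*y stays below 230.
print(rgb_to_hex(50, 30, 150))    # x = 0, y = 0
print(rgb_to_hex(250, 230, 150))  # upper limit of the range used above
```

Channel values above 255 would produce more than two hex digits and an invalid color, which is why the scaling in the cell above keeps them in range.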


Now, let's output the above to an HTML file:


In [52]:
from bokeh.io import reset_output, output_file

reset_output()
output_file('test.html')

colors2 = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)]
p1 = figure(width=500, height=300, tools=TOOLS)
p1.scatter(x, y, radius=radii, fill_color=colors2, fill_alpha=0.6, line_color=None)

show(p1)
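
If you would rather have the HTML as a Python string (for example to upload it yourself) than write a file and open a browser, bokeh.embed.file_html can be used. A minimal sketch, where the demo plot and its title are made up for illustration:

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# A small throwaway plot (hypothetical data).
demo = figure(title="standalone demo")
demo.line([1, 2, 3], [4, 6, 5])

# CDN means the page loads BokehJS from a content delivery network;
# the plot data itself is embedded in the returned HTML.
html = file_html(demo, CDN, "standalone demo")
print(len(html), "characters of HTML")
```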

Problem 1) Inspecting the Bokeh output

Open test.html produced by the above code in your favorite text editor. Talk to your neighbors about what you see.

Can we output PDFs?

Right off the bat you might be asking: can I export this for use in a scientific paper? The answer is sadly not all good news. You can export to SVG format relatively easily with a few support libraries: selenium, phantomjs, and pillow.

But LaTeX does not support SVG figures easily! There is an svg package out there, but it's not on Overleaf and it requires Inkscape, a huge software package. The svg package also conflicts with some commonly used templates (e.g. aastex).

At the moment, the best option is to generate rasterized PNG files if you wish to include Bokeh plots in a paper. This can be done simply enough if you install the above packages:


In [ ]:
from bokeh.io import export_png

export_png(p1, 'test.png')

...but frankly, it's much simpler to just take a screenshot!

How I've used Bokeh

I've primarily used Bokeh for the Open Astronomy Catalogs. I will describe four examples:

https://sne.space/sne/SN2005gj/ (photometry, spectra browser)

https://faststars.space/sky-locations/ (metadata browser on a Hammer projection)

https://sne.space/catexplorer (Bokeh server that plots any supernova's light curve on request)

http://ashleyvillar.com/dlps (Created by Ashley Villar to show Zwicky diagram for her transients)

Cool Bokeh Examples

In the cells below, I've picked some example plots made in Bokeh that are unique and demonstrate how powerful it can be as a data visualization tool. While I go through these examples, I want you to think about your own data and how a particular plot could be used in your data's context.

First up, density plots with hexagonal bins:


In [5]:
import numpy as np

from bokeh.io import output_file, show, reset_output
from bokeh.plotting import figure
from bokeh.transform import linear_cmap
from bokeh.util.hex import hexbin

reset_output()
output_notebook()

n = 50000
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)

bins = hexbin(x, y, 0.1)

p = figure(tools="wheel_zoom,reset,tap", match_aspect=True, background_fill_color='#440154')
p.grid.visible = False

p.hex_tile(q="q", r="r", size=0.1, line_color=None, source=bins,
           fill_color=linear_cmap('counts', 'Viridis256', 0, max(bins.counts)))

show(p)
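
To see what hex_tile is actually drawing, it helps to look at what hexbin returns: a table with one row per occupied hexagon, giving its axial coordinates (q, r) and the number of points that fell in it. A small sketch with made-up data and a coarser bin size than above:

```python
import numpy as np
from bokeh.util.hex import hexbin

# Hypothetical sample, seeded so the result is reproducible.
rng = np.random.RandomState(0)
xs = rng.standard_normal(1000)
ys = rng.standard_normal(1000)

tiles = hexbin(xs, ys, 0.5)

print(tiles.head())        # columns: q, r, counts
print(tiles.counts.sum())  # every point lands in exactly one tile
```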


Images can be seamlessly displayed (with transparency):


In [12]:
from __future__ import division

import numpy as np

from bokeh.plotting import figure, output_file, show

# create an array of RGBA data
N = 500
img = np.empty((N, N), dtype=np.uint32)
view = img.view(dtype=np.uint8).reshape((N, N, 4))
for i in range(N):
    for j in range(N):
        view[i, j, 0] = int(255 * i / N)
        view[i, j, 1] = 158
        view[i, j, 2] = int(255 * j / N)
        view[i, j, 3] = int(255 * j / N)

p = figure(plot_width=400, plot_height=400, x_range=(0, 10), y_range=(0, 10))

p.image_rgba(image=[img], x=[0], y=[0], dw=[10], dh=[10])

show(p)
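
The double loop above fills the image one pixel at a time. The same array can be built with NumPy broadcasting, which is much faster for large images. A sketch that should reproduce the same values (the *_demo names and ramp are mine; like the loop version it assumes the first byte of each uint32 is the red channel, which holds on little-endian machines):

```python
import numpy as np

N_demo = 500
img_demo = np.empty((N_demo, N_demo), dtype=np.uint32)

# View the same memory as N x N x 4 bytes: one byte per R, G, B, A channel.
view_demo = img_demo.view(dtype=np.uint8).reshape((N_demo, N_demo, 4))

# Equivalent of int(255 * i / N) for every index at once.
ramp = (255 * np.arange(N_demo) / N_demo).astype(np.uint8)

view_demo[:, :, 0] = ramp[:, np.newaxis]  # red varies with the row index i
view_demo[:, :, 1] = 158                  # constant green
view_demo[:, :, 2] = ramp[np.newaxis, :]  # blue varies with the column index j
view_demo[:, :, 3] = ramp[np.newaxis, :]  # alpha varies with the column index j
```

img_demo can then be passed to image_rgba exactly as img is above.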


Problem 2) Displaying Images

Using either the data you've brought or, if you've brought no data, this image (https://apod.nasa.gov/apod/image/1804/AmericanEclipseHDR_Lefaudeux_1080.jpg), display an image in a Bokeh plot.

We can link two plots to the same data, which may be of high dimension:


In [18]:
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x = list(range(-20, 21))
y0 = [abs(xx) for xx in x]
y1 = [xx**2 for xx in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "box_select,lasso_select,help,reset"

# create a new plot and add a renderer
left = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
right.circle('x', 'y1', source=source)

p = gridplot([[left, right]])

show(p)


The example below demonstrates interactive highlighting of data when the user hovers their cursor over it:


In [16]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import HoverTool
from bokeh.sampledata.glucose import data

subset = data.loc['2010-10-06']

x, y = subset.index.to_series(), subset['glucose']

# Basic plot setup
plot = figure(plot_width=600, plot_height=300, x_axis_type="datetime", tools="",
              toolbar_location=None, title='Hover over points')

plot.line(x, y, line_dash="4 4", line_width=1, color='gray')

cr = plot.circle(x, y, size=20,
                fill_color="grey", hover_fill_color="firebrick",
                fill_alpha=0.05, hover_alpha=0.3,
                line_color=None, hover_line_color="white")

plot.add_tools(HoverTool(tooltips=None, renderers=[cr], mode='hline'))

show(plot)



In [38]:
from bokeh.plotting import figure, output_file, show, ColumnDataSource
from bokeh.models import HoverTool

source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y=[2, 5, 8, 2, 7],
    desc=['A', 'b', 'C', 'd', 'E'],
    imgs=[
        'https://bokeh.pydata.org/static/snake.jpg',
        'https://bokeh.pydata.org/static/snake2.png',
        'https://bokeh.pydata.org/static/snake3D.png',
        'https://bokeh.pydata.org/static/snake4_TheRevenge.png',
        'https://bokeh.pydata.org/static/snakebite.jpg'
    ],
    fonts=[
        '<i>italics</i>',
        '<pre>pre</pre>',
        '<b>bold</b>',
        '<small>small</small>',
        '<del>del</del>'
    ]
))

hover = HoverTool( tooltips="""
    <div>
        <div>
            <img
                src="@imgs" height="42" alt="@imgs" width="42"
                style="float: left; margin: 0px 15px 15px 0px;"
                border="2"
            ></img>
        </div>
        <div>
            <span style="font-size: 17px; font-weight: bold;">@desc</span>
            <span style="font-size: 15px; color: #966;">[$index]</span>
        </div>
        <div>
            <span>@fonts{safe}</span>
        </div>
        <div>
            <span style="font-size: 15px;">Location</span>
            <span style="font-size: 10px; color: #696;">($x, $y)</span>
        </div>
    </div>
    """
)

p = figure(plot_width=400, plot_height=400, tools=[hover],
           title="Mouse over the dots")

p.circle('x', 'y', size=20, source=source)

show(p)
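
The HTML template above is the most flexible way to write a tooltip. When you only need to list a few extra columns, a list of (label, value) tuples is usually enough. A minimal sketch with a made-up three-column dataset (the mass column and all demo_* names are hypothetical):

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# x and y are plotted; "mass" only appears in the tooltip.
demo_source = ColumnDataSource(data=dict(
    x=[1, 2, 3],
    y=[2, 5, 8],
    mass=[0.3, 1.4, 2.2],
))

# "@name" looks up a column of the data source, "$index" is the row number.
demo_hover = HoverTool(tooltips=[
    ("index", "$index"),
    ("(x, y)", "(@x, @y)"),
    ("mass", "@mass"),
])

demo_plot = figure(tools=[demo_hover], title="Tuple tooltips")
demo_plot.scatter('x', 'y', size=20, source=demo_source)
```

Calling show(demo_plot) displays the plot as in the cells above.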


Problem 3) Displaying high-dimensional data

Create/use a dataset with more than two dimensions to annotate a 2D plot (two of the dimensions should show in the plot, the rest in the tooltip).


In [23]:
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.plotting import figure

x = list(range(11))
y0 = x
y1 = [10-xx for xx in x]
y2 = [abs(xx-5) for xx in x]

# create a new plot
s1 = figure(plot_width=250, plot_height=250, title=None)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# create a new plot and share both ranges
s2 = figure(plot_width=250, plot_height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# create a new plot and share only one range
s3 = figure(plot_width=250, plot_height=250, x_range=s1.x_range, title=None)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

p = gridplot([[s1, s2, s3]], toolbar_location=None)

# show the results
show(p)
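
The linking in the cell above works because the figures hold the very same range object, so panning or zooming one plot changes the range that the other plot also reads. A small sketch that makes this explicit (the figure names are illustrative):

```python
from bokeh.plotting import figure

left_demo = figure(title="left")

# Passing an existing range object shares it instead of creating a new one.
right_demo = figure(title="right", x_range=left_demo.x_range)

print(right_demo.x_range is left_demo.x_range)  # x axis is shared
print(right_demo.y_range is left_demo.y_range)  # y axis is independent
```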


Problem 4) Linking plots

Create a plot with your own data where at least one of the two axes is linked with a neighboring plot.


In [56]:
import numpy as np

from bokeh.layouts import row, widgetbox
from bokeh.models import CustomJS, Slider
from bokeh.plotting import figure, output_file, show, ColumnDataSource

x = np.linspace(0, 10, 500)
y = np.sin(x)

source = ColumnDataSource(data=dict(x=x, y=y))

plot = figure(y_range=(-10, 10), plot_width=400, plot_height=400)

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

callback = CustomJS(args=dict(source=source), code="""
    var data = source.data;
    var A = amp.value;
    var k = freq.value;
    var phi = phase.value;
    var B = offset.value;
    var x = data['x'];
    var y = data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = B + A*Math.sin(k*x[i]+phi);
    }
    source.change.emit();
""")

amp_slider = Slider(start=0.1, end=10, value=1, step=.1,
                    title="Amplitude", callback=callback)
callback.args["amp"] = amp_slider

freq_slider = Slider(start=0.1, end=10, value=1, step=.1,
                     title="Frequency", callback=callback)
callback.args["freq"] = freq_slider

phase_slider = Slider(start=0, end=6.4, value=0, step=.1,
                      title="Phase", callback=callback)
callback.args["phase"] = phase_slider

offset_slider = Slider(start=-5, end=5, value=0, step=.1,
                       title="Offset", callback=callback)
callback.args["offset"] = offset_slider

layout = row(
    plot,
    widgetbox(amp_slider, freq_slider, phase_slider, offset_slider),
)

# output_file("slider.html", title="slider.py example")

show(layout)
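
Newer Bokeh releases dropped the callback keyword on widgets such as Slider. If the cell above raises an error about an unexpected callback attribute, the same wiring can be done with js_on_change. A minimal sketch with a single slider, assuming a Bokeh version that provides js_on_change (variable names are mine):

```python
import numpy as np
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

xs = np.linspace(0, 10, 500)
wave = ColumnDataSource(data=dict(x=xs, y=np.sin(xs)))

wave_plot = figure(y_range=(-10, 10))
wave_plot.line('x', 'y', source=wave, line_width=3, line_alpha=0.6)

amp = Slider(start=0.1, end=10, value=1, step=0.1, title="Amplitude")

# Widgets are handed to the JavaScript code through args.
amp_callback = CustomJS(args=dict(source=wave, amp=amp), code="""
    var x = source.data['x'];
    var y = source.data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = amp.value * Math.sin(x[i]);
    }
    source.change.emit();
""")

# Run the callback whenever the slider's value property changes.
amp.js_on_change('value', amp_callback)

wave_layout = column(wave_plot, amp)
```

Calling show(wave_layout) then displays the plot with its slider, as in the cell above.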



In [57]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, HoverTool, CustomJS

# define some points and a little graph between them
x = [2, 3, 5, 6, 8, 7]
y = [6, 4, 3, 8, 7, 5]
links = {
    0: [1, 2],
    1: [0, 3, 4],
    2: [0, 5],
    3: [1, 4],
    4: [1, 3],
    5: [2, 3, 4]
}

p = figure(plot_width=400, plot_height=400, tools="", toolbar_location=None, title='Hover over points')

source = ColumnDataSource({'x0': [], 'y0': [], 'x1': [], 'y1': []})
sr = p.segment(x0='x0', y0='y0', x1='x1', y1='y1', color='olive', alpha=0.6, line_width=3, source=source, )
cr = p.circle(x, y, color='olive', size=30, alpha=0.4, hover_color='olive', hover_alpha=1.0)

# Add a hover tool, that sets the link data for a hovered circle
code = """
var links = %s;
var data = {'x0': [], 'y0': [], 'x1': [], 'y1': []};
var cdata = circle.data;
var indices = cb_data.index['1d'].indices;
for (var i = 0; i < indices.length; i++) {
    var ind0 = indices[i];
    for (var j = 0; j < links[ind0].length; j++) {
        var ind1 = links[ind0][j];
        data['x0'].push(cdata.x[ind0]);
        data['y0'].push(cdata.y[ind0]);
        data['x1'].push(cdata.x[ind1]);
        data['y1'].push(cdata.y[ind1]);
    }
}
segment.data = data;
""" % links

callback = CustomJS(args={'circle': cr.data_source, 'segment': sr.data_source}, code=code)
p.add_tools(HoverTool(tooltips=None, callback=callback, renderers=[cr]))

show(p)


Problem 5) Interactive plots without a server

With your data, create an interactive plot with at least one control to manipulate the plot output.

Useful controls to consider include (see https://bokeh.pydata.org/en/latest/docs/user_guide/interaction/widgets.html):

  • Sliders (as shown above)
  • Buttons
  • Checkboxes
  • Data tables
  • Text input

In [27]:
import pandas as pd

from bokeh.palettes import Spectral4
from bokeh.plotting import figure, output_file, show
from bokeh.sampledata.stocks import AAPL, IBM, MSFT, GOOG

p = figure(plot_width=800, plot_height=250, x_axis_type="datetime")
p.title.text = 'Click on legend entries to hide the corresponding lines'

for data, name, color in zip([AAPL, IBM, MSFT, GOOG], ["AAPL", "IBM", "MSFT", "GOOG"], Spectral4):
    df = pd.DataFrame(data)
    df['date'] = pd.to_datetime(df['date'])
    p.line(df['date'], df['close'], line_width=2, color=color, alpha=0.8, legend=name)

p.legend.location = "top_left"
p.legend.click_policy="hide"

show(p)


Breakout

Create a plot of the data you brought with you and output it to HTML using the output_file() function in Bokeh. On Slack, send this file to a person sitting near you, without telling them what the data is. Let them play with your plot for a few minutes and see if they can guess what you are trying to show with the plot. So that things aren't too easy, do not label your axes; use only the ticks and tooltips to describe the data.

Running Bokeh as a server

Running Bokeh in server mode means you do not have to ship all the data to the user as soon as they open the page; instead, the data is delivered to them on demand. Since the Bokeh server is written in Python, the event handlers that return data to the user can also be written in Python (no JavaScript required).

Use the bokeh serve command to run the server example by executing:

bokeh serve sliders.py

at your command prompt. Then navigate to the URL http://localhost:5006/sliders.

The code below will not execute in Jupyter as it is intended to run in a server environment.


In [ ]:
import numpy as np

from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput
from bokeh.plotting import figure

# Set up data
N = 200
x = np.linspace(0, 4*np.pi, N)
y = np.sin(x)
source = ColumnDataSource(data=dict(x=x, y=y))


# Set up plot
plot = figure(plot_height=400, plot_width=400, title="my sine wave",
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[0, 4*np.pi], y_range=[-2.5, 2.5])

plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)


# Set up widgets
text = TextInput(title="title", value='my sine wave')
offset = Slider(title="offset", value=0.0, start=-5.0, end=5.0, step=0.1)
amplitude = Slider(title="amplitude", value=1.0, start=-5.0, end=5.0, step=0.1)
phase = Slider(title="phase", value=0.0, start=0.0, end=2*np.pi)
freq = Slider(title="frequency", value=1.0, start=0.1, end=5.1, step=0.1)


# Set up callbacks
def update_title(attrname, old, new):
    plot.title.text = text.value

text.on_change('value', update_title)

def update_data(attrname, old, new):

    # Get the current slider values
    a = amplitude.value
    b = offset.value
    w = phase.value
    k = freq.value

    # Generate the new curve
    x = np.linspace(0, 4*np.pi, N)
    y = a*np.sin(k*x + w) + b

    source.data = dict(x=x, y=y)

for w in [offset, amplitude, phase, freq]:
    w.on_change('value', update_data)


# Set up layouts and add to document
inputs = widgetbox(text, offset, amplitude, phase, freq)

curdoc().add_root(row(inputs, plot, width=800))
curdoc().title = "Sliders"

Challenge problem

Implement a Bokeh server on your own computer where none of the data is loaded a priori (e.g. the DLPS example above).
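
As a starting point, here is one possible sketch of on-demand loading: the data source starts empty and is only filled when the user picks a dataset. The load_dataset function is a hypothetical stand-in for reading a file or querying a database, and the dataset names are made up. Like the sliders example, it is meant to be saved to a file and run with bokeh serve.

```python
import numpy as np

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure


def load_dataset(name):
    """Hypothetical loader; replace with a file read or database query."""
    x = np.linspace(0, 10, 100)
    curves = {"sine": np.sin(x), "cosine": np.cos(x)}
    return dict(x=x, y=curves[name])


# Nothing is loaded up front: the source starts out empty.
lazy_source = ColumnDataSource(data=dict(x=[], y=[]))

lazy_plot = figure(title="Pick a dataset")
lazy_plot.line('x', 'y', source=lazy_source, line_width=3)

select = Select(title="Dataset", value="", options=["", "sine", "cosine"])


def on_select(attrname, old, new):
    # Runs in Python on the server each time the user changes the menu.
    if new:
        lazy_source.data = load_dataset(new)


select.on_change('value', on_select)

curdoc().add_root(column(select, lazy_plot))
curdoc().title = "Lazy loading"
```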