Bokeh Plotting Tutorial

This notebook will walk you through the process of importing a data set, wrangling the data into the desired shape, and plotting them with Bokeh. It will also demonstrate some basic styling commands.

Modules

First let's gather our data analytics modules. Combining Python's numpy and pandas packages provides us with powerful and flexible methods to reference and reshape our data.


In [1]:
import numpy as np
import pandas as pd

Now let's import our plotting tool: Bokeh. The wildcard import below brings the bokeh.plotting functions (figure(), scatter(), line(), show(), and so on) straight into our namespace, so we can call them without a prefix.


In [2]:
from bokeh.plotting import *

Now for some data. This tutorial will attempt to recreate a particular Variance Chart example, which lends some insight into the global temperature trend for the past century. The monthly data were accessed from this URL.

A key feature of Python data analytics is the ability to interact iteratively with your source data, dynamically reshaping and cleaning it. So whereas Variance Charts require independent data files in a precise format for each of their plotting layers (monthly data, annual mean, and five-year mean), we can perform numerical operations to derive the second and third layers from the first. And now with the introduction of Bokeh, we can enjoy an end-to-end analytics workflow in one programming language on one platform.

But talk is cheap. The rest of this tutorial will focus on demonstrating a common Bokeh use case: plotting scatter data and a couple of derived lines.

Import Data


In [3]:
a = pd.read_csv('global_temp_monthly_recordings.csv', index_col=0, header=0)

That was easy!
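The CSV file itself isn't included with this notebook. As a sketch, assuming it has a `year` column followed by twelve lowercase month columns (the layout the head() output below suggests), the same read_csv() arguments can be exercised on an in-memory copy of the first two rows (the variable names here are ours):

```python
import io

import pandas as pd

# Hypothetical stand-in for global_temp_monthly_recordings.csv: the first two rows only,
# with values copied from the head() output shown in this notebook.
csv_text = """year,jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec
1880,-0.33,-0.26,-0.21,-0.29,-0.16,-0.23,-0.18,-0.11,-0.19,-0.18,-0.16,-0.21
1881,-0.12,-0.14,0.00,-0.01,-0.01,-0.24,-0.10,-0.05,-0.16,-0.21,-0.26,-0.16
"""

# index_col=0 makes the first column (year) the row index;
# header=0 takes the column names from the first line.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)
print(sample.shape)  # (2, 12)
```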

A perk of being in an IPython Notebook environment: we can quickly get pretty-printed views into our data. For example, let's take a look at the top few rows in this data.


In [4]:
a.head()


Out[4]:
jan feb mar apr may jun jul aug sep oct nov dec
year
1880 -0.33 -0.26 -0.21 -0.29 -0.16 -0.23 -0.18 -0.11 -0.19 -0.18 -0.16 -0.21
1881 -0.12 -0.14 0.00 -0.01 -0.01 -0.24 -0.10 -0.05 -0.16 -0.21 -0.26 -0.16
1882 0.00 0.04 0.06 0.00 -0.22 -0.18 -0.30 -0.25 -0.09 -0.10 -0.24 -0.23
1883 -0.37 -0.36 -0.11 -0.18 -0.19 -0.06 -0.01 -0.12 -0.18 -0.18 -0.27 -0.18
1884 -0.18 -0.12 -0.29 -0.34 -0.32 -0.34 -0.30 -0.23 -0.27 -0.24 -0.27 -0.24

5 rows × 12 columns

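Pretty nifty! Now let's wrangle this data into a usable format. The first thing we'll do is stack the data, which moves the month columns into an inner index level so that each value is a single monthly reading indexed by (year, month).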


In [5]:
b = a.stack()
b.head()


Out[5]:
year     
1880  jan   -0.33
      feb   -0.26
      mar   -0.21
      apr   -0.29
      may   -0.16
dtype: float64

Now we can see that pandas has automatically collapsed the DataFrame into a Series, which better represents the new one-dimensional data. The Series has a MultiIndex whose month level has no name yet, so let's add one.


In [6]:
b.index.set_names(['year', 'month'], inplace=True)
b.head()


Out[6]:
year  month
1880  jan     -0.33
      feb     -0.26
      mar     -0.21
      apr     -0.29
      may     -0.16
dtype: float64
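To see what stack() and set_names() do in isolation, here is a small sketch on a made-up three-month excerpt (values copied from the head() output above; the variable names are ours). It assigns the renamed index back rather than renaming in place:

```python
import pandas as pd

# A tiny wide table shaped like the notebook's: years as rows, months as columns.
wide = pd.DataFrame(
    {"jan": [-0.33, -0.12], "feb": [-0.26, -0.14], "mar": [-0.21, 0.00]},
    index=pd.Index([1880, 1881], name="year"),
)

# stack() moves the column labels into a new, innermost index level,
# producing a Series with one value per (year, month) pair.
long = wide.stack()

# The new level has no name yet, so name both levels explicitly.
long.index = long.index.set_names(["year", "month"])

print(long.head())
```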

Bokeh supports a datetime axis, so let's convert our year and month labels into proper datetime values.

First we'll reset the index, so that the year and month become data columns that we can more easily access.


In [7]:
c = b.reset_index()
c.head()


Out[7]:
year month 0
0 1880 jan -0.33
1 1880 feb -0.26
2 1880 mar -0.21
3 1880 apr -0.29
4 1880 may -0.16

5 rows × 3 columns

And let's add a header for our data column, so we can keep track of it with a label.


In [8]:
c.rename(columns={0:'temp_delta'}, inplace=True)
c.head()


Out[8]:
year month temp_delta
0 1880 jan -0.33
1 1880 feb -0.26
2 1880 mar -0.21
3 1880 apr -0.29
4 1880 may -0.16

5 rows × 3 columns
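The column named `0` above comes from reset_index(): an unnamed Series becomes a column labelled 0. A minimal sketch on a hypothetical two-value Series (names are ours) shows both steps, returning new objects instead of modifying in place:

```python
import pandas as pd

# Hypothetical two-level Series like `b`: (year, month) -> temperature delta.
idx = pd.MultiIndex.from_tuples([(1880, "jan"), (1880, "feb")], names=["year", "month"])
s = pd.Series([-0.33, -0.26], index=idx)

# reset_index() turns each index level into an ordinary column;
# the unnamed values end up in a column labelled 0.
flat = s.reset_index()
print(list(flat.columns))  # ['year', 'month', 0]

# Give the value column a descriptive label.
flat = flat.rename(columns={0: "temp_delta"})
print(list(flat.columns))  # ['year', 'month', 'temp_delta']
```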

Now for some magic: pandas has a top-level function, pd.to_datetime(), which parses strings into datetimes and, when given a Series, returns a Series; we're going to assign its result to a new column in our DataFrame.


In [9]:
c[['year','month']] = c[['year', 'month']].astype(str)  # Convert to strings so year and month can be concatenated (e.g. '1880jan')
c['date'] = pd.to_datetime(c['year'] + c['month'], format="%Y%b")  # %Y = four-digit year, %b = abbreviated month name

We no longer need the year and month columns, so we can drop them from the DataFrame.


In [10]:
c.drop(['year', 'month'], axis=1, inplace=True)
c.head()


Out[10]:
temp_delta date
0 -0.33 1880-01-01
1 -0.26 1880-02-01
2 -0.21 1880-03-01
3 -0.29 1880-04-01
4 -0.16 1880-05-01

5 rows × 2 columns
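The "%Y%b" format is doing the real work in the date conversion. A small sketch on hypothetical rows (assuming an English locale, since %b matches the locale's month abbreviations, which pandas compares case-insensitively):

```python
import pandas as pd

# Hypothetical rows after reset_index(): year as an integer, month as a
# lowercase three-letter abbreviation.
df = pd.DataFrame({"year": [1880, 1880, 1881], "month": ["jan", "feb", "dec"]})

# Concatenate into strings like "1880jan"; %Y matches the four-digit year
# and %b the month abbreviation. The day defaults to the 1st of the month.
stamps = pd.to_datetime(df["year"].astype(str) + df["month"], format="%Y%b")
print(stamps.tolist())
```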

Great; we now have the data we want in the format we want: datetime objects for each month and a corresponding temperature delta. It's time to visualize!

Plotting

First let's specify the output format: we want to display the plots directly in the notebook!


In [11]:
output_notebook()



Configuring embedded BokehJS mode.

(That previous call should return with a jaunty response of "Configuring embedded BokehJS mode.")

Let's begin plotting: our first layer will be a scatter plot of the monthly temperature deltas. Below you can see the scatter() command, which accepts data in several formats. Here we're passing in the date and temp_delta columns from our DataFrame, declaring the x_axis_type to be "datetime", setting a legend, and selecting which tools we want available to interact with the plot.


In [12]:
months = c['date']                 # Refer to the 'date' column of the DataFrame as 'months'
monthly_data = c['temp_delta']     # Refer to the 'temp_delta' column of the DataFrame as 'monthly_data'

scatter(
    months,                                            # X coordinates
    monthly_data,                                      # Y coordinates
    x_axis_type="datetime",
    legend='Temperature Delta (monthly)',
    tools="pan,wheel_zoom,box_zoom,reset,previewsave"  # Declare available plot tools
)


Out[12]:
<bokeh.objects.Plot at 0x1037c5810>

There's our Plot instance! We can render it directly in the notebook with a show() command:


In [13]:
show()



From the above plot you should see that Bokeh has automatically determined the bounds based on the data, with appropriately scaled tick marks.

Next let's explore some common modifications. First we will declare a new figure() with an increased plot width, and call hold() so that new renderers are added to the same plot.


In [14]:
figure(plot_width=1000)
hold()

Now let's get a view into average annual temperature data.

We want the mean temperature delta per year, which involves averaging across the twelve months of each year. (This is not entirely accurate, because months have unequal numbers of days, but it is accurate enough for the purpose of demonstration.)

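Lucky for us, pandas lets us group a Series by one of its index levels and average each group with mean():


In [15]:
annual = b.groupby(level=0).mean()   # Older pandas also accepted b.mean(level=0); pandas 2.0 removed that form
annual.head()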


Out[15]:
year
1880   -0.209167
1881   -0.121667
1882   -0.125833
1883   -0.184167
1884   -0.261667
dtype: float64
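As a check on the figure above, and on how much the unequal month lengths matter, here is a sketch using only the 1880 row from the head() output (variable names are ours). The day-weighted mean is our addition for comparison; the notebook itself uses the plain mean:

```python
import calendar

import pandas as pd

# One year of monthly deltas: the 1880 row from the head() output.
month_names = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
values = [-0.33, -0.26, -0.21, -0.29, -0.16, -0.23,
          -0.18, -0.11, -0.19, -0.18, -0.16, -0.21]
idx = pd.MultiIndex.from_product([[1880], month_names], names=["year", "month"])
monthly = pd.Series(values, index=idx)

# Plain per-year mean: group on the 'year' index level.
plain = monthly.groupby(level="year").mean()

# Day-weighted mean: weight each month by its number of days (1880 is a leap year).
days = [calendar.monthrange(1880, m)[1] for m in range(1, 13)]
weights = pd.Series(days, index=idx)
weighted = (monthly * weights).groupby(level="year").sum() / weights.groupby(level="year").sum()

print(plain.loc[1880], weighted.loc[1880])
```

The two results differ by well under a thousandth of a degree, which supports treating the plain mean as accurate enough here.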

The fellows over at Variance decided to align their monthly data along the same vertical axis for each year; this cleans up the presentation a little at the slight cost of horizontal accuracy. We can reproduce this view with a couple of lines of Python: first we convert the annual index values to Timestamps, then we repeat each Timestamp 12 times, once per month.


In [16]:
years = pd.Series(pd.to_datetime(annual.index.values, format="%Y"))
years_expanded = np.array([12*[x] for x in years]).flatten()
years_expanded[0:20]


Out[16]:
array([Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1880-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None),
       Timestamp('1881-01-01 00:00:00', tz=None)], dtype=object)

You can see above that each Timestamp is repeated 12 times, so plotting monthly_data against years_expanded places each year's twelve monthly values in a single vertical column.
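The list-comprehension-and-flatten step can also be written with numpy's repeat(), which repeats each element in place rather than tiling the whole array. A sketch on three hypothetical years; note that either version only lines up with the monthly data if every year in the file has all twelve months:

```python
import numpy as np
import pandas as pd

# Hypothetical annual timestamps for three years.
years = pd.Series(pd.to_datetime(["1880", "1881", "1882"], format="%Y"))

# The notebook's approach: a 12-element list per year, then flatten.
listwise = np.array([12 * [x] for x in years]).flatten()

# The same sequence in one call (as datetime64 values rather than Timestamp objects).
repeated = np.repeat(years.to_numpy(), 12)

print(len(listwise), len(repeated))  # 36 36
```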

Now let's call scatter() again with a few tweaks.


In [17]:
# Development for Bokeh's datetime axis support is still ongoing,
#   so in the meantime we have to set the radius by milliseconds.

MS_IN_YEAR = 31556952000    # Number of milliseconds in a year
                            # This lets us scale along the x-axis (datetime) in units of years

scatter(
    years_expanded,
    monthly_data,
    color='#BBBBBB',
    line_alpha=0.2,
    fill_alpha=0.5,
    radius=MS_IN_YEAR/12*4,     # Workaround; radius of points should be a third of a year (four months)
    legend='Monthly Mean',
    x_axis_type = "datetime",
    tools="pan,wheel_zoom,box_zoom,reset,previewsave"
)


Out[17]:
<bokeh.objects.Plot at 0x1037d1290>
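The constant MS_IN_YEAR is the length of an average Gregorian year (365.2425 days) expressed in milliseconds, the unit the datetime axis works in. A quick check of the arithmetic, kept in integers so it is exact:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000                 # 86,400,000 ms

# 365.2425 days written as 3652425 / 10000 to avoid floating-point rounding.
MS_IN_YEAR = 3652425 * MS_PER_DAY // 10000
print(MS_IN_YEAR)  # 31556952000

# Radius used for the monthly points: four months' worth of milliseconds.
radius = MS_IN_YEAR / 12 * 4
print(radius)      # 10518984000.0
```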

Perfect! Now let's add our annual scatter and line renderers to our plot as well.


In [18]:
# Scatter points of annual temperature deltas (calculated)

scatter(
    years,
    annual,
    color='#464678',
    alpha=1,
    radius=MS_IN_YEAR/12*4,
)

# Line to connect scatter points; refers to same data sources (years, annual)

line(
    years,
    annual,
    color='#464678',
    alpha=1,
    legend='Annual Mean'
)

show()



This is a good start, and it's quickly beginning to resemble the plot we are targeting, but here we will diverge a bit from the Variance chart.

Instead of plotting a line of the five-year mean, we will apply Kaiser window smoothing with the help of numpy's convolve() function.


In [19]:
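# This technique is heavily derived from glowingpython.blogspot.com/2012/02/convolution-with-numpy.html

def smooth(x, beta):
    """ Smoothing with Kaiser window function """
    x = np.asarray(x, dtype=float)   # Work on a plain array (monthly_data is a pandas Series)
    # Set window length to five years (5 * 12 months)
    window_len = 60
    # Extend the data at beginning and at the end to apply the window at the borders
    s = np.r_[x[window_len-1:0:-1],x,x[-1:-window_len:-1]]
    w = np.kaiser(window_len, beta)
    y = np.convolve(w/w.sum(),s,mode='valid')
    # 'valid' mode returns len(x) + window_len - 1 points; trim the surplus so the
    # result has exactly one value per input month and lines up with `months`
    start = window_len // 2
    return y[start:start + len(x)]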
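The beta parameter controls how strongly the Kaiser window tapers toward its edges: beta = 0 gives a flat window (so the smoothing reduces to a plain 60-month moving average), and larger values put more weight near the centre. A short sketch of the weights themselves:

```python
import numpy as np

# Kaiser window weights for the 60-sample (five-year) window.
w2 = np.kaiser(60, 2)   # beta = 2, the value passed in the next cell
w0 = np.kaiser(60, 0)   # beta = 0
w8 = np.kaiser(60, 8)   # a larger beta, for comparison

# Normalising by the sum (as the smoothing function does) makes the weights
# add up to 1, so a constant series passes through unchanged.
w2n = w2 / w2.sum()

print(w2n.sum())                    # 1.0 up to rounding
print(np.allclose(w2, w2[::-1]))    # True: the window is symmetric
print(np.allclose(w0, 1.0))         # True: beta = 0 is a rectangular window
print(w8[0] < w2[0])                # True: larger beta shrinks the edge weights
```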

We can pass that function an array of data and specify a beta factor, and it will return a smoothed array.


In [20]:
smoothed = smooth(monthly_data, 2)
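As a sanity check, here is a self-contained sketch of the same reflect, convolve and trim approach run on synthetic data (a made-up linear trend plus noise, not the temperature record). The function name kaiser_smooth and its window_len parameter are ours; the final slice is chosen so the output has exactly one value per input sample, which is what plotting it against the monthly dates requires:

```python
import numpy as np


def kaiser_smooth(x, beta, window_len=60):
    """Kaiser-window smoothing returning one value per input sample (sketch)."""
    x = np.asarray(x, dtype=float)
    # Mirror window_len - 1 samples onto each end so the window has data near the borders.
    s = np.r_[x[window_len - 1:0:-1], x, x[-1:-window_len:-1]]
    w = np.kaiser(window_len, beta)
    # 'valid' convolution yields len(x) + window_len - 1 points...
    y = np.convolve(w / w.sum(), s, mode="valid")
    # ...so drop window_len - 1 of them, split between the two ends.
    start = window_len // 2
    return y[start:start + len(x)]


rng = np.random.default_rng(0)
t = np.arange(240)                              # 20 years of synthetic monthly samples
signal = 0.005 * t                              # a slow linear trend
noisy = signal + rng.normal(0, 0.1, t.size)

smoothed_demo = kaiser_smooth(noisy, beta=2)
print(len(noisy), len(smoothed_demo))           # 240 240
```

With an even window length the window centre falls halfway between two samples, so on a steady trend the output is offset by half a month, far below what the chart can show. Near the ends, the mirrored padding pulls the smoothed line back toward earlier values, so the first and last couple of years of a trending series should be read with care.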

Let's put it all together now.

First we declare a new figure, turn on hold(), and draw our first four renderers: monthly scatter data, annual scatter points (calculated), a line to connect those points, and our Kaiser smoothed line.


In [21]:
figure(plot_width=1000)
hold()

scatter(
    years_expanded,
    monthly_data,
    color='#BBBBBB',
    line_alpha=0.2,
    fill_alpha=0.5,
    radius=MS_IN_YEAR/12*4,     # Workaround; radius of points should be a third of a year (four months)
    legend='Monthly Mean',
    x_axis_type = "datetime",
    background_fill='#F7F7EF',
    tools="pan,wheel_zoom,box_zoom,reset,previewsave"
)

line(
    years,
    annual,
    color='#464678',
    alpha=1,
    legend='Annual Mean'
)

scatter(
    years,
    annual,
    color='#464678',
    alpha=1,
    radius=MS_IN_YEAR/12*4,
)

line(
    months,
    smoothed,
    color='#F13239',
    line_width=2,
    legend='Kaiser Smoothed',
    x_axis_type = "datetime",
    tools="pan,wheel_zoom,box_zoom,reset,previewsave"
)


Out[21]:
<bokeh.objects.Plot at 0x1037e2dd0>

Let's also add the chart's title and subtitle text for completeness.


In [22]:
text([0],                                # X baseline: 0 milliseconds from start of UNIX time = 1970 C.E.
     [-0.5],                             # Y baseline: -0.5 °C
     ["Global Surface Temperature"],
     text_font='helvetica',
     text_font_style='bold',
     text_color='#3A3A3A',
     angle=0)

text([0, 0], [-0.6, -0.67],              # x and y positions, one per line of text
     ["Change in global surface temperature relative",
      "to 1951-1980 average temperatures."],
     text_font_size='10pt',
     text_font='helvetica neue',
     text_color='#3A3A3A',
     angle=0)


Out[22]:
<bokeh.objects.Plot at 0x1037e2dd0>
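Because the x-axis is a datetime axis, the text x positions above are given in milliseconds since the Unix epoch (0 = 1 January 1970, as the comment in the cell notes). To anchor the title at a different date, that date first has to be converted to the same unit. A small helper sketch (the name to_epoch_ms is ours):

```python
import pandas as pd


def to_epoch_ms(date_string):
    """Milliseconds since 1970-01-01, the unit of the datetime x-axis."""
    # Timestamp.value is the number of nanoseconds since the Unix epoch.
    return pd.Timestamp(date_string).value // 1_000_000


print(to_epoch_ms("1970-01-01"))   # 0, the x position used for the title above
print(to_epoch_ms("1890-01-01"))   # -2524521600000: dates before 1970 are negative
```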

With our renderers declared, let's apply some more advanced plot styling.


In [23]:
legend().orientation = 'top_left'       # Put the legend in an empty corner

# Set axis line, label, and tick color
gray = "#DDDDDD"
axis().axis_line_color = gray
axis().axis_label_color = gray
axis().axis_major_tick_label_color = gray

yaxis().location = 'right'              # Move y-axis to the right side of the plot
yaxis().bounds = (-1, 1)                # Set y-axis bounds to -1..1 °C

# As Bokeh is still under development, we do not yet have an intelligent way to
#   dynamically add units to axis labels. Instead we will give the y-axis a descriptive label.
yaxis().axis_label = "Temperature Delta (°C)"

ygrid().grid_line_color = gray
ygrid().grid_line_dash = "2 2"          # Set line dash pattern (2px on, 2px off)
xgrid().grid_line_color = None          # Disable x-grid lines

Our renderers have been declared and the plot styles have been applied; now let's show() the plot!


In [24]:
show()



We hope this Notebook has showcased the flexibility and capability of Bokeh.

If you would like to keep up with the development of Bokeh, star us on GitHub and follow us on Twitter for the latest news. We always welcome comments and suggestions! Happy data surfing and visualization.