Bokeh is a Python interactive visualization library that targets modern web browsers for presentation.
We illustrate graphing with Bokeh through two examples: Japan's population by age, and a movie data set explored with interactive widgets.
This IPython notebook was created by Zhiqi Guo, Yiran Zheng, and Jiamin Zhang as a final project for the NYU Stern course Data Bootcamp.
First of all, let's follow our class tradition and import packages. The following code is from the IPython notebook Data Bootcamp: Examples, created by Professor Dave Backus, Chase Coleman, and Spencer Lyon for the NYU Stern course Data Bootcamp.
In [1]:
# import packages
import pandas as pd # data management
import matplotlib.pyplot as plt # graphics
import matplotlib as mpl # graphics parameters
import numpy as np # numerical calculations
# IPython command, puts plots in notebook
%matplotlib inline
# check Python version
import datetime as dt
import sys
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')
The data come from the UN's Population Division. Remember one of the Professor's favorite quotes?
Last year, for the first time, sales of adult diapers in Japan exceeded those for babies.
Now let's take a look at the data again.
In [2]:
url1 = 'http://esa.un.org/unpd/wpp/DVD/Files/'
url2 = '1_Indicators%20(Standard)/EXCEL_FILES/1_Population/'
url3 = 'WPP2015_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.XLS'
url = url1 + url2 + url3
cols = [2, 4, 5] + list(range(6,28))
#est = pd.read_excel(url, sheetname=0, skiprows=16, parse_cols=cols, na_values=['…'])
prj = pd.read_excel(url, sheetname=1, skiprows=16, parse_cols=cols, na_values=['…'])
# for later: change cols for the two sources, rename 80+ to 80-84, then concat
#pop = pd.concat([est, prj], axis=0, join='outer')
pop = prj
pop.dtypes
Out[2]:
In [3]:
# rename some variables
pop = pop.rename(columns={'Reference date (as of 1 July)': 'Year',
'Major area, region, country or area *': 'Country',
'Country code': 'Code'})
# select Japan and years
countries = ['Japan']
years = [2015, 2025, 2035, 2045, 2055, 2065]
pop = pop[pop['Country'].isin(countries) & pop['Year'].isin(years)]
pop = pop.drop(['Country', 'Code'], axis=1)
pop = pop.set_index('Year').T
pop
Out[3]:
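To see what the selection and transpose in the cell above do, here is a minimal sketch of the same steps written by hand in plain Python. The rows and all numbers are made up for illustration and are not taken from the UN file; only the column names follow the notebook.

```python
# Hypothetical rows in the same "wide" layout as the UN sheet (all numbers are made up)
rows = [
    {'Country': 'Japan', 'Code': 392, 'Year': 2015, '0-4': 5.0, '5-9': 5.5},
    {'Country': 'Japan', 'Code': 392, 'Year': 2020, '0-4': 4.8, '5-9': 5.1},
    {'Country': 'Japan', 'Code': 392, 'Year': 2025, '0-4': 4.5, '5-9': 4.9},
    {'Country': 'China', 'Code': 156, 'Year': 2015, '0-4': 80.0, '5-9': 78.0},
]
countries = ['Japan']
years = [2015, 2025]

# keep only the selected countries and years (the isin() step)
selected = [r for r in rows if r['Country'] in countries and r['Year'] in years]

# drop Country and Code, index by Year, then transpose: age groups become the rows
age_groups = [k for k in rows[0] if k not in ('Country', 'Code', 'Year')]
table = {age: {r['Year']: r[age] for r in selected} for age in age_groups}

for age in age_groups:
    print(age, table[age])
```

After this step each age group is a row and each year is a column, which is why pop[2015] later returns the whole age profile for 2015.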
In [4]:
fig, ax = plt.subplots()
pop[2015].plot(ax=ax,kind='line',alpha=0.5, sharey=True, figsize=(6,4))
ax.set_title('2015 Japanese population by age', fontsize=14, loc='left')
Out[4]:
In [5]:
from bokeh import mpl  # note: this rebinds the name mpl, which pointed to matplotlib above
from bokeh.plotting import output_file, show, figure
fig, ax = plt.subplots()
pop[2015].plot(ax=ax, kind='line')
ax.set_title('2015 Japanese population by age', fontsize=14, loc='left')
output_file('JPN.html')  # write the plot to an HTML file
show(mpl.to_bokeh(fig))  # convert the matplotlib figure to a Bokeh plot
Question. What's the difference between the plots generated by the two packages?
Question. What is the function of each button on the control panel at the top of the Bokeh plot?
Comment. The plot shows that leveraging another library may not be the best way to graph in Bokeh, because of compatibility problems. We don't use a bar chart here because bar charts in matplotlib are fully incompatible with Bokeh.
Let's make a simple dataframe to test out the functions first.
In [6]:
from bokeh.charts import Bar, output_file, show
def simple_bar():
    # First set up a simple data frame to use.
    # Bokeh charts work best with table-like data.
    data = {
        'sample': ['A', 'B'],
        'value': [40, 30]
    }
    df = pd.DataFrame(data)
    # set up the chart: label column, value column, bar width and title
    bar = Bar(df,
              'sample',
              values='value',
              bar_width=0.4,  # the bar width can be set manually
              title="Our first test bar chart")
    output_file("Simple_test_bar.html")
    print(df)  # print df to compare the bar chart with the data behind it
    show(bar)
simple_bar()
In [7]:
from bokeh.charts import Bar, output_file, show
from bokeh.plotting import *  # implicitly pulls the output_notebook function into the namespace
# First set up a simple data frame to use.
# Bokeh charts work best with table-like data.
data = {
    'sample': ['A', 'B'],
    'value': [40, 30]
}
df = pd.DataFrame(data)
# set up the chart: label column, value column, bar width and title
bar = Bar(df,
          'sample',
          values='value',
          bar_width=0.4,
          title="Our First Test Bar Chart"
          #,tools='crosshair'
          )
output_notebook()  # instead of output_file(), call output_notebook()
                   # to display the plot directly in the notebook
show(bar)
Out[7]:
Exercise. Uncomment the tools='crosshair' argument of the bar plot to see what happens. Find out what other tools we can use.
In [8]:
# The function barplot draws the population bar plot for the year we choose.
# Available years: 2015, 2025, 2035, 2045, 2055, 2065
from bokeh.charts import Bar, output_file, show
from bokeh.charts.attributes import CatAttr
def barplot(choose_year):
    population = pop[int(choose_year)].tolist()
    age = list(pop.index)  # after the transpose, the index holds the age groups
    data = {
        'age': age,
        'population': population
    }
    df = pd.DataFrame(data)
    bar = Bar(df,
              label=CatAttr(columns=["age"], sort=False),  # Caution: turn off bar sorting manually,
                                                           # or the bars don't follow the order of the data
              values='population',
              ylabel="Population (thousands)",
              title="Japan's population in " + str(choose_year) + " by age", color="red")
    output_file("Japan's population in " + str(choose_year) + ".html")
    show(bar)
    return bar
# Try several years and see how it works
barplot(2015)
#barplot(2025)
#barplot(2035)
Out[8]:
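The caution about turning off sorting in the cell above is easier to appreciate with a small experiment. The sketch below assumes that the default sort is a plain string sort; the age labels are illustrative, and lower_bound is a hypothetical helper that is not part of Bokeh.

```python
# Age-group labels in the order they appear in the data (illustrative subset)
ages = ['0-4', '5-9', '10-14', '15-19', '100+']

# A plain string sort compares character by character, so '10-14' lands before '5-9'
print(sorted(ages))

def lower_bound(label):
    # hypothetical helper: '10-14' -> 10, '100+' -> 100
    return int(label.replace('+', '').split('-')[0])

# Sorting by the leading number recovers the original order
print(sorted(ages, key=lower_bound))
```

Because the data already arrive in age order, keeping the original order with sort=False is simpler than sorting with a custom key.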
In [9]:
from bokeh.models.widgets import Panel, Tabs
from bokeh.io import output_file, show
# barplot() builds (and shows) one bar chart per year; reuse the returned charts as tab contents
p1 = barplot(2015)
tab1 = Panel(child=p1, title="Japan's Population for 2015")  # tab1 for year 2015
p2 = barplot(2025)
tab2 = Panel(child=p2, title="Japan's Population for 2025")  # tab2 for year 2025
p3 = barplot(2035)
tab3 = Panel(child=p3, title="Japan's Population for 2035")  # tab3 for year 2035
# barplot() calls output_file() itself, so set the output file for the tabs afterwards
output_file("tab_panes.html", mode='cdn')
tabs = Tabs(tabs=[tab1, tab2, tab3])  # combine the panels into one tabbed layout
show(tabs)
Out[9]:
This is a challenging example that involves more widgets to play with. Some of the widgets require a CustomJS callback. If you have a good command of JavaScript, go for it. If you don't, just skip it. These interactions can also be built with the Bokeh server. We don't cover it here, but if you are interested, you can find tutorials on Bokeh's website.
In [10]:
import numpy as np
from bokeh.plotting import Figure
from bokeh.models import ColumnDataSource, HoverTool, HBox, VBoxForm
from bokeh.models.widgets import Slider, Select, TextInput
movies = pd.read_csv("movies.csv")
In [11]:
movies
Out[11]:
In [12]:
from bokeh.plotting import figure, output_notebook, show
output_notebook()
p = figure(plot_width=400, plot_height=400)
# add a circle renderer with a size, color, and alpha
p.circle(movies["Meter"], movies["Reviews"], size=5, color="navy", alpha=0.5)
# show the results
show(p)
Out[12]:
Exercise. Try some other methods (glyphs) such as asterisk, diamond, circle_cross, etc.
Comment. See more plotting methods in the Bokeh Reference Guide.
The Bokeh slider (http://bokeh.pydata.org/en/0.10.0/docs/user_guide/interaction.html#slider) is a widget that can be configured with start and end values, a step size, an initial value and a title.
Before we make a basic slider, we should import some packages first.
In [13]:
from bokeh.models.widgets import Slider
from bokeh.io import vform
In [14]:
# Let's try making a slider for the number of reviews
reviews = Slider(title="Minimum number of reviews",
value=80, #initial value when the slider is generated
start=10,
end=300,
step=10)
show(vform(reviews))
Out[14]:
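To make the slider settings concrete, the sketch below lists the values that a slider with the start, end and step used above can take, assuming it moves in whole steps from the start value. slider_positions is a hypothetical helper written for this illustration and is not part of Bokeh.

```python
def slider_positions(start, end, step):
    """Hypothetical helper: list every value a slider with these settings can take."""
    count = (end - start) // step + 1
    return [start + i * step for i in range(count)]

positions = slider_positions(start=10, end=300, step=10)
print(len(positions), positions[:3], positions[-1])

# the initial value should be one of the reachable positions
print(80 in positions)
```

Choosing an initial value that sits on one of these positions keeps the handle aligned with the steps.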
In [15]:
movies['Year'].describe()
Out[15]:
In [16]:
# Try with the year of release
year = Slider(title="Year of release", value=1950, start=1902, end=2014, step=1)
show(vform(year))
Out[16]:
If we want the slider to change the data of a plot, we have to attach a CustomJS callback to the widget.
This is where JavaScript callbacks come in. Callbacks let us write pieces of JavaScript and include them in our Python project to trigger more sophisticated interactions, so that readers can play with the graphic a little more.
In [17]:
# Import CustomJS (for widget callbacks) and Range1d (for fixed axis ranges)
from bokeh.models import CustomJS, Range1d
In [18]:
# create a column data source for the plot, plus an untouched copy of the full data
x = movies["Meter"]
y = movies["Reviews"]
source = ColumnDataSource(data=dict(x=x, y=y))    # what the plot displays
all_data = ColumnDataSource(data=dict(x=x, y=y))  # full data, read by the callback
# create a new figure with fixed axis ranges
p = figure(plot_width=400, plot_height=400,
           x_range=Range1d(-5, 105), y_range=Range1d(0, 310))
# refer to the columns by name so that the glyph follows changes to source
p.circle('x', 'y', size=5, color="navy", alpha=0.5, source=source)
In [19]:
# callback: keep only the points whose review count exceeds the slider value
callback = CustomJS(args=dict(source=source, all_data=all_data), code="""
    var data = source.get('data');
    var full = all_data.get('data');
    var f = cb_obj.get('value');
    var x = full['x'];
    var y = full['y'];
    data['x'] = [];
    data['y'] = [];
    for (var i = 0; i < y.length; i++) {
        if (y[i] > f) {
            data['x'].push(x[i]);
            data['y'].push(y[i]);
        }
    }
    source.trigger('change');
""")
In [20]:
reviews = Slider(
title="Minimum number of reviews",
value=50, start=10, end=305, step=5,
callback=callback)
In [21]:
layout = vform(reviews, p)
show(layout)
Out[21]:
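For readers who skipped the JavaScript, the filtering done by the slider callback can be sketched in plain Python. This is only an illustration of the logic, not code that Bokeh runs; filter_points is a hypothetical name and the sample values are made up.

```python
def filter_points(x, y, threshold):
    """Keep the (x, y) pairs whose y value is strictly above the threshold."""
    kept_x, kept_y = [], []
    for xi, yi in zip(x, y):
        if yi > threshold:
            kept_x.append(xi)
            kept_y.append(yi)
    return {'x': kept_x, 'y': kept_y}

# made-up Meter (x) and Reviews (y) values
meter = [95, 40, 72, 15]
reviews_count = [120, 45, 50, 300]

print(filter_points(meter, reviews_count, 50))
```

Note that the comparison is strict, so a movie with exactly 50 reviews disappears when the slider is set to 50. The callback always filters from the full copy of the data, which is why moving the slider back down brings points back.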
The Bokeh hover tool is a passive inspector that displays informational tooltips whenever the cursor is directly over a glyph. The data shown comes from the glyph's data source, and what is displayed is configured through a tooltips attribute that maps display names to columns in the data source or to special known variables.
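The mapping from display names to columns can be imitated in a few lines of plain Python. The sketch below is only an approximation of the idea under simple assumptions: it handles "@column" fields but not the special variables, render_tooltip is a hypothetical helper rather than a Bokeh function, and the data are made up.

```python
import re

def render_tooltip(tooltips, data, index):
    """Hypothetical helper: fill "@column" fields with the values of one row."""
    lines = []
    for label, template in tooltips:
        # replace each @name with the value of column `name` at the hovered row
        text = re.sub(r'@(\w+)', lambda m: str(data[m.group(1)][index]), template)
        lines.append('{}: {}'.format(label, text))
    return lines

# made-up data source with two rows
data = {'title': ['Movie A', 'Movie B'], 'year': [1999, 2010]}
tooltips = [('Title', '@title'), ('Year', '@year')]

print(render_tooltip(tooltips, data, 1))
```

The point to take away is that every "@name" used in a tooltip must exist as a column in the data source attached to the glyph.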
In [22]:
hover = HoverTool(tooltips=[
    ("$", "@revenue"),
    ("Title", "@title"),
    ("Year", "@year")  # fields starting with "@" are interpreted as columns of the data source
])
In [23]:
del p  # discard the previous figure before building a new one
In [24]:
# add revenue, title and year to the column data source that will be used by the plot
source = ColumnDataSource(data=dict(x=x, y=y, revenue=movies["revenue"], title=movies["Title"], year=movies["Year"]))
p = figure(plot_width=400, plot_height=400, tools=[hover],
           x_range=Range1d(-5, 105), y_range=Range1d(0, 310))
# refer to the columns by name so that the hover tool and the glyph share the same source
p.circle('x', 'y', size=5, color="navy", alpha=0.5, source=source)
In [25]:
show(p)
Out[25]:
In [26]:
del p  # discard the figure now that it has been shown