Bokeh is an impressive visualization library with a wide audience that ranges from beginners to sophisticated developers.
This is the first part of a series of blog entries about Bokeh. In this part, I look at how easy it is to use Bokeh's High Level charts.
In [22]:
from bokeh.io import output_notebook, show
output_notebook()
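First, let's load some data about diamonds so we can try out some easy one-liner high level charts. This data set, covering about 50,000 diamonds, comes from the vincentarelbundock GitHub account. To keep the charts light, I work with a random sample of 1,000 rows.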
In [23]:
import pandas as pd
diamonds = pd.read_csv('./data/diamonds.csv')
diamonds = diamonds.sample(n=1000)
diamonds.head()
Out[23]:
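Because the sample above is drawn at random, the charts will look slightly different on every run. If you want the same 1,000 rows each time, pandas lets you pass a seed through the random_state parameter of sample. Here is a minimal sketch of that idea; the small DataFrame and the seed value 42 are stand-ins I made up for illustration, not the real diamonds data.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds table; only the row count matters here.
df = pd.DataFrame({
    "carat": [0.1 * i for i in range(1, 51)],
    "price": [100 * i for i in range(1, 51)],
})

# Sampling twice with the same seed returns exactly the same rows.
sample_a = df.sample(n=10, random_state=42)
sample_b = df.sample(n=10, random_state=42)

print(sample_a.equals(sample_b))  # True
```

With the real data, the equivalent call would be diamonds.sample(n=1000, random_state=42).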
In our first example, we see a scatter plot of diamond price against carat weight. The cut of each diamond is represented by color.
I am not passing axis titles or colors as parameters because they are automatically selected by the Bokeh library.
In [24]:
from bokeh.charts import Scatter, Histogram, Bar
p = Scatter(diamonds, color='cut', x='carat', y='price', title='Price of diamonds by carats')
show(p)
Out[24]:
Now... you don't have to get stuck with the default palette. Bokeh comes with a pre-built list of palettes.
In the example below we have the same chart but in a palette of greens.
In [25]:
from bokeh.palettes import YlGn6
p = Scatter(diamonds, color='cut', x='carat', y='price', title='Price of diamonds by carats', palette=YlGn6)
show(p)
Out[25]:
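A palette is essentially an ordered sequence of hex color strings, and coloring by 'cut' means that every distinct cut value ends up with one of those colors. The sketch below shows that idea in plain Python. The hex values are illustrative shades of green that I picked by hand, not the actual contents of YlGn6, and the cycling behaviour when there are more categories than colors is my assumption rather than something taken from Bokeh's source.

```python
from itertools import cycle

# Illustrative green shades, NOT the actual YlGn6 values.
palette = ["#005a32", "#238b45", "#41ab5d", "#74c476", "#a1d99b", "#c7e9c0"]

# The five cut grades found in the diamonds data.
cuts = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# Pair each category with the next color in the palette.
color_for_cut = dict(zip(cuts, cycle(palette)))

# With fewer colors than categories, cycle() starts over from the first color.
two_color_map = dict(zip(cuts, cycle(palette[:2])))

for cut, color in color_for_cut.items():
    print(cut, color)
```

In this sketch, five categories against a six-color palette means the last color is simply left unused.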
The toolbar is defined by a list of tool names, passed as a comma-separated string through the tools parameter. You can also change the location of the toolbar through the toolbar_location attribute.
To learn more about the toolbar, including the possible choices of tools, see Bokeh's documentation page.
In [26]:
# Sum the carat column per cut so the chart matches its title
p = Bar(diamonds, 'cut', values='carat', title="Sum of carats per diamond cut", color='cut',
        toolbar_location="right", tools='pan,wheel_zoom,undo')
show(p)
Out[26]:
The sum of carats shown in the previous chart is not really interesting, and this is where aggregations come in. The agg parameter is used in High Level Charts to pass the name of an aggregation method. In the chart below I am passing 'mean', but I could have passed any of the built-in methods: 'sum', 'mean', 'count', 'nunique', 'median', 'min', and 'max'.
In [27]:
p = Bar(diamonds, 'clarity', values='price', title="Average price per clarity", color='clarity',
        toolbar_location="right", agg='mean')
show(p)
Out[27]:
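If you want to check the numbers behind a chart like this, the same aggregation names also work in pandas. The sketch below computes all seven of them per clarity on a tiny table. The five rows are made up for illustration, and I am assuming the chart's aggregations behave like their pandas namesakes.

```python
import pandas as pd

# Tiny made-up table, just big enough to verify the numbers by hand.
toy = pd.DataFrame({
    "clarity": ["SI1", "SI1", "VS2", "VS2", "VS2"],
    "price": [1000, 3000, 2000, 4000, 6000],
})

# One row per clarity, one column per aggregation name.
agg_names = ["sum", "mean", "count", "nunique", "median", "min", "max"]
summary = toy.groupby("clarity")["price"].agg(agg_names)

print(summary)
```

Here the 'mean' column holds 2000 for SI1 and 4000 for VS2, which are the bar heights you would expect from agg='mean' on this toy data.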
Another nice feature is grouping, which, in tandem with aggregations, can provide further insight into the displayed data. For instance, in the chart below we are again showing the average price per clarity, but now grouped by cut type.
In [29]:
p = Bar(diamonds, 'clarity', values='price', title="Avg price per cut and clarity", color='cut',
        toolbar_location="right", agg='mean', group='cut', legend="top_right")
show(p)
Out[29]:
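The grouped chart can be thought of as a two-key aggregation: one mean per (clarity, cut) pair. Here is a rough pandas equivalent, again on made-up rows; treating group='cut' as a second groupby key is my reading of the chart, not a statement about how Bokeh computes it internally.

```python
import pandas as pd

# Made-up rows covering two clarity grades and two cuts.
toy = pd.DataFrame({
    "clarity": ["SI1", "SI1", "SI1", "VS2", "VS2", "VS2"],
    "cut": ["Ideal", "Ideal", "Premium", "Ideal", "Premium", "Premium"],
    "price": [1000, 3000, 5000, 2000, 4000, 8000],
})

# Mean price per (clarity, cut) pair, pivoted so each cut becomes a column.
grouped = toy.groupby(["clarity", "cut"])["price"].mean().unstack("cut")

print(grouped)
```

Each row of the result corresponds to one cluster of bars on the x axis, and each column to one colored bar within the cluster.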