One of Bokeh Charts' main contributions is a flexible interface for assigning distinct attribute values (such as colors or markers) based on the unique values in one or more columns of a DataFrame.
Internally, a Bokeh chart uses an AttrSpec to define the mapping, but the user can pass in their own spec or use a function to produce a customized one.
In [ ]:
from bokeh.charts.attributes import AttrSpec, ColorAttr, MarkerAttr
In [ ]:
attr = AttrSpec(items=[1, 2, 3], iterable=['a', 'b', 'c'])
attr.attr_map
Notice that each key in the mapping is a tuple, and this is always the case. The mapping works this way because AttrSpecs are often used with the pandas DataFrame groupby method. When grouping, groupby yields a single value as the group key for one column and a tuple of values for multiple columns, so the AttrSpec always uses tuples to keep the keys consistent.
However, you can still access the values in the following way:
In [ ]:
attr[1]
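To make the tuple-key behavior concrete, here is a minimal sketch in plain Python. It is not Bokeh's implementation: the class name `TupleKeyedMap` and its helper `_as_key` are hypothetical, and the only behavior borrowed from the cells above is that keys are stored as tuples while a bare value is still accepted on lookup.

```python
class TupleKeyedMap:
    """Hypothetical stand-in for an attribute map; not Bokeh's actual code."""

    def __init__(self, items, values):
        # Store every key as a tuple so single-column and multi-column
        # group labels share one shape.
        self.attr_map = {self._as_key(i): v for i, v in zip(items, values)}

    @staticmethod
    def _as_key(item):
        # Wrap a bare value in a one-element tuple; leave tuples untouched.
        return item if isinstance(item, tuple) else (item,)

    def __getitem__(self, item):
        # Normalize the lookup key the same way the stored keys were built.
        return self.attr_map[self._as_key(item)]


attr_sketch = TupleKeyedMap(items=[1, 2, 3], values=['a', 'b', 'c'])
print(attr_sketch.attr_map)  # keys are one-element tuples
print(attr_sketch[1])        # scalar lookup
print(attr_sketch[(1,)])     # tuple lookup returns the same value
```

Normalizing the key in one place keeps the stored mapping uniform, so code that consumes group keys of either shape does not need a special case.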
The ColorAttr is just a custom AttrSpec that uses a default palette as the iterable. The palette can be customized, and ColorAttr will likely provide other color generation functionality.
In [ ]:
color = ColorAttr(items=[1, 2, 3])
color.attr_map
Suppose you don't know how many unique items you are working with, but you have defined the values you want to assign to them. The AttrSpec will automatically cycle the iterable for you, which is important for exploratory analysis.
In [ ]:
color = ColorAttr(items=list(range(0, 10)))
color.attr_map
Because there are only 6 unique colors in the default palette, the palette repeats starting on the 7th item.
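The cycling behavior can be pictured with `itertools.cycle` from the standard library. This is only a sketch under assumptions: the helper `assign_cyclically` is hypothetical, and the six-color palette below is made up for illustration and is not Bokeh's default palette.

```python
from itertools import cycle


def assign_cyclically(items, values):
    # zip stops at the shorter input, and cycle(values) never ends,
    # so every item receives a value even when values is shorter than items.
    return dict(zip(items, cycle(values)))


# A made-up six-color palette, used only for illustration.
sketch_palette = ['red', 'green', 'blue', 'orange', 'purple', 'brown']
cycled = assign_cyclically(range(10), sketch_palette)

# The 7th item (index 6) wraps around and reuses the 1st color.
print(cycled[0], cycled[6])
```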
In [ ]:
from bokeh.sampledata.autompg import autompg as df
In [ ]:
df.head()
In [ ]:
color_attr = ColorAttr(df=df, columns=['cyl', 'origin'])
In [ ]:
color_attr.attr_map
Notice that this is similar to a pandas Series with a MultiIndex, as shown below.
In [ ]:
color_attr.series
You can think of this as a SQL table with three columns, two of which form the index. You can imagine how you might join this data into the original data source to assign these colors to the associated rows.
In [ ]:
from bokeh.charts.data_source import ChartDataSource
In [ ]:
fill_color = ColorAttr(columns=['cyl', 'origin'])
ds = ChartDataSource.from_data(df)
In [ ]:
ds.join_attrs(fill_color=fill_color).head()
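As a rough picture of what such a join does, the sketch below attaches a color to each row by looking up the row's key columns in a tuple-keyed map. Everything here is an assumption for illustration: the rows are invented rather than taken from autompg, the colors are arbitrary, and `join_attr` is a hypothetical helper, not the `join_attrs` method itself.

```python
# Invented rows standing in for a few records; not real autompg data.
rows = [
    {'cyl': 8, 'origin': 1, 'mpg': 18.0},
    {'cyl': 4, 'origin': 3, 'mpg': 24.0},
    {'cyl': 8, 'origin': 1, 'mpg': 15.0},
]

# Attribute map keyed by (cyl, origin) tuples, like a two-column index.
color_map = {(4, 3): 'red', (8, 1): 'green'}


def join_attr(rows, columns, attr_map, name):
    # Build each row's key from its key columns, look it up in the map,
    # and attach the result as a new field (similar to a left join).
    joined = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        joined.append(dict(row, **{name: attr_map.get(key)}))
    return joined


joined_rows = join_attr(rows, ['cyl', 'origin'], color_map, 'fill_color')
for row in joined_rows:
    print(row)
```

The original rows are left unmodified; each joined row is a copy with one extra field.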
In [ ]:
# add new column
df['large_displ'] = df['displ'] >= 350
fill_color = ColorAttr(columns=['cyl', 'origin'])
line_color = ColorAttr(columns=['large_displ'])
ds = ChartDataSource.from_data(df)
ds.join_attrs(fill_color=fill_color, line_color=line_color).head(10)
You will see that the output contains the combined chart_index and the columns for both attributes. The values of each are joined in based on the original assignment. For example, line_color only has two colors because the large_displ column only has two values.
If we want to change the colors assigned to True and False, we can modify the ColorAttr.
In [ ]:
line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Green', 'Red'])
ds.join_attrs(fill_color=fill_color, line_color=line_color).head(10)
You may not have wanted to assign the values in the order in which they occurred. In that case, you have five options.
AttrSpec
In [ ]:
# DataFrame.sort is deprecated in pandas; sort_values is the replacement
df_sorted = df.sort_values(by=['large_displ'], ascending=False)
line_color = ColorAttr(df=df_sorted, columns=['large_displ'], palette=['Green', 'Red'], sort=False)
ds.join_attrs(fill_color=fill_color, line_color=line_color).head()
In [ ]:
df.sort_values(by='large_displ').head()
In [ ]:
import pandas as pd
df_cat = df.copy()
# create the categorical and set its category order to [True, False] (the order used when sorting ascending)
df_cat['large_displ'] = pd.Categorical.from_array(df.large_displ).reorder_categories([True, False])
# we don't have to sort here, but doing it so you can see the order that the attr spec will see
df_cat.sort_values(by='large_displ').head()
In [ ]:
line_color = ColorAttr(df=df_cat, columns=['large_displ'], palette=['Green', 'Red'])
ds.join_attrs(fill_color=fill_color, line_color=line_color).head()
In [ ]:
# the items will be sorted descending (uses same sorting options as pandas)
line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Green', 'Red'], sort=True, ascending=False)
ds.join_attrs(fill_color=fill_color, line_color=line_color).head()
In [ ]:
# remove df so the items aren't auto-calculated
# still need column name for when palette is joined into the dataset
line_color = ColorAttr(columns=['large_displ'], items=[True, False], palette=['Green', 'Red'])
ds.join_attrs(fill_color=fill_color, line_color=line_color).head()
In [ ]:
line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Red', 'Green'])
ds.join_attrs(fill_color=fill_color, line_color=line_color).head()
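The options above all come down to controlling the order in which items are paired with the palette. The plain-Python sketch below illustrates that idea only. The function `assign` is hypothetical, and its `sort` and `ascending` parameters merely mirror the names used in the cells above, without claiming to reproduce how ColorAttr handles them.

```python
def assign(items, palette, sort=True, ascending=True):
    # Optionally sort the items, then pair them with the palette in order.
    ordered = sorted(items, reverse=not ascending) if sort else list(items)
    return dict(zip(ordered, palette))


values = [False, True]
order_palette = ['Green', 'Red']

# False sorts before True, so False is paired with the first color.
ascending_map = assign(values, order_palette)

# Three ways to pair True with the first color instead:
descending_map = assign(values, order_palette, ascending=False)   # sort descending
explicit_map = assign([True, False], order_palette, sort=False)   # explicit item order
reversed_map = assign(values, order_palette[::-1])                # reversed palette

print(ascending_map)
print(descending_map, explicit_map, reversed_map)
```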