Author: David Leonard (DrkSephy1025@gmail.com)
This notebook may be found in its entirety at: https://github.com/DrkSephy/bokeh-tutorial
In this tutorial, we will explore the basics of Bokeh - ranging from simple plots to more complex figures. We'll be using Pandas to load and operate on a dataset containing Poker Hands and their distributions.
In [1]:
# We use pandas to parse our CSV file
import pandas
# Display Bokeh graphs inline
from bokeh.io import output_notebook
In [2]:
# render inline
output_notebook()
The original dataset consists of a CSV (Comma-Separated Values) file with attributes describing each of the five cards in a hand (suit and rank), as well as the hand strength. Unfortunately, the dataset does not have column names, so we begin by adding these in programmatically.
In [3]:
# Column names from our CSV
colNames = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5', 'CLASS']
# Read CSV file using pandas
data = pandas.read_csv('data.csv', names=colNames)
# Extract all data from the CLASS column
hands = data.CLASS.tolist()
# Remove the first element (not a hand; a header line, if present, is read as data)
hands.pop(0)
Out[3]:
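It is worth checking how `names=` interacts with a header line. The sketch below uses a small in-memory CSV (made-up rows, not the real dataset) to show that when a file does start with a header line, pandas treats that line as data if `names=` is passed, and the whole column is then read as strings. That would be consistent with removing the first element above and with the string comparisons such as `'1'` used later. Whether `data.csv` actually contains such a line is an assumption.

```python
import io
import pandas

col_names = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5', 'CLASS']

# Made-up sample: two hands, no header line
no_header = "1,10,1,11,1,13,1,12,1,1,9\n2,11,2,13,2,10,2,12,2,1,9\n"
# The same rows preceded by a header line
with_header = ",".join(col_names) + "\n" + no_header

# No header line in the file: values are parsed as integers
df_plain = pandas.read_csv(io.StringIO(no_header), names=col_names)

# Header line in the file: it becomes the first data row, so values stay strings
df_header = pandas.read_csv(io.StringIO(with_header), names=col_names)

# header=0 tells pandas to discard the existing header line and use names instead
df_fixed = pandas.read_csv(io.StringIO(with_header), names=col_names, header=0)

print(df_plain.CLASS.tolist())
print(df_header.CLASS.tolist())
print(df_fixed.CLASS.tolist())
```

With `header=0`, the counts and filters in later cells would need to compare against integers rather than strings.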
For our first visualization, we would like to show the distribution of winning poker hands across the dataset (consisting of 1,000,000 entries). To extract this data, we parse the last column, called CLASS, which is the strength of the corresponding Poker Hand (annotated below).
In [4]:
# Count occurrences of each class
classZero = hands.count('0') # Nothing in hand
classOne = hands.count('1') # One pair
classTwo = hands.count('2') # Two pair
classThree = hands.count('3') # Three of a kind
classFour = hands.count('4') # Straight
classFive = hands.count('5') # Flush
classSix = hands.count('6') # Full House
classSeven = hands.count('7') # Four of a kind
classEight = hands.count('8') # Straight Flush
classNine = hands.count('9') # Royal Flush
Next, we assemble an array of the occurrences of each poker hand. These counts supply the y-values (bar heights and line points) for our first visualization; the class numbers 0 through 9 serve as the x-values.
In [5]:
# Bundle the dataset - all of the counts of each class
dataset = [classZero, classOne, classTwo, classThree, classFour, classFive, classSix, classSeven, classEight, classNine]
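Each `count` call above scans the full list once, so ten calls mean ten passes. As an alternative sketch, `collections.Counter` from the standard library tallies every class in a single pass. The `hands` list below is a tiny made-up sample with string labels, standing in for the real column.

```python
from collections import Counter

# Made-up sample standing in for the real `hands` list (string labels)
hands = ['0', '1', '0', '2', '1', '0', '9']

# Tally every label in one pass over the list
counts = Counter(hands)

# Build the per-class counts in class order 0..9; a missing class counts as 0
dataset = [counts[str(cls)] for cls in range(10)]
print(dataset)  # [3, 2, 1, 0, 0, 0, 0, 0, 0, 1]
```

A `Counter` returns 0 for keys it has never seen, which is why classes absent from the sample need no special handling.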
In [6]:
# Import functions for creating figures and showing them inline
from bokeh.plotting import figure, show
# Range1d is used to set the range of the y-axis
from bokeh.models.ranges import Range1d
# Used for converting arrays to numpy arrays
import numpy
Similar to Matplotlib, Bokeh supports a generic Figure class which allows us to build a figure from the ground up by specifying pieces such as renderers and glyphs. We'll be constructing a combination of a bar chart and line plot using the Figure class.
In [7]:
# Create a figure object
p = figure(plot_width=1000, plot_height=400)
In [8]:
# Set the bar values - the heights are the counts of occurrences of each winning poker hand
h = numpy.array(dataset)
# Correcting the bottom position of the bars to be on the 0 line
adj_h = h/2
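The reason for halving the heights is that a `rect` glyph is positioned by its center rather than by its bottom edge. The sketch below uses made-up heights to check the arithmetic: centering each bar at half its height places its bottom edge at 0 and its top edge at the full count.

```python
import numpy

# Made-up bar heights for illustration
h = numpy.array([10, 4, 2])

# Center each bar at half its height
center_y = h / 2

# A rectangle of height h centered at center_y spans these edges
bottom = center_y - h / 2
top = center_y + h / 2

print(bottom)
print(top)
```

Note that this assumes true division. Under Python 2 without `from __future__ import division`, `h/2` on an integer array floors the result, so odd counts would be off by half a unit.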
In [9]:
# add bar renderer
p.rect(x=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], y=adj_h, width=0.4, height=h, color="#CAB2D6")
Out[9]:
In [10]:
# Add a line renderer
p.line([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dataset, line_width=2)
Out[10]:
In [11]:
# Add circles to our points on the line
p.circle([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dataset, fill_color="white", size=8)
Out[11]:
In [12]:
# Setting the y axis range
p.y_range = Range1d(0, max(dataset))
In [13]:
# Set the title of the graph
p.title = "Distribution of Poker Hands"
In [14]:
# Set the x-axis label
p.xaxis.axis_label = 'Winning Poker Hand Class'
In [15]:
# Set the y-axis label
p.yaxis.axis_label = 'Frequency'
In [16]:
# Show our new graph - a combination of both line and bar charts
show(p)
Out[16]:
Neat! We've built our first visualization showing the frequencies of winning Poker hands. As we can see, class zero (nothing in hand) occurs more than 1/3 of the time across the 1,000,000 poker hands, and class one (one pair) also occurs very frequently.
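Statements about how often a class occurs are easier to verify as proportions than by reading bar heights. The helper below is a minimal sketch; `proportions` is a hypothetical name and the counts are made up, not taken from the dataset.

```python
def proportions(counts):
    """Return each count as a fraction of the total."""
    total = sum(counts)
    return [c / float(total) for c in counts]

# Made-up counts for three classes
sample_counts = [50, 40, 10]
shares = proportions(sample_counts)
print(shares)
```

Applied to the `dataset` list from earlier, the first entry would give the share of hands with nothing in hand.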
Next, we'll explore one of Bokeh's best features - high-level chart APIs. Our goal is to create a Heatmap of the distribution of cards (suit and rank) in all winning hands containing one pair (class one).
In [17]:
# Get all winning one-pair hands
onePair = data.loc[data['CLASS'] == '1']
# Get all the cards in these winning hands
card1 = onePair.C1.tolist()
card2 = onePair.C2.tolist()
card3 = onePair.C3.tolist()
card4 = onePair.C4.tolist()
card5 = onePair.C5.tolist()
# Get all the suits in these winning hands
suit1 = onePair.S1.tolist()
suit2 = onePair.S2.tolist()
suit3 = onePair.S3.tolist()
suit4 = onePair.S4.tolist()
suit5 = onePair.S5.tolist()
In order to preserve the ordered pairs of our cards, we append rank and suit values to their respective arrays in the same order, so that position i in one array corresponds to position i in the other. For clarity, we also replace the numeric ranks 11, 12, 13 and 1 with J, Q, K and A, and the numeric suit codes with suit names.
In [18]:
# Bundle all cards, preserving order
x = []
for num in range(0, len(onePair)):
    x.append(card1[num])
    x.append(card2[num])
    x.append(card3[num])
    x.append(card4[num])
    x.append(card5[num])
# Replace face cards (11, 12, 13) and aces (1) with letters
for num in range(0, len(x)):
    if x[num] == '11':
        x[num] = 'J'
    if x[num] == '12':
        x[num] = 'Q'
    if x[num] == '13':
        x[num] = 'K'
    if x[num] == '1':
        x[num] = 'A'
# Bundle all suits, in the same order as the cards
y = []
for num in range(0, len(onePair)):
    y.append(suit1[num])
    y.append(suit2[num])
    y.append(suit3[num])
    y.append(suit4[num])
    y.append(suit5[num])
# Replace all values with suit names
for num in range(0, len(y)):
    if y[num] == '1':
        y[num] = 'Hearts'
    if y[num] == '2':
        y[num] = 'Spades'
    if y[num] == '3':
        y[num] = 'Diamonds'
    if y[num] == '4':
        y[num] = 'Clubs'
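The same bundling and renaming can be written more compactly. The sketch below is one possible alternative, not the notebook's method: it flattens the rank and suit columns row by row and uses dictionaries for the renaming. The two hands are made up, and the names `rank_names`, `suit_names`, `card_cols` and `suit_cols` are introduced here for illustration.

```python
import pandas

# Made-up one-pair hands with string values, mirroring the columns used above
onePair = pandas.DataFrame({
    'S1': ['1', '4'], 'C1': ['13', '1'],
    'S2': ['2', '2'], 'C2': ['13', '5'],
    'S3': ['3', '3'], 'C3': ['2', '1'],
    'S4': ['1', '1'], 'C4': ['7', '9'],
    'S5': ['4', '2'], 'C5': ['11', '12'],
})

rank_names = {'1': 'A', '11': 'J', '12': 'Q', '13': 'K'}
suit_names = {'1': 'Hearts', '2': 'Spades', '3': 'Diamonds', '4': 'Clubs'}

card_cols = ['C1', 'C2', 'C3', 'C4', 'C5']
suit_cols = ['S1', 'S2', 'S3', 'S4', 'S5']

# Row-major flattening keeps each hand's five cards together and in order
ranks = onePair[card_cols].to_numpy().ravel().tolist()
suits = onePair[suit_cols].to_numpy().ravel().tolist()

# Look up a display name; ranks 2 through 10 fall back to their original value
x = [rank_names.get(r, r) for r in ranks]
y = [suit_names[s] for s in suits]

print(list(zip(x, y)))
```

Because both lists are flattened in the same order, `x[i]` and `y[i]` still describe the same card.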
Using Bokeh's high-level chart API, we can create a Heatmap by creating a Pandas dataframe containing our x-axis (card rank) and y-axis (card suit).
In [19]:
# One of the best features in Bokeh - high-level chart APIs
from bokeh.charts import HeatMap
# Create a dataframe consisting of a dictionary of the x and y axis
df = pandas.DataFrame(
    dict(
        cards=x,
        suits=y
    )
)
# Create a heatmap using the high-level heatmap function
p = HeatMap(df, title='Distribution', width=1000, tools='hover')
In [20]:
show(p)
Out[20]:
Voilà! We've successfully built a heatmap showing the distribution of cards in all of the winning one-pair Poker hands. From this visualization, we can see that the King of Diamonds is the card that occurs most often in these hands. Bokeh has various other high-level charts, which are described in the Bokeh documentation.
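A reading taken from a heatmap can be cross-checked numerically. The sketch below tabulates how often each (suit, rank) combination occurs using `pandas.crosstab` and then looks up the most frequent one. It works on made-up pairs rather than the real `x` and `y` lists, and it is independent of Bokeh: it does not claim to reproduce what `HeatMap` does internally.

```python
import pandas

# Made-up (rank, suit) pairs standing in for the x and y lists
x = ['K', 'K', 'A', '7', 'K']
y = ['Diamonds', 'Spades', 'Diamonds', 'Hearts', 'Diamonds']

df = pandas.DataFrame(dict(cards=x, suits=y))

# Rows are suits, columns are ranks, cells are occurrence counts
counts = pandas.crosstab(df['suits'], df['cards'])
print(counts)

# Most frequent (suit, rank) combination
top_suit, top_rank = df.groupby(['suits', 'cards']).size().idxmax()
print(top_rank, 'of', top_suit)
```

Run against the real data frame, the final lookup would name the card that appears most often in one-pair hands.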