Your job as a data scientist is to communicate. Often that means one of three things: communicating insight distilled from lots of data (models, stats, EDA, narrative/editorialization); making data accessible and self-service (runnable queries, dashboards that update automatically); or communicating the performance and diagnostics of complex models and processes.
In [1]:
from bokeh.plotting import figure, show, output_notebook
output_notebook()
Does it work? http://bokeh.pydata.org/en/latest/docs/user_guide/quickstart.html#userguide-quickstart
In [ ]:
# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]
# output to static HTML file
# output_file("lines.html", title="line plot example")
# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')
# add a line renderer with legend and line thickness
p.line(x, y, legend="Temp.", line_width=15)
# show the results
show(p)
“the fundamental principles or rules of an art or science” (OED Online 1989). A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics (Cox 1978). A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics.
-- Wickham (A Layered Grammar of Graphics)
Visual index (Courtesy of yHat)
geom) can change each in relative isolation
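To make the idea of changing each layer in isolation concrete, here is a minimal sketch that splits a histogram into a stat step (binning) and a geom step (rectangle coordinates). The function names `histogram_stat` and `quad_geom` and the sample values are hypothetical, chosen only for illustration; the geom output simply mirrors the `top`/`bottom`/`left`/`right` arguments that a quad-style renderer expects.

```python
import numpy as np


def histogram_stat(values, bins=10):
    """Stat layer: summarize raw values as a density per bin."""
    density, edges = np.histogram(values, bins=bins, density=True)
    return density, edges


def quad_geom(density, edges):
    """Geom layer: turn the binned summary into rectangle coordinates."""
    return {
        "top": density,
        "bottom": np.zeros_like(density),
        "left": edges[:-1],
        "right": edges[1:],
    }


# made-up horsepower-like values, for illustration only
values = np.array([46, 52, 65, 70, 88, 90, 95, 110, 150, 165, 198, 230])

density, edges = histogram_stat(values, bins=5)
quads = quad_geom(density, edges)

# with density=True the rectangles should cover a total area of 1
widths = quads["right"] - quads["left"]
area = float(np.sum(quads["top"] * widths))
```

Swapping the stat (say, raw counts instead of densities) leaves `quad_geom` untouched, and swapping the geom leaves the binning untouched.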
In [5]:
from bokeh.charts import Histogram
from bokeh.sampledata.autompg import autompg as df
# DataFrame.sort was removed from pandas; sort_values is the replacement
df.sort_values('cyl', inplace=True)
hist = Histogram(df, values='hp', title="HP Distribution", legend='top_right')
show(hist)
Out[5]:
In [6]:
import numpy as np
from bokeh.models import HoverTool, BoxSelectTool
TOOLS = [BoxSelectTool(), HoverTool()]
# create our canvas
p1 = figure(title="HP Distribution", background_fill_color="#E8DDCB", tools=TOOLS)
# stat
hist, edges = np.histogram(df.hp, density=True, bins=50)
# geom
p1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
show(p1)
Out[6]:
In [7]:
# DataFrame.sort was removed from pandas; sort_values is the replacement
df.sort_values('cyl', inplace=True)
hist = Histogram(df, values='hp', color='cyl',
                 title="HP Distribution by Cylinder Count", legend='top_right')
show(hist)
Out[7]:
In [3]:
from bokeh.models import GeoJSONDataSource
from bokeh.plotting import figure
from bokeh.sampledata.sample_geojson import geojson
geo_source = GeoJSONDataSource(geojson=geojson)
p = figure()
p.circle(x='x', y='y', alpha=0.9, source=geo_source)
show(p)
Out[3]:
In [5]:
import pandas as pd
# parsing dates costs extra time/compute, but we know we need them
df = pd.read_csv('data/sf_listings.csv', parse_dates=['last_review'], infer_datetime_format=True)
df_reviews = pd.read_csv('data/reviews.csv', parse_dates=['date'], infer_datetime_format=True)
In [6]:
# index DataFrame on listing_id in order to join datasets
reindexed_df = df_reviews.set_index('listing_id')
reindexed_df.head()
Out[6]:
In [7]:
# remember the original id in a column to group on
df['listing_id'] = df['id']
df_listing = df.set_index('id')
df_listing.head()
Out[7]:
In [8]:
# join the listing information with the review information
review_timeseries = df_listing.join(reindexed_df)
print(review_timeseries.columns)
review_timeseries.head()
Out[8]:
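A toy version of the join above may help show what shape to expect. The sketch below uses made-up listings and reviews (none of these values come from the real dataset) and the same `set_index` plus `join` pattern. Since `DataFrame.join` defaults to a left join, each listing is repeated once per review, and a listing with no reviews survives with a missing date.

```python
import pandas as pd

# made-up listings, for illustration only
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood": ["Mission", "Noe Valley", "South of Market"],
})

# made-up reviews; listing 3 has none
reviews = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": pd.to_datetime(["2015-01-03", "2015-02-10", "2015-01-20"]),
})

# index both frames on the listing id, then join on the index (left join)
joined = listings.set_index("id").join(reviews.set_index("listing_id"))

rows_for_listing_1 = len(joined.loc[[1]])      # one row per review
missing_dates = int(joined["date"].isna().sum())  # listings without reviews
```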
In [9]:
# let's try a pivot table: cross-tabulate review counts by date and neighbourhood
reviews_over_time = pd.crosstab(review_timeseries.date, review_timeseries.neighbourhood)
reviews_over_time.head()
Out[9]:
In [10]:
# smooth by resampling by month
reviews_over_time.resample('M').mean()[['Mission', 'South of Market', 'Noe Valley']].plot(figsize=(12,6))
Out[10]:
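One detail of the monthly smoothing is worth spelling out. `pd.crosstab` only produces rows for dates that have at least one review, so a monthly mean over those rows averages over days with reviews, not over every calendar day. The sketch below uses made-up reviews and groups by calendar month with `to_period("M")` as a stand-in for the resampling step, showing the monthly mean next to the monthly total.

```python
import pandas as pd

# made-up reviews, for illustration only
reviews = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-01-03", "2015-01-03", "2015-01-20", "2015-02-10"]
    ),
    "neighbourhood": ["Mission", "Mission", "Noe Valley", "Mission"],
})

# one row per date that has reviews, one column per neighbourhood
counts = pd.crosstab(reviews["date"], reviews["neighbourhood"])

# group the daily rows by calendar month
month = counts.index.to_period("M")
monthly_mean = counts.groupby(month).mean()   # mean over days that have rows
monthly_total = counts.groupby(month).sum()   # total reviews in the month

jan = pd.Period("2015-01", freq="M")
jan_mission_mean = monthly_mean.loc[jan, "Mission"]
jan_mission_total = monthly_total.loc[jan, "Mission"]
```

Here January has two rows (the 3rd and the 20th), so the Mission mean is taken over those two days only. If the quantity of interest is review volume, the monthly total may be the easier number to interpret.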
In [20]:
TOOLS = "pan,wheel_zoom,box_zoom,reset,save,hover"
d = reviews_over_time.resample('M').mean()
p = figure(x_axis_type="datetime", tools=TOOLS)
p.line(d.index, d['Mission'])
show(p)
Out[20]:
In [23]:
import bokeh.charts as charts
line = charts.Line(d, y=['Mission', 'South of Market', 'Noe Valley'],
                   color=['Mission', 'South of Market', 'Noe Valley'],
                   title="Interpreter Sample Data", ylabel='Duration', legend=True, tools=TOOLS)
show(line)
Out[23]:
In [27]:
from bokeh.models.widgets import Select
from bokeh.io import output_file, show, vform
select = Select(title="Option:", value="foo", options=list(reviews_over_time))
show(vform(select))
Out[27]: