Whiskey Data

This data set contains ratings, prices, and other attributes for a small number of whiskeys.


In [1]:
import pandas as pd
from numpy import log, abs, sign, sqrt
import brunel

whiskey = pd.read_csv("data/whiskey.csv")

print('Data on whiskies:', ', '.join(whiskey.columns))


Data on whiskies: Name, Rating, Country, Category, Price, ABV, Age, Brand

Summaries

Shown below are the following charts:

  • A treemap with one cell per whiskey, grouped by country and category. Cells are colored by rating, with lower-rated whiskeys in blue and higher-rated ones in red. Whiskeys with a missing rating are shown in black.
  • A bubble chart with filters that let you select whiskeys by price and category.
  • A line chart showing the relationship between age and mean rating, one line per country. A simple treemap of categories is linked to this chart.
  • A bubble chart of countries linked to a heatmap of alcohol level (ABV) by rating, with a linked bar listing the brands.
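
The counts behind the ABV-by-rating heatmap can be approximated in pandas. The sketch below is only a rough analogue: Brunel chooses its own bin boundaries, so the edges here will likely differ, and the rows in `sample` are made-up placeholders rather than values from data/whiskey.csv (only the column names `ABV` and `Rating` come from the notebook).

```python
import pandas as pd

# Hypothetical placeholder rows, NOT taken from data/whiskey.csv.
sample = pd.DataFrame({
    "ABV":    [40.0, 40.0, 43.0, 46.0, 50.0, 60.0],
    "Rating": [70.0, 72.0, 85.0, None, 90.0, 95.0],
})

# Rows with a missing ABV or rating cannot be placed in a cell, so drop them.
complete = sample.dropna(subset=["ABV", "Rating"])

# Cut each column into equal-width bins (8 for ABV, 5 for rating,
# mirroring the bin counts requested in the Brunel command).
abv_bin = pd.cut(complete["ABV"], bins=8)
rating_bin = pd.cut(complete["Rating"], bins=5)

# One cell per (rating bin, ABV bin) pair, holding the number of whiskeys in it.
grid = pd.crosstab(rating_bin, abv_bin)
print(grid)
```

Replacing `sample` with the `whiskey` frame loaded above should give a table comparable to the heatmap, up to the choice of bin edges.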

In [2]:
%%brunel data('whiskey') x(country, category) color(rating) treemap label(name:3) tooltip(#all) 
    style('.label {font-size:7pt}') legends(none)
:: width=900, height=600


Out[2]:

In [3]:
%%brunel data('whiskey') bubble color(rating:red) sort(rating) size(abv) label(name:6) tooltip(#all) filter(price, category) 
    :: height=500


Out[3]:

In [4]:
%%brunel data('whiskey')
        line x(age) y(rating) mean(rating) using(interpolate) label(country) split(country) 
                bin(age:8)  color(#selection) legends(none) |
        treemap x(category) interaction(select) size(#count) color(#selection) legends(none) sort(#count:ascending) bin(category:9)
                tooltip(country) list(country)  label(#count) style('.labels .label {font-size:14px}')
:: width=900


Out[4]:

In [5]:
%%brunel  data('whiskey')
    bubble label(country:3) bin(country) size(#count) color(#selection) sort(#count) interaction(select) tooltip(name) list(name) legends(none) at(0,10,60,100)
    | x(abv) y(rating) color(#count:blue) legends(none) bin(abv:8) bin(rating:5) style('symbol:rect; stroke:none; size:100%')  
            interaction(select) label(#selection) list(#selection)  at(60,15,100,100) tooltip(rating, abv,#count) legends(none) 
    |  bar label(brand:70) list(brand) at(0,0, 100, 10) axes(none) color(#selection) legends(none) interaction(filter)
:: width=900, height=600


Out[5]:

Some Analysis

Here we use scikit-learn's decision tree regressor to predict the price of a whiskey from its age, rating, and ABV. For plotting, the prediction error is transformed with a signed square root and the price with a logarithm; the tooltips still show the original values.
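
As a small illustration of the plotting transform used in the next cell (`sqrt(abs(f)) * sign(f)`), the sketch below applies a signed square root to a few made-up error values. The helper name `signed_sqrt` and the numbers are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def signed_sqrt(values):
    """Take the square root of the magnitude and keep the original sign."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.abs(values)) * np.sign(values)

# Made-up prediction errors (predicted minus actual price).
errors = np.array([-100.0, -4.0, 0.0, 9.0, 400.0])
compressed = signed_sqrt(errors)

# Large errors are pulled toward zero, but over- and under-predictions
# stay on opposite sides of it: the values are -10, -2, 0, 3 and 20.
print(compressed)
```

The transform keeps the ordering of the errors, so the chart still shows which whiskeys are over- or under-predicted while a few very large errors no longer dominate the axis.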


In [6]:
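from sklearn import tree
# Keep only complete rows; copy so that new columns can be added without a pandas warning
D = whiskey[['Name', 'ABV', 'Age', 'Rating', 'Price']].dropna().copy()
X = D[['ABV', 'Age', 'Rating']]
y = D['Price']
clf = tree.DecisionTreeRegressor(min_samples_leaf=4)
clf.fit(X, y)
D['Predicted'] = clf.predict(X)      # predictions for the same rows used for fitting
f = D['Predicted'] - D['Price']      # prediction error
D['Diff'] = sqrt(abs(f)) * sign(f)   # signed square root, compresses large errors
D['LPrice'] = log(y)                 # log of price for the x axis
%brunel data('D') y(diff) x(LPrice) tooltip(name, price, predicted, rating) color(rating)  :: width=700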


Out[6]:

Simple Linked Charts

Click on a bar to see the proportions of whiskey categories for that country.
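
The proportions shown after selecting a country can also be computed directly. This is a minimal pandas sketch, assuming the `Country` and `Category` columns listed earlier; the rows in `sample` and the country and category names in them are hypothetical placeholders, not values from data/whiskey.csv.

```python
import pandas as pd

# Hypothetical placeholder rows, NOT taken from data/whiskey.csv.
sample = pd.DataFrame({
    "Country":  ["Scotland", "Scotland", "Scotland", "USA", "USA"],
    "Category": ["Single Malt", "Single Malt", "Blended", "Bourbon", "Rye"],
})

# normalize="index" divides each row by its total, so every country's
# category shares sum to 1.
shares = pd.crosstab(sample["Country"], sample["Category"], normalize="index")
print(shares.round(2))
```

Each row of `shares` corresponds to what the polar chart displays when that country's bar is selected.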


In [7]:
%%brunel data('whiskey') 
    bar x(country) y(#count) interaction(select) color(#selection) | 
    bar color(category) y(#count) percent(#count) polar stack label(category) legends(none) interaction(filter) tooltip(#count,category)
:: width=900, height=300


Out[7]:
