In [1]:
# Import libraries
import numpy as np
import pandas as pd
# Import the data
import WTBLoad
wtb = WTBLoad.load()
Question: How similar are two additions? For instance, I'm thinking of brewing a beer with plums and vanilla, and I want to know how alike those two are.
How to get there: For each style-addition combo, the dataset gives the percentage of votes saying it would likely taste good. So we can compare the two additions' vote shares style by style and measure how close they are.
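Written as a formula, the score computed in the next cell is the mean squared difference between the two additions' vote shares across styles. Here $p_{s,A}$ denotes the fraction of votes saying addition $A$ would taste good in style $s$, and $n$ is the number of styles; this notation is ours, not the dataset's.

```latex
d(A, B) = \frac{1}{n} \sum_{s=1}^{n} \left( p_{s,A} - p_{s,B} \right)^2
```

A score of 0 means the two additions got identical votes in every style; larger scores mean voters rated them more differently. If some styles lack votes, pandas' `mean()` skips the missing values, so $n$ is effectively the number of styles where both values are present.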
In [2]:
# Score how differently two additions were rated across all beer styles.
# For each style (row), take the difference in vote share, square it,
# then average over styles: this is the mean squared difference.
# A LOWER score means the two additions are MORE similar.
def similarity(additionA, additionB):
    diff = np.square(wtb[additionA] - wtb[additionB])
    return diff.mean()

res = []
# Loop through each unordered pair of additions
for additionA in wtb.columns:
    for additionB in wtb.columns:
        # Skip identical pairs, and avoid counting (A, B) and (B, A) twice
        # by keeping only pairs where A comes first alphabetically.
        if additionA < additionB:
            res.append([additionA, additionB, similarity(additionA, additionB)])

df = pd.DataFrame(res, columns=["additionA", "additionB", "similarity"])
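To see what these scores mean, the same pairwise computation can be run on a small, made-up table. This is a sketch assuming `wtb` is a DataFrame with one row per beer style and one column per addition, holding the share of "tastes good" votes; the toy values and the `pairwise_scores` helper are hypothetical. It swaps the nested loop and alphabetical check for `itertools.combinations` over the sorted column names, which yields each unordered pair exactly once.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical toy data: rows are beer styles, columns are additions,
# values are the fraction of votes saying the combo would taste good.
toy = pd.DataFrame(
    {
        "plum": [0.8, 0.6, 0.2],
        "vanilla": [0.4, 0.9, 0.7],
        "cherry": [0.7, 0.5, 0.3],
    },
    index=["stout", "porter", "ipa"],
)


def pairwise_scores(votes: pd.DataFrame) -> pd.DataFrame:
    """Mean squared difference for every unordered pair of columns.

    Lower scores mean the two additions were rated alike across styles.
    """
    rows = []
    # Sorting the names keeps additionA < additionB, as in the notebook.
    for a, b in combinations(sorted(votes.columns), 2):
        score = np.square(votes[a] - votes[b]).mean()
        rows.append([a, b, score])
    return pd.DataFrame(rows, columns=["additionA", "additionB", "similarity"])


scores = pairwise_scores(toy)
print(scores.sort_values("similarity"))
```

In this toy table, cherry and plum get the lowest score (about 0.01) because their votes differ by only 0.1 in every style, while plum and vanilla differ the most.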
In [3]:
df.sort_values("similarity").head(10)
Out[3]:
In [4]:
df.sort_values("similarity", ascending=False).head(10)
Out[4]:
In [5]:
def comboSimilarity(additionA, additionB):
    # The table stores each pair with additionA before additionB alphabetically,
    # so swap the arguments if they arrive in the other order.
    if additionA > additionB:
        additionA, additionB = additionB, additionA
    # Combine both conditions into one boolean mask on df
    return df[(df['additionA'] == additionA) & (df['additionB'] == additionB)]

comboSimilarity('plum', 'vanilla')
Out[5]:
But is that score good or bad? How does it compare with other pairs?
In [6]:
df.describe()
Out[6]:
We can see that the plum-vanilla score is above the mean, and closer to the 75th percentile than to the median (50th percentile). Because a higher score means the two additions were rated more differently, this suggests plum and vanilla are not a natural pairing: they aren't rated highly in many of the same beer styles.
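The comparison against the summary statistics can be made explicit by computing what percentage of all pairs score below a given pair. A minimal sketch, assuming a scores table with the `additionA`, `additionB` and `similarity` columns built above; the `percentile_rank` helper and the small `toy_scores` stand-in (named so it won't overwrite the notebook's `df`) are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's table of pairwise scores.
toy_scores = pd.DataFrame(
    {
        "additionA": ["cherry", "cherry", "plum", "apple"],
        "additionB": ["plum", "vanilla", "vanilla", "plum"],
        "similarity": [0.01, 0.14, 0.17, 0.05],
    }
)


def percentile_rank(scores: pd.DataFrame, additionA: str, additionB: str) -> float:
    """Percentage of pairs whose score is strictly below this pair's score."""
    # Order the names the same way the table stores them.
    a, b = sorted([additionA, additionB])
    match = scores[(scores["additionA"] == a) & (scores["additionB"] == b)]
    if match.empty:
        raise KeyError(f"no score for pair ({a}, {b})")
    value = match["similarity"].iloc[0]
    return 100.0 * (scores["similarity"] < value).mean()


print(percentile_rank(toy_scores, "vanilla", "plum"))  # 75.0
```

With the real table, `percentile_rank(df, 'plum', 'vanilla')` would give a single number in place of eyeballing `describe()`. Since higher scores mean less alike, a rank near 100 says the two additions were rated more differently than most pairs.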