In [7]:
%matplotlib inline
import pandas as pd
import random
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
In [57]:
df_being_normalized = None

# stressng-stream reports raw memory bandwidth; redisbench reports ops/s
# (with each op being 8 bytes). Convert both to MB/s.
def get_mbps(row):
    if row['benchmark'] == 'stressng-stream':
        return row['result'] / (1024 * 1024)
    else:
        return (row['result'] * 8) / (1024 * 1024)

def get_slowdown(row):
    # baseline: raw memory bandwidth (stressng-stream) on the same machine
    base = df_being_normalized.query(
        'benchmark == "stressng-stream" and machine == "' + row['machine'] + '"')['mbps']
    return 1 / (row['mbps'] / float(base))

def normalize(data):
    global df_being_normalized
    df_being_normalized = data
    data['mbps'] = data.apply(get_mbps, axis=1)
    data['slowdown'] = data.apply(get_slowdown, axis=1)
In [58]:
df_without = pd.read_csv('redis_without/all.csv')
normalize(df_without)
df_without['limits'] = 'no'
In [59]:
df_with = pd.read_csv('redis_limited/all.csv')
normalize(df_with)
df_with['limits'] = 'yes'
In [60]:
df = pd.concat([df_with, df_without])
In [61]:
df.columns
Out[61]:
In [62]:
df
Out[62]:
We run the redis benchmark (showing results for the SET operation) on multiple machines.
In [78]:
sns.barplot(x='machine', y='mbps', data=df.query('limits == "no" and op == "SET"'))
plt.xticks(rotation=30)
Out[78]:
The problem with the above is that these are absolute numbers, so they lack context. One way of providing it is to measure raw memory bandwidth and use it as a baseline, i.e., normalize the numbers above with respect to raw bandwidth.
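The normalization idea can be illustrated with a toy calculation (the numbers below are made up, not from the experiment): the slowdown of a workload on a machine is the ratio of that machine's raw bandwidth to the workload's throughput, which is what `get_slowdown` computes.

```python
# Hypothetical throughputs in MB/s, per machine (illustrative only).
raw_mbps = {'machine-a': 10000.0, 'machine-b': 4000.0}   # stressng-stream baseline
redis_mbps = {'machine-a': 5.0, 'machine-b': 4.0}        # redis SET throughput

# slowdown = baseline / workload, computed on the same machine
slowdown = {m: raw_mbps[m] / redis_mbps[m] for m in raw_mbps}
print(slowdown)  # {'machine-a': 2000.0, 'machine-b': 1000.0}
```

Even though machine-a runs redis faster in absolute terms, its slowdown relative to its own raw bandwidth is larger, which is exactly the change of perspective the normalized plots give us.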
In [79]:
for b in df['op'].unique():
    if b == 'raw':
        continue
    sns.barplot(x='machine', y='slowdown', data=df.query('limits == "no" and op == "' + b + '"'))
    plt.xticks(rotation=30)
    plt.title(b)
    plt.show()
The above shows the overhead (slowdown) of redis with respect to raw memory bandwidth, and it makes much more sense: the first graph compares the same workload on distinct machines, i.e., it compares machines, but this hypothetical experiment was meant to evaluate the performance of the KV store!
So, from the first graph all we could conclude is that "redis is significantly slower on issdm-0". After we normalize, that is no longer the case; in fact, the overhead of redis on issdm-0 is the lowest! Our focus also moves from comparing hardware to characterizing the overhead of redis across machines (which is the goal of the experiment). The claim we can now make is that redis incurs a 3-5k× slowdown over the system's memory bandwidth.
Now, would throttling help in this case? Let's see.
In [80]:
for b in df['op'].unique():
    if b == 'raw':
        continue
    sns.barplot(x='machine', y='slowdown', hue='limits', data=df.query('op == "' + b + '"'))
    plt.xticks(rotation=30)
    plt.title(b)
    plt.show()
Since we are throttling both the baseline and the KV store, we don't see any change in the relationship between the overheads on distinct machines. Open question: are there experiments where proper baselining does not help to contextualize results (i.e., where the upper bound on the overhead would be, say, 100k instead of 5k)? Can throttling be used to "fix" these?
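The arithmetic behind this observation can be sketched with hypothetical numbers: if a resource limit scales the baseline and the workload by the same factor (an assumption, not something the experiment guarantees), the slowdown ratio is unchanged.

```python
# Made-up throughputs in MB/s; 'limit_factor' assumes throttling scales
# both the baseline and redis proportionally.
raw_mbps, redis_mbps = 10000.0, 5.0
limit_factor = 0.5

slowdown_unlimited = raw_mbps / redis_mbps
slowdown_limited = (raw_mbps * limit_factor) / (redis_mbps * limit_factor)

print(slowdown_unlimited, slowdown_limited)  # 2000.0 2000.0
```

This is why the hue='limits' bars above track each other: throttling changes the absolute throughputs but cancels out of the normalized metric, as long as it affects both measurements equally.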
In [ ]: