In [1]:
%matplotlib inline
import pandas as pd
import random
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
pd.set_option("display.max_rows", 8)
First, we load all test data.
In [2]:
df = pd.read_csv('stress-ng/second/results/combo/1/alltests.csv')
Let's have a look at the pattern of data.
In [3]:
df.head()
Out[3]:
Show all the test machines.
In [4]:
df['machine'].unique()
Out[4]:
Define a predicate for machine issdm-6
In [5]:
machine_is_issdm_6 = df['machine'] == 'issdm-6'
The number of benchmarks we ran on issdm-6 with limit is
In [6]:
limits_is_with = df['limits'] == 'with'
df_issdm_6_with_limit = df[machine_is_issdm_6 & limits_is_with]
len(df_issdm_6_with_limit)
Out[6]:
The number of benchmarks we ran on issdm-6 without limit is
In [7]:
limits_is_without = df['limits'] == 'without'
len(df[machine_is_issdm_6 & limits_is_without])
Out[7]:
The number of benchmarks we ran on kv3 is
In [8]:
df_kv3 = df[df['machine'] == 'kv3']
len(df_kv3)
Out[8]:
Because some benchmarks can fail while the test suite is running, failed tests do not appear in the result report. We want to know how many common tests both machines completed.
In [9]:
df_common = pd.merge(df_issdm_6_with_limit, df_kv3, how='inner', on='benchmark')
len(df_common)
Out[9]:
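The inner merge above keeps only the benchmarks that appear in both result sets. A minimal sketch with toy data (the frames and values here are illustrative, not from the real results):

```python
import pandas as pd

# Toy stand-ins for the two result sets (names and values are made up).
left = pd.DataFrame({'benchmark': ['a', 'b', 'c'], 'result': [1, 2, 3]})
right = pd.DataFrame({'benchmark': ['b', 'c', 'd'], 'result': [4, 5, 6]})

# An inner merge on 'benchmark' keeps only rows whose key appears in both
# frames, so its length is the number of benchmarks completed on both machines.
common = pd.merge(left, right, how='inner', on='benchmark')
print(len(common))  # 2 ('b' and 'c')
```

Overlapping non-key columns get the default `_x`/`_y` suffixes, so both machines' results survive in the merged frame.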
Read the normalized results.
In [10]:
df = pd.read_csv('stress-ng/second/results/combo/1/alltests_with_normalized_results_1.1.csv')
Show some of the data lines. The normalized value is the speedup relative to kv3. It becomes negative when a benchmark runs slower on issdm-6 than on kv3 (a slowdown).
In [11]:
df.head()
Out[11]:
There is one benchmark not present in both the with- and without-limit result sets.
In [12]:
len(df) / 2
Out[12]:
Since the number of common benchmarks is 113, we want to find the one benchmark with fewer than two results (all from issdm-6).
In [13]:
grouped = df.groupby('benchmark')
df[grouped['benchmark'].transform(len) < 2]
Out[13]:
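The `transform(len)` trick broadcasts each group's row count back onto every row, so the comparison yields a boolean mask over the original frame. A small self-contained sketch with invented data:

```python
import pandas as pd

# Toy frame: benchmark 'a' has two results (with and without limits),
# benchmark 'b' has only one.
df = pd.DataFrame({'benchmark': ['a', 'a', 'b'],
                   'normalized': [1.5, -2.0, 3.0]})

# transform(len) returns one count per row (its group's size), aligned with
# the original index, so the mask selects benchmarks with fewer than two rows.
singles = df[df.groupby('benchmark')['benchmark'].transform(len) < 2]
print(singles['benchmark'].tolist())  # ['b']
```

Unlike `groupby(...).size()`, which collapses to one row per group, `transform` preserves the original shape, which is what lets us filter the frame directly.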
In other words, stressng-memory-oom-pipe should not be in the with-limit results of issdm-6.
In [14]:
df_issdm_6_with_limit[df_issdm_6_with_limit['benchmark'] == 'stressng-memory-oom-pipe'].empty
Out[14]:
We can count how many benchmarks show a speedup and how many show a slowdown in the without-limit results.
In [15]:
predicate_without_limits = df['limits'] == 'without'
predicate = predicate_without_limits & (df['normalized'] >= 0)
len(df[predicate])
Out[15]:
In [16]:
predicate = predicate_without_limits & (df['normalized'] < 0)
len(df[predicate])
Out[16]:
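The two cells above combine boolean masks to split the without-limit results by sign. A compact sketch of the same pattern on toy data (values are made up):

```python
import pandas as pd

# Toy normalized results; the real values come from the CSV above.
df = pd.DataFrame({'limits': ['without', 'without', 'without', 'with'],
                   'normalized': [2.0, -3.0, 1.5, 4.0]})

# Boolean masks combine with & (element-wise AND); each len() counts the
# rows where both conditions hold.
without = df['limits'] == 'without'
speedups = len(df[without & (df['normalized'] >= 0)])
slowdowns = len(df[without & (df['normalized'] < 0)])
print(speedups, slowdowns)  # 2 1
```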
All right, let's draw a bar plot for all results.
In [17]:
sns.set()
sns.set_context("poster")
plt.xticks(rotation=90)
sns.barplot(x='benchmark', y='normalized', hue='limits', data=df)
Out[17]:
Which benchmarks have the greatest and smallest speedups in the without-limit benchmark results?
In [18]:
df_without_sorted = df[df['limits'] == 'without'].sort_values(by='normalized', ascending=False)
head_without = df_without_sorted.head()
tail_without = df_without_sorted.tail()
pd.concat([head_without, tail_without])
Out[18]:
Let's have a look at the speedup frequency on without limit benchmark results.
In [19]:
ax = df[df['limits'] == 'without'].groupby('limits').normalized.hist(bins=100, xrot=90, figsize=(20, 10), alpha=0.5)
plt.xlabel('Speedup (re-execution / original)')
plt.ylabel('Frequency (# of benchmarks)')
Out[19]:
Which benchmarks have the greatest and smallest speedups in the with-limit benchmark results?
In [20]:
df_with_sorted = df[df['limits'] == 'with'].sort_values(by='normalized', ascending=False)
head_with = df_with_sorted.head()
tail_with = df_with_sorted.tail()
pd.concat([head_with, tail_with])
Out[20]:
The average speedup of the with-limit benchmarks is
In [21]:
df[df['limits'] == 'with']['normalized'].mean()
Out[21]:
Let's have a look at the speedup frequency on with limit benchmark results.
In [22]:
ax = df[df['limits'] == 'with'].groupby('limits').normalized.hist(bins=100, xrot=90, figsize=(20, 10), alpha=0.5)
plt.xlabel('Speedup (re-execution / original)')
plt.ylabel('Frequency (# of benchmarks)')
Out[22]:
The stressng-cpu-jenkin benchmark is a collection of (non-cryptographic) hash functions for multi-byte keys. See Jenkins hash function from Wikipedia for more details.
We got the speedup boundary from -276.278268 to 127.716328 by using the parameters --cpuset-cpus=1 --cpu-quota=1000 --cpu-period=10000, which means the Docker container gets only 1 ms of CPU run-time every 10 ms on CPU 1 (see cpu for more details).
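As a quick sanity check on those flags: Docker's --cpu-quota and --cpu-period are both in microseconds, so the fraction of one core the container can use follows directly:

```python
# Docker CFS settings used above: --cpu-quota is the run-time budget (µs)
# the container may consume per --cpu-period (µs).
cpu_quota_us = 1000     # 1 ms of CPU time...
cpu_period_us = 10000   # ...every 10 ms

fraction = cpu_quota_us / cpu_period_us
print(fraction)  # 0.1, i.e. 10% of one CPU core
```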
Now we use 9 other benchmark programs to verify this result. These programs are,
Read verification tests data.
In [23]:
df = pd.read_csv('verification/results/1/alltests_with_normalized_results_1.0.csv')
Show number of test benchmarks.
In [24]:
len(df)
Out[24]:
Let's see the speedup of each individual result.
In [25]:
sns.set()
sns.set_context("poster")
plt.xticks(rotation=90)
sns.barplot(x='benchmark', y='normalized', data=df)
Out[25]:
Sort the test result set by the absolute value of normalized.
In [26]:
df.reindex(df.normalized.abs().sort_values(ascending=False).index).head(8)
Out[26]:
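The reindex-by-sorted-index idiom orders rows by a derived key without adding a column. A minimal sketch with invented values:

```python
import pandas as pd

# Toy data; |normalized| descending should give y (5.0), z (3.0), x (1.0).
df = pd.DataFrame({'benchmark': ['x', 'y', 'z'],
                   'normalized': [1.0, -5.0, 3.0]})

# Sort the index by |normalized| descending, then reorder the frame to
# match; the original df is left untouched.
ordered = df.reindex(df['normalized'].abs().sort_values(ascending=False).index)
print(ordered['benchmark'].tolist())  # ['y', 'z', 'x']
```

On newer pandas, `df.sort_values(by='normalized', key=abs, ascending=False)` does the same thing more directly.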
See the histogram of all speedups after filtering out the one outlier.
In [27]:
df_t = df[df['benchmark'] != 'nbench_fp']
ax = df_t.normalized.hist(bins=100,xrot=90,figsize=(20,10),alpha=0.5)
plt.xlabel('Speedup (re-execution / original)')
plt.ylabel('Frequency (# of benchmarks)')
Out[27]:
The average speedup of the test benchmarks, excluding the one outlier, is
In [28]:
df_t['normalized'].mean()
Out[28]:
Conclusion: Except for nbench_fp, all 92 benchmarks fall within our predicted speedup range [-276.278268, 127.716328], and most of them (86) lie in [-6, 4], an interval of effective length 8 (= 10 - 2, because no speedup value can fall in (-1, 1)).
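The claim that no value lands in (-1, 1) is consistent with a signed-speedup convention: ratios at or above 1 are reported as-is, and ratios below 1 as the negated reciprocal. This sketch is an assumption about the convention, not code taken from the analysis pipeline:

```python
# Assumed signed-speedup convention (not from the source): report the
# timing ratio directly when >= 1, otherwise the negated reciprocal,
# so every output satisfies |value| >= 1 and (-1, 1) is unreachable.
def signed_speedup(original_time, reexec_time):
    ratio = original_time / reexec_time  # > 1 means the re-execution is faster
    return ratio if ratio >= 1 else -1.0 / ratio

print(signed_speedup(10.0, 2.0))  # 5.0  (5x speedup)
print(signed_speedup(2.0, 10.0))  # -5.0 (5x slowdown)
```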
Question: Is this an acceptable emulation environment for the KV drive?