Performance Benchmarking for KV Drive

The goal of this set of experiments is to characterize, systematically and consistently, how performance varies across platforms relative to the KV drive. The steps of the experiments are as follows:

1. Run the stress-ng benchmarks on one KV drive.
2. Run the stress-ng benchmarks on machine issdm-6 to get the "without limit" result.
3. Find the benchmarks common to both results.
4. Calculate the speedup (normalized value) of each benchmark relative to the KV drive (issdm-6 (without limit) / KV drive).
5. Use torpor to calculate the best CPU quota by minimizing the average speedup. This parameter is later used to limit the CPU usage of the docker container.
6. Run the stress-ng benchmarks in the constrained docker container on machine issdm-6 to get the "with limit" result.
7. Calculate the speedup relative to the KV drive again (issdm-6 (with limit) / KV drive). This gives a new "speedup range", which should be much smaller than the previous one.
8. Run a set of other benchmarks on both the KV drive and the constrained docker container to verify whether they all fall within the latter "speedup range".
9. Draw conclusions.
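
The inner join on common benchmarks and the speedup ratio described in the steps above can be sketched with pandas as follows. This is a minimal sketch on made-up numbers, not the torpor implementation; the frames `base` and `target` and the benchmark names are hypothetical, and the column names only mirror those loaded later in this notebook.

```python
import pandas as pd

# Hypothetical results (higher is better for every benchmark here).
base = pd.DataFrame({
    "benchmark": ["bench-a", "bench-b", "bench-c"],
    "result": [2.0, 10.0, 4.0],
})
target = pd.DataFrame({
    "benchmark": ["bench-a", "bench-b", "bench-d"],
    "result": [8.0, 5.0, 1.0],
})

# Keep only the benchmarks that produced a result on both machines.
common = pd.merge(
    target,
    base.rename(columns={"result": "base_result"}),
    how="inner",
    on="benchmark",
)

# Speedup relative to the baseline: above 1 is faster, below 1 is slower.
common["normalized"] = common["result"] / common["base_result"]

print(common)
```

Benchmarks missing on either side (bench-c and bench-d here) drop out of the join, which is why the later cells count the common successful tests before comparing machines.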



In [1]:

    
%matplotlib inline
import pandas as pd
import random
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

pd.set_option("display.max_rows", 8)

First, we load all test data.



In [2]:

    
df = pd.read_csv('stress-ng/third/torpor-results/alltests.csv')

Let's have a look at the pattern of data.



In [3]:

    
df.head()









    Out[3]:






  
    
      
     machine  limits  benchmark               class  lower_is_better  repetition  result
0    issdm-6  with    stressng-cpu-all        cpu    False            1           5.229076
1    issdm-6  with    stressng-cpu-ackermann  cpu    False            1           0.535738
2    issdm-6  with    stressng-cpu-bitops     cpu    False            1           93.397542
3    issdm-6  with    stressng-cpu-callfunc   cpu    False            1           12852.729928
4    issdm-6  with    stressng-cpu-cdouble    cpu    False            1           165.364637

Show all the test machines.



In [4]:

    
df['machine'].unique()









    Out[4]:





array(['issdm-6', 't2.micro', 'kv3'], dtype=object)

Define some predicates for machines and limits



In [5]:

    
machine_is_issdm_6 = df['machine'] == 'issdm-6'
machine_is_t2_micro = df['machine'] == 't2.micro'
machine_is_kv3 = df['machine'] == 'kv3'

limits_is_with = df['limits'] == 'with'
limits_is_without = df['limits'] == 'without'

Show the number of stress tests on different machines



In [6]:

    
df_issdm_6_with_limit  = df[machine_is_issdm_6 & limits_is_with]
df_t2_micro_with_limit = df[machine_is_t2_micro & limits_is_with]
df_kv3_without_limit   = df[machine_is_kv3 & limits_is_without]

print(
    len(df_issdm_6_with_limit),                       # machine issdm-6 with limit
    len(df[machine_is_issdm_6 & limits_is_without]),  # machine issdm-6 without limit

    len(df_t2_micro_with_limit),                      # machine t2.micro with limit
    len(df[machine_is_t2_micro & limits_is_without]), # machine t2.micro without limit

    len(df_kv3_without_limit)                         # machine kv3 without limit
)









    



126 128 129 130 123

Benchmarks that failed do not appear in the result report, so we want to know how many stress tests completed successfully on both the target machine and kv3.



In [7]:

    
issdm_6_with_limit_merge_kv3 = pd.merge(df_issdm_6_with_limit, df_kv3_without_limit, how='inner', on='benchmark')
t2_micro_with_limit_merge_kv3 = pd.merge(df_t2_micro_with_limit, df_kv3_without_limit, how='inner', on='benchmark')

print(
    # common successful tests from issdm-6 and kv3
    len(issdm_6_with_limit_merge_kv3),
    
    # common successful tests from t2.micro and kv3
    len(t2_micro_with_limit_merge_kv3)
)
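
An outer merge with `indicator=True` is another way to see which benchmarks are present on only one side. The sketch below is illustrative only: the frames `left` and `right` and their benchmark names are made up and stand in for a target machine and kv3.

```python
import pandas as pd

# Hypothetical benchmark lists for two machines.
left = pd.DataFrame({"benchmark": ["bench-a", "bench-b", "bench-c"]})
right = pd.DataFrame({"benchmark": ["bench-b", "bench-c", "bench-d"]})

# The _merge column records where each key was found:
# "both", "left_only" or "right_only".
merged = pd.merge(left, right, how="outer", on="benchmark", indicator=True)

both = merged.loc[merged["_merge"] == "both", "benchmark"].tolist()
left_only = merged.loc[merged["_merge"] == "left_only", "benchmark"].tolist()
right_only = merged.loc[merged["_merge"] == "right_only", "benchmark"].tolist()

print(len(both), left_only, right_only)
```

The length of `both` corresponds to the count of common successful tests, and `left_only` lists the rows that an inner merge would silently drop.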

Read the normalized results.



In [8]:

    
df_normalized = pd.read_csv('stress-ng/third/torpor-results/alltests_with_normalized_results_1.1.csv')

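Show the first few rows. The normalized value is the speedup relative to kv3, i.e. result / base_result. When a benchmark runs slower on the target machine than on kv3 (a slowdown), the value in this file falls between 0 and 1 (for example 0.396102 for stressng-cpu-ackermann below); the summaries later in this notebook report such slowdowns as negative numbers.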



In [9]:

    
df_normalized.head()









    Out[9]:






  
    
      
     benchmark               base_result  machine   limits   class  lower_is_better  repetition  result      normalized
0    stressng-cpu-all        0.559459     issdm-6   with     cpu    False            1           5.229076    9.346665
1    stressng-cpu-all        0.559459     issdm-6   without  cpu    False            1           73.245089   130.921281
2    stressng-cpu-all        0.559459     t2.micro  with     cpu    False            1           15.199386   27.168007
3    stressng-cpu-all        0.559459     t2.micro  without  cpu    False            1           223.475214  399.448778
4    stressng-cpu-ackermann  1.352526     issdm-6   with     cpu    False            1           0.535738    0.396102

Show the benchmarks that completed successfully on issdm-6 (with limit) but have no result on kv3.



In [10]:

    
df_issdm_6_with_limit[~df_issdm_6_with_limit['benchmark'].isin(issdm_6_with_limit_merge_kv3['benchmark'])]









    Out[10]:






  
    
      
     machine  limits  benchmark                   class      lower_is_better  repetition  result
93   issdm-6  with    stressng-cpu-af-alg         cpu        False            1           2240.135965
103  issdm-6  with    stressng-cpu-numa           cpu        False            1           1.188726
112  issdm-6  with    stressng-cpu-cache-icache   cpu-cache  False            1           24.995858
113  issdm-6  with    stressng-cpu-cache-lockbus  cpu-cache  False            1           75485.873873
117  issdm-6  with    stressng-memory-memfd       memory     False            1           1356.940796
124  issdm-6  with    stressng-memory-vm-rw       memory     False            1           1.905445

Show the benchmarks that completed successfully on t2.micro (with limit) but have no result on kv3.



In [11]:

    
df_t2_micro_with_limit[~df_t2_micro_with_limit['benchmark'].isin(t2_micro_with_limit_merge_kv3['benchmark'])]









    Out[11]:






  
    
      
     machine   limits  benchmark                   class      lower_is_better  repetition  result
347  t2.micro  with    stressng-cpu-af-alg         cpu        False            1           6666.401429
359  t2.micro  with    stressng-cpu-rdrand         cpu        False            1           24078.103557
362  t2.micro  with    stressng-cpu-tsc            cpu        False            1           62488.869073
367  t2.micro  with    stressng-cpu-cache-icache   cpu-cache  False            1           38.799627
368  t2.micro  with    stressng-cpu-cache-lockbus  cpu-cache  False            1           72449.252834
372  t2.micro  with    stressng-memory-memfd       memory     False            1           2057.039725
381  t2.micro  with    stressng-memory-vm-rw       memory     False            1           8.296505

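Count the benchmarks that speed up and slow down, respectively. Note that the predicates below test the sign of the normalized value, so when every ratio in the file is positive, every benchmark is counted as a speed-up.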



In [12]:

    
normalized_limits_is_with = df_normalized['limits'] == 'with'
normalized_limits_is_without = df_normalized['limits'] == 'without'

normalized_machine_is_issdm_6 = df_normalized['machine'] == 'issdm-6'
normalized_machine_is_t2_micro = df_normalized['machine'] == 't2.micro'

normalized_is_speed_up = df_normalized['normalized'] > 0
normalized_is_slow_down = df_normalized['normalized'] < 0

print(
    # issdm-6 without CPU restriction
    len(df_normalized[normalized_limits_is_without & normalized_machine_is_issdm_6 & normalized_is_speed_up]),   # 1. speed-up
    len(df_normalized[normalized_limits_is_without & normalized_machine_is_issdm_6 & normalized_is_slow_down]),  # 2. slowdown
    
    # issdm-6 with CPU restriction
    len(df_normalized[normalized_limits_is_with & normalized_machine_is_issdm_6 & normalized_is_speed_up]),      # 3. speed-up
    len(df_normalized[normalized_limits_is_with & normalized_machine_is_issdm_6 & normalized_is_slow_down]),     # 4. slowdown
    
    # t2.micro without CPU restriction
    len(df_normalized[normalized_limits_is_without & normalized_machine_is_t2_micro & normalized_is_speed_up]),  # 5. speed-up
    len(df_normalized[normalized_limits_is_without & normalized_machine_is_t2_micro & normalized_is_slow_down]), # 6. slowdown
    
    # t2.micro with CPU restriction
    len(df_normalized[normalized_limits_is_with & normalized_machine_is_t2_micro & normalized_is_speed_up]),     # 7. speed-up
    len(df_normalized[normalized_limits_is_with & normalized_machine_is_t2_micro & normalized_is_slow_down])     # 8. slowdown
)









    



121 0 120 0 122 0 122 0
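
Because the normalized column shown in this notebook is a positive ratio, a threshold of 1 rather than 0 separates speed-ups from slowdowns. A minimal sketch, using four of the issdm-6 "with limit" values listed in this notebook; the frame `sample` is an illustration, not the full data set.

```python
import pandas as pd

# Four normalized values copied from the issdm-6 "with limit" rows.
sample = pd.DataFrame({
    "benchmark": [
        "stressng-cpu-all",
        "stressng-cpu-ackermann",
        "stressng-cpu-sqrt",
        "stressng-cpu-jenkin",
    ],
    "normalized": [9.346665, 0.396102, 54.266945, 0.002611],
})

# A ratio above 1 means faster than kv3, below 1 means slower.
is_speed_up = sample["normalized"] > 1
is_slow_down = sample["normalized"] < 1

# The sign test counts every row, because all ratios are positive.
sign_positive = sample["normalized"] > 0

print(int(is_speed_up.sum()), int(is_slow_down.sum()), int(sign_positive.sum()))
```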

The average normalized value of the results under CPU restriction:



In [13]:

    
print(
    # For issdm-6
    df_normalized[normalized_machine_is_issdm_6 & normalized_limits_is_with]['normalized'].mean(),
    
    # For t2_micro
    df_normalized[normalized_machine_is_t2_micro & normalized_limits_is_with]['normalized'].mean()
)









    



5.10958997736 13.4680329817

Experiment Results from issdm-6

Let's have a look at the histogram of frequency of normalized value based on stress tests without CPU restriction running on issdm-6.



In [14]:

    
df_normalized_issdm_6_without_limit = df_normalized[normalized_machine_is_issdm_6 & normalized_limits_is_without]
df_normalized_issdm_6_without_limit.normalized.hist(bins=150, figsize=(25,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on issdm-6 without CPU restriction', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[14]:





<matplotlib.text.Text at 0x7f6b97ba0be0>

Here is the rank of normalized value from stress tests without CPU restriction



In [15]:

    
df_normalized_issdm_6_without_limit_sorted = df_normalized_issdm_6_without_limit.sort_values(by='normalized', ascending=False)
df_normalized_issdm_6_without_limit_sorted_head = df_normalized_issdm_6_without_limit_sorted.head()
df_normalized_issdm_6_without_limit_sorted_tail = df_normalized_issdm_6_without_limit_sorted.tail()
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same head-plus-tail view
pd.concat([df_normalized_issdm_6_without_limit_sorted_head, df_normalized_issdm_6_without_limit_sorted_tail])









    Out[15]:






  
    
      
     benchmark                    base_result    machine  limits   class      lower_is_better  repetition  result          normalized
261  stressng-cpu-sqrt            5.172518       issdm-6  without  cpu        False            1           4041.087860     781.261246
85   stressng-cpu-gamma           0.196768       issdm-6  without  cpu        False            1           122.810914      624.140683
205  stressng-cpu-nsqrt           0.170864       issdm-6  without  cpu        False            1           92.667529       542.346714
61   stressng-cpu-euler           8422.503460    issdm-6  without  cpu        False            1           3668571.085120  435.567774
...  ...                          ...            ...      ...      ...        ...              ...         ...             ...
437  stressng-cpu-cache-cache     32.907173      issdm-6  without  cpu-cache  False            1           22.300178       0.677669
353  stressng-string-strncasecmp  11272.022718   issdm-6  without  string     False            1           6306.289926     0.559464
325  stressng-string-strcasecmp   13468.776467   issdm-6  without  string     False            1           6318.336606     0.469110
181  stressng-cpu-jenkin          470339.193258  issdm-6  without  cpu        False            1           17101.863912    0.036361
    
  

10 rows × 9 columns

Now let's have a look at the histogram of frequency of normalized value based on stress tests with CPU restriction running on issdm-6.



In [16]:

    
df_normalized_issdm_6_with_limit = df_normalized[normalized_machine_is_issdm_6 & normalized_limits_is_with]
df_normalized_issdm_6_with_limit.normalized.hist(color='Orange', bins=150, figsize=(25,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on issdm-6 with CPU restriction', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[16]:





<matplotlib.text.Text at 0x7f6b9792d0b8>

Here is the rank of normalized value from stress tests with CPU restriction



In [17]:

    
df_normalized_issdm_6_with_limit_sorted = df_normalized_issdm_6_with_limit.sort_values(by='normalized', ascending=False)
df_normalized_issdm_6_with_limit_sorted_head = df_normalized_issdm_6_with_limit_sorted.head()
df_normalized_issdm_6_with_limit_sorted_tail = df_normalized_issdm_6_with_limit_sorted.tail()
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same head-plus-tail view
pd.concat([df_normalized_issdm_6_with_limit_sorted_head, df_normalized_issdm_6_with_limit_sorted_tail])









    Out[17]:






  
    
      
     benchmark                    base_result    machine  limits  class      lower_is_better  repetition  result         normalized
260  stressng-cpu-sqrt            5.172518       issdm-6  with    cpu        False            1           280.696748     54.266945
84   stressng-cpu-gamma           0.196768       issdm-6  with    cpu        False            1           8.898357       45.222582
204  stressng-cpu-nsqrt           0.170864       issdm-6  with    cpu        False            1           6.699748       39.210998
60   stressng-cpu-euler           8422.503460    issdm-6  with    cpu        False            1           263206.984608  31.250445
...  ...                          ...            ...      ...     ...        ...              ...         ...            ...
436  stressng-cpu-cache-cache     32.907173      issdm-6  with    cpu-cache  False            1           1.600015       0.048622
352  stressng-string-strncasecmp  11272.022718   issdm-6  with    string     False            1           449.198875     0.039851
324  stressng-string-strcasecmp   13468.776467   issdm-6  with    string     False            1           458.204632     0.034020
180  stressng-cpu-jenkin          470339.193258  issdm-6  with    cpu        False            1           1227.934894    0.002611
    
  

10 rows × 9 columns

We notice that the stressng-cpu-jenkin looks like an outlier. Let's redraw the histogram without this one.



In [18]:

    
df_normalized_issdm_6_no_outlier = df_normalized_issdm_6_with_limit['benchmark'] != 'stressng-cpu-jenkin'
df_normalized_issdm_6_with_limit[df_normalized_issdm_6_no_outlier].normalized.hist(color='Green', bins=150, figsize=(25,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on issdm-6 with CPU restriction (no outlier)', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[18]:





<matplotlib.text.Text at 0x7f6b97723eb8>

Summary

The normalized values on issdm-6 range from -29.394675 to 54.266945 when using the parameters --cpuset-cpus=1 --cpu-quota=7234 --cpu-period=100000, which means the docker container gets only 7.234 ms of CPU run-time every 100 ms on CPU 1 (see cpu for more details).
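
The arithmetic behind the quota can be checked with a short sketch. Docker's --cpu-quota and --cpu-period values are given in microseconds; the helper names `cpu_share` and `runtime_ms_per_period` are hypothetical, and the two stressng-cpu-all results are copied from the issdm-6 rows listed earlier in this notebook.

```python
def cpu_share(cpu_quota_us: int, cpu_period_us: int) -> float:
    """Fraction of one CPU the container may use in each period."""
    return cpu_quota_us / cpu_period_us


def runtime_ms_per_period(cpu_quota_us: int) -> float:
    """CPU run-time allowed per period, converted from microseconds to ms."""
    return cpu_quota_us / 1000.0


issdm_6_share = cpu_share(7234, 100_000)
issdm_6_ms = runtime_ms_per_period(7234)

# stressng-cpu-all on issdm-6: "with limit" result over "without limit" result.
observed_ratio = 5.229076 / 73.245089

print(f"quota share: {issdm_6_share:.5f} ({issdm_6_ms} ms per 100 ms)")
print(f"observed with/without ratio: {observed_ratio:.4f}")
```

For this one benchmark the observed ratio is roughly in line with the configured share, which is what one would expect if throughput scales about linearly with the CPU quota; other benchmarks may deviate.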

Experiment Results from t2.micro

Let's have a look at the histogram of frequency of normalized value based on stress tests without CPU restriction running on t2.micro.



In [19]:

    
df_normalized_t2_micro_without_limit = df_normalized[normalized_machine_is_t2_micro & normalized_limits_is_without]
df_normalized_t2_micro_without_limit.normalized.hist(bins=150,figsize=(30,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on t2.micro without CPU restriction', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[19]:





<matplotlib.text.Text at 0x7f6b97537a90>

Here is the rank of normalized value from stress tests without CPU restriction



In [20]:

    
df_normalized_t2_micro_without_limit_sorted = df_normalized_t2_micro_without_limit.sort_values(by='normalized', ascending=False)
df_normalized_t2_micro_without_limit_sorted_head = df_normalized_t2_micro_without_limit_sorted.head()
df_normalized_t2_micro_without_limit_sorted_tail = df_normalized_t2_micro_without_limit_sorted.tail()
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same head-plus-tail view
pd.concat([df_normalized_t2_micro_without_limit_sorted_head, df_normalized_t2_micro_without_limit_sorted_tail])









    Out[20]:






  
    
      
     benchmark                 base_result    machine   limits   class      lower_is_better  repetition  result         normalized
87   stressng-cpu-gamma        0.196768       t2.micro  without  cpu        False            1           325.956029     1656.549993
263  stressng-cpu-sqrt         5.172518       t2.micro  without  cpu        False            1           8127.805628    1571.344097
203  stressng-cpu-matrixprod   0.126767       t2.micro  without  cpu        False            1           190.777126     1504.943132
299  stressng-matrix-mean      190.142614     t2.micro  without  matrix     False            1           225858.886808  1187.839391
...  ...                       ...            ...       ...      ...        ...              ...         ...            ...
423  stressng-cpu-stream       4.620318       t2.micro  without  cpu        False            1           6.094187       1.318997
439  stressng-cpu-cache-cache  32.907173      t2.micro  without  cpu-cache  False            1           2.699978       0.082048
183  stressng-cpu-jenkin       470339.193258  t2.micro  without  cpu        False            1           32828.454833   0.069797
472  stressng-memory-stack     9550.063278    t2.micro  without  memory     False            1           36.975041      0.003872
    
  

10 rows × 9 columns

Let's have a look at the histogram of frequency of normalized value based on stress tests with CPU restriction running on t2.micro.



In [21]:

    
df_normalized_t2_micro_with_limit = df_normalized[normalized_machine_is_t2_micro & normalized_limits_is_with]
df_normalized_t2_micro_with_limit.normalized.hist(color='Orange', bins=150, figsize=(30,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on t2.micro with CPU restriction', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[21]:





<matplotlib.text.Text at 0x7f6b97252cf8>

Here is the rank of normalized value from stress tests with CPU restriction



In [22]:

    
df_normalized_t2_micro_with_limit_sorted = df_normalized_t2_micro_with_limit.sort_values(by='normalized', ascending=False)
df_normalized_t2_micro_with_limit_sorted_head = df_normalized_t2_micro_with_limit_sorted.head()
df_normalized_t2_micro_with_limit_sorted_tail = df_normalized_t2_micro_with_limit_sorted.tail()
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same head-plus-tail view
pd.concat([df_normalized_t2_micro_with_limit_sorted_head, df_normalized_t2_micro_with_limit_sorted_tail])









    Out[22]:






  
    
      
     benchmark                 base_result    machine   limits  class      lower_is_better  repetition  result        normalized
86   stressng-cpu-gamma        0.196768       t2.micro  with    cpu        False            1           23.593420     119.904761
262  stressng-cpu-sqrt         5.172518       t2.micro  with    cpu        False            1           584.293272    112.961090
202  stressng-cpu-matrixprod   0.126767       t2.micro  with    cpu        False            1           13.663222     107.782167
298  stressng-matrix-mean      190.142614     t2.micro  with    matrix     False            1           15812.388056  83.160675
...  ...                       ...            ...       ...     ...        ...              ...         ...           ...
422  stressng-cpu-stream       4.620318       t2.micro  with    cpu        False            1           0.086200      0.018657
438  stressng-cpu-cache-cache  32.907173      t2.micro  with    cpu-cache  False            1           0.198139      0.006021
182  stressng-cpu-jenkin       470339.193258  t2.micro  with    cpu        False            1           2370.177001   0.005039
471  stressng-memory-stack     9550.063278    t2.micro  with    memory     False            1           1.284595      0.000135
    
  

10 rows × 9 columns

We notice that the stressng-memory-stack looks like an outlier. Let's redraw the histogram without this one.



In [23]:

    
df_normalized_t2_micro_no_outlier = df_normalized_t2_micro_with_limit['benchmark'] != 'stressng-memory-stack'
df_normalized_t2_micro_with_limit[df_normalized_t2_micro_no_outlier].normalized.hist(color='Green', bins=150, figsize=(30,12), xlabelsize=20, ylabelsize=20)

plt.title('stress tests run on t2.micro with CPU restriction (no outlier)', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[23]:





<matplotlib.text.Text at 0x7f6b957bbd30>

The stressng-cpu-jenkin benchmark exercises the Jenkins hash, one of a collection of (non-cryptographic) hash functions for multi-byte keys. See the Wikipedia article on the Jenkins hash function for more details.

Summary

The normalized values on t2.micro range from -198.440535 to 119.904761 when using the parameters --cpuset-cpus=0 --cpu-quota=25750 --cpu-period=100000, which means the docker container gets only 25.750 ms of CPU run-time every 100 ms on CPU 0 (see cpu for more details).
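
The negative lower bound can be reproduced, up to rounding of the displayed values, if a slowdown ratio r below 1 is reported as -1/r. This convention is inferred from the numbers in this notebook and is an assumption, not something taken from torpor's documentation; the helper `signed_speedup` is hypothetical. The sketch uses the t2.micro "with limit" rows listed above and excludes stressng-memory-stack as the outlier.

```python
def signed_speedup(ratio: float) -> float:
    """Keep ratios of at least 1; map a slowdown ratio r < 1 to -1/r."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


# Normalized values copied from the t2.micro "with limit" ranking.
rows = {
    "stressng-cpu-gamma": 119.904761,
    "stressng-cpu-sqrt": 112.961090,
    "stressng-cpu-matrixprod": 107.782167,
    "stressng-matrix-mean": 83.160675,
    "stressng-cpu-stream": 0.018657,
    "stressng-cpu-cache-cache": 0.006021,
    "stressng-cpu-jenkin": 0.005039,
    "stressng-memory-stack": 0.000135,
}
outliers = {"stressng-memory-stack"}

signed = [signed_speedup(r) for name, r in rows.items() if name not in outliers]
low, high = min(signed), max(signed)

print(low, high)
```

The small difference between the computed lower bound and the one quoted in the summary is consistent with the table showing ratios rounded to six decimal places.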

Verification

Now we use 9 other benchmark programs to verify this result. These programs are,

blogbench: filesystem benchmark.
compilebench: It tries to age a filesystem by simulating some of the disk IO common in creating, compiling, patching, stating and reading kernel trees.
fhourstones: This integer benchmark solves positions in the game of connect-4.
himeno: The Himeno benchmark score depends on a computer's performance, especially its memory bandwidth. The program measures the speed of the major loops in solving Poisson's equation with the Jacobi iteration method.
interbench: It is designed to measure the effect of changes in Linux kernel design or system configuration changes such as cpu, I/O scheduler and filesystem changes and options.
nbench: NBench (Wikipedia) is a synthetic computing benchmark program developed in the mid-1990s by the now-defunct BYTE magazine, intended to measure a computer's CPU, FPU, and memory system speed.
pybench: It is a collection of tests that provides a standardized way to measure the performance of Python implementations.
ramsmp: RAMspeed is a free open source command line utility to measure cache and memory performance of computer systems.
stockfish-7: A simple benchmark that lets Stockfish analyze a set of positions, each within a given limit.

Read the verification test data.



In [24]:

    
df_verification = pd.read_csv('verification/results/2/alltests_with_normalized_results_1.1.csv')

Show number of test benchmarks.



In [25]:

    
len(df_verification) / 2









    Out[25]:





174.0

Order the test results by the absolute value of the normalized value.



In [26]:

    
df_verification_rank = df_verification.reindex(df_verification.normalized.abs().sort_values(ascending=False).index)
df_verification_rank.head(8)









    Out[26]:






  
    
      
    machine   limits  benchmark                    base_result  lower_is_better  result     normalized
81  t2.micro  with    nbench_neural_net            0.23485      False            99.331     422.955078
85  t2.micro  with    nbench_floating-point_index  0.16700      False            55.600     332.934132
82  t2.micro  with    nbench_lu_decomposition      7.89580      False            2608.200   330.327516
77  t2.micro  with    nbench_fourier               155.53000    False            41088.000  264.180544
69  issdm-6   with    nbench_lu_decomposition      7.89580      False            1123.900   142.341498
68  issdm-6   with    nbench_neural_net            0.23485      False            32.941     140.263998
72  issdm-6   with    nbench_floating-point_index  0.16700      False            22.106     132.371257
64  issdm-6   with    nbench_fourier               155.53000    False            18073.000  116.202662

Verification Tests on issdm-6

Histogram of frequency of normalized value.



In [27]:

    
df_verification_issdm_6 = df_verification[df_verification['machine'] == 'issdm-6']
df_verification_issdm_6.normalized.hist(color='y', bins=150,figsize=(20,10), xlabelsize=20, ylabelsize=20)

plt.title('verification tests run on issdm-6', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[27]:





<matplotlib.text.Text at 0x7f6b9553ceb8>

Print the max and min normalized values:



In [28]:

    
print(
    df_verification_issdm_6['normalized'].max(),
    df_verification_issdm_6['normalized'].min()
)









    



142.341498012 0.0239789570966

The average normalized value is:



In [29]:

    
df_verification_issdm_6['normalized'].mean()









    Out[29]:





4.1155289595826874

If we remove all nbench tests, the frequency histogram changes to



In [30]:

    
df_verification_issdm_6_no_nbench = df_verification_issdm_6[~df_verification_issdm_6['benchmark'].str.startswith('nbench')]
df_verification_issdm_6_no_nbench.normalized.hist(color='greenyellow', bins=150,figsize=(20,10), xlabelsize=20, ylabelsize=20)

plt.title('verification tests run on issdm-6 (no nbench)', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[30]:





<matplotlib.text.Text at 0x7f6b9535da58>

The max and min normalized values change to:



In [31]:

    
print(
    df_verification_issdm_6_no_nbench['normalized'].max(),
    df_verification_issdm_6_no_nbench['normalized'].min()
)









    



29.3846153846 0.0239789570966

The average normalized value changes to:



In [32]:

    
df_verification_issdm_6_no_nbench['normalized'].mean()









    Out[32]:





1.0010764756668262

Verification Tests on t2.micro

Histogram of frequency of normalized value.



In [33]:

    
df_verification_t2_micro = df_verification[df_verification['machine'] == 't2.micro']
df_verification_t2_micro.normalized.hist(color='y', bins=150,figsize=(20,10), xlabelsize=20, ylabelsize=20)

plt.title('verification tests run on t2.micro', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[33]:





<matplotlib.text.Text at 0x7f6b9509b7b8>

The average normalized value of the verification benchmarks is:



In [34]:

    
df_verification_t2_micro['normalized'].mean()









    Out[34]:





12.371806437122357

Let's see the frequency histogram after removing the four right-most outliers.



In [35]:

    
df_verification_top_benchmarks = df_verification_rank[df_verification_rank['machine'] == 't2.micro'].head(4)['benchmark']
df_verification_t2_micro_no_outliers = df_verification_t2_micro[~df_verification_t2_micro['benchmark'].isin(df_verification_top_benchmarks)]

df_verification_t2_micro_no_outliers.normalized.hist(color='greenyellow', bins=150,figsize=(20,10), xlabelsize=20, ylabelsize=20)

plt.title('verification tests on t2.micro (no outliers)', fontsize=30)

plt.xlabel('Normalized Value (re-execution / original)', fontsize=25)
plt.ylabel('Frequency (# of benchmarks)', fontsize=25)









    Out[35]:





<matplotlib.text.Text at 0x7f6b94e24630>

Print the max and min normalized values:



In [36]:

    
print(
    df_verification_t2_micro_no_outliers['normalized'].max(),
    df_verification_t2_micro_no_outliers['normalized'].min()
)









    



54.6037771386 0.0876515124587

The average normalized value without the four outliers is:



In [37]:

    
df_verification_t2_micro_no_outliers['normalized'].mean()









    Out[37]:





4.719394416309151
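
The verification step of the plan, checking whether the verification results fall inside the "speedup range" obtained from the stress tests, can be sketched as below. The bounds are the issdm-6 range quoted in the summary, and the sample holds only the extreme issdm-6 verification values printed in this notebook, not the full data set. Mapping a ratio r below 1 to -1/r before comparing it with the negative bound is an assumption inferred from the summary values.

```python
import pandas as pd

# issdm-6 range from the stress-test summary.
LOW, HIGH = -29.394675, 54.266945

# Extreme issdm-6 verification values shown in this notebook.
extremes = pd.Series({
    "nbench_lu_decomposition": 142.341498,
    "nbench_neural_net": 140.263998,
    "nbench_floating-point_index": 132.371257,
    "nbench_fourier": 116.202662,
    "max_without_nbench": 29.3846153846,
    "min": 0.0239789570966,
})

# Assumed convention: a slowdown ratio r < 1 is compared as -1/r.
signed = extremes.map(lambda r: r if r >= 1 else -1.0 / r)

# True where the signed value lies within the closed interval [LOW, HIGH].
inside = signed.between(LOW, HIGH)

print(pd.DataFrame({"signed": signed, "inside": inside}))
```

Applied to the full verification frame, the mean of `inside` would give the share of verification benchmarks that stay within the range.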
