notebook.community

Edit and run



In [2]:

    
%matplotlib inline
import pandas as pd
import random
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns



In [3]:

    
df = pd.read_csv('all.csv')

Hypothesis: cgroups can guarantee isolation for concurrent workloads that put limits on direct IO bandwidth, assuming an over-provisioned scenario.

Workloads: read, write and randomread concurrent tasks for 4, 256 and 4096 KB.

Baseline: Run each workload without concurrency for 4KB block size and use this as the overprovisioned maximum bandwidth capacity of the device.

Let's look at the baseline first:



In [51]:

    
df.groupby('mode').maxbw.unique()









    Out[51]:





mode
randread      [299]
read        [26194]
write       [15397]
Name: maxbw, dtype: object



In [8]:

    
df.head()









    Out[8]:






  
    
      
      repetition
      blksize_kb
      limits_pct
      mode
      maxbw
      limit
      benchmark
      job
      read_bw
      write_bw
    
  
  
    
      0
      3
      256
      40,25,25
      write
      15397
      6158
      fio
      write-sdb-256k-seq
      0
      6157
    
    
      1
      3
      256
      40,25,25
      read
      26194
      6548
      fio
      read-sdb-256k-seq
      6547
      0
    
    
      2
      3
      256
      40,25,25
      randread
      299
      74
      fio
      randread-sdb-256k-seq
      74
      0
    
    
      3
      3
      256
      25,40,25
      write
      15397
      3849
      fio
      write-sdb-256k-seq
      0
      3848
    
    
      4
      3
      256
      25,40,25
      read
      26194
      10477
      fio
      read-sdb-256k-seq
      10475
      0



In [15]:

    
def normalize_wrt_limit(x):
    v = x['write_bw'] if 'write' in x['mode'] else x['read_bw']
    return v / float(x['limit'])

df['result_normalized'] = df.apply(normalize_wrt_limit, axis=1)



In [54]:

    
ax = sns.boxplot(x='mode', y='result_normalized', data=df.query("blksize_kb == 4"), hue='limits_pct')
ax.set(ylim=(0, 1.1))
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))









    Out[54]:





<matplotlib.legend.Legend at 0x7f68e644da90>



In [52]:

    
ax = sns.boxplot(x='mode', y='result_normalized', data=df.query("blksize_kb == 4096"), hue='limits_pct')
ax.set(ylim=(0, 1.1))
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))









    Out[52]:





<matplotlib.legend.Legend at 0x7f68e846e290>



In [53]:

    
ax = sns.boxplot(x='mode', y='result_normalized', data=df.query("blksize_kb == 256"), hue='limits_pct')
ax.set(ylim=(0, 1.1))
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))









    Out[53]:





<matplotlib.legend.Legend at 0x7f68e6d37890>

Our results show that when block size is distinct to 4KB, we can guarantee performance for 3 concurrent IO workloads. We then codify this validation so we can test it:



In [98]:

    
for bs in ['4', '256', '4096']:
    within_range = df.query("blksize_kb == " + bs).groupby('limits_pct')['result_normalized'].median().between(0.95, 1.05).all()
    if bs == '4':
        print(not within_range)
    else:
        print(within_range)









    



True
True
True

	repetition	blksize_kb	limits_pct	mode	maxbw	limit	benchmark	job	read_bw	write_bw
0	3	256	40,25,25	write	15397	6158	fio	write-sdb-256k-seq	0	6157
1	3	256	40,25,25	read	26194	6548	fio	read-sdb-256k-seq	6547	0
2	3	256	40,25,25	randread	299	74	fio	randread-sdb-256k-seq	74	0
3	3	256	25,40,25	write	15397	3849	fio	write-sdb-256k-seq	0	3848
4	3	256	25,40,25	read	26194	10477	fio	read-sdb-256k-seq	10475	0