Collecting metrics for disk usage

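sar (System Activity Reporter, part of the sysstat package) collects usage metrics for the resources of a system, such as CPU, memory, network and disks.

Here I present a way to parse the disk usage report, as produced by sar -d and saved to a text file, so that the data can be plotted and inspected more easily.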

Parsing the file


In [45]:
with open('metrics/disk.txt', 'r') as f:
    metrics = f.readlines()

All the collected txt files start with this line:


In [46]:
print metrics[0]


Linux 2.6.32-358.2.1.el6.x86_64 (n020303) 	2013-05-30 	_x86_64_	(24 CPU)

The third line of the file holds the column labels:


In [47]:
print metrics[2]


00:00:01          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util

DEV indicates the device the metrics were taken from, so each timestamp has one line per device:


In [49]:
for l in metrics[3:21]:
    print l


00:10:01      dev8-16     20,81      0,67   4955,84    238,22      0,07      3,43      2,71      5,63

00:10:01       dev8-0     21,25      1,88   4954,21    233,22      0,07      3,38      2,66      5,66

00:10:01       dev9-0      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

00:10:01       dev9-1      0,86      0,87      6,59      8,67      0,00      0,00      0,00      0,00

00:10:01       dev9-3     94,25      0,65   9892,30    104,97      0,00      0,00      0,00      0,00

00:10:01       dev9-2      0,12      0,93      0,00      8,00      0,00      0,00      0,00      0,00

00:20:01      dev8-16     22,71      0,01   5410,96    238,31      0,08      3,32      2,62      5,95

00:20:01       dev8-0     22,80      0,00   5410,07    237,25      0,08      3,29      2,63      6,01

00:20:01       dev9-0      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

00:20:01       dev9-1      0,72      0,00      5,77      8,00      0,00      0,00      0,00      0,00

00:20:01       dev9-3    102,60      0,00  10805,60    105,31      0,00      0,00      0,00      0,00

00:20:01       dev9-2      0,00      0,01      0,00      8,00      0,00      0,00      0,00      0,00

00:30:01      dev8-16     21,96      0,00   5382,34    245,09      0,08      3,66      2,72      5,97

00:30:01       dev8-0     22,10      0,01   5383,14    243,56      0,08      3,61      2,73      6,03

00:30:01       dev9-0      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

00:30:01       dev9-1      0,75      0,00      5,95      7,98      0,00      0,00      0,00      0,00

00:30:01       dev9-3    101,40      0,01  10749,62    106,01      0,00      0,00      0,00      0,00

00:30:01       dev9-2      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
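To make the column layout concrete, here is a minimal sketch that splits one of the rows above into its fields and pairs each value with its label. It is written for Python 3, and the helper name parse_row is hypothetical, not something the notebook defines. The decimal commas presumably come from the locale sar ran under, so they have to be replaced before float() will accept the values.

```python
# Header and one data row copied from the output above.
header = ("00:00:01          DEV       tps  rd_sec/s  wr_sec/s  "
          "avgrq-sz  avgqu-sz     await     svctm     %util")
row = ("00:10:01      dev8-16     20,81      0,67   4955,84    "
       "238,22      0,07      3,43      2,71      5,63")


def parse_row(line, metric_names):
    """Hypothetical helper: return (time string, device, {metric: value})."""
    fields = line.split()
    time_str, dev = fields[0], fields[1]
    # Decimal commas must become points before float() accepts them.
    values = [float(v.replace(',', '.')) for v in fields[2:]]
    return time_str, dev, dict(zip(metric_names, values))


metric_names = header.split()[2:]  # skip the timestamp and the DEV label
time_str, dev, sample = parse_row(row, metric_names)
print(time_str, dev, sample['tps'], sample['%util'])
# prints: 00:10:01 dev8-16 20.81 5.63
```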

We can save the data in a dictionary whose keys are the device names and whose values are lists of (time, metrics) tuples, one tuple per sample:


In [245]:
labels = metrics[2].split()[1:]
y, m, d = metrics[0].split()[3].split('-')
print labels, y, m, d


['DEV', 'tps', 'rd_sec/s', 'wr_sec/s', 'avgrq-sz', 'avgqu-sz', 'await', 'svctm', '%util'] 2013 05 30
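As an alternative to splitting the date by hand, datetime.datetime.strptime can parse the file's date together with a row's time of day in one step, and it raises an error if the format is not what we expect. The sketch below (Python 3) assumes the header date is in YYYY-MM-DD form, as in the file shown here; other sysstat configurations may print the date differently.

```python
import datetime

# First line of the file, as printed above.
first_line = ("Linux 2.6.32-358.2.1.el6.x86_64 (n020303) \t2013-05-30 "
              "\t_x86_64_\t(24 CPU)")

date_str = first_line.split()[3]  # the fourth whitespace-separated field
# Combine the file's date with the time of day taken from a data row.
stamp = datetime.datetime.strptime(date_str + ' ' + '00:10:01',
                                   '%Y-%m-%d %H:%M:%S')
print(stamp)
# prints: 2013-05-30 00:10:01
```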

In [246]:
import datetime

devs = {}
for l in metrics[3:]:
    metrics_line = l.split()
    dev = metrics_line[1]
    t = metrics_line[0]
    # Convert the timestamp to a Python datetime object, using the date
    # read from the first line of the file
    hh, mm, ss = [int(x) for x in t.split(':')]
    t = datetime.datetime(int(y), int(m), int(d), hh, mm, ss)
    # Convert the comma-separated decimals to floats
    data = [float(v.replace(',', '.')) for v in metrics_line[2:]]
    # And then store the sample under its device
    devs.setdefault(dev, []).append((t, data))
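The loop above assumes that every line after the header is a data row. sar reports can also contain blank lines, repeated header rows and a trailing Average: block; if your file has them, a guarded version along these lines may help. This is only a sketch in Python 3: parse_disk_report is a hypothetical name, and the Average: row in the sample is invented purely to exercise the guard.

```python
import datetime


def parse_disk_report(lines):
    """Hypothetical helper: return (labels, {device: [(datetime, [floats])]})."""
    y, m, d = (int(x) for x in lines[0].split()[3].split('-'))
    labels = lines[2].split()[1:]
    devs = {}
    for line in lines[3:]:
        fields = line.split()
        # Skip blank or short lines and repeated header rows.
        if len(fields) != len(labels) + 1 or fields[1] == 'DEV':
            continue
        try:
            hh, mm, ss = (int(x) for x in fields[0].split(':'))
        except ValueError:
            continue  # e.g. an 'Average:' summary row
        stamp = datetime.datetime(y, m, d, hh, mm, ss)
        values = [float(v.replace(',', '.')) for v in fields[2:]]
        devs.setdefault(fields[1], []).append((stamp, values))
    return labels, devs


sample = [
    "Linux 2.6.32-358.2.1.el6.x86_64 (n020303) \t2013-05-30 \t_x86_64_\t(24 CPU)\n",
    "\n",
    "00:00:01 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util\n",
    "00:10:01 dev8-16 20,81 0,67 4955,84 238,22 0,07 3,43 2,71 5,63\n",
    "00:10:01 dev8-0 21,25 1,88 4954,21 233,22 0,07 3,38 2,66 5,66\n",
    "\n",
    # Invented summary row, only here to show that it is skipped.
    "Average: dev8-16 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00\n",
]
labels, devs = parse_disk_report(sample)
print(sorted(devs))
# prints: ['dev8-0', 'dev8-16']
```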

In [248]:
#These are our devices
devs.keys()


Out[248]:
['dev8-0', 'dev9-1', 'dev9-3', 'dev9-2', 'dev8-16', 'dev9-0']
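Each stored sample is a list of floats in the same order as labels[1:], so the metric named labels[i + 1] sits at index i. A small sketch of pulling out one time series by name follows (Python 3). The function metric_series is illustrative, and the two-sample devs literal just copies values for dev8-0 from the output above so that the block runs on its own.

```python
import datetime

labels = ['DEV', 'tps', 'rd_sec/s', 'wr_sec/s', 'avgrq-sz',
          'avgqu-sz', 'await', 'svctm', '%util']
devs = {
    'dev8-0': [
        (datetime.datetime(2013, 5, 30, 0, 10, 1),
         [21.25, 1.88, 4954.21, 233.22, 0.07, 3.38, 2.66, 5.66]),
        (datetime.datetime(2013, 5, 30, 0, 20, 1),
         [22.80, 0.00, 5410.07, 237.25, 0.08, 3.29, 2.63, 6.01]),
    ],
}


def metric_series(devs, labels, dev, metric):
    """Illustrative helper: return (dates, values) for one device and metric."""
    # Search labels[1:] so the index matches the stored value lists;
    # asking for 'DEV' raises ValueError instead of returning wrong data.
    i = labels[1:].index(metric)
    dates = [t for t, v in devs[dev]]
    values = [v[i] for t, v in devs[dev]]
    return dates, values


dates, values = metric_series(devs, labels, 'dev8-0', '%util')
print(values)
# prints: [5.66, 6.01]
```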

Plotting the data

We can now plot all the metrics from all the devices. I've decided to make one plot per metric and device to get a more detailed view:


In [271]:
import matplotlib.pyplot as plt

for dev in devs:
    # labels[0] is DEV, so metric i is named labels[1 + i]
    for i, label in enumerate(labels[1:]):
        dates = [t for t, v in devs[dev]]
        values = [v[i] for t, v in devs[dev]]
        fig = plt.figure()
        axes = fig.add_subplot(111)
        axes.grid(True)
        axes.set_title(str(dev))
        axes.set_xlabel("Time")
        axes.set_ylabel(label)
        axes.plot(dates, values, linestyle='-')