A Jupyter notebook for helping [me] make sense of sysstat sar data.
Copyright (c) 2017 Brendon Caligari, London, UK
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Strategy:
.sa files as XML
As sysstat evolved, so did the file format used to store the sar data, and unfortunately there is no guarantee that a version of sadf can read .sa files created with an earlier version of sa. Thankfully, both the sar data file format version and the sysstat version used can be determined from the file header.
sadf can export .sa files into various formats. XML output has been available consistently since [at least] sysstat version 8.1.5 and proved to be the most unambiguous and complete format for this script's objectives.
For every .sa file format version, an sadf binary capable of exporting its contents to XML is required. This mapping from file version to binary is maintained in the sa_exporters dict class variable of class SarData.
Which .sa files to analyse, and the directory where they are found, are set in the sa_files list and the sa_directory variable respectively in the Importing sa files section. For a meaningful time series it is sensible to use consecutive daily sar files from the same host.
The Sanity checks section outputs metadata collected from the input files and describes the data frames used to store the collected information. The Plots section provides templates for frequently required plots. The idea, however, is for those plots to be modified or added to as required.
Tested and assumed to work with these sysstat versions:
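The header check described above can be sketched on its own. This is a hedged illustration, not part of the notebook proper: the byte layout (two bytes of magic, stored little-endian as `96d5` on disk, followed by two bytes of file format version) matches what `SarData.sa_get_version` reads; any path passed in is hypothetical.

```python
import struct

def sa_header(path):
    """Return (magic, format_version) of a .sa file as little-endian hex strings."""
    with open(path, "rb") as f:
        # "<HH": two little-endian unsigned 16-bit fields
        magic, fmt = struct.unpack("<HH", f.read(4))
    return "{:04x}".format(magic), "{:04x}".format(fmt)
```

For example, a SLES 11 SP3 era file would come back as `("d596", "2170")`, matching the keys of `sa_exporters`.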
In [1]:
sa_directory = './z1'
sa_files = ["sa05", "sa06", "sa07", "sa08", "sa09", "sa10", "sa11", "sa12", "sa13"]
In [2]:
import xml.etree.ElementTree as ET
import pandas as pd
import subprocess
from datetime import datetime
import matplotlib
%matplotlib inline
In [3]:
class SarData(object):
    """Class for importing sar data"""

    sa_file_magic = "96d5"
    sa_exporters = {
        "2170": ["./sadf.binaries/sles11sp3/usr/bin/sadf", "-x", "-t", "--", "-A"],
        "2173": ["/usr/bin/sadf", "-x", "-t", "--", "-A"]
    }

    @classmethod
    def sa_get_version(cls, filename):
        """Check an sa file for sysstat sa magic and return the file format version"""
        with open(filename, mode="rb") as f:
            magic = f.read(2)
            if magic != bytes.fromhex(cls.sa_file_magic):
                raise TypeError("{} does not start with sa file magic. Got 0x{:02x}{:02x}".format(
                    filename, magic[1], magic[0]))
            format_version = f.read(2)
            return "{:02x}{:02x}".format(format_version[1], format_version[0])

    @classmethod
    def sa_to_xml(cls, filename):
        """Open an sa file and export it to XML"""
        sa_ver = cls.sa_get_version(filename)
        if sa_ver not in cls.sa_exporters:
            raise NotImplementedError("No exporter for {} format version {}".format(filename, sa_ver))
        return subprocess.check_output(cls.sa_exporters[sa_ver] + [filename])

    def __init__(self):
        self.hostname = None          # hostname within sa file
        self.aggregate0 = None        # aggregate of what we pull out of the xml
        self.aggregate1 = None
        self.hostmeta = None
        self._raw_aggregate0 = list()
        self._raw_metadata = list()

    def import_file(self, filename):
        """Import an sa file"""
        # XML output from sadf seems to have developed 'evolutionarily' with namespaces
        # thrown in at some point for added inconvenience.  Here we strip them out.
        raw_xml = self.sa_to_xml(filename)    # Read in the xml output of sadf
        xml_tree = ET.fromstring(raw_xml)     # Convert to ET elements
        __class__._strip_xml_ns(xml_tree)     # Strip XML namespaces in place
        if xml_tree.tag != "sysstat":
            raise TypeError("Expected 'sysstat' but found a root of {}".format(xml_tree.tag))
        for level1 in xml_tree:
            if level1.tag == 'sysdata-version':
                pass
            elif level1.tag == "host":
                tmp_hostmeta = {'filename': filename}
                if "nodename" in level1.attrib:
                    tmp_hostmeta['nodename'] = level1.attrib["nodename"]
                for host_child in level1:
                    if host_child.tag == "statistics":
                        for timestamp in host_child:
                            if timestamp.tag != "timestamp":
                                print("Unexpected statistic tag: {}".format(timestamp.tag))
                                continue
                            datum_time = datetime.strptime(timestamp.attrib["date"] +
                                                           "T" + timestamp.attrib["time"],
                                                           "%Y-%m-%dT%H:%M:%S")
                            tmp_datum0 = {'timestamp': datum_time}
                            for metric in timestamp:
                                # Simple key value statistics for regular polls will all go
                                # into a single dict 'aggregate0'
                                aggregate0_parser = self.get_aggregate0_parser(metric.tag)
                                if aggregate0_parser:
                                    tmp_datum0.update(aggregate0_parser(metric))
                            self._raw_aggregate0.append(tmp_datum0)
                    else:
                        hostmeta_parser = self.get_hostmeta_parser(host_child.tag)
                        if hostmeta_parser:
                            tmp_hostmeta[host_child.tag] = hostmeta_parser(host_child)
                        else:
                            print("Unexpected host tag: {}".format(host_child.tag))
                self._raw_metadata.append(tmp_hostmeta)
            else:
                print("Unknown level 1: {}".format(level1.tag))

    def dicts_to_dataframes(self):
        """Convert the various temporary lists of dicts to Data Frames"""
        self.aggregate0 = pd.DataFrame.from_dict(self._raw_aggregate0)
        self.hostmeta = pd.DataFrame.from_dict(self._raw_metadata)

    def fix_numeric_columns(self):
        """Convert necessary Data Frame columns from str to numeric"""
        for sa_column in self.aggregate0.columns:
            if sa_column == 'timestamp':
                continue
            self.aggregate0[sa_column] = pd.to_numeric(self.aggregate0[sa_column],
                                                       errors='coerce')

    @classmethod
    def _strip_xml_ns(cls, element):
        element.tag = element.tag.split('}')[-1]
        for child in element:
            cls._strip_xml_ns(child)

    @classmethod
    def get_hostmeta_parser(cls, meta_tag):
        """Return a parser for elements under 'host'"""
        ## we can comment out any problematic ones at runtime
        meta_parsers = {
            'sysname': cls._parse_meta_default,
            'release': cls._parse_meta_default,
            'comments': cls._parse_meta_default,
            'restarts': cls._parse_meta_default,
            'machine': cls._parse_meta_default,
            'number-of-cpus': cls._parse_meta_default,
            'file-date': cls._parse_meta_default,
            'file-utc-time': cls._parse_meta_default
        }
        if meta_tag in meta_parsers:
            return meta_parsers[meta_tag]
        return None

    @classmethod
    def get_aggregate0_parser(cls, metric_tag):
        """Return a parser for elements under 'host->statistics'"""
        ## we can comment out any problematic ones at runtime
        metric_parsers = {
            'queue': cls._parse_absorb_level,
            'memory': cls._parse_absorb_level,
            'process-and-context-switch': cls._parse_absorb_level,
            'hugepages': cls._parse_absorb_level,
            'paging': cls._parse_absorb_level,
            'io': cls._parse_absorb_level,
            'swap-pages': cls._parse_absorb_level,
            'kernel': cls._parse_absorb_level
        }
        if metric_tag in metric_parsers:
            return metric_parsers[metric_tag]
        return None

    @staticmethod
    def _parse_meta_default(element):
        """Return the text field of a sar metadata element"""
        return element.text.strip()

    @staticmethod
    def _parse_absorb_level(element):
        """Return a dict of flattened attributes and element tags with some cleanup"""
        tmp_dict = dict()
        for child in element:
            if child.text:
                tmp_dict[child.tag] = child.text
            else:
                tmp_dict.update(child.attrib.copy())
        tmp_dict.update(element.attrib.copy())
        for undesired in ['per', 'unit']:
            if undesired in tmp_dict:
                tmp_dict.pop(undesired)
        return tmp_dict

    def plot_simple_aggregate0(self, y_vars):
        """Simple timeseries from self.aggregate0 columns"""
        try:
            self.aggregate0.plot(x='timestamp',
                                 y=y_vars,
                                 style='-',
                                 figsize=(15, 10))
        except KeyError as err:
            print("KeyError: {}".format(err))
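As a quick, self-contained illustration of the flattening that `_parse_absorb_level` performs, here is the same logic run against made-up XML mirroring the two shapes sadf emits: attribute-style elements (such as `queue`) and child-element-style ones (such as `kernel`). The sample values are invented.

```python
import xml.etree.ElementTree as ET

queue = ET.fromstring('<queue runq-sz="2" plist-sz="210" ldavg-1="0.15"/>')
kernel = ET.fromstring(
    '<kernel><dentunusd>1000</dentunusd><file-nr per="second">512</file-nr></kernel>')

def absorb(element):
    """Flatten child text and attributes into one dict, dropping unit noise."""
    flat = dict()
    for child in element:
        if child.text:
            flat[child.tag] = child.text
        else:
            flat.update(child.attrib)
    flat.update(element.attrib)
    for undesired in ('per', 'unit'):
        flat.pop(undesired, None)
    return flat

print(absorb(queue))   # {'runq-sz': '2', 'plist-sz': '210', 'ldavg-1': '0.15'}
print(absorb(kernel))  # {'dentunusd': '1000', 'file-nr': '512'}
```

Each flattened dict becomes one set of columns in a row of aggregate0, keyed by timestamp.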
Importing sa files
Here we extract the various entries within the sar file and populate the appropriate data frames. The process is deliberately iterative and verbose, to expose the various fields and make on-the-fly modification or manual reproduction of steps easier. It does, however, need to be heavily refactored regardless.
In [4]:
sa = SarData()
for sa_file in map(lambda sa_file: "{}/{}".format(sa_directory, sa_file), sa_files):
    sa.import_file(sa_file)
sa.dicts_to_dataframes()
sa.fix_numeric_columns()
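The dicts_to_dataframes and fix_numeric_columns steps amount to the following sketch (toy data, not from a real sa file): sadf emits every value as a string, so everything except the timestamp is coerced to numeric, with unparseable values becoming NaN.

```python
import pandas as pd
from datetime import datetime

# Toy list of dicts standing in for SarData._raw_aggregate0
raw = [
    {'timestamp': datetime(2017, 1, 5, 0, 10), 'ldavg-1': '0.15', 'runq-sz': '2'},
    {'timestamp': datetime(2017, 1, 5, 0, 20), 'ldavg-1': '0.30', 'runq-sz': '3'},
]
df = pd.DataFrame(raw)
for col in df.columns:
    if col == 'timestamp':
        continue
    # Coerce str columns to numbers; failures become NaN rather than errors
    df[col] = pd.to_numeric(df[col], errors='coerce')
print(df.dtypes)
```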
In [5]:
sa.hostmeta
Out[5]:
In [6]:
sa.aggregate0.dtypes
Out[6]:
ldavg-1 - System load average over past 1 minute
ldavg-5 - System load average over past 5 minutes
ldavg-15 - System load average over past 15 minutes
From uptime(1):
System load averages is the average number of processes that are either
in a runnable or uninterruptable state. A process in a runnable state
is either using the CPU or waiting to use the CPU. A process in unin‐
terruptable state is waiting for some I/O access, eg waiting for disk.
The averages are taken over the three time intervals. Load averages
are not normalized for the number of CPUs in a system, so a load aver‐
age of 1 means a single CPU system is loaded all the time while on a 4
CPU system it means it was idle 75% of the time.
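Since load averages are not normalised for CPU count, dividing by the number of CPUs (available via the number-of-cpus entry in sa.hostmeta) gives a quick saturation check. A toy sketch with made-up values:

```python
# Toy values; in this notebook ncpu would come from sa.hostmeta's
# 'number-of-cpus' column rather than being hard-coded.
ncpu = 4
ldavg_1 = [4.0, 2.0, 1.0]

# A per-CPU value above 1.0 means more runnable tasks than CPUs on average
per_cpu = [load / ncpu for load in ldavg_1]
print(per_cpu)  # [1.0, 0.5, 0.25]
```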
In [7]:
sa.plot_simple_aggregate0(['ldavg-1', 'ldavg-5', 'ldavg-15'])
runq-sz - number of tasks waiting to run
plist-sz - number of tasks in process list
In [8]:
sa.plot_simple_aggregate0(['runq-sz', 'plist-sz'])
In [9]:
sa.plot_simple_aggregate0(['memfree', 'memused', 'buffers', 'cached', 'swpused'])
memused-percent - percentage memory used
swpused-percent - percentage swap used
In [10]:
sa.plot_simple_aggregate0(['memused-percent', 'swpused-percent'])
In [11]:
sa.plot_simple_aggregate0(['pswpin', 'pswpout'])
In [12]:
sa.plot_simple_aggregate0(['proc'])
cswch - context switches per second
In [13]:
sa.plot_simple_aggregate0(['cswch'])
In [14]:
sa.plot_simple_aggregate0(['hugused', 'hugfree'])
hugused-percent - percentage of hugepages used
In [15]:
sa.plot_simple_aggregate0(['hugused-percent'])
In [16]:
sa.plot_simple_aggregate0(['tps'])
rtps - read requests per second issued to physical devices
wtps - write requests per second issued to physical devices
In [17]:
sa.plot_simple_aggregate0(['rtps', 'wtps'])
bread - blocks read per second from devices
bwrtn - blocks written per second to devices
In [18]:
sa.plot_simple_aggregate0(['bread', 'bwrtn'])
In [19]:
sa.plot_simple_aggregate0(['pgpgin', 'pgpgout', 'fault', 'majflt', 'pgfree', 'pgscank', 'pgscand', 'pgsteal'])
vmeff-percent - page reclaim efficiency, pgsteal as a percentage of pages scanned
In [20]:
sa.plot_simple_aggregate0(['vmeff-percent'])
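Per sar(1), %vmeff is the page reclaim efficiency: pages stolen divided by pages scanned. A toy calculation with made-up per-second values:

```python
# Made-up per-second figures for illustration only
pgscank, pgscand, pgsteal = 200.0, 50.0, 225.0

# %vmeff = pgsteal / (pages scanned by kswapd + pages scanned directly)
vmeff_percent = 100.0 * pgsteal / (pgscank + pgscand)
print(vmeff_percent)  # 90.0
```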
In [21]:
sa.plot_simple_aggregate0(['dentunusd'])
file-nr - file handles in use
In [22]:
sa.plot_simple_aggregate0(['file-nr'])
inode-nr - inode handles in use
In [23]:
sa.plot_simple_aggregate0(['inode-nr'])
pty-nr - pseudo-terminals in use
In [24]:
sa.plot_simple_aggregate0(['pty-nr'])
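When several consecutive days of data make these plots noisy, downsampling aggregate0 before plotting can help. A sketch with a toy frame standing in for sa.aggregate0:

```python
import pandas as pd

# Toy frame standing in for sa.aggregate0 (two hours at 10-minute polls)
df = pd.DataFrame({
    'timestamp': pd.date_range('2017-01-05', periods=12, freq='10min'),
    'cswch': range(12),
})

# Index by timestamp, then take hourly means to smooth the series
hourly = df.set_index('timestamp').resample('1h').mean()
print(hourly['cswch'])
```

The resampled frame can then be fed to the same .plot() call used by plot_simple_aggregate0.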