Gathering system data

Goals:

- Gather system data with multiplatform and platform-dependent tools
- Get information from files, /proc and /sys
- Capture command output
- Use psutil to get I/O, CPU and memory data
- Parse files with a strategy

Non-goals for this lesson:

- use with, yield or pipes

Modules


In [ ]:
import psutil
import glob
import sys
import subprocess

In [ ]:
#
# Our code is p3-ready
#
from __future__ import print_function, unicode_literals

In [ ]:
def grep(needle, fpath):
    """A simple grep implementation

       goal: open() is iterable and doesn't
             need splitlines()
       goal: comprehension can filter lists
    """
    return [x for x in open(fpath) if needle in x]

# Do we have localhost?
print(grep("localhost", "/etc/hosts"))

In [ ]:
# The psutil module exposes system metrics through a single API
import psutil

# Works on Windows, Linux and macOS
psutil.cpu_percent()

In [ ]:
# Its results are namedtuples, so every field is accessible by name
ret = psutil.disk_io_counters()
print(ret)

In [ ]:
# Exercise: Which other information
# does psutil provide?
# Use this cell and Jupyter's tab completion.

In [ ]:
# Exercise
def multiplatform_vmstat(count):
    # Write a vmstat-like function printing every second:
    # - cpu usage%
    # - bytes read and written in the given interval
    # Hint: use psutil and time.sleep(1)
    # Hint: use this cell or try on ipython and *then* write the function
    #       using %edit vmstat.py
    for i in range(count):
        raise NotImplementedError
        print(cpu_usage, bytes_rw)

multiplatform_vmstat(5)

In [ ]:
!python -c "from solutions import multiplatform_vmstat;multiplatform_vmstat(3)"

In [ ]:
#
# subprocess
#
# The check_output function returns the command's stdout
from subprocess import check_output

# It takes a *list* of arguments!
out = check_output("ping -c5 www.google.com".split())

# and returns bytes on Python 3 (a str on Python 2)
print(out)
print(type(out))

To stream a command's output while it is still running, use subprocess.Popen and read the subprocess documentation carefully!
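
As a minimal sketch of that idea, the cell below reads a child's stdout line by line while the child is still running. The helper name `stream` and the three-line child program are made up for this illustration; only `subprocess.Popen`, `subprocess.PIPE` and `sys.executable` come from the standard library.

```python
import subprocess
import sys

def stream(argv):
    """Print each output line as soon as the child writes it."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)  # text instead of bytes
    lines = []
    for line in proc.stdout:      # blocks until the next line is available
        line = line.rstrip("\n")
        print(line)
        lines.append(line)
    proc.stdout.close()
    returncode = proc.wait()      # reap the child and get its exit status
    return returncode, lines

# Hypothetical child: the running interpreter printing three lines.
# -u disables the child's output buffering, so lines arrive one at a time.
child = [sys.executable, "-u", "-c", "for i in range(3): print('tick', i)"]
status, seen = stream(child)
```

Unlike check_output, which returns only after the command has finished, this loop handles every line as it arrives, which suits long-running commands such as ping.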


In [ ]:
def sh(cmd, shell=False, timeout=None):
    """Returns the output of a command string as a list of lines,
       checking the Python version...
    """
    from sys import version_info as python_version
    # With shell=True the shell parses the whole string itself
    args = cmd if shell else cmd.split()
    if python_version < (3, 3): # ...before using the timeout parameter
        if timeout:
            raise ValueError("Timeout not supported until Python 3.3")
        output = check_output(args, shell=shell)
    else:
        # timeout=None means "wait forever"; 0 would expire immediately
        output = check_output(args, shell=shell, timeout=timeout)
    return output.splitlines()
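
The timeout behaviour can be tried with a short sketch like the following, assuming Python 3.3 or later. The helper `run_lines` and the two child commands are hypothetical names for this illustration; `check_output` raises `subprocess.TimeoutExpired` when the command runs longer than the given number of seconds.

```python
import subprocess
import sys

def run_lines(argv, timeout=None):
    """Return the command's stdout as a list of text lines."""
    out = subprocess.check_output(argv, timeout=timeout)
    return out.decode().splitlines()

# Two hypothetical commands: one ends at once, one sleeps for 5 seconds
quick = [sys.executable, "-c", "print('a'); print('b')"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

print(run_lines(quick))

try:
    run_lines(slow, timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired:
    # check_output kills the child before re-raising the exception
    timed_out = True
print(timed_out)
```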

In [ ]:
# Exercise:
# implement a multiplatform pgrep-like function.
def pgrep(program):
    """
    A multiplatform pgrep-like function.
    Prints a list of processes executing 'program'
    @param program - eg firefox, explorer.exe
    
    Hint: use subprocess, os and list-comprehension
    eg. items = [x for x in a_list if 'firefox' in x] 
    """
    raise NotImplementedError
pgrep('firefox')

In [ ]:
from solutions import pgrep as sol_pgrep
sol_pgrep("firefox")

Parsing /proc

The Linux /proc filesystem exposes kernel and process data as plain text files.

In the next examples we'll see how to get:

  • thread information;
  • disk statistics.

In [ ]:
# Parsing /proc - 1
def linux_threads(pid):
    """Retrieving data from /proc
    """
    from glob import glob
    # glob emulates shell expansion of * and ?
    # Change the base path to /proc if you run on a Linux machine
    path = "proc/{}/task/*/status".format(pid)

    # pick a set of fields to gather
    fields = ('Pid', 'Tgid', 'voluntary')  # this is a tuple!
    for t in glob(path):
        # ... and use comprehension to get
        # interesting data. The result gets its own name: reusing
        # `fields` would break startswith() from the second thread on.
        t_info = [x
                  for x in open(t)
                  if x.startswith(fields)] # startswith accepts tuples!
        print(t_info)

In [ ]:
# If you're on linux try linux_threads
pid_of_init = 1 # or systemd ?
linux_threads(pid_of_init)
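
The status files read above are made of `Key:<tab>value` lines, so one possible parsing strategy is to split each line on the first colon and collect the pairs in a dict. This is only a sketch: `parse_status` is a hypothetical helper and `SAMPLE_STATUS` holds made-up values in the same layout, so the cell also runs outside Linux.

```python
# Made-up sample in the layout of /proc/<pid>/status
SAMPLE_STATUS = (
    "Name:\tsystemd\n"
    "Tgid:\t1\n"
    "Pid:\t1\n"
    "voluntary_ctxt_switches:\t4321\n"
)

def parse_status(lines):
    """Turn 'Key:<tab>value' lines into a dict of strings."""
    # split on the first colon only: values may contain colons too
    pairs = [line.split(":", 1) for line in lines if ":" in line]
    return dict((key.strip(), value.strip()) for key, value in pairs)

info = parse_status(SAMPLE_STATUS.splitlines())
print(info["Pid"], info["voluntary_ctxt_switches"])
```

On a Linux machine the same function should accept an open file, for example parse_status(open("/proc/1/status")), because open() is iterable line by line.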

In [ ]:
# On Linux, /proc/diskstats is the source of I/O information
disk_l = grep("vda1", "proc/diskstats")
print(''.join(disk_l))

In [ ]:
# To gather that data we put the header in a multiline string
from solutions import diskstats_headers as headers
print(*headers, sep='\n')

In [ ]:
# Take the 1st entry (vda1), split the data...
disk_info = disk_l[0].split()
# ... and tie them with the header
ret = list(zip(headers, disk_info))

# On py3 zip returns a one-shot iterator: storing it
# as a list lets us reuse ret in the next cells
print(ret)

In [ ]:
# Try to mangle ret
print('\n'.join(str(x) for x in ret))
# Exercise: transform ret into a dict.

In [ ]:
# We can create a reusable convenience class with
from collections import namedtuple

# using the imported `headers` as attributes
# like the ones provided by psutil
DiskStats = namedtuple('DiskStats', headers)

# ... and disk_info as values
dstat = DiskStats(*disk_info)
print(dstat.device, dstat.writes_ms)
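
The namedtuple above stores every value as a string. A possible refinement, sketched below, converts the counters to integers so they can be summed or compared. The `HEADERS` list, the `parse_diskstats` helper and the sample line are assumptions made for this illustration: the real list lives in `solutions.diskstats_headers`, and the numbers are invented.

```python
from collections import namedtuple

# Assumed column names: major, minor, device, then the classic 11 counters
HEADERS = ('major minor device'
           ' reads reads_merged reads_sectors reads_ms'
           ' writes writes_merged writes_sectors writes_ms'
           ' io_in_progress io_ms io_ms_weight').split()
DiskStats = namedtuple('DiskStats', HEADERS)

def parse_diskstats(line):
    """Build a DiskStats from one /proc/diskstats line."""
    # newer kernels append extra columns: keep only the ones we named
    fields = line.split()[:len(HEADERS)]
    # the device name stays a string, every other field becomes an int
    values = [f if name == 'device' else int(f)
              for name, f in zip(HEADERS, fields)]
    return DiskStats(*values)

# Made-up line with four extra trailing columns
SAMPLE = " 253       1 vda1 100 2 3000 40 50 6 7000 80 0 90 120 5 0 0 0"
dstat = parse_diskstats(SAMPLE)
print(dstat.device, dstat.writes_ms, dstat.reads + dstat.writes)
```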

In [ ]:
# Exercise
# Write the following function 
def linux_diskstats(partition):
    """Print every second I/O information from /proc/diskstats
    
        @param: partition - eg sda1 or vdx1
        
        Hint: use the above `grep` function
        Hint: use zip, time.sleep, print() and *magic
    """
    diskstats_headers = ('reads reads_merged reads_sectors reads_ms'
            ' writes writes_merged writes_sectors writes_ms'
            ' io_in_progress io_ms_weight').split()
    
    while True:
        raise NotImplementedError
        print(values, sep="\t")

In [ ]:
!python -c "from solutions import linux_diskstats;linux_diskstats('vda1')"

In [ ]:
# Using check_output with split() doesn't always work:
# str.split() breaks the quoted path at every blank
from os import makedirs
makedirs('/tmp/course/b l a n k s')  # on py3 add exist_ok=True to allow re-runs

check_output('ls "/tmp/course/b l a n k s"'.split())

In [ ]:
# shlex.split honours shell quoting, so use
from shlex import split
# to build the argument list
cmd = split('ls -a "/tmp/course/b l a n k s"')
check_output(cmd)
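
To see why the two approaches differ, the sketch below prints both argument lists without running any command. The variable names are chosen for this illustration.

```python
import shlex

command = 'ls -a "/tmp/course/b l a n k s"'

# str.split() cuts at every blank and keeps the quote characters
naive = command.split()
# shlex.split() follows POSIX shell quoting rules
quoted = shlex.split(command)

print(len(naive), naive)
print(len(quoted), quoted)
```

The naive list has eight items, with the path scattered across six of them, while the shlex list has three and keeps the path as a single argument.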