• import cPickle as pickle, the faster C implementation of the same pickle interface.
  • import job_stats from the tacc_stats monitor directory

In [54]:
import sys
sys.path.append('../../monitor')
import job_stats
import cPickle as pickle
  • Load a single job's file from my directory of them on Ranger
  • This is a local untar of John's nightly files

In [38]:
j=pickle.load(open('nightly_jobs/2012-10-30/2887373','rb'))
  • Let's see what's actually in this object.

In [39]:
dir(j)
  • The hosts data structure contains most of the actual data
  • But we'll come back to a few others

In [40]:
j.hosts
  • hosts is a dictionary mapping each hostname to a further per-host object.
  • Let's pick one and have a look around

In [41]:
h='i101-111.ranger.tacc.utexas.edu'
dir(j.hosts[h])
  • The stats dictionary contains all of the data for this host

In [42]:
j.hosts[h].stats.keys()
  • Each key here corresponds (more or less) to the lines from the raw data
  • The value for each key is a dictionary containing key/value pairs that point to numpy arrays of the actual data for this type.
  • In the case of amd64_core, there's a dictionary entry for each core (strings '0' through '15' on Ranger)
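The nesting described above can be pictured with a small stand-in built from plain dicts and numpy arrays (the shapes and sample counts here are illustrative, not taken from a real job):

```python
import numpy

# Illustrative stand-in for j.hosts[h].stats: a dict keyed by stats
# type, whose values are dicts keyed by device (core number as a
# string on Ranger), whose values are 2-D numpy arrays of shape
# (num_time_samples, num_counters).
stats = {
    'amd64_core': {
        str(core): numpy.zeros((10, 3)) for core in range(16)
    },
}

print(len(stats['amd64_core']))        # 16 cores, keyed '0' through '15'
print(stats['amd64_core']['0'].shape)  # (10, 3): 10 samples, 3 counters
```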

In [43]:
j.hosts[h].stats['amd64_core']['0']
  • These three columns are the three per-core counters we collect: SSE_FLOPS, Data Cache Sys Fills, and User Cycles
  • There is one row for each time sample
  • How do we know which column corresponds to which counter?

  • The get_schema method returns the schema for the data under a given top-level stats key

In [45]:
j.get_schema('amd64_core')
  • The keys of this dictionary tell you which counters are available
  • Each entry also carries additional information, such as the unit of measure and whether the value is an event counter or not
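The schema lookup pattern can be sketched with a hypothetical stand-in (the entry names, units, and flags below are made up for illustration; in the real object the entries come from j.get_schema('amd64_core'), and each exposes an index attribute as used further down):

```python
import collections

# Hypothetical stand-in for schema entries; the real ones also expose
# an index attribute giving the counter's column in the stats array.
Entry = collections.namedtuple('Entry', ['index', 'unit', 'is_event'])

schema = {
    'SSE_FLOPS': Entry(index=0, unit=None, is_event=True),
    'DCSF':      Entry(index=1, unit=None, is_event=True),
    'USER':      Entry(index=2, unit='cycles', is_event=True),
}

# Map each counter name to the column it occupies in the stats array.
for name in sorted(schema):
    print(name, schema[name].index)
```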

  • To get the column index into the host stats array, you can do:

In [46]:
index=j.get_schema('amd64_core')['SSE_FLOPS'].index
print index
  • So now we can do:

In [52]:
v=j.hosts[h].stats['amd64_core']['0'][:,index]
print v
  • Which is a 1-D numpy array of values for SSE_FLOPS for core 0 on host i101-111 on Ranger

  • These values are time-aligned and robustified in various ways (accounting for counter roll-over, etc.).
  • They count raw SSE floating point operations as measured by the hardware counters.
  • They are more useful than the raw data, but typically what we want is rates:

In [53]:
import numpy
r = numpy.diff(v)/numpy.diff(j.times)
print r

In [37]:
%pylab inline
from pylab import *
t=j.times-j.times[0]
ax=subplot(111)
plot(t[0:-1]/3600.,r)
  • There is a simpler interface to the data
  • But you still need to look at the schema to know what's there

In [61]:
help(j.hosts[h].get_stats)

In [59]:
j.hosts[h].get_stats('amd64_core','0','SSE_FLOPS')
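The same rate calculation from above applies to the 1-D array that get_stats returns. A self-contained sketch with synthetic data (the counter values and timestamps here are made up, standing in for the get_stats result and j.times):

```python
import numpy

# Made-up cumulative counter samples and their timestamps in seconds,
# standing in for get_stats(...) output and j.times respectively.
v = numpy.array([0.0, 600.0, 1800.0, 3600.0])
times = numpy.array([0.0, 10.0, 20.0, 30.0])

# Rate over each interval: change in counts over change in time.
r = numpy.diff(v) / numpy.diff(times)
print(r)  # rates of 60, 120, 180 ops/sec over the three intervals
```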