Creating a Trace Object


In [1]:
%matplotlib inline 

import trappy
from matplotlib import pyplot as plt

trace = trappy.FTrace("./trace_stats.dat")



View the trace


In [2]:
# Execute to view it
trappy.plotter.plot_trace(trace)


What is a Trigger?

A trigger is the combination of the following:

* A TRAPpy event
* A pivot for the event
* A set of filters
* A value

Introduction to Triggers

  • The example below explains how to create a trigger for the event when a particular process is switched in or out.

  • This uses the trappy.sched.SchedSwitch event in the trace object. This event looks for the unique_word sched_switch in the trace file.

  • Here is a sample line in the text trace that corresponds to the SchedSwitch event.

sched_switch: prev_comm=trace-cmd prev_pid=4731 prev_prio=120 prev_state=1 next_comm=trace-cmd next_pid=4730 next_prio=120

We can see that this event is populated in the above trace object as:


In [3]:
trace.sched_switch.data_frame.head()


Out[3]:
__comm __cpu __pid next_comm next_pid next_prio prev_comm prev_pid prev_prio prev_state
Time
0.000000 ls 2 4734 migration/2 18 0 trace-cmd 4734 120 1024
0.000022 migration/2 2 18 trace-cmd 4732 120 migration/2 18 0 1
0.000107 trace-cmd 1 4731 trace-cmd 4730 120 trace-cmd 4731 120 1
0.000127 trace-cmd 1 4730 trace-cmd 4729 120 trace-cmd 4730 120 1
0.000142 trace-cmd 1 4729 swapper/1 0 120 trace-cmd 4729 120 1

Now, let us suppose we want to create a trigger for the event when the task trace-cmd (pid=4729) is switched in, and assign a value of 1 to the Trigger signal. In pseudo-code:

if next_pid == 4729:
    Trigger_In(t) = 1

We can similarly create a Trigger for the event when the task is switched out:

if prev_pid == 4729:
    Trigger_Out(t) = -1

A pivot is the column along which the data can be split into independent sub-signals. If the data is pivotable, it can be seen as a superposition of smaller data sets, one for each unique pivot value. In our case the pivot is the "__cpu" column, and the data can be split into the scheduler switches happening on the different CPUs.
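The splitting that a pivot implies can be sketched with plain pandas. The toy frame below stands in for the sched_switch data (column names mirror the trace above); TRAPpy does the equivalent internally:

```python
import pandas as pd

# Toy stand-in for the sched_switch data frame shown above: each row
# is one switch event and "__cpu" is the pivot column.
df = pd.DataFrame({
    "__cpu":    [2, 2, 1, 1, 1],
    "next_pid": [18, 4732, 4730, 4729, 0],
    "prev_pid": [4734, 18, 4731, 4730, 4729],
}, index=[0.000000, 0.000022, 0.000107, 0.000127, 0.000142])

# Splitting along the pivot yields one smaller frame per unique CPU
per_cpu = {cpu: group for cpu, group in df.groupby("__cpu")}

print(sorted(per_cpu))   # one sub-frame per pivot value: [1, 2]
print(len(per_cpu[1]))   # 3 switch events happened on CPU 1
```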


In [4]:
from trappy.stats.Trigger import Trigger

task_pid = 4729

trigger_switch_in = Trigger(trace, 
                            trappy.sched.SchedSwitch,
                            pivot = "__cpu",
                            filters = {
                                "next_pid" : task_pid
                            },
                            value = 1)

trigger_switch_out = Trigger(trace, 
                            trappy.sched.SchedSwitch,
                            pivot = "__cpu",
                            filters = {
                                "prev_pid" : task_pid
                            },
                            value = -1)

Topology and Aggregation

Creating a CPU (pivot) Topology

A topology can be described as a collection of different arrangements of groups of nodes. These arrangements are called levels, and each level contains multiple groups of nodes. For example, a CPU topology can have the following levels:

  • CPU

      [
          [cpu0],
          [cpu1],
          .
          .
          [cpuN],
      ]
  • Cluster

      [
          [cluster1_cpu1, cluster1_cpu2, ... cluster1_cpuM],
          .
          .
          .
          [clusterK_cpu1, clusterK_cpu2, ... clusterK_cpuP]
      ]
  • System

      [
          [cpu0,
           cpu1,
           .
           .
           .
           cpuN]
      ]

In [5]:
from trappy.stats.Topology import Topology

cluster_0 = [0, 3, 4, 5]
cluster_1 = [1, 2]
clusters = [cluster_0, cluster_1]

topology = Topology(clusters=clusters)

  • The Aggregator facilitates the aggregation of signals from different triggers. The aggregator also understands a topology over which the aggregation can be performed, and this aggregation can be expressed as a function which takes a pandas.Series as input. For example:

        def aggfunc(series):
            return modify(series)


  • The elements in the topology should be a superset of the pivot values of the Triggers. In our case the pivot value is "__cpu", so, the Topology corresponds to a CPU topology.
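This superset requirement can be sketched as a plain set check (the pivot values below are hypothetical stand-ins for the unique "__cpu" values seen in the trace):

```python
# Sanity check: every pivot value produced by the triggers must have a
# corresponding node in the topology, or its signal has nowhere to land.
topology_nodes = [0, 1, 2, 3, 4, 5]   # as returned by topology.flatten()
pivot_values = [2, 2, 1, 1, 1]        # hypothetical "__cpu" column values

covered = set(topology_nodes) >= set(pivot_values)
print(covered)   # True
```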

Base Aggregation

  • Topology.flatten() gives the list of all the nodes in the Topology.
  • A node in the Topology corresponds to a pivot value in the Trigger.
  • The signals for each pivot value are superimposed and stored in a dictionary.

In [6]:
topology.flatten()


Out[6]:
[0, 1, 2, 3, 4, 5]

Here is an informal pseudo-code for this base aggregation:

base_signals = {}

for node in topology.flatten():
    node_signal = init_signal()

    for trigger in triggers:
        node_signal.union(trigger.get_signal_for_pivot_val(node))

    base_signals[node] = node_signal
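The "union" step of the pseudo-code amounts to merging per-trigger series on their time index. A minimal pandas sketch, with toy series standing in for the two trigger signals on a single node (one CPU):

```python
import pandas as pd

# Toy trigger signals for one topology node: switch-ins carry the
# value 1, switch-outs the value -1, both indexed by time.
switch_in  = pd.Series([1, 1],   index=[0.10, 0.30])
switch_out = pd.Series([-1, -1], index=[0.20, 0.40])

# Superimpose: take the union of the time indices, summing where both
# signals have a value (here the timestamps never overlap).
node_signal = switch_in.add(switch_out, fill_value=0).sort_index()

print(node_signal.tolist())   # [1.0, -1.0, 1.0, -1.0]
```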

Aggregation over the topology

The Aggregator.aggregate accepts a level parameter:

  • Each level in the topology contains a group of nodes
  • These can be accessed as

        topology.get_level(level)

In [7]:
topology.get_level("cluster")


Out[7]:
[[0, 3, 4, 5], [1, 2]]

Here is an informal pseudo-code for this aggregation over the topology:

agg_signal = []

for group_of_nodes in topology.get_level(level):

    group_signal = init_signal()

    for node in group_of_nodes:
        group_signal += aggfunc(base_signals[node])

    agg_signal.append(group_signal)
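With scalar per-node signals and summation standing in for aggfunc, the loop above can be run concretely. The toy counts below are hypothetical switch-in totals per CPU, grouped by the clusters defined earlier:

```python
# Toy per-node base signals (hypothetical switch-in counts per CPU)
# and the cluster grouping from the topology above.
base_signals = {0: 0, 1: 364, 2: 0, 3: 0, 4: 0, 5: 0}
clusters = [[0, 3, 4, 5], [1, 2]]

agg_signal = []
for group_of_nodes in clusters:
    # initialise once per group, then fold every node of the group in
    group_signal = sum(base_signals[node] for node in group_of_nodes)
    agg_signal.append(group_signal)

print(agg_signal)   # [0, 364] -- one value per group
```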

Create the Aggregator


In [8]:
from trappy.stats.Aggregator import MultiTriggerAggregator

def no_operation_aggfunc(series):
    return series

triggers = [
            trigger_switch_in,
            trigger_switch_out
           ]


vector_agg = MultiTriggerAggregator(triggers, topology, no_operation_aggfunc)

Examples of Aggregation

Aggregate at Cluster level


In [9]:
level = "cluster"
result = vector_agg.aggregate(level=level)


# Utility Code for Viewing the data

clusters = (str(c) for c in topology.get_level(level))

for series, cluster in zip(result, clusters):
    plt.figure(figsize=(15,7))
    plt.plot(series.index, series.values)
    plt.title(cluster)


Aggregate at CPU level


In [10]:
level = "cpu"
result = vector_agg.aggregate(level=level)


# Utility Code for Viewing the data

cpus = (str(c) for c in topology.get_level(level))

for series, cpu in zip(result, cpus):
    plt.figure(figsize=(15,7))
    plt.plot(series.index, series.values)
    plt.title(cpu)


Aggregator Function can return Scalars

An aggregator function can return a scalar, for example the number of "switch ins". Each value in the result has a one-to-one correspondence with the groups in Topology.get_level(level).


In [11]:
def num_switch_ins(series):
    return len(series[series == 1])

scalar_agg = MultiTriggerAggregator(triggers, topology, num_switch_ins)

print(scalar_agg.aggregate(level="cpu"))
print(scalar_agg.aggregate(level="cluster"))


[0, 364, 0, 0, 0, 0]
[0, 364]