This notebook illustrates the analysis of two experiments performed with RADICAL-Pilot and OSG. The experiments use four 1-core pilots and between 8 and 64 compute units (CUs). RADICAL-Analytics is used to acquire the two datasets produced by RADICAL-Pilot and then to derive aggregated and per-entity performance measures.
The state models of RADICAL-Pilot's CU and Pilot entities are presented, and all the state-based durations are defined and described. Among these durations, both aggregated and per-entity measures are computed and plotted. The aggregated measures are:
Each aggregated measure takes into account the time overlap among entities in the same state. For example, if one pilot starts running at time t_0, another at time t_0+5s, and both finish at time t_0+100s, TTR will be 100s, not 195s. The same calculation is done for partial, total, and null overlapping.
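This overlap-aware aggregation is easy to sketch in plain Python (a minimal sketch with made-up intervals; RADICAL-Analytics implements this for us):
In [ ]:
def overlap_aware_total(intervals):
    # Sum the lengths of the union of [start, end] intervals,
    # counting overlapping time only once.
    total = 0.0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint interval: close the previous one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping interval: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# Two pilots: one runs in [0, 100], the other in [5, 100].
print overlap_aware_total([(0, 100), (5, 100)])  # 100.0, not 195.0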
Single-entity performance measures are derived for each pilot and CU:
The kernel of each CU is a Synapse executable emulating a GROMACS execution as specified in the GROMACS/CoCo ENSEMBLES use case.
We plot and compare these measurements across the two experiments to understand:
In [40]:
%matplotlib inline
Load all the Python modules we will use for the analysis. Note that both RADICAL-Utils and RADICAL-Pilot need to be loaded alongside RADICAL-Analytics.
In [41]:
import os
import sys
import glob
import pprint
import numpy as np
import scipy as sp
import pandas as pd
import scipy.stats as sps
import statsmodels.api as sm
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.ticker as ticker
import matplotlib.gridspec as gridspec
import radical.utils as ru
import radical.pilot as rp
import radical.analytics as ra
from IPython.display import display
We configure matplotlib to produce visually consistent diagrams that are readable and can be directly included in a paper written in LaTeX.
In [42]:
# Global configurations
# ---------------------
# Use LaTeX and its body font for the diagrams' text.
mpl.rcParams['text.usetex'] = True
mpl.rcParams['font.family'] = 'serif'
mpl.rcParams['font.serif'] = ['Nimbus Roman Becker No9L']
# Use thinner lines for axes to avoid distractions.
mpl.rcParams['axes.linewidth'] = 0.75
mpl.rcParams['xtick.major.width'] = 0.75
mpl.rcParams['xtick.minor.width'] = 0.75
mpl.rcParams['ytick.major.width'] = 0.75
mpl.rcParams['ytick.minor.width'] = 0.75
# Do not use a box for the legend to avoid distractions.
mpl.rcParams['legend.frameon'] = False
# Helpers
# -------
# Use coordinated colors. These are the "Tableau 20" colors as
# RGB. Each pair is strong/light. For a theory of color see:
# http://www.tableau.com/about/blog/2016/7/colors-upgrade-tableau-10-56782
# http://tableaufriction.blogspot.com/2012/11/finally-you-can-use-tableau-data-colors.html
tableau20 = [(31 , 119, 180), (174, 199, 232), # blue [ 0,1 ]
(255, 127, 14 ), (255, 187, 120), # orange [ 2,3 ]
(44 , 160, 44 ), (152, 223, 138), # green [ 4,5 ]
(214, 39 , 40 ), (255, 152, 150), # red [ 6,7 ]
(148, 103, 189), (197, 176, 213), # purple [ 8,9 ]
(140, 86 , 75 ), (196, 156, 148), # brown [10,11]
(227, 119, 194), (247, 182, 210), # pink [12,13]
(127, 127, 127), (199, 199, 199), # gray [14,15]
(188, 189, 34 ), (219, 219, 141), # yellow [16,17]
(23 , 190, 207), (158, 218, 229)] # cyan [18,19]
# Scale the RGB values to the [0, 1] range, which is the format
# matplotlib accepts.
for i in range(len(tableau20)):
r, g, b = tableau20[i]
tableau20[i] = (r / 255., g / 255., b / 255.)
# Return a single plot without right and top axes
def fig_setup():
fig = plt.figure(figsize=(13,7))
ax = fig.add_subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
return fig, ax
RADICAL-Pilot saves runtime data in two types of files: profiles and databases. Profile files are written by the agent module on the compute nodes of the remote resources; database files are stored in the MongoDB instance used for the workload execution. Both types of files need to be retrieved from the remote resource and from the MongoDB instance. Currently, ...
The data used in this notebook are collected in two bzip2-compressed tar archives, one for each experiment dataset. The two archives need to be decompressed into radical.analytics/use_cases/rp_on_osg/data. Once decompressed, we acquire the datasets by constructing a ra.session object for each experimental run.
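For example, the decompression can be scripted (a sketch; the archive names are hypothetical):
In [ ]:
import tarfile

# Hypothetical archive names; substitute the actual ones.
for archive in ['exp1.tar.bz2', 'exp2.tar.bz2']:
    with tarfile.open(archive, 'r:bz2') as tar:
        tar.extractall('data/')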
We use a helper function to construct the session objects in bulk while keeping track of the experiment to which each run belongs. We save both to a Pandas DataFrame, which helps to further elaborate the datasets by offering methods specifically aimed at data analysis and plotting.
In [44]:
def load_data(rdir):
sessions = {}
experiments = {}
start = rdir.rfind(os.sep)+1
for path, dirs, files in os.walk(rdir):
folders = path[start:].split(os.sep)
if len(path[start:].split(os.sep)) == 2:
sid = os.path.basename(glob.glob('%s/*.json' % path)[0])[:-5]
if sid not in sessions.keys():
sessions[sid] = {}
sessions[sid] = ra.Session(sid, 'radical.pilot', src=path)
experiments[sid] = folders[0]
return sessions, experiments
# Load experiments' dataset into ra.session objects
# stored in a DataFrame.
rdir = 'data/'
sessions, experiments = load_data(rdir)
sessions = pd.DataFrame({'session': sessions,
'experiment': experiments})
# Check the first/last 3 rows
display(sessions.head(3))
display(sessions.tail(3))
We measure a set of durations. Each duration is bounded by exactly two timestamps, the first always preceding the second. Each timestamp represents an event, in this case a state transition.
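In its simplest form, a duration is the difference between the timestamps of two such events (a sketch with made-up values):
In [ ]:
# Timestamps (in seconds) of two state-transition events of the
# same entity; the values are made up.
timestamps = {'PMGR_ACTIVE_PENDING': 12.5, 'PMGR_ACTIVE': 317.9}

# The time spent queuing is the difference between the later and
# the earlier timestamp.
print timestamps['PMGR_ACTIVE'] - timestamps['PMGR_ACTIVE_PENDING']  # 305.4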
Our choice of the durations depends on the design of the experiment for which we are collecting data. In this case, we want to measure the overall time to completion (TTC) of the run and isolate two of its components:
We use TTQ and Tq to understand whether queue time has the same dominance on TTC as we measured on XSEDE. This comparison is relevant when evaluating the performance of OSG and the distribution of tasks to OSG, XSEDE, or across both.
TTX and Tx contribute to this understanding by showing how the supposed heterogeneity of OSG resources affects compute performance. The experiment is designed to use homogeneous CUs: every CU has the same compute requirements. Data requirements are not emulated so the differences in execution time across CUs can be related mainly to core performance.
The first step in analyzing the experiments' dataset is to verify the consistency of the data. Without such a verification, we cannot trust any of the results our analysis will produce. Feeling a nice chill down the spine thinking about the results you have already published?! ;)
Here we check for the consistency of the state model and of the timestamps. As documented in the RADICAL-Analytics API, a third test mode is available for the event models. Currently, event models are not implemented.
In [45]:
os.environ['RADICAL_ANALYTICS_VERBOSE']='ERROR'
for s in sessions['session']:
s.consistency(['state_model','timestamps'])
In [46]:
expment = None
pexpment = None
for sid in sessions.index:
etypes = sessions.ix[sid, 'session'].list(['etype'])
expment = sessions.ix[sid, 'experiment']
if expment != pexpment:
print '%s|%s|%s' % (expment, sid, etypes)
pexpment = expment
We choose 'session', 'pilot', and 'unit'. At the moment, we do not need 'umgr', 'pmgr', 'update' as we want to measure and compare the overall duration of each session and the lifespan of pilots and units. Depending on the results of our analysis, we may want to extend these measurements and comparisons also to the RP managers.
The Session constructor initializes four properties for each session that we can directly access:
We add a column in the sessions
DataFrame with the TTC of each run.
In [47]:
for sid in sessions.index:
sessions.ix[sid, 'TTC'] = sessions.ix[sid, 'session'].ttc
display(sessions[['TTC']].head(3))
display(sessions[['TTC']].tail(3))
We also add a column to the sessions DataFrame with the number of units for each session.
In [48]:
for sid in sessions.index:
sessions.ix[sid, 'nunits'] = len(sessions.ix[sid, 'session'].filter(etype='unit', inplace=False).get())
display(sessions[['nunits']].head(3))
display(sessions[['nunits']].tail(3))
We now have all the data to plot the TTC of all the experiments' runs. We plot the runs of Experiment 1 on the left in blue and those of Experiment 2 on the right in orange.
In [49]:
fig = plt.figure(figsize=(13,14))
fig.suptitle('TTC XSEDE OSG Virtual Cluster', fontsize=14)
plt.subplots_adjust(wspace=0.3, top=0.85)
ttc_subplots = []
for exp in sessions['experiment'].sort_values().unique():
ttc_subplots.append(sessions[ sessions['experiment'] == exp ].sort_values('TTC'))
colors = {'exp1': [tableau20[19]],
'exp2': [tableau20[7] ],
'exp3': [tableau20[13]],
'exp4': [tableau20[17]],
'exp5': [tableau20[15]]}
ax = []
for splt in range(4):
session = ttc_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = ', '.join([str(int(n)) for n in session['nunits'].unique()])
color = colors[experiment]
title = 'Experiment %s\n%s tasks; %s sessions.' % (experiment[3], ntasks, session.shape[0])
if not ax:
ax.append(fig.add_subplot(2, 2, splt+1))
else:
ax.append(fig.add_subplot(2, 2, splt+1, sharey=ax[0]))
session['TTC'].plot(kind='bar', color=color, ax=ax[splt], title=title)
ax[splt].spines["top"].set_visible(False)
ax[splt].spines["right"].set_visible(False)
ax[splt].get_xaxis().tick_bottom()
ax[splt].get_yaxis().tick_left()
ax[splt].set_xticklabels([])
ax[splt].set_xlabel('Sessions')
ax[splt].set_ylabel('Time (s)')
ax[splt].legend(bbox_to_anchor=(1.25, 1))
# Add table with statistical description of TTC values.
table = pd.tools.plotting.table(ax[splt],
np.round(session['TTC'].describe(), 2),
loc='upper center',
colWidths=[0.2, 0.2, 0.2])
# Eliminate the border of the table.
for key, cell in table.get_celld().items():
cell.set_linewidth(0)
fig.add_subplot(ax[splt])
plt.savefig('figures/osg_ttc_experiments.pdf', dpi=600, bbox_inches='tight')
We see:
We still do not know:
First, we subdivide each plot into four plots, one for each workload size: 8, 16, 32, 64 CUs.
In [10]:
fig = plt.figure(figsize=(13,14))
title = 'XSEDE OSG Virtual Cluster'
subtitle = 'TTC'
defs = {'ttq': 'TTQ = Total Time Queuing pilots',
'ttr': 'TTR = Total Time Running pilots',
'ttc': 'TTC = Total Time Completing experiment'}
fig.suptitle('%s:\n%s.\n%s.' % (title,
subtitle,
defs['ttc']), fontsize=14)
gs = []
grid = gridspec.GridSpec(2, 2)
grid.update(wspace=0.4, top=0.85)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[2]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 5, subplot_spec=grid[3]))
ttc_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
if not sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].empty:
ttc_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[19]],
'exp2': [tableau20[7] ],
'exp3': [tableau20[13]],
'exp4': [tableau20[17]],
'exp5': [tableau20[15]]}
nun_exp = []
nun_exp.append(len(sessions[sessions['experiment'] == 'exp1']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp2']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp3']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp4']['nunits'].sort_values().unique()))
ax = []
i = 0
while(i < len(ttc_subplots)):
for gn in range(4):
for gc in range(nun_exp[gn]):
session = ttc_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Exp. %s\n%s tasks\n%s reps.' % (experiment[3], ntasks, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session['TTC'].plot(kind='bar', ax=ax[i], color=color, title=title)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Runs')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7 or i == 16:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3 or i == 11:
ax[i].legend(labels=['TTC'], bbox_to_anchor=(2.25, 1))
elif i == 16:
ax[i].legend(labels=['TTC'], bbox_to_anchor=(2.70, 1))
# else:
# ax[i].get_legend().set_visible(False)
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttc_nunits.pdf', dpi=600, bbox_inches='tight')
The variations among TTC of workloads with the same number of CUs and between the two experiments are marked. Here we create a table with various measures of this variation for all the experiments and workload sizes.
In [50]:
ttc_stats = {}
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
tag = exp+'_'+str(int(nun))
ttc_stats[tag] = sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ]['TTC'].describe()
ttc_compare = pd.DataFrame(ttc_stats)
sort_cols = ['exp1_8' , 'exp2_8' , 'exp1_16', 'exp2_16',
'exp1_32', 'exp2_32', 'exp1_64', 'exp2_64']
ttc_compare = ttc_compare.reindex_axis(sort_cols, axis=1)
ttc_compare
Out[50]:
For 8 and 16 CUs, the TTC of the second experiment shows a mean twice as large as that of the first experiment. The difference between the means is less pronounced for 32 CUs. For 64 CUs, the mean TTC of the first experiment is 25% smaller than that of the second experiment.
The standard deviation among runs of the same experiment and number of CUs varies between the two experiments. We describe this variation by calculating and comparing STD/mean for each experiment and workload size.
In [51]:
(ttc_compare.loc['std']/ttc_compare.loc['mean'])*100
Out[51]:
We notice that STD/mean among repetitions of the same run in Experiment 1 goes from 6.55% to 19.77%, increasing with the number of CUs. In Experiment 2, STD/mean goes from 20.63% up to 56.18%, independent of the number of CUs executed by the repeated run. This shows:
In order to clarify this discrepancy between experiments, we need to measure the components of TTC to understand what determines the differences in TTC within the same experiment and between the two. These measurements require us to:
From the RADICAL-Pilot documentation and state model description, we know that:
The states of the pilots are therefore as follows:
We verify whether our runs match this model, and we test whether the state model of each pilot of every session of all our experiments is the same. In this way we will know:
In [52]:
last_sv = None
last_id = None
for s in sessions['session']:
sv = s.describe('state_values', etype=['pilot']).values()[0].values()[0]
if last_sv and last_sv != sv:
        print "Different state models:\n%s = %s\n%s = %s" % (last_id, last_sv, s._sid, sv)
last_sv = sv
last_id = s._sid
pprint.pprint(last_sv)
We define five durations to measure the aggregated time spent by all the pilots in each state:
Duration | Start timestamp | End timestamp | Description |
---|---|---|---|
TT_PILOT_PMGR_SCHEDULING | NEW | PMGR_LAUNCHING_PENDING | total time spent by a pilot being scheduled to a PMGR |
TT_PILOT_PMGR_QUEUING | PMGR_LAUNCHING_PENDING | PMGR_LAUNCHING | total time spent by a pilot in a PMGR queue |
TT_PILOT_LRMS_SUBMITTING | PMGR_LAUNCHING | PMGR_ACTIVE_PENDING | total time spent by a pilot being submitted to a LRMS |
TT_PILOT_LRMS_QUEUING | PMGR_ACTIVE_PENDING | PMGR_ACTIVE | total time spent by a pilot being queued in a LRMS queue |
TT_PILOT_LRMS_RUNNING | PMGR_ACTIVE | DONE | total time spent by a pilot being active |
We should note that session.duration() does all this for us. Time to record some durations!
In [53]:
# Model of pilot durations.
ttpdm = {'TT_PILOT_PMGR_SCHEDULING': ['NEW' , 'PMGR_LAUNCHING_PENDING'],
'TT_PILOT_PMGR_QUEUING' : ['PMGR_LAUNCHING_PENDING', 'PMGR_LAUNCHING'],
'TT_PILOT_LRMS_SUBMITTING': ['PMGR_LAUNCHING' , 'PMGR_ACTIVE_PENDING'],
'TT_PILOT_LRMS_QUEUING' : ['PMGR_ACTIVE_PENDING' , 'PMGR_ACTIVE'],
'TT_PILOT_LRMS_RUNNING' : ['PMGR_ACTIVE' , ['DONE',
'CANCELED',
'FAILED']]}
# Add total pilot durations to sessions' DF.
for sid in sessions.index:
s = sessions.ix[sid, 'session'].filter(etype='pilot', inplace=False)
for d in ttpdm.keys():
sessions.ix[sid, d] = s.duration(ttpdm[d])
# Print the relevant portion of the 'session' DataFrame.
display(sessions[['TT_PILOT_PMGR_SCHEDULING', 'TT_PILOT_PMGR_QUEUING',
'TT_PILOT_LRMS_SUBMITTING', 'TT_PILOT_LRMS_QUEUING',
'TT_PILOT_LRMS_RUNNING']].head(3))
display(sessions[['TT_PILOT_PMGR_SCHEDULING', 'TT_PILOT_PMGR_QUEUING',
'TT_PILOT_LRMS_SUBMITTING', 'TT_PILOT_LRMS_QUEUING',
'TT_PILOT_LRMS_RUNNING']].tail(3))
We can now measure the first component of TTC: total queuing time (TTQ), i.e., the portion of TTC spent waiting for the pilots of each run to become active while they were queued in the OSG Connect broker. We choose TTQ because of the dominant role it plays on XSEDE and the need to compare that role with the one played within OSG.
In [15]:
fig = plt.figure(figsize=(13,14))
title = 'XSEDE OSG Virtual Cluster'
subtitle = 'TTQ and TTC.'
defs = {'ttq': 'TTQ = Total Time Queuing pilots',
'ttr': 'TTR = Total Time Running pilots',
'ttc': 'TTC = Total Time Completing experiment'}
fig.suptitle('%s:\n%s.\n%s;\n%s.' % (title,
subtitle,
defs['ttq'],
defs['ttc']), fontsize=14)
gs = []
grid = gridspec.GridSpec(2, 2)
grid.update(wspace=0.4, top=0.85)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[2]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 5, subplot_spec=grid[3]))
ttc_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
if not sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].empty:
ttc_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[0] ,tableau20[19]],
'exp2': [tableau20[2] ,tableau20[7] ],
'exp3': [tableau20[8] ,tableau20[13]],
'exp4': [tableau20[4] ,tableau20[17]],
'exp5': [tableau20[10],tableau20[15]]}
nun_exp = []
nun_exp.append(len(sessions[sessions['experiment'] == 'exp1']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp2']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp3']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp4']['nunits'].sort_values().unique()))
ax = []
i = 0
while(i < len(ttc_subplots)):
for gn in range(4):
for gc in range(nun_exp[gn]):
session = ttc_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Exp. %s\n%s tasks\n%s reps.' % (experiment[3], ntasks, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session[['TT_PILOT_LRMS_QUEUING',
'TTC']].plot(kind='bar', ax=ax[i], color=color, title=title, stacked=True)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Runs')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7 or i == 16:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3 or i == 11:
ax[i].legend(labels=['TTQ','TTC'], bbox_to_anchor=(2.25, 1))
elif i == 16:
ax[i].legend(labels=['TTQ','TTC'], bbox_to_anchor=(2.70, 1))
else:
ax[i].get_legend().set_visible(False)
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttq_ttc_nunits.pdf', dpi=600, bbox_inches='tight')
Across runs and experiments, TTQ is:
In [54]:
ttc_stats = {}
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
tag = exp+'_'+str(int(nun))
ttc_stats[tag] = sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ]['TT_PILOT_LRMS_QUEUING'].describe()
ttc_compare = pd.DataFrame(ttc_stats)
sort_cols = ['exp1_8' , 'exp2_8' , 'exp1_16', 'exp2_16',
'exp1_32', 'exp2_32', 'exp1_64', 'exp2_64']
ttc_compare = ttc_compare.reindex_axis(sort_cols, axis=1)
ttc_compare.round(2)
Out[54]:
Even with an average of just 4 runs for each workload size, TTQ variance is relatively small, with a couple of exceptions, as shown by STD/mean:
In [55]:
std_mean = (ttc_compare.loc['std']/ttc_compare.loc['mean'])*100
std_mean.round(2)
Out[55]:
As with TTC, more experiments and runs are needed. Importantly, due to the dynamic variables of OSG behavior (e.g., the number and type of resources available to the broker at any given point in time), it would be useful to perform the runs sequentially, so as to collect data (relatively) independent of these dynamics, or to characterize these dynamics with long-term measurements taken at discrete intervals.
Every run uses four 1-core pilots, submitted to the OSG broker. We can look at the frequency of each pilot's Tq (i.e., not TTQ but the queuing time of each individual pilot) without distinguishing among workload sizes and between experiments. We create a new DataFrame pilots containing the durations of each pilot instead of those of each session.
Duration | Start timestamp | End timestamp | Description |
---|---|---|---|
PMGR_SCHEDULING | NEW | PMGR_LAUNCHING_PENDING | Time spent by a pilot being scheduled to a PMGR |
PMGR_QUEUING | PMGR_LAUNCHING_PENDING | PMGR_LAUNCHING | Time spent by a pilot in a PMGR queue |
LRMS_SUBMITTING | PMGR_LAUNCHING | PMGR_ACTIVE_PENDING | Time spent by a pilot being submitted to a LRMS |
LRMS_QUEUING | PMGR_ACTIVE_PENDING | PMGR_ACTIVE | Time spent by a pilot being queued in a LRMS queue |
LRMS_RUNNING | PMGR_ACTIVE | DONE | Time spent by a pilot being active |
Note how only the name of the duration changes when comparing this table to the table with the durations for a session. The pilot state model is always the same: here it is used to calculate durations for a specific entity, while for a session it is used to calculate the aggregated duration for all the entities of a type in that session.
In [56]:
# Model of pilot durations.
pdm = {'PMGR_SCHEDULING': ['NEW' , 'PMGR_LAUNCHING_PENDING'],
'PMGR_QUEUING' : ['PMGR_LAUNCHING_PENDING', 'PMGR_LAUNCHING'],
'LRMS_SUBMITTING': ['PMGR_LAUNCHING' , 'PMGR_ACTIVE_PENDING'],
'LRMS_QUEUING' : ['PMGR_ACTIVE_PENDING' , 'PMGR_ACTIVE'],
'LRMS_RUNNING' : ['PMGR_ACTIVE' , ['DONE',
'CANCELED',
'FAILED']]}
# DataFrame structure for pilot durations.
pds = { 'pid': [],
'sid': [],
'experiment' : [],
'PMGR_SCHEDULING': [],
'PMGR_QUEUING' : [],
'LRMS_SUBMITTING': [],
'LRMS_QUEUING' : [],
'LRMS_RUNNING' : []}
# Calculate the duration for each state of each
# pilot of each run and Populate the DataFrame
# structure.
for sid in sessions.index:
s = sessions.ix[sid, 'session'].filter(etype='pilot', inplace=False)
for p in s.list('uid'):
sf = s.filter(uid=p, inplace=False)
pds['pid'].append(p)
pds['sid'].append(sid)
pds['experiment'].append(sessions.ix[sid, 'experiment'])
for d in pdm.keys():
if (not sf.timestamps(state=pdm[d][0]) or
not sf.timestamps(state=pdm[d][1])):
pds[d].append(None)
continue
pds[d].append(sf.duration(pdm[d]))
# Populate the DataFrame.
pilots = pd.DataFrame(pds)
display(pilots.head(3))
display(pilots.tail(3))
We add to the pilots DataFrame the name of the resource (hostID) on which each pilot (agent) became active. Often, the hostID recorded by RADICAL-Pilot is not the public name of the resource on which the pilot becomes active but, instead, the name of a worker/compute node of that resource. We use a heuristic to isolate the portion of the hostID string that is common to all the nodes of the same resource. It should be noted, though, that in some cases this is not possible.
In [57]:
def parse_osg_hostid(hostid):
'''
Heuristic: eliminate node-specific information from hostID.
'''
domain = None
# Split domain name from IP.
host = hostid.split(':')
# Split domain name into words.
words = host[0].split('.')
# Get the words in the domain name that do not contain
# numbers. Most hostnames have no number but there are
# exceptions.
literals = [l for l in words if not any((number in set('0123456789')) for number in l)]
# Check for exceptions:
# a. every word of the domain name has a number
if len(literals) == 0:
        # Some hostnames use '-' instead of '.' as word separator.
        # The parser would have returned a single word, and that
        # word may contain numbers.
if '-' in host[0]:
words = host[0].split('-')
literals = [l for l in words if not any((number in set('0123456789')) for number in l)]
# FIXME: We do not check the size of literals.
domain = '.'.join(literals)
# Some hostnames may have only the name of the node. We
# have to keep the IP to decide later on whether two nodes
# are likely to belong to the same cluster.
elif 'n' in host[0] or 'nod' in host[0]:
domain = '.'.join(host)
# The hostname is identified by an alphanumeric string
else:
domain = '.'.join(host)
# Some hostnames DO have numbers in their name.
elif len(literals) == 1:
domain = '.'.join(words[1:])
    # Some hostnames are simple to parse.
else:
domain = '.'.join(literals)
return domain
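A quick sanity check of the heuristic on made-up hostIDs shows how node-specific tokens are dropped:
In [ ]:
# Node-specific prefixes are dropped, leaving the common domain.
print parse_osg_hostid('node123.cluster.example.edu:10.0.0.1')
# -> cluster.example.edu

# Hostnames using '-' as word separator fall back to the
# dash-separated words without digits.
print parse_osg_hostid('node-123-45:10.0.0.1')
# -> node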
We use this heuristic on the pilots DataFrame, to which we add two columns: 'hostID' and 'parsed_hostID'.
In [58]:
for pix in pilots.index:
sid = pilots.ix[pix,'sid']
pid = pilots.ix[pix,'pid']
pls = sessions.ix[sid, 'session'].filter(uid=pid, inplace=False).get(etype=['pilot'])
if len(pls) != 1:
print "Error: session filter on uid returned multiple pilots"
break
hostid = pls[0].cfg['hostid']
if hostid:
domain = parse_osg_hostid(hostid)
else:
domain = np.nan
pilots.ix[pix,'hostID'] = hostid
pilots.ix[pix,'parsed_hostID'] = domain
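Before plotting, we can sanity-check the heuristic by counting how many pilots became active on each parsed resource (a sketch):
In [ ]:
# Pilots per parsed resource (NaN entries, i.e., pilots that
# never became active, are excluded by value_counts).
display(pilots['parsed_hostID'].value_counts().head(10))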
We plot the distribution of Tq for both experiments. Ideally, this should be the first step towards defining the characteristic distribution of Tq. Part of this characterization will be to study how stable this distribution is across time, i.e., between two experiments executed at different points in time. As we know that the pool of resources of OSG is both heterogeneous and dynamic, we expect this distribution not to be stable across time, due to the potentially different pool of resources available at any point in time.
In [21]:
fig, ax = fig_setup()
title='XSEDE OSG Virtual Cluster\nDensity of Pilot Tq'
tq_exp1 = pilots[pilots['experiment'].str.contains('exp1')]['LRMS_QUEUING'].dropna().reset_index(drop=True)
tq_exp2 = pilots[pilots['experiment'].str.contains('exp2')]['LRMS_QUEUING'].dropna().reset_index(drop=True)
tq_exp3 = pilots[pilots['experiment'].str.contains('exp3')]['LRMS_QUEUING'].dropna().reset_index(drop=True)
tq_exp4 = pilots[pilots['experiment'].str.contains('exp4')]['LRMS_QUEUING'].dropna().reset_index(drop=True)
plots = pd.DataFrame({'exp1': tq_exp1, 'exp2': tq_exp2, 'exp3': tq_exp3, 'exp4': tq_exp4})
#plots.plot.hist(ax=ax, color=[tableau20[19],tableau20[7],tableau20[13],tableau20[17]], title=title)
plots.plot.density(ax=ax, color=[tableau20[0],tableau20[2],tableau20[8],tableau20[4]], title=title)
ax.set_xlabel('Time (s)')
ax.legend(labels=['Tq Experiment 1','Tq Experiment 2','Tq Experiment 3','Tq Experiment 4'])
plt.savefig('figures/osg_tq_frequency.pdf', dpi=600, bbox_inches='tight')
The diagrams hint at bimodal distributions; more measurements are required to study this further.
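A first, rough test of bimodality could fit Gaussian mixtures with one and two components to the Tq values and compare them via BIC (a sketch; it assumes scikit-learn is available, which is not among the modules loaded above):
In [ ]:
from sklearn.mixture import GaussianMixture

tq = pilots['LRMS_QUEUING'].dropna().values.reshape(-1, 1)

# A lower BIC indicates the better model: if two components win,
# that is (weak) evidence for bimodality.
for k in (1, 2):
    gmm = GaussianMixture(n_components=k).fit(tq)
    print 'components=%d BIC=%.1f' % (k, gmm.bic(tq))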
We know that TTQ marginally contributes to the TTC of each run. We still do not know whether most of TTC depends on the running time of the pilots or on the overheads of bootstrapping and managing them. We suspect the former but we have to exclude the latter.
We define TTR as the aggregated time spent by the pilots in the running state. We plot TTR stacked with TTQ and TTC, and we verify whether TTR and the remaining part of TTC are analogous. As usual, this could be done just numerically but, hey, we spent a senseless ton of time taming matplotlib so now we want to use it!
In [22]:
fig = plt.figure(figsize=(13,14))
title = 'XSEDE OSG Virtual Cluster'
subtitle = 'TTQ, TTR and TTC'
defs = {'ttq': 'TTQ = Total Time Queuing pilots',
'ttr': 'TTR = Total Time Running pilots',
'ttc': 'TTC = Total Time Completing experiment'}
fig.suptitle('%s:\n%s.\n%s;\n%s;\n%s.' % (title,
subtitle,
defs['ttq'],
defs['ttr'],
defs['ttc']), fontsize=14)
gs = []
grid = gridspec.GridSpec(2, 2)
grid.update(wspace=0.4, top=0.85)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[2]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 5, subplot_spec=grid[3]))
ttq_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
if not sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].empty:
ttq_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[0] ,tableau20[18],tableau20[19]],
'exp2': [tableau20[2] ,tableau20[6] ,tableau20[7] ],
'exp3': [tableau20[8] ,tableau20[12],tableau20[13]],
'exp4': [tableau20[4] ,tableau20[16],tableau20[17]],
'exp5': [tableau20[10],tableau20[14],tableau20[15]]}
nun_exp = []
nun_exp.append(len(sessions[sessions['experiment'] == 'exp1']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp2']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp3']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp4']['nunits'].sort_values().unique()))
ax = []
i = 0
while(i < len(ttq_subplots)):
for gn in range(4):
for gc in range(nun_exp[gn]):
session = ttq_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Exp. %s\n%s tasks\n%s rep.' % (experiment[3], ntasks, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session[['TT_PILOT_LRMS_QUEUING',
'TT_PILOT_LRMS_RUNNING',
'TTC']].plot(kind='bar', ax=ax[i], color=color, title=title, stacked=True)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Runs')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7 or i == 16:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3 or i == 11:
ax[i].legend(labels=['TTQ','TTR','TTC'], bbox_to_anchor=(2.25, 1))
elif i == 16:
ax[i].legend(labels=['TTQ','TTR','TTC'], bbox_to_anchor=(2.70, 1))
else:
ax[i].get_legend().set_visible(False)
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttq_ttr_ttc_nunits.pdf', dpi=600, bbox_inches='tight')
As expected, TTR is largely equivalent to TTC-TTQ. This tells us that we will have to investigate the time spent describing, binding, scheduling, and executing CUs, measuring whether the pilots' TTR is spent effectively running CUs or managing them. We will also have to measure how much time is spent staging data in and out, to and from the resources. In these experiments, data staging was not performed, so we will limit our analysis to the execution time of CUs.
The diagram above confirms the differences previously observed between the TTC of Experiments 1 and 2 (particularly evident when comparing runs with 8 and 64 units). As these depend on the running pilots, we investigate whether every run uses the same number of (active) pilots to execute CUs. The rationale is that, when using fewer pilots, running the same number of CUs takes more time (sequential execution on a single core).
We add a column to the sessions DataFrame with the number of pilots for each session.
In [59]:
# Temporary: workaround for bug ticket \#15. Calculates
# the number of active pilots by looking into the
# length of the list returned by timestamp on the
# PMGR_ACTIVE state.
for sid in sessions.index:
sessions.ix[sid, 'npilot_active'] = len(sessions.ix[sid, 'session'].filter(etype='pilot', inplace=False).timestamps(state='PMGR_ACTIVE'))
display(sessions[['npilot_active']].head(3))
display(sessions[['npilot_active']].tail(3))
We add the number of pilots that became active in each run to the plot used above to show TTQ, TTR, and TTC.
In [24]:
fig = plt.figure(figsize=(13,14))
title = 'XSEDE OSG Virtual Cluster'
subtitle = 'TTQ, TTR and TTC with Number of Active Pilots'
defs = {'ttq': 'TTQ = Total Time Queuing pilots',
'ttr': 'TTR = Total Time Running pilots',
'ttc': 'TTC = Total Time Completing experiment'}
fig.suptitle('%s:\n%s.\n%s;\n%s;\n%s.' % (title,
subtitle,
defs['ttq'],
defs['ttr'],
defs['ttc']), fontsize=14)
gs = []
grid = gridspec.GridSpec(2, 2)
grid.update(wspace=0.4, top=0.85)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[2]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 5, subplot_spec=grid[3]))
ttq_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
if not sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].empty:
ttq_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[0] ,tableau20[18],tableau20[19]],
'exp2': [tableau20[2] ,tableau20[6] ,tableau20[7] ],
'exp3': [tableau20[8] ,tableau20[12],tableau20[13]],
'exp4': [tableau20[4] ,tableau20[16],tableau20[17]],
'exp5': [tableau20[10],tableau20[14],tableau20[15]]}
nun_exp = []
nun_exp.append(len(sessions[sessions['experiment'] == 'exp1']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp2']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp3']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp4']['nunits'].sort_values().unique()))
ax = []
i = 0
while(i < len(ttq_subplots)):
for gn in range(4):
for gc in range(nun_exp[gn]):
session = ttq_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Exp. %s\n%s tasks\n%s rep.' % (experiment[3], ntasks, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session[['TT_PILOT_LRMS_QUEUING',
'TT_PILOT_LRMS_RUNNING',
'TTC']].plot(kind='bar', ax=ax[i], color=color, title=title, stacked=True)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Runs')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7 or i == 16:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3 or i == 11:
ax[i].legend(labels=['TTQ','TTR','TTC'], bbox_to_anchor=(2.25, 1))
elif i == 16:
ax[i].legend(labels=['TTQ','TTR','TTC'], bbox_to_anchor=(2.70, 1))
else:
ax[i].get_legend().set_visible(False)
# Add labels with number of pilots per session.
rects = ax[i].patches
labels = [int(l) for l in session['npilot_active']]
for rect, label in zip(rects[-repetitions:], labels):
height = rect.get_height()
ax[i].text(rect.get_x() + rect.get_width()/2,
(height*2), label, ha='center',
va='bottom')
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttq_ttr_ttc_npactive_nunits.pdf', dpi=600, bbox_inches='tight')
As expected, the largest differences we observed in TTC and TTR among runs with the same number of CUs and among experiments map to the number of pilots used to execute CUs. Our analysis shows that the two experiments have a different number of independent variables. Any comparison has to take into account whether the observed measure depends on the number of active pilots used to execute CUs.
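A way to make this dependency explicit is to condition the TTC statistics on the number of active pilots (a sketch):
In [ ]:
# TTC statistics conditioned on the number of active pilots:
# fewer active pilots should translate into larger TTC.
display(sessions.groupby(['experiment',
                          'npilot_active'])['TTC'].describe())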
We have now exhausted the analysis of TTC via the defined pilot durations. We now have to look at the composition of TTR, namely at the durations that depend on the CU states.
From the RADICAL-Pilot documentation and state model description, we know that:
The states of the units are therefore as follows:
As done with the pilot state model, we verify whether our runs match the compute unit state model, and we test whether the state model of each unit of every session of all experiments is the same. In this way we will know:
In [60]:
last_sv = None
last_id = None
for s in sessions['session']:
sv = s.describe('state_values', etype=['unit']).values()[0].values()[0]
if last_sv and last_sv != sv:
        print "Different state models:\n%s = %s\n%s = %s" % (last_id, last_sv, s._sid, sv)
last_sv = sv
last_id = s._sid
pprint.pprint(last_sv)
We define 15 durations to measure the aggregated time spent by all the units of a session in each state:
Duration | Start timestamp | End timestamp | Description |
---|---|---|---|
TT_UNIT_UMGR_SCHEDULING | NEW | UMGR_SCHEDULING_PENDING | total time spent by a unit being scheduled to a UMGR |
TT_UNIT_UMGR_BINDING | UMGR_SCHEDULING_PENDING | UMGR_SCHEDULING | total time spent by a unit being bound to a pilot by a UMGR |
TT_IF_UMGR_SCHEDULING | UMGR_SCHEDULING | UMGR_STAGING_INPUT_PENDING | total time spent by input file(s) being scheduled to a UMGR |
TT_IF_UMGR_QUEING | UMGR_STAGING_INPUT_PENDING | UMGR_STAGING_INPUT | total time spent by input file(s) queuing in a UMGR |
TT_IF_AGENT_SCHEDULING | UMGR_STAGING_INPUT | AGENT_STAGING_INPUT_PENDING | total time spent by input file(s) being scheduled to an agent's MongoDB queue |
TT_IF_AGENT_QUEUING | AGENT_STAGING_INPUT_PENDING | AGENT_STAGING_INPUT | total time spent by input file(s) queuing in an agent's MongoDB queue |
TT_IF_AGENT_TRANSFERRING | AGENT_STAGING_INPUT | AGENT_SCHEDULING_PENDING | total time spent by input file(s)' payload to be transferred from where the UMGR is being executed (e.g., the user's workstation) to the resource on which the agent is executing |
TT_UNIT_AGENT_QUEUING | AGENT_SCHEDULING_PENDING | AGENT_SCHEDULING | total time spent by a unit in the agent's scheduling queue |
TT_UNIT_AGENT_SCHEDULING | AGENT_SCHEDULING | AGENT_EXECUTING_PENDING | total time spent by a unit to be scheduled to the agent's executing queue |
TT_UNIT_AGENT_QUEUING_EXECUTION | AGENT_EXECUTING_PENDING | AGENT_EXECUTING | total time spent by a unit in the agent's executing queue |
TT_UNIT_AGENT_EXECUTING | AGENT_EXECUTING | AGENT_STAGING_OUTPUT_PENDING | total time spent by a unit executing |
TT_OF_AGENT_QUEUING | AGENT_STAGING_OUTPUT_PENDING | AGENT_STAGING_OUTPUT | total time spent by output file(s) queuing in the agent's stage out queue |
TT_OF_UMGR_SCHEDULING | AGENT_STAGING_OUTPUT | UMGR_STAGING_OUTPUT_PENDING | total time spent by output file(s) being scheduled to a UMGR's MongoDB queue |
TT_OF_UMGR_QUEUING | UMGR_STAGING_OUTPUT_PENDING | UMGR_STAGING_OUTPUT | total time spent by output file(s) queuing in a UMGR's MongoDB queue |
TT_OF_UMGR_TRANSFERRING | UMGR_STAGING_OUTPUT | DONE | total time spent by output file(s)' payload to be transferred from the resource to where the UMGR is being executed (e.g., the user's workstation) |
Durations can be aggregated so as to represent middle-level semantics (names as in the table above):
* TT_UNIT_RP_OVERHEAD = TT_UNIT_UMGR_SCHEDULING +
                        TT_UNIT_AGENT_QUEUING +
                        TT_UNIT_AGENT_SCHEDULING +
                        TT_UNIT_AGENT_QUEUING_EXECUTION
* TT_IF_RP_OVERHEAD = TT_IF_UMGR_SCHEDULING +
                      TT_IF_UMGR_QUEING +
                      TT_IF_AGENT_QUEUING
* TT_OF_RP_OVERHEAD = TT_OF_AGENT_QUEUING +
                      TT_OF_UMGR_QUEUING
* TT_IF_NETWORK_OVERHEAD = TT_IF_AGENT_SCHEDULING +
                           TT_IF_AGENT_TRANSFERRING
* TT_OF_NETWORK_OVERHEAD = TT_OF_UMGR_SCHEDULING +
                           TT_OF_UMGR_TRANSFERRING
* TT_IF_STAGING = TT_IF_RP_OVERHEAD +
                  TT_IF_NETWORK_OVERHEAD
* TT_OF_STAGING = TT_OF_RP_OVERHEAD +
                  TT_OF_NETWORK_OVERHEAD
and higher-level semantics:
* TT_RP_OVERHEADS = TT_UNIT_RP_OVERHEAD +
TT_IF_RP_OVERHEAD +
TT_OF_RP_OVERHEAD
* TT_NETWORK_OVERHEADS = TT_IF_NETWORK_OVERHEAD +
TT_OF_NETWORK_OVERHEAD
* TT_FILE_STAGING = TT_IF_STAGING +
TT_OF_STAGING
* TT_UNIT_EXECUTING = TT_UNIT_AGENT_EXECUTING
Note that we can derive consistency constraints from these models. For example, for every session, the aggregated execution time of the units should never exceed the aggregated running time of the pilots, which in turn cannot exceed TTC. Once the TT_* columns are computed below, a minimal check of such constraints might look like this (a sketch):
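In [ ]:
# Consistency check (to be run after the TT_* columns below are
# computed): units execute inside active pilots, so TTX <= TTR,
# and pilots run within the session, so TTR <= TTC.
ok = ((sessions['TT_UNIT_AGENT_EXECUTING'] <= sessions['TT_PILOT_LRMS_RUNNING']) &
      (sessions['TT_PILOT_LRMS_RUNNING'] <= sessions['TTC']))
print 'Sessions violating the constraints:', (~ok).sum()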
As done with the pilots, we first calculate the overall time spent during the session executing CUs.
In [61]:
# Model of unit durations.
udm = {'TT_UNIT_UMGR_SCHEDULING' : ['NEW' , 'UMGR_SCHEDULING_PENDING'],
'TT_UNIT_UMGR_BINDING' : ['UMGR_SCHEDULING_PENDING' , 'UMGR_SCHEDULING'],
'TT_IF_UMGR_SCHEDULING' : ['UMGR_SCHEDULING' , 'UMGR_STAGING_INPUT_PENDING'],
'TT_IF_UMGR_QUEING' : ['UMGR_STAGING_INPUT_PENDING' , 'UMGR_STAGING_INPUT'],
'TT_IF_AGENT_SCHEDULING' : ['UMGR_STAGING_INPUT' , 'AGENT_STAGING_INPUT_PENDING'],
'TT_IF_AGENT_QUEUING' : ['AGENT_STAGING_INPUT_PENDING' , 'AGENT_STAGING_INPUT'],
'TT_IF_AGENT_TRANSFERRING' : ['AGENT_STAGING_INPUT' , 'AGENT_SCHEDULING_PENDING'],
'TT_UNIT_AGENT_QUEUING' : ['AGENT_SCHEDULING_PENDING' , 'AGENT_SCHEDULING'],
'TT_UNIT_AGENT_SCHEDULING' : ['AGENT_SCHEDULING' , 'AGENT_EXECUTING_PENDING'],
'TT_UNIT_AGENT_QUEUING_EXEC': ['AGENT_EXECUTING_PENDING' , 'AGENT_EXECUTING'],
'TT_UNIT_AGENT_EXECUTING' : ['AGENT_EXECUTING' , 'AGENT_STAGING_OUTPUT_PENDING'],
'TT_OF_AGENT_QUEUING' : ['AGENT_STAGING_OUTPUT_PENDING', 'AGENT_STAGING_OUTPUT'],
'TT_OF_UMGR_SCHEDULING' : ['AGENT_STAGING_OUTPUT' , 'UMGR_STAGING_OUTPUT_PENDING'],
'TT_OF_UMGR_QUEUING' : ['UMGR_STAGING_OUTPUT_PENDING' , 'UMGR_STAGING_OUTPUT'],
'TT_OF_UMGR_TRANSFERRING' : ['UMGR_STAGING_OUTPUT' , 'DONE']}
# Calculate total unit durations for each session.
for sid in sessions.index:
s = sessions.ix[sid, 'session'].filter(etype='unit', inplace=False)
for d in udm.keys():
sessions.ix[sid, d] = s.duration(udm[d])
# Print the new columns of the session DF with total unit durations.
display(sessions[['TT_UNIT_UMGR_SCHEDULING' , 'TT_UNIT_UMGR_BINDING' , 'TT_IF_UMGR_SCHEDULING' ,
'TT_IF_UMGR_QUEING' , 'TT_IF_AGENT_SCHEDULING' , 'TT_IF_AGENT_QUEUING' ,
'TT_IF_AGENT_TRANSFERRING' , 'TT_UNIT_AGENT_QUEUING' , 'TT_UNIT_AGENT_SCHEDULING',
'TT_UNIT_AGENT_QUEUING_EXEC', 'TT_UNIT_AGENT_EXECUTING', 'TT_OF_AGENT_QUEUING' ,
'TT_OF_UMGR_SCHEDULING' , 'TT_OF_UMGR_QUEUING' , 'TT_OF_UMGR_TRANSFERRING']].head(3))
display(sessions[['TT_UNIT_UMGR_SCHEDULING' , 'TT_UNIT_UMGR_BINDING' , 'TT_IF_UMGR_SCHEDULING' ,
'TT_IF_UMGR_QUEING' , 'TT_IF_AGENT_SCHEDULING' , 'TT_IF_AGENT_QUEUING' ,
'TT_IF_AGENT_TRANSFERRING' , 'TT_UNIT_AGENT_QUEUING' , 'TT_UNIT_AGENT_SCHEDULING',
'TT_UNIT_AGENT_QUEUING_EXEC', 'TT_UNIT_AGENT_EXECUTING', 'TT_OF_AGENT_QUEUING' ,
'TT_OF_UMGR_SCHEDULING' , 'TT_OF_UMGR_QUEUING' , 'TT_OF_UMGR_TRANSFERRING']].tail(3))
In [64]:
# Add number of unique resources per session.
for sid in sessions.index:
sessions.ix[sid, 'n_unique_host'] = len(pilots[pilots['sid'] == sid]['parsed_hostID'].unique())
In [74]:
fig = plt.figure(figsize=(13,14))
title = 'XSEDE OSG Virtual Cluster'
subtitle = 'TTQ, TTR, TTX and TTC with Number of Active Pilots (black) and Number of Unique Resources (red)'
fig.suptitle('%s:\n%s.' % (title, subtitle), fontsize=16)
defs = {'ttq': 'TTQ = Total Time Queuing pilots',
'ttr': 'TTR = Total Time Running pilots',
        'ttx': 'TTX = Total Time Executing compute units',
'ttc': 'TTC = Total Time Completing experiment'}
defslist = '%s;\n%s;\n%s;\n%s.' % (defs['ttq'], defs['ttr'], defs['ttx'], defs['ttc'])
plt.figtext(.38,.89, defslist, fontsize=14, ha='left')
gs = []
grid = gridspec.GridSpec(2, 2)
grid.update(wspace=0.4, hspace=0.4, top=0.825)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[2]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 5, subplot_spec=grid[3]))
ttq_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
if not sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].empty:
ttq_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[0] ,tableau20[18],tableau20[1] ,tableau20[19]],
'exp2': [tableau20[2] ,tableau20[6] ,tableau20[3] ,tableau20[7] ],
'exp3': [tableau20[8] ,tableau20[12],tableau20[9] ,tableau20[13]],
'exp4': [tableau20[4] ,tableau20[16],tableau20[5] ,tableau20[17]],
'exp5': [tableau20[10],tableau20[14],tableau20[11],tableau20[15]]}
nun_exp = []
nun_exp.append(len(sessions[sessions['experiment'] == 'exp1']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp2']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp3']['nunits'].sort_values().unique()))
nun_exp.append(len(sessions[sessions['experiment'] == 'exp4']['nunits'].sort_values().unique()))
ax = []
i = 0
while(i < len(ttq_subplots)):
for gn in range(4):
for gc in range(nun_exp[gn]):
session = ttq_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
npilots = int(session[session['experiment'] == experiment]['npilot_active'][0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Exp. %s\n%s tasks\n%s pilots\n%s rep.' % (experiment[3], ntasks, npilots, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session[['TT_PILOT_LRMS_QUEUING',
'TT_PILOT_LRMS_RUNNING',
'TT_UNIT_AGENT_EXECUTING',
'TTC']].plot(kind='bar', ax=ax[i], color=color, title=title, stacked=True)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Runs')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7 or i == 16:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3 or i == 11:
ax[i].legend(labels=['TTQ','TTR','TTX','TTC'], bbox_to_anchor=(2.25, 1))
elif i == 16:
ax[i].legend(labels=['TTQ','TTR','TTX','TTC'], bbox_to_anchor=(2.70, 1))
else:
ax[i].get_legend().set_visible(False)
# Add labels with number of pilots per session.
rects = ax[i].patches
labels = [int(l) for l in session['npilot_active']]
for rect, label in zip(rects[-repetitions:], labels):
height = rect.get_height()
ax[i].text(rect.get_x() + rect.get_width()/2,
(height*3)+1500, label, ha='center',
va='bottom')
# Add labels with number of unique resources per session.
rects = ax[i].patches
labels = [int(l) for l in session['n_unique_host']]
for rect, label in zip(rects[-repetitions:], labels):
height = rect.get_height()
ax[i].text(rect.get_x() + rect.get_width()/2,
height*3, label, ha='center',
va='bottom', color='red')
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttq_ttr_ttx_ttc_npactive_nrunique_nunits.pdf', dpi=600, bbox_inches='tight')
The diagram confirms the similarity between the size of TTR and TTX. More analytically:
In [28]:
sessions[['TT_PILOT_LRMS_RUNNING', 'TT_UNIT_AGENT_EXECUTING']].describe()
Out[28]:
We now have to characterize the variation among the TTX of runs with the same number of CUs and between the two experiments. We plot just TTX, focusing on these variations.
In [29]:
fig = plt.figure(figsize=(13,7))
fig.suptitle('TTX with Number of Active Pilots - XSEDE OSG Virtual Cluster', fontsize=14)
gs = []
grid = gridspec.GridSpec(1, 2)
grid.update(wspace=0.4, top=0.85)
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[0]))
gs.append(gridspec.GridSpecFromSubplotSpec(1, 4, subplot_spec=grid[1]))
ttq_subplots = []
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
ttq_subplots.append(sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ].sort_values('TTC'))
colors = {'exp1': [tableau20[18]],
'exp2': [tableau20[10]]}
ax = []
i = 0
while(i<8):
for gn in range(2):
for gc in range(4):
session = ttq_subplots.pop(0)
experiment = session['experiment'].unique()[0]
ntasks = int(session['nunits'].unique()[0])
repetitions = session.shape[0]
color = colors[experiment]
title = 'Experiment %s\n%s tasks\n%s repetitions.' % (experiment[3], ntasks, repetitions)
if i == 0:
ax.append(plt.Subplot(fig, gs[gn][0, gc]))
else:
ax.append(plt.Subplot(fig, gs[gn][0, gc], sharey=ax[0]))
session[['TT_UNIT_AGENT_EXECUTING']].plot(kind='bar', ax=ax[i], color=color, title=title, stacked=True)
ax[i].spines["top"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].get_xaxis().tick_bottom()
ax[i].get_yaxis().tick_left()
ax[i].set_xticklabels([])
ax[i].set_xlabel('Sessions')
# Handle a bug that sets yticklabels to visible
# for the last subplot.
if i == 7:
plt.setp(ax[i].get_yticklabels(), visible=False)
else:
ax[i].set_ylabel('Time (s)')
            # Handle legends.
if i == 7 or i == 3:
ax[i].legend(labels=['TTX'], bbox_to_anchor=(2.25, 1))
else:
ax[i].get_legend().set_visible(False)
# Add labels with number of pilots per session.
rects = ax[i].patches
labels = [int(l) for l in session['npilot_active']]
for rect, label in zip(rects[-repetitions:], labels):
height = rect.get_height()
ax[i].text(rect.get_x() + rect.get_width()/2,
height+0.5, label, ha='center',
va='bottom')
fig.add_subplot(ax[i])
i += 1
plt.savefig('figures/osg_ttx_npactive_nunits.pdf', dpi=600, bbox_inches='tight')
We notice large variations both within and across experiments. Specifically:
In [30]:
ttx_stats = {}
for exp in sessions['experiment'].sort_values().unique():
for nun in sessions['nunits'].sort_values().unique():
tag = exp+'_'+str(int(nun))
ttx_stats[tag] = sessions[ (sessions['experiment'] == exp) &
(sessions['nunits'] == nun) ]['TT_UNIT_AGENT_EXECUTING'].describe()
ttx_compare = pd.DataFrame(ttx_stats)
sort_cols_runs = ['exp1_8' , 'exp2_8' , 'exp1_16', 'exp2_16',
'exp1_32', 'exp2_32', 'exp1_64', 'exp2_64']
sort_cols_exp = ['exp1_8' , 'exp1_16', 'exp1_32', 'exp1_64',
'exp2_8', 'exp2_16', 'exp2_32', 'exp2_64']
ttx_compare_runs = ttx_compare.reindex_axis(sort_cols_runs, axis=1)
ttx_compare_exp = ttx_compare.reindex_axis(sort_cols_exp, axis=1)
ttx_compare_exp.round(2)
Out[30]:
In [31]:
std_mean = (ttx_compare_exp.loc['std']/ttx_compare_exp.loc['mean'])*100
std_mean.round(2)
Out[31]:
Variation within Experiment 1 is between 7% and 18% of the mean, increasing almost proportionally with the number of units executed. Variation in Experiment 2 is more pronounced, ranging from 25% to 78% of the mean. Clearly, these values are not indicative.
While our analysis showed that this difference is due to the varying number of active pilots in Experiment 2, the relative uniformity of Experiment 1 may also be a byproduct of a reduced functionality of RP that limited the number of resources on which the experiment was able to run successfully. Experiment 1 failed when any of its pilots failed, while in Experiment 2 failures were limited to pilots, without affecting the overall execution.
Finally, another element that should be considered is the impact of RP overheads on the execution of CUs. While the comparison between TTR and TTX excluded major overheads outside TTX, the averages of the two parameters show that TTR grows larger than TTX proportionally to the increase in the number of CUs executed. This can be measured by looking at the degree of concurrency of CU executions across pilots.
Here we plot this degree only for 64 CUs, comparing both exp1 and exp2 runs.
In [32]:
from collections import OrderedDict
cTTXs = OrderedDict()
ncu = 64
ss = sessions[sessions['nunits'] == ncu].sort_values(['experiment','TTC'])['session']
for s in ss:
cTTXs[s._sid] = s.filter(etype='unit').concurrency(state=['AGENT_EXECUTING','AGENT_STAGING_OUTPUT_PENDING'],
sampling=1)
for sid, cTTX in cTTXs.iteritems():
title = 'Degree of Concurrent Execution of %s CUs - XSEDE OSG Virtual Cluster\nSession %s' % (ncu, sid)
x = [x[0] for x in cTTX]
y = [y[1] for y in cTTX]
color = tableau20[2]
if 'ming' in sid:
color = tableau20[0]
fig, ax = fig_setup()
fig.suptitle(title, fontsize=14)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Number of CU')
display(ax.plot(x, y, marker='.', linestyle='', color=color))
Experiment 2 shows better concurrency than Experiment 1. This might be due to the evolution of the RADICAL-Pilot code or to the more performant resources used by the runs of Experiment 2. These data show at least three elements that need further development:
Overall, our analysis confirms that the heterogeneity of OSG resources affects TTX and that TTQ has a marginal impact. The next step is to statistically characterize the effect of resource heterogeneity on TTX. We begin this characterization by looking at the time that each unit takes to execute on an OSG resource (Tx).
Each CU has the same computing requirements. Therefore, we can aggregate their analysis across experiments. Note that each unit executes on one and only one pilot, so the number of pilots that become active has no bearing on the units' Tx.
In [36]:
# Model of unit durations.
udm = {'UNIT_UMGR_SCHEDULING' : ['NEW' , 'UMGR_SCHEDULING_PENDING'],
'UNIT_UMGR_BINDING' : ['UMGR_SCHEDULING_PENDING' , 'UMGR_SCHEDULING'],
'IF_UMGR_SCHEDULING' : ['UMGR_SCHEDULING' , 'UMGR_STAGING_INPUT_PENDING'],
'IF_UMGR_QUEING' : ['UMGR_STAGING_INPUT_PENDING' , 'UMGR_STAGING_INPUT'],
#'IF_AGENT_SCHEDULING' : ['UMGR_STAGING_INPUT' , 'AGENT_STAGING_INPUT_PENDING'],
'IF_AGENT_QUEUING' : ['AGENT_STAGING_INPUT_PENDING' , 'AGENT_STAGING_INPUT'],
'IF_AGENT_TRANSFERRING' : ['AGENT_STAGING_INPUT' , 'AGENT_SCHEDULING_PENDING'],
'UNIT_AGENT_QUEUING' : ['AGENT_SCHEDULING_PENDING' , 'AGENT_SCHEDULING'],
'UNIT_AGENT_SCHEDULING' : ['AGENT_SCHEDULING' , 'AGENT_EXECUTING_PENDING'],
'UNIT_AGENT_QUEUING_EXEC': ['AGENT_EXECUTING_PENDING' , 'AGENT_EXECUTING'],
'UNIT_AGENT_EXECUTING' : ['AGENT_EXECUTING' , 'AGENT_STAGING_OUTPUT_PENDING'],
#'OF_AGENT_QUEUING' : ['AGENT_STAGING_OUTPUT_PENDING', 'AGENT_STAGING_OUTPUT'],
#'OF_UMGR_SCHEDULING' : ['AGENT_STAGING_OUTPUT' , 'UMGR_STAGING_OUTPUT_PENDING'],
'OF_UMGR_QUEUING' : ['UMGR_STAGING_OUTPUT_PENDING' , 'UMGR_STAGING_OUTPUT'],
'OF_UMGR_TRANSFERRING' : ['UMGR_STAGING_OUTPUT' , 'DONE']}
# DataFrame structure for pilot durations.
uds = { 'pid': [],
'sid': [],
'experiment' : [],
'UNIT_UMGR_SCHEDULING' : [],
'UNIT_UMGR_BINDING' : [],
'IF_UMGR_SCHEDULING' : [],
'IF_UMGR_QUEING' : [],
'IF_AGENT_SCHEDULING' : [],
'IF_AGENT_QUEUING' : [],
'IF_AGENT_TRANSFERRING' : [],
'UNIT_AGENT_QUEUING' : [],
'UNIT_AGENT_SCHEDULING' : [],
'UNIT_AGENT_QUEUING_EXEC': [],
'UNIT_AGENT_EXECUTING' : [],
'OF_AGENT_QUEUING' : [],
'OF_UMGR_SCHEDULING' : [],
'OF_UMGR_QUEUING' : [],
'OF_UMGR_TRANSFERRING' : []}
# Calculate the duration for each state of each
# pilot of each run and Populate the DataFrame
# structure.
for sid in sessions[['session', 'experiment']].index:
s = sessions.ix[sid, 'session'].filter(etype='unit', inplace=False)
for u in s.list('uid'):
sf = s.filter(uid=u, inplace=False)
uds['pid'].append(u)
uds['sid'].append(sid)
uds['experiment'].append(sessions.ix[sid, 'experiment'])
for d in udm.keys():
if (not sf.timestamps(state=udm[d][0]) or
not sf.timestamps(state=udm[d][1])):
                uds[d].append(None)
print udm[d]
continue
uds[d].append(sf.duration(udm[d]))
# Populate the DataFrame. Some duration lists may be shorter than
# others (missing timestamps), so convert each to a Series first.
units = pd.DataFrame(dict([(k,pd.Series(v)) for k,v in uds.iteritems()]))
display(units.head(3))
display(units.tail(3))
Note the NaN values for IF_AGENT_SCHEDULING and OF_UMGR_SCHEDULING. The timestamps of these states are broken in the RADICAL-Pilot branch used for these experiments. Without RADICAL-Analytics, it would be difficult to spot and/or understand the error.
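A quick count of the missing values per duration makes such broken timestamps easy to spot (a sketch):
In [ ]:
# Missing values for each per-unit duration: columns that are
# entirely NaN point at states whose timestamps are broken.
display(units.isnull().sum())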
We now have two sets of measures that we can use to do some statistics.
In [37]:
def measures_of_center(durations):
    m = {}
    # Mean value of the data.
    m['mu'] = np.mean(durations)
    # Standard error of the mean: quantifies how
    # precisely we know the true mean of the
    # population. It takes into account both the
    # value of the SD and the sample size. SEM
    # gets smaller as the samples get larger: the
    # precision of the mean grows with the sample
    # size.
    m['sem'] = sps.sem(durations)
    # Are there extremes in our dataset? Compare
    # the median to the mean.
    m['median'] = np.median(durations)
    # What value occurs most often?
    m['mode'] = sps.mstats.mode(durations)
    return m
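# Illustrative check on synthetic data (a hedged aside,
# not data from the runs): SEM shrinks roughly as
# 1/sqrt(n) while the SD stays approximately constant.
for n in (10, 100, 1000):
    d = np.random.normal(0, 1, n)
    print n, round(np.std(d), 3), round(sps.sem(d), 3)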
Txs = units['UNIT_AGENT_EXECUTING']
Txs_exp1 = units[ units['experiment'] == 'exp1']['UNIT_AGENT_EXECUTING']
Txs_exp2 = units[ units['experiment'] == 'exp2']['UNIT_AGENT_EXECUTING']
Txs = sorted(Txs)
Txs_exp1 = sorted(Txs_exp1)
Txs_exp2 = sorted(Txs_exp2)
Tx_measures = measures_of_center(Txs)
Tx_measures_exp1 = measures_of_center(Txs_exp1)
Tx_measures_exp2 = measures_of_center(Txs_exp2)
print 'Tx'
pprint.pprint(Tx_measures)
print '\nTx_exp1'
pprint.pprint(Tx_measures_exp1)
print '\nTx_exp2'
pprint.pprint(Tx_measures_exp2)
In [38]:
def measures_of_spread(durations):
    m = {}
    # Distance between the two most extreme values.
    m['range'] = max(durations) - min(durations)
    # Quartiles and interquartile range (IQR), which
    # is robust against outliers, unlike the range.
    m['min'], m['q1'], m['q2'], m['q3'], m['max'] = np.percentile(durations, [0, 25, 50, 75, 100])
    m['iqr'] = m['q3'] - m['q1']
    m['var'] = np.var(durations)
    m['std'] = np.std(durations)
    # Median absolute deviation: a robust analogue
    # of the standard deviation.
    m['mad'] = sm.robust.scale.mad(durations)
    return m
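# Illustrative robustness check on synthetic data (a
# hedged aside, not data from the runs): a single
# outlier inflates the SD far more than the MAD.
d_out = range(100) + [10000]
print 'std: %.2f, mad: %.2f' % (np.std(d_out), sm.robust.scale.mad(d_out))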
print "Tx"
pprint.pprint(measures_of_spread(Txs))
print "\nTx exp1"
pprint.pprint(measures_of_spread(Txs_exp1))
print "\nTx exp2"
pprint.pprint(measures_of_spread(Txs_exp2))
plots = [Txs, Txs_exp1, Txs_exp2]
fig, ax = fig_setup()
fig.suptitle('Distribution of Tx for Experiment 1 and 2', fontsize=14)
ax.set_ylabel('Time (s)')
bp = ax.boxplot(plots, labels=['Tx', 'Tx exp1', 'Tx exp2'])#, showmeans=True, showcaps=True)
bp['boxes'][0].set( color=tableau20[8] )
bp['boxes'][1].set( color=tableau20[0] )
bp['boxes'][2].set( color=tableau20[2] )
plt.savefig('figures/osg_cu_spread_box.pdf', dpi=600, bbox_inches='tight')
# - Mann-Whitney-Wilcoxon (MWW) RankSum test: determine
#   whether two distributions are significantly
#   different or not. Unlike the t-test, the RankSum
#   test does not assume that the data are normally
#   distributed. A small p-value indicates that the two
#   samples are unlikely to come from the same
#   distribution. The test needs two samples, so we
#   compare Tx against a sample drawn from its fitted
#   normal distribution.
Txs_norm = np.random.normal(Tx_measures['mu'], Tx_measures['std'], len(Txs))
z_stat, p_val = sps.ranksums(Txs, Txs_norm)
print 'Tx vs fitted normal: z = %.3f, p = %.3f' % (z_stat, p_val)
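# The same test applies to the question at hand: do the
# two experiments draw Tx from the same distribution?
# (The 0.05 threshold below is a conventional choice,
# not part of the original analysis.)
z_12, p_12 = sps.ranksums(Txs_exp1, Txs_exp2)
print 'exp1 vs exp2: z = %.3f, p = %.3f' % (z_12, p_12)
if p_12 < 0.05:
    print 'the two Tx distributions are likely to differ'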
In [ ]:
Tx_measures['skew'] = sps.skew(Txs, bias=True)
Tx_measures['kurt'] = sps.kurtosis(Txs)
u_skew_test = sps.skewtest(Txs)
u_kurt_test = sps.kurtosistest(Txs)
print Tx_measures['skew']
print Tx_measures['kurt']
print u_skew_test
print u_kurt_test
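# Hedged reading aid (the 0.05 threshold is a
# conventional choice, not part of the original
# analysis): the null hypothesis of skewtest and
# kurtosistest is that the sample comes from a normal
# distribution.
if u_skew_test[1] < 0.05 or u_kurt_test[1] < 0.05:
    print 'Tx is unlikely to be normally distributed'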
metric = 'T_x'
description = 'Histogram of $%s$' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures['mu'], Tx_measures['std'], Tx_measures['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, patches = ax.hist(Txs, bins='fd',
normed=1,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[8],
color=tableau20[9])
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
# ax.set_xlim(150, 700)
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_cu_spread_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
Tx_measures_exp1['skew'] = sps.skew(Txs_exp1, bias=True)
Tx_measures_exp1['kurt'] = sps.kurtosis(Txs_exp1)
u_skew_test = sps.skewtest(Txs_exp1)
u_kurt_test = sps.kurtosistest(Txs_exp1)
print Tx_measures_exp1['skew']
print Tx_measures_exp1['kurt']
print u_skew_test
print u_kurt_test
metric = 'T_x Experiment 1'
description = 'Histogram of $%s$' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp1)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp1['mu'], Tx_measures_exp1['std'], Tx_measures_exp1['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, patches = ax.hist(Txs_exp1, bins='fd',
normed=1,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[0],
color=tableau20[1])
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
# ax.set_xlim(150, 700)
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_exp1_cu_spread_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
Tx_measures_exp2['skew'] = sps.skew(Txs_exp2, bias=True)
Tx_measures_exp2['kurt'] = sps.kurtosis(Txs_exp2)
u_skew_test = sps.skewtest(Txs_exp2)
u_kurt_test = sps.kurtosistest(Txs_exp2)
print Tx_measures_exp2['skew']
print Tx_measures_exp2['kurt']
print u_skew_test
print u_kurt_test
metric = 'T_x Experiment 2'
description = 'Histogram of $%s$' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp2)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp2['mu'], Tx_measures_exp2['std'], Tx_measures_exp2['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, patches = ax.hist(Txs_exp2, bins='fd',
normed=1,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[2],
color=tableau20[3])
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
# ax.set_xlim(150, 700)
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_exp2_cu_spread_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
# - Fit to the normal distribution: fit the empirical
# distribution to the normal for comparison purposes.
(f_mu, f_sigma) = sps.norm.fit(Txs)
(f_mu_exp1, f_sigma_exp1) = sps.norm.fit(Txs_exp1)
(f_mu_exp2, f_sigma_exp2) = sps.norm.fit(Txs_exp2)
# sample_pdf = np.linspace(min(Txs),max(Txs), len(Txs))
sample_pdf = np.linspace(0,max(Txs), len(Txs))
sample_pdf_exp1 = np.linspace(0,max(Txs_exp1), len(Txs_exp1))
sample_pdf_exp2 = np.linspace(0,max(Txs_exp2), len(Txs_exp2))
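# Goodness of fit (a hedged aside, not part of the
# original analysis): Kolmogorov-Smirnov test of Tx
# against the fitted normal. Note that fitting and
# testing on the same data makes the p-value optimistic.
D, ks_p = sps.kstest(Txs, 'norm', args=(f_mu, f_sigma))
print 'KS Tx vs fitted normal: D = %.3f, p = %.3f' % (D, ks_p)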
In [ ]:
metric = 'T_x'
description = 'Histogram of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures['mu'], Tx_measures['std'], Tx_measures['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs, bins='fd',
normed=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[8],
color=tableau20[9])
pdf = mlab.normpdf(sample_pdf, f_mu, f_sigma)
print min(pdf)
print max(pdf)
ax.plot(sample_pdf,
pdf,
label="$\phi$",
color=tableau20[6])
# ax.fill_between(bins,
# sample_pdf,
# color=tableau20[1],
# alpha=0.25)
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
ax.set_xlim(min(sample_pdf), max(sample_pdf))
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_cu_spread_pdf.pdf', dpi=600, bbox_inches='tight')
In [ ]:
metric = 'T_x Experiment 1'
description = 'Histogram of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp1)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp1['mu'],
                                                       Tx_measures_exp1['std'],
                                                       Tx_measures_exp1['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs_exp1, bins='fd',
normed=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[0],
color=tableau20[1])
pdf_exp1 = mlab.normpdf(sample_pdf_exp1, f_mu_exp1, f_sigma_exp1)
print min(pdf_exp1)
print max(pdf_exp1)
ax.plot(sample_pdf_exp1,
pdf_exp1,
label="$\phi$",
color=tableau20[6])
# ax.fill_between(bins,
# sample_pdf,
# color=tableau20[1],
# alpha=0.25)
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
ax.set_xlim(min(sample_pdf_exp1), max(sample_pdf_exp1))
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_exp1_cu_spread_pdf.pdf', dpi=600, bbox_inches='tight')
In [ ]:
metric = 'T_x Experiment 2'
description = 'Histogram of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp2)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp2['mu'],
                                                       Tx_measures_exp2['std'],
                                                       Tx_measures_exp2['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs_exp2, bins='fd',
normed=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[2],
color=tableau20[3])
pdf_exp2 = mlab.normpdf(sample_pdf_exp2, f_mu_exp2, f_sigma_exp2)
print min(pdf_exp2)
print max(pdf_exp2)
ax.plot(sample_pdf_exp2,
pdf_exp2,
label="$\phi$",
color=tableau20[6])
# ax.fill_between(bins,
# sample_pdf,
# color=tableau20[1],
# alpha=0.25)
# ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
ax.set_xlim(min(sample_pdf_exp2), max(sample_pdf_exp2))
ax.set_ylim(0.0, 0.005)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title)#, y=1.05)
plt.legend(loc='upper right')
plt.savefig('figures/osg_exp2_cu_spread_pdf.pdf', dpi=600, bbox_inches='tight')
In [ ]:
# Sample drawn from the fitted normal distribution; it
# serves as the reference for the cumulative histogram.
sample_pdf = np.random.normal(loc=f_mu, scale=f_sigma, size=len(Txs))
metric = 'T_x'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures['mu'], Tx_measures['std'], Tx_measures['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[8],
color=tableau20[9],
alpha=0.75)
ax.hist(sample_pdf,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$cmd$",
edgecolor=tableau20[6],
color=tableau20[7],
alpha=0.25)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_cu_cumulative_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
# Sample drawn from the fitted normal distribution; it
# serves as the reference for the cumulative histogram.
sample_pdf_exp1 = np.random.normal(loc=f_mu_exp1, scale=f_sigma_exp1, size=len(Txs_exp1))
metric = 'T_x Experiment 1'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp1)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp1['mu'],
                                                       Tx_measures_exp1['std'],
                                                       Tx_measures_exp1['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs_exp1,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[0],
color=tableau20[1],
alpha=0.75)
ax.hist(sample_pdf_exp1,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$cmd$",
edgecolor=tableau20[6],
color=tableau20[7],
alpha=0.25)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_exp1_cu_cumulative_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
# Sample drawn from the fitted normal distribution; it
# serves as the reference for the cumulative histogram.
sample_pdf_exp2 = np.random.normal(loc=f_mu_exp2, scale=f_sigma_exp2, size=len(Txs_exp2))
metric = 'T_x Experiment 2'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp2)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp2['mu'],
                                                       Tx_measures_exp2['std'],
                                                       Tx_measures_exp2['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
n, bins, p = ax.hist(Txs_exp2,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$T_x$",
linewidth=0.75,
edgecolor=tableau20[2],
color=tableau20[3],
alpha=0.75)
ax.hist(sample_pdf_exp2,
bins='fd',
normed=True,
cumulative=True,
histtype='stepfilled',
label="$cmd$",
edgecolor=tableau20[6],
color=tableau20[7],
alpha=0.25)
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_exp2_cu_cumulative_hist.pdf', dpi=600, bbox_inches='tight')
In [ ]:
Txs_np = np.array(Txs)
# Cumulative samples
Txs_sum = np.cumsum(np.ones(Txs_np.shape))/len(Txs)
# Values for analytical cdf
sample_cdf = np.linspace(0,max(Txs), len(Txs))
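# Aside (not part of the original cell): statsmodels
# ships an ECDF helper that avoids the manual cumsum;
# ecdf(x) returns the fraction of samples <= x, which is
# essentially the step function plotted below.
ecdf = sm.distributions.ECDF(Txs)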
metric = 'T_x'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures['mu'], Tx_measures['std'], Tx_measures['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
ax.plot(sample_cdf,
sps.norm.cdf(sample_cdf, f_mu, f_sigma),
label="cdf",
color=tableau20[6])
ax.step(Txs,
Txs_sum,
label="$T_x$",
where='post',
color=tableau20[8])
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_cu_cumulative_plot.pdf', dpi=600, bbox_inches='tight')
In [ ]:
Txs_np_exp1 = np.array(Txs_exp1)
# Cumulative samples
Txs_sum_exp1 = np.cumsum(np.ones(Txs_np_exp1.shape))/len(Txs_exp1)
# Values for analytical cdf
sample_cdf_exp1 = np.linspace(0,max(Txs_exp1), len(Txs_exp1))
metric = 'T_x Experiment 1'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp1)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp1['mu'],
                                                       Tx_measures_exp1['std'],
                                                       Tx_measures_exp1['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
ax.plot(sample_cdf_exp1,
sps.norm.cdf(sample_cdf_exp1, f_mu_exp1, f_sigma_exp1),
label="cdf",
color=tableau20[6])
ax.step(Txs_exp1,
Txs_sum_exp1,
label="$T_x$",
where='post',
color=tableau20[0])
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_exp1_cu_cumulative_plot.pdf', dpi=600, bbox_inches='tight')
In [ ]:
Txs_np_exp2 = np.array(Txs_exp2)
# Cumulative samples
Txs_sum_exp2 = np.cumsum(np.ones(Txs_np_exp2.shape))/len(Txs_exp2)
# Values for analytical cdf
sample_cdf_exp2 = np.linspace(0,max(Txs_exp2), len(Txs_exp2))
metric = 'T_x Experiment 2'
description = 'Cumulative distribution of $%s$ compared to its fitted normal distribution' % metric
task = 'Gromacs emulation'
repetition = '$%s$ repetitions' % len(Txs_exp2)
resource = 'XSEDE OSG Virtual Cluster'
stats = r'$\mu=%.3f,\ \sigma=%.3f,\ SE_{\mu}=%.3f$' % (Tx_measures_exp2['mu'],
                                                       Tx_measures_exp2['std'],
                                                       Tx_measures_exp2['sem'])
title = '%s.\n%s; %s;\n%s.' % (description, repetition, resource, stats)
fig, ax = fig_setup()
ax.plot(sample_cdf_exp2,
sps.norm.cdf(sample_cdf_exp2, f_mu_exp2, f_sigma_exp2),
label="cdf",
color=tableau20[6])
ax.step(Txs_exp2,
Txs_sum_exp2,
label="$T_x$",
where='post',
color=tableau20[2])
plt.ylabel('$P(T_x)$')
plt.xlabel('$T_x$ (s)')
plt.title(title, y=1.05)
plt.legend(loc='upper left')
plt.savefig('figures/osg_exp2_cu_cumulative_plot.pdf', dpi=600, bbox_inches='tight')