This notebook presents further evaluations of the criticality results for Bangladesh's multimodal freight transport network. It examines the (dis)similarities between the metrics: which pairs of criticality metrics overlap (highlight the same set of links in the network as critical) and which are complementary (highlight different sets of links as critical). The Kolmogorov-Smirnov distance and correlation coefficients are used for this purpose.
This notebook contains five analyses.
In [1]:
%matplotlib inline
from __future__ import division

import os
import sys

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.colors as colors
import seaborn as sns

#make the project root importable
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

#modules developed for this project
from transport_network_modeling import network_visualization as net_v
from transport_network_modeling import criticality as crit
In [2]:
#import criticality results
crit_df_loc = r'./criticality_results/result_interdiction_1107noz2_v03.csv'
crit_df = pd.read_csv(crit_df_loc)
In [3]:
#remove one strange value in m2_02
crit_df['m2_02'] = crit_df.m2_02.apply(lambda val: 0 if val < 1.39e-10 else val)
In [4]:
#record the metric names
metric_names = {'m01_01' : ['Change in unweighted daily accessibility',
'Topological', 'Accessibility', 'Network-wide'],
'm01_02' : ['Change in number of nodes accessible within daily reach',
'Topological', 'Accessibility', 'Network-wide'],
'm02_01' : ['Change in unweighted total travel cost',
'Topological', 'Total travel cost', 'Network-wide'],
'm02_02' : ['Change in network average efficiency',
'Topological', 'Total travel cost', 'Network-wide'],
'm03_01' : ['Unweighted link betweenness centrality',
'Topological', 'Total travel cost', 'Localized'],
'm03_02' : ['Change in region-based unweighted total travel cost',
'Topological', 'Total travel cost', 'Localized'],
'm04_01' : ['Minimum link cut centrality',
'Topological', 'Connectivity', 'Network-wide'],
'm04_02' : ['OD k-connectivity',
'Topological', 'Connectivity', 'Network-wide'],
'm05_01' : ['Nearby alternative links (simplified)',
'Topological', 'Connectivity', 'Localized'],
'm06_01' : ['Change in weighted accessibility',
'System-based', 'Accessibility', 'Network-wide'],
'm07_01' : ['Change in weighted total travel cost',
'System-based', 'Total travel cost', 'Network-wide'],
'm07_02' : ['Change in expected user exposure',
'System-based', 'Total travel cost', 'Network-wide'],
'm07_03' : ['Change in worst-case user exposure',
'System-based', 'Total travel cost', 'Network-wide'],
'm08_01' : ['Traffic flow data',
'System-based', 'Total travel cost', 'Localized'],
'm08_02' : ['Weighted link betweenness centrality',
'System-based', 'Total travel cost', 'Localized'],
'm08_03' : ['Volume over capacity',
'System-based', 'Total travel cost', 'Localized'],
'm09_01' : ['Unsatisfied demand',
'System-based', 'Connectivity', 'Network-wide'],
'm10' : ['Exposure to disaster',
'System-based', 'Connectivity', 'Localized']}
In [5]:
# Show the metrics and their associated code
metric_names_df = pd.DataFrame()
metric_names_df['Code'] = metric_names.keys()
metric_names_df['Description'] = [x[0] for x in metric_names.values()]
metric_names_df['Layer I (Paradigm)'] = [x[1] for x in metric_names.values()]
metric_names_df['Layer II (Functionality)'] = [x[2] for x in metric_names.values()]
metric_names_df['Layer III (Aggregation)'] = [x[3] for x in metric_names.values()]
metric_names_df.sort_values(by='Code', inplace=True)
metric_names_df.index = range(len(metric_names_df))
print('Metrics Code and Description')
metric_names_df
Out[5]:
Before analysing the (dis)similarities between the metrics, it is interesting to observe the distribution pattern of the criticality scores from each metric. In this way, we can see whether a few links concentrate the high criticality scores, or whether the scores are normally distributed across all links in the network.
This section explores the distribution pattern of the criticality scores from the top 100 links in each criticality metric.
In [6]:
#subset the result dataframe to only the relevant columns
crit_df2 = crit_df[['osmid','m1_01', 'm1_02', 'm2_01', 'm2_02', 'm3_01', 'm3_02', 'm4_01', 'm4_02', 'm5_01', 'm6_01',
'm7_01', 'm7_02', 'm7_03', 'm8_01', 'm8_02', 'm8_03', 'm9_01', 'm10']]
#rename the metrics, adding '0' before the number
crit_df2.columns = ['osmid','m01_01', 'm01_02', 'm02_01', 'm02_02', 'm03_01', 'm03_02', 'm04_01', 'm04_02', 'm05_01', 'm06_01',
'm07_01', 'm07_02', 'm07_03', 'm08_01', 'm08_02', 'm08_03', 'm09_01', 'm10']
#record the name of each metric
all_metric = sorted(metric_names.keys())
In [7]:
#alter the m5_01 criticality scores so that they are consistent with the formula described in the report
crit_df2['m05_01'] = crit_df2.m05_01.apply(lambda x: 1/x if x > 0 else 2)
In [8]:
# Visualize the distribution of each metric's scores
print("Individual metric scores' distribution")
sns.set_style('white')
fig1 = plt.figure(figsize=(24, 60))
n = 100
for num, metric in enumerate(all_metric):
    new_df = crit_df2[[metric, 'osmid']]
    new_df = new_df.loc[new_df[metric] != 0]
    topn_list = []
    #take only the top 100 links
    try:
        topn_list.extend(list(new_df.sort_values(metric, ascending=False).osmid[:n]))
    except:
        topn_list.extend(list(new_df.sort_values(metric).osmid))
    new_df = new_df.loc[new_df['osmid'].isin(topn_list)]
    #add_subplot returns the axis directly, so exec() is unnecessary
    ax = fig1.add_subplot(12, 5, num + 1)
    b = sns.distplot(new_df[metric], kde=False, rug=False, ax=ax)
    b.set_xlabel(metric, fontsize=24)
fig1.tight_layout()
plt.show()
In order to select the appropriate correlation technique, the distribution of each metric pair should be observed. If both metrics follow a normal distribution, the Pearson correlation coefficient can be used; otherwise, the Spearman rank correlation coefficient should be used.
This section visualizes the comparisons of criticality scores distributions between all metrics.
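The decision rule above can be sketched with SciPy's Shapiro-Wilk normality test. This is a minimal illustration, not part of the project modules: `pick_coefficient` and the sample data are assumptions made here for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for a metric's score distribution: heavy right tail,
# like most criticality metrics in this notebook (illustrative data only)
skewed_scores = rng.lognormal(size=200)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal
_, p_skewed = stats.shapiro(skewed_scores)

def pick_coefficient(p_value, alpha=0.05):
    """If normality is rejected, fall back to the Spearman rank coefficient."""
    return 'pearson' if p_value > alpha else 'spearman'

print(pick_coefficient(p_skewed))   # lognormal data: normality rejected -> 'spearman'
```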
First, create a DataFrame of union of top 100 critical links from all metrics
In [9]:
n = 100
topn_list = []
for metric in all_metric:
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except:
        topn_list.extend(list(new_data.sort_values(metric).osmid))
topn_list = list(set(topn_list))
#.ix is deprecated; select all columns except osmid with .loc
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]
Next, visualize the data
In [10]:
net_v.overlap_distribution(crit_df=crit_df2, all_metric=all_metric)
As seen above, most of the metric pairs are not normally distributed. Therefore, the Spearman rank correlation coefficient should be used.
First, create a DataFrame of union of top 100 critical links from all metrics
In [11]:
n = 100
topn_list = []
for metric in all_metric:
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except:
        topn_list.extend(list(new_data.sort_values(metric).osmid))
topn_list = list(set(topn_list))
#.ix is deprecated; select all columns except osmid with .loc
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]
Then calculate the K-S distance
In [12]:
#normalize between 0 and 1,
#because the ks_2samp function assumes the two datasets share a similar scale of values
for metric in all_metric:
    minval = crit_df2[metric].min()
    maxval = crit_df2[metric].max()
    rang = maxval - minval
    crit_df2[metric] = crit_df2[metric].apply(lambda val: (val - minval) / rang)
In [13]:
ks_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
for index, rows1 in ks_df.iterrows():
    for value, rows2 in rows1.items():
        D, p = crit.correlate_metrics_ks(df=crit_df2, m_a=index, m_b=value)
        #set_value is deprecated; .at is the scalar setter
        ks_df.at[index, value] = D
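`crit.correlate_metrics_ks` is a project helper; presumably it wraps the two-sample Kolmogorov-Smirnov test from `scipy.stats.ks_2samp` on two score columns. A minimal sketch under that assumption (the `ks_distance` helper and the toy data are illustrative, not the project's implementation):

```python
import pandas as pd
from scipy import stats

def ks_distance(df, m_a, m_b):
    """K-S statistic D between the score distributions of two metrics.

    Assumes both columns are already normalized to [0, 1], as done above,
    since the two-sample K-S test compares raw value distributions.
    """
    D, p = stats.ks_2samp(df[m_a].dropna(), df[m_b].dropna())
    return D, p

# Toy data: identical columns give D = 0; disjoint ranges give D = 1
toy = pd.DataFrame({'a': [0.1, 0.2, 0.3, 0.4],
                    'b': [0.1, 0.2, 0.3, 0.4],
                    'c': [0.6, 0.7, 0.8, 0.9]})
D_same, _ = ks_distance(toy, 'a', 'b')   # D_same == 0.0
D_diff, _ = ks_distance(toy, 'a', 'c')   # D_diff == 1.0
```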
Finally, visualize it in a heatmap
In [14]:
net_v.correlation_plot(ks_df, title='K-S Distance between metrics')
In order to observe the (dis)similarities between metrics, Spearman rank correlation coefficients are used. If two metrics have a high correlation coefficient, they can be considered overlapping, as they highlight the same transport segments as critical; one of them is then a candidate for elimination.
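The Spearman coefficient compares rankings rather than raw values, which is why it suits the non-normal score distributions seen above. A minimal sketch using `scipy.stats.spearmanr` (the `spearman_between` helper and toy data are illustrative assumptions, not the project's `crit.correlate_metrics_spearman`):

```python
import pandas as pd
from scipy import stats

def spearman_between(df, m_a, m_b):
    """Spearman rank correlation between two metric columns."""
    r, p = stats.spearmanr(df[m_a], df[m_b])
    return r, p

# Monotonically related scores correlate perfectly in rank,
# even when the relationship is nonlinear
toy = pd.DataFrame({'m_x': [1, 2, 3, 4, 5],
                    'm_y': [1, 4, 9, 16, 25],   # x squared: same ranking
                    'm_z': [5, 4, 3, 2, 1]})    # reversed ranking
r_xy, _ = spearman_between(toy, 'm_x', 'm_y')   # r_xy ~  1.0
r_xz, _ = spearman_between(toy, 'm_x', 'm_z')   # r_xz ~ -1.0
```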
First, create a DataFrame of union of top 100 critical links from all metrics
In [15]:
n = 100
topn_list = []
for metric in all_metric:
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except:
        topn_list.extend(list(new_data.sort_values(metric).osmid))
topn_list = list(set(topn_list))
#.ix is deprecated; select all columns except osmid with .loc
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]
Then calculate the Spearman rank correlation coefficients between all metrics
In [16]:
spearmanr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
for index, rows1 in spearmanr_df.iterrows():
    for value, rows2 in rows1.items():
        r, p, n = crit.correlate_metrics_spearman(df=crit_df2, m_a=index, m_b=value)
        #set_value is deprecated; .at is the scalar setter
        spearmanr_df.at[index, value] = r
Lastly, visualize it in a heatmap
In [17]:
net_v.correlation_plot(spearmanr_df, title='Spearman Rank Correlation')
The heatmap above shows that the following metrics have high correlation coefficients with the other metrics in general; each is listed with its most correlated counterpart(s):
- m01_01 - m01_02
- m01_02 - m01_01
- m02_01 - m02_02
- m02_02 - m02_01
- m03_01 - m04_02
- m04_02 - m03_01
- m06_01 - m07_01
- m07_01 - m06_01 and m02_01
- m07_02 - m06_01 and m02_01
- m07_03 - m06_01
- m08_02 - m03_01
Therefore we reduce the set of highly correlated metrics to only:
- m01_01
- m02_01
- m04_02
- m06_01
- m08_02
This means we leave out:
- m01_02
- m02_02
- m03_01
- m07_01
- m07_02
- m07_03
In [18]:
n = 100
topn_list = []
for metric in all_metric:
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except:
        topn_list.extend(list(new_data.sort_values(metric).osmid))
topn_list = list(set(topn_list))
#.ix is deprecated; select all columns except osmid with .loc
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']
#keep only the reduced (less correlated) metric set
data2 = data2[['m01_01', 'm02_01', 'm03_02', 'm04_01', 'm04_02', 'm05_01', 'm06_01',
               'm08_01', 'm08_02', 'm08_03', 'm09_01', 'm10']]
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]
spearmanr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
pearsonr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
for index, rows1 in spearmanr_df.iterrows():
    for value, rows2 in rows1.items():
        r, p, n = crit.correlate_metrics_spearman(df=crit_df2, m_a=index, m_b=value)
        spearmanr_df.at[index, value] = r
In [19]:
net_v.correlation_plot(spearmanr_df, title='Spearman Rank Correlation')