Evaluation of Transport Network Criticality Results

Bramka Arga Jafino

Delft University of Technology

Faculty of Technology, Policy and Management

An introduction note

This notebook further evaluates the criticality results for Bangladesh's multimodal freight transport network. The evaluation examines the (dis)similarities between the metrics: which pairs of criticality metrics overlap (highlighting the same set of links in the network as critical) and which are complementary (highlighting different sets of links as critical). The Kolmogorov-Smirnov distance and correlation coefficients are used for this purpose.

There are five analyses in this notebook:

  1. A01 Individual assessment of top 100 links for each metric
  2. A02 Overlapping density between metrics
  3. A03 Kolmogorov-Smirnov distance between metrics
  4. A04 Spearman rank correlation coefficients between metrics
  5. A05 Spearman rank correlation coefficients between final metrics set

0. Import all required modules and files

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.pylab import *
import matplotlib.colors as colors
import seaborn as sns
from __future__ import division

import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

#Modules developed for this project
from transport_network_modeling import network_visualization as net_v
from transport_network_modeling import criticality as crit

In [2]:
#import criticality results
crit_df_loc = r'./criticality_results/result_interdiction_1107noz2_v03.csv'
crit_df = pd.read_csv(crit_df_loc)

In [3]:
#remove one strange value in m2_02
crit_df['m2_02'] = crit_df.m2_02.apply(lambda val: 0 if val < 1.39e-10 else val)

In [4]:
#record the metric names
metric_names = {'m01_01' : ['Change in unweighted daily accessibility', 
                            'Topological', 'Accessibility', 'Network-wide'], 
                'm01_02' : ['Change in number of nodes accessible within daily reach', 
                            'Topological', 'Accessibility', 'Network-wide'], 
                'm02_01' : ['Change in unweighted total travel cost', 
                           'Topological', 'Total travel cost', 'Network-wide'], 
                'm02_02' : ['Change in network average efficiency', 
                            'Topological', 'Total travel cost', 'Network-wide'], 
                'm03_01' : ['Unweighted link betweenness centrality', 
                            'Topological', 'Total travel cost', 'Localized'], 
                'm03_02' : ['Change in region-based unweighted total travel cost', 
                           'Topological', 'Total travel cost', 'Localized'], 
                'm04_01' : ['Minimum link cut centrality', 
                            'Topological', 'Connectivity', 'Network-wide'], 
                'm04_02' : ['OD k-connectivity', 
                            'Topological', 'Connectivity', 'Network-wide'], 
                'm05_01' : ['Nearby alternative links (simplified)', 
                            'Topological', 'Connectivity', 'Localized'], 
                'm06_01' : ['Change in weighted accessibility', 
                            'System-based', 'Accessibility', 'Network-wide'],
                'm07_01' : ['Change in weighted total travel cost', 
                            'System-based', 'Total travel cost', 'Network-wide'], 
                'm07_02' : ['Change in expected user exposure', 
                            'System-based', 'Total travel cost', 'Network-wide'], 
                'm07_03' : ['Change in worst-case user exposure', 
                            'System-based', 'Total travel cost', 'Network-wide'], 
                'm08_01' : ['Traffic flow data', 
                            'System-based', 'Total travel cost', 'Localized'], 
                'm08_02' : ['Weighted link betweenness centrality', 
                            'System-based', 'Total travel cost', 'Localized'], 
                'm08_03' : ['Volume over capacity', 
                            'System-based', 'Total travel cost', 'Localized'], 
                'm09_01' : ['Unsatisfied demand',  
                            'System-based', 'Connectivity', 'Network-wide'], 
                'm10' : ['Exposure to disaster', 
                         'System-based', 'Connectivity', 'Localized']}

In [5]:
# Show the metrics and their associated code
metric_names_df = pd.DataFrame()
metric_names_df['Code'] = metric_names.keys()
metric_names_df['Description'] = [x[0] for x in metric_names.values()]
metric_names_df['Layer I (Paradigm)'] = [x[1] for x in metric_names.values()]
metric_names_df['Layer II (Functionality)'] = [x[2] for x in metric_names.values()]
metric_names_df['Layer III (Aggregation)'] = [x[3] for x in metric_names.values()]
metric_names_df.sort_values(by='Code', inplace=True)
metric_names_df.index = range(len(metric_names_df))

print('Metrics Code and Description')
metric_names_df

Metrics Code and Description
    Code    Description                                         Layer I (Paradigm)  Layer II (Functionality)  Layer III (Aggregation)
0   m01_01  Change in unweighted daily accessibility            Topological         Accessibility             Network-wide
1   m01_02  Change in number of nodes accessible within da...   Topological         Accessibility             Network-wide
2   m02_01  Change in unweighted total travel cost              Topological         Total travel cost         Network-wide
3   m02_02  Change in network average efficiency                Topological         Total travel cost         Network-wide
4   m03_01  Unweighted link betweenness centrality              Topological         Total travel cost         Localized
5   m03_02  Change in region-based unweighted total travel...   Topological         Total travel cost         Localized
6   m04_01  Minimum link cut centrality                         Topological         Connectivity              Network-wide
7   m04_02  OD k-connectivity                                   Topological         Connectivity              Network-wide
8   m05_01  Nearby alternative links (simplified)               Topological         Connectivity              Localized
9   m06_01  Change in weighted accessibility                    System-based        Accessibility             Network-wide
10  m07_01  Change in weighted total travel cost                System-based        Total travel cost         Network-wide
11  m07_02  Change in expected user exposure                    System-based        Total travel cost         Network-wide
12  m07_03  Change in worst-case user exposure                  System-based        Total travel cost         Network-wide
13  m08_01  Traffic flow data                                   System-based        Total travel cost         Localized
14  m08_02  Weighted link betweenness centrality                System-based        Total travel cost         Localized
15  m08_03  Volume over capacity                                System-based        Total travel cost         Localized
16  m09_01  Unsatisfied demand                                  System-based        Connectivity              Network-wide
17  m10     Exposure to disaster                                System-based        Connectivity              Localized

A01 Individual assessment of top 100 links for each metric

Before analysing the (dis)similarities between the metrics, it is useful to inspect how the criticality scores of each metric are distributed. This shows whether a few links carry very high criticality scores or whether the scores are roughly normally distributed across all links in the network.

This section explores the distribution of the criticality scores of the top 100 links for each criticality metric.

In [6]:
#subset the result dataframe to only the relevant columns
crit_df2 = crit_df[['osmid','m1_01', 'm1_02', 'm2_01', 'm2_02', 'm3_01', 'm3_02', 'm4_01', 'm4_02', 'm5_01', 'm6_01',
             'm7_01', 'm7_02', 'm7_03', 'm8_01', 'm8_02', 'm8_03', 'm9_01', 'm10']]

#rename the metrics, adding '0' before the number
crit_df2.columns = ['osmid','m01_01', 'm01_02', 'm02_01', 'm02_02', 'm03_01', 'm03_02', 'm04_01', 'm04_02', 'm05_01', 'm06_01',
             'm07_01', 'm07_02', 'm07_03', 'm08_01', 'm08_02', 'm08_03', 'm09_01', 'm10']

#record the name of each metric
all_metric = sorted(metric_names.keys())

In [7]:
#alter the m05_01 criticality scores so that they are consistent with the formula described in the report
#assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
crit_df2 = crit_df2.assign(m05_01=crit_df2.m05_01.apply(lambda x: 1/x if x > 0 else 2))
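
The rule applied to m05_01 above (reciprocal for positive scores, 2 otherwise) can also be written without a per-row lambda. The following is only a sketch on made-up values; the helper name `invert_alternatives` and the toy Series are assumptions for illustration and are not part of the project code.

```python
import pandas as pd

def invert_alternatives(scores, fallback=2.0):
    """Return 1/x for positive x and `fallback` otherwise (hypothetical helper)."""
    scores = pd.Series(scores, dtype=float)
    # Mask non-positive values as NaN first so the division never hits zero
    safe = scores.where(scores > 0)
    return (1.0 / safe).fillna(fallback)

toy = pd.Series([4.0, 1.0, 0.0, 0.5])   # made-up m05_01 values
inverted = invert_alternatives(toy)
print(inverted.tolist())                # [0.25, 1.0, 2.0, 2.0]
```

Note that a score of 0.5 and a score of 0 both map to 2 under this rule, so the fallback value only stays distinguishable if all positive scores are at least 1.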

In [8]:
# Visualize the scores distribution
n = 100  #number of top links to keep per metric

print("Individual metric scores' distribution")
fig1 = plt.figure(figsize=(24,60))
for num, metric in enumerate(all_metric):
    new_df = crit_df2[[metric, 'osmid']]
    #drop links with a zero score for this metric
    new_df = new_df.loc[new_df[metric]!=0]
    #take only top 100 links
    topn_list = list(new_df.sort_values(metric, ascending=False).osmid[:n])
    new_df = new_df.loc[new_df['osmid'].isin(topn_list)]
    #one subplot per metric, numbered from 1
    ax = fig1.add_subplot(12, 5, num+1)
    b = sns.distplot(new_df[metric], kde=False, rug=False, ax=ax)
    b.set_xlabel(metric, fontsize=24)

Individual metric scores' distribution

A02 Overlapping distribution between metrics

To select an appropriate correlation coefficient, the distribution of the scores of each metric pair should be inspected first. If the scores are normally distributed, the Pearson correlation coefficient can be used. Otherwise, the Spearman rank correlation coefficient should be used.

This section visualizes how the criticality score distributions of all metrics compare with each other.
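
This notebook judges normality visually. As a complementary illustration only, the same Pearson-versus-Spearman decision could be sketched with a formal normality test. The example below uses the Shapiro-Wilk test from SciPy on synthetic data; the helper `suggest_correlation`, the 0.05 significance level, and the samples are all assumptions, not part of the original analysis.

```python
import numpy as np
from scipy import stats

def suggest_correlation(x, y, alpha=0.05):
    """Suggest 'pearson' if neither sample rejects normality, else 'spearman'."""
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)
    # A small p-value means the normality hypothesis is rejected for that sample
    return 'pearson' if min(p_x, p_y) > alpha else 'spearman'

rng = np.random.RandomState(0)
normal_scores = rng.normal(size=200)
skewed_scores = rng.exponential(size=200) ** 3   # strongly right-skewed

choice = suggest_correlation(normal_scores, skewed_scores)
print(choice)
```

Criticality scores that are dominated by a few very large values behave like the skewed sample here, in which case a rank-based coefficient is the safer choice.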

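First, create a DataFrame from the union of the top 100 critical links of all metrics

In [9]:
n = 100  #number of top links taken from each metric
topn_list = []
for metric in all_metric:
    #ignore links with a zero score for this metric
    new_data = crit_df2.loc[crit_df2[metric]!=0]
    topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
topn_list = list(set(topn_list))

#all metric columns, without the link identifier
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

#keep only the links that are in the top n of at least one metric
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]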
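
Because the same top-100 union is rebuilt before each analysis, the step can be captured in a small helper. This is a minimal sketch on a toy DataFrame; the function name `top_n_union` and the toy columns `m_a` and `m_b` are hypothetical, and only the `osmid` column name is taken from the notebook.

```python
import pandas as pd

def top_n_union(df, metrics, n=100, id_col='osmid'):
    """Return the ids ranking in the top n (non-zero scores only) of any metric."""
    ids = set()
    for metric in metrics:
        nonzero = df.loc[df[metric] != 0]
        ranked = nonzero.sort_values(metric, ascending=False)
        ids.update(ranked[id_col].head(n))
    return ids

toy = pd.DataFrame({'osmid': [1, 2, 3, 4],
                    'm_a': [0.9, 0.1, 0.0, 0.5],
                    'm_b': [0.0, 0.2, 0.8, 0.1]})
union_ids = top_n_union(toy, ['m_a', 'm_b'], n=1)
print(sorted(union_ids))   # [1, 3]
```

The result can then be used to filter the full table, for example with `toy.loc[toy['osmid'].isin(union_ids)]`.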

Next, visualize the data

In [10]:
net_v.overlap_distribution(crit_df=crit_df2, all_metric=all_metric)


As seen above, the scores of most metric pairs are not normally distributed. Therefore, the Spearman rank correlation coefficient is used.

A03 K-S Heatmap

While the visualization above already gives a general idea of how the distributions of the metric pairs overlap, it is more convenient to display the results in a single graph. For this purpose, the Kolmogorov-Smirnov (K-S) distance between each pair of metrics is calculated.
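
The two-sample K-S distance is the largest absolute difference between two empirical cumulative distribution functions, so it ranges from 0 (identical samples) to 1 (no overlap). The sketch below shows this with `scipy.stats.ks_2samp` on synthetic samples that are already on a 0 to 1 scale. The samples are made up, and the project's own `crit.correlate_metrics_ks` is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(42)
sample_a = rng.uniform(0, 1, size=500)
sample_same = rng.uniform(0, 1, size=500)    # same distribution as sample_a
sample_other = rng.beta(5, 1, size=500)      # mass concentrated near 1

# ks_2samp returns the K-S statistic D and the p-value of the test
d_same, p_same = stats.ks_2samp(sample_a, sample_same)
d_other, p_other = stats.ks_2samp(sample_a, sample_other)

print('same distribution:      D = %.3f' % d_same)
print('different distribution: D = %.3f' % d_other)
```

A pair of metrics whose normalized scores are distributed alike therefore gets a small D, while a pair with very different distributions gets a D closer to 1.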

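First, create a DataFrame from the union of the top 100 critical links of all metrics

In [11]:
n = 100  #number of top links taken from each metric
topn_list = []
for metric in all_metric:
    #ignore links with a zero score for this metric
    new_data = crit_df2.loc[crit_df2[metric]!=0]
    topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
topn_list = list(set(topn_list))

#all metric columns, without the link identifier
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

#keep only the links that are in the top n of at least one metric
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]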

Then calculate the K-S distance

In [12]:
#normalize between 0 and 1
#because the ks_2samp function assumes similar scale of values between the two datasets
for metric in all_metric:
    minval = crit_df2[metric].min()
    maxval = crit_df2[metric].max()
    rang = maxval - minval
    crit_df2[metric] = crit_df2[metric].apply(lambda val: (val-minval)/rang)
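
The min-max rescaling above can also be expressed with column arithmetic, which makes it easy to guard against a metric whose scores are all equal (where the range is zero). This is a sketch on toy data; the helper `min_max_normalize` and the choice to map a constant column to 0 are assumptions, not the notebook's behaviour.

```python
import pandas as pd

def min_max_normalize(df, columns):
    """Rescale each column to [0, 1]; constant columns are set to 0 (assumption)."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # Avoid dividing by zero when every value in the column is identical
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    return out

toy = pd.DataFrame({'m_a': [2.0, 4.0, 6.0], 'm_b': [5.0, 5.0, 5.0]})
scaled = min_max_normalize(toy, ['m_a', 'm_b'])
print(scaled)
```

Here `m_a` becomes 0.0, 0.5, 1.0 and the constant `m_b` becomes 0.0 throughout, while the input DataFrame is left unchanged.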

In [13]:
ks_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
#fill the matrix with the K-S distance of every metric pair
for m_a in ks_df.index:
    for m_b in ks_df.columns:
        D, p = crit.correlate_metrics_ks(df=crit_df2, m_a=m_a, m_b=m_b)
        ks_df.loc[m_a, m_b] = D

Finally, visualize it in a heatmap

In [14]:
net_v.correlation_plot(ks_df, title='K-S Distance between metrics')

A04 Spearman correlation

To observe the (dis)similarities between metrics, Spearman rank correlation coefficients are used. If two metrics have a high correlation coefficient, they can be considered overlapping, because they highlight the same transport segments as critical. One of them is therefore a candidate for elimination.
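
Spearman's coefficient compares ranks rather than raw values, so two metrics that order the links identically get a coefficient of 1 even when their scores are on very different scales. The sketch below shows this with `scipy.stats.spearmanr` on made-up values; it does not reproduce the project's `crit.correlate_metrics_spearman`.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_monotone = np.exp(x)     # nonlinear, but ranks the values exactly like x
y_reversed = -x ** 3       # ranks the values in the opposite order

rho_up, _ = stats.spearmanr(x, y_monotone)
rho_down, _ = stats.spearmanr(x, y_reversed)
pearson_up, _ = stats.pearsonr(x, y_monotone)

print('Spearman, same ranking:     %.3f' % rho_up)
print('Spearman, reversed ranking: %.3f' % rho_down)
print('Pearson, same ranking:      %.3f' % pearson_up)
```

The Pearson coefficient for the same monotone pair stays below 1 because the relationship is not linear, which is why rank correlation suits the question of whether two metrics flag the same links.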

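First, create a DataFrame from the union of the top 100 critical links of all metrics

In [15]:
n = 100  #number of top links taken from each metric
topn_list = []
for metric in all_metric:
    #ignore links with a zero score for this metric
    new_data = crit_df2.loc[crit_df2[metric]!=0]
    topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
topn_list = list(set(topn_list))

#all metric columns, without the link identifier
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

#keep only the links that are in the top n of at least one metric
crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]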

Then calculate the Spearman rank correlation coefficients between all metrics

In [16]:
spearmanr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)

for m_a in spearmanr_df.index:
    for m_b in spearmanr_df.columns:
        #third return value renamed so it does not overwrite the top-n cutoff n
        r, p, n_corr = crit.correlate_metrics_spearman(df=crit_df2, m_a=m_a, m_b=m_b)
        spearmanr_df.loc[m_a, m_b] = r

Lastly, visualize it in a heatmap

In [17]:
net_v.correlation_plot(spearmanr_df, title='Spearman Rank Correlation')

From the heatmap above, the following metrics show generally high correlation coefficients with the other metrics. Each is listed with the metric(s) it correlates with most strongly:

- m01_01 - m01_02
- m01_02 - m01_01
- m02_01 - m02_02
- m02_02 - m02_01
- m03_01 - m04_02
- m04_02 - m03_01
- m06_01 - m07_01
- m07_01 - m06_01 and m02_01
- m07_02 - m06_01 and m02_01
- m07_03 - m06_01
- m08_02 - m03_01

Therefore, from the highly correlated metrics we retain only:

- m01_01
- m02_01
- m04_02
- m06_01
- m08_02

This means we leave out:

- m01_02
- m02_02
- m03_01
- m07_01
- m07_02
- m07_03
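
The selection above was made by reading the heatmap. As an illustration only, strongly correlated pairs can also be listed programmatically from a correlation matrix. In this sketch the helper `highly_correlated_pairs`, the toy matrix, and the 0.8 cutoff are all assumptions; the notebook itself does not state a numeric threshold.

```python
import pandas as pd

def highly_correlated_pairs(corr_df, threshold=0.8):
    """List (metric_a, metric_b, rho) for off-diagonal pairs with |rho| >= threshold."""
    pairs = []
    cols = list(corr_df.columns)
    for i, m_a in enumerate(cols):
        for m_b in cols[i + 1:]:          # upper triangle only, skips the diagonal
            rho = corr_df.loc[m_a, m_b]
            if abs(rho) >= threshold:
                pairs.append((m_a, m_b, float(rho)))
    # Strongest correlations first
    return sorted(pairs, key=lambda t: -abs(t[2]))

names = ['m_a', 'm_b', 'm_c']
toy_corr = pd.DataFrame([[1.0, 0.95, 0.1],
                         [0.95, 1.0, 0.2],
                         [0.1, 0.2, 1.0]], index=names, columns=names)
pairs = highly_correlated_pairs(toy_corr)
print(pairs)   # [('m_a', 'm_b', 0.95)]
```

Such a list only flags candidate pairs. Deciding which metric of a pair to keep still requires judgement about what each metric measures.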

A05 Reduced Spearman Correlation

In the final step, we plot the Spearman correlation heatmap again for the reduced metrics set only, to check that no metrics with generally high correlation coefficients remain.

In [18]:
n = 100  #number of top links taken from each metric
topn_list = []
for metric in all_metric:
    #ignore links with a zero score for this metric
    new_data = crit_df2.loc[crit_df2[metric]!=0]
    topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
topn_list = list(set(topn_list))

#keep only the columns of the reduced metrics set
data2 = crit_df2[['m01_01', 'm02_01', 'm03_02', 'm04_01', 'm04_02', 'm05_01', 'm06_01',
                  'm08_01', 'm08_02', 'm08_03', 'm09_01', 'm10']]

crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]

spearmanr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)

for m_a in spearmanr_df.index:
    for m_b in spearmanr_df.columns:
        #third return value renamed so it does not overwrite the top-n cutoff n
        r, p, n_corr = crit.correlate_metrics_spearman(df=crit_df2, m_a=m_a, m_b=m_b)
        spearmanr_df.loc[m_a, m_b] = r

In [19]:
net_v.correlation_plot(spearmanr_df, title='Spearman Rank Correlation')