# Evaluation of Transport Network Criticality Results

Bramka Arga Jafino

Delft University of Technology

Faculty of Technology, Policy and Management

## An introduction note

This notebook presents further evaluations of the criticality results for Bangladesh's multimodal freight transport network. The evaluation examines the (dis)similarities between the metrics: which pairs of criticality metrics overlap (highlight the same set of links in the network as critical) and which are complementary (highlight different sets of links as critical). The Kolmogorov-Smirnov distance and correlation coefficients are used for this purpose.

There are five analyses in this notebook:

## 0. Import all required modules and files

``````

In [1]:

%matplotlib inline
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.pylab import *
import matplotlib.colors as colors
import seaborn as sns
from __future__ import division

import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

#Modules developed for this project
from transport_network_modeling import network_visualization as net_v
from transport_network_modeling import criticality as crit

``````
``````

In [2]:

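#import criticality results
crit_df_loc = r'./criticality_results/result_interdiction_1107noz2_v03.csv'
#the loading step is missing from this cell; reading the file with pd.read_csv is an assumption
crit_df = pd.read_csv(crit_df_loc)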

``````
``````

In [3]:

#remove one strange value in m2_02
crit_df['m2_02'] = crit_df.m2_02.apply(lambda val: 0 if val < 1.39e-10 else val)

``````
``````

In [4]:

#record the metric names
metric_names = {'m01_01' : ['Change in unweighted daily accessibility',
'Topological', 'Accessibility', 'Network-wide'],
'm01_02' : ['Change in number of nodes accessible within daily reach',
'Topological', 'Accessibility', 'Network-wide'],
'm02_01' : ['Change in unweighted total travel cost',
'Topological', 'Total travel cost', 'Network-wide'],
'm02_02' : ['Change in network average efficiency',
'Topological', 'Total travel cost', 'Network-wide'],
'm03_01' : ['Unweighted link betweenness centrality',
'Topological', 'Total travel cost', 'Localized'],
'm03_02' : ['Change in region-based unweighted total travel cost',
'Topological', 'Total travel cost', 'Localized'],
'm04_01' : ['Minimum link cut centrality',
'Topological', 'Connectivity', 'Network-wide'],
'm04_02' : ['OD k-connectivity',
'Topological', 'Connectivity', 'Network-wide'],
'm05_01' : ['Nearby alternative links (simplified)',
'Topological', 'Connectivity', 'Localized'],
'm06_01' : ['Change in weighted accessibility',
'System-based', 'Accessibility', 'Network-wide'],
'm07_01' : ['Change in weighted total travel cost',
'System-based', 'Total travel cost', 'Network-wide'],
'm07_02' : ['Change in expected user exposure',
'System-based', 'Total travel cost', 'Network-wide'],
'm07_03' : ['Change in worst-case user exposure',
'System-based', 'Total travel cost', 'Network-wide'],
'm08_01' : ['Traffic flow data',
'System-based', 'Total travel cost', 'Localized'],
'm08_02' : ['Weighted link betweenness centrality',
'System-based', 'Total travel cost', 'Localized'],
'm08_03' : ['Volume over capacity',
'System-based', 'Total travel cost', 'Localized'],
'm09_01' : ['Unsatisfied demand',
'System-based', 'Connectivity', 'Network-wide'],
'm10' : ['Exposure to disaster',
'System-based', 'Connectivity', 'Localized']}

``````
``````

In [5]:

# Show the metrics and their associated code
metric_names_df = pd.DataFrame()
metric_names_df['Code'] = metric_names.keys()
metric_names_df['Description'] = [x[0] for x in metric_names.values()]
metric_names_df['Layer I (Paradigm)'] = [x[1] for x in metric_names.values()]
metric_names_df['Layer II (Functionality)'] = [x[2] for x in metric_names.values()]
metric_names_df['Layer III (Aggregation)'] = [x[3] for x in metric_names.values()]
metric_names_df.sort_values(by='Code', inplace=True)
metric_names_df.index = range(len(metric_names_df))

print('Metrics Code and Description')
metric_names_df

``````
``````

Metrics Code and Description

Out[5]:

    Code    Description                                         Layer I (Paradigm)  Layer II (Functionality)  Layer III (Aggregation)
0   m01_01  Change in unweighted daily accessibility            Topological         Accessibility             Network-wide
1   m01_02  Change in number of nodes accessible within da...   Topological         Accessibility             Network-wide
2   m02_01  Change in unweighted total travel cost              Topological         Total travel cost         Network-wide
3   m02_02  Change in network average efficiency                Topological         Total travel cost         Network-wide
4   m03_01  Unweighted link betweenness centrality              Topological         Total travel cost         Localized
5   m03_02  Change in region-based unweighted total travel...   Topological         Total travel cost         Localized
6   m04_01  Minimum link cut centrality                         Topological         Connectivity              Network-wide
7   m04_02  OD k-connectivity                                   Topological         Connectivity              Network-wide
8   m05_01  Nearby alternative links (simplified)               Topological         Connectivity              Localized
9   m06_01  Change in weighted accessibility                    System-based        Accessibility             Network-wide
10  m07_01  Change in weighted total travel cost                System-based        Total travel cost         Network-wide
11  m07_02  Change in expected user exposure                    System-based        Total travel cost         Network-wide
12  m07_03  Change in worst-case user exposure                  System-based        Total travel cost         Network-wide
13  m08_01  Traffic flow data                                   System-based        Total travel cost         Localized
14  m08_02  Weighted link betweenness centrality                System-based        Total travel cost         Localized
15  m08_03  Volume over capacity                                System-based        Total travel cost         Localized
16  m09_01  Unsatisfied demand                                  System-based        Connectivity              Network-wide
17  m10     Exposure to disaster                                System-based        Connectivity              Localized
``````

Before analysing the (dis)similarities between the metrics, it is useful to look at how the criticality scores of each metric are distributed. This shows whether a few links carry very high criticality scores or whether the scores are normally distributed across all links in the network.

This section explores the distribution of the criticality scores of the top 100 links for each criticality metric.

``````

In [6]:

#subset the result dataframe to only the relevant columns
crit_df2 = crit_df[['osmid','m1_01', 'm1_02', 'm2_01', 'm2_02', 'm3_01', 'm3_02', 'm4_01', 'm4_02', 'm5_01', 'm6_01',
'm7_01', 'm7_02', 'm7_03', 'm8_01', 'm8_02', 'm8_03', 'm9_01', 'm10']]

#rename the metrics, adding '0' before the number
crit_df2.columns = ['osmid','m01_01', 'm01_02', 'm02_01', 'm02_02', 'm03_01', 'm03_02', 'm04_01', 'm04_02', 'm05_01', 'm06_01',
'm07_01', 'm07_02', 'm07_03', 'm08_01', 'm08_02', 'm08_03', 'm09_01', 'm10']

#record the name of each metric
all_metric = sorted(metric_names.keys())

``````
``````

In [7]:

#alter the m5_01 criticality scores so that they are consistent with the formula described in the report
crit_df2['m05_01'] = crit_df2.m05_01.apply(lambda x: 1/x if x > 0 else 2)

``````
``````

C:\Users\Lenovo\Anaconda2\lib\site-packages\ipykernel\__main__.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
from ipykernel import kernelapp as app

``````
``````

In [8]:

# Visualize the scores distribution

print("Individual metric scores' distribution")
fig1 = plt.figure(figsize=(24,60))
n = 100  # number of top-scoring links kept per metric
for num, metric in enumerate(all_metric):
    # keep only the links with a non-zero score for this metric
    new_df = crit_df2[[metric, 'osmid']]
    new_df = new_df.loc[new_df[metric] != 0]
    topn_list = []
    try:
        topn_list.extend(list(new_df.sort_values(metric, ascending=False).osmid[:n]))
    except Exception:
        topn_list.extend(list(new_df.sort_values(metric).osmid))

    new_df = new_df.loc[new_df['osmid'].isin(topn_list)]

    sns.set_style('white')
    # the axes creation is missing from this cell; a 9x2 grid (one panel per metric) is an assumption
    ax = fig1.add_subplot(9, 2, num + 1)
    b = sns.distplot(new_df[metric], kde=False, rug=False, ax=ax)

    b.set_xlabel(metric, fontsize=24)
fig1.tight_layout()
plt.show()

``````
``````

Individual metric scores' distribution

``````

## A02 Overlapping distribution between metrics

To select an appropriate correlation coefficient, the distribution of the scores for each metric pair should be examined first. If the scores follow a normal distribution, the Pearson correlation coefficient can be used; otherwise, the Spearman rank correlation coefficient should be used.

This section visualizes how the criticality score distributions compare across all metrics.

First, create a DataFrame containing the union of the top 100 critical links from all metrics.

``````

In [9]:

n = 100
topn_list = []
for metric in all_metric:
    # drop zero scores, then collect the ids of the n highest-scoring links
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except Exception:
        topn_list.extend(list(new_data.sort_values(metric).osmid))

# union of the top-n links over all metrics
topn_list = list(set(topn_list))

# .loc replaces the deprecated .ix indexer
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]

``````
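The selection step above can be illustrated on a toy example. The following is a minimal sketch in plain Python 3, not the notebook's pandas code: the `scores` dictionary and the `top_n_union` helper are hypothetical names introduced only for illustration, and the numbers are made up.

```python
# Hypothetical toy scores, {metric: {link_id: score}}; not the Bangladesh results.
scores = {
    'm_a': {1: 0.9, 2: 0.0, 3: 0.4, 4: 0.7},
    'm_b': {1: 0.1, 2: 0.8, 3: 0.0, 4: 0.3},
}

def top_n_union(scores, n):
    """Return the ids of links that rank in the top n of at least one metric."""
    union = set()
    for metric, by_link in scores.items():
        # drop zero scores, mirroring the `!= 0` filter used in the notebook
        nonzero = [(link, s) for link, s in by_link.items() if s != 0]
        # highest score first
        nonzero.sort(key=lambda pair: pair[1], reverse=True)
        union.update(link for link, _ in nonzero[:n])
    return union

selected = top_n_union(scores, n=2)
print(sorted(selected))
```

Link 3 is dropped because it never reaches the top two of either metric, while link 4 is kept once even though both metrics select it.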

Next, visualize the data

``````

In [10]:

net_v.overlap_distribution(crit_df=crit_df2, all_metric=all_metric)

``````
``````

transport_network_modeling\network_visualization.py:559: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
ax1.set_axis_bgcolor('white')
transport_network_modeling\network_visualization.py:574: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
ax2.set_axis_bgcolor('white')

``````

As seen above, the scores of most metric pairs are not normally distributed. Therefore, the Spearman rank correlation coefficient should be used.

## A03 K-S Heatmap

While the visualization above already gives a general idea of how the distributions of the metric pairs overlap, it is more convenient to display the results in a single graph. For this purpose, the Kolmogorov-Smirnov (K-S) distances between the metrics are calculated.

First, create a DataFrame containing the union of the top 100 critical links from all metrics.

``````

In [11]:

n = 100
topn_list = []
for metric in all_metric:
    # drop zero scores, then collect the ids of the n highest-scoring links
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except Exception:
        topn_list.extend(list(new_data.sort_values(metric).osmid))

# union of the top-n links over all metrics
topn_list = list(set(topn_list))

# .loc replaces the deprecated .ix indexer
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]

``````

Then calculate the K-S distance

``````

In [12]:

#normalize between 0 and 1
#because the ks_2samp function assumes similar scale of values between the two datasets
for metric in all_metric:
    minval = crit_df2[metric].min()
    maxval = crit_df2[metric].max()
    rang = maxval - minval
    crit_df2[metric] = crit_df2[metric].apply(lambda val: (val-minval)/rang)

``````
``````

In [13]:

ks_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)
for index, rows1 in ks_df.iterrows():
    for value, rows2 in rows1.items():
        # D is the K-S distance between metrics `index` and `value`
        D, p = crit.correlate_metrics_ks(df=crit_df2, m_a=index, m_b=value)
        # .loc replaces the deprecated set_value
        ks_df.loc[index, value] = D

``````

Finally, visualize it in a heatmap

``````

In [14]:

net_v.correlation_plot(ks_df, title='K-S Distance between metrics')

``````
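To make the distance used in this section concrete: the two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions of two samples. The following is a minimal sketch in plain Python 3 with made-up scores. The helpers `min_max` and `ks_distance` are hypothetical names, and the sketch is not the implementation behind `crit.correlate_metrics_ks`, whose internals are not shown in this notebook.

```python
from bisect import bisect_right

def min_max(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def ks_distance(a, b):
    """Two-sample K-S statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        # empirical CDF at x = share of observations <= x
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical toy scores, rescaled as in the normalization cell above
metric_a = min_max([1, 2, 3, 4, 5])    # evenly spread scores
metric_b = min_max([1, 1, 1, 2, 10])   # most scores close to the minimum

d_ab = ks_distance(metric_a, metric_b)
d_aa = ks_distance(metric_a, metric_a)
print(round(d_ab, 3), d_aa)
```

A distance of 0 means the two score distributions coincide, while values close to 1 indicate that they barely overlap.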

## A04 Spearman correlation

To observe the (dis)similarities between metrics, Spearman rank correlation coefficients are used. If two metrics have a high correlation coefficient, they can be considered overlapping because they highlight the same transport segments as critical. One of them is then a candidate for elimination.

First, create a DataFrame containing the union of the top 100 critical links from all metrics.

``````

In [15]:

n = 100
topn_list = []
for metric in all_metric:
    # drop zero scores, then collect the ids of the n highest-scoring links
    new_data = crit_df2.loc[crit_df2[metric] != 0]
    try:
        topn_list.extend(list(new_data.sort_values(metric, ascending=False).osmid[:n]))
    except Exception:
        topn_list.extend(list(new_data.sort_values(metric).osmid))

# union of the top-n links over all metrics
topn_list = list(set(topn_list))

# .loc replaces the deprecated .ix indexer
data2 = crit_df2.loc[:, crit_df2.columns != 'osmid']

crit_df2 = crit_df2.loc[crit_df2['osmid'].isin(topn_list)]

``````

Then calculate the Spearman rank correlation coefficients between all metrics

``````

In [16]:

spearmanr_df = pd.DataFrame(np.nan, index=data2.columns, columns=data2.columns)

for index, rows1 in spearmanr_df.iterrows():
    for value, rows2 in rows1.items():
        # r is the Spearman rank correlation between metrics `index` and `value`
        r, p, n = crit.correlate_metrics_spearman(df=crit_df2, m_a=index, m_b=value)
        # .loc replaces the deprecated set_value
        spearmanr_df.loc[index, value] = r

``````

Lastly, visualize it in a heatmap

``````

In [17]:

net_v.correlation_plot(spearmanr_df, title='Spearman Rank Correlation')

``````
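As a reminder of why the rank-based coefficient suits skewed scores, Spearman's coefficient is the Pearson correlation computed on the ranks of the values rather than on the values themselves. The following is a minimal sketch in plain Python 3 with made-up scores. The helpers `ranks`, `pearson` and `spearman` are hypothetical names, and the sketch is not the implementation behind `crit.correlate_metrics_spearman`.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy scores
metric_a = [1, 2, 3, 4, 5]
metric_b = [1, 4, 9, 16, 1000]   # same ordering as metric_a, heavily skewed
metric_c = [5, 4, 3, 2, 1]       # reversed ordering

rho_ab = spearman(metric_a, metric_b)
rho_ac = spearman(metric_a, metric_c)
r_ab = pearson(metric_a, metric_b)
print(round(rho_ab, 3), round(rho_ac, 3), round(r_ab, 3))
```

In this toy case `metric_a` and `metric_b` order the links identically, so the Spearman coefficient reports a perfect match, whereas the Pearson coefficient is pulled down by the single extreme value.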