Most, if not all, operational risk capital models assume stationary frequency and severity distributions (typically Poisson for frequencies and a subexponential distribution such as the lognormal for severities). Yet every quarter (or whenever the model is recalibrated), risk capital goes up almost without fail: frequencies increase, severities increase, or both.
The assumption of stationary distributions is just one limitation of current approaches to operational risk modeling, but it offers a good inroad for techniques beyond the actuarial models typical of operational risk capital.
In this notebook, we give a first example of how neural networks can overcome the stationarity assumptions of traditional approaches. The hope is that this is but one of many examples showing a better way to model operational risk.
Note: What follows is very much a work in progress . . .
In [3]:
import risklearning.learning_frequency as rlf
reload(rlf)
In [4]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import math
import matplotlib.style
matplotlib.style.use('ggplot')
import ggplot as gg
%matplotlib inline
In [35]:
# Read in Poisson parameters used to simulate loss counts
lambdas_df = pd.read_csv('data/lambdas_tcem_1d.csv')
lambda_start = lambdas_df['TCEM'][0]
lambda_end = lambdas_df['TCEM'].tail(1).iloc[0]
print('Lambda start value: {}, lambda end value: {}'.format(lambda_start, lambda_end))
lambda_ts = lambdas_df['TCEM']
# Read in simulated loss counts
counts_sim_df = pd.read_csv('data/tcem_1d.csv')
# EDPM: Execution, Delivery and Process Management
# TCEM: Transaction Capture, Execution and Maintenance--think fat-finger mistake
counts_sim_df.head()
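As an aside, here is a minimal sketch of how such a non-stationary count series might be simulated. This is an assumption about how data/tcem_1d.csv could have been generated (a linear drift in the intensity), not a record of the actual simulation; the true lambda path is whatever lambdas_tcem_1d.csv contains.

# Hypothetical sketch: daily counts from a Poisson intensity drifting linearly
import numpy as np
import pandas as pd

np.random.seed(42)
tenors = np.arange(-5 * 365, 365)                   # 5 years of training, 1 year of test
lambda_path = np.linspace(1.0, 4.0, len(tenors))    # intensity drifts from ~1 to ~4 events/day
sim_counts = np.random.poisson(lambda_path)         # one draw per tenor
sim_df = pd.DataFrame({'t': tenors, 'L2_cat': 'TCEM', 'counts': sim_counts})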
In [52]:
#%% Do MLE (simple average) for the Poisson process
t_start = np.min(counts_sim_df['t'])  # first (negative) training tenor
t_end = np.max(counts_sim_df['t'])    # last test tenor
n_tenors_train = -t_start             # training tenors run from t_start up to -1
n_tenors_test = t_end
counts_train = counts_sim_df[counts_sim_df.t < 0].groupby('L2_cat').sum()
counts_test = counts_sim_df[counts_sim_df.t >= 0].groupby('L2_cat').sum()
For the Poisson distribution, the MLE of the intensity (here lambda) is just the average of the counts per model horizon. In practice, OpRisk models sometimes take a weighted average instead, with the weights decreasing linearly over a period of years (see, e.g., "LDA at Work" by Aue and Kalkbrener).
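For illustration, a minimal sketch of such a weighted average (an aside, not what is computed below; the linear weight profile and the single event-type category are assumptions):

# Sketch: linearly decaying weights, oldest observations weighted least
train_counts = counts_sim_df[counts_sim_df.t < 0].sort_values('t')['counts'].values
w = np.linspace(0.0, 1.0, len(train_counts))     # weight grows toward the present
lambda_weighted = np.average(train_counts, weights=w)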
In [31]:
lambdas_train = counts_train['counts']/n_tenors_train
lambdas_test = counts_test['counts']/n_tenors_test
bin_tops = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 101]
# Recall that digitize (used later) defines bins by lower <= x < upper
count_tops = [count - 1 for count in bin_tops]
# Calculate bin probabilities from the MLE Poisson
poi_mle = stats.poisson(lambdas_train)
poi_bins = rlf.bin_probs(poi_mle, bin_tops)
mle_probs = pd.DataFrame({'Count Top': count_tops, 'Probs': poi_bins})
# For later comparison
mle_probs_vals = list(mle_probs.Probs)
For example, the first bin probability is just P(N = 0) under the fitted Poisson, and each subsequent bin probability is a difference of the Poisson CDF at the bin edges. Below is a minimal sketch of how rlf.bin_probs presumably works (an assumed reimplementation, not the package source):
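# Assumed sketch of rlf.bin_probs: bin probabilities as CDF differences,
# with bins defined (as for digitize) by lower <= x < upper
def bin_probs_sketch(dist, bin_tops):
    edges = [0] + list(bin_tops)
    return [dist.cdf(hi - 1) - dist.cdf(lo - 1)
            for lo, hi in zip(edges[:-1], edges[1:])]

# e.g. bin_probs_sketch(stats.poisson(2.0), bin_tops) sums to ~1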
In [32]:
import warnings
warnings.filterwarnings('ignore') # TODO: improve slicing to avoid warnings
x_train, y_train, x_test, y_test = rlf.prep_count_data(counts_sim_df, bin_tops)
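rlf.prep_count_data comes from the risklearning package; presumably it digitizes the daily counts into the bins above and one-hot encodes them as training targets. A rough sketch under that assumption (the feature encoding of the tenor, in particular, is a guess):

# Assumed sketch of the data prep behind rlf.prep_count_data
def prep_count_data_sketch(df, bin_tops):
    bin_idx = np.digitize(df['counts'].values, bin_tops)  # lower <= x < upper
    y = np.eye(len(bin_tops))[bin_idx]                    # one-hot bin targets
    x = df['t'].values.reshape(-1, 1).astype(float)       # tenor as the lone feature
    is_train = df['t'].values < 0
    return x[is_train], y[is_train], x[~is_train], y[~is_train]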
In [33]:
#from keras.optimizers import SGD
#sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# rl_train_net is a wrapper for standard keras functionality that
# makes it easier below to optimize hyperparameters
rl_net = rlf.rl_train_net(x_train, y_train, x_test, y_test, [150],
                          n_epoch=300, optimizer='adagrad')
proba = rl_net['probs_nn']
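For readers without the risklearning package, here is a minimal Keras sketch of what such a wrapper might look like; the architecture details, activations, and return keys are assumptions, not the package's actual code:

# Hypothetical sketch of a wrapper like rl_train_net: an MLP with a softmax
# over the count bins, trained with categorical cross-entropy
from keras.models import Sequential
from keras.layers import Dense

def train_net_sketch(x_train, y_train, x_test, y_test, hidden,
                     n_epoch=300, optimizer='adagrad'):
    model = Sequential()
    model.add(Dense(hidden[0], input_dim=x_train.shape[1], activation='relu'))
    for n in hidden[1:]:
        model.add(Dense(n, activation='relu'))
    model.add(Dense(y_train.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    model.fit(x_train, y_train, nb_epoch=n_epoch, verbose=0)  # nb_epoch in Keras 1.x; epochs in Keras 2
    return {'model': model, 'probs_nn': model.predict(x_test)}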
In [43]:
#%% Convert proba from wide to long and append to other probs
mle_probs_vals = list(mle_probs.Probs)
# TODO: Missing last tenor in nn proba (already in x_test, y_test)
probs_list = []
kl_mle_list = []
kl_nn_list = []
for t in range(proba.shape[0]):
    nn_probs_t = proba[t]
    # True bin probabilities from the (time-varying) simulated intensity
    true_bins_t = rlf.bin_probs(stats.poisson(lambda_ts[-t_start + t]), bin_tops)
    probs_t = pd.DataFrame({'Tenor': t, 'Count Top': count_tops,
                            'Probs True': true_bins_t,
                            'Probs NN': nn_probs_t,
                            'Probs MLE': mle_probs_vals},
                           index=range(t*len(count_tops),
                                       t*len(count_tops) + len(count_tops)))
    probs_list.append(probs_t)
    # Calculate KL divergences from the true distribution
    kl_mle_list.append(stats.entropy(true_bins_t, mle_probs_vals))
    kl_nn_list.append(stats.entropy(true_bins_t, nn_probs_t))
probs = pd.concat(probs_list)
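A quick aside on the KL computation: with two arguments, stats.entropy(p, q) returns the Kullback-Leibler divergence KL(p || q) = sum_i p_i log(p_i / q_i), so the lists above measure how far the MLE and NN bin probabilities sit from the true (time-varying) Poisson bin probabilities. For example:

# Sanity check that stats.entropy(p, q) is the KL divergence
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
kl_manual = np.sum(p * np.log(p / q))
assert np.isclose(kl_manual, stats.entropy(p, q))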
In [44]:
probs_tail = probs[probs.Tenor > 360]
gg.ggplot(probs_tail, gg.aes(x='Count Top', weight='Probs True')) \
    + gg.facet_grid('Tenor') \
    + gg.geom_bar() \
    + gg.geom_step(gg.aes(y='Probs MLE', color='red')) \
    + gg.geom_step(gg.aes(y='Probs NN', color='blue')) \
    + gg.scale_x_continuous(limits=(0, len(count_tops)))
In [57]:
# KL divergences
kl_df = pd.DataFrame({'Tenor': range(0, t_end + 1),
                      'KL MLE': kl_mle_list,
                      'KL NN': kl_nn_list})
print(kl_df.head())
print(kl_df.tail())
#%%
# Plot KL divergences
gg.ggplot(kl_df, gg.aes(x='Tenor')) \
    + gg.geom_step(gg.aes(y='KL MLE', color='red')) \
    + gg.geom_step(gg.aes(y='KL NN', color='blue'))
In [61]:
# More systematically with NN architecture:
# loop over different architectures and create panel plots
neurons_list = [10, 20, 50, 100, 150, 200]
#neurons_list = [10, 20, 50]
depths_list = [1, 2, 3]
optimizer = 'adagrad'
#%%
kl_df_list = []
for depth in depths_list:
    for n_neurons in neurons_list:
        nn_arch = [n_neurons]*depth
        print("Training " + str(depth) + " layer(s) of " + str(n_neurons) + " neurons")
        rl_net = rlf.rl_train_net(x_train, y_train, x_test, y_test, nn_arch,
                                  n_epoch=300, optimizer=optimizer)
        proba = rl_net['probs_nn']
        print("\nPredicting with " + str(depth) + " layer(s) of " + str(n_neurons) + " neurons")
        probs_kl_dict = rlf.probs_kl(proba, lambda_ts, t_start, t_end + 1,
                                     bin_tops, mle_probs_vals)
        probs = probs_kl_dict['Probs']
        kl_df_n = probs_kl_dict['KL df']
        kl_df_n['Hidden layers'] = depth
        kl_df_n['Neurons per layer'] = n_neurons
        kl_df_n['Architecture'] = str(depth) + '_layers_of_' + str(n_neurons) + '_neurons'
        kl_df_list.append(kl_df_n)
#%%
kl_df_hyper = pd.concat(kl_df_list)
In [62]:
# Plot
kl_mle = kl_df_n['KL MLE']  # Constant across the loops above (KL between MLE and true distribution)
for depth in depths_list:
    kl_df_depth = kl_df_hyper[kl_df_hyper['Hidden layers'] == depth]
    kl_depth_vals = kl_df_depth.pivot(index='Tenor', columns='Neurons per layer', values='KL NN')
    kl_depth_vals['KL MLE'] = kl_mle
    kl_depth_vals.plot(title='Kullback-Leibler divergences from true distribution \n for '
                             + str(depth) + ' hidden layer(s)',
                       figsize=(16, 10))
In [65]:
# Try again, but now with RMSprop
neurons_list = [10, 20, 50]
#neurons_list = [50]
depths_list = [2, 3]
optimizer = 'RMSprop'
#%%
kl_df_list = []
for depth in depths_list:
    for n_neurons in neurons_list:
        nn_arch = [n_neurons]*depth
        print("Training " + str(depth) + " layer(s) of " + str(n_neurons) + " neurons")
        rl_net = rlf.rl_train_net(x_train, y_train, x_test, y_test, nn_arch,
                                  n_epoch=300, optimizer=optimizer)
        proba = rl_net['probs_nn']
        print("\nPredicting with " + str(depth) + " layer(s) of " + str(n_neurons) + " neurons")
        probs_kl_dict = rlf.probs_kl(proba, lambda_ts, t_start, t_end + 1,
                                     bin_tops, mle_probs_vals)
        probs = probs_kl_dict['Probs']
        kl_df_n = probs_kl_dict['KL df']
        kl_df_n['Hidden layers'] = depth
        kl_df_n['Neurons per layer'] = n_neurons
        kl_df_n['Architecture'] = str(depth) + '_layers_of_' + str(n_neurons) + '_neurons'
        kl_df_list.append(kl_df_n)
#%%
kl_df_hyper = pd.concat(kl_df_list)
# Plot
kl_mle = kl_df_n['KL MLE']  # Constant across the loops above (KL between MLE and true distribution)
for depth in depths_list:
    kl_df_depth = kl_df_hyper[kl_df_hyper['Hidden layers'] == depth]
    kl_depth_vals = kl_df_depth.pivot(index='Tenor', columns='Neurons per layer', values='KL NN')
    kl_depth_vals['KL MLE'] = kl_mle
    kl_depth_vals.plot(title='Kullback-Leibler divergences from true distribution \n for '
                             + str(depth) + ' hidden layer(s)',
                       figsize=(16, 10))
Note that with 50 nodes per layer, the KL error of the RMSprop-trained networks is worse than that of the MLE once we are more than 100 tenors (here, days) past the beginning of the test sample. With more nodes per layer, the results are worse still, though we do not show them here.
We can see from the probs data frame that the probability mass of the neural network forecast shifts to the right, as does that of the underlying Poisson process, whose intensity starts at 1 event per tenor (here, one day) at -5 years and ends at 4 events per day at +1 year.
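As a rough numerical check of this shift (an illustration, approximating each bin by its Count Top, which is coarse for the two widest bins):

# Compare the NN-implied approximate mean count at the first and last test tenors
first_t, last_t = probs.Tenor.min(), probs.Tenor.max()
for tenor in (first_t, last_t):
    p_t = probs[probs.Tenor == tenor]
    approx_mean = (p_t['Count Top'] * p_t['Probs NN']).sum()
    print('Tenor {}: approximate NN mean count {:.2f}'.format(tenor, approx_mean))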
Next steps: