In [1]:
%load_ext autoreload
%autoreload 2

In [74]:
import warnings
warnings.filterwarnings('ignore')

Networks meet Finance in Python

Miguel Vaz

Amsterdam, 13$^{th}$ of March 2016

http://pydata.org/amsterdam2016/schedule/presentation/34/ https://github.com/mvaz/PyData2016-Amsterdam

About me

@migueljvaz

Finance pays my bills:

  • Risk Management (Consulting)
    • d-fine GmbH (2012 - )
  • Software Developer
    • 360 Trading Networks (2010-2012)

Before that:

  • PhD in Robotics at the Honda Research Institute
    • once I saw ASIMO naked (!)

Currently

I'm interested in applications of networks to finance, in particular for risk management.

Modelling, for example, concentration risk, or even extracting information for investment strategies.

The question arose from several more skeptical colleagues:

  • why model it with networks?
  • how does it work?

Part 1: networks

Why?

Networks give us a way to quantify and reason about concepts that are already familiar to us.

"Frankfurt airport is an international hub"

"Ebola is a contagious disease"

"Lehman was too central to be let fail"

Networks are great for representing structure

... and structure reveals much

http://www.bonkersworld.net/organizational-charts/

Beautiful representations!

Financial networks: debt exposure

Aggregate numbers at the national level

Central Clearing

Central clearing tries to mitigate counterparty risk by having one central actor assume the counterparty risk: the Central Counterparty (CCP).

A clearing obligation for certain types of contracts was introduced by the European Market Infrastructure Regulation (EMIR) and the Dodd–Frank Wall Street Reform and Consumer Protection Act.

Central Clearing

Changes the network structure of the system

Structure of counterparty relationships exposes stock manipulation

Zhi-Qiang Jiang, Wen-Jie Xie, Xiong Xiong, Wei Zhang, Yong-Jie Zhang & Wei-Xing Zhou (2013), Trading networks, abnormal motifs and stock manipulation, Quantitative Finance Letters, 1, 1-8. http://dx.doi.org/10.1080/21649502.2013.802877

Large credit portfolio: equity dependencies

Exposure to some obligors may be larger than expected, because the equity structure implies a loss to a given obligor if another one defaults.

Diagram only considers equity structure.

Some vocabulary

A graph is a set of nodes and edges between them

$$G = (V, E)$$

The adjacency matrix of $G$ is given by $$A_{i,j}:= \begin{cases} w(i,j) & \mbox{if } i \neq j \mbox{ and } v_i \mbox{ is adjacent to } v_j \\ 0 & \mbox{otherwise} \end{cases} $$

Undirected, unweighted graph

Weighted, directed graph
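
A tiny illustration of these definitions with networkx; the graph below is made up for the example:

In [ ]:
# A small, made-up weighted directed graph and its adjacency matrix
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from([(0, 1, 2.0), (1, 2, 0.5), (2, 0, 1.0)])

# A[i, j] = w(i, j) if there is an edge from v_i to v_j, 0 otherwise
print(nx.to_numpy_matrix(g, weight='weight'))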


In [39]:
from IPython.display import  display, display_html

In [40]:
from kmeans_aux import *

In [85]:
import matplotlib.pyplot as plt
import seaborn as sns
from bokeh.plotting import output_notebook, figure, show, ColumnDataSource
TOOLS="resize,crosshair,pan,wheel_zoom,box_zoom,reset,tap,previewsave,box_select,poly_select,lasso_select"
output_notebook()



A short break from finance: using local structure for clustering

Let's switch to a different dataset, with points sampled from one of two moon-shaped distributions, shown below. This distribution cannot be described by centroids.


In [111]:
X, true_labels = generate_two_moons()
p = figure(tools=TOOLS);
x,y = [list(t) for t in zip(*X)]
p.scatter(x, y, color='black', radius=0.02);
show(p);


K-Means

A naïve centroid-based approach such as k-means does not capture the non-linear structure of the dataset.


In [91]:
k_means_2_labels, k_means_2_cluster_centres = do_kmeans(X, n_clusters=2, n_init=10)
src2 = create_datasource(X, k_means_2_labels, len(k_means_2_cluster_centres))

In [112]:
p = figure(tools=TOOLS);
p.scatter('x', 'y', color='fill_color', source=src2);
# alpha='alpha', radius='radius', 
show(p);


Spectral Clustering

Takes into account only the local structure of the data.

Makes use of the concept of similarity, which is more general than that of distance.

Uses important results on the eigenvectors of the Laplacian of the similarity matrix, and clusters in a lower-dimensional space.

The Laplacian of the graph is given by $$ L = D - A $$ where $D$ is the degree matrix $$D_{i,j}:= \begin{cases} \deg(v_i) & \mbox{if}\ i = j \\ 0 & \mbox{otherwise} \end{cases} $$ where $\deg(v_i)$ is the degree of vertex $v_i$.
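
A quick illustration of this definition on a tiny, made-up graph:

In [ ]:
# Unnormalised Laplacian L = D - A of a small undirected, unweighted graph
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
D = np.diag(A.sum(axis=0))  # degree matrix
L = D - A
print(L)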

The eigenvectors of the graph Laplacian are very interesting!

Unnormalized spectral clustering

Input:

  • Similarity matrix $S \in R^{n \times n}$
  • number $k$ of clusters to construct

Algorithm:

  • Construct a similarity graph from $S$

  • Let $W$ be its weighted adjacency matrix

  • Compute the unnormalized Laplacian $L = D - W$

  • Compute the first $k$ eigenvectors $u_1, \ldots, u_k$ of $L$

  • Let $U \in R^{n \times k}$ be the matrix containing the vectors $u_1, \ldots, u_k$ as columns

  • For $i = 1,\ldots, n$, let $y_i \in R^k$ be the vector corresponding to the i-th row of U

  • Cluster the points $(y_i)_{i=1,\ldots,n}$ in $R^k$ with the k-means algorithm into clusters $C_1,\ldots,C_k$

Output:

Clusters $A_1,\ldots,A_k$ with $A_i = \{ j\, |\, y_j \in C_i\}$
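
The cells below walk through these steps by hand; for comparison, scikit-learn ships a ready-made implementation (a minimal sketch, with illustrative parameters):

In [ ]:
# Minimal sketch: spectral clustering via scikit-learn instead of the step-by-step pipeline below
from sklearn.cluster import SpectralClustering

sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors', n_neighbors=10)
sklearn_labels = sc.fit_predict(X)  # X from generate_two_moons() above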

Similarity matrix / graph

The following similarity graphs can be defined:

The $\epsilon$-neighborhood graph connects all points whose pairwise distances are smaller than $\epsilon$.

The $k$-nearest-neighbor graph connects $v_i$ and $v_j$ if $v_j$ is among the $k$ nearest points of $v_i$.

The fully connected graph outsources the locality modelling to the similarity function, such as $s(x_i, x_j) = \exp\left( - \frac{|| x_i - x_j||^2}{ 2 \sigma^2} \right)$.

No full metric is required. For example, it is possible to compare:

  • time series based on their correlation (see the sketch below),
  • probability distributions,
  • strings based on their edit distance,
  • and others.
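
For instance, a similarity matrix for a handful of time series can come straight from their pairwise correlations; a minimal sketch with synthetic data:

In [ ]:
# Minimal sketch: a similarity matrix from pairwise correlations of synthetic time series
import numpy as np

series = np.random.randn(5, 250)       # 5 synthetic return series of length 250
similarity_corr = np.corrcoef(series)  # values in [-1, 1]; no distance metric needed
print(similarity_corr.round(2))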

In [93]:
# Turn the distance into a similarity by applying a Gaussian kernel
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

distance = euclidean_distances(X, X, squared=True)
sig = 0.5
similarity = np.exp(-distance/sig)

In [110]:
%matplotlib notebook
sns.clustermap(similarity);



In [96]:
A = create_adjacency_matrix(similarity, num_neighbours=10)

# Calculate the degree - the sum of all incoming weights to a vertex
D = np.diag(np.sum(A, axis=0))

# Unnormalised Laplacian Matrix
L = D - A
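
The function create_adjacency_matrix lives in the helper module kmeans_aux and is not shown here; a plausible minimal sketch of a symmetric k-nearest-neighbour sparsification of the similarity matrix could look like this (the actual helper may differ):

In [ ]:
# Hypothetical sketch of a k-nearest-neighbour sparsification of a similarity matrix;
# the real create_adjacency_matrix in kmeans_aux may be implemented differently.
import numpy as np

def knn_adjacency(similarity, num_neighbours=10):
    n = similarity.shape[0]
    adj = np.zeros_like(similarity)
    for i in range(n):
        # keep only the most similar points (excluding the point itself)
        neighbours = np.argsort(similarity[i])[::-1][1:num_neighbours + 1]
        adj[i, neighbours] = similarity[i, neighbours]
    # symmetrise: keep an edge if either endpoint selected it
    return np.maximum(adj, adj.T)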

In [97]:
# Perform an eigenvalue decomposition of the Laplacian

from scipy.linalg import eig
[eig_vals, eig_vecs] = eig(L)

sorted_inds = np.argsort(eig_vals.real, axis=0)

In [98]:
# Perform a dimensionality reduction

k = 3
F = eig_vecs[:,sorted_inds[0:k]]

# Do k-means on the reduced space
k_means_3_labels, k_means_3_cluster_centers = do_kmeans(F, n_clusters=3)
ind_3 = np.argsort(k_means_3_labels)

Eigenvalues of the Laplacian


In [101]:
%matplotlib notebook
plt.bar(range(10), eig_vals.real[sorted_inds[0:10]]);
# plt.imshow(similarity[ind_3].T[ind_3])


Out[101]:
<Container object of 10 artists>

In [102]:
from networkx.generators import random_graphs
from networkx.generators import classic

In [103]:
import networkx as nx
g = nx.from_numpy_matrix(A)

In [104]:
# Build an unweighted copy of g (drop the edge weights)
g0 = random_graphs.empty_graph()
g0.add_nodes_from(g)
for v in g.edges_iter():
    g0.add_edge(v[0], v[1])

In [105]:
from graphs import *

In [113]:
p_g0 = figure(toolbar_location='left', tools=TOOLS)

ds_g0_nodes, ds_g0_edges = prepare_datasources(g0, k=.05, iterations=150)
plot_graph(p_g0, ds_g0_nodes, ds_g0_edges)

show(p_g0);


Results with $k=3$ clusters


In [107]:
# k_means_2_labels, k_means_2_cluster_centres = do_kmeans(X, n_clusters=2, n_init=10)
src3 = create_datasource(X, k_means_3_labels, 3) #len(k_means_3_cluster_centres))

In [114]:
p = figure(toolbar_location='left', tools=TOOLS);
p.scatter('x', 'y', color='fill_color', source=src3);
show(p);


Other applications

Outlier detection

Construct a sparse similarity graph by applying a threshold $r$ to the pairwise distances.

Determine the connected components.

Outliers are the nodes of components with fewer than $p$ nodes.

More info in https://github.com/dmarx/Topological-Anomaly-Detection
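
A minimal sketch of this idea with networkx and a distance threshold (the thresholds $r$ and $p$ are arbitrary here; the repository above implements a more elaborate version):

In [ ]:
# Minimal sketch of connected-component based outlier detection (illustrative thresholds)
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import euclidean_distances

def graph_outliers(points, r=0.3, p=5):
    """Return indices of points in connected components with fewer than p nodes."""
    dist = euclidean_distances(points)
    adj = (dist < r) & (dist > 0)              # sparse similarity graph via threshold r
    g = nx.from_numpy_matrix(adj.astype(int))
    outliers = []
    for component in nx.connected_components(g):
        if len(component) < p:
            outliers.extend(component)
    return sorted(outliers)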

Back to finance: can we infer structure from the (stock) market?

Correlation of several financial time series

(picture taken from the paper)

$$Ret_t = \frac{P_t}{P_{t-1}}$$
$$ R_t = \frac{Ret_t - \overline{Ret}}{\sigma_{Ret}} $$

Then compute the windowed pairwise correlation of each pair of assets

$$X = R_{i,t_0\ldots t_k} \qquad Y = R_{j,t_0\ldots t_k}$$
$$\rho(X,Y)=\frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X- \mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
$$ \rho_{XY} =\frac{\sum ^n _{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum ^n _{i=1}(X_i - \bar{X})^2} \sqrt{\sum ^n _{i=1}(Y_i - \bar{Y})^2}} $$
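
A minimal sketch of the windowed correlation computation with pandas (the price layout, column names and window length are assumptions made for illustration):

In [ ]:
# Minimal sketch: windowed pairwise correlations of standardised returns
import numpy as np
import pandas as pd

# prices: one column per asset (synthetic data stands in for real quotes)
prices = pd.DataFrame(np.abs(np.random.randn(500, 4)) + 50,
                      columns=['AAA', 'BBB', 'CCC', 'DDD'])
returns = prices / prices.shift(1)              # Ret_t = P_t / P_{t-1}
R = (returns - returns.mean()) / returns.std()  # standardised returns

window = 60
rolling_corr = R.rolling(window).corr()         # pairwise correlations per window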

Eigenvalues of the correlation matrix (DAX 30)

Can we put this structure to use?

Use the structure of the correlation matrix to encode diversity

Identify baskets of assets that are diversified (e.g. 20, 30 assets)

Spread of risk across financial markets: better to invest in the peripheries F. Pozzi, T. Di Matteo and T. Aste 2013, Nature Scientific Reports 3, Article number: 1665 doi:10.1038/srep01665 http://www.nature.com/srep/2013/130416/srep01665/full/srep01665.html

IPython notebooks and material in https://github.com/mvaz/PyData2014-Berlin

If you preprocess your data such that each row has $\mu=0$ and $\sigma = 1$ (which disallows constant vectors!), then correlation reduces to cosine:

$$Cor(X,Y)=\frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X- \mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} = E[X Y]= \frac{1}{n}\langle X,Y\rangle$$

Under the same conditions, the squared Euclidean distance also reduces to a simple function of the same inner product:

$$d^2_{Euclid}(X,Y)= \sum_i (X_i-Y_i)^2= \sum_i X_i^2 + \sum_i Y_i^2 - 2 \sum_i X_i Y_i = 2n - 2\langle X,Y\rangle$$
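
A quick numerical check of both identities on synthetic data:

In [ ]:
# Numerical check: correlation vs. scaled dot product of standardised vectors
import numpy as np

x = np.random.randn(1000)
y = 0.3 * x + np.random.randn(1000)

xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()

corr = np.corrcoef(x, y)[0, 1]
cosine = np.dot(xs, ys) / len(xs)   # (1/n) <X, Y>
d2 = np.sum((xs - ys) ** 2)         # squared Euclidean distance

print(np.allclose(corr, cosine))
print(np.allclose(d2, 2 * len(xs) - 2 * np.dot(xs, ys)))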

[other notebook]

The similarity matrix can be seen as a graph

Different methods for sparsifying the matrix were tested; the one used here relies on a topological argument (the Planar Maximally Filtered Graph, PMFG).

In this paper the authors were interested in centrality rather than community structure. The gist of the paper is that by choosing assets in the periphery of the network, the portfolios become more diversified.

(picture taken from the paper)

Methodology

For each point in time:

Build asset baskets:

  • market portfolio (equally weighted for all assets)
  • randomly picked stocks
  • the currently best-performing stocks
  • PMFG portfolio

Construct portfolios

  • Equally weighted
  • Optimized (Markowitz)

Evaluate the signal-to-noise ratio (SNR) for the periods following construction: $$ \frac{\overline{r}}{s} $$
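
For reference, the SNR of a realised return series is just its mean divided by its standard deviation; a tiny synthetic example:

In [ ]:
# Illustration: SNR of a portfolio's realised returns on synthetic data
import numpy as np

portfolio_returns = 0.0005 + 0.01 * np.random.randn(250)  # synthetic daily returns
snr = portfolio_returns.mean() / portfolio_returns.std()
print(snr)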

Equally weighted portfolios

  • Risk is well distributed across the network: no correlation between centrality and signal-to-noise ratio.
  • PMFG portfolios perform well

Markowitz portfolios (no short selling)

Market portfolio profits the most from Markowitz.

Central baskets profit less from the optimization, which suggests that their diversification is already very good.

Large central portfolios are already very good.

Take-home message(s)

Structure is important. Network theory helps you:

  • find groups in your data (community detection, outlier detection)
  • measure what is central

Lots of structure in finance

Local methods can help reveal the structure of your data.

You don't always need a distance metric. Often, a similarity function is enough, e.g.

  • correlation between time series
  • human label
  • co-occurrence

Thanks for listening

Part 4: back to clouds (t-SNE)

The goal is to embed high-dimensional objects into a lower-dimensional space (e.g. 2D).

Sophisticated modelling of the effect of the local structure on the low-dimensional embedding.

Local structure

On high-dimensional space

Given a set of N high-dimensional objects $\mathbf{x}_1, \dots, \mathbf{x}_N$, t-SNE first computes probabilities $p_{ij}$ that are proportional to the similarity of objects $\mathbf{x}_i$ and $\mathbf{x}_j$, as follows:

$$p_{j|i} = \frac{\exp(-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert\mathbf{x}_i - \mathbf{x}_k\rVert^2 / 2\sigma_i^2)}$$


Symmetrized (joint) probabilities $$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$

On lower-dimensional space

A heavy-tailed Student-t distribution helps keep dissimilar points far apart in the map

$$q_{ij} = \frac{(1 + \lVert \mathbf{y}_i - \mathbf{y}_j\rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert \mathbf{y}_k - \mathbf{y}_l\rVert^2)^{-1}}$$

Global minimization of divergence

The goal is to minimize the Kullback–Leibler divergence between the two probability distributions

$$KL(P||Q) = \sum_{i \neq j} p_{ij} \, \log \frac{p_{ij}}{q_{ij}}$$

This helps preserve the local structure.

Optimization via stochastic gradient descent

The gradient turns out to be easy to compute:

$$ \frac{\partial \, KL(P || Q)}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij}) \, g\left( \left| y_i - y_j\right| \right) u_{ij},\\ \\ \quad \textrm{where} \, g(z) = \frac{z}{1+z^2} $$

Here, $u_{ij}$ is a unit vector going from $y_j$ to $y_i$. This gradient expresses the sum of all spring forces applied to map point $i$.

Adaptations to big data sets are already available, using the Barnes-Hut approximation.
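
A minimal usage sketch with scikit-learn, whose t-SNE implementation offers the Barnes-Hut approximation (parameters and data are illustrative):

In [ ]:
# Minimal sketch: t-SNE embedding with the Barnes-Hut approximation in scikit-learn
import numpy as np
from sklearn.manifold import TSNE

data = np.random.randn(500, 50)  # synthetic high-dimensional points
embedding = TSNE(n_components=2, perplexity=30,
                 method='barnes_hut').fit_transform(data)
print(embedding.shape)           # (500, 2)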


In [ ]:

Practical considerations on Spectral Clustering

Choice of $k$

More of an art than a science. Run the algorithm with several configurations and try to minimize the inertia $$\sum_{i=1}^{n} \min_{\mu_j \in C} ||x_i - \mu_j||^2,$$ i.e. the sum of the intra-cluster (squared) distances. The (related) concept of "silhouette" is also important.
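
A minimal sketch of such a scan on the moons data X from above, printing inertia and silhouette score per $k$:

In [ ]:
# Minimal sketch: compare inertia and silhouette score for several values of k
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10).fit(X)  # X from generate_two_moons() above
    print(k, km.inertia_, silhouette_score(X, km.labels_))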

Choice of Laplacian

There is no strong theoretical argument favouring any particular version.

Data size

Works best when there are not too many clusters; well suited to non-linearly shaped clusters.

The similarity matrix is quadratic in the number of samples... so no "big data".

It might also be helpful to perform a dimensionality reduction first with e.g. PCA.

More info

http://www.cs.yale.edu/homes/spielman/561/2009/lect02-09.pdf