In [23]:
# Wide-screen hack: gives big plots and the control panels more room
from IPython.display import HTML
html ='''
<style>
.container { width:100% !important; }
.input{ width:60% !important;
align: center;
}
.text_cell{ width:70% !important;
font-size: 16px;}
.title {align:center !important;}
</style>'''
HTML(html)
Out[23]:
In this tutorial we will analyse the structure of the graph that results from the correlation matrix of the prices of different currency exchanges. All we need is time series data and a few lines of code to map matrices into graphs and calculate their metrics.
We use a week of hourly data for different currency exchanges to conduct our analysis. Each time series also comes with its normalized price, its returns and its logarithmic returns, so you don't have to calculate them if you want to use them. In this example we will start with just the normalized price values.
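If you ever need to rebuild those derived series yourself, the sketch below shows one way to do it with pandas. The price values are made up, and dividing by the first observation is only an assumption about what "normalized" means here; the sample file may use a different convention.

```python
import numpy as np
import pandas as pd

# Toy price series; the values are invented for illustration only.
price = pd.Series([1.10, 1.12, 1.11, 1.15, 1.14], name="price")

# Assumption: "normalized" means relative to the first observation.
normalized = price / price.iloc[0]

# Simple returns: (p_t - p_{t-1}) / p_{t-1}; the first value is NaN.
returns = price.pct_change()

# Logarithmic returns: log(p_t) - log(p_{t-1}); the first value is NaN.
log_returns = np.log(price).diff()
```

The two kinds of returns are linked by log_returns = log(1 + returns), so for small hourly moves they are nearly identical.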
In [ ]:
import numpy as np
import pandas as pd
from scipy import stats
from IPython.display import Image #this is for displaying the widgets in the github repo
from shaolin.dashboards.graph import GraphCalculator
forex_data = pd.read_hdf('gcalculator_data/forex_sample.h5')
forex_data.items,forex_data.minor_axis,forex_data.major_axis
In [25]:
fund = forex_data['fund']
fund.head()
Out[25]:
We start by calculating the correlation and covariance matrices of the series.
In [6]:
matrices = {}
matrices['corr'] = fund.corr()
matrices['cov'] = fund.cov()
# Alpha-version hack: the panel must contain a matrix called 'exchange', even if you don't use it
matrices['exchange'] = fund.corr()
matrix_panel = pd.Panel(matrices)
In [32]:
matrix_panel
Out[32]:
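As a reminder of how the two matrices relate, the correlation matrix is the covariance matrix with each entry divided by the product of the two standard deviations. The sketch below checks this on random data; the column names "a", "b" and "c" are placeholders and have nothing to do with the forex sample.

```python
import numpy as np
import pandas as pd

# Synthetic data: three columns, with "b" made to depend partly on "a".
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
data["b"] += 0.8 * data["a"]

cov = data.cov()

# Standard deviations are the square roots of the diagonal of the covariance.
std = np.sqrt(np.diag(cov))

# corr[i, j] = cov[i, j] / (std[i] * std[j])
corr_from_cov = cov / np.outer(std, std)
```

The result matches `data.corr()` up to floating point error, which is why the correlation matrix is bounded in [-1, 1] while the covariance matrix keeps the scale of the data.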
The functions below compute additional information for every node of the graph. This information will be passed to the GraphCalculator.
In [33]:
def calculate_pdf(x, n=1000):
    """Fits a Gaussian kernel, resamples n values and returns (X, p(x=X))."""
    try:
        kernel = stats.gaussian_kde(x, bw_method='scott')
        X = kernel.resample(n)
        p = kernel(X)
    except Exception:
        # Fall back to a uniform distribution over the observed values
        X = np.asarray(x)
        p = np.ones(len(x)) * 1.0 / len(x)
    return X.flatten(), p.flatten()

def shannon_entropy(x, n=1000):
    X, p = calculate_pdf(x, n)
    ent = p * np.log2(p)
    # Scaled down so we don't get values that are too big.
    # After all, entropy is a purely arbitrary measure ;)
    return -ent.sum() / (10 * n)

def mean_rets(x):
    # Mean of the returns; note that 10e4 equals 100000
    return x.pct_change().mean() * 10e4

def std_rets(x):
    # Standard deviation of the returns, on the same scale as mean_rets
    return x.pct_change().std() * 10e4

def total_rets(x):
    return x.pct_change().sum()
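For comparison, here is a minimal sketch of a different, more conventional estimator: the Monte Carlo average of -log2 p(x) over draws from the fitted kernel, which approximates the differential entropy in bits. This is not what shannon_entropy above computes (that function sums p*log2(p) and rescales it); the function name kde_entropy_bits is made up for this example.

```python
import numpy as np
from scipy import stats

def kde_entropy_bits(samples, n=2000):
    """Monte Carlo estimate of differential entropy (in bits) from a Gaussian KDE."""
    kernel = stats.gaussian_kde(samples, bw_method="scott")
    draws = kernel.resample(n)   # array of shape (1, n) for 1-D input
    density = kernel(draws)      # estimated p(x) at every draw
    return -np.mean(np.log2(density))

# gaussian_kde.resample uses NumPy's global random state by default.
np.random.seed(0)
samples = np.random.normal(size=2000)

estimate = kde_entropy_bits(samples)

# Closed-form differential entropy of a standard normal, for reference.
exact = 0.5 * np.log2(2 * np.pi * np.e)
```

On standard normal samples the estimate should land close to the closed-form value of roughly 2.05 bits, slightly above it because the kernel smoothing widens the fitted density.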
In [34]:
node_metrics = pd.DataFrame(columns=fund.columns)
funcs = [total_rets, np.mean, np.std, shannon_entropy,mean_rets,std_rets]
for fun in funcs:
    # One row per metric, one column per time series
    node_metrics.loc[fun.__name__] = fund.apply(fun)
In [35]:
node_metrics
Out[35]:
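The same kind of table (one row per metric, one column per series) can also be built in a single expression. The sketch below uses a toy price table and a hypothetical price_range metric, only to show the shape of the result.

```python
import pandas as pd

# Toy prices for two made-up series "A" and "B".
prices = pd.DataFrame({"A": [1.0, 1.1, 1.21], "B": [2.0, 1.0, 1.5]})

def total_rets(x):
    return x.pct_change().sum()

def price_range(x):
    # Hypothetical extra metric, used here only for illustration.
    return x.max() - x.min()

funcs = [total_rets, price_range]

# Each function maps a column (Series) to a scalar, so apply() returns one
# value per column. Transposing puts the metrics on the rows.
table = pd.DataFrame({f.__name__: prices.apply(f) for f in funcs}).T
```

Any function that reduces a Series to a single number can be added to funcs in the same way.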
The GraphCalculator gives us complete control over the process of turning a matrix and arbitrary data into a graph. Although every component can be used as a standalone widget, it is recommended to use the widget attribute to control the GraphCalculator.
Every component can be hidden with the toggle buttons situated at the bottom, and every change you make in the widget will take effect after you click the Calculate button.
In [37]:
gc = GraphCalculator(node_metrics=node_metrics,matrix_panel=matrix_panel)
gc.widget
Image(filename='gcalculator_data/img_1.png')
Out[37]:
This widget is used to select which kind of graph will be calculated from the supplied matrix.
This is how the matrix will be converted into a graph. There are currently three possible options to build an undirected graph:
Full matrix: The matrix will be interpreted as the adjacency matrix of a fully connected graph.
MST: The target graph will be a minimum spanning tree constructed from the data matrix.
PMFG: Planar maximally filtered graph, calculated from the data matrix.
Minimum absolute value that an element of the matrix needs in order to be taken into account as a valid edge.
Use 1/value instead of the original value of the data matrix.
Node metrics that will be calculated when the graph is constructed.
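To make the threshold, inversion and MST options concrete, here is a minimal sketch on a toy 4x4 correlation matrix. It uses scipy's minimum_spanning_tree rather than the GraphCalculator internals, which may work differently, and the matrix values are invented.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Toy symmetric correlation matrix for four nodes (made-up values).
corr = np.array([
    [ 1.00,  0.90, -0.20, 0.60],
    [ 0.90,  1.00, -0.15, 0.50],
    [-0.20, -0.15,  1.00, 0.05],
    [ 0.60,  0.50,  0.05, 1.00],
])

# Full matrix: every off-diagonal entry is a candidate edge weight.
weights = np.abs(corr)
np.fill_diagonal(weights, 0.0)

# Threshold: drop edges whose absolute value is too small.
threshold = 0.1
weights[weights < threshold] = 0.0

# Inverse: use 1/value so that strong correlations become short edges.
inverse = np.zeros_like(weights)
mask = weights > 0
inverse[mask] = 1.0 / weights[mask]

# MST: scipy treats zeros as missing edges and returns a sparse matrix.
mst = minimum_spanning_tree(inverse).toarray()
rows, cols = np.nonzero(mst)
mst_edges = sorted((min(i, j), max(i, j)) for i, j in zip(rows.tolist(), cols.tolist()))
```

With these values the tree keeps three edges, all attached to node 0, because node 0 holds the strongest correlation with each of the other nodes.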
This widget allows us to select which matrix of our matrix panel will be used as an adjacency matrix for the graph. In other words, this will select which data matrix we will convert into a graph.
This widget handles the transformations we will apply to the target data matrix before mapping it into a graph. One of the following transformations, or a succession of them, has to be applied to the matrix.
Raw: Do not apply any transformation.
Scale: Rescale the values of the matrix to match the interval [Rescale min, Rescale max].
Clip: All the values lower than Clip min will be replaced with Clip min and all the values greater than Clip max will be replaced with Clip max.
Zscore: The matrix will be transformed so its elements have a mean of 0 and a standard deviation of 1.
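The transformations listed above can be sketched with plain NumPy as follows. The function names are illustrative and are not the ones used inside the widget.

```python
import numpy as np

def rescale(m, new_min=0.0, new_max=1.0):
    """Linearly map the values of m onto [new_min, new_max]."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        # A constant matrix has no range to stretch.
        return np.full_like(m, new_min)
    return (m - m.min()) / span * (new_max - new_min) + new_min

def clip(m, clip_min, clip_max):
    """Replace values below clip_min or above clip_max with the bound."""
    return np.clip(np.asarray(m, dtype=float), clip_min, clip_max)

def zscore(m):
    """Shift and scale so all elements have mean 0 and standard deviation 1."""
    m = np.asarray(m, dtype=float)
    return (m - m.mean()) / m.std()

# Toy matrix with made-up values.
m = np.array([[1.0, 0.5, -0.2], [0.5, 1.0, 0.1], [-0.2, 0.1, 1.0]])

scaled = rescale(m, 0.0, 10.0)
clipped = clip(m, 0.0, 0.8)
standardized = zscore(m)

# A succession of transformations is just function composition.
clipped_then_scaled = rescale(clip(m, 0.0, 0.8), 0.0, 1.0)
```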
The Layout manager is the widget in charge of the layout that we will use to draw our graph. It calculates layouts of the selected type. Any layout available in the networkx package is included in the LayoutCalculator, and any of its parameters can be tweaked via a widget.
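For reference, this is roughly what calculating a layout looks like when calling networkx directly. It is a sketch on a toy graph, not the LayoutCalculator code; the parameter values are arbitrary.

```python
import networkx as nx

# Toy graph: five nodes connected in a line.
G = nx.path_graph(5)

# Every layout function returns a dict mapping node -> (x, y) position.
layouts = {
    "circular": nx.circular_layout(G),
    # k sets the preferred distance between nodes; seed makes the result repeatable.
    "spring": nx.spring_layout(G, k=0.5, iterations=50, seed=42),
}
```

Parameters such as k and iterations are the kind of values that a layout widget would expose.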
Alpha version disclaimer:
The results of running the GraphCalculator will be stored internally in different object attributes, as explained below.
The graph-related metrics are stored as a DataFrame in the node attribute.
In [14]:
gc.node = gc.node.dropna(axis=1)
gc.node.describe()
Out[14]:
In [15]:
gc.node.head()
Out[15]:
The edge attribute is a pandas Panel with all the information regarding the edges of the graph. This means that it will contain two kinds of metrics:
Edge: All the metrics calculated on the edges of the graph in the form of an adjacency matrix.
Matrix: All the matrices that were not used to build the graph are stored in the form of an adjacency matrix.
In [16]:
gc.edge
Out[16]:
In [17]:
gc.edge.items
Out[17]:
In [18]:
gc.edge['cov']
Out[18]:
In [19]:
gc.edge['edge_betweenness']
Out[19]:
The node metrics that were passed as a parameter are stored in the node_metrics attribute. The GraphCalculator does not use them in its calculations; they are currently used only when displaying the node tooltips on a plot.
In [30]:
gc.node_metrics
Out[30]:
All the data is also stored in a networkx graph containing all the DataFrames of the GraphCalculator as node and edge attributes. It is possible to access the graph using the G attribute.
In [21]:
gc.G.node
Out[21]:
In [22]:
gc.G.edge
Out[22]:
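The sketch below illustrates the general idea of carrying tabular data as node and edge attributes of a networkx graph. It is built by hand on toy data with a hypothetical mean_price column and does not reproduce how the GraphCalculator fills its graph. Note that recent networkx releases expose G.nodes and G.edges; the G.node and G.edge spelling used in the cells above comes from older releases.

```python
import networkx as nx
import pandas as pd

# Toy node table: one row per node, one column per metric (made-up values).
node_table = pd.DataFrame({"mean_price": [1.0, 2.0, 1.5]}, index=["a", "b", "c"])

G = nx.Graph()

# Each row of the table becomes the attribute dict of one node.
for name, row in node_table.iterrows():
    G.add_node(name, **row.to_dict())

# Edge attributes are passed as keyword arguments.
G.add_edge("a", "b", weight=0.9)
G.add_edge("b", "c", weight=0.4)

node_attrs = dict(G.nodes(data=True))   # {"a": {"mean_price": 1.0}, ...}
ab_weight = G["a"]["b"]["weight"]       # attribute of a single edge
```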