igraph and networkx provide layouts based on different layout algorithms, and the networks generated by these Python libraries are plotted via cairo and matplotlib, respectively. Neither igraph nor networkx can handle and plot large networks.
An appropriate layout for large networks is provided by graphviz. Moreover, graphviz and pygraphviz, a Python interface to graphviz, have graph layouts that are not implemented in igraph and networkx.
Illustrated here is the so-called Price network, created through preferential attachment and plotted with the graphviz layout sfdp, which is intended for large networks.
Using Python Plotly we can process network data with one or more of the above-mentioned libraries, assign any layout from the collection of all their layouts, and get an interactive plot.
In this Jupyter notebook we illustrate the complementary use of networkx, pygraphviz (with graphviz in the backend), and Plotly to generate the radial tree representing the progress of the women players at the U.S. Open 2016.
The US Open is a tennis tournament attended by $N=128=2^7$ women players. It is a balanced knockout tournament consisting of 7 rounds. A few days before the start, the organizers release a draw that sets the pairs playing in the first round. Then, successively, the $(2k+1)^{th}$ and $(2k+2)^{th}$ winners in a round $r\geq 1$ play each other in the $(r+1)^{th}$ round, $k=0, \ldots, 2^{6-r}-1$. After the $6^{th}$ round only two players remain, and the winner of their final match is the winner of the Grand Slam. The whole process is visualized by the WTA as a balanced binary tree that has the tournament's winner as its root and the players in all rounds as its nodes. The parent of a pair $(2k+1, 2k+2)$ playing in a round $r$ is the winner of the corresponding match.
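The round structure described above can be checked with a few lines of plain Python. This is only a minimal sketch: the function name round_sizes and the variable names are hypothetical and are not part of the notebook.

```python
def round_sizes(n_players=128):
    """Return the number of matches played in each round of a knockout draw.

    Assumes n_players is a power of two, as in the 128-player draw.
    """
    sizes = []
    players = n_players
    while players > 1:
        sizes.append(players // 2)  # each match eliminates exactly one player
        players //= 2               # only the winners advance
    return sizes


matches = round_sizes(128)
n_rounds = len(matches)       # number of rounds in the tournament
n_tree_nodes = sum(matches)   # one tree node per match winner
```

Under these assumptions the draw has 7 rounds and 127 match winners, which is the number of nodes of a balanced binary tree of height 6 with 64 leaves (the second-round players).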
To define and plot the corresponding binary tree we proceed as follows. The networkx.balanced_tree function creates the binary tree G. Its nodes and edges are then used to define a pygraphviz graph H, which is laid out with the radial layout called twopi.
In [1]:
import networkx as nx
import pygraphviz as pgv
import pandas as pd
import numpy as np
from ast import literal_eval
In [2]:
import platform
print(f'Python version: {platform.python_version()}')
print(f'pygraphviz version: {pgv.__version__}') #pygraphviz version 1.5 for python 3
Read the Excel file:
In [3]:
df = pd.read_excel("Data/US-Open-2016.xls")
df.loc[:6, :]#print tree levels 0, 1, 2
Out[3]:
In [8]:
N = len(df)
N
Out[8]:
In [9]:
labels = list(df['name'])
Define the tree $G$ as a networkx graph:
In [10]:
G = nx.balanced_tree(2, 6)
V = G.nodes()
E = G.edges()
The pygraphviz tree H and its layout are defined below:
In [11]:
H = pgv.AGraph(strict=True, directed=False)
H.add_nodes_from(V)
H.add_edges_from(E)
H.layout(prog='twopi')
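The node coordinates of the pygraphviz tree $H$ are read from the pos attribute that the layout assigns to each node; this is done further below, once the Plotly helper function is defined.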
Process the above defined network data to get the corresponding Plotly plot of the binary tree:
In [12]:
import plotly.plotly as py  # Plotly 3.x; from Plotly 4 on this module is chart_studio.plotly
import plotly.graph_objs as go
The Plotly version of a graph with edges E and node coordinates pos is returned by the following function:
In [13]:
def plotly_graph(E, pos):
    # E is the list of tuples representing the graph edges
    # pos is the list of node coordinates
    N = len(pos)
    Xn = [pos[k][0] for k in range(N)]  # x-coordinates of nodes
    Yn = [pos[k][1] for k in range(N)]  # y-coordinates of nodes
    Xe = []
    Ye = []
    for e in E:
        Xe += [pos[e[0]][0], pos[e[1]][0], None]  # x-coordinates of the two end nodes of edge e
        Ye += [pos[e[0]][1], pos[e[1]][1], None]  # y-coordinates of the two end nodes of edge e
    return Xn, Yn, Xe, Ye
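The None entries are what allow all edges to be drawn by a single Scatter trace in lines mode: Plotly breaks the line at each None, so consecutive edges are not joined. The sketch below reproduces only the edge part of the function on a made-up three-node graph; the names edge_coordinates, toy_pos and toy_edges are hypothetical.

```python
def edge_coordinates(edges, pos):
    """Flatten edges into x and y lists, with None separating the segments."""
    xe, ye = [], []
    for a, b in edges:
        xe += [pos[a][0], pos[b][0], None]  # x of both end nodes, then a gap
        ye += [pos[a][1], pos[b][1], None]  # y of both end nodes, then a gap
    return xe, ye


# Toy data (not from the notebook): node 0 is linked to nodes 1 and 2
toy_pos = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
toy_edges = [(0, 1), (0, 2)]
xe, ye = edge_coordinates(toy_edges, toy_pos)
```

Each edge contributes three entries per list, so a graph with m edges yields lists of length 3m.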
Get node positions in the tree H:
In [14]:
pos = np.array([literal_eval(H.get_node(k).attr['pos']) for k in range(N)])
# Rotate the node positions by pi/2 counter-clockwise: (x, y) -> (-y, x)
pos[:, [0, 1]] = pos[:, [1, 0]]
pos[:, 0] = -pos[:, 0]
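The two steps of this cell, parsing the pos attribute and rotating the points, can be illustrated on a single point without numpy. This is a sketch under the assumption that the layout stores each position as a string of the form 'x,y'; the sample string and the helper names parse_pos and rotate_ccw_90 are made up for illustration.

```python
from ast import literal_eval


def parse_pos(pos_attr):
    """Convert a position string such as '3.5,2' into a tuple of numbers."""
    return literal_eval(pos_attr)


def rotate_ccw_90(point):
    """Rotate a point by pi/2 counter-clockwise about the origin."""
    x, y = point
    return (-y, x)  # swap the coordinates, then negate the new x


p = parse_pos("3.5,2")   # hypothetical attribute value
q = rotate_ccw_90(p)
```

Swapping the two columns and negating the first one, as done in the cell above, is the array form of the same map $(x, y) \mapsto (-y, x)$.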
Define the Plotly objects that represent the binary tree, and the finalist routes to the last match:
In [15]:
Xn, Yn, Xe, Ye=plotly_graph(E, pos)
edges = go.Scatter(x=Xe,
y=Ye,
mode='lines',
line=dict(color='rgb(160,160,160)', width=0.75),
hoverinfo='none'
)
nodes = go.Scatter(x=Xn,
y=Yn,
mode='markers',
name='',
marker=dict(size=8,
color='#85b6b6',
line=dict(color='rgb(100,100,100)', width=0.5)),
text=labels,
hoverinfo='text'
)
In [16]:
Kerber_path = [0, 2, 6, 14, 30, 62, 126]
Pliskova_path = [1, 4, 10, 21, 43, 87]
colorKP = ['#CC0000']*len(Kerber_path) + ['rgb(65, 64, 123)']*len(Pliskova_path)# set color for both paths
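The hard-coded paths can be cross-checked without any graph library. The sketch below assumes that the tree nodes are numbered breadth-first from the root 0, so that the parent of node i is (i - 1) // 2; the function name path_to_root is hypothetical.

```python
def path_to_root(leaf):
    """Return the node indices from the root down to the given leaf.

    Assumes breadth-first numbering of a binary tree rooted at 0,
    where the children of node i are 2*i + 1 and 2*i + 2.
    """
    path = [leaf]
    while path[-1] > 0:
        path.append((path[-1] - 1) // 2)  # step up to the parent
    return path[::-1]


kerber_check = path_to_root(126)        # includes the root (the winner)
pliskova_check = path_to_root(87)[1:]   # the root is dropped for the runner-up
```

Under this numbering the two lists coincide with Kerber_path and Pliskova_path defined above.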
In [17]:
XK = [pos[k][0] for k in Kerber_path]
YK = [pos[k][1] for k in Kerber_path]
XP = [pos[k][0] for k in Pliskova_path]
YP = [pos[k][1] for k in Pliskova_path]
finalists = go.Scatter(x=XK+XP,
y=YK+YP,
mode='markers',
marker=dict(size=10,
color=colorKP,
line=dict(color='rgb(100,100,100)', width=0.5),
),
text=['Kerber']*len(Kerber_path) + ['Pliskova']*len(Pliskova_path),
hoverinfo='text')
To each player in the second round we attach her name, aligned radially with respect to the corresponding node position. The function set_annotation places a name at its position, with the text displayed at a given angle:
In [18]:
def set_annotation(x, y, anno_text, textangle, fontsize=11, color='rgb(10,10,10)'):
    return dict(x=x,
                y=y,
                text=anno_text,
                textangle=textangle,  # angle with the horizontal line through (x, y), in degrees;
                                      # + = clockwise, - = anti-clockwise
                font=dict(size=fontsize, color=color),
                showarrow=False
                )
Define Plotly plot layout:
In [19]:
layout = go.Layout(
title="U.S. Open 2016<br>Radial binary tree associated to women's singles players",
font=dict(family='Balto'),
width=650,
height=650,
showlegend=False,
xaxis=dict(visible=False),
yaxis=dict(visible=False),
margin=dict(t=100),
hovermode='closest')
The node positions returned by the pygraphviz radial layout are not located on a circle centered at the origin. That is why we calculate the circle center and radius:
In [20]:
center = np.array([(np.min(pos[63:, 0])+np.max(pos[63:, 0]))/2, (np.min(pos[63:, 1])+np.max(pos[63:, 1]))/2])
radius = np.linalg.norm(pos[63,:]-center)
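The center is taken as the midpoint of the bounding box of the leaf positions, and the radius as the distance from one leaf to that center. A minimal pure-Python sketch of the same idea on made-up points follows; the names bounding_box_center, toy_leaves, toy_center and toy_radius are hypothetical.

```python
import math


def bounding_box_center(points):
    """Return the midpoint of the axis-aligned bounding box of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


# Four toy points on the circle of radius 5 centered at (3, 4)
toy_leaves = [(8.0, 4.0), (-2.0, 4.0), (3.0, 9.0), (3.0, -1.0)]
toy_center = bounding_box_center(toy_leaves)
toy_radius = math.hypot(toy_leaves[0][0] - toy_center[0],
                        toy_leaves[0][1] - toy_center[1])
```

The midpoint recovers the true center only when the points reach the extreme left, right, top and bottom of the circle, which is a reasonable assumption for leaves spread evenly around it.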
Compute the text angle:
In [21]:
angles=[]
for k in range(63, 127):
    v = pos[k, :] - center
    angles.append(-(180*np.arctan(v[1]/v[0])/np.pi))
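Using arctan of the ratio, rather than a two-argument arctangent, keeps every angle between -90 and 90 degrees, so labels on the left half of the circle are not drawn upside down. The sign is flipped because Plotly measures textangle clockwise. The sketch below shows the same computation for one vector in plain Python, with an explicit branch for a vertical vector, where the ratio would divide by zero; the function name text_angle is hypothetical.

```python
import math


def text_angle(vx, vy):
    """Return the label angle in degrees for the radial direction (vx, vy).

    The result lies in [-90, 90]; positive values mean clockwise rotation.
    """
    if vx == 0:
        # Vertical direction: the limit of -atan(vy / vx) as vx -> 0+
        return -90.0 if vy > 0 else 90.0
    return -math.degrees(math.atan(vy / vx))


upper_right = text_angle(1.0, 1.0)    # direction at 45 degrees
lower_left = text_angle(-1.0, -1.0)   # opposite direction, same label angle
straight_up = text_angle(0.0, 2.0)    # vertical direction
```

Opposite directions give the same label angle, which is what makes the names readable on both sides of the tree.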
In [22]:
pos_text = center+1.2*(pos[63:, :]-center)# text position
annotations = []
#define annotations for non-finalist players
for k in range(63, 87):
    annotations += [set_annotation(pos_text[k-63][0], pos_text[k-63][1], labels[k], angles[k-63])]
for k in range(88, 126):
    annotations += [set_annotation(pos_text[k-63][0], pos_text[k-63][1], labels[k], angles[k-63])]
#insert colored annotations for the finalists, Pliskova and Kerber
annotations += [set_annotation(pos_text[87-63][0], pos_text[87-63][1],
'<b>Pliskova</b>',
angles[87-63],
color='rgb(65, 64, 123)'),
set_annotation(pos_text[126-63][0], pos_text[126-63][1],
'<b>Kerber</b>',
angles[126-63],
color='#CC0000')]
annotations += [set_annotation(center[0]-0.15, center[1]+45,
'<b>Winner<br>A. Kerber</b>',
0, fontsize=12, color='#CC0000')]
Append the annotation that displays the data source:
In [23]:
data_anno_text="Data source: "+\
"<a href='http://www.wtatennis.com/SEWTATour-Archive/posting/2016/905/MDS.pdf'> [1] </a>,"+\
" Excel file: "+\
"<a href='https://github.com/empet/Plotly-plots/blob/master/Data/US-Open-2016.xls'> [2] </a>"
annotations.append(dict(
showarrow=False,
text=data_anno_text,
xref='paper',
yref='paper',
x=0,
y=-0.1,
xanchor='left',
yanchor='bottom',
font=dict(size=12 )
))
layout.annotations = annotations
In [24]:
data = [edges, nodes, finalists]
fig = go.FigureWidget(data=data, layout=layout)
In [ ]:
#py.sign_in('empet', 'my_api_key')
py.plot(fig, filename='US-Open-16')
In [25]:
from IPython.display import IFrame
IFrame('https://plot.ly/~empet/14005', width=650, height=650)
Out[25]: