In the introductory_tutorial we ran through building structural covariance network analyses using scona 🍪.
In this tutorial we'll cover some of the visualisation tools to communicate these results.
Click on any of the links below to jump to that section:
- plot_degree
- report_global_measures
- plot_rich_club
In [1]:
import scona as scn
import scona.datasets as datasets
import numpy as np
import networkx as nx
import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
If you're not sure about this step, please check out the introductory_tutorial notebook for more explanation.
In [2]:
# Read in sample data from the NSPN WhitakerVertes PNAS 2016 paper.
df, names, covars, centroids = datasets.NSPN_WhitakerVertes_PNAS2016.import_data()
# calculate residuals of the matrix df for the columns of names
df_res = scn.create_residuals_df(df, names, covars)
# create a correlation matrix over the columns of df_res
M = scn.create_corrmat(df_res, method='pearson')
# Initialise a weighted graph G from the correlation matrix M
G = scn.BrainNetwork(network=M, parcellation=names, centroids=centroids)
# Threshold G at cost 10 to create a binary graph with 10% as many edges as the complete graph G.
G10 = G.threshold(10)
# Create a GraphBundle object that contains the G10 graph called "original_graph"
bundleGraphs = scn.GraphBundle([G10], ["original_graph"])
# Add ten random graphs to this bundle
# (In real life you'd want more than 10 random graphs,
# but this step can take quite a long time to run so
# for the demo we just create 10)
bundleGraphs.create_random_graphs("original_graph", 10)
plot_degree
The degree of each node is the number of edges adjacent to the node. For example if a node is connected to four other nodes then its degree is 4. If it is connected to 50 other nodes, its degree is 50.
Brain networks are usually "scale-free" which means that their degree distribution follows a power law. You can think of them as having a "heavy tail": there are a small number of nodes that have a large number of connections.
This is in contrast to - for example - an Erdős–Rényi graph where each node is connected to the others with a set, random probability. This graph is often called a binomial graph because the probability of connections follows a binomial (Yes-No) distribution.
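You can see this contrast outside of scona with plain networkx. The sketch below (not part of the tutorial pipeline; the graph sizes are just illustrative) compares a preferential-attachment graph, which has a heavy-tailed degree distribution, with an Erdős–Rényi graph of similar size and edge density:

```python
import networkx as nx

# Two graphs with the same number of nodes and a similar number of edges:
# preferential attachment gives a heavy-tailed degree distribution,
# while the binomial (Erdős–Rényi) graph does not.
n = 1000
ba = nx.barabasi_albert_graph(n, 5, seed=42)
p = 2 * ba.number_of_edges() / (n * (n - 1))  # match the edge density
er = nx.erdos_renyi_graph(n, p, seed=42)

ba_max = max(d for _, d in ba.degree())
er_max = max(d for _, d in er.degree())

# The scale-free graph's "heavy tail": a few very highly connected nodes
print(ba_max, er_max)
```

The maximum degree of the preferential-attachment graph is far larger than that of the binomial graph, even though the two have roughly the same mean degree.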
One of the first things to check in a structural covariance network analysis with scona is that our degree distribution shows this pattern.
In [3]:
degrees = dict(G10.degree())
# Print the degree of every 50th node to show what's inside this dictionary
for node_id, degree in list(degrees.items())[::50]:
print ('Node: {:3d} has degree = {:2d}'.format(node_id, degree))
You can see the information for a specific node from the graph itself.
Note, though, that the degree needs to be calculated first: it hasn't been added to the nodal attributes yet.
In [4]:
# Display the nodal attributes
G10.nodes[150]
Out[4]:
scona has a command for that.
Let's go ahead and add the degree to the nodal attributes... along with a few other measures.
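Under the hood, attaching degree as a nodal attribute only takes a couple of lines of networkx. Here's a minimal sketch on a toy graph (scona's calculate_nodal_measures does this, and much more, for you):

```python
import networkx as nx

# Toy graph standing in for the thresholded brain network
G = nx.erdos_renyi_graph(20, 0.3, seed=1)

# Attach each node's degree as a nodal attribute
nx.set_node_attributes(G, dict(G.degree()), name="degree")

print(G.nodes[0])  # now contains a 'degree' entry
```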
In [5]:
# Calculate nodal measures for graph G10
G10.calculate_nodal_measures()
# Display the nodal attributes
G10.nodes[150]
Out[5]:
Look at all that information!
We only want to visualise the degree distribution at the moment though.
In [6]:
# import the function to plot network measures
from scona.visualisations import plot_degree_dist
In [7]:
plot_degree_dist(G10)
In [8]:
plot_degree_dist(G10, binomial_graph=False)
You can save this figure in any location by passing a file name and (optional) directory path to the figure_name option.
If you don't set a directory path the figure will be saved in the local directory.
For this tutorial we'll save the output in a figures folder inside this tutorials directory.
In [9]:
plot_degree_dist(G10, binomial_graph=False, figure_name="figures/DegreeDistribution.png")
☝️ Did you see a warning message?
The code checks whether the directory that you want to save your figure to actually exists. If it doesn't, then it creates the directory, but gives you a little warning first to check that it isn't coming as a surprise (for example if you've tried to save your figure in the wrong place!)
We have the tutorials/figures directory specifically ignored in this project so we shouldn't ever see changes there.
Run the cell above again. You won't see the warning the second time because the folder already exists! You made it the first time you plotted the distribution 🚀.
Note that if you don't pass a file ending, the file will be saved as a png by default.
If you add a file extension allowed by matplotlib (eg .jpg, .svg, .pdf etc) then the figure will be saved in that format.
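That extension handling comes from matplotlib itself: savefig infers the output format from the file name. A standalone sketch (the file names here are just examples):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

outdir = tempfile.mkdtemp()
fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3, 4])

# The output format is inferred from the file extension
svg_path = os.path.join(outdir, "degree_dist.svg")
png_path = os.path.join(outdir, "degree_dist.png")
fig.savefig(svg_path)  # vector graphics
fig.savefig(png_path)  # raster graphics
plt.close(fig)

print(sorted(os.listdir(outdir)))
```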
In [10]:
plot_degree_dist(G10, color=["red", "black"])
report_global_measures
One of the first things we want to know is how the global attributes of the network compare to those of random networks.
Specifically we'll calculate:
- a: assortativity
- C: clustering
- E: efficiency
- L: shortest path
- M: modularity
- sigma: small world coefficient

and plot a bar chart that compares the real network to the random graphs.
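Most of these are standard graph metrics with direct networkx equivalents. A sketch on a toy graph (not the NSPN data; modularity needs a community partition and sigma needs random comparison graphs, so both are omitted here):

```python
import networkx as nx

G = nx.karate_club_graph()  # toy stand-in for the thresholded brain network

a = nx.degree_assortativity_coefficient(G)  # a: assortativity
C = nx.average_clustering(G)                # C: clustering
E = nx.global_efficiency(G)                 # E: efficiency
L = nx.average_shortest_path_length(G)      # L: shortest path

print(f"a={a:.3f}  C={C:.3f}  E={E:.3f}  L={L:.3f}")
```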
In [11]:
# Calculate the global measures
bundleGraphs_measures = bundleGraphs.report_global_measures()
# Show the dataframe so we can see the measures
display(bundleGraphs_measures)
Now you have everything you need to plot the network measures of the BrainNetwork graph and compare them to the values obtained from the 10 random graphs stored inside the graph bundle bundleGraphs.
In [12]:
# import the function to plot network measures
from scona.visualisations import plot_network_measures
In [13]:
plot_network_measures(bundleGraphs, original_network="original_graph")
You'll probably want to save the beautiful figure you've made!
You can do that by passing a file name and (optional) directory path to the figure_name option.
If you don't set a directory path the figure will be saved in the local directory.
For this tutorial we'll save the output in a figures folder inside this tutorials directory.
For fun, we'll also adjust the colours to make the real network orange (#FF4400) and the random network turquoise (#00BBFF).
In [14]:
plot_network_measures(bundleGraphs, "original_graph",
figure_name="figures/NetworkMeasuresDemo",
color=["#FF4400", "#00BBFF"])
In [15]:
plot_network_measures(bundleGraphs, "original_graph",
figure_name="figures/NetworkMeasuresDemoNoLegend.svg",
show_legend=False)
In [16]:
# Create a new graph bundle
realBundle = scn.GraphBundle([G10], ["original_graph"])
plot_network_measures(realBundle,
original_network = "original_graph",
color=["green"])
The variance of the measures obtained from the random graphs is shown, by default, as the 95% confidence interval.
They're calculated by bootstrapping the random graphs. There's more information in the seaborn documentation if you're curious.
But you don't have to calculate them. You can plot the standard deviations instead if you'd prefer. (These are a bit larger than the 95% confidence intervals so they're a bit easier to see in the plot below.)
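If you're curious what the bootstrapped interval is actually doing, here's a minimal percentile-bootstrap sketch (the measure values below are made up for illustration, and this is a simplification of seaborn's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(values, ci=95, n_boot=1000):
    """Percentile bootstrap interval for the mean: resample with
    replacement, take the mean each time, then read off percentiles."""
    values = np.asarray(values)
    boot_means = [rng.choice(values, size=values.size, replace=True).mean()
                  for _ in range(n_boot)]
    half_tail = (100 - ci) / 2
    return tuple(np.percentile(boot_means, [half_tail, 100 - half_tail]))

# e.g. clustering coefficients from ten random graphs (made-up numbers)
measures = [0.41, 0.39, 0.43, 0.40, 0.42, 0.38, 0.44, 0.41, 0.40, 0.42]
print(bootstrap_ci(measures))         # 95% interval
print(bootstrap_ci(measures, ci=99))  # wider 99% interval
```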
In [17]:
plot_network_measures(bundleGraphs,
original_network="original_graph",
ci="sd")
Alternatively you could show the 99% confidence interval.
In [18]:
plot_network_measures(bundleGraphs, original_network="original_graph",
ci=99)
You can't publish results with 10 random graphs: they don't give a meaningful estimate of the variation. So let's add 90 more random graphs.
(This still isn't enough, but it's much better than 10! We'd recommend that you run 1000 random graphs for publication quality results.)
This takes some time (around 5 minutes) so the cell below is commented out by default.
Remove the #
at the start of each of the lines below to run the commands yourself.
In [19]:
#bundleGraphs.create_random_graphs("original_graph", 90)
#print (len(bundleGraphs))
Congratulations! 🎉
You created an additional 90 random graphs, to give you a total of 100 random graphs and 1 real graph, and you managed to answer some of your emails while waiting.
Here's a beautiful plot of your network measures with 95% confidence intervals....which you can't see because the random networks are all so similar to each other 🤦
In [20]:
plot_network_measures(bundleGraphs, original_network="original_graph")
plot_rich_club
Brain networks often have a "rich club". The rich club is a measure of the extent to which nodes in the graph with the highest degree (the largest number of connections) are preferentially connected to other highly connected nodes.
This network measure is fantastically named because a rich club exists in so many other networks too.
In the brain, the "hubs" of the network - the high degree nodes - may also be the ones coordinating complex actions, and so they need to connect with each other to efficiently send messages to different specialised regions.
One thing that's a little confusing is that the rich club coefficient is calculated as a function of a "cut off" degree $k$:
$\phi(k) = \frac{2 E_k}{N_k (N_k - 1)}$
where $N_k$ is the number of nodes with degree greater than $k$, and $E_k$ is the number of edges between those nodes.
So the first thing to do is calculate the rich club coefficient ($\phi$) for all possible values of $k$.
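To make the formula concrete, here's a standalone sketch that computes $\phi(k)$ by hand (counting the nodes above the cut off and the edges between them) and checks the result against networkx's unnormalised rich_club_coefficient:

```python
import networkx as nx

def phi(G, k):
    """Rich club coefficient phi(k) = 2 * E_k / (N_k * (N_k - 1))."""
    rich = [n for n, d in G.degree() if d > k]  # nodes above the cut off
    N_k = len(rich)
    if N_k < 2:
        return float("nan")
    E_k = G.subgraph(rich).number_of_edges()    # edges between those nodes
    return 2 * E_k / (N_k * (N_k - 1))

G = nx.barabasi_albert_graph(100, 4, seed=0)  # toy graph for the demo
nx_phi = nx.rich_club_coefficient(G, normalized=False)
print(phi(G, 8), nx_phi[8])  # the two calculations agree
```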
In [21]:
# Calculate the rich club coefficients and then write them a data frame
rich_club_df = bundleGraphs.report_rich_club()
# Show the first few entries in the data frame
display(rich_club_df.iloc[:5, :6])
# Show the last few entries in the data frame
display(rich_club_df.iloc[-5:, :6])
The index column in the data frame above is the cut off degree value.
You may be wondering where 105 comes from!
The answer is in figuring out what the smallest and most highly connected sub-graph would be. Logically that would be the connection between the two nodes that have the highest and second highest degree.
Take a look at the top five degree values for the original_graph:
In [22]:
# Show the degree of the 5 most connected nodes
display (G10.report_nodal_measures().sort_values(by='degree',
ascending=False).loc[:, ['name', 'degree']].head())
The two most connected nodes in our graph have degree 126 and 106.
So the maximum degree cut off must be 105, otherwise we'd be left with a single node all by itself!
You can see from the data frames above that for very low degree cut-off values the rich club coefficient in the real graph is very similar to that of the random graphs.
When we only consider the highest possible connections, the random and the real graphs have either a rich club coefficient of exactly 1 (the maximum value) or 0 (the minimum value). This comes from the fact that we've maintained the degree distribution for every node, but switched which nodes the edges connect.
Both of these observations make sense!
The most interesting information is therefore in the middle of the dataframe! And that's where our visualisation is really helpful.
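The degree-preserving randomisation mentioned above (keep every node's degree, but rewire which nodes the edges connect) can be sketched with networkx's double edge swap, which repeatedly replaces edge pairs (a-b, c-d) with (a-d, c-b). The details of scona's own randomisation may differ:

```python
import networkx as nx

G = nx.barabasi_albert_graph(50, 3, seed=1)  # toy "real" network
R = G.copy()

# Swap edge pairs (a-b, c-d) -> (a-d, c-b): node degrees never change
nx.double_edge_swap(R, nswap=200, max_tries=10000, seed=1)

same_degrees = sorted(d for _, d in G.degree()) == sorted(d for _, d in R.degree())
same_edges = set(map(frozenset, G.edges())) == set(map(frozenset, R.edges()))
print(same_degrees, same_edges)  # degrees preserved; edges rewired
```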
In [23]:
# import the function to plot rich club values
from scona.visualisations import plot_rich_club
In [24]:
plot_rich_club(bundleGraphs, original_network="original_graph")
As before, you can save the figure by passing a file name and (optional) directory path to the figure_name option.
If you don't set a directory path the figure will be saved in the local directory.
In this case we've given the figure a .jpg suffix.
(By default the figures are saved as .png.)
For fun, we'll also adjust the colours to make the real network cornflower blue (#6495ed) and the random network coral (#ff7f50). (Thank you Alex, Liberty and Chris H at Neurohackademy 2019 for the suggestions!)
In [25]:
plot_rich_club(bundleGraphs,
original_network="original_graph",
figure_name="figures/RichClub.jpg",
color=["#6495ed", "#ff7f50"])
In [26]:
plot_rich_club(bundleGraphs,
original_network="original_graph",
figure_name="figures/RichClub.jpg",
color=["#6495ed", "#ff7f50"],
show_legend=False)
The axes always start at (0, 0) and by default the maximum values are set according to the data: the x axis extends to the maximum degree, and the y axis to 1.1. (The maximum rich club value is 1.0 but adding a little extra space at the top looks good.)
You aren't likely to need to change the y axis values, but there may be times when you want to set the maximum value for the x axis explicitly, for example if you want to compare two networks on two different plots and have them share the same dimensions.
scona has an option for that!
In [27]:
plot_rich_club(bundleGraphs,
original_network="original_graph",
x_max=130)
Yeah, it looks slightly strange that the rich club curve stops early, but at least you'd be able to compare it to a different plot.
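For reference, fixing identical axis limits on separate matplotlib figures is what makes them directly comparable, which is (presumably) all that x_max arranges for you. A standalone sketch with illustrative data and file names:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

outdir = tempfile.mkdtemp()

# Two different "networks" plotted with identical axis ranges,
# so the saved figures can be compared side by side.
for degrees, fname in [(range(90), "net_a.png"), (range(120), "net_b.png")]:
    fig, ax = plt.subplots()
    ax.plot(list(degrees), [1 - d / 130 for d in degrees])
    ax.set_xlim(0, 130)  # same x maximum on every figure
    ax.set_ylim(0, 1.1)  # same y maximum on every figure
    fig.savefig(os.path.join(outdir, fname))
    plt.close(fig)

print(sorted(os.listdir(outdir)))
```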