In [1]:
%pylab inline
In [2]:
from pprint import pprint
import matplotlib.pyplot as plt
In [3]:
import networkx as nx
from collections import defaultdict
import warnings
from tethne import networks
from tethne.utilities import _iterable
Now that we can index our Corpus temporally using the slice method, we can start to build time-variant networks. In this workbook we'll build a time-variant coauthor network using a GraphCollection.
We'll use the same WoS dataset that we've used in previous workbooks. Load it up using read().
In [4]:
from tethne.readers.wos import read
datadirpath = '/Users/erickpeirson/Projects/tethne-notebooks/data/wos'
MyCorpus = read(datadirpath)
To generate a time-variant network, we must first slice our Corpus temporally. Many research questions about social networks like coauthor networks involve how nodes recruit new neighbors. To look at this in the context of our dataset, we'll want to keep old nodes and edges around even if they don't show up in more recent slices. So we'll do a simple 3-year time-period slice (window_size=3), but with cumulative=True.
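To make the difference between a sliding and a cumulative slice concrete, here is a minimal plain-Python sketch. It is not Tethne's implementation: `year_slices` is a hypothetical helper, the toy `papers` list is made up, and we assume that a cumulative slice keeps everything from the first year onward.

```python
from collections import defaultdict

def year_slices(papers, window_size=1, step_size=1, cumulative=False):
    """Lazily yield (start_year, titles) pairs, one slice at a time.

    `papers` is a list of (year, title) tuples. Hypothetical helper for
    illustration only; this is not Tethne's Corpus.slice().
    """
    by_year = defaultdict(list)
    for year, title in papers:
        by_year[year].append(title)
    first, last = min(by_year), max(by_year)
    start = first
    while start + window_size - 1 <= last:
        # Assumption: a cumulative slice always begins at the first year.
        lower = first if cumulative else start
        titles = [t for y in range(lower, start + window_size)
                  for t in by_year.get(y, [])]
        yield start, titles
        start += step_size

papers = [(2000, 'A'), (2001, 'B'), (2002, 'C'), (2003, 'D')]
sliding = dict(year_slices(papers, window_size=2))
growing = dict(year_slices(papers, window_size=2, cumulative=True))
```

With cumulative slicing, later slices only ever grow, which is what lets old nodes and edges persist in later graphs.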
Note: In early versions of Tethne, slice() performed indexing on the spot, and stored the results in memory. As of v0.7.x, slice() returns a generator over slices. slice() also no longer operates on non-date fields, since this functionality is already provided by index().
In [5]:
MyCorpus.slice(window_size=3, cumulative=True)
Out[5]:
As the name suggests, a GraphCollection is a container for graphs. The GraphCollection class gives us some convenient methods for generating and interrogating time-variant networks. The GraphCollection.build method allows us to build a series of graphs from our Corpus -- one graph per slice -- all in one step.
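The idea of "one graph per slice" can be sketched without Tethne at all. In the illustration below, each slice is just a list of author lists, and each graph is a set of undirected coauthor pairs. The function `coauthor_edges` and the toy data are hypothetical, and the real coauthors function builds full graph objects rather than edge sets.

```python
from itertools import combinations

def coauthor_edges(papers):
    """Return the set of undirected coauthor pairs found in `papers`.

    `papers` is a list of author lists. Hypothetical helper, for
    illustration only.
    """
    edges = set()
    for authors in papers:
        # Every pair of authors on the same paper shares an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges

# Toy slices: year -> list of papers, each paper a list of author names.
slices = {
    2000: [['ADA', 'BOB'], ['BOB', 'CY']],
    2001: [['ADA', 'CY', 'DEE']],
}

# One "graph" (here, an edge set) per slice.
graphs = {year: coauthor_edges(papers) for year, papers in slices.items()}
```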
First, import GraphCollection directly from the tethne package.
In [6]:
from tethne import GraphCollection
In [7]:
from tethne.networks import coauthors
The following code builds a collection of co-authorship networks, using a 5-year time-window and a step size of 2 (window_size=5, step_size=2).
In [8]:
MyGraphCollection = GraphCollection(MyCorpus, coauthors, slice_kwargs={'window_size': 5, 'step_size': 2})
We passed the GraphCollection constructor three arguments in the code-block above:
1. a Corpus instance;
2. the function used to build each graph (coauthors);
3. keyword arguments to pass to slice().
We can use the node_distribution and edge_distribution methods to see how many nodes and edges are in the graph at each point in time (technically, how many nodes and edges are in each graph, each of which corresponds to a slice in the Corpus).
In [9]:
node_distribution = MyGraphCollection.node_distribution()
edge_distribution = MyGraphCollection.edge_distribution()
plt.figure(figsize=(15, 4))
plt.subplot(121)
plt.bar(node_distribution.keys(), node_distribution.values())
plt.xlim(min(node_distribution.keys()), max(node_distribution.keys()))
plt.ylabel('Number of nodes per graph')
plt.subplot(122)
plt.bar(edge_distribution.keys(), edge_distribution.values())
plt.xlim(min(edge_distribution.keys()), max(edge_distribution.keys()))
plt.ylabel('Number of edges per graph')
plt.show()
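To make clear what those two distributions contain, here is a small plain-Python sketch that counts nodes and edges per graph. The `graphs` dictionary (year to set of coauthor pairs) is made-up toy data, not output from Tethne, and the sketch assumes every node takes part in at least one edge.

```python
# Toy time-variant network: year -> set of undirected edges (hypothetical data).
graphs = {
    2000: {('ADA', 'BOB'), ('BOB', 'CY')},
    2001: {('ADA', 'BOB'), ('ADA', 'CY'), ('ADA', 'DEE'), ('CY', 'DEE')},
}

# Number of distinct nodes in each graph.
node_counts = {
    year: len({node for edge in edges for node in edge})
    for year, edges in graphs.items()
}

# Number of edges in each graph.
edge_counts = {year: len(edges) for year, edges in graphs.items()}
```

Each dictionary maps a slice to a single count, which is the shape of data that the bar charts above plot.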
The GraphCollection makes it easy to apply algorithms from NetworkX across the whole time-variant network (i.e. to all graphs in the GraphCollection).
The method analyze applies an algorithm to all of the graphs in the GraphCollection.
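Conceptually, this amounts to running the same function once per graph and keeping the results keyed by slice. The sketch below illustrates that pattern in plain Python. The `analyze` function here is a hypothetical stand-in rather than Tethne's method, and `degree_centrality` is a hand-rolled version that follows the usual definition (degree divided by n - 1) for graphs given as edge sets with no isolated nodes.

```python
def degree_centrality(edges):
    """Degree of each node divided by (n - 1), where n is the node count."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / float(n - 1) for node, d in degree.items()}

def analyze(graphs, algorithm):
    """Apply `algorithm` to every graph; return results keyed by slice."""
    return {year: algorithm(edges) for year, edges in graphs.items()}

# Toy data: year -> set of undirected edges (hypothetical).
graphs = {
    2000: {('ADA', 'BOB'), ('BOB', 'CY')},
    2001: {('ADA', 'CY'), ('ADA', 'DEE'), ('CY', 'DEE')},
}
results = analyze(graphs, degree_centrality)
```

The result is a dictionary of dictionaries: the outer keys are slices and the inner keys are nodes.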
In [12]:
dc = MyGraphCollection.analyze('degree_centrality')
In [16]:
list(dc[1986].items())[20:30]   # list() so that slicing also works on Python 3 dict views.
Out[16]:
In [17]:
bcentrality = MyGraphCollection.analyze('betweenness_centrality')
Some algorithms, like "degree_centrality" and "betweenness_centrality", return a value for each node in each graph. In that case, the nodes in each graph are updated with those values.
In [25]:
list(MyGraphCollection[2008].nodes(data=True))[15:17] # Shows the attributes for two of the nodes in the 2008 graph.
Out[25]:
The method plot_attr_distribution can help to visualize the results of an algorithm across the graphs in the GraphCollection. For example, attr='degree_centrality' selects the degree_centrality attribute, etype='node' indicates that the attribute belongs to nodes (not edges), and stat=mean specifies that the function mean (NumPy's mean, made available by %pylab inline) should be applied to the collection of values in each graph.
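The reduction that a stat function performs can be sketched in a few lines of plain Python: take all of the attribute values in one graph, collapse them to a single number, and repeat for every graph. The names `attr_distribution` and `mean` below are hypothetical helpers written for this illustration, and the centrality values are toy data.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))

def attr_distribution(values_by_graph, stat):
    """Reduce each graph's attribute values to one number using `stat`."""
    return {
        year: stat(list(values.values()))
        for year, values in values_by_graph.items()
    }

# Toy node attributes: year -> {node: degree centrality} (hypothetical).
centrality = {
    2000: {'ADA': 0.5, 'BOB': 1.0, 'CY': 0.5},
    2001: {'ADA': 1.0, 'CY': 1.0, 'DEE': 1.0},
}
summary = attr_distribution(centrality, mean)
```

Swapping `mean` for another function such as `max` or `min` changes the summary without changing the rest of the code.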
We can use node_history to look at how the attribute of a particular node changes across graphs. In the example below, the specified node appears first in 2008, and its centrality increases through 2011.
In [36]:
node_id = MyGraphCollection.node_lookup[(u'WARWICK', u'SI')]
warwick_centrality = MyGraphCollection.node_history(node_id, 'degree_centrality')
In [40]:
list(warwick_centrality.items())[:20] # First 20 years.
Out[40]:
In [34]:
plt.plot(warwick_centrality.keys(), warwick_centrality.values(), 'ro')
plt.ylabel('Degree Centrality')
plt.show()
Note that in the example above we had to convert our author name, (u'WARWICK', u'SI'), to an id, node_id. That's because the GraphCollection indexes all of the nodes so that we can track them across graphs. The index is stored in GraphCollection.node_index:
In [46]:
list(MyGraphCollection.node_index.items())[0:10] # The first ten nodes in the index.
Out[46]:
In [47]:
MyGraphCollection.node_index[2467] # Get the name of a specific node.
Out[47]:
To look up the index of a node based on its name (e.g. an author name), use GraphCollection.node_lookup:
In [48]:
list(MyGraphCollection.node_lookup.items())[0:10] # The first ten nodes in the lookup table.
Out[48]:
In [50]:
MyGraphCollection.node_lookup[(u'SEXTON', u'JASON P')] # Get the index of a specific node.
Out[50]:
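The relationship between the two tables can be illustrated with a pair of ordinary dictionaries that are kept in sync. This is only a sketch of the idea: `register`, the toy author names, and the way ids are assigned here are assumptions made for the example, not a description of how GraphCollection assigns its ids.

```python
node_index = {}    # id -> name
node_lookup = {}   # name -> id

def register(name):
    """Return the id for `name`, assigning a new one on first sight."""
    if name not in node_lookup:
        node_id = len(node_index)
        node_index[node_id] = name
        node_lookup[name] = node_id
    return node_lookup[name]

# Toy graphs: year -> list of edges between author names (hypothetical).
graphs_by_year = {
    2000: [(('ADA', 'A'), ('BOB', 'B'))],
    2001: [(('BOB', 'B'), ('CY', 'C'))],
}

# Re-express every edge in terms of ids shared across all graphs.
indexed = {}
for year in sorted(graphs_by_year):
    indexed[year] = [(register(a), register(b))
                     for a, b in graphs_by_year[year]]
```

Because the author ('BOB', 'B') receives the same id in both years, that node can be followed from one graph to the next.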
Cytoscape provides support for Dynamic XGMML, which is a network file format that supports time-variant graphs. You can write DXGMML using the to_dxgmml function in the writers.collection module.
In [51]:
from tethne.writers import collection
In [53]:
outpath = '/Users/erickpeirson/Projects/tethne-notebooks/output/my_dynnetwork.xgmml'
collection.to_dxgmml(MyGraphCollection, outpath)
Here's a snapshot from around 2008. Node size is mapped to betweenness centrality.
Caution: Cytoscape still has a hard time with large dynamic graphs. This is mostly useful for heuristic purposes, or small graphs.