It's Monday in what is looking to be a mellow week at work. You get back from lunch and see an email from the Chief Developer marked URGENT in your inbox:
from: Chief Developer
subject: URGENT Analytics Demo Day
Happy Monday! We've just found out that we'll be demoing our analytics tomorrow morning for the board, so we've got all the Junior Developers putting together demo datasets. There'll be a trial run today at 5 p.m. All dev staff are required to attend, and the CTO will be there, so I'd cancel your dinner plans :D
Here's what we need:
1. Find a data set.
Locate a data set that can be modeled as a graph.
I would recommend using one of the data sets available from the Gephi Wiki. They should be formatted for easy import into NetworkX using one of the built-in functions. Furthermore, many of the data sets contain nodes with attributes, which will be important in the analysis phase.
Things to keep in mind:
The data set must contain nodes with attributes that can be used to measure assortativity.
Big graphs are cool, but remember, we will be analyzing this data in memory, so it has to fit in RAM. I will run these on a computer with 16 GB of RAM, so if it runs on your laptop it should be fine.
2. Load the data into NetworkX.
Depending on its format, you can use one of NetworkX's built-in data loading functions, or write your own code to import the data into memory for analysis. Remember to import all relevant node and edge attributes for use in the analysis phase.
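As a minimal sketch of this step, the loader below picks a NetworkX reader based on the file extension. The function name `load_graph_by_extension` and the `READERS` mapping are illustrative assumptions, not part of the assignment. The data sets come in several formats, so check which one your file actually uses.

```python
import os

import networkx as nx

# Hypothetical mapping from file extension to a built-in NetworkX reader.
# Extend it if your data set uses a different format.
READERS = {
    ".gexf": nx.read_gexf,
    ".gml": nx.read_gml,
    ".graphml": nx.read_graphml,
    ".net": nx.read_pajek,
}


def load_graph_by_extension(filename):
    """Load a graph file with the reader that matches its extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in READERS:
        raise ValueError(f"No reader configured for {ext!r} files")
    # These readers keep the node and edge attributes stored in the file.
    return READERS[ext](filename)
```

After loading, inspecting a single node's data, for example `next(iter(g.nodes(data=True)))`, is a quick way to confirm that the attributes you need for the assortativity analysis actually came through.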
3. Use NetworkX to analyze the data.
Below you will find a variety of unfinished functions that you need to complete. They will help you perform analysis and create visualizations. You will be expected to provide the following types of analysis:
General description: number of nodes and edges, directed vs. undirected.
Structure: degree distribution, triangles, transitivity, clustering.
Paths: average shortest path, diameter.
Centrality: degree (in- and out-), closeness, betweenness, eigenvector...
Assortativity: contextual homophily based on relevant node attributes.
Mystery function: VERY IMPORTANT! Write one or more functions that demonstrate your skill and creativity.
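To make the list above concrete, here is a rough sketch of how most of these measures map onto NetworkX calls. It assumes an undirected, connected graph and uses NetworkX's bundled karate club graph (whose nodes carry a `club` attribute) as a stand-in for your own data. The helper names `summarize` and `top_central_nodes` are hypothetical. For a directed graph you would also look at `g.in_degree()` and `g.out_degree()`.

```python
from collections import Counter

import networkx as nx


def summarize(g, attribute):
    """Return basic statistics for an undirected, connected graph."""
    degree_counts = Counter(d for _, d in g.degree())
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "directed": g.is_directed(),
        "density": nx.density(g),
        # nx.triangles counts each triangle once per corner, so divide by 3.
        "triangles": sum(nx.triangles(g).values()) // 3,
        "transitivity": nx.transitivity(g),
        "avg_clustering": nx.average_clustering(g),
        # Both path measures raise an error if the graph is disconnected.
        "avg_shortest_path": nx.average_shortest_path_length(g),
        "diameter": nx.diameter(g),
        # Maps each degree value to the number of nodes with that degree.
        "degree_distribution": dict(sorted(degree_counts.items())),
        "degree_assortativity": nx.degree_assortativity_coefficient(g),
        "attribute_assortativity": nx.attribute_assortativity_coefficient(
            g, attribute
        ),
    }


def top_central_nodes(g, k=3):
    """Return the k highest-ranked nodes for several centrality measures."""
    measures = {
        "degree": nx.degree_centrality(g),
        "closeness": nx.closeness_centrality(g),
        "betweenness": nx.betweenness_centrality(g),
        "eigenvector": nx.eigenvector_centrality(g, max_iter=1000),
    }
    return {
        name: sorted(scores, key=scores.get, reverse=True)[:k]
        for name, scores in measures.items()
    }


# Stand-in data set: replace with the graph you loaded yourself.
karate = nx.karate_club_graph()
stats = summarize(karate, "club")
leaders = top_central_nodes(karate)
```

Returning plain dictionaries keeps the results easy to print, compare across data sets, or feed into a plotting function later.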
4. Use Gephi to visualize the network.
Experiment with a variety of layouts, sizing, and coloring schemes to create 3-4 visualizations that are complementary to other parts of your analysis. For example, if your graph has a heavy-tailed degree distribution, you can use Gephi to create a visualization that highlights the large hubs in the network.
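One possible way to connect the NetworkX analysis to Gephi is to store computed measures as node attributes and export the graph as GEXF, a format Gephi can open. The sketch below is an assumption about workflow rather than a required step, and `export_for_gephi` is a hypothetical helper name.

```python
import networkx as nx


def export_for_gephi(g, path):
    """Attach degree and betweenness to each node, then write a GEXF file.

    Note: this modifies the node attributes of g in place.
    """
    nx.set_node_attributes(g, dict(g.degree()), "degree")
    nx.set_node_attributes(g, nx.betweenness_centrality(g), "betweenness")
    nx.write_gexf(g, path)
```

Once the file is open in Gephi, the `degree` and `betweenness` columns should be available for sizing or coloring nodes, which is one way to make the hubs stand out.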
5. Interpret the results.
Using what they know about graph theory and social network analysis, all developers will be expected to provide a brief (500-750 word) interpretation of the results of the analysis. This must take into account the idiosyncrasies of the network. For example, degree in a Facebook network means something different than degree in a power grid. Also, you should explain your visualizations here. What settings did you use to make them and why?
Thanks guys! I appreciate the effort. See you soon!
This assignment will be evaluated as follows:
Percentage of Final Grade: 10%
In [ ]:
# Set up environment.
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = 14, 7
In [ ]:
########## FUNCTIONS ##########
# Use NetworkX combined with any other Python libraries to complete the following functions:

def load_data(filename):
    """
    Load your graph data into NetworkX.

    :param filename: str.
    :returns: networkx.Graph or subclass
    """
    pass


def metrics(g):
    """
    Returns the following metrics in a container of your choice:
    * Number of nodes
    * Number of edges
    * Density
    * Number of triangles
    * Transitivity
    * Average clustering coefficient
    * Average shortest path
    * Diameter

    :param g: networkx.Graph or subclass
    :returns: A collection of stats.
    """
    pass


def deg_dist(g):
    """
    Calculate the degree distribution of the graph.

    :param g: networkx.Graph or subclass
    :returns: A collection of degree values and the corresponding number of nodes
    """
    pass


def plot_deg_dist(col):
    """
    Plot the degree distribution using Matplotlib.

    :param col: a collection containing the degree values and their probabilities.
    :returns: matplotlib object
    """
    pass


def assortativity(g):
    """
    Returns the following stats in a container of your choice:
    * Degree Assortativity
    * Assortativity based on any relevant node attribute

    :param g: networkx.Graph or subclass
    :returns: A collection of stats
    """
    pass


def centrality(g):
    """
    Returns the following stats in a container of your choice:
    * Degree
    * Degree centrality
    * Betweenness centrality
    * Closeness centrality
    * Eigenvector centrality
    If directed:
    * In-degree
    * Out-degree

    :param g: networkx.Graph or subclass
    :returns: A collection of stats
    """
    pass


def centrality_hist(cents):
    """
    This function produces a series of histograms, showing
    the probability distribution for all of the centrality measures.

    :param cents: a collection containing all centrality measures
    :returns: matplotlib object
    """
    pass


def mystery_function():
    """
    This one is up to you. Do a different kind of analysis,
    a traversal, a visualization. Try to WOW your instructor :D
    """
    pass
In [ ]:
########## SCRIPTING ##########
# Use as many cells as you need to run ALL of the
# functions and generate ALL the metrics and visualizations:
In [ ]:
In [ ]:
In [ ]:
The write-up of your results goes here (please embed visualizations created with Gephi):