DH3501: Advanced Social Networks

Assignment 1: Analytics Demo Day

Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 3501

Instructor: David Brown
E-mail: dbrow52@uwo.ca
Office: AHB 1R14

Description

It's Monday in what is looking to be a mellow week at work. You get back from lunch and see an email from the Chief Developer marked URGENT in your inbox:

from: Chief Developer

to: DevTeamList

cc: CTO

subject: URGENT Analytics Demo Day

Happy Monday! We've just found out that we'll be demoing our analytics tomorrow morning for the board, so we've got all the Junior Developers putting together demo datasets. There'll be a trial run today at 5 p.m. All dev staff are required to attend, and the CTO will be there, so I'd cancel your dinner plans :D

Here's what we need:

1. Find a data set.

Locate a data set that can be modeled as a graph.

  • I would recommend using one of the data sets available from the Gephi Wiki. They should be formatted for easy import into NetworkX using one of the built-in functions. Furthermore, many of the data sets contain nodes with attributes, which will be important in the analysis phase.

  • Things to keep in mind:

    • The data set must contain nodes with attributes that can be used to measure assortativity.

    • Big graphs are cool, but remember, we will be analyzing this data in memory, so it has to fit in the RAM. I will run these on a computer with 16 gigs of RAM, so if it runs on your laptop it should be fine.
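One way to check a candidate data set against the attribute requirement is sketched below. It is only an illustration: `node_attribute_coverage` is a hypothetical helper name, and the built-in karate club graph stands in for whatever data set you pick.

```python
import networkx as nx


def node_attribute_coverage(g):
    """Map each node attribute name to the fraction of nodes that carry it.

    Hypothetical helper, not part of NetworkX.
    """
    n = g.number_of_nodes()
    if n == 0:
        return {}
    counts = {}
    for _, attrs in g.nodes(data=True):
        for key in attrs:
            counts[key] = counts.get(key, 0) + 1
    return {key: count / n for key, count in counts.items()}


# Stand-in data set: every node carries a "club" attribute.
g = nx.karate_club_graph()
coverage = node_attribute_coverage(g)
print(coverage)
```

An attribute that covers most or all nodes is a reasonable candidate for the assortativity analysis; one that appears on only a few nodes probably is not.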

2. Load the data into NetworkX.

Depending on its format, you can use one of NetworkX's built-in data loading functions, or write your own code to import the data into memory for analysis. Remember to import all relevant node and edge attributes for use in the analysis phase.
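As a rough sketch of the built-in route, the snippet below picks a NetworkX reader from the file extension. The helper name `load_by_extension`, the `READERS` table, and the tiny demo graph are assumptions made for illustration; a real data set may need extra reader arguments or custom parsing.

```python
import os
import tempfile

import networkx as nx

# Readers that ship with NetworkX, keyed by file extension.
READERS = {
    ".gexf": nx.read_gexf,
    ".gml": nx.read_gml,
    ".graphml": nx.read_graphml,
}


def load_by_extension(path):
    """Load a graph file with the NetworkX reader matching its extension.

    Hypothetical helper, not part of NetworkX.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in READERS:
        raise ValueError("No built-in reader configured for %r" % ext)
    return READERS[ext](path)


# Round-trip a tiny graph through GEXF to show that attributes survive.
demo = nx.Graph()
demo.add_node("a", group="x")
demo.add_node("b", group="y")
demo.add_edge("a", "b", weight=2)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gexf")
    nx.write_gexf(demo, demo_path)
    loaded = load_by_extension(demo_path)

print(loaded.nodes(data=True))
print(loaded.edges(data=True))
```

After loading, it is worth printing a few nodes and edges like this to confirm that the attributes you plan to analyze actually came through.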

3. Use NetworkX to analyze the data.

Below you will find a variety of unfinished functions that you need to complete. They will help you perform analysis and create visualizations. You will be expected to provide the following types of analysis:

  • General description: number of nodes and edges, directed vs. undirected.

  • Structure: degree distribution, triangles, transitivity, clustering.

  • Paths: average shortest path, diameter.

  • Centrality: degree (in- and out-), closeness, betweenness, eigenvector...

  • Assortativity: contextual homophily based on relevant node attributes.

  • Mystery function: VERY IMPORTANT! Write one or more functions that demonstrate your skill and creativity.
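To give a feel for the NetworkX calls behind several of these categories, here is a minimal sketch on the built-in karate club graph. The graph, the `"club"` attribute, and the dictionary layout are assumptions chosen for illustration, and this is not the expected solution to the functions below.

```python
import networkx as nx

# Stand-in graph: small, undirected, connected, with a "club" node attribute.
g = nx.karate_club_graph()

summary = {
    # General description
    "nodes": g.number_of_nodes(),
    "edges": g.number_of_edges(),
    "directed": g.is_directed(),
    # Structure (nx.triangles counts each triangle once per corner)
    "triangles": sum(nx.triangles(g).values()) // 3,
    "transitivity": nx.transitivity(g),
    "avg_clustering": nx.average_clustering(g),
    # Paths (these raise an error if the graph is not connected)
    "avg_shortest_path": nx.average_shortest_path_length(g),
    "diameter": nx.diameter(g),
    # Assortativity
    "degree_assortativity": nx.degree_assortativity_coefficient(g),
    "club_assortativity": nx.attribute_assortativity_coefficient(g, "club"),
}

# Degree distribution: index i holds the number of nodes with degree i.
degree_counts = nx.degree_histogram(g)

# Centrality: each call returns a dict mapping node to score.
betweenness = nx.betweenness_centrality(g)
closeness = nx.closeness_centrality(g)
eigenvector = nx.eigenvector_centrality(g, max_iter=1000)
top_broker = max(betweenness, key=betweenness.get)

for name, value in summary.items():
    print(name, value)
print("highest betweenness:", top_broker)
```

For a directed graph, `g.in_degree()` and `g.out_degree()` are also available, and a disconnected graph needs the path metrics computed per component.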

4. Use Gephi to visualize the network.

Experiment with a variety of layouts, sizing, and coloring schemes to create 3-4 visualizations that are complementary to other parts of your analysis. For example, if your graph has a heavy-tailed degree distribution, you can use Gephi to create a visualization that highlights the large hubs in the network.
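One possible workflow, sketched here under assumptions, is to store computed metrics as node attributes and write the graph to a GEXF file that Gephi can open, so node size or color can be driven by those values. The file name `demo_for_gephi.gexf` and the attribute names `degree` and `betweenness` are placeholders.

```python
import os
import tempfile

import networkx as nx

g = nx.karate_club_graph()

# Attach metrics to the nodes so they travel with the exported file.
nx.set_node_attributes(g, dict(g.degree()), "degree")
nx.set_node_attributes(g, nx.betweenness_centrality(g), "betweenness")

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file name; in practice write somewhere Gephi can reach.
    out_path = os.path.join(tmp, "demo_for_gephi.gexf")
    nx.write_gexf(g, out_path)
    # Read the file back to confirm the attributes were written.
    reloaded = nx.read_gexf(out_path, node_type=int)

print(reloaded.nodes[0])
```

This keeps the numbers in your analysis and the ones behind your visualizations in agreement, since both come from the same NetworkX run.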

5. Interpret the results.

Using what they know about graph theory and social network analysis, all developers will be expected to provide a brief (500-750 word) interpretation of the results of the analysis. This must take into account the idiosyncrasies of the network. For example, degree in a Facebook network means something different than degree in a power grid. Also, you should explain your visualizations here. What settings did you use to make them and why?

Thanks guys! I appreciate the effort. See you soon!

CD

Grading

This assignment will be evaluated as follows:

Percentage of Final Grade: 10%

  • Analysis: 50% - Analysis will be graded based upon the completeness and efficacy of the code written by the student. Students should complete all provided functions so that they run without throwing errors, and properly use the NetworkX API to produce the desired metric. Considerations of style and optimality of the solution will be secondary; however, students are encouraged to try to write readable, idiomatic Python and follow PEP 8 guidelines.
  • Visualization (Gephi): 20% - Visualizations will be graded on their relevance to the analysis and interpretation. Each visualization produced with Gephi must be designed to highlight a particular aspect of the analysis, such as the previously mentioned example of degree distributions, homophily, clustering, etc.
  • Interpretation: 30% - The interpretation will be graded on the student's ability to pick out interesting measures and explain why they are important to understanding the network. This section must also explain the reasoning behind the visualizations, clearly stating how they help the viewer understand key aspects of the analysis.

Analysis


In [1]:
# Set up environment.
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = 14, 7

In [2]:
########## FUNCTIONS ##########

# Use NetworkX combined with any other Python libraries to complete the following functions:

def load_data(filename):
    """
    Load your graph data into NetworkX.
    
    :param filename: str.
    :returns: networkx.Graph or subclass
    """
    pass


def metrics(g):
    """
    Returns the following metrics in a container of your choice:
    
    * Number of nodes
    
    * Number of edges
    
    * Density
    
    * Number of triangles
    
    * Transitivity
    
    * Average clustering coefficient
    
    * Average shortest path
    
    * Diameter
    
    :param g: networkx.Graph or subclass
    :returns: A collection of stats.
    """
    pass


def deg_dist(g):
    """
    Calculate the degree distribution of the graph.
    
    :param g: networkx.Graph or subclass
    :returns: A collection of degree values and the corresponding number of nodes
    """
    pass


def plot_deg_dist(col):
    """
    Plot the degree distribution using Matplotlib. 
    
    :param col: a collection containing the degree values and their probabilities.
    :returns: matplotlib object
    """
    pass


def assortativity(g):
    """
    Returns the following stats in a container of your choice:
    
    * Degree Assortativity
    
    * Assortativity based on any relevant node attribute
    
    :param g: networkx.Graph or subclass
    :returns: A collection of stats
    """
    pass


def centrality(g):
    """
    Returns the following stats in a container of your choice:
    
    * Degree
    
    * Degree centrality
    
    * Betweenness centrality
    
    * Closeness centrality
    
    * Eigenvector centrality
    
    If directed:
    
    * In-degree
    
    * Out-degree
    
    :param g: networkx.Graph or subclass
    :returns: A collection of stats
    """
    pass


def centrality_hist(cents):
    """
    This function produces a series of histograms, showing the probability distribution
    for all of the centrality measures. 
    
    
    :param cents: a collection containing all centrality measures
    :returns: matplotlib object
    """
    pass


def mystery_function():
    """
    This one is up to you. Do a different kind of analysis, a traversal, a visualization.
    Try to WOW your instructor :D
    """
    pass

In [3]:
########## SCRIPTING ##########

# Use as many cells as you need to run ALL of the
# functions and generate ALL the metrics and visualizations:

In [ ]:


In [ ]:


In [ ]:

Results

The write-up of your results goes here (please embed visualizations created with Gephi):