Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 3501

Instructor: David Brown
E-mail: dbrow52@uwo.ca
Office: AHB 1R14

Class 4: Introduction to Graph Theory

In this class we focus on formalizing the introductory concepts necessary to studying networks. First we will discuss the basic concepts fundamental to all network studies: nodes, edges, and various basic topological configuration. We look at different representations of graphs, and see how they have been implemented in the Python library NetworkX. After learning some of the basics of the NetworkX API, we look at some general concepts in basic graph theory: neighbors, paths, and traversals.

  • After this class a student should be able to:
    • Define the basic concepts from graph theory: graph, node, edge, dyad, triad, path, and traversal.
    • Recognize various representations of graph data: edge list, adjacency matrix etc.
    • Explain the difference between a depth first and breadth first traversal.
    • Use the basic NetworkX API to create a graph and write a simple neighbors search traversal.

Class 5: NetworkX and Centrality

This class presents an overview of the core NetworkX API, along with an introduction to some of its most commonly used algortihms, graph generators, and drawing functions. Then, using the various implementations of centrality algorithms included with NetworkX, we discuss how nodes are evaluated on an individual level using various measures of centrality. We emphasize how the interpretation of network measures must be based on the semantics of the graph you are studying, and demonstrate this using the concept of degree centrality combined with several real world examples.

  • After this class a student should be able to:
    • Use NetworkX to build graphs, generate random graphs, and perform basic analysis.
    • Transfer the knowledge of NetworkX gained in class to other modules in the NetworkX library.
    • Analyze a graph for various measures of node centrality.
    • Begin to interpret the result of network analytics within the context of the graph.

Class 6: Formal Definitions and the Jackson Text

This class addresses a topic that can be difficult for many students: mathematical notation. More specifically, we focus on set builder notation, which is used quite frequently in graph theory. It begins with simple aspects of notation, basic set operations like union, intersection, and membership, and reading set builder notation. After we translate these kinds of expressions to Python, we look at how Jackson has employed set builder notation to describe concepts from graphs in his book. This notation is then applied in a varity of contexts from describing node neighborhoods to statistical algorithms for calculating network centrality.

  • After this class a student should be able to:
    • Struggle their way through graph concepts expressed in set builder notation.
    • Students are not expected to be able to produce this sort of notation, but using available references and a little bit of practice...
    • Recognize and use the very basics of MathJax(LaTex) to produce "mathy" text in the Ipython notebook.

Class 7: Triadic Closure and the Strength of Ties

This class begins with a discussion of Mark Granovetters famous 1973 study, and its importance for our understanding of the way information is transmitted accross a network. Then, we take a step-by-step approach to understanding the theoretical models associated with some of the concept introduced in the study. Beginning with an introduction to the dynamic process of triadic closure, we analyze how triangle form in a graph to produce clusters, and how this process intereacts with the concept of strength of ties, and local bridges. We then see how these rough models have been fine tuned into the more subtle model provided by Onnela's study of cell phone communication, the concept of neighborhood overlap, and how this functions as an extension of the previous concepts. We also examine how different proxies for edge strength have been applied in studies of Facebook, and make an intuitive evaluation of the efficacy of this technique. Finally, we contextualize the concepts of closure, bridges, and structural holes by providing real world examples of how these structural properties provide social capital in real world situations.

  • After this class a student should be able to:
    • Recognize clusters and local bridges within network structure.
    • Explain the relationship between the concepts of strong triadic closure.
    • Recognize different techniques for proxying concepts such as edge strength (edge proxies will be a recurring theme in this class)
    • Apply the basic concepts of network structure to real world situations and make connections between theoretical models and more subtle empirical results.

Class 9: Homophily and Association in Networks

This class introduces the concept of network context through an exploration of the concept of homophily. By presenting homophily as a motivation for network dynamics (link formation), this class emphasizes the distinction of intristic vs. contextual factors in network processes. We then dichotomize the concept of homophily into status and value homophily, looking at how immutable (or status) characteristics affect network formation (as in Moody 2001), and contrasting this with homophily based mutable characteristics, such as political affiliation. With respect to mutable characteristics, we investigate the process of social influences such as peer pressure in order to evaluate the (marginally) competing theories of homophily and socialization. This class also introduces the concept of the affiliation, or multi-partite network, along with a discussion of the basic analytic techniques used with this kind of network. We see how these techniques have been implemented in NetworkX, and then we consider an alternative implementation of multi-partite analysis using projx, a Python library that wraps NetworkX to provide a domain specific language (DSL) for performing multi-partite transformations and compressions. Note: projx was implemented by yours truly, and while it hasn't seen really heavy adoption, it averages about 30-50 downloads a day. However, it is important to understand that I don't introduce this tool to promote its use, but instead to provide a conceptual excercise for the students to think about multi-partite graph transformations.

After this class a student should be able to:

  • Recognize the difference between instrinsic and conceptual factors in network dynamics.
  • Explain how both status homophily and value homophily affect network structure.
  • Analyze a multi-partite network for possible avenues of compression.
  • Compress and project a bipartite network to produce a social network and perform analysis.
  • Write basic transformation statements using both the NetworkX Bipartite API and projx.

Class 14: Graph Data and Storage

In this class we explore various ways to store graph data ranging from flat files to graph databases. We begin by evaluating the different flat file formats for storing graph data, along with their advantages and disadvantages. We also consider the advantages and disadvantages of using a library like NetworkX to perform network analysis. The class then moves on to address the fundamentals of database technology. While this course doesn't focus on relational databases, in this class I present them as a means to learn some basic database concepts and also to contrast them with graph databases. We then discuss alternative solutions for data modeling and storage focusing on the big data revolution and the advent of NoSQL databases. Finally, we discuss the model used for graph databases, the difference between native and non-native graph storage, and the multitude of options that have become available in recent years.

After this class a student should be able to:

  • Recognize the various flat file formats for storing graph data.
  • Choose a technique for data storage and management based on their requirements.
  • Explain some basic relational database concepts such as lazy evaluation and transactions.
  • Use the property graph model.
  • Recognize the benefits and limitations of graph databases.

Class 15: Graph Data Modeling and the Cypher Query Language

In this class we begin by looking at an example of iterative data modeling, identifing the trap of lossy data models and discussing how to avoid common data modeling pitfalls. Then we dive into Neo4j using the Shakespeare graph presented in the text Graph Databases. After analyzing the structure of create queries in Neo4j Cypher, students will practice retrieving and modifying data using Cypher via both the IPython notebook extension ipython-cypher (developed by Javier de la Rosa) and the Neo4j Python driver Py2Neo. We will also consider the flexibility of graph data modeling and how it can be applied to model multi-domain data structure.

After this class a student should be able to:

  • Download and run a local Neo4j database.
  • Write and execute queries in Cypher.
  • Use graph databases to model complex domain problems encountered in humanities settings.
  • Recognize lossy data models and implement strategies to overcome them.

I was just told by students who spoke with Joyce and Victoria and they say that class descriptions are not required.