In this class we focus on formalizing the introductory concepts necessary to studying networks. First we will discuss the basic concepts fundamental to all network studies: nodes, edges, and various basic topological configuration. We look at different representations of graphs, and see how they have been implemented in the Python library NetworkX. After learning some of the basics of the NetworkX API, we look at some general concepts in basic graph theory: neighbors, paths, and traversals.
This class presents an overview of the core NetworkX API, along with an introduction to some of its most commonly used algortihms, graph generators, and drawing functions. Then, using the various implementations of centrality algorithms included with NetworkX, we discuss how nodes are evaluated on an individual level using various measures of centrality. We emphasize how the interpretation of network measures must be based on the semantics of the graph you are studying, and demonstrate this using the concept of degree centrality combined with several real world examples.
This class addresses a topic that can be difficult for many students: mathematical notation. More specifically, we focus on set builder notation, which is used quite frequently in graph theory. It begins with simple aspects of notation, basic set operations like union, intersection, and membership, and reading set builder notation. After we translate these kinds of expressions to Python, we look at how Jackson has employed set builder notation to describe concepts from graphs in his book. This notation is then applied in a varity of contexts from describing node neighborhoods to statistical algorithms for calculating network centrality.
This class begins with a discussion of Mark Granovetters famous 1973 study, and its importance for our understanding of the way information is transmitted accross a network. Then, we take a step-by-step approach to understanding the theoretical models associated with some of the concept introduced in the study. Beginning with an introduction to the dynamic process of triadic closure, we analyze how triangle form in a graph to produce clusters, and how this process intereacts with the concept of strength of ties, and local bridges. We then see how these rough models have been fine tuned into the more subtle model provided by Onnela's study of cell phone communication, the concept of neighborhood overlap, and how this functions as an extension of the previous concepts. We also examine how different proxies for edge strength have been applied in studies of Facebook, and make an intuitive evaluation of the efficacy of this technique. Finally, we contextualize the concepts of closure, bridges, and structural holes by providing real world examples of how these structural properties provide social capital in real world situations.
This class introduces the concept of network context through an exploration of the concept of homophily. By presenting homophily as a motivation for network dynamics (link formation), this class emphasizes the distinction of intristic vs. contextual factors in network processes. We then dichotomize the concept of homophily into status and value homophily, looking at how immutable (or status) characteristics affect network formation (as in Moody 2001), and contrasting this with homophily based mutable characteristics, such as political affiliation. With respect to mutable characteristics, we investigate the process of social influences such as peer pressure in order to evaluate the (marginally) competing theories of homophily and socialization. This class also introduces the concept of the affiliation, or multi-partite network, along with a discussion of the basic analytic techniques used with this kind of network. We see how these techniques have been implemented in NetworkX, and then we consider an alternative implementation of multi-partite analysis using projx, a Python library that wraps NetworkX to provide a domain specific language (DSL) for performing multi-partite transformations and compressions. Note: projx was implemented by yours truly, and while it hasn't seen really heavy adoption, it averages about 30-50 downloads a day. However, it is important to understand that I don't introduce this tool to promote its use, but instead to provide a conceptual excercise for the students to think about multi-partite graph transformations.
After this class a student should be able to:
In this class we explore various ways to store graph data ranging from flat files to graph databases. We begin by evaluating the different flat file formats for storing graph data, along with their advantages and disadvantages. We also consider the advantages and disadvantages of using a library like NetworkX to perform network analysis. The class then moves on to address the fundamentals of database technology. While this course doesn't focus on relational databases, in this class I present them as a means to learn some basic database concepts and also to contrast them with graph databases. We then discuss alternative solutions for data modeling and storage focusing on the big data revolution and the advent of NoSQL databases. Finally, we discuss the model used for graph databases, the difference between native and non-native graph storage, and the multitude of options that have become available in recent years.
After this class a student should be able to:
In this class we begin by looking at an example of iterative data modeling, identifing the trap of lossy data models and discussing how to avoid common data modeling pitfalls. Then we dive into Neo4j using the Shakespeare graph presented in the text Graph Databases. After analyzing the structure of create queries in Neo4j Cypher, students will practice retrieving and modifying data using Cypher via both the IPython notebook extension ipython-cypher (developed by Javier de la Rosa) and the Neo4j Python driver Py2Neo. We will also consider the flexibility of graph data modeling and how it can be applied to model multi-domain data structure.
After this class a student should be able to: