Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 3501

Instructor: David Brown
E-mail: dbrow52@uwo.ca
Office: AHB 1R14

Overview

This course introduces students to the ubiquitous and rapidly growing fields of graph theory and network analysis. Using a wide variety of real-world examples from current events and empirical research, students will obtain a broad knowledge of the theory behind network analysis and its application to understanding human behaviour and cultural systems. This knowledge will be applied hands-on as students process network data using a variety of up-to-date graph storage, analysis, and visualization software. Building on a solid core of reading, discussion, and Python programming, students will extend their knowledge into the realm of graph databases and the application frameworks used for storing and processing the "big data" produced by social networking sites like Facebook and Twitter. Finally, this course provides an introduction to advanced topics in graph theory, such as dynamics, random models, strategy and interaction, and diffusion, that can serve as a springboard for future work in this exciting new field. The course material is largely drawn from four texts and two technological platforms, which are presented in four units over the course of the semester.

Texts

  • Social network analysis for startups: Finding connections on the social web (Tsvetovat & Kouznetsov, 2011), provides a wide range of high-level examples of the application of social network analysis to current events as well as classic network studies. The text includes a wide variety of practical thought experiments and coding examples using Python’s NetworkX library. This is the introductory, beginner-level text.
  • Networks, crowds, and markets (Easley & Kleinberg, 2010), offers a more theoretical approach to graph theory and network analysis, focusing on canonical examples and studies. This text presents more advanced concepts in a relatively approachable manner, and contains a wide variety of supplementary material. This is the intermediate text.
  • Social and economic networks (Jackson, 2010), is a more formal approach to network studies that addresses many advanced topics in graph theory. This text provides formal definitions for the representation of graphs, and thus is a good introduction to the mathematical notation, such as set-builder notation, used in graph theory. A large portion of this course’s advanced material is drawn from this text.
  • Graph databases (Robinson, Webber, & Eifrem, 2013), deals with the implementation of concepts from graph theory in terms of data storage, specifically in the context of the Neo4j graph database. This book provides an introduction to the property graph model of data storage, and to the use and implementation of the Neo4j graph database. It is the practical introduction to graph databases.

Technology

  • The Python programming language will be the primary software utilized in this course. The prerequisite specifies that students already be familiar with the basics of Python upon entering the course. This course builds on this knowledge, training the students in the use of the NetworkX library for network analysis. NetworkX allows students to perform complicated graph algorithms, manipulations, and statistics using a familiar Python API. Furthermore, Python drivers will also be the principal means of interaction with external software, such as database systems.
  • Neo4j graph database will be the primary database technology introduced in this class. It will serve as a foundation to teach the property graph model for data storage and as a backend to store and analyze large quantities of graph data. Students will learn how to model their data as a graph, access Neo4j using a Python driver, and execute queries using the Cypher query language.
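
As a rough illustration of the kind of NetworkX analysis described above, the following sketch computes a few basic statistics on a small network of marriage ties among Renaissance Florentine families, a sample data set that ships with NetworkX. The choice of data set and of statistics is an assumption made for illustration; it is not taken from the course materials.

```python
import networkx as nx

# Sample data bundled with NetworkX: marriage ties among Florentine families.
# (Chosen here only for illustration.)
G = nx.florentine_families_graph()

# Degree centrality: each family's number of ties, normalised by the
# number of other families in the network.
centrality = nx.degree_centrality(G)
top_family = max(centrality, key=centrality.get)

# Connected components: groups of families linked by some chain of ties.
components = list(nx.connected_components(G))

# Shortest path between two families (the names are nodes in the sample data).
path = nx.shortest_path(G, "Medici", "Strozzi")

# Degree distribution: entry k counts the families with exactly k ties.
degree_histogram = nx.degree_histogram(G)

print("Most central family:", top_family)
print("Number of components:", len(components))
print("Medici to Strozzi:", " - ".join(path))
print("Degree histogram:", degree_histogram)
```

The same calls work unchanged on any undirected NetworkX graph, so students could swap in their own data once it has been loaded.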

Units

  • Unit 1, A networked world, is really just an introduction: we will be installing software, verifying the class roster, and getting everything set up. We will start with some light reading here, and watch an interesting talk by the influential physicist Albert-László Barabási. Perhaps most importantly, we’ll make sure that everyone has a working computing environment: we need an IPython scientific computing environment, plus Java up and running to support Neo4j. Hopefully, by class 4 the roster will be set, and everyone will be ready to go.
  • Unit 2, Graph theory and social network analysis, will begin with class 4. Unit 2 will be the real meat of the course: we focus on defining the fundamental concepts in graph theory: nodes, edges, dyads, triads, paths, cycles, walks, components, clusters, communities, centrality, and degree distribution. In terms of technology, this unit focuses on using Python to process graph data, build graphs in NetworkX, and perform analysis. The vast majority of the readings in Unit 2 are drawn from the first three texts listed above, and class time will be spent reviewing core concepts, going over examples drawn from the book, and implementing statistical algorithms for graphs using NetworkX. A strong focus on comprehension of formal representations of graph theoretic concepts is present throughout this unit, both as they are explained in text and using mathematical notation. Furthermore, considerable effort is placed on understanding the implication of graph analysis techniques in real world studies, and students will become familiar with canonical empirical research as it relates to core graph theory concepts. There is a consistent emphasis placed on the development of methodology for modelling networks. Topics such as edge proxying and weighting are of particular importance as they are fundamental to concepts that emerge later in the course. This unit culminates with a directed reading of several articles discussed in the texts, with an emphasis on full comprehension of the implications of methodological choices and interpretation of results.
  • Unit 3, Graphs in the real world: Tooling and techniques, introduces students to models of storage for graph data that is too big to be stored in flat files. In particular, this unit focuses on the property graph model, and its implementation in the Neo4j graph database. The majority of readings in this unit come from the fourth text, and focus on both the theoretical and practical underpinnings of using a graph database. Students will install and run Neo4j, build property graph models that correspond to real data, and use the Cypher query language to populate and query a database. Furthermore, this unit surveys the current landscape of graph and big data processing software, data sources, and databases, and discusses the technical requirements and knowledge necessary to use them.
  • Unit 4, Models and theory: Advanced topics in network analysis, presents topics such as: models and structure, strategic interaction, and diffusion and dynamics. Using a selection of readings from Easley & Kleinberg and Jackson, students will become engaged in more advanced topics in network analysis. This unit also presents the opportunity for students to peruse some of the most exciting new research in cultural studies (Schich et al. 2014) through directed reading and discussion. Unit 4 also focuses on the integration of Python and Neo4j with other software for graph storage and analytics.
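
To make the property graph model mentioned in Unit 3 more concrete, here is a minimal sketch in plain Python, with no database involved. Nodes carry labels and properties, and relationships carry a type, a direction, and properties of their own. All names and values below (`nodes`, `relationships`, `match`, the people and weights) are hypothetical and exist only for illustration; the Cypher query in the comment is a rough analogue, not a translation of how Neo4j executes it.

```python
# Hypothetical toy data: each node has a set of labels and a dict of properties.
nodes = {
    1: {"labels": {"Person"}, "properties": {"name": "Ana"}},
    2: {"labels": {"Person"}, "properties": {"name": "Ben"}},
    3: {"labels": {"Person"}, "properties": {"name": "Cleo"}},
}

# Each relationship is directed, typed, and may carry properties (e.g. a weight).
relationships = [
    {"start": 1, "end": 2, "type": "KNOWS", "properties": {"weight": 3}},
    {"start": 2, "end": 3, "type": "KNOWS", "properties": {"weight": 1}},
    {"start": 1, "end": 3, "type": "FOLLOWS", "properties": {}},
]


def match(start_label, rel_type, end_label):
    """Yield (start node, relationship, end node) for each matching relationship."""
    for rel in relationships:
        start, end = nodes[rel["start"]], nodes[rel["end"]]
        if (
            rel["type"] == rel_type
            and start_label in start["labels"]
            and end_label in end["labels"]
        ):
            yield start, rel, end


# Roughly analogous to the Cypher query:
#   MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN a.name, b.name, r.weight
knows = [
    (a["properties"]["name"], b["properties"]["name"], r["properties"]["weight"])
    for a, r, b in match("Person", "KNOWS", "Person")
]
print(knows)
```

Storing the weight as a relationship property is one way to represent the edge weighting discussed in Unit 2 within this model.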

Justification

DH3501 was designed to be incorporated into the Department of Modern Languages and Literatures Digital Humanities Minor at the 3000+ level. It aims to broaden the scope of the upper division course offerings and thereby attract the growing number of students enrolled in courses like DH1011, DH2021, and CS2120/DH2220 to the minor. The course roughly corresponds to the subject matter presented in the description of the currently unoffered DH3501 Advanced Social Networking, and both its theoretical and applied content are in line with the proposed goals of the Digital Humanities IDI at Western of “mastering high levels of digital literacy” and “formalizing and storing of cultural and historical data in complex relations” (Suarez, 2013). Furthermore, the technology and methodology presented in this course are designed to smoothly integrate with the ecosystem of tools and techniques presented in the current program offerings, and thereby contribute to the cohesiveness and comprehensiveness of the program as a whole.

Looking beyond the Minor in Digital Humanities at Western, it is important to note the growing importance of understanding networks regardless of one's field. The relative ubiquity of the network structure, whether it be found in an online social network like Facebook or the physical systems that enable communication, has made it impossible to ignore. In academic work, the study of networks was once confined to the fields of math and physics, but it is now employed to better understand recipes (Ahn et al., 2011), world trade (De Benedictis & Tajoli, 2011), medieval power structures (Padgett & Ansell, 1993), and even literary texts such as Hamlet (Moretti, 2011). This last example is especially interesting in the current context, as it refers to research carried out by Franco Moretti, one of the earliest proponents of Digital Humanities (Moretti, 2005). Indeed, high profile DHers such as Elijah Meeks have indicated the importance of network analysis, identifying it as one of the three “pillars of digital humanities” (Meeks, 2012).

Meeks's statement, while possibly contentious, raises an interesting point about network analysis and its use in the digital humanities: it is one of the primary avenues that researchers have for approaching the human problems that form the core of this field. Along with the other pillars--GIS, NLP, and also visualization--network analysis is definitely a focal point of this emerging field. Thus it seems that to offer a program in digital humanities without a course on network analysis would be doing any young practitioner of the field a disservice, and combined with the general ubiquity of and interest in networks, this seems like more than enough to justify planning this course. In fact, I would like to take this concept to the next level and propose that I teach this course next year.

Methodology

The overarching goal of this course is to produce students who have the theoretical background and the necessary technical skills to intelligently perform network analysis in the real world. To achieve this goal, DH3501 employs a methodology designed to unify theory and practice, presenting students with programming and problem solving challenges built upon a solid base of theoretical and applied graph theory. The following paragraphs detail how classroom activity, self-directed assignments, and course content will be presented and conducted in this course.

Classroom

First and foremost, this is a hands-on, interactive course. Ideally it will be carried out using a WALS classroom. These modern classrooms leverage pod (small group style) seating arrangements with local IPs and laptop screen projection capability to foster small group work as well as intergroup sharing and interaction. Class will be presented in microlecture format: ten- to fifteen-minute instructor lectures that highlight or supplement important concepts from the readings. The primary medium of class content presentation will be online and locally hosted IPython notebooks, which allow the instructor to include examples written in Python that can be run interactively by students. Between microlectures, students will be presented with coding challenges, in which they must use Python and other software to creatively solve problems relating to the course material. These challenges will generally be embedded into the IPython notebook lectures, which facilitates content organization and allows the instructor to guide students with code examples, function skeletons, etc. For the first nine weeks of the course, every class will end with a longer coding challenge that requires each pod of students to solve a computational problem, implement an algorithm, etc. Student solutions can then be shared and discussed using WALS classroom technology for screen projection. During the last three weeks of the course (before final project presentations), the last half hour of class will be devoted to final project development in a semi-structured workshop environment.

Assignments

The hands-on approach continues in the assigned work. Every assignment requires that students perform the whole pipeline of data processing. The assignments are designed to simulate real life work and research situations that require students not only to apply what they know, but also to improvise based on the idiosyncrasies of their chosen data set. Students will be required to locate and download real data sets, and deal with any problems of compression, formatting, and storage as they occur. Furthermore, they will be expected to load the data into graph analytics software, perform analysis, create visualizations, and, perhaps most importantly, interpret the results in terms of the network they are studying. This last point is particularly relevant in the field of network analysis due to the high degree of semantic variance present in the network data model; for example, the nature of a relationship represented by an edge can vary highly between networks. Therefore, regardless of technical skill, any student of graphs must have a solid understanding of both the qualitative and quantitative aspects of graph theory. This understanding will be developed through the study of course content, discussed in the next section.

Content

In order to properly apply graph analytics techniques, students require a strong background in theory. This course presents students with a rigorous and directed reading schedule that provides the comprehensive background necessary to understand graph theory. Beginning with the second unit, students hit the texts hard to build the foundation for the rest of the course. This unit is mainly about literacy: in order to read empirical network analysis studies, students must have a strong foundation in the basic concepts, as well as the ability to read the mathematical notation commonly used. It begins with a practical approach to networks, using real-world examples and thought experiments to help students develop their intuitive understanding of graph theory. As students become comfortable with the terminology and basic concepts, more formal examples are introduced to slowly lay the groundwork for more advanced work later in the course. Towards the end of this unit, students will be challenged to employ their knowledge by reading several empirical studies published in professional journals. Finally, as this unit represents what I consider to be the core knowledge focus of the course, the midterm will focus exclusively on the information presented in the three texts (Tsvetovat & Kouznetsov, Easley & Kleinberg, and Jackson) during this unit.

Unit 3 is all about the application of the concepts learned in Unit 2 in the real world, especially to solve problems of storage, analysis, and visualization. Its goal is to expose students to the panorama of available tools for processing graph data, and demonstrate how these tools can be integrated into the Python programming environment. This in turn opens up a broad range of possibilities to students, allowing them to choose specific tools depending on their needs without feeling limited by their current skill set. The flexibility with regard to tooling, combined with the solid theoretical background presented in the first half of this class, prepares students for both the final project and the challenges encountered in real-world analytics.

Finally, Unit 4 presents students with a broad introduction to more advanced topics in graph theory. Much of this material requires more complex math, and is much more difficult at a conceptual level than what was presented in the first half of the course; however, it is designed to serve more as a springboard into future investigations in the world of graph theory. While students will not be tested on this material, they will be encouraged to integrate the ideas presented in this unit with their approach to the final project. This unit will provide semi-structured class time focused on student progress, in which students can ask questions related to any concepts from the course as they relate to final projects, and the instructor can take advantage of teachable moments to provide more customized instruction.

Conclusion

Overall, this course aims to simulate and stimulate the natural process of learning about graphs. Beginning with real-world examples and coding, backed by deeper theoretical and algorithmic investigations into the world of graph theory, students build a basis of knowledge that allows them to transition into the world of modern analytics. They learn to navigate the veritable jungle of tooling, and choose the software best suited to their needs, all the while building confidence in their abilities. Finally, they start to study the math, foundations, and complex behaviour of graphs and networks as they begin their journey to a lifetime of interest in the most complex everyday data structure around: the graph.

Teaching Philosophy

My teaching philosophy is based on my experience as a Spanish instructor, a TA in the Digital Humanities program, and as a life-long student. First and foremost, I am a proponent of experiential education. I believe that to learn, and perhaps more importantly, to retain, you must do. Based on my experience as a student, the vast majority of real, deep learning comes from having to really interact with your materials: one must search, find, read, present, and above all care about what they are doing. However, for students to get the most out of this interaction they must be truly motivated, and therein lies the serious challenge my teaching philosophy will address.

In my experience as a Spanish instructor, lack of student motivation has been an issue. Many students take first year Spanish to fulfil a language requirement, or an elective, while the majority of their coursework is in business, engineering, or the sciences. It is easy to imagine how difficult time management becomes for these students, and how these demanding and attractive classes--in the sense of prestige and job prospects--can cause some students to allot less time and energy to a language requirement or elective. Student attitudes towards the humanities are, to a certain extent, affected by broader societal values, and currently North American culture has placed a high premium on the STEM disciplines and Business programs.

While the importance of these disciplines is obvious, I personally continue to believe that the humanities are relevant in the 21st century. Indeed, as we are increasingly barraged by media, especially digital media, the ability to read and view critically, to think, and to write is perhaps more important than ever. Promoting this kind of critical and thoughtful approach to daily life should be one of the key tenets of the humanities, and while reading and study are still crucial skills, a new set of digital and interdisciplinary skills is emerging as an important aspect of personal education. I believe that the humanities need to embrace these new skills, and provide students with the type of education they need to become leaders in the digital age with a broad variety of skills: research, data analysis, resource management and storage, collaborative methodology, creativity, and critical interpretation of the past, present, and future.

My teaching exemplifies this belief in that my primary goal is to motivate students to acquire skills applicable in any academic or professional field, while at the same time emphasizing the importance of the study of human interaction, history and culture. In my courses, students learn how to think about learning as a project, identify and solve problems using traditional and digital techniques, embrace the advantages of organized collaboration, and become leaders who are responsible for their own education and future.

Evaluation

The evaluation of this course is based primarily on student activities involving programming, problem solving, and data analysis. Seventy percent of the course grade is drawn from the two assignments, the final project, and the notebook activities that students will complete daily in class. Thirty percent of the grade will come from a midterm that focuses on material from the first half of the class. This weighting corresponds to course outcomes and methodology in the following way. The primary motivation for this class is to train students to perform actual analysis with real data. Therefore, the majority of the students' grades will reflect activities that evaluate these abilities. However, as previously mentioned, in order to properly apply graph analytics techniques and interpret the results, a well-rounded understanding of graph theory must be developed. In order to ensure that this basic knowledge has been acquired before the majority of the work is due, students will be tested on this knowledge with the midterm. For more detail about the evaluation, please see the student syllabus and the individual assignments: assignment 1, assignment 2, midterm, and final project.

Class Descriptions

Class descriptions can be found here

Instructor Bibliography

Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A. L. (2011). Flavor network and the principles of food pairing. Scientific reports, 1.

Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of modern physics, 74(1), 47.

Baerveldt, C., Van Duijn, M. A., Vermeij, L., & Van Hemert, D. A. (2004). Ethnic boundaries and personal choice. Assessing the influence of individual inclinations to choose intra-ethnic relationships on pupils’ networks. Social Networks, 26(1), 55-74.

Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. science, 286(5439), 509-512.

Burt, R. S. (2009). Structural holes: The social structure of competition. Harvard university press.

Burt, R. S. (2000). The network structure of social capital. Research in organizational behavior, 22, 345-423.

Calvo-Armengol, A., & Jackson, M. O. (2004). The effects of social networks on employment and inequality. American economic review, 426-454.

Coleman, J. S., Katz, E., & Menzel, H. (1966). Medical innovation: A diffusion study (Second Edition). Indianapolis: Bobbs-Merrill.

De Benedictis, L., & Tajoli, L. (2011). The world trade network. The World Economy, 34(8), 1417-1454.

De Sola Pool, I., & Kochen, M. (1979). Contacts and influence. Social networks, 1(1), 5-51.

De Weerdt, J. (2002). Risk-sharing and endogenous network formation (No. 2002/57). WIDER Discussion Papers//World Institute for Development Economics (UNU-WIDER).

Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets. Cambridge University Press.

Emerson, R. M. (1962). Power-dependence relations. American sociological review, 31-41.

Fafchamps, M., & Lund, S. (2003). Risk-sharing networks in rural Philippines. Journal of development Economics, 71(2), 261-287.

Goyal, S., Van Der Leij, M. J., & Moraga‐González, J. L. (2006). Economics: An emerging small world. Journal of political economy, 114(2), 403-412.

Granovetter, M. S. (1973). The strength of weak ties. American journal of sociology, 1360-1380.

Jackson, M. O. (2010). Social and economic networks. Princeton University Press.

Jackson, M. O., & Rogers, B. W. (2007). Meeting strangers and friends of friends: How random are social networks?. The American economic review, 890-915.

Kandel, D. B. (1978). Homophily, selection, and socialization in adolescent friendships. American journal of Sociology, 427-436.

Lazarsfeld, P. F., & Merton, R. K. (1954). Friendship as a social process: A substantive and methodological analysis. Freedom and control in modern society, 18(1), 18-66.

Marsden, P. V. (1990). Network data and measurement. Annual review of sociology, 435-463.

Meeks, E. (2012). More Networks in the Humanities or Did books have DNA? Retrieved from https://dhs.stanford.edu/visualization/more-networks/.

Milgram, S. (1967). The small world problem. Psychology today, 2(1), 60-67.

Moody, J. (2001). Race, school integration, and friendship segregation in America. American Journal of Sociology, 107(3), 679-716.

Moretti, F. (2011). Network theory, plot analysis. New Left Review.

Moretti, F. (2005). Graphs, maps, trees: abstract models for a literary history. Verso.

Myers, C. A., & Shultz, G. P. (1951). The dynamics of a labor market: a study of the impact of employment changes on labor mobility, job satisfactions, and company and union policies. Prentice-Hall.

Newman, M. E. J. (2003). The structure and function of complex networks. SIAM review, 45(2), 167-256.

Opsahl, T. (2013). Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2), 159-167.

Opsahl, T., & Panzarasa, P. (2009). Clustering in weighted networks. Social networks, 31(2), 155-163.

Padgett, J. F., & Ansell, C. K. (1993). Robust Action and the Rise of the Medici, 1400-1434. American journal of sociology, 1259-1319.

Rees, A., & Shultz, G. P. (1970). Workers and wages in an urban labor market.

Reiss Jr, A. J. (1988). Co-offending and criminal careers. Crime and justice, 117-170.

Robinson, I., Webber, J., & Eifrem, E. (2013). Graph databases. O'Reilly Media, Inc.

Ryan, B., & Gross, N. C. (1950). Acceptance and diffusion of hybrid corn seed in two Iowa communities (Vol. 372). Agricultural Experiment Station, Iowa State College of Agriculture and Mechanic Arts.

Schich, M., Song, C., Ahn, Y. Y., Mirsky, A., Martino, M., Barabási, A. L., & Helbing, D. (2014). A network framework of cultural history. science, 345(6196), 558-562.

Serrano, M. Á., & Boguñá, M. (2003). Topology of the world trade web. Physical Review E, 68(1), 015101.

Suarez, J.L. (2013). Digital Humanities. Retrieved from http://provost.uwo.ca/idi/digital_humanities.html.

Tsvetovat, M., & Kouznetsov, A. (2011). Social network analysis for startups: Finding connections on the social web. O'Reilly Media, Inc.

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. nature, 393(6684), 440-442.

Weisbuch, G., Kirman, A., & Herreiner, D. (1997). Market organisation (pp. 221-240). Springer Berlin Heidelberg.

Willer, D. (Ed.). (1999). Network exchange theory. Greenwood Publishing Group.

Zipf, G. K. (1949). Human behavior and the principle of least effort.