Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 3501

Instructor: David Brown
E-mail: dbrow52@uwo.ca
Office: AHB 1R14


Well, you got the job. You're chief analyst now, and your company wants to expand its analytics into new frontiers. It's time for you to build your own project from scratch. Here's what you need to do:

1. Find a data set. A big, bad data set.

This is like the first assignment, but bigger. You need to go out and find a huge data set: at the very least edges on the order of $10^6$, and preferably much bigger, with millions of nodes and tens or hundreds of millions of edges. Also, the graph must have attributes! This is the crux: you will probably have to find a data set that is not already prepared as a graph and creatively model the data.
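As one illustration of modeling non-graph data as a graph, here is a minimal sketch. It assumes a hypothetical CSV of purchases with the columns `customer`, `product`, and `amount`; none of these names come from the assignment. Each row becomes an edge, the distinct values become nodes with a `kind` attribute, and the leftover column becomes an edge attribute.

```python
import csv
import io

# Hypothetical sample data; a real data set would be read from a file.
SAMPLE = """customer,product,amount
alice,book,12.50
bob,book,12.50
alice,lamp,30.00
"""

def rows_to_graph(handle):
    """Turn tabular rows into node and edge records with attributes."""
    nodes = {}  # node id -> attribute dict
    edges = []  # (source id, target id, attribute dict)
    for row in csv.DictReader(handle):
        source = "customer:" + row["customer"]
        target = "product:" + row["product"]
        # Re-adding an existing node simply overwrites it with the same values.
        nodes[source] = {"kind": "customer", "name": row["customer"]}
        nodes[target] = {"kind": "product", "name": row["product"]}
        edges.append(
            (source, target, {"relation": "bought", "amount": float(row["amount"])})
        )
    return nodes, edges

nodes, edges = rows_to_graph(io.StringIO(SAMPLE))
print(len(nodes), "nodes,", len(edges), "edges")
```

Prefixing each id with its kind is one simple way to keep values from different columns from colliding; your own data may call for a different scheme.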

2. Model the data.

You always need a data model. Make a nice image of your model using a computer drawing program or tool of your choice. With the model you will need to provide a detailed explanation of node types, relationships, and properties.

3. Load it into a graph database.

Process the data and load it into a database. There are no rules here: you can use whatever technology you want--as long as it makes sense.
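Since the choice of database is open, the following is only a sketch of one common preparation step: writing nodes and edges to separate CSV files. Many graph databases offer some form of bulk import from delimited files, but the required header format depends on the tool, so the column names and records below are placeholders rather than any particular importer's format.

```python
import csv
import os
import tempfile

# Hypothetical records; in practice these come from your processing step.
nodes = [
    {"id": "c1", "kind": "customer", "name": "alice"},
    {"id": "p1", "kind": "product", "name": "book"},
]
edges = [
    {"source": "c1", "target": "p1", "relation": "bought", "amount": 12.5},
]

def write_records(path, records):
    """Write a list of uniform dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return len(records)

out_dir = tempfile.mkdtemp()  # temporary directory, used here for illustration
node_path = os.path.join(out_dir, "nodes.csv")
edge_path = os.path.join(out_dir, "edges.csv")
node_count = write_records(node_path, nodes)
edge_count = write_records(edge_path, edges)
print(node_count, "nodes and", edge_count, "edges written to", out_dir)
```

For a data set with millions of edges you would stream rows to the writer instead of holding them all in a list.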

4. Write a series of queries against the data.

Demonstrate the property graph structure by writing a series of queries and traversals against your database.
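The actual queries depend on the query language of the database you choose, so the following is only an in-memory stand-in showing what a property-aware traversal does. The node names, the `kind` and `relation` attributes, and the `co_buyers` helper are all hypothetical. It walks two hops (customer to product, then back to other customers) and filters on properties at each step.

```python
# Hypothetical in-memory stand-in for a property graph.
node_props = {
    "alice": {"kind": "customer"},
    "bob": {"kind": "customer"},
    "book": {"kind": "product"},
    "lamp": {"kind": "product"},
}
edges = [
    ("alice", "book", {"relation": "bought"}),
    ("bob", "book", {"relation": "bought"}),
    ("alice", "lamp", {"relation": "bought"}),
]

def out_neighbors(node, relation):
    """Targets of edges leaving `node` with the given relation."""
    return [t for s, t, p in edges if s == node and p["relation"] == relation]

def in_neighbors(node, relation):
    """Sources of edges arriving at `node` with the given relation."""
    return [s for s, t, p in edges if t == node and p["relation"] == relation]

def co_buyers(customer):
    """Two-hop traversal: customer -> bought product <- other customers."""
    found = set()
    for product in out_neighbors(customer, "bought"):
        for other in in_neighbors(product, "bought"):
            if other != customer and node_props[other]["kind"] == "customer":
                found.add(other)
    return sorted(found)

result = co_buyers("alice")
print(result)
```

A graph database would answer the same question with an indexed traversal rather than scanning the edge list, which is what makes it practical at the scale this project asks for.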

5. Export a subgraph projection to NetworkX and perform analysis.

Find a relevant subsection of the graph to project into memory for more detailed analysis using NetworkX.
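As a minimal sketch of this step, the example below builds a tiny hypothetical graph in place of data pulled from your database, projects the one-hop neighborhood of a single node, and computes degree centrality on it. The node names and the `kind` and `relation` attributes are assumptions for illustration only.

```python
import networkx as nx

# Hypothetical property graph standing in for data exported from the database.
G = nx.DiGraph()
G.add_node("alice", kind="customer")
G.add_node("bob", kind="customer")
G.add_node("book", kind="product")
G.add_node("lamp", kind="product")
G.add_edge("alice", "book", relation="bought")
G.add_edge("bob", "book", relation="bought")
G.add_edge("alice", "lamp", relation="bought")

# Project a subgraph: keep "book" and every node one hop away from it.
keep = {"book"} | set(G.predecessors("book")) | set(G.successors("book"))
sub = G.subgraph(keep).copy()  # copy() gives an independent, editable graph

# Degree centrality: degree divided by (number of nodes - 1).
centrality = nx.degree_centrality(sub)
for node, score in sorted(centrality.items()):
    print(node, score)
```

Node and edge attributes carry over into the projection, so later analysis can still filter on them.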

6. Create a series of visualizations of the subgraph with Gephi.

Highlight important parts of your analysis with visualizations.
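One way to move a NetworkX subgraph into Gephi is to write it as a GEXF file, a format Gephi can open. The sketch below uses a tiny hypothetical graph and a temporary output path. The attribute name `kind` is an assumption, chosen partly because the GEXF writer gives special meaning to some names, such as `label` on nodes and `type` on edges.

```python
import os
import tempfile

import networkx as nx

# Hypothetical subgraph; in the project this is your projected subgraph.
sub = nx.Graph()
sub.add_node("alice", kind="customer")
sub.add_node("book", kind="product")
sub.add_edge("alice", "book", weight=2.0)

# Write the graph, with its attributes, to a GEXF file.
gexf_path = os.path.join(tempfile.mkdtemp(), "subgraph.gexf")
nx.write_gexf(sub, gexf_path)

# Read the file back as a quick check that the attributes survived.
reloaded = nx.read_gexf(gexf_path)
print(reloaded.nodes["alice"]["kind"])
```

Attributes written this way should be available in Gephi for styling, for example coloring nodes by `kind`.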

7. Prepare a detailed write up of your research.

This should explain your entire process and include:

  • A description of the data set.
  • A description of the technology used in the project.
  • A write up of the analysis.
  • Explanation of the visualizations.

8. Prepare a presentation of your project.

9. Turn in your work.

Turn in all materials used in the final project including:

  • The data set you used.
  • The image and description of the data model (~300 words).
  • All scripts used in data processing and database import.
  • The queries and traversals used to demonstrate the structure of your database.
  • The actual database files.
  • A flat file version of the projected subgraph, with the accompanying scripts and analysis results.
  • All generated visualizations.
  • Write up (~350-700 words).
  • Presentation materials.