Graph Databases and the Humanities

%matplotlib inline
%load_ext gremlin
import asyncio
import aiogremlin
import networkx as nx

What's a graph?

A mathematical structure consisting of nodes and edges. In its simplest form it can be written as a binary adjacency matrix:

$g = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$

g = nx.scale_free_graph(10)
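As a rough illustration of how the matrix above relates to nodes and edges, here is a minimal sketch in plain Python. The node names and the `adjacency_matrix` helper are hypothetical and are not part of networkx.

```python
# Hypothetical toy graph: two nodes joined by one undirected edge.
nodes = ["a", "b"]
edges = [("a", "b")]

def adjacency_matrix(nodes, edges):
    """Return a binary adjacency matrix (list of lists) for an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
        matrix[index[target]][index[source]] = 1  # undirected: mirror the entry
    return matrix

matrix = adjacency_matrix(nodes, edges)
print(matrix)  # [[0, 1], [1, 0]]
```

Each 1 marks a pair of nodes joined by an edge, so the two-node graph reproduces the matrix shown above.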

Graphs are everywhere these days!

  • Facebook

  • Twitter

  • LinkedIn

But wait...these graphs are more than ones and zeros...

Property graph model
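One way to picture the property graph model: nodes and edges both carry a label and a set of key/value properties. The sketch below uses plain dictionaries and illustrative data. It is not the storage format of any particular database, and `out_neighbours` is a hypothetical helper.

```python
# Illustrative data only: every vertex and edge has a label plus properties.
vertices = {
    1: {"label": "Person", "properties": {"name": "Diego Velázquez"}},
    2: {"label": "Artwork", "properties": {"title": "Las Meninas", "year": 1656}},
}
edges = [
    {"source": 1, "target": 2, "label": "painted", "properties": {"place": "Madrid"}},
]

def out_neighbours(vertex_id, edge_label):
    """Return the vertices reached from vertex_id along edges with edge_label."""
    return [
        vertices[edge["target"]]
        for edge in edges
        if edge["source"] == vertex_id and edge["label"] == edge_label
    ]

painted = out_neighbours(1, "painted")
print(painted[0]["properties"]["title"])  # Las Meninas
```

The edge itself holds data (here, a place), which is what a plain matrix of ones and zeros cannot express.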

Why graphs?

Graphs are very good at representing complex interrelations between entities...

Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A. L. (2011). Flavor network and the principles of food pairing. Scientific reports, 1.

Schich, M., Song, C., Ahn, Y. Y., Mirsky, A., Martino, M., Barabási, A. L., & Helbing, D. (2014). A network framework of cultural history. Science, 345(6196), 558-562.

The CulturePlex Lab: Our research

  • The production and diffusion of cultural objects.

Towards a Digital Geography of Hispanic Baroque Art

The Art Space of a Global Community

Why GraphDBs?

  • Relational databases:

    • Inflexible
    • Bad at relationships
    • Lacking in semantic richness


  • Neo4jrestclient by versae - 58977 downloads

  • SylvaDB


  • Landscapes of Castas Painting - Master's thesis/DH2014
  • Preliminaries Project - DH2013/Congress 2015

  • The Preliminaries Project required a wide variety of schema transformations and projections.

  • A tedious task to be sure.

  • Enter projx - a graph transformation library written in Python with a Cypher-based DSL

subgraph = projection.execute("""
    MATCH   (p1:Person)-(wild)-(p2:Person)
    PROJECT (p1)-(p2)
    METHOD NEWMAN Institution, City
    SET     label = wild.label
    DELETE  wild
""")
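To give a feel for what a projection like this does, here is a minimal sketch in plain Python. It is not projx's implementation: the toy data and the `project_people` helper are hypothetical, and the weighting is an assumption based on Newman's collaboration weighting, where a shared node with k people adds 1/(k - 1) to each pair.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: people linked to cities and institutions.
node_types = {
    "ana": "Person", "luis": "Person", "marta": "Person",
    "salamanca": "City", "universidad": "Institution",
}
edges = [
    ("ana", "salamanca"), ("luis", "salamanca"), ("marta", "salamanca"),
    ("ana", "universidad"), ("luis", "universidad"),
]

def project_people(node_types, edges):
    """Link Person nodes that share a non-Person neighbour, dropping that neighbour."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    weights = defaultdict(float)
    for node, kind in node_types.items():
        if kind == "Person":
            continue  # only intermediate ("wild") nodes are projected away
        people = sorted(n for n in neighbours[node] if node_types[n] == "Person")
        if len(people) < 2:
            continue
        for p1, p2 in combinations(people, 2):
            # Assumed Newman-style weight: a crowded shared node counts for less.
            weights[(p1, p2)] += 1.0 / (len(people) - 1)
    return dict(weights)

projected = project_people(node_types, edges)
print(projected)
```

The result is a Person-to-Person graph in which pairs sharing more, and more exclusive, intermediate nodes get heavier edges.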


Tinkerpop/Gremlin Ecosystem

  • A standard API for graph databases

  • Gremlin traversal language

  • Tinkerpop-enabled backends:

    • Titan
    • Neo4j
    • Gremlin-Elastic
    • Hadoop (Spark/Giraph)
  • All accessed using the Gremlin Server

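@asyncio.coroutine
def stream(gc):
    results = []
    resp = yield from gc.submit("x + x", bindings={"x": 1})
    while True:
        # Assumed from the aiogremlin API of the time: read one message from the response stream
        result = yield from resp.stream.read()
        if result is None:  # None marks the end of the stream
            break
        results.append(result)
    return results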
loop = asyncio.get_event_loop()
gc = aiogremlin.GremlinClient()
results = loop.run_until_complete(stream(gc))

[Message(status_code=200, data=[2], message={}, metadata='')]

loop.run_until_complete(gc.close())  # Explicitly close client!!!
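The read-until-None pattern used above can be tried without a running Gremlin Server. The sketch below uses only the standard library and `async`/`await` syntax. `FakeStream` and `drain` are hypothetical stand-ins, not part of aiogremlin.

```python
import asyncio

class FakeStream:
    """Hypothetical stand-in for a server response stream."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def read(self):
        await asyncio.sleep(0)  # yield control, as a real network read would
        return self._messages.pop(0) if self._messages else None

async def drain(stream):
    """Collect messages until the stream signals its end with None."""
    results = []
    while True:
        message = await stream.read()
        if message is None:
            break
        results.append(message)
    return results

drained = asyncio.run(drain(FakeStream([[2]])))
print(drained)  # [[2]]
```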


graph = TinkerFactory.createModern()
g = graph.traversal(standard())

['vadas', 'josh']
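One traversal that gives this result on TinkerPop's "modern" toy graph is following the `knows` edges out of marko. The sketch below mimics that step in plain Python over a hand-copied version of the toy graph. It is an illustration of the idea, not Gremlin itself, and `out_names` is a hypothetical helper.

```python
# The six vertices and six edges of TinkerPop's "modern" toy graph.
vertex_names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
edges = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]

def out_names(name, edge_label):
    """Names of vertices reached from `name` along outgoing edges with edge_label."""
    start_ids = {vid for vid, vname in vertex_names.items() if vname == name}
    return [
        vertex_names[target]
        for source, label, target in edges
        if source in start_ids and label == edge_label
    ]

known = out_names("marko", "knows")
print(known)  # ['vadas', 'josh']
```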

10 Million Spanish Books

Identify statistical regularities in the production and diffusion of Spanish literature.

  • 13,188,245 records from OCLC

  • Neo4j/Tinkerpop/Apache Spark

Conclusion: Graphs are cool!