Graph Databases and the Humanities

In [2]:
%matplotlib inline
%load_ext gremlin
import asyncio
import aiogremlin
import networkx as nx

The gremlin extension is already loaded. To reload it, use:
  %reload_ext gremlin

What's a graph?

A mathematical structure consisting of nodes and edges, which can be written as a binary adjacency matrix:

$g = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$
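
As a minimal sketch, the matrix above can be read as an adjacency matrix: entry (i, j) is 1 when nodes i and j are connected. The helper name `edges_from_adjacency` is illustrative and does not come from any library used in this notebook.

```python
# Adjacency matrix for the two-node graph shown above.
adjacency = [
    [0, 1],
    [1, 0],
]


def edges_from_adjacency(matrix):
    """Return the undirected edges (i, j), i < j, encoded by a symmetric 0/1 matrix."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j]]


edges = edges_from_adjacency(adjacency)
print(edges)
```

Only the upper triangle is scanned because an undirected graph has a symmetric matrix, so each edge appears twice in it.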

In [8]:
g = nx.scale_free_graph(10)

Graphs are everywhere these days!

  • Facebook

  • Twitter

  • LinkedIn

But wait...these graphs are more than ones and zeros...

Property graph model
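
One way to picture the property graph model is as nodes and edges that each carry a label and a set of key-value properties. The sketch below uses plain dictionaries; the identifiers, labels, and property names are made up for illustration and do not come from any particular database.

```python
# Every node has a label and a dictionary of properties.
nodes = {
    "n1": {"label": "Person", "properties": {"name": "Alice"}},
    "n2": {"label": "Person", "properties": {"name": "Bob"}},
    "n3": {"label": "City", "properties": {"name": "Ottawa"}},
}

# Edges are directed, labeled, and can carry properties of their own.
edges = [
    {"source": "n1", "target": "n2", "label": "knows", "properties": {"since": 2010}},
    {"source": "n1", "target": "n3", "label": "lives_in", "properties": {}},
]


def neighbors(node_id, edge_label):
    """Names of the nodes reached from node_id over outgoing edges with edge_label."""
    return [
        nodes[edge["target"]]["properties"]["name"]
        for edge in edges
        if edge["source"] == node_id and edge["label"] == edge_label
    ]


print(neighbors("n1", "knows"))
```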

Why graphs?

Graphs are very good at representing complex interrelations between entities...

Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A. L. (2011). Flavor network and the principles of food pairing. Scientific reports, 1.

Schich, M., Song, C., Ahn, Y. Y., Mirsky, A., Martino, M., Barabási, A. L., & Helbing, D. (2014). A network framework of cultural history. Science, 345(6196), 558-562.

The CulturePlex Lab: Our research

  • The production and diffusion of cultural objects.

Towards a Digital Geography of Hispanic Baroque Art

The Art Space of a Global Community

Why GraphDBs?

  • Relational databases:

    • Inflexible
    • Bad at relationships
    • Lacking in semantic richness


  • Neo4jrestclient by versae - 58977 downloads

  • SylvaDB


  • Landscapes of Castas Painting - Masters Thesis/DH2014
  • Preliminaries Project - DH2013/Congress 2015

Interested in SylvaDB? Check out Javier de la Rosa's talk tomorrow at 11:00 in Colonel By E015

Interested in the Preliminaries Project? Check out my talk on Wednesday at 1:15 in Colonel By C03


  • The Preliminaries Project required a wide variety of schema transformations and projections.

  • A tedious task to be sure.

  • Enter projx, a graph transformation library written in Python with a Cypher-based DSL

subgraph = projection.execute("""
    MATCH   (p1:Person)-(wild)-(p2:Person)
    PROJECT (p1)-(p2)
    METHOD NEWMAN Institution, City
    SET     label = wild.label
    DELETE  wild
""")
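
To make the idea of a projection concrete, here is a rough pure-Python sketch of what the query above describes: two people who share an intermediate node get a direct edge labeled with that node's type, and the intermediate node is dropped. This is not projx's implementation and it ignores the NEWMAN weighting; the function name `project` and the toy data are assumptions made for illustration.

```python
from itertools import combinations

# Toy graph: people attached to institutions and cities (hypothetical data).
node_types = {
    "p1": "Person", "p2": "Person", "p3": "Person",
    "i1": "Institution", "c1": "City",
}
edges = [("p1", "i1"), ("p2", "i1"), ("p2", "c1"), ("p3", "c1")]


def project(node_types, edges, keep="Person"):
    """Link every pair of `keep` nodes that share a neighbor of another type."""
    # Map each intermediate ("wild") node to the `keep` nodes attached to it.
    attached = {}
    for a, b in edges:
        for wild, kept in ((a, b), (b, a)):
            if node_types[wild] != keep and node_types[kept] == keep:
                attached.setdefault(wild, set()).add(kept)

    projected = []
    for wild, kept_nodes in sorted(attached.items()):
        for u, v in combinations(sorted(kept_nodes), 2):
            # The new edge takes the intermediate node's type as its label.
            projected.append((u, v, node_types[wild]))
    return projected


print(project(node_types, edges))
```

The intermediate nodes never appear in the result, which mirrors the `DELETE wild` step of the query.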


Tinkerpop/Gremlin Ecosystem

  • A standard API for graph databases

  • Gremlin traversal language

  • Tinkerpop enabled backends:

    • Titan
    • Neo4j
    • Gremlin-Elastic
    • Hadoop (Spark/Giraph)
  • All accessed using the Gremlin Server

In [9]:
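@asyncio.coroutine
def stream(gc):
    results = []
    resp = yield from gc.submit("x + x", bindings={"x": 1})
    # Read messages until the server signals the end of the stream with None.
    while True:
        result = yield from resp.stream.read()
        if result is None:
            break
        results.append(result)
    return results

loop = asyncio.get_event_loop()
gc = aiogremlin.GremlinClient()
results = loop.run_until_complete(stream(gc))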

In [10]:
results
[Message(status_code=200, data=[2], message={}, metadata='')]

In [11]:
loop.run_until_complete(gc.close())  # Explicitly close client!!!
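
The read-until-None pattern used by the client above can be tried without a running Gremlin Server. The sketch below uses the newer `async`/`await` syntax and a hypothetical `FakeStream` class that stands in for a response stream; neither `FakeStream` nor `collect` is part of aiogremlin.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a response stream; not part of aiogremlin."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def read(self):
        await asyncio.sleep(0)  # yield control, as a real network read would
        # Return the next message, or None once the stream is exhausted.
        return self._messages.pop(0) if self._messages else None


async def collect(stream):
    """Read messages one at a time until the stream returns None."""
    results = []
    while True:
        message = await stream.read()
        if message is None:
            break
        results.append(message)
    return results


results = asyncio.run(collect(FakeStream([[2], [4]])))
print(results)
```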


In [12]:

graph = TinkerFactory.createModern()
g = graph.traversal(standard())

['vadas', 'josh']
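
The query that produced this output is not shown in the cell. One Gremlin traversal consistent with it would be `g.V().has('name', 'marko').out('knows').values('name')`, which is an assumption on my part. The sketch below rebuilds the vertices and edges of TinkerPop's "modern" toy graph as plain Python data (edge weights omitted) and performs the equivalent lookup; the helper `out_names` is illustrative only.

```python
# Vertices of the TinkerPop "modern" toy graph.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}

# Directed edges as (source id, edge label, target id).
edges = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def out_names(start_name, edge_label):
    """Names of vertices reached from `start_name` over outgoing `edge_label` edges."""
    starts = {vid for vid, props in vertices.items() if props["name"] == start_name}
    return [
        vertices[dst]["name"]
        for src, label, dst in edges
        if src in starts and label == edge_label
    ]


print(out_names("marko", "knows"))  # ['vadas', 'josh']
```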

10 Million Spanish Books

Identify statistical regularities in the production and diffusion of Spanish literature.

  • 13,188,245 records from OCLC

  • Neo4j/Tinkerpop/Apache Spark

Interested? Check out my talk on Wednesday at 11:00 in Louis-Pasteur 155

Conclusion: Graphs are cool!