Graph Databases and the Humanities



In [2]:

%matplotlib inline
import asyncio
import aiogremlin
import networkx as nx






What's a graph?

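A mathematical structure consisting of nodes (vertices) and edges, which can be written as a binary adjacency matrix. Here, two nodes are joined by a single edge: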

$g = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$
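To make the matrix notation concrete, here is a minimal pure-Python sketch that reads the edge list back out of an adjacency matrix. It assumes an undirected graph, so the matrix is symmetric and only its upper triangle needs to be read. The helper name `edges_from_adjacency` is hypothetical, not part of any library.

```python
def edges_from_adjacency(matrix):
    """List the edges encoded by a symmetric 0/1 adjacency matrix.

    matrix[i][j] == 1 means nodes i and j are joined by an edge. For an
    undirected graph the matrix is symmetric, so only the upper triangle
    (j > i) is read, to avoid listing each edge twice.
    """
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j] == 1]


g = [[0, 1],
     [1, 0]]
print(edges_from_adjacency(g))  # [(0, 1)]
```

With networkx, `nx.from_numpy_array` builds the corresponding graph object from the same matrix once it has been converted to a NumPy array.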



In [8]:

# Generate a random 10-node scale-free graph (returned as a directed multigraph) and draw it
g = nx.scale_free_graph(10)
nx.draw_networkx(g)






Graphs are everywhere these days!

But wait... these graphs are more than ones and zeros...
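One way to see what "more than ones and zeros" means is a toy property graph in plain Python, where every node and edge carries a label and key/value properties. This is a minimal sketch: all ids, labels and names are invented for illustration. Collapsing it to an adjacency matrix keeps only who is linked to whom and throws away the labels.

```python
# Toy property graph: nodes and edges carry a label plus key/value properties.
# All ids, labels and names below are hypothetical, for illustration only.
nodes = {
    "p1": {"label": "Person", "name": "Person A"},
    "i1": {"label": "Institution", "name": "Institution B"},
    "c1": {"label": "City", "name": "City C"},
}
edges = [
    ("p1", "i1", {"label": "studied_at"}),
    ("i1", "c1", {"label": "located_in"}),
]


def neighbors(node_id, edge_label):
    """Ids of nodes reached from node_id along outgoing edges with edge_label."""
    return [dst for src, dst, props in edges
            if src == node_id and props["label"] == edge_label]


def to_adjacency(nodes, edges):
    """Drop all labels and properties, keeping only a directed 0/1 matrix."""
    ids = sorted(nodes)
    index = {node_id: i for i, node_id in enumerate(ids)}
    matrix = [[0] * len(ids) for _ in ids]
    for src, dst, _props in edges:
        matrix[index[src]][index[dst]] = 1
    return ids, matrix


print(neighbors("p1", "studied_at"))  # ['i1']
print(to_adjacency(nodes, edges))  # (['c1', 'i1', 'p1'], [[0, 0, 0], [1, 0, 0], [0, 1, 0]])
```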

Why graphs?

Graphs are well suited to representing complex interrelations between entities, for example:

Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A. L. (2011). Flavor network and the principles of food pairing. Scientific Reports, 1.

Schich, M., Song, C., Ahn, Y. Y., Mirsky, A., Martino, M., Barabási, A. L., & Helbing, D. (2014). A network framework of cultural history. Science, 345(6196), 558-562.

The CulturePlex Lab: Our research

• The production and diffusion of cultural objects.

Why GraphDBs?

• Relational databases are:

  • Inflexible
  • Lacking in semantic richness

Neo4j

• Landscapes of Castas Painting - Master's thesis/DH2014
• Preliminaries Project - DH2013/Congress 2015

Interested in SylvaDB? Check out Javier de la Rosa's talk tomorrow at 11:00 in Colonel By E015

Interested in the Preliminaries Project? Check out my talk on Wednesday at 1:15 in Colonel By C03

projx

• The Preliminaries Project required a wide variety of schema transformations and projections.

• A tedious task, to be sure.

• Enter projx: a graph transformation library written in Python with a Cypher-based DSL.

subgraph = projection.execute("""
MATCH   (p1:Person)-(wild)-(p2:Person)
PROJECT (p1)-(p2)
METHOD NEWMAN Institution, City
SET     label = wild.label
DELETE  wild
""")


aiogremlin

TinkerPop/Gremlin Ecosystem

• A standard API for graph databases

• Gremlin traversal language

• TinkerPop-enabled backends:

  • Titan
  • Neo4j
  • Gremlin-Elastic

• All of these are accessed through the Gremlin Server
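Clients talk to the Gremlin Server over a WebSocket. The sketch below builds the kind of "eval" request a client sends when it submits a script such as `x + x` with the binding `x = 1`. It assumes the TinkerPop 3.0 driver protocol (a JSON message with `requestId`, `op`, `processor` and `args`, sent in a binary frame prefixed by one length byte and the MIME type). It does not reproduce aiogremlin's actual internals, and the helper names are hypothetical.

```python
import json
import uuid


def build_eval_request(script, bindings=None, language="gremlin-groovy"):
    """Build a Gremlin Server 'eval' request as a Python dict (sketch)."""
    return {
        "requestId": str(uuid.uuid4()),  # lets the client match responses to requests
        "op": "eval",                    # evaluate a script on the server
        "processor": "",                 # empty string selects the default (sessionless) processor
        "args": {
            "gremlin": script,
            "bindings": bindings or {},  # variables injected into the script
            "language": language,
        },
    }


def frame_message(message, mime_type="application/json"):
    """Serialize a request for a binary WebSocket frame:
    one byte with the MIME type's length, the MIME type, then the JSON body."""
    mime = mime_type.encode("utf-8")
    return bytes([len(mime)]) + mime + json.dumps(message).encode("utf-8")


request = build_eval_request("x + x", bindings={"x": 1})
payload = frame_message(request)
print(payload[:17])  # b'\x10application/json'
```

Passing values through `bindings` rather than pasting them into the script string lets the server reuse the compiled script across calls.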



In [9]:

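# Generator-based coroutine syntax (pre-async/await; @asyncio.coroutine was removed in Python 3.11)
@asyncio.coroutine
def stream(gc):
    """Submit a script to the Gremlin Server and collect the streamed results."""
    results = []
    # Send the script "x + x" with the binding x = 1
    resp = yield from gc.submit("x + x", bindings={"x": 1})
    while True:
        # Read the next message from the response stream; None means it is exhausted
        result = yield from resp.stream.read()
        if result is None:
            break
        results.append(result)
    return results

loop = asyncio.get_event_loop()
gc = aiogremlin.GremlinClient()  # requires a running Gremlin Server
results = loop.run_until_complete(stream(gc))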




In [10]:

results








In [11]:

loop.run_until_complete(gc.close())  # Always close the client explicitly to release its connection



ipython-gremlin



In [12]:

%%gremlin

graph = TinkerFactory.createModern()
g = graph.traversal(standard())
g.V().has('name','marko').out('knows').values('name')
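To show what each step of that traversal does, here is a minimal pure-Python sketch over TinkerPop's "modern" toy graph (the graph `TinkerFactory.createModern()` creates), transcribed by hand. `has` keeps matching vertices, `out` follows outgoing edges with a given label, and `values` extracts a property. These helper functions only mimic the Gremlin steps for illustration; they are not part of any Gremlin library.

```python
# TinkerPop's "modern" toy graph, transcribed by hand: vertex id -> properties
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}
# (edge id, out vertex, edge label, in vertex, weight)
edges = [
    (7, 1, "knows", 2, 0.5),
    (8, 1, "knows", 4, 1.0),
    (9, 1, "created", 3, 0.4),
    (10, 4, "created", 5, 1.0),
    (11, 4, "created", 3, 0.4),
    (12, 6, "created", 3, 0.2),
]


def has(vertex_ids, key, value):
    """Mimic has(key, value): keep vertices whose property equals value."""
    return [v for v in vertex_ids if vertices[v].get(key) == value]


def out(vertex_ids, label):
    """Mimic out(label): follow outgoing edges with the given label."""
    return [dst for v in vertex_ids
            for (_eid, src, lbl, dst, _w) in edges if src == v and lbl == label]


def values(vertex_ids, key):
    """Mimic values(key): emit the property value of each vertex that has it."""
    return [vertices[v][key] for v in vertex_ids if key in vertices[v]]


# g.V().has('name','marko').out('knows').values('name')
result = values(out(has(list(vertices), "name", "marko"), "knows"), "name")
print(result)  # ['vadas', 'josh']
```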







10 Million Spanish Books

Identify statistical regularities in the production and diffusion of Spanish literature.

• 13,188,245 records from OCLC

• Neo4j/TinkerPop/Apache Spark

Interested? Check out my talk on Wednesday at 11:00 in Louis-Pasteur 155