Example notebook illustrating W3C prov --> neo4j graph generation based on the provio library

First add the library to your path

  • modify the path to your installation directory

In [1]:
%load_ext autoreload
%autoreload 2
import os, sys
# import_path = os.path.abspath('..')   not necessary ?
install_path = '/home/stephan/Repos/ENES-EUDAT/enes_graph_use_case'
sys.path.append(install_path)
from neo4j_prov import provio

Load W3C prov json file, visualize it and generate the neo4j graph (relations)


In [2]:
#rov_doc_from_json = provio.get_provdoc('json',install_path+"/neo4j_prov/examples/wps-prov.json")
prov_doc_from_json = provio.get_provdoc('json','/home/stephan/Repos/ENES-EUDAT/submission_forms/test/ingest_prov_1.json')

rels = provio.gen_graph_model(prov_doc_from_json)

In [3]:
print prov_doc_from_json.get_records()
print rels


[<ProvAssociation: (subm:review, data_manager:peter_lenzen_id)>, <ProvAssociation: (subm:submit, data_provider:test_user_id)>, <ProvAssociation: (subm:check, data_manager:hdh_id)>, <ProvAssociation: (subm:archive, data_manager:lta_id)>, <ProvAssociation: (subm:ingest, data_manager:peter_lenzen_id)>, <ProvAssociation: (subm:publish, data_manager:katharina_b_id)>, <ProvDerivation: (subm:out1_sub, subm:empty)>, <ProvDerivation: (subm:out1_rev, subm:out1_sub)>, <ProvDerivation: (subm:out1_pub, subm:out1_che)>, <ProvDerivation: (subm:out1_che, subm:out1_ing)>, <ProvDerivation: (subm:out1_arch, subm:out1_pub)>, <ProvDerivation: (subm:out1_ing, subm:out1_rev)>, <ProvUsage: (subm:review, subm:out1_sub)>, <ProvUsage: (subm:submit, subm:empty)>, <ProvUsage: (subm:check, subm:out1_ing)>, <ProvUsage: (subm:archive, subm:out1_pub)>, <ProvUsage: (subm:ingest, subm:out1_rev)>, <ProvUsage: (subm:publish, subm:out1_che)>, <ProvAgent: data_provider:test_user_id>, <ProvAgent: data_manager:katharina_b_id>, <ProvAgent: data_manager:peter_lenzen_id>, <ProvAgent: data_manager:peter_lenzen_id>, <ProvAgent: data_manager:lta_id>, <ProvAgent: data_manager:hdh_id>, <ProvEntity: subm:out1_arch>, <ProvEntity: subm:out1_sub>, <ProvEntity: subm:out1_che>, <ProvEntity: subm:out1_rev>, <ProvEntity: qua:empty>, <ProvEntity: subm:out1_pub>, <ProvEntity: subm:out1_ing>, <ProvActivity: subm:review>, <ProvActivity: subm:publish>, <ProvActivity: subm:submit>, <ProvActivity: subm:check>, <ProvActivity: subm:ingest>, <ProvActivity: subm:archive>, <ProvGeneration: (subm:out1_rev, subm:review)>, <ProvGeneration: (subm:out1_sub, subm:submit)>, <ProvGeneration: (subm:out1_ing, subm:ingest)>, <ProvGeneration: (subm:out1_arch, subm:archive)>, <ProvGeneration: (subm:out1_pub, subm:publish)>, <ProvGeneration: (subm:out1_che, subm:check)>]
[(e609024)-[:`prov:Association`]->(e6667ce), (bd90808)-[:`prov:Association`]->(dd58fc8), (c03e000)-[:`prov:Association`]->(e2237f2), (f499475)-[:`prov:Association`]->(dc3ec73), (b9bcab4)-[:`prov:Association`]->(e6667ce), (e3dbff4)-[:`prov:Association`]->(ddea5ea), (fac507e)-[:`prov:Derivation`]->(abe9178), (bd18db7)-[:`prov:Derivation`]->(fac507e), (c4bc618)-[:`prov:Derivation`]->(b61f386), (b61f386)-[:`prov:Derivation`]->(e4aab21), (a754624)-[:`prov:Derivation`]->(c4bc618), (e4aab21)-[:`prov:Derivation`]->(bd18db7), (e609024)-[:`prov:Usage`]->(fac507e), (bd90808)-[:`prov:Usage`]->(abe9178), (c03e000)-[:`prov:Usage`]->(e4aab21), (f499475)-[:`prov:Usage`]->(c4bc618), (b9bcab4)-[:`prov:Usage`]->(bd18db7), (e3dbff4)-[:`prov:Usage`]->(b61f386), (bd18db7)-[:`prov:Generation`]->(e609024), (fac507e)-[:`prov:Generation`]->(bd90808), (e4aab21)-[:`prov:Generation`]->(b9bcab4), (a754624)-[:`prov:Generation`]->(f499475), (c4bc618)-[:`prov:Generation`]->(e3dbff4), (b61f386)-[:`prov:Generation`]->(c03e000)]

In [4]:
provio.visualize_prov(prov_doc_from_json)


Out[4]:

Alternatively load W3C prov xml file and generate neo4j graph (relations)


In [ ]:
prov_doc_from_xml = provio.get_provdoc('xml',install_path+"/neo4j_prov/examples/wps-prov.xml")
rels = provio.gen_graph_model(prov_doc_from_xml)

In [ ]:
print prov_doc_from_xml.get_records()
print rels

Connect to a neo4j graph endpoint and generate new graph

  • Attention: previous graph(s) are deleted

In [5]:
from py2neo import Graph, Node, Relationship, authenticate
authenticate("localhost:7474", "neo4j", "prolog16")

# connect to authenticated graph database
graph = Graph("http://localhost:7474/db/data/")
graph.delete_all()

In [6]:
for rel in rels:
    graph.create(rel)

query the newly generated graph and display result


In [7]:
%load_ext cypher
%matplotlib inline


/home/stephan/.conda/envs/generic1/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.
  "You should import from traitlets.config instead.", ShimWarning)
/home/stephan/.conda/envs/generic1/lib/python2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

In [8]:
results = %cypher http://neo4j:prolog16@localhost:7474/db/data MATCH (a)-[r]-(b) RETURN a,r, b
results.get_graph()
results.draw()


48 rows affected.
Out[8]:
(<networkx.classes.multidigraph.MultiDiGraph at 0x7f467e868690>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7f467e72d050>,
 <matplotlib.collections.PathCollection at 0x7f467c278dd0>)

To help visualization of large graphs the javascript library from Almende B.V. is helpful (git clone git://github.com/almende/vis.git) Therefore a javascript visualization generation is provided by the vis script (which I adapted from https://github.com/nicolewhite/neo4j-jupyter/tree/master/scripts)


In [9]:
from neo4j_prov.vis import draw

In [10]:
options = {"16":"label"}
result_iframe = draw(graph,options)


Javascript visualization generated:  /home/stephan/Repos/ENES-EUDAT/enes_graph_use_case/neo4j_prov/notebooks/graph-730f0d56-7857-495f-ab2b-3bd8eac1f275.html

In [ ]: