PROV-O Diagram Rendering Example

This example takes a PROV-O activity graph and uses the PROV Python library, an implementation of the W3C Provenance Data Model, to create graphical representations such as PNG, SVG, and PDF.

Prerequisites

  • Python libraries: prov[dot] (prov together with pydot)
  • jupyter
  • graphviz

In [4]:
# If you need to install dependencies, do so in this cell
!pip install pydot prov


Requirement already satisfied: pydot in /opt/conda/envs/py36/lib/python3.6/site-packages (1.2.4)
Collecting prov
  Downloading https://files.pythonhosted.org/packages/5c/1e/ac3989756b8a0262de881f7378783e53b693409273ac5725eab59c028f55/prov-1.5.2-py2.py3-none-any.whl (423kB)
    100% |████████████████████████████████| 430kB 11.7MB/s 
Requirement already satisfied: pyparsing>=2.1.4 in /opt/conda/envs/py36/lib/python3.6/site-packages (from pydot) (2.2.0)
Requirement already satisfied: lxml>=3.3.5 in /opt/conda/envs/py36/lib/python3.6/site-packages (from prov) (4.2.4)
Collecting rdflib>=4.2.1 (from prov)
  Downloading https://files.pythonhosted.org/packages/3c/fe/630bacb652680f6d481b9febbb3e2c3869194a1a5fc3401a4a41195a2f8f/rdflib-4.2.2-py3-none-any.whl (344kB)
    100% |████████████████████████████████| 348kB 18.3MB/s 
Requirement already satisfied: python-dateutil>=2.2 in /opt/conda/envs/py36/lib/python3.6/site-packages (from prov) (2.7.3)
Requirement already satisfied: six>=1.9.0 in /opt/conda/envs/py36/lib/python3.6/site-packages (from prov) (1.11.0)
Requirement already satisfied: networkx>=2.0 in /opt/conda/envs/py36/lib/python3.6/site-packages (from prov) (2.1)
Collecting isodate (from rdflib>=4.2.1->prov)
  Downloading https://files.pythonhosted.org/packages/9b/9f/b36f7774ff5ea8e428fdcfc4bb332c39ee5b9362ddd3d40d9516a55221b2/isodate-0.6.0-py2.py3-none-any.whl (45kB)
    100% |████████████████████████████████| 51kB 14.2MB/s 
Requirement already satisfied: decorator>=4.1.0 in /opt/conda/envs/py36/lib/python3.6/site-packages (from networkx>=2.0->prov) (4.3.0)
twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
Installing collected packages: isodate, rdflib, prov
Successfully installed isodate-0.6.0 prov-1.5.2 rdflib-4.2.2
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

In [2]:
!conda install -y python-graphviz


Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/py36

  added / updated specs: 
    - python-graphviz


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    graphviz-2.40.1            |       h21bd128_2         6.9 MB  defaults
    python-graphviz-0.8.4      |           py36_1          27 KB  defaults
    pango-1.42.4               |       h049681c_0         528 KB  defaults
    graphite2-1.3.12           |       h23475e2_2         106 KB  defaults
    fribidi-1.0.5              |       h7b6447c_0         112 KB  defaults
    harfbuzz-1.8.8             |       hffaf4a1_0         863 KB  defaults
    ------------------------------------------------------------
                                           Total:         8.5 MB

The following NEW packages will be INSTALLED:

    fribidi:         1.0.5-h7b6447c_0  defaults
    graphite2:       1.3.12-h23475e2_2 defaults
    graphviz:        2.40.1-h21bd128_2 defaults
    harfbuzz:        1.8.8-hffaf4a1_0  defaults
    pango:           1.42.4-h049681c_0 defaults
    python-graphviz: 0.8.4-py36_1      defaults


Downloading and Extracting Packages
graphviz-2.40.1      | 6.9 MB    | ##################################### | 100% 
python-graphviz-0.8. | 27 KB     | ##################################### | 100% 
pango-1.42.4         | 528 KB    | ##################################### | 100% 
graphite2-1.3.12     | 106 KB    | ##################################### | 100% 
fribidi-1.0.5        | 112 KB    | ##################################### | 100% 
harfbuzz-1.8.8       | 863 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Read a simple provenance document

To work with a provenance document (a package of provenance statements or assertions), import the ProvDocument class from prov.model:


In [5]:
from prov.model import ProvDocument
import prov.model as pm

Set up two variables: filename, the URL of the example document, and basename, the stem used to name the output files.


In [6]:
filename = "https://raw.githubusercontent.com/oznome/jupyter-examples/master/prov/rdf/prov-ex1.ttl"
basename = "prov-ex1"

In [7]:
import urllib.request
url = filename
data = urllib.request.urlopen(url).read()
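The download step above has no timeout or error handling. A minimal sketch of a more defensive version follows; the helper name fetch_bytes and its 10-second default timeout are assumptions made for illustration, not part of the original notebook.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10.0):
    """Return the raw bytes found at `url`.

    Hypothetical helper: wraps urllib so that a failed download raises
    a RuntimeError that names the URL.
    """
    try:
        # The context manager closes the connection once the body is read.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        raise RuntimeError(f"could not fetch {url}: {exc}") from exc
```

In the notebook this could stand in for the cell above as data = fetch_bytes(filename).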

Use the prov library to deserialize the example document


In [8]:
# Parse the downloaded RDF (Turtle) content into a provenance document
d1 = pm.ProvDocument.deserialize(content=data, format="rdf")

Graphics export (PNG and PDF)

Besides textual serializations such as PROV-N, the document can be exported into a graphical representation with the help of Graphviz. Graphviz is provided as a software package in popular Linux distributions and can be downloaded for Windows and Mac.

Once you have Graphviz installed and the dot command available on your operating system's PATH, you can save the document we have so far into a PNG file as follows.


In [9]:
basename


Out[9]:
'prov-ex1'

In [10]:
from prov.dot import prov_to_dot
d = prov_to_dot(d1)

In [11]:
d.write_png(basename+'.png')

The above saves the PNG file as prov-ex1.png in your current folder. If you're running this tutorial in Jupyter Notebook, you can display it inline as well.


In [12]:
from IPython.display import Image
Image(filename=basename+'.png')


Out[12]:
[Image: the provenance graph rendered from prov-ex1.png]

In [13]:
# Or save to a PDF
d.write_pdf(basename + '.pdf')

Similarly, the above saves the document into a PDF file in your current working folder. Graphviz supports a wide range of raster and vector output formats, and you can export provenance documents created by the library to any of them. To find out which formats your version supports, run dot -T? at the command line.
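The export steps above can be sketched as one loop over several formats. The helper names graphviz_available and export_graph are hypothetical. The sketch assumes the graph object is the pydot graph returned by prov_to_dot and relies only on its write(path, format=...) method.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable is on the PATH."""
    return shutil.which("dot") is not None


def export_graph(dot_graph, basename, formats=("png", "svg", "pdf")):
    """Write `dot_graph` once per format and return the file names.

    Assumption: `dot_graph` behaves like a pydot.Dot object, so only
    its write(path, format=...) method is called here.
    """
    written = []
    for fmt in formats:
        path = f"{basename}.{fmt}"
        dot_graph.write(path, format=fmt)
        written.append(path)
    return written
```

In the notebook, a call such as export_graph(d, basename) would be expected to produce prov-ex1.png, prov-ex1.svg, and prov-ex1.pdf, provided graphviz_available() returns True.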

PROV-JSON export

PROV-JSON is a JSON representation of PROV, designed to make the elements of a PROV document easy to access and to work well with web applications. The format is natively supported by the library and is its default serialisation format.


In [14]:
print(d1.serialize(indent=2))


{
  "prefix": {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "http://example.org#",
    "ns1": "mailto:derek@"
  },
  "wasDerivedFrom": {
    "_:id1": {
      "prov:generatedEntity": "bar_chart",
      "prov:usedEntity": "aggregatedByRegions"
    }
  },
  "used": {
    "_:id2": {
      "prov:activity": "illustrationActivity",
      "prov:entity": "aggregatedByRegions"
    },
    "_:id11": {
      "prov:activity": "aggregationActivity",
      "prov:entity": "crimeData"
    },
    "_:id13": {
      "prov:activity": "aggregationActivity",
      "prov:entity": "nationalRegionsList"
    }
  },
  "wasGeneratedBy": {
    "_:id3": {
      "prov:entity": "bar_chart",
      "prov:activity": "illustrationActivity"
    },
    "_:id10": {
      "prov:entity": "aggregatedByRegions",
      "prov:activity": "aggregationActivity"
    }
  },
  "wasInformedBy": {
    "_:id4": {
      "prov:informed": "illustrationActivity",
      "prov:informant": "aggregationActivity"
    }
  },
  "wasAssociatedWith": {
    "_:id5": {
      "prov:activity": "aggregationActivity",
      "prov:agent": "derek"
    },
    "_:id14": {
      "prov:activity": "illustrationActivity",
      "prov:agent": "derek"
    }
  },
  "actedOnBehalfOf": {
    "_:id6": {
      "prov:delegate": "derek",
      "prov:responsible": "natonal_newspaper_inc"
    }
  },
  "wasAttributedTo": {
    "_:id7": {
      "prov:entity": "crimeData",
      "prov:agent": "government"
    },
    "_:id8": {
      "prov:entity": "aggregatedByRegions",
      "prov:agent": "derek"
    },
    "_:id9": {
      "prov:entity": "bar_chart",
      "prov:agent": "derek"
    },
    "_:id12": {
      "prov:entity": "nationalRegionsList",
      "prov:agent": "civil_action_group"
    }
  },
  "entity": {
    "crimeData": {},
    "nationalRegionsList": {},
    "aggregatedByRegions": {},
    "bar_chart": {}
  },
  "activity": {
    "aggregationActivity": {
      "prov:endTime": "2011-07-14T02:02:02+00:00",
      "prov:startTime": "2011-07-14T01:01:01+00:00"
    },
    "illustrationActivity": {}
  },
  "agent": {
    "national_newspaper_inc": {
      "prov:type": {
        "$": "foaf:Organization",
        "type": "prov:QUALIFIED_NAME"
      },
      "foaf:name": "National Newspaper, Inc."
    },
    "derek": {
      "prov:type": {
        "$": "foaf:Person",
        "type": "prov:QUALIFIED_NAME"
      },
      "foaf:givenName": "Derek",
      "foaf:mbox": {
        "$": "ns1:example.org",
        "type": "prov:QUALIFIED_NAME"
      }
    },
    "civil_action_group": {
      "prov:type": {
        "$": "foaf:Organization",
        "type": "prov:QUALIFIED_NAME"
      }
    },
    "government": {
      "prov:type": {
        "$": "foaf:Organization",
        "type": "prov:QUALIFIED_NAME"
      }
    }
  }
}

You can also serialize the document directly to a file by providing a filename (below) or a Python file object.


In [15]:
d1.serialize(basename + '.json')
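Because PROV-JSON is plain JSON, the exported document can be inspected with the standard json module. The sketch below groups the entities consumed by each activity. The function name inputs_by_activity is hypothetical, and the sample string is a hand-copied excerpt of the "used" records printed above rather than the full document.

```python
import json

# Excerpt of the "used" section from the PROV-JSON output shown earlier.
SAMPLE_PROV_JSON = """
{
  "used": {
    "_:id2": {"prov:activity": "illustrationActivity",
              "prov:entity": "aggregatedByRegions"},
    "_:id11": {"prov:activity": "aggregationActivity",
               "prov:entity": "crimeData"},
    "_:id13": {"prov:activity": "aggregationActivity",
               "prov:entity": "nationalRegionsList"}
  }
}
"""


def inputs_by_activity(prov_doc):
    """Map each activity name to a sorted list of the entities it used."""
    grouped = {}
    for record in prov_doc.get("used", {}).values():
        grouped.setdefault(record["prov:activity"], []).append(record["prov:entity"])
    return {activity: sorted(entities) for activity, entities in grouped.items()}


inputs = inputs_by_activity(json.loads(SAMPLE_PROV_JSON))
for activity, entities in sorted(inputs.items()):
    print(activity, "used", ", ".join(entities))
```

In the notebook, the same function could be applied to the live document with inputs_by_activity(json.loads(d1.serialize())).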
