Go down for licence and other metadata about this presentation




Unless stated otherwise all content is released under a [CC0]+BY licence. I'd appreciate it if you reference this but it is not necessary.


Using IPython for presentations

A short video showing how to use IPython for presentations

In [2]:
from IPython.display import YouTubeVideo


In [34]:
## PDF output using pandoc

import os

### Export this notebook as markdown
commandLineSyntax = 'ipython nbconvert --to markdown 20160202_Nottingham_GIServices_Lecture3_Beck_InteroperabilitySemanticsAndOpenData.ipynb'
print(commandLineSyntax)
os.system(commandLineSyntax)


### Export this notebook and the document header as PDF using Pandoc

commandLineSyntax = 'pandoc -f markdown -t latex -N -V geometry:margin=1in DocumentHeader.md 20160202_Nottingham_GIServices_Lecture3_Beck_InteroperabilitySemanticsAndOpenData.md --filter pandoc-citeproc --latex-engine=xelatex --toc -o interim.pdf'
os.system(commandLineSyntax)


### Remove cruft from the pdf (keep pages 1-5 and 18-end)

commandLineSyntax = 'pdftk interim.pdf cat 1-5 18-end output 20160202_Nottingham_GIServices_Lecture3_Beck_InteroperabilitySemanticsAndOpenData.pdf'
os.system(commandLineSyntax)


### Remove the interim pdf

commandLineSyntax = 'rm interim.pdf'
os.system(commandLineSyntax)


ipython nbconvert --to markdown 20160202_Nottingham_GIServices_Lecture3_Beck_InteroperabilitySemanticsAndOpenData.ipynb

The environment

In order to replicate my environment you need to know what I have installed!

Set up watermark

This describes the versions of software used during the creation of this document.

Please note that critical libraries can also be watermarked as follows:

%watermark -v -m -p numpy,scipy

In [3]:
%install_ext https://raw.githubusercontent.com/rasbt/python_reference/master/ipython_magic/watermark.py
%load_ext watermark

Installed watermark.py. To use it, type:
  %load_ext watermark
/home/arb/LocalInstalls/anaconda3/lib/python3.5/site-packages/IPython/core/magics/extension.py:47: UserWarning: %install_ext` is deprecated, please distribute your extension(s)as a python packages.
  "as a python packages.", UserWarning)

In [4]:
%watermark -a "Anthony Beck"  -d -v -m -g

Anthony Beck 24/01/2016 

CPython 3.5.1
IPython 4.0.2

compiler   : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)
system     : Linux
release    : 3.19.0-32-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit
Git hash   : 

In [5]:
#List of installed conda packages
!conda list

# packages in environment at /home/arb/LocalInstalls/anaconda3:
_license                  1.1                      py35_1    defaults
abstract-rendering        0.5.1               np110py35_0    defaults
accelerate                2.0                np110py35_p0    defaults
accelerate_cudalib        2.0                           0    defaults
affine                    1.2.0                    py35_0    https://conda.binstar.org/ioos/linux-64/affine-1.2.0-py35_0.tar.bz2
alabaster                 0.7.6                    py35_0    defaults
anaconda                  2.4.1               np110py35_0    defaults
anaconda-client           1.2.1                    py35_0    defaults
argcomplete               1.0.0                    py35_1    defaults
astropy                   1.1.1               np110py35_0    defaults
babel                     2.1.1                    py35_0    defaults
basemap                   1.0.7               np110py35_0    defaults
beautifulsoup4            4.4.1                    py35_0    defaults
bitarray                  0.8.1                    py35_0    defaults
blaze                     0.9.0                     <pip>
blaze-core                0.9.0                    py35_0    defaults
bokeh                     0.11.0                   py35_1    defaults
boto                      2.38.0                   py35_0    defaults
bottleneck                1.0.0               np110py35_0    defaults
cffi                      1.2.1                    py35_0    defaults
click                     4.1                      py35_0    defaults
click-plugins             1.0.2                    py35_0    https://conda.binstar.org/ioos/linux-64/click-plugins-1.0.2-py35_0.tar.bz2
cligj                     0.2.0                    py35_0    defaults
clyent                    1.2.0                    py35_0    defaults
colorama                  0.3.3                    py35_0    defaults
colorlover                0.2.1                     <pip>
conda                     3.19.0                   py35_0    defaults
conda-build               1.18.2                   py35_0    defaults
conda-env                 2.4.5                    py35_0    defaults
configobj                 5.0.6                    py35_0    defaults
cryptography              1.0.2                    py35_0    defaults
cudatoolkit               7.0                           1    defaults
cufflinks                 0.7.1                     <pip>
curl                      7.43.0                        1    defaults
cycler                    0.9.0                    py35_0    defaults
cython                    0.23.4                   py35_0    defaults
cytoolz                   0.7.4                    py35_0    defaults
datashape                 0.5.0                    py35_0    defaults
decorator                 4.0.6                    py35_0    defaults
descartes                 1.0.1                    py35_0    https://conda.binstar.org/ioos/linux-64/descartes-1.0.1-py35_0.tar.bz2
docutils                  0.12                     py35_0    defaults
dynd                      9b63882                   <pip>
dynd-python               0.7.0                    py35_0    defaults
et-xmlfile                1.0.1                     <pip>
et_xmlfile                1.0.1                    py35_0    defaults
fastcache                 1.0.2                    py35_0    defaults
fiona                     1.6.0               np110py35_0    defaults
flask                     0.10.1                   py35_1    defaults
fontconfig                2.11.1                        5    defaults
freetype                  2.5.5                         0    defaults
funcsigs                  0.4                      py35_0    defaults
gdal                      2.0.0                    py35_1    defaults
geos                      3.3.3                         0    defaults
greenlet                  0.4.9                    py35_0    defaults
h5py                      2.5.0               np110py35_4    defaults
hdf4                      4.2.11                        0    defaults
hdf5                                  2    defaults
html5lib                  0.9999999                 <pip>
idna                      2.0                      py35_0    defaults
iopro                     1.7.2              np110py35_p0    defaults
ipykernel                 4.2.2                    py35_0    defaults
ipython                   4.0.2                    py35_0    defaults
ipython-genutils          0.1.0                     <pip>
ipython-notebook          4.0.4                    py35_0    defaults
ipython-qtconsole         4.0.1                    py35_0    defaults
ipython_genutils          0.1.0                    py35_0    defaults
ipywidgets                4.1.1                    py35_0    defaults
itsdangerous              0.24                     py35_0    defaults
jbig                      2.1                           0    defaults
jdcal                     1.2                      py35_0    defaults
jedi                      0.9.0                    py35_0    defaults
jinja2                    2.8                      py35_0    defaults
jpeg                      8d                            0    defaults
jsonschema                2.4.0                    py35_0    defaults
jupyter                   1.0.0                    py35_1    defaults
jupyter-client            4.1.1                     <pip>
jupyter-console           4.1.0                     <pip>
jupyter-core              4.0.6                     <pip>
jupyter_client            4.1.1                    py35_0    defaults
jupyter_console           4.1.0                    py35_0    defaults
jupyter_core              4.0.6                    py35_0    defaults
kealib                    1.4.5                         0    defaults
krb5                      1.13.2                        0    defaults
libdynd                   0.7.0                         0    defaults
libffi                    3.0.13                        0    defaults
libgdal                   2.0.0                         1    defaults
libgfortran               1.0                           0    defaults
libnetcdf                              1    defaults
libpng                    1.6.17                        0    defaults
libsodium                 1.0.3                         0    defaults
libtiff                   4.0.6                         1    defaults
libxml2                   2.9.2                         0    defaults
libxslt                   1.1.28                        0    defaults
llvmlite                  0.8.0                    py35_0    defaults
lxml                      3.5.0                    py35_0    defaults
markupsafe                0.23                     py35_0    defaults
matplotlib                1.5.1               np110py35_0    defaults
mistune                   0.7.1                    py35_0    defaults
mkl                       11.1               np110py35_p1    defaults
mkl-rt                    11.1                         p0    defaults
mkl-service               1.1.0                   py35_p0    defaults
mock                      1.3.0                    py35_0    defaults
multipledispatch          0.4.8                    py35_0    defaults
nbconvert                 4.1.0                    py35_0    defaults
nbformat                  4.0.1                    py35_0    defaults
networkx                  1.10                     py35_0    defaults
nltk                      3.1                      py35_0    defaults
nose                      1.3.7                    py35_0    defaults
notebook                  4.1.0                    py35_0    defaults
numba                     0.22.1              np110py35_0    defaults
numbapro                  0.22.1                  py35_p0    defaults
numexpr                   2.4.4              np110py35_p0  [mkl]  defaults
numpy                     1.10.2                  py35_p0  [mkl]  defaults
odo                       0.4.0                    py35_0    defaults
openblas                  0.2.14                        3    defaults
openjpeg                  2.1.0                         0    https://conda.binstar.org/ioos/linux-64/openjpeg-2.1.0-0.tar.bz2
openpyxl                  2.3.2                    py35_0    defaults
openssl                   1.0.2e                        0    defaults
owslib                    0.10.3                   py35_0    https://conda.binstar.org/ioos/linux-64/owslib-0.10.3-py35_0.tar.bz2
pandas                    0.17.1              np110py35_0    defaults
patchelf                  0.8                           0    defaults
path.py                   8.1.2                    py35_1    defaults
patsy                     0.4.0               np110py35_0    defaults
pbr                       1.3.0                    py35_0    defaults
pep8                      1.6.2                    py35_0    defaults
pexpect                   3.3                      py35_0    defaults
pickleshare               0.5                      py35_0    defaults
pillow                    3.1.0                    py35_0    defaults
pip                       7.1.2                    py35_0    defaults
plotly                    1.9.5                     <pip>
ply                       3.8                      py35_0    defaults
postgresql                9.1.4                         0    defaults
proj.4                    4.9.1                         1    https://conda.binstar.org/ioos/linux-64/proj.4-4.9.1-1.tar.bz2
proj4                     4.9.1                         1    SciTools
psutil                    3.3.0                    py35_0    defaults
ptyprocess                0.5                      py35_0    defaults
py                        1.4.30                   py35_0    defaults
pyasn1                    0.1.9                    py35_0    defaults
pycosat                   0.6.1                    py35_0    defaults
pycparser                 2.14                     py35_0    defaults
pycrypto                  2.6.1                    py35_0    defaults
pycurl                           py35_2    defaults
pyepsg                    0.2.0                    py35_0    SciTools
pyflakes                  1.0.0                    py35_0    defaults
pygments                  2.0.2                    py35_0    defaults
pyodbc                    3.0.10                   py35_0    defaults
pyopenssl                 0.15.1                   py35_1    defaults
pyparsing                 2.0.3                    py35_0    defaults
pyproj                    1.9.4                    py35_1    defaults
pyqt                      4.11.4                   py35_1    defaults
pyshp                     1.2.3                    py35_0    https://conda.binstar.org/ioos/linux-64/pyshp-1.2.3-py35_0.tar.bz2
pytables                  3.2.2               np110py35_0    defaults
pytest                    2.8.1                    py35_0    defaults
python                    3.5.1                         0    defaults
python-dateutil           2.4.2                    py35_0    defaults
pytz                      2015.7                   py35_0    defaults
pyyaml                    3.11                     py35_1    defaults
pyzmq                     15.2.0                   py35_0    defaults
qt                        4.8.7                         1    defaults
qtconsole                 4.1.1                    py35_0    defaults
rasterio                  0.25.0              np110py35_0    defaults
readline                  6.2                           2    defaults
redis                     2.6.9                         0    defaults
redis-py                  2.10.3                   py35_0    defaults
requests                  2.9.1                    py35_0    defaults
rope                      0.9.4                    py35_1    defaults
rope-py3k-0.9.4           1                         <pip>
scikit-image              0.11.3              np110py35_0    defaults
scikit-learn              0.17               np110py35_p1  [mkl]  defaults
scipy                     0.16.1             np110py35_p0  [mkl]  defaults
seaborn                   0.6.0               np110py35_0    defaults
setuptools                19.2                     py35_0    defaults
shapely                   1.5.11                   py35_0    defaults
simplegeneric             0.8.1                    py35_0    defaults
simplejson                3.8.1                     <pip>
sip                       4.16.9                   py35_0    defaults
six                       1.10.0                   py35_0    defaults
snowballstemmer           1.2.0                    py35_0    defaults
snuggs                    1.3.1               np110py35_0    defaults
sockjs-tornado            1.0.1                    py35_0    defaults
sphinx                    1.3.1                    py35_0    defaults
sphinx-rtd-theme          0.1.7                     <pip>
sphinx_rtd_theme          0.1.7                    py35_0    defaults
spyder                    2.3.8                    py35_0    defaults
spyder-app                2.3.8                    py35_0    defaults
sqlalchemy                1.0.11                   py35_0    defaults
sqlite                    3.9.2                         0    defaults
statsmodels               0.6.1               np110py35_0    defaults
sympy                             py35_0    defaults
tables                    3.2.2                     <pip>
terminado                 0.5                      py35_1    defaults
theano                    0.7.0               np110py35_0    defaults
tk                        8.5.18                        0    defaults
toolz                     0.7.4                    py35_0    defaults
tornado                   4.3                      py35_0    defaults
traitlets                 4.1.0                    py35_0    defaults
ujson                     1.33                     py35_0    defaults
unicodecsv                0.14.1                   py35_0    defaults
unixodbc                  2.3.1                         1    defaults
util-linux                2.21                          0    defaults
werkzeug                  0.11.3                   py35_0    defaults
wheel                     0.26.0                   py35_1    defaults
xerces-c                  3.1.2                         0    defaults
xlrd                      0.9.4                    py35_0    defaults
xlsxwriter                0.8.2                    py35_0    defaults
xlwt                      1.0.0                    py35_0    defaults
xz                        5.0.5                         0    defaults
yaml                      0.1.6                         0    defaults
zeromq                    4.1.3                         0    defaults
zlib                      1.2.8                         0    defaults

In [8]:
#List of installed pip packages
!pip list

abstract-rendering (0.5.1)
accelerate (2.0.0)
affine (1.2.0)
alabaster (0.7.6)
anaconda-client (1.2.1)
argcomplete (1.0.0)
astropy (1.1.1)
Babel (2.1.1)
basemap (1.0.7)
beautifulsoup4 (4.4.1)
bitarray (0.8.1)
blaze (0.9.0)
bokeh (0.11.0)
boto (2.38.0)
Bottleneck (1.0.0)
cffi (1.2.1)
click (4.1)
click-plugins (1.0.2)
cligj (0.2.0)
clyent (1.2.0)
colorama (0.3.3)
colorlover (0.2.1)
conda (3.19.0)
conda-build (1.18.2)
conda-env (2.4.5)
configobj (5.0.6)
cryptography (0.9.3)
cufflinks (0.7.1)
cycler (0.9.0)
Cython (0.23.4)
cytoolz (0.7.4)
datashape (0.5.0)
decorator (4.0.6)
descartes (1.0.1)
docutils (0.12)
dynd (9b63882)
et-xmlfile (1.0.1)
fastcache (1.0.2)
Fiona (1.6.0)
Flask (0.10.1)
funcsigs (0.4)
GDAL (2.0.0)
greenlet (0.4.9)
h5py (2.5.0)
html5lib (0.9999999)
idna (2.0)
iopro (1.7.2)
ipykernel (4.2.2)
ipython (4.0.2)
ipython-genutils (0.1.0)
ipywidgets (4.1.1)
itsdangerous (0.24)
jdcal (1.2)
jedi (0.9.0)
Jinja2 (2.8)
jsonschema (2.4.0)
jupyter (1.0.0)
jupyter-client (4.1.1)
jupyter-console (4.1.0)
jupyter-core (4.0.6)
llvmlite (0.8.0)
lxml (3.5.0)
MarkupSafe (0.23)
matplotlib (1.5.1)
mistune (0.7.1)
mock (1.3.0)
multipledispatch (0.4.8)
nbconvert (4.1.0)
nbformat (4.0.1)
networkx (1.10)
nltk (3.1)
nose (1.3.7)
notebook (4.1.0)
numba (0.22.1)
numbapro (0.22.1)
numexpr (2.4.4)
numpy (1.10.2)
odo (0.4.0)
openpyxl (2.3.2)
OWSLib (0.10.3)
pandas (0.17.1)
path.py (0.0.0)
patsy (0.4.0)
pbr (1.3.0)
pep8 (1.6.2)
pexpect (3.3)
pickleshare (0.5)
Pillow (3.1.0)
pip (8.0.2)
plotly (1.9.5)
ply (3.8)
psutil (3.3.0)
ptyprocess (0.5)
py (1.4.30)
pyasn1 (0.1.9)
pycosat (0.6.1)
pycparser (2.14)
pycrypto (2.6.1)
pycurl (
pyepsg (0.2.0)
pyflakes (1.0.0)
Pygments (2.0.2)
pyodbc (3.0.10)
pyOpenSSL (0.15.1)
pyparsing (2.0.3)
pyproj (1.9.4)
pyshp (1.2.3)
pytest (2.8.1)
python-dateutil (2.4.2)
pytz (2015.7)
PyYAML (3.11)
pyzmq (15.2.0)
qtconsole (4.1.1)
rasterio (0.25.0)
redis (2.10.3)
requests (2.9.1)
rope-py3k (0.9.4.post1)
scikit-image (0.11.3)
scikit-learn (0.17)
scipy (0.16.1)
seaborn (0.6.0)
setuptools (19.2)
Shapely (1.5.11)
simplegeneric (0.8.1)
simplejson (3.8.1)
six (1.10.0)
snowballstemmer (1.2.0)
snuggs (1.3.1)
sockjs-tornado (1.0.1)
Sphinx (1.3.1)
sphinx-rtd-theme (0.1.7)
spyder (2.3.8)
SQLAlchemy (1.0.11)
statsmodels (0.6.1)
sympy (
tables (3.2.2)
terminado (0.5)
Theano (0.7.0)
toolz (0.7.4)
tornado (4.3)
traitlets (4.1.0)
ujson (1.33)
unicodecsv (0.14.1)
Werkzeug (0.11.3)
wheel (0.26.0)
xlrd (0.9.4)
XlsxWriter (0.8.2)
xlwt (1.0.0)

Running dynamic presentations

You need to install the RISE IPython library from Damián Avila for dynamic presentations

To convert and run this as a static presentation run the following command:

In [ ]:
# Notes don't show in a python3 environment

!ipython nbconvert 20160202_Nottingham_GIServices_Lecture3_Beck_InteroperabilitySemanticsAndOpenData.ipynb --to slides --post serve

To close this instance press Ctrl+C in the IPython notebook terminal console

Static presentations allow the presenter to see speaker's notes (use the 's' key)

If running dynamically run the scripts below

Pre load some useful libraries

In [13]:
# Future proof python 2
from __future__ import print_function  # For python3 print syntax
from __future__ import division

import IPython.core.display

# A function to collect user input - ipynb_input(varname='username', prompt='What is your username')

def ipynb_input(varname, prompt=''):
    """Prompt user for input and assign the string value to the given variable name."""
    js_code = ("""
        var value = prompt("{prompt}","");
        var py_code = "{varname} = '" + value + "'";
        IPython.notebook.kernel.execute(py_code);
    """).format(prompt=prompt, varname=varname)
    return IPython.core.display.Javascript(js_code)

%pylab inline

Populating the interactive namespace from numpy and matplotlib


About me

  • Honorary Research Fellow, University of Nottingham: orcid
  • Director, Geolytics Limited - A spatial data analytics consultancy

About this presentation


Contribution to GIScience learning outcomes

This presentation contributes to the following learning outcomes for this course.

  1. Knowledge and Understanding:
    • Appreciate the importance of standards for Geographic Information and the role of the Open Geospatial Consortium.
    • Understand the term 'interoperability'.
    • Appreciate the different models for database design.
    • Understand the basis of Linked Data.
    • Find UK government open data and understand some of the complexities in the use of this data.
    • Appreciate the data issues involved in managing large distributed databases, Location-Based Services and the emergence of real-time data gathering through the 'Sensor-Web'.
    • Understand the different models for creating international Spatial Data Infrastructures.
  2. Intellectual Skills:
    • Evaluate the role of standards and professional bodies in GIS.
    • Articulate the meaning and importance of interoperability, semantics and ontologies.
    • Assess the technical and organisational issues which come into play when attempting to design large distributed geographic databases aimed at supporting 'real-world' problems.

A potted history of mapping

In the beginning was the geoword

and the word was cartography


  • Cartography was king.
  • Static representations of spatial knowledge with the cartographer deciding what to represent.
    • Hence, maps are domain specific knowledge repositories for spatial data


And then there was data .........


Restrictive data


Disconnected data with different:

  • Standards
  • Quality
  • Databases
  • Semantics


Why is this an issue?

Over to you......

  • Decision Making
    • certainty
    • uncertainty
  • Co-ordination
  • Policy formation
  • Efficiencies
  • Best Practice


In [1]:
from IPython.display import YouTubeVideo



INSPIRE principles

  • Data should be collected only once and kept where it can be maintained most effectively
  • It should be possible to combine seamless spatial information from different sources across Europe and share it with many users and applications
  • It should be possible for information collected at one level/scale to be shared with all levels/scales; detailed for thorough investigations, general for strategic purposes
  • Geoinformation needed for good governance at all levels should be readily and transparently available
  • Easy to find what geoinformation is available, how it can be used to meet a particular need, and under which conditions it can be acquired and used


Making data interoperable and open



Interoperability is a property of a product or system, whose interfaces are completely understood, to work with other products or systems, present or future, without any restricted access or implementation.



Technical interoperability - levelling the field


Syntactic Heterogeneity

Syntactic heterogeneity refers to differences in data format. The same logical model can be represented in a range of different physical models (for example an ESRI shapefile or Geography Mark-up Language (GML)).

This mismatch between underlying data models implies that the same information could be represented differently in different organisations.

The most profound difference is in the storage paradigm:

  • relational,
  • object orientated or
  • hybrids.

@beck_uk_2008, @bishr_overcoming_1998
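To make the mismatch concrete, here is a minimal, library-free Python sketch of one logical model (a named point feature) serialized under two physical models: GeoJSON and a simplified GML-style encoding. The feature, its values and the element names are illustrative only.

```python
import json

# One logical model: a named point feature with x/y coordinates.
feature = {"name": "Nottingham", "x": -1.15, "y": 52.95}

# Physical model 1: GeoJSON (JSON syntax).
geojson = json.dumps({
    "type": "Feature",
    "properties": {"name": feature["name"]},
    "geometry": {"type": "Point", "coordinates": [feature["x"], feature["y"]]},
})

# Physical model 2: a simplified GML-style encoding (XML syntax).
gml = (
    '<gml:featureMember>'
    '<Site><name>{name}</name>'
    '<gml:Point><gml:pos>{x} {y}</gml:pos></gml:Point>'
    '</Site></gml:featureMember>'
).format(**feature)

print(geojson)
print(gml)
```

Both strings carry the same information; software that only parses one syntax cannot read the other without translation.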


Semantic Heterogeneity

Semantic heterogeneity refers to differences in naming conventions and conceptual groupings.

This can be subdivided into naming and cognitive heterogeneities.

  • Naming (synonym) mismatch arises when semantically identical data items are named differently.
  • Cognitive (homonym) mismatch arises when semantically different data items are named identically.
    • Cognitive semantics can be subtle, reflecting the domain of discourse.

@beck_uk_2008, @bishr_overcoming_1998
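A toy Python sketch of both mismatches, with hypothetical organisations and field names: `road_name` and `street` are synonyms for the same attribute, while the shared key `class` is a homonym that means different things in each record.

```python
# Two hypothetical organisations describing the same road feature.
org_a = {"road_name": "A52", "class": "trunk"}  # 'class' = road category
org_b = {"street": "A52", "class": "B"}         # 'class' = maintenance grade (homonym)

# Synonym mapping: different names, identical meaning.
SYNONYMS = {"road_name": "name", "street": "name"}

def harmonise(record, homonym_overrides=None):
    """Rename synonymous keys to a shared vocabulary; disambiguate homonyms explicitly."""
    overrides = homonym_overrides or {}
    out = {}
    for key, value in record.items():
        out[overrides.get(key, SYNONYMS.get(key, key))] = value
    return out

a = harmonise(org_a, {"class": "road_category"})
b = harmonise(org_b, {"class": "maintenance_grade"})
print(a)  # {'name': 'A52', 'road_category': 'trunk'}
print(b)  # {'name': 'A52', 'maintenance_grade': 'B'}
```

Note that the synonym table can be automated, but the homonym overrides need domain knowledge: nothing in the data itself says the two `class` fields differ in meaning.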


Schematic Heterogeneity

Schematic heterogeneity refers to differences in data model between organisations modelling the same concepts.

This reflects each organisation’s abstracted view of their business and physical assets. Hence, different hierarchical and classification concepts are adopted by each organisation to refer to identical or similar real world objects.

@beck_uk_2008, @bishr_overcoming_1998
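A hedged sketch of the same idea in Python: two hypothetical organisations model the same water pipe, one as a flat record with a combined classification code, the other as a nested record with separate fields. The translation function shows that crossing schemas needs explicit mapping logic, not just renaming.

```python
# Org A (hypothetical): flat record with a combined classification code.
pipe_a = {"id": "P1", "type_code": "WATER-MAIN-STEEL", "dia_mm": 300}

def a_to_b(record):
    """Translate org A's flat schema into org B's nested schema (illustrative only)."""
    service, role, material = record["type_code"].split("-")
    return {
        "asset_id": record["id"],
        "classification": {"service": service.lower(), "role": role.lower()},
        "attributes": {"material": material.lower(),
                       "diameter_m": record["dia_mm"] / 1000},  # unit change too
    }

print(a_to_b(pipe_a))
```

Even in this toy case the mapping embeds decisions (splitting a code, changing units) that reflect each organisation's abstracted view of its assets.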


The role of the OGC (a geospatial standards body)

  • To serve as a global forum for the development, promotion and harmonization of open and freely available geospatial standards
  • To achieve the full societal, economic and scientific benefits of integrating electronic location resources into commercial and institutional processes worldwide.


The role of the OGC (a geospatial standards body)

OGC’s Open Standards are:

  • Freely and publicly available
  • Non discriminatory
  • No license fees
  • Vendor neutral
  • Data neutral
  • Adopted in a formal, member based consensus process

OGC’s Open Standards are submitted to other industry and National Standards Development Organisations in the vertical area and to global organisations like ISO for standard branding.


OGC Technologies


The main OGC standards


Other OGC standards

In [2]:
from IPython.display import IFrame
IFrame('http://www.opengeospatial.org/standards', width=1000, height=700)



Interoperability in action


What did technical interoperability facilitate

From Map to Model: the changing paradigm of map creation, from cartography to data-driven visualization



The world was a happy place.......

Our data was interoperable!


Then ....... along came open data


The Open landscape integrates formal and informal data



Background - originally a grass roots (community) movement..

Open access to knowledge gained significant momentum with the increased uptake of the World Wide Web. This is particularly seen in initiatives like Wikipedia (established in 2001) and Open Knowledge (formerly the Open Knowledge Foundation: established in 2004). Within the Geo community, OpenStreetMap (also established in 2004) and the Open Source Geospatial Foundation (OSGeo, established in 2006) are key initiatives that promote accessible data and software resources respectively.

Critical to this is that these were grass roots (community) movements that have proven to be highly disruptive to incumbent data providers, practices and policies.


Open in government

The impact of these grass roots movements is seen in Open Data (dot) gov, pioneered by leaders such as Tim Berners-Lee and Nigel Shadbolt.

The Shakespeare review [-@shakespeare_shakespeare_2013] indicates that the amount of government Open Data, at least in the UK, is only going to grow. Open data has the potential to trigger a revolution in how governments think about providing services to citizens and how they measure their success: this produces societal impact. This will require an understanding of citizen needs, behaviours, and mental models, and how to use data to improve services.


Valuing Open Data

A McKinsey Global Institute report examines the economic impact of Open Data [@mckinsey_open_2013] and estimates that globally open data could be worth a minimum of $3 trillion annually.


Open in academia

Open inquiry is at the heart of the scientific enterprise..... Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge.

Science as an open enterprise [@royal_society_science_2012 p. 7].

Science is based on building on, reusing and openly criticising the published body of scientific knowledge.

For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.

The Panton Principles [@murray-rust_panton_2010] underpin Open Science.

The Royal Society’s report Science as an open enterprise [-@royal_society_science_2012] identifies how 21^st^ century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society.

The Panton Principles pre-cursed this call with a clarion call to the academic community to open their data and start to conduct open science.

This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).

The next generation open data in academia

Zenodo is a DATA REPOSITORY which offers:

  • accreditation
  • different licences
  • different exposure (private (closed), public (open) and embargoed (timestamped))
  • DOIs
  • is free at the point of use
  • is likely to be around for a long time
    • supported by Horizon 2020 and delivered by CERN


The underlying rationale of Open Data is:

  • unfettered access to large amounts of ‘raw’ data
    • enables patterns of re-use and knowledge creation that were previously impossible.
    • improves transparency and efficiency
    • encourages innovative service delivery
  • introduces a range of data-mining and visualisation challenges,
    • which require multi-disciplinary collaboration across domains
    • catalyst to research and industry
  • supports the generation of new products, services and markets
  • the prize for succeeding is improved knowledge-led policy and practice that transforms
    • communities,
    • practitioners,
    • science and
    • society


Free and Open Source Software in Geo

In [3]:
from IPython.display import IFrame
IFrame('http://www.osgeo.org/', width=1200, height=700)



So...... we have access to lots of data and software

  • Formal and Informal
  • Open and Proprietary

Where are these new data products?

Data, data everywhere - but where are the new derivatives and services?



The defense domain is a bit more explicit......

As defined by DoD policy, interoperability is the ability of systems, units, or forces to provide data, information, material, and services to, and accept the same from, other systems, units, or forces; and to use the data, information, material, and services so exchanged to enable them to operate effectively together. IT and NSS interoperability includes both the technical exchange of information and the end-to-end operational effectiveness of that exchanged information as required for mission accomplishment. Interoperability is more than just information exchange; it includes systems, processes, procedures, organizations, and missions over the life cycle and must be balanced with information assurance.



Non-technical interoperability issues?




Non-technical interoperability

Issues surrounding non-technical interoperability include:

  • Policy interoperability
  • Licence interoperability
  • Legal interoperability
  • Social interoperability

We will focus on licence interoperability


Policy Interoperability

The relationship between:

  • Individuals
  • Organisations
  • Countries

Policy determines what, who and how different content can be accessed.

In addition to other elements the policy statements determine:

  • Authentication
  • Authorization
  • Audit

See @innocenti_towards_2011 for more details


Social (or human) Interoperability

Social interoperability is concerned with the environment in which tools operate: business processes and the human processes around them.

  • Tools are used by people
  • The social dimension of operational use is underestimated (it's difficult)
  • People form complex inclusive and exclusive networks
    • These operate at many scales

US Department of Defense researchers have advocated the development of Policy, Standards, and Operational Procedures for:

  • forming human networks
  • human to human communications
  • organization to organization communications
  • human system integration
  • information sharing across disparate domains:
    • DoD-Coalition-Interagency-intercommunity


Legal Interoperability

Legal interoperability addresses the process of making legal rules cooperate across jurisdictions, on different subsidiary levels within a single state or between two or more states.

(@weber_legal_2014, p. 6)

The Research Data Alliance state that legal interoperability occurs among multiple datasets when:

  • use conditions are clearly and readily determinable for each of the datasets,
  • the legal use conditions imposed on each dataset allow creation and use of combined or derivative products, and
  • users may legally access and use each dataset without seeking authorization from data rights holders on a case-by-case basis, assuming that the accumulated conditions of use for each and all of the datasets are met.

Legal interoperability also implies that the search for or tracking of licenses or other legal instruments and their compatibility with other legal conditions will occur in online environments.


Licence Interoperability

A specific form of legal interoperability


Example of applying the semantic web to licence interoperability

There is a multitude of formal and informal data.


What is a licence?

Wikipedia state:

A license may be granted by a party ("licensor") to another party ("licensee") as an element of an agreement between those parties.

A shorthand definition of a license is "an authorization (by the licensor) to use the licensed material (by the licensee)."

Each of these data objects can be licenced in a different way. The figure shows some of the licences described by the RDFLicence ontology.


In [ ]:
### Render the Formal Concept Analysis graph with Graphviz dot
commandLineSyntax = 'dot -Tpng FCA_ConceptAnalysis.dot > FCA_ConceptAnalysis.png'
print (commandLineSyntax)
commandLineSyntax = 'dot -Tsvg FCA_ConceptAnalysis.dot > FCA_ConceptAnalysis.svg'
print (commandLineSyntax)


Concepts (derived from Formal Concept Analysis) surrounding licences


Two lead organisations, Creative Commons (CC) and Open Data Commons (ODC), have developed legal frameworks for content licensing:

Until the release of CC version 4, published in November 2013, the CC licence did not cover data. Between them, CC and ODC licences can cover all forms of digital work.

  • There are many other licence types
  • Many are bespoke
    • Bespoke licences are difficult to manage
    • Many legacy datasets have bespoke licences

I'll describe CC in more detail


Creative Commons Zero

Creative Commons Zero (CC0) is essentially a public domain dedication which allows:

  • Reproduction
  • Distribution
  • Derivations


Constraints beyond CC0

The following clauses add constraints to the CC0 baseline:

  • Permissions
    • ND – No derivatives: the licensee cannot derive new content from the resource.
  • Requirements
    • BY – By attribution: the licensee must attribute the source.
    • SA – Share-alike: if the licensee adapts the resource, it must be released under the same licence.
  • Prohibitions
    • NC – Non commercial: the licensee must not use the work commercially without prior approval.

CC license combinations

License Reproduction Distribution Derivation BY SA NC

Table: Creative Commons license combinations
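The six standard CC licences arise from combining these clauses. A minimal, illustrative sketch (not an official CC tool) that enumerates the valid combinations, noting that SA and ND are mutually exclusive:

```python
from itertools import combinations

# The three optional clauses that can be layered on top of BY.
# SA (share-alike) and ND (no derivatives) are mutually exclusive:
# you cannot require share-alike terms on derivatives that are forbidden.
OPTIONAL_CLAUSES = ["NC", "ND", "SA"]

def cc_combinations():
    licences = []
    for r in range(len(OPTIONAL_CLAUSES) + 1):
        for combo in combinations(OPTIONAL_CLAUSES, r):
            if "SA" in combo and "ND" in combo:
                continue  # incompatible clause pair
            licences.append("CC-BY" + "".join("-" + c for c in combo))
    return licences

print(cc_combinations())
# the six standard licences: CC-BY, CC-BY-NC, CC-BY-ND, CC-BY-SA,
# CC-BY-NC-ND, CC-BY-NC-SA
```

This is why the combination table has exactly six rows rather than the eight a free combination of three clauses would give.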


Why are licenses important?

  • They tell you what you can and can't do with 'stuff'
  • Very significant when multiple datasets are combined
    • It then becomes an issue of license compatibility


Which is important when we mash up data

Certain licences when combined:

  • Are incompatible
    • Creating data islands
  • Inhibit commercial exploitation (NC)
  • Force the adoption of certain licences
    • If you want people to commercially exploit your stuff don't incorporate CC-BY-NC-SA data!
  • Stops the derivation of new works



A conceptual licence processing workflow. The licence processing service analyses the incoming licence metadata and determines if the data can be legally integrated and any resulting licence implications for the derived product.


A rudimentary logic example

Data1 hasDerivedContentIn NewThing.

Data1 hasLicence a cc-by-sa.

What hasLicence a cc-by-sa? #reason here

If X hasDerivedContentIn Y and hasLicence Z then Y hasLicence Z. #reason here

Data2 hasDerivedContentIn NewThing.

Data2 hasLicence a cc-by-nc-sa.

What hasLicence a cc-by-nc-sa? #reason here

Nothing hasLicence a cc-by-nc-sa and hasLicence a cc-by-sa. #reason here
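Before reaching for a full reasoner, the rules above can be sketched in plain Python. This is an illustrative toy (the fact and rule names mirror the statements above), not the Protege workflow shown next:

```python
# Facts: Data1 and Data2 both have derived content in NewThing,
# under CC-BY-SA and CC-BY-NC-SA respectively.
derived_into = {"Data1": "NewThing", "Data2": "NewThing"}
licences = {"Data1": {"cc-by-sa"}, "Data2": {"cc-by-nc-sa"}}

# Rule: if X hasDerivedContentIn Y and X hasLicence Z then Y hasLicence Z.
inferred = {}
for source, target in derived_into.items():
    inferred.setdefault(target, set()).update(licences[source])

# Rule: nothing may carry both cc-by-sa and cc-by-nc-sa, since each
# share-alike clause demands the derivative adopt its own terms.
for thing, ls in inferred.items():
    if {"cc-by-sa", "cc-by-nc-sa"} <= ls:
        print(thing, "has a licence conflict:", sorted(ls))
```

Running this flags NewThing as carrying two incompatible share-alike licences, which is exactly the contradiction a reasoner would surface.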

And processing this within the Protege reasoning environment

In [ ]:
from IPython.display import YouTubeVideo


Here's something I prepared earlier

A live presentation (for those who weren't at the event).....

In [12]:
from IPython.display import YouTubeVideo



A more robust logic

  • Would need to decouple licence incompatibility from licence name into licence clause (see table below)
  • Deal with all licence types
  • Provide recommendations based on desired derivative licence type
  • Link this through to the type of process in a workflow:
    • data derivation is, from a licence position, very different to contextual display
License       Reproduction  Distribution  Derivation  BY  SA  NC
OGL 2.0       X             X             X           X
OS OpenData   X             X             X           X   ?

Table: UK open data licence clause combinations


OGC and Licence interoperability

  • The geo business landscape is increasingly based on integrating heterogeneous data to develop new products
  • Licence heterogeneity is a barrier to data integration and interoperability
  • A licence calculus can help resolve and identify heterogeneities leading to
    • legal compliance
    • confidence
  • Use of standards and collaboration with organisations is crucial
  • Failure to do this could lead to breaches in data licenses
    • and we all know where that puts us........


Linked data and the Semantic Web

The web of Documents

  • a global filesystem
  • Designed for human consumption
  • Primary objects are documents
  • Expresses links between documents (or sub-parts of)
  • Degree of structure in objects is fairly low
  • Semantics of content and links is implicit

The web of Linked Data

  • a global database
  • Designed for machines first, humans later
  • Primary objects are things (or descriptions of things)
  • Expresses links between things
  • Degree of structure in (descriptions of) things is high
  • Semantics of content and links explicit


Linked Data

a way of publishing data on the Web that:

  • encourages reuse
  • reduces redundancy
  • maximises its (real and potential) inter-connectedness
  • enables network effects to add value to data

Why publish Linked Data

  • Ease of discovery
  • Ease of consumption
    • standards-based data sharing
  • Reduced redundancy
  • Added value
    • build ecosystems around your data/content


Linked Data Basics

Four rules for Linked Data from Tim Berners-Lee

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.


The Resource Description Framework (RDF) data model

RDF stores data as subject, predicate, object triples.

This is a graph model that consists of nodes (subjects and objects) and edges (predicates).
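The triple model can be sketched in plain Python: a graph is just a collection of (subject, predicate, object) tuples, queried by pattern matching. The place names are borrowed from the geo reasoning example later in this talk; `None` stands in for an unbound variable:

```python
# A tiny in-memory triple store: each triple is a 3-tuple.
triples = [
    ("Leeds", "isIn", "Yorkshire"),
    ("Yorkshire", "isIn", "UnitedKingdom"),
    ("Leeds", "type", "City"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(p="isIn"))   # every isIn edge in the graph
print(match(s="Leeds"))  # everything asserted about Leeds
```

Pattern matching over triples like this is, in miniature, what a SPARQL query engine does.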


Data expressed as RDF


Data expressed as RDF Linked Data


RDF notation

RDF can be represented in different ways, each of which is interoperable. For example:

  • RDF/XML,
  • Notation-3 (N3),
  • Turtle (.ttl),
  • N-Triples,
  • RDFa,

Each represents subject, predicate, object triples in a different way.


One step beyond.... Linked Open Data

Is your Linked Open Data 5 star?

★     Available on the web (whatever format) but with an open licence, to be Open Data
★★    Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★   As (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★  All the above plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above plus: link your data to other people's data to provide context


The Supporting Semantic Web Stack


It's about re-use


The glue that joins concepts together.

A concept shared is a link gained. Re-using concepts makes it easier to understand what your data means and where and how it should be re-used.

In [14]:
from IPython.display import IFrame
IFrame('http://lov.okfn.org/dataset/lov/', width=1000, height=700)



It's about re-use


An ontology is a shared formal explicit specification of a conceptualisation



  • The term originated in philosophy
    • which deals with the nature and organization of reality
  • It tries to answer the questions:
    • What is being?
    • What are the features common to all beings?
    • How should things be classified?


An ontology is a shared formal explicit specification of a conceptualisation

After Agarwal (@agarwal_ontological_2005):

  • conceptualisation is identifying relevant abstracted concepts of a phenomenon suited to a specific domain
  • explicit means that the concepts are explicitly defined
  • formal refers to the fact that the ontology should be machine-readable
  • shared refers to the notion that an ontology captures consensual knowledge


Ontology Example

  • A ‘Carnivore’ is a concept whose members are exactly those animals who eat only meat
  • A ‘Bear’ is a concept whose members are a kind of ‘Carnivore’
  • A ‘Cub’ is a concept whose members are exactly those ‘Bear’ whose age is less than one year
  • A Panda is an individual of a ‘Bear’

We can use these concepts to infer new information from facts.

For example: from the fact 'Ching Ching' is a newborn Panda we know:

'Ching Ching' is a Panda.
'Ching Ching' is a newborn.

We can infer:

'Ching Ching' is a Bear.
'Ching Ching' is a Carnivore.  ????
'Ching Ching' eats only meat.  ????

If we had other logic that told us that 'newborn' is the same as saying less than one year then we can also infer

'Ching Ching' is a Cub.

In an ontology/RDF you can say anything about anything. Whilst ‘Carnivore’ is a generally useful concept for bears, it is not accurate when considering pandas (which eat bamboo). The domain of application is clearly important.
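The Ching Ching inference chain can be sketched with hand-written rules. This is an illustrative toy rather than a description-logic reasoner like the one in Protege, and it treats Panda as a kind of Bear for simplicity:

```python
# Subclass axioms: Cub and Panda are kinds of Bear; Bear is a Carnivore.
subclass_of = {"Cub": "Bear", "Panda": "Bear", "Bear": "Carnivore"}

facts = {("ChingChing", "type", "Panda"), ("ChingChing", "age", "newborn")}

def classes_of(individual, facts):
    """Collect asserted types, then follow the subclass chain upwards."""
    found = {o for s, p, o in facts if s == individual and p == "type"}
    frontier = set(found)
    while frontier:
        parents = {subclass_of[c] for c in frontier if c in subclass_of}
        frontier = parents - found
        found |= frontier
    return found

# Rule: 'newborn' means 'age less than one year', and a Bear under
# one year old is a Cub.
if ("ChingChing", "age", "newborn") in facts and "Bear" in classes_of("ChingChing", facts):
    facts.add(("ChingChing", "type", "Cub"))

print(sorted(classes_of("ChingChing", facts)))
```

Note that the sketch happily concludes Ching Ching is a Carnivore, which is exactly the dubious inference flagged by the '????' above.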


SPARQL the SQL of the semantic web

Find me the capital of all countries in Africa:

PREFIX abc: <nul://sparql/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}

There is a thing ('x') against which the following concepts exist:

  • 'abc:cityname' (the name of a city: stored in the variable 'capital')
  • 'abc:isCapitalOf' (the concept for which the city is capital: stored in the variable 'y')

The 'concept for which the city is capital' (stored in variable 'y') must also have the following concepts:

  • 'abc:countryname' (the name of a country: stored in the variable 'country')
  • 'abc:isInContinent' abc:Africa (isInContinent of the individual Africa)


GeoSPARQL the SQL of the spatial semantic web

An OGC standard

SELECT ?f
WHERE { ?f my:hasPointGeometry ?fGeom .
        ?fGeom ogc:asWKT ?fWKT .
        FILTER (ogcf:relate(?fWKT,
          "Polygon ((-83.5 34.0, -83.5 34.3, -83.1 34.3,
                     -83.1 34.0, -83.5 34.0))"^^ogc:WKTLiteral,
          ogc:within))
}

In [19]:
from IPython.display import IFrame
IFrame('http://www.opengeospatial.org/projects/groups/geosparqlswg', width=1000, height=700)


Linked Data and Geo


GeoSPARQL employs spatial calculus


Querying Linked Data in the wild

The Ordnance Survey

A URI for every place in the UK

In [22]:
from IPython.display import IFrame
IFrame('http://data.ordnancesurvey.co.uk/doc/50kGazetteer/177276', width=1000, height=700)



In [24]:
from IPython.display import IFrame
IFrame('http://data.ordnancesurvey.co.uk/id/postcodeunit/NG72QL', width=1000, height=700)



In [20]:
from IPython.display import IFrame
IFrame('http://data.ordnancesurvey.co.uk/', width=1000, height=700)



In [21]:
from IPython.display import IFrame
IFrame('http://data.ordnancesurvey.co.uk/ontology/', width=1000, height=700)



In [17]:
from IPython.display import IFrame
IFrame('http://data.ordnancesurvey.co.uk/datasets/code-point-open/explorer/sparql', width=1000, height=700)



Open Street Map

In [25]:
from IPython.display import IFrame
IFrame('http://linkedgeodata.org/About', width=1000, height=700)



In [26]:
from IPython.display import IFrame
IFrame('http://browser.linkedgeodata.org/', width=1000, height=700)




In [27]:
from IPython.display import IFrame
IFrame('http://www.geonames.org/ontology/documentation.html', width=1000, height=700)



In [28]:
from IPython.display import IFrame
IFrame('http://www.geonames.org/maps/google_52.94_358.8.html', width=1000, height=700)



In [29]:
from IPython.display import IFrame
IFrame('http://lov.okfn.org/dataset/lov/vocabs/gn', width=1000, height=700)



Geo Vocabularies

In [30]:
from IPython.display import IFrame
IFrame('http://lov.okfn.org/dataset/lov/vocabs/?q=geo+space+address+geonames+os+spatial', width=1000, height=700)




  • Technical interoperability is only one part of the problem
  • Open data will become increasingly important as governments and other groups release resources under clear licences
    • Licences are a barrier to re-use
  • Data shows its true value when combined with other data sources – linked data creates an opportunity
  • Usability: common data model and reference of common URIs (for example, postcodes) allows for easy data aggregation and integration.
  • Shift in focus from cartography and geometries to ‘things’ and the relationships between them.
  • Spatial no longer special – part of the bigger information world....
  • Location is a very important information hub and provides a key underpinning reference framework which brings many datasets together and provides important context.


Geo reasoning example (if time)

Geo example:

Leeds is a city.

Yorkshire is a county.

Sheffield is a city.

Lancaster is a city.

Lancashire is a county.

Lancaster has a port.

What is Leeds?

Leeds isIn Yorkshire.

Sheffield isIn Yorkshire.

Lancaster isIn Lancashire.

What isIn Yorkshire?

If X isIn Y then Y contains X.

What contains Leeds?

Yorkshire borders Lancashire.

If X borders Y then Y borders X.

What borders Lancashire?

Yorkshire isIn UnitedKingdom.

Lancashire isIn UnitedKingdom.


If X isIn Y and Y isIn Z then X isIn Z.

If X contains Y and Y contains Z then X contains Z.

Using a proper 'spatiallyWithin' relation

Leeds is a city.

Yorkshire is a county.

Sheffield is a city.

Lancaster is a city.

Lancashire is a county.

Lancaster has a port.

What is Leeds?

Leeds is spatiallyWithin Yorkshire.

Sheffield is spatiallyWithin Yorkshire.

Lancaster is spatiallyWithin Lancashire.

What is spatiallyWithin Yorkshire?

If X is spatiallyWithin Y then Y spatiallyContains X.

What spatiallyContains Leeds?

Yorkshire borders Lancashire.

If X borders Y then Y borders X.

What borders Lancashire?

Yorkshire is spatiallyWithin UnitedKingdom.

Lancashire is spatiallyWithin UnitedKingdom.


If X is spatiallyWithin Y and Y is spatiallyWithin Z then X is spatiallyWithin Z.

If X spatiallyContains Y and Y spatiallyContains Z then X spatiallyContains Z.

What is spatiallyWithin UnitedKingdom?
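The transitivity rule above can be sketched as a fixed-point closure over the stated facts (an illustrative toy, not a production reasoner):

```python
# The spatiallyWithin facts asserted above.
within = {
    ("Leeds", "Yorkshire"), ("Sheffield", "Yorkshire"),
    ("Lancaster", "Lancashire"),
    ("Yorkshire", "UnitedKingdom"), ("Lancashire", "UnitedKingdom"),
}

def transitive_closure(pairs):
    """Apply 'X within Y and Y within Z implies X within Z' until no new facts appear."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

closed = transitive_closure(within)
print(sorted(a for a, b in closed if b == "UnitedKingdom"))
# Lancashire, Lancaster, Leeds, Sheffield, Yorkshire
```

The cities now answer the query "What is spatiallyWithin UnitedKingdom?" even though no such fact was asserted directly.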

Adding more......

Pudsey is spatiallyWithin Leeds.

Kirkstall is spatiallyWithin Leeds.

Meanwood is spatiallyWithin Leeds.

Roundhay is spatiallyWithin Leeds.

Scarcroft is spatiallyWithin Leeds.

and more

UnitedKingdom isPartOf Europe.

UnitedKingdom is a country.

If X isPartOf Y and X spatiallyContains Z then Z isPartOf Y.

What isPartOf Europe?

and more


If X spatiallyContains Y and X is a city then Y is a place and Y is a cityPart.

Every city is a place.

What is a place?


and more

UK isPartOf Europe.

UK is sameAs UnitedKingdom.

If X has a port then X borders Water.

What borders Water?


In terms of discussion I'm interested in how these issues affect you.........