Open Context: Data Publication for Cultural Heritage and Field Research: "Open Context reviews, edits, and publishes archaeological research data and archives data with university-backed repositories, including the California Digital Library."
I often think of Open Context as an exemplar, a model from the future: academic data archiving done right. Some cool features (from About Open Context: Technologies):
We want to provide for the long-term citability and availability of this data.
Also contextualization.
What was the excellent presentation/paper Eric Kansa gave us at WwOD13?
There are so many possibilities here; we can work iteratively with Eric Kansa to develop a good project without having it all figured out upfront.
Eric has mentioned to me the idea of time span facets.
How can we reproduce the data represented by the map on Open Context?
How can we use the API to get a list of projects?
http://opencontext.org/sets/.json returns a JSON representation of items, but http://opencontext.org/projects/.json doesn't return a list of all projects. Answer:
Let's use a specific project to focus on:
The API documentation: http://opencontext.org/about/services
In [1]:
# use an example from the API documentation to confirm that we can get a JSON representation from the API
import requests
json_url = "http://opencontext.org/sets/Palestinian+Authority/Tell+en-Nasbeh/.json?proj=Bade+Museum"
r = requests.get(json_url)
# what are the top-level keys of the response?
r.json().keys()
Out[1]:
In [2]:
# Now let's apply the same logic to the Asian Stoneware Jars project
json_url = "http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars"
request = requests.get(json_url)
request_json = request.json()
results = request_json['results']
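Rather than hand-encoding spaces as `+`, the query string can be built with the standard library. A minimal sketch, assuming the `proj` parameter behaves as in the URLs above; the helper name `sets_json_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://opencontext.org/sets/.json"

def sets_json_url(**params):
    """Build a sets/.json URL (hypothetical helper); urlencode turns spaces into '+'."""
    return BASE + "?" + urlencode(params) if params else BASE

print(sets_json_url(proj="Asian Stoneware Jars"))
# http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars
```

Passing `params={"proj": "Asian Stoneware Jars"}` to `requests.get` should produce the same query string without building the URL by hand.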
In [3]:
request_json.keys()
Out[3]:
In [4]:
# the number of results matches what the human-facing web UI shows
request_json['numFound']
Out[4]:
In [5]:
# we get back the first page of 10
len(results)
Out[5]:
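Since each response holds one page of 10 items, walking the whole result set takes ceil(numFound / 10) requests. A small sketch of that arithmetic; the page size of 10 is what we observe here, not a documented constant, and `pages_needed` is a hypothetical helper.

```python
def pages_needed(num_found, page_size=10):
    """Number of result pages needed to cover num_found items (integer ceiling)."""
    if num_found <= 0:
        return 0
    return (num_found + page_size - 1) // page_size

print(pages_needed(25))  # 3
print(pages_needed(10))  # 1
```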
In [6]:
results[0]
Out[6]:
In [7]:
# list the URLs for the thumbnails
[result.get('thumbIcon') for result in results]
Out[7]:
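Some items may have no thumbnail, in which case `result.get('thumbIcon')` yields `None`. A sketch that keeps only items with a usable thumbnail URL; the sample records are invented and only mimic the `label`/`thumbIcon` keys used in this notebook.

```python
def with_thumbnails(items):
    """Return (label, thumbIcon) pairs for items that actually have a thumbnail URL."""
    return [(item.get("label"), item["thumbIcon"])
            for item in items
            if item.get("thumbIcon")]

sample = [
    {"label": "Jar A", "thumbIcon": "http://example.org/a.jpg"},
    {"label": "Jar B"},                   # no thumbnail key at all
    {"label": "Jar C", "thumbIcon": ""},  # empty string is skipped too
]
print(with_thumbnails(sample))
# [('Jar A', 'http://example.org/a.jpg')]
```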
In [8]:
# do a quick display
from IPython.display import HTML
from jinja2 import Template
CSS = """
<style>
.wrap img {
margin-left: 0px;
margin-right: 0px;
display: inline-block;
}
</style>
"""
IMAGES_TEMPLATE = CSS + """
<div class="wrap">
{% for item in items %}<img title="{{item.label}}" src="{{item.thumbIcon}}"/>{% endfor %}
</div>
"""
template = Template(IMAGES_TEMPLATE)
HTML(template.render(items=results))
Out[8]:
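The Jinja2 `Template` above does not autoescape by default, so a label containing a quote or `<` could break the `title` attribute. A stdlib-only sketch of the same thumbnail strip using `html.escape`; the function name `thumbnail_strip` is hypothetical.

```python
from html import escape

def thumbnail_strip(items):
    """Build the same inline-image strip as IMAGES_TEMPLATE, escaping attribute values."""
    imgs = "".join(
        '<img title="{}" src="{}"/>'.format(escape(item.get("label") or ""),
                                             escape(item.get("thumbIcon") or ""))
        for item in items
    )
    return '<div class="wrap">{}</div>'.format(imgs)

print(thumbnail_strip([{"label": 'Jar "A" <1>', "thumbIcon": "http://example.org/a.jpg"}]))
# <div class="wrap"><img title="Jar &quot;A&quot; &lt;1&gt;" src="http://example.org/a.jpg"/></div>
```

In the notebook, `HTML(CSS + thumbnail_strip(results))` would display it with the same styling.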
In [9]:
import requests
url = "http://opencontext.org/sets/.json"
r = requests.get(url)
r.json().keys()
Out[9]:
In [10]:
r.json()['numFound']
Out[10]:
In [11]:
r.json()['paging']['prev']
Out[11]:
In [12]:
# write a generator for all items in http://opencontext.org/sets/.json
import requests

def opencontext_items(url="http://opencontext.org/sets/.json"):
    """Yield every item, following paging['next'] until it is empty."""
    while url:
        page = requests.get(url).json()  # parse each page only once
        for item in page['results']:
            yield item
        url = page['paging'].get('next')
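The generator above hard-wires `requests.get`. A sketch that takes the fetch function as a parameter makes the paging logic testable offline; it assumes, as the cell above does, that each page is a dict with `results` and a `paging` dict whose `next` is empty on the last page. `paginate` and the `fake_pages` data are invented for illustration.

```python
def paginate(fetch, url):
    """Yield every item from a chain of pages linked by paging['next'].

    fetch(url) must return a parsed JSON dict with a 'results' list and a
    'paging' dict; iteration stops when paging['next'] is missing or empty.
    """
    while url:
        page = fetch(url)
        for item in page.get("results", []):
            yield item
        url = page.get("paging", {}).get("next")

# Offline stand-in for the API: three pages keyed by URL (invented data).
fake_pages = {
    "p1": {"results": [1, 2], "paging": {"next": "p2"}},
    "p2": {"results": [3, 4], "paging": {"next": "p3"}},
    "p3": {"results": [5], "paging": {"next": ""}},
}
print(list(paginate(fake_pages.get, "p1")))
# [1, 2, 3, 4, 5]
```

Against the live service this would be called as `paginate(lambda u: requests.get(u).json(), "http://opencontext.org/sets/.json")`.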
In [13]:
from itertools import islice
results = list(islice(opencontext_items(), 25))
HTML(template.render(items=results))
Out[13]:
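Because `opencontext_items()` is a generator, `islice(..., 25)` only fetches as many pages as it needs: with 10 items per page, three requests rather than the whole collection. A sketch that counts fetches against invented pages to make that visible; `fetch`, `items`, and the page data are hypothetical stand-ins for the HTTP calls.

```python
from itertools import islice

PAGE_SIZE = 10
N_PAGES = 100  # pretend the full result set spans 100 pages

fetched = []  # records every page the generator requests

def fetch(page_no):
    """Stand-in for requests.get(url).json(); returns invented page page_no."""
    fetched.append(page_no)
    start = page_no * PAGE_SIZE
    next_page = page_no + 1 if page_no + 1 < N_PAGES else None
    return {"results": list(range(start, start + PAGE_SIZE)),
            "paging": {"next": next_page}}

def items(first_page=0):
    """Same shape as opencontext_items(): walk pages until 'next' is None."""
    page_no = first_page
    while page_no is not None:
        page = fetch(page_no)
        for item in page["results"]:
            yield item
        page_no = page["paging"]["next"]

first_25 = list(islice(items(), 25))
print(len(first_25), fetched)
# 25 [0, 1, 2]
```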
In [14]:
import requests
from lxml import etree
url = "http://opencontext.org/projects/.atom"
r = requests.get(url)
In [15]:
doc = etree.fromstring(r.content)
doc
Out[15]:
In [16]:
# get the list of project titles from the Atom feed
ATOM_NS = '{http://www.w3.org/2005/Atom}'
project_titles = [e.find(ATOM_NS + 'title').text for e in doc.findall(ATOM_NS + 'entry')]
for i, title in enumerate(project_titles, start=1):
    print(i, title)
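The same title extraction can be done with the standard library's `xml.etree.ElementTree`, using a namespace map instead of spelling out `{http://www.w3.org/2005/Atom}` each time. A sketch over a tiny hand-written Atom document with invented titles; with the live feed one would pass `r.content` instead. `atom_entry_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def atom_entry_titles(xml_bytes):
    """Return the <title> text of every top-level <entry> in an Atom feed."""
    root = ET.fromstring(xml_bytes)
    return [entry.findtext("atom:title", default="", namespaces=ATOM)
            for entry in root.findall("atom:entry", ATOM)]

sample_feed = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Projects</title>
  <entry><title>Project One</title></entry>
  <entry><title>Project Two</title></entry>
</feed>"""

for i, title in enumerate(atom_entry_titles(sample_feed), start=1):
    print(i, title)
# 1 Project One
# 2 Project Two
```

Note that the feed-level `<title>` is not included, because `findall("atom:entry", ...)` only matches direct `<entry>` children of `<feed>`.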