Open Context: Data Publication for Cultural Heritage and Field Research: "Open Context reviews, edits, and publishes archaeological research data and archives data with university-backed repositories, including the California Digital Library."

I often think of OpenContext as an exemplar, a model from the future: academic data archiving done right. Some cool features (About Open Context: Technologies):

use of Atom feeds
JSON
KML
use of timemap - JavaScript library to help use a SIMILE timeline with online maps to map events/objects in time and space (though I wonder whether this technology has been superseded).
contextualization by making ties to other data services
- putting things on maps
- ties to controlled vocabulary around biological taxa and archaeological terminology.
- use of linked open data: Examples?
- would be great to tie in any technology we develop for the visualization of large image collections into OpenContext.

We want to provide for the long-term citability and availability of this data.

Also contextualization.

What was the excellent presentation/paper Eric Kansa gave to us at WwOD13?

Questions
- What are the costs of archiving data on OpenContext? (How are the costs shared among depositor, OpenContext, CDL, and funding agencies?) I think About Open Context: Estimate Data Management + Publication Costs gives some clues.
- What guarantees are made about the data once it's archived at OpenContext and CDL? About Open Context: "Data safeguarded and preserved though archiving with the University of California's California Digital Library"
- How do you cite data in OpenContext?
- Who is reusing this data? Examples?
- IP rights of images (and other data)? Can we tie to Wikipedia and to Wikimedia Commons? (Some insights in About Open Context: Intellectual Property, but the whole range of issues is complex. It seems like there will be varying levels of openness and restriction in OpenContext. I will want to dive in to look at specific examples.)
  - For example, I don't see any explicit copyright statement at Open Context Image Lightbox: (1021 Images Showing) or http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars
- Can we bulk download data from OpenContext?
open data in science, specifically archaeology
- OpenContext has an API: About Open Context: Web Services and APIs
Open Context: Data Publication for Cultural Heritage and Field Research
Eric Kansa (@ekansa) is a former I School Adjunct Prof, and we've done work together on open government, particularly on the Recovery Act. Eric was recently honored (in 2013) by the White House as an Open Science Champion of Change.
possible project ideas
- visualizing the image collections in OpenContext.
- making ties to Encyclopedia of Life
- thinking about challenges of archiving data, reconciling data, aligning metadata to standards.
- I see a "suggested citation" in pages like Open Context view of Item: Trench 6. Good idea to embed metadata into page to make Zotero know how to grab citation metadata?

There are so many possibilities here; we can work iteratively with Eric Kansa to develop a good project without having it all figured out upfront.

Eric has mentioned to me the idea of time span facets.

Studying the UI

How to reproduce data represented by the map on Open Context?

How to use the API to get a list of projects?

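http://opencontext.org/sets/.json returns a JSON representation of items, but http://opencontext.org/projects/.json doesn't work for getting a list of all projects. Answer: the Atom feed at http://opencontext.org/projects/.atom lists the projects (it is parsed at the end of these notes).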

A quick jump into the API of opencontext.org

Let's use a specific project to focus on:

The API documentation: http://opencontext.org/about/services




In [1]:

    
# using an example in the API documentation to confirm that we can get json representation from API

import requests
json_url = "http://opencontext.org/sets/Palestinian+Authority/Tell+en-Nasbeh/.json?proj=Bade+Museum"

r = requests.get(json_url)

# what are the top level keys of response?
r.json().keys()









    Out[1]:





[u'updated',
 u'sorting',
 u'numFound',
 u'facets',
 u'offset',
 u'geoCount',
 u'chronoTileFacets',
 u'summary',
 u'paging',
 u'qstring',
 u'published',
 u'results',
 u'paramCount',
 u'geoTileFacets']



In [2]:

    
# Now let's apply same logic to the Asian Stoneware Jars project

json_url = "http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars"

request = requests.get(json_url)
request_json = request.json()

results = request_json['results']



In [3]:

    
request_json.keys()









    Out[3]:





[u'updated',
 u'sorting',
 u'numFound',
 u'facets',
 u'offset',
 u'geoCount',
 u'chronoTileFacets',
 u'summary',
 u'paging',
 u'qstring',
 u'published',
 u'results',
 u'paramCount',
 u'geoTileFacets']



In [4]:

    
# number of results matches what is on human UI
request_json['numFound']









    Out[4]:





1008



In [5]:

    
# we get back the first page of 10
len(results)









    Out[5]:





10
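With 1008 records and 10 results per page, walking the whole project means 101 requests. A small sketch of that arithmetic, assuming the page size stays at the 10 observed above (the helper name `page_count` is my own, not part of the Open Context API):

```python
def page_count(num_found, page_size=10):
    """Number of pages needed to cover num_found records (ceiling division)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # integer arithmetic so the result is the same on Python 2 and Python 3
    return (num_found + page_size - 1) // page_size

# numFound reported above for the Asian Stoneware Jars project
pages = page_count(1008)
```

The same helper gives a rough idea of how long a crawl of the full `sets` feed would take before starting one.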



In [6]:

    
results[0]









    Out[6]:





{u'catIcon': u'http://opencontext.org/database/ui_images/med_oc_icons/ceramic_artifacts_50x50.jpg',
 u'category': u'Pottery',
 u'context': u'<div class="context"><div>\nContext: <span class="item_root_parent">Philippines</span> / <span class="item_parent">San Diego</span></div>\n</div>',
 u'geoTime': {u'geoLat': 13.539201,
  u'geoLong': 121.168213,
  u'timeBegin': False,
  u'timeEnd': False},
 u'label': u'UNE 104',
 u'project': u'Asian Stoneware Jars',
 u'thumbIcon': u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/Copy%20(2)%20of%20une104%20copy.jpg',
 u'uri': u'http://opencontext.org/subjects/EAFD5A63-83C0-43A3-691C-08878757A66D',
 u'var_vals': {u'Artefact Type': u'intact jar',
  u'Compositional Group': u'1',
  u'Dataset Type': u'A Ship',
  u'Diameter (mm)': u'440',
  u'Donor Institution/sample Source': u'National Museum of the Philippines',
  u'Group (INAA)': u'1',
  u'Group (icp)': u'1',
  u'Height (mm)': u'530',
  u'ICP - Ba': u'475.54',
  u'ICP - Ca': u'1366.57',
  u'ICP - Ce': u'120.03',
  u'ICP - Cu': u'27.82',
  u'ICP - Fe': u'14855.69',
  u'ICP - Ga': u'27.29',
  u'ICP - Hf': u'4.82',
  u'ICP - K': u'21721.65',
  u'ICP - La': u'88.31',
  u'ICP - Li': u'25.39',
  u'ICP - Mg': u'5677.31',
  u'ICP - Na': u'2328.68',
  u'ICP - Ni': u'8.6',
  u'ICP - Sc': u'9.74',
  u'ICP - Sr': u'53.75',
  u'ICP - Ti': u'3030.14',
  u'ICP - V': u'39.88',
  u'ICP - Yb': u'3.83',
  u'ICP - Zn': u'88.53',
  u'Museum No.': u'706',
  u'NAA validation - As': u'6.9',
  u'NAA validation - Au': u'0',
  u'NAA validation - Ba': u'526',
  u'NAA validation - Br': u'0',
  u'NAA validation - Ca': u'0',
  u'NAA validation - Ce': u'128',
  u'NAA validation - Co': u'20.9',
  u'NAA validation - Cr': u'23.4',
  u'NAA validation - Cs': u'6.5',
  u'NAA validation - Eu': u'1.36',
  u'NAA validation - Fe': u'1.53',
  u'NAA validation - Hf': u'11',
  u'NAA validation - K': u'2.59',
  u'NAA validation - La': u'74.5',
  u'NAA validation - Lu': u'0.59',
  u'NAA validation - Na': u'0.244',
  u'NAA validation - Rb': u'154',
  u'NAA validation - Sb': u'0.49',
  u'NAA validation - Sc': u'11.4',
  u'NAA validation - Sm': u'7.8',
  u'NAA validation - Ta': u'2.74',
  u'NAA validation - Tb': u'1.2',
  u'NAA validation - Th': u'37.6',
  u'NAA validation - U': u'10.5',
  u'NAA validation - Yb': u'4',
  u'NAA validation - Zn': u'90.2',
  u'PIXE - Al(1014)': u'129585',
  u'PIXE - Ca': u'1601',
  u'PIXE - F(area)': u'183.9',
  u'PIXE - Fe': u'14970',
  u'PIXE - K': u'25205',
  u'PIXE - Li(478)': u'10.2',
  u'PIXE - Mg(585)': u'0.0001',
  u'PIXE - Mn': u'405',
  u'PIXE - Na(440)': u'2659.3',
  u'PIXE - Rb': u'153',
  u'PIXE - Si': u'434771',
  u'PIXE - Sr': u'44',
  u'PIXE - Ti': u'4227',
  u'PIXE - V': u'74',
  u'PIXE - Zr': u'312',
  u'Photograph No. - Located In Photographs Folder.': u'UNE 104',
  u'Rel: http://www.cidoc-crm.org/rdfs/cidoc-crm#P2.has_type': u'http://collection.britishmuseum.org/id/thesauri/x7402',
  u'Rel: http://www.cidoc-crm.org/rdfs/cidoc-crm#P45F.consists_of': u'http://collection.britishmuseum.org/id/thesauri/x10539',
  u'Sample Source Person': u'Eusebio Dizon',
  u'Sample Weight (g)': u'4.5',
  u'Vessel Part Sampled': u'base',
  u'Year': u'1600'}}
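Every value in `var_vals` comes back as a string, including the chemical measurements. Before doing any analysis the numeric ones would need converting. A minimal sketch, using a hand-picked subset of the record above as sample data (the function name `numeric_var_vals` is hypothetical):

```python
def numeric_var_vals(var_vals):
    """Return only the var_vals entries whose value parses as a float."""
    numbers = {}
    for name, value in var_vals.items():
        try:
            numbers[name] = float(value)
        except (TypeError, ValueError):
            # non-numeric entries such as 'intact jar' are skipped
            continue
    return numbers

# a few fields copied from the 'UNE 104' record shown above
sample = {
    'Artefact Type': 'intact jar',
    'Diameter (mm)': '440',
    'Height (mm)': '530',
    'ICP - Fe': '14855.69',
    'Vessel Part Sampled': 'base',
}
measurements = numeric_var_vals(sample)
```

Note that identifiers such as `Museum No.` or `Compositional Group` would also parse as numbers, so a real analysis would probably want an explicit list of measurement fields rather than this catch-all.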



In [7]:

    
# list the URLs for the thumbnails
[result.get('thumbIcon') for result in results]









    Out[7]:





[u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/Copy%20(2)%20of%20une104%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/UNE373%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une343%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une342%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une338%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une233%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une267%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une375%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/UNE115%20copy.jpg',
 u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/UNE149.JPG']
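The same kind of extraction could be a first step toward reproducing the map view: each result carries a `geoTime` block with `geoLat` and `geoLong`. A sketch under the assumption that missing coordinates are reported as `False` the way `timeBegin` and `timeEnd` are in the record above (I have not confirmed that for coordinates; `geo_points` and the second sample record are made up for illustration):

```python
def geo_points(results):
    """Collect (label, lat, lon) for results that carry numeric coordinates."""
    points = []
    for result in results:
        geo = result.get('geoTime') or {}
        lat, lon = geo.get('geoLat'), geo.get('geoLong')
        # bool is a subclass of int, so exclude it explicitly;
        # this drops False placeholders as well as missing (None) values
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in (lat, lon)):
            points.append((result.get('label'), lat, lon))
    return points

sample_results = [
    # coordinates copied from the 'UNE 104' record above
    {'label': 'UNE 104',
     'geoTime': {'geoLat': 13.539201, 'geoLong': 121.168213,
                 'timeBegin': False, 'timeEnd': False}},
    # hypothetical record without usable coordinates
    {'label': 'no coordinates', 'geoTime': {'geoLat': False, 'geoLong': False}},
]
points = geo_points(sample_results)
```

The resulting tuples could then be handed to whatever mapping library we settle on.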



In [8]:

    
# do a quick display

from IPython.display import HTML
from jinja2 import Template

CSS = """
<style>
  .wrap img {
    margin-left: 0px;
    margin-right: 0px;
    display: inline-block;
  }
</style>
"""

IMAGES_TEMPLATE = CSS + """
<div class="wrap">
 {% for item in items %}<img title="{{item.label}}" src="{{item.thumbIcon}}"/>{% endfor %}
</div>
"""
    
template = Template(IMAGES_TEMPLATE)
HTML(template.render(items=results))









    Out[8]:

Parsing http://opencontext.org/sets/.json



In [9]:

    
import requests
url = "http://opencontext.org/sets/.json"

r = requests.get(url)
r.json().keys()









    Out[9]:





[u'updated',
 u'sorting',
 u'numFound',
 u'facets',
 u'offset',
 u'geoCount',
 u'chronoTileFacets',
 u'summary',
 u'paging',
 u'qstring',
 u'published',
 u'results',
 u'paramCount',
 u'geoTileFacets']



In [10]:

    
r.json()['numFound']









    Out[10]:





656491



In [11]:

    
r.json()['paging']['prev']









    Out[11]:





False



In [12]:

    
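# write a generator for all items in http://opencontext.org/sets/.json

import requests

def opencontext_items():
    """Yield every item in the sets feed, following the 'next' link page by page."""
    url = "http://opencontext.org/sets/.json"

    while url:
        r = requests.get(url)
        r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
        page = r.json()       # parse each page once and reuse it

        for item in page['results']:
            yield item

        # the loop ends when 'next' is falsy (assumed for the last page,
        # in the same way 'prev' is False on the first page)
        url = page['paging']['next']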
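On the bulk-download question raised earlier: one option is to stream items from a generator like the one above into a JSON Lines file, one record per line. This is only a sketch; `dump_jsonl` and the output filename are hypothetical, and I have not checked what request rate Open Context considers acceptable.

```python
import json
from itertools import islice

def dump_jsonl(items, path, limit=None):
    """Write each item as one JSON object per line; return the number written."""
    count = 0
    with open(path, 'w') as out:
        # islice with limit=None runs until the iterable is exhausted
        for item in islice(items, limit):
            out.write(json.dumps(item, sort_keys=True) + '\n')
            count += 1
    return count

# Example (needs network access and the opencontext_items generator):
# dump_jsonl(opencontext_items(), 'opencontext_sample.jsonl', limit=25)
```

Using `islice` for the limit means the generator is never asked for more items than will be written, so no extra page is fetched.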



In [13]:

    
from itertools import islice
results = list(islice(opencontext_items(), 25))
HTML(template.render(items=results))









    Out[13]:

Parsing http://opencontext.org/projects/.atom



In [14]:

    
import requests
from lxml import etree

url = "http://opencontext.org/projects/.atom"
r = requests.get(url)



In [15]:

    
doc = etree.fromstring(r.content)
doc









    Out[15]:





<Element {http://www.w3.org/2005/Atom}feed at 0x104d7d368>



In [16]:

    
# get list of titles

ATOM_NS = '{http://www.w3.org/2005/Atom}'
project_titles = [e.find(ATOM_NS + 'title').text for e in doc.findall(ATOM_NS + 'entry')]
for (i, title) in enumerate(project_titles):
    # a formatted print call gives the same output on Python 2 and Python 3
    print("%d %s" % (i + 1, title))









    



1 South Carolina SHPO: (Overview)
2 Georgia Archaeological Site File (GASF): (Overview)
3 Florida Site Files: (Overview)
4 Pyla-Koutsopetria Archaeological Project: (Overview)
5 Balance Pan Weights from Nippur: (Overview)
6 Osteometric Database of South American Camelids: (Overview)
7 Ceramics, Trade, Provenience and Geology: Cyprus in the Late Bronze Age: (Overview)
8 Archaeology of Mesoamerican Animals: (Overview)
9 Çatalhöyük Zooarchaeology: (Overview)
10 Çatalhöyük Area TP Zooarchaeology: (Overview)
11 Ilıpınar Zooarchaeology: (Overview)
12 Zooarchaeology of Neolithic Ulucak: (Overview)
13 Çukuriçi Höyük Zooarchaeology: (Overview)
14 Barçın Höyük Zooarchaeology: (Overview)
15 Köşk Höyük Faunal Data: (Overview)
16 Erbaba Höyük and Suberde Zooarchaeology: (Overview)
17 Mikt’sqaq Angayuk Finds: (Overview)
18 Asian Stoneware Jars: (Overview)
19 Zooarchaeology of Öküzini Cave: (Overview)
20 Zooarchaeology of Karain Cave B: (Overview)
21 West Stow West Zooarchaeology: (Overview)
22 Murlo: (Overview)
23 Hacksilber Project: (Overview)
24 Kenan Tepe: (Overview)
25 Rough Cilicia: (Overview)
26 Dhiban Excavation and Development Project: (Overview)
27 Tal-e Malyan Zooarchaeology: Tal-e Malyan Zooarchaeology
28 Zooarchaeology of Medieval Emden: (Overview)
29 Chogha Mish Fauna: (Overview)
30 Khirbat al-Mudayna al-Aliya: (Overview)
31 Dove Mountain Groundstone: (Overview)
32 Bade Museum: (Overview)
33 San Diego Archaeological Center: (Overview)
34 Presidio of San Francisco: (Overview)
35 Aegean Archaeomalacology: (Overview)
36 Petra Great Temple Excavations: (Overview)
37 Iraq Heritage Program: (Overview)
38 Lake Carlos Beach Site, 1992 and 1996: (Overview)
39 Corneal Ulceration in South East Asia: (Overview)
40 Harvard Peabody Mus. Zooarchaeology: (Overview)
41 Hazor: Zooarchaeology: (Overview)
42 Hayonim: Micromorphology: (Overview)
43 Geissenklosterle: Micromorphology: (Overview)
44 Pınarbaşı 1994: Animal Bones: (Overview)
45 Domuztepe Excavations: (Overview)
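Most titles carry a trailing ": (Overview)" that would need stripping before they can be used as `proj=` values. A sketch using the standard library's `xml.etree.ElementTree` with a namespace map instead of fully qualified tag names. The sample feed is hand-written for illustration (only the two titles come from the listing above), and `project_names` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

ATOM = {'atom': 'http://www.w3.org/2005/Atom'}
SUFFIX = ': (Overview)'

# Hand-written stand-in for the real feed
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Asian Stoneware Jars: (Overview)</title></entry>
  <entry><title>Tal-e Malyan Zooarchaeology: Tal-e Malyan Zooarchaeology</title></entry>
</feed>"""

def project_names(feed_xml):
    """Return entry titles with a trailing ': (Overview)' marker removed."""
    root = ET.fromstring(feed_xml)
    names = []
    for entry in root.findall('atom:entry', ATOM):
        title = entry.findtext('atom:title', default='', namespaces=ATOM).strip()
        if title.endswith(SUFFIX):
            title = title[:-len(SUFFIX)]
        names.append(title)
    return names

names = project_names(SAMPLE_FEED)
```

Titles that do not follow the pattern, such as the Tal-e Malyan entry, are left untouched. Whether the cleaned names always match what the `proj=` parameter expects is something I would still want to check against the API.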