In [1]:
# This imports the OpenContextAPI from the api.py file in the
# opencontext directory.
%run '../opencontext/api.py'
In [2]:
import numpy as np
import pandas as pd
oc_api = OpenContextAPI()
oc_api.set_cache_file_prefix('murlo-objs')
# Set how attributes with multiple values are handled: 'concat' for
# non-numeric attributes in general, and JSON-formatted strings for the
# keyed attributes below.
oc_api.multi_value_handle_non_number = 'concat'
oc_api.multi_value_handle_keyed_attribs = {
    'Motif': 'json',
    'Decorative Technique': 'json',
    'Fabric Category': 'json',
}
# Clear old cached records.
oc_api.clear_api_cache()
# This is a search URL for objects from the Murlo project (24-murlo) in Italy.
url = 'https://opencontext.org/subjects-search/Italy?proj=24-murlo&prop=oc-gen-cat-object#14/43.1610/11.3961/18/any/Google-Satellite'
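# Fetch the 'standard' (linked data identified) attributes in use with
# data at the url.
stnd_attribs_tuples = oc_api.get_standard_attributes(
    url,
)
# Also fetch the project-defined attributes in common use with data at
# the url. min_portion sets how widely an attribute must be used to be
# included.
proj_attribs_tuples = oc_api.get_common_attributes(
    url,
    min_portion=0.001,
)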
# Now display the standard and project attributes found in this
# search / query result.
for slug, label in stnd_attribs_tuples:
    print('Standard: {}, identified by slug: {}'.format(label, slug))
for slug, label in proj_attribs_tuples:
    print('Proj attribute: {}, identified by slug: {}'.format(label, slug))
In [3]:
# Make a list of only the slugs from the list of slug, label tuples.
attribs_for_records = [slug for slug, _ in (stnd_attribs_tuples + proj_attribs_tuples)]
# Make a dataframe by fetching result records from Open Context.
# This will be slow until we finish improvements to Open Context's API.
# However, the results get cached by saving as files locally. That
# makes iterating on this notebook much less painful.
df = oc_api.url_to_dataframe(url, attribs_for_records)
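The comments above note that OpenContextAPI caches API results as local files. The sketch below illustrates the general idea of such a file-based cache, keyed by a hash of the request URL. It is not OpenContextAPI's actual implementation: `cached_fetch`, `fake_fetch`, and the file-naming scheme are hypothetical, and the `prefix` argument only mirrors the spirit of `set_cache_file_prefix`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_fetch(url, fetch, cache_dir, prefix='murlo-objs'):
    """Return JSON data for url, reading it from a local file when cached.

    fetch is any callable that takes a URL and returns JSON-serializable
    data (hypothetical; in practice it would make the HTTP request).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode('utf-8')).hexdigest()
    cache_file = cache_dir / '{}-{}.json'.format(prefix, key)
    if cache_file.exists():
        # Cache hit: skip the slow network request.
        return json.loads(cache_file.read_text(encoding='utf-8'))
    data = fetch(url)
    cache_file.write_text(json.dumps(data), encoding='utf-8')
    return data


# Usage with a stand-in fetch function that records each call.
calls = []


def fake_fetch(url):
    calls.append(url)
    return {'url': url, 'features': []}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch('https://example.org/a', fake_fetch, tmp)
    second = cached_fetch('https://example.org/a', fake_fetch, tmp)

print(len(calls))  # 1: the second request was served from the cache
```

Hashing the URL avoids problems with characters such as `?`, `#`, and `/` that are common in Open Context search URLs but awkward in file names.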
In this particular dataset, many objects have long (sometimes HTML) descriptions. These are cached locally in the JSON results of the API requests to Open Context. However, we don't need these long free-text attributes to make analysis-friendly dataframes, so we'll drop them from the dataframe.
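The columns to drop are listed by hand in the next cell. As a hypothetical alternative, long free-text columns could be flagged with a simple heuristic, such as the average string length of their values. The sketch below assumes an arbitrary 200-character threshold and uses toy data; `long_text_columns` is not part of OpenContextAPI. It also passes `errors='ignore'` to `DataFrame.drop`, so names missing from a dataframe are skipped instead of raising a `KeyError`.

```python
import pandas as pd


def long_text_columns(df, min_mean_len=200):
    """Return text columns whose non-null values average at least
    min_mean_len characters (a hypothetical heuristic threshold)."""
    long_cols = []
    for col in df.columns:
        values = df[col].dropna()
        # Only consider columns where every non-null value is a string.
        if values.empty or not values.map(lambda v: isinstance(v, str)).all():
            continue
        if values.str.len().mean() >= min_mean_len:
            long_cols.append(col)
    return long_cols


# Toy data: one long free-text column and two short attribute columns.
toy = pd.DataFrame({
    'label': ['obj-1', 'obj-2'],
    'Description': ['<p>' + 'A long note. ' * 30 + '</p>', 'x' * 250],
    'Motif': ['["Panther"]', None],
})

drop = long_text_columns(toy)
# errors='ignore' skips absent names, so one list can be reused on
# dataframes that lack some of these columns.
slim = toy.drop(columns=drop + ['Not A Column'], errors='ignore')
print(drop)                # ['Description']
print(list(slim.columns))  # ['label', 'Motif']
```

A hand-written list, as used in the notebook, is more predictable; a heuristic like this is mainly useful for spotting candidates in an unfamiliar dataset.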
In [7]:
# Define a list of columns to drop.
drop_cols = [
    'Fragment Noted',
    'Depth Notes',
    'Supplement Note',
    'Fabric Description',
    'Description',
    'Size',
]
df.drop(columns=drop_cols, inplace=True)
# The API returns False when a citation URI is not defined; it's better
# practice to make this a null (NaN).
df.loc[(df['citation uri'] == False), 'citation uri'] = np.nan
import os
# Now save the results of all of this as a CSV file in the 'files'
# directory of the repository (the parent of the notebook's working
# directory).
repo_path = os.path.dirname(os.path.abspath(os.getcwd()))
csv_path = os.path.join(
    repo_path,
    'files',
    'oc-api-murlo-objects-multivalue-as-json.csv'
)
df.to_csv(csv_path, index=False)
print('Saved this example as a CSV table at: {}'.format(csv_path))
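One practical reason to replace `False` with a null before saving: `to_csv` writes the boolean as the literal text `False`, while a null becomes an empty cell that `read_csv` reads back as missing. The toy dataframe below is a minimal sketch of that round trip; the URI is a placeholder, not a real citation.

```python
import io

import numpy as np
import pandas as pd

# Toy columns mixing a citation URI with the False placeholder.
toy = pd.DataFrame({
    'label': ['obj-1', 'obj-2'],
    'citation uri': ['https://example.org/ark-1', False],
})
before = toy.to_csv(index=False)

# Same cleanup as in the cell above.
toy.loc[(toy['citation uri'] == False), 'citation uri'] = np.nan
after = toy.to_csv(index=False)

# Reading the cleaned CSV back yields a missing value, not the text 'False'.
round_trip = pd.read_csv(io.StringIO(after))
print(round_trip['citation uri'].isna().tolist())  # [False, True]
```

The extra `label` column matters here: a row whose only cell is empty is written as a blank line, and `read_csv` skips blank lines by default.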
Using the JSON already cached from the Open Context API, we can make a second dataframe that is "wider" (has many more columns). This wide dataframe expresses multiple values for "Motif", "Decorative Technique", and "Fabric Category" in separate columns. We set the dictionary oc_api.multi_value_handle_keyed_attribs to do this.
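To make the difference between the two layouts concrete, the stdlib sketch below encodes the same toy multi-valued "Motif" attribute both ways: as one JSON string per record (like the 'json' setting) and as one True/False column per distinct value (like 'column_val'). The helper functions, the toy records, and the exact JSON formatting are assumptions for illustration; OpenContextAPI's own output may differ in detail. The ` :: ` column-name separator follows the column names described for df_wide.

```python
import json


def as_json_column(records, attrib):
    """Encode each record's list of values as one JSON string (or None)."""
    return [json.dumps(r[attrib]) if r.get(attrib) else None for r in records]


def as_boolean_columns(records, attrib, sep=' :: '):
    """Make one True/False column per distinct value, named attrib + sep + value."""
    values = sorted({v for r in records for v in r.get(attrib, [])})
    return {
        attrib + sep + v: [v in r.get(attrib, []) for r in records]
        for v in values
    }


# Toy records: each artifact may have zero or more motifs.
records = [
    {'label': 'obj-1', 'Motif': ['Panther', 'Potnia Theron']},
    {'label': 'obj-2', 'Motif': ['Panther']},
    {'label': 'obj-3', 'Motif': []},
]

json_col = as_json_column(records, 'Motif')
wide_cols = as_boolean_columns(records, 'Motif')
print(json_col)           # ['["Panther", "Potnia Theron"]', '["Panther"]', None]
print(sorted(wide_cols))  # ['Motif :: Panther', 'Motif :: Potnia Theron']
```

The JSON layout keeps the column count fixed but needs parsing before analysis; the boolean layout is ready for filtering and counting, at the cost of one column per distinct value.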
In [5]:
oc_api.multi_value_handle_non_number = 'concat'
oc_api.multi_value_handle_keyed_attribs = {
    'Motif': 'column_val',
    'Decorative Technique': 'column_val',
    'Fabric Category': 'column_val',
}
df_wide = oc_api.url_to_dataframe(url, attribs_for_records)
The df_wide dataframe handles multiple values for some attributes by making many boolean columns, with each column noting the presence of a given attribute value on the row for an artifact. For example, True values in the column "Motif :: Panther" indicate that a "Panther" motif was observed on an artifact, and True values in the column "Motif :: Potnia Theron" indicate a "Potnia Theron" motif on an artifact.
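Boolean columns like these are easy to summarize with ordinary pandas operations. The sketch below uses a toy stand-in for df_wide (the rows and the "Fabric Category" column name are made up) to count artifacts per motif by summing the columns that share the `Motif :: ` prefix, and to find artifacts that show two motifs at once.

```python
import pandas as pd

# Toy stand-in for df_wide; real column names come from the API results.
toy_wide = pd.DataFrame({
    'label': ['obj-1', 'obj-2', 'obj-3'],
    'Motif :: Panther': [True, True, False],
    'Motif :: Potnia Theron': [True, False, False],
    'Fabric Category :: Example Fabric': [False, True, True],
})

# Select only one attribute's boolean columns by their name prefix.
motif_cols = [c for c in toy_wide.columns if c.startswith('Motif :: ')]
# Summing booleans counts the artifacts that show each motif.
motif_counts = toy_wide[motif_cols].sum().sort_values(ascending=False)
print(motif_counts)

# Combine columns with & to find artifacts showing both motifs.
both = toy_wide[toy_wide['Motif :: Panther'] & toy_wide['Motif :: Potnia Theron']]
print(both['label'].tolist())  # ['obj-1']
```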
In [6]:
df_wide.drop(columns=drop_cols, inplace=True)
# The API returns False when a citation URI is not defined; it's better
# practice to make this a null (NaN).
df_wide.loc[(df_wide['citation uri'] == False), 'citation uri'] = np.nan
csv_wide_path = os.path.join(
    repo_path,
    'files',
    'oc-api-murlo-objects-multivalue-as-cols.csv'
)
df_wide.to_csv(csv_wide_path, index=False)
print('Saved this example wide as a CSV table at: {}'.format(csv_wide_path))