Python provides plenty of tools for reading and visualising different kinds of tabular data. Here we demonstrate using pandas and xlrd to load and manipulate excel data and the display this data in a matplotlib graph. Using a simple custom widget defined in the 'Filtering Widget.ipynb'.
This notebook was developed by the Oznome Project and is available at https://github.com/oznome/jupyter-examples. The filtering widget was produced in conjunction with the Oznome project as part of the CSIRO EUDM project.
In [155]:
import pandas, xlrd, requests, json
from pandas import np
import matplotlib.pyplot as plt
pandas.set_option('display.max_columns', 500)
%matplotlib inline
We request a data record from the CSIRO Knowledge Network
In [82]:
response = requests.get("http://kn.csiro.au/api/dataset?id=http%3A%2F%2Foznome.csiro.au%2Fid%2Fdata%2Fdata-gov-au%2Fsaimos-biological-and-flow-cytometry-data-collected-from-ctd-stations-in-south-australia-i-20142")
json_data = response.json()
Exploring the json respone data we can then figure out how to query this record for futher details. First though lets create a class (from https://stackoverflow.com/questions/18873066/pretty-json-formatting-in-ipython-notebook) to make our json easier to explore.
In [159]:
import uuid
from IPython.display import display_javascript, display_html, display
import json
class RenderJSON(object):
def __init__(self, json_data):
if isinstance(json_data, dict):
self.json_str = json.dumps(json_data)
else:
self.json_str = json_data
self.uuid = str(uuid.uuid4())
def _ipython_display_(self):
display_html('<div id="{}" style="height: 600px; width:100%;"></div>'.format(self.uuid), raw=True)
display_javascript("""
require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
document.getElementById('%s').appendChild(renderjson(%s))
});
""" % (self.uuid, self.json_str), raw=True)
In [160]:
json_data = response.json()
RenderJSON(json_data)
By using a python list comprehension we can query a particular excel workbook that is referenced in the metadata record
In [84]:
url = [resource for resource in json_data["resources"] if "Picophytoplankton" in resource["name"]][0]["url"]
url
Out[84]:
Then using requests we can query the specific excel workbook and open it using xlrd
In [85]:
r = requests.get(url)
book = xlrd.open_workbook(file_contents=r.content)
Then list the various sheets contained in this workbook
In [86]:
book.sheet_names()
Out[86]:
Using pandas we can read one of these sheets and do some exporatory data analysis starting with listing the available columns
In [87]:
dataframe = pandas.read_excel(url,sheetname='Converted_CLEAN')
dataframe.columns
Out[87]:
The dataframe describe function tells us more information about these columns
In [88]:
dataframe.describe(include='all')
Out[88]:
Sometime it is useful to build widgets to help explore data and make a notebook user friendly you can define widgets inline a notebook but it is also easy to define widgets in an adjacent notebook and then run them using %run
In [89]:
%run Filtering\ Widget.ipynb
The above code creates a widget class called FilteringWidget which we can then use to create a widget
In [90]:
filtered = FilteringWidget(dataframe, ['Station', 'Depth (m)'])
Interacting with this widget produces a filtered dataframe available by accessing the dataframe property. For example we can restrict the dataframe to the NRSKAI station
In [91]:
filtered.dataframe
Out[91]:
In [174]:
filtered_frame = filtered.dataframe.replace('-', np.nan)
Thinking about what this data might be we can make some experimental assumptions. Perhaps Rep is experiment repeat number. We can further restrict the data frame to particular organism by specifying a columns
In [175]:
filtered_frame = filtered_frame[[ 'Synechococcus ','Prochlorococus', 'Picoeukaryotes', 'Rep', 'Depth (category)']]
It looks like there is data for all the organisms in "Rep 2" so lets filter to that
In [176]:
filtered_frame = filtered_frame.loc[filtered_frame['Rep'] == 2]
In [177]:
filtered_frame
Out[177]:
The current index looks like a unique identifier for the experiment but we have restricted the data so that depth also provides a unique identifier in our filtered dataset lets get rid of the old index and set the new index to the depth so we can create a visualisation of this data
In [178]:
filtered_frame = filtered_frame.set_index('Depth (category)')
we also won't need the rep column
In [179]:
filtered_frame.pop('Rep')
pass
In [180]:
filtered_frame.plot(kind='bar', stacked=True)
Out[180]: