Data discovery using FITS (FIeld Time Series) database - for sites
In this notebook we will look at discovering what data exists for a site in the FITS (FIeld Time Series) database. Again some functionality from previous notebooks in this series will be imported as functions.
To begin we will create a list of all the different data types available in the FITS database:
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# Create list of all typeIDs available in the FITS database
all_type_URL = 'https://fits.geonet.org.nz/type'
all_types = pd.read_json(all_type_URL).iloc[:,0]
all_typeIDs= []
for row in all_types:
all_typeIDs.append(row['typeID'])
Now we will specify the site(s) we want to query for available data types.
In [2]:
# Specify site(s) to get data for
sites = ['RU001', 'WI222']
The next code segment will query the FITS database for data of all types at the given site(s). The output will be a list of data types available at the site(s) specified.
In [3]:
# Ensure list format to sites
if type(sites) != list:
site = sites
sites = []
sites.append(site)
# Prepare data lists
site_data = [[] for j in range(len(sites))]
site_data_types = [[] for j in range(len(sites))]
# Load data from FITS database and parse into data lists
for j in range(len(sites)):
for i in range(len(all_typeIDs)):
# Build query for site, typeID combination
query_suffix = 'siteID=%s&typeID=%s' % (sites[j], all_typeIDs[i])
URL = 'https://fits.geonet.org.nz/observation?' + query_suffix
# Try to load data of the given typeID for the site, if it fails then the data doesn't exist
try:
data = pd.read_csv(URL, names=['date-time', all_typeIDs[i], 'error'], header=0, parse_dates=[0], index_col=0)
if len(data.values) > 1:
site_data[j].append(data)
site_data_types[j].append(all_typeIDs[i])
except:
pass
# Return information to the operator
for i in range(len(site_data_types)):
print('Data types available for ' + sites[i] + ':\n')
for j in range(len(site_data_types[i])):
print(site_data_types[i][j])
print('\n')
While a list of available data types is useful, we need a fast way to view data of a given type and decide if it's what we want. The next code segment deals with plotting some data for each site. As there is a limit to how much data can be displayed on a plot, we will specify which data types we want to display for the site(s). Up to 9 data types can be specified.
In [4]:
plot_data_types = ['t', 'ph', 'NH3-w']
# Determine number and arrangement of subplots (max 9, less for greater legibility)
subplot_number = len(plot_data_types)
if subplot_number / 3 > 1: # if there are more than 3 subplots
rows = '3'
if subplot_number / 6 > 1: # if there are more than 6 subplots
cols = '3'
else:
cols = '2'
else:
rows = str(subplot_number)
cols = '1'
ax = [[] for i in range(len(plot_data_types))]
# Plot data
plt.figure(figsize = (10,8))
for i in range(len(site_data)): # i is site index
for j in range(len(plot_data_types)): # j is data type index
k = site_data_types[i].index(plot_data_types[j]) # match data type chosen to position in data list
if i == 0:
ax[j] = plt.subplot(int(rows + cols + str(j + 1)))
if ((i == 0) and (j == 0)):
# Set initial min/max times
minmintime = min(site_data[i][k].index.values)
maxmaxtime = max(site_data[i][k].index.values)
# Do not plot empty DataFrames (and avoid cluttering the figure legend)
if len(site_data[i][k].values) < 1:
continue
try:
ax[j].plot(site_data[i][k].loc[:, plot_data_types[j]], label = sites[i],
marker='o', linestyle=':', markersize = 1)
except:
continue
# Get min, max times of dataset
mintime = min(site_data[i][k].index.values)
maxtime = max(site_data[i][k].index.values)
# Set y label
ax[j].set_ylabel(plot_data_types[j], rotation = 90, labelpad = 5, fontsize = 12)
if ((i == 1) and (j == 0)):
# Set legend
plot, labels = ax[j].get_legend_handles_labels()
# ^ due to repetitive nature of plotting, only need to do this once
ax[j].legend(plot, labels, fontsize = 12, bbox_to_anchor=(-0.2, 1.3))
# Note: the legend may extend off the figure if there are many sites
# Set title
plot_data_typesstr = ''
for k in range(len(plot_data_types)): plot_data_typesstr += plot_data_types[k] + ', '
plot_data_typesstr = plot_data_typesstr[:-2]
ax[j].set_title('All site data for plot data types : ' + plot_data_typesstr, loc = 'left', y = 1.03,
fontsize = 16)
# Get min, max times of all data
minmintime = min(mintime, minmintime)
maxmaxtime = max(maxtime, maxmaxtime)
# Add x label
plt.xlabel('Time', rotation = 0, labelpad = 5, fontsize = 12)
# Tidy up plot extent and x-axis
for j in range(len(plot_data_types)):
ax[j].set_xlim([minmintime, maxmaxtime])
ax[j].set_xticks(np.arange(minmintime, maxmaxtime + 1000, (maxmaxtime - minmintime) / 3))
plt.show()
In [5]:
# Optionally, save the figure to the current working directory
import os
plt.savefig(os.getcwd() + '/test.png', format = 'png')