Plot multiple volcanic data sets from the FITS (FIeld Time Series) database
In this notebook we will plot multiple types of data from volcano observatory instruments, using the FITS (FIeld Time Series) database. This notebook assumes that the reader has either worked through the previous FITS data access and plotting Jupyter notebook or has a basic understanding of Python. Some of the code from the previous notebook has been carried over, in the form of package imports and a function, into the first code segment.
To begin, run the following code segment:
In [1]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Define functions
def build_query(site, data_type):
    '''
    Take site code and data type and generate a FITS API query for an observations csv file
    '''

    # Ensure parameters are in the correct format for use with the FITS API
    site = str.upper(site)  # ensure site code is upper case

    # Build a FITS API query by combining parameter=value pairs in the query format
    query_suffix = 'siteID=%s&typeID=%s' % (site, data_type)

    # Combine the query parameter=value string with the FITS observation data URL
    URL = 'https://fits.geonet.org.nz/observation?' + query_suffix

    return URL
Next we specify the sites and corresponding data types we want to view. Volcanic data comes in many types, and the FITS database typeID for a given data type may not be obvious, so a list of the available typeIDs (sketched below) can be a useful resource if a query isn't working.
To discover what geodetic sites exist, browse the GeoNet Network Maps. Volcanic sites can be found by data type in the FITS GUI.
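One quick way to see what typeIDs exist is to ask the FITS API for its list of data types. The following is only a minimal sketch: it assumes a type endpoint at https://fits.geonet.org.nz/type that returns JSON with a top-level 'type' list, so check the FITS API documentation if the response does not match this layout.

import requests
import pandas as pd

# Request the list of data types known to FITS (endpoint and JSON layout assumed as above)
response = requests.get('https://fits.geonet.org.nz/type')
response.raise_for_status()
data_type_table = pd.DataFrame(response.json()['type'])  # assumes a 'type' list in the JSON
print(data_type_table.head())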
At Ruapehu Crater Lake (RU001), lake temperature and Mg²⁺ concentration are used (in combination with lake level and wind speed) to model the amount of energy entering the lake from below. In the next code segment we will gather these data.
In [2]:
# Set sites and respective data types in lists
sites = ['RU001', 'RU001']
data_types = ['t', 'Mg-w']

# Ensure input is in list format
if type(sites) != list:
    site = sites
    sites = []
    sites.append(site)
if type(data_types) != list:
    temp_data_types = data_types
    data_types = []
    data_types.append(temp_data_types)

# Check that each site has a corresponding data type and vice versa
if len(sites) != len(data_types):
    print('Number of sites and data types are not equal!')

# Create a list to store DataFrame objects in
data = [[] for i in range(len(sites))]

# Parse csv data from the FITS database into the data list
for i in range(len(sites)):
    URL = build_query(sites[i], data_types[i])  # FITS API query building function
    try:
        data[i] = pd.read_csv(URL, names=['date-time', data_types[i], 'error'], header=0, parse_dates=[0], index_col=0)
    except Exception:
        print('Site or data type does not exist')
The only difference between this code segment and the corresponding segment of the previous notebook is that here we store the DataFrame objects in a list and generate them using a for loop. As before, we can plot the data in a simple plot:
In [3]:
# Plot the data on one figure
colors = ['red', 'blue']
for i in range(len(data)):
    data[i].loc[:, data_types[i]].plot(marker='o', linestyle=':', color=colors[i])

plt.xlabel('Time', fontsize=12)
plt.ylabel('')
plt.show()
While the above plot shows the two data series on the same figure, it does not do so in a very useful way. We can use a few features of matplotlib to improve the readability of the figure:
In [4]:
# Generate blank figure to plot onto
fig, ax1 = plt.subplots()
# Plot the first data series onto the figure
data[0].loc[:, data_types[0]].plot(marker='o', linestyle=':', ax = ax1, color = colors[0], label = data_types[0])
# Plot the second data series onto the figure
ax2 = ax1.twinx() # Share x axis between two y axes
data[1].loc[:, data_types[1]].plot(marker='o', linestyle=':', ax = ax2, color = colors[1], label = data_types[1])
# Make a legend for both plots
plot1, labels1 = ax1.get_legend_handles_labels()
plot2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(plot1 + plot2, labels1 + labels2, loc = 0)
# Tidy up plot
ax1.set_xlabel('Time', rotation = 0, labelpad = 15, fontsize = 12)
ax1.set_ylabel(data_types[0], rotation = 0, labelpad = 35, fontsize = 12)
ax2.set_ylabel(data_types[1], rotation = 0, labelpad = 35, fontsize = 12)
plt.title(data_types[0] + ' and ' + data_types[1] + ' data for ' + sites[0] + ' and ' + sites[1], fontsize = 12)
plt.show()
This figure makes it much easier to compare the data series, but it is also quite cluttered. The next code segment plots each data series in a separate subplot within the same figure, to maximise readability without losing the ability to compare the two data series.
In [5]:
# New figure
plt.figure()
# Plot first data series onto subplot
ax1 = plt.subplot(211) # Generate first subplot
data[0].loc[:, data_types[0]].plot(marker='o', linestyle=':', ax = ax1, color = colors[0], label = data_types[0])
plt.title(data_types[0] + ' and ' + data_types[1] + ' data for ' + sites[0] + ' and ' + sites[1], fontsize = 12)
# Plot second data series onto second subplot
ax2 = plt.subplot(212, sharex = ax1)
data[1].loc[:, data_types[1]].plot(marker='o', linestyle=':', ax = ax2, color = colors[1], label = data_types[1])
# Tidy up plot
ax2.set_xlabel('Time', rotation = 0, labelpad = 15, fontsize = 12)
ax1.set_ylabel(data_types[0], rotation = 0, labelpad = 35, fontsize = 12)
ax2.set_ylabel(data_types[1], rotation = 0, labelpad = 35, fontsize = 12)
# Remove messy minor x ticks
ax1.tick_params(axis = 'x', which = 'minor', size = 0)
ax2.tick_params(axis = 'x', which = 'minor', size = 0)
plt.show()
Which of these two plot styles is best depends on the data and is partly a matter of preference. When only two data series are plotted it is fairly easy to overlay them, but with more than two series subplotting quickly becomes favourable, as in the sketch below.
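As a rough illustration of how the subplot approach scales, the following sketch plots every series in the data list on its own subplot. It assumes data, data_types, and colors are the lists built in the cells above; the colour cycling and figure size are additions for this example, not part of the earlier code.

# Plot each data series on its own subplot, however many series there are
fig, axes = plt.subplots(len(data), 1, sharex=True, figsize=(10, 3 * len(data)))
axes = np.atleast_1d(axes)  # ensure axes is iterable even when there is only one series
for i, ax in enumerate(axes):
    data[i].loc[:, data_types[i]].plot(marker='o', linestyle=':', ax=ax, color=colors[i % len(colors)])
    ax.set_ylabel(data_types[i], rotation=0, labelpad=35, fontsize=12)
axes[-1].set_xlabel('Time', fontsize=12)
plt.show()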
Another useful dataset for observing volcanic activity is the CO2/SO2 ratio, as high values of this ratio can indicate a fresh batch of magma beneath a volcano. We will now look at the dataset of monthly airborne measurements of the two gases at White Island. As multiple collection methods exist for these data types, we will need to expand the build_query function to allow a methodID to be specified.
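If you are unsure which methodIDs are available for a given data type, the FITS API also exposes method information. The following is a minimal sketch only: it assumes a method endpoint at https://fits.geonet.org.nz/method that accepts a typeID parameter and returns JSON with a top-level 'method' list, so check the FITS API documentation if the response does not match this layout.

import requests
import pandas as pd

# Request the collection methods available for a data type (endpoint behaviour assumed as above)
response = requests.get('https://fits.geonet.org.nz/method', params={'typeID': 'SO2-flux-a'})
response.raise_for_status()
method_table = pd.DataFrame(response.json()['method'])  # assumes a 'method' list in the JSON
print(method_table.head())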
In [6]:
# Define functions
def build_query(site, data_type, method_type):
    '''
    Take site code, data type, and method type and generate a FITS API query for an observations csv file
    '''

    # Ensure parameters are in the correct format for use with the FITS API
    site = str.upper(site)  # ensure site code is upper case

    # Build a FITS API query by combining parameter=value pairs in the query format
    query_suffix = 'siteID=%s&typeID=%s&methodID=%s' % (site, data_type, method_type)

    # Combine the query parameter=value string with the FITS observation data URL
    URL = 'https://fits.geonet.org.nz/observation?' + query_suffix

    return URL
Now we will gather the data, make a few data-specific modifications, and then present it. If you change the data types used here, the code will fail: it contains hard-coded variable redefinitions that require this particular combination of data types. Consider this code segment an example rather than a modifiable script like the other segments of this notebook.
In [7]:
# Redefine variables
sites = ['WI000','WI000']
data_types = ['SO2-flux-a', 'CO2-flux-a']
method_types = ['cont', 'cont']
# Check that each site has a corresponding data type and vice versa
if (len(sites) != len(data_types)) or (len(sites) != len(method_types)) or (len(data_types) != len(method_types)):
    print('Number of sites, data types, and collection methods are not all equal!')

# Create a list to store DataFrame objects in
data = [[] for i in range(len(sites))]

# Parse csv data from the FITS database into the data list
for i in range(len(sites)):
    URL = build_query(sites[i], data_types[i], method_types[i])  # FITS API query building function
    try:
        data[i] = pd.read_csv(URL, names=['date-time', data_types[i], 'error'], header=0, parse_dates=[0], index_col=0)
    except Exception:
        print('Site or data type does not exist')
# Remove non-synchronous measurements in the two data series
data[0] = data[0][data[0].index.isin(data[1].index)]
data[1] = data[1][data[1].index.isin(data[0].index)]
# Hard-code in the ratio calculation
ratio = pd.DataFrame() # make an empty DataFrame
ratio['value'] = data[1]['CO2-flux-a'] / data[0]['SO2-flux-a'] # calculate the ratio between observations and call it value
ratio.index = data[1].index # the ratio index is the CO2 flux index (observation times)
# Plot the dataset
ratio.loc[:,'value'].plot(marker='o', linestyle=':', color='blue')
# Add functional aspects to plot
plt.xlabel('Time', fontsize = 12)
plt.ylabel('Ratio', fontsize = 12)
plt.title('Ratio of ' + data_types[0] + ' and ' + data_types[1] + ' for site ' + sites[0], fontsize = 12)
plt.show()
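If you do want a version of the ratio calculation that is not hard-coded to these two data types, one option is to wrap it in a small helper function. The following is a minimal sketch, assuming both DataFrames are indexed by observation time and use the column names produced by the code above; observation_ratio is a hypothetical helper introduced here, not part of the previous notebooks.

def observation_ratio(numerator, denominator, numerator_column, denominator_column):
    '''
    Divide one observation series by another, keeping only observation times
    that appear in both DataFrames (both assumed to be indexed by date-time).
    '''
    common_times = numerator.index.intersection(denominator.index)
    ratio = pd.DataFrame(index=common_times)
    ratio['value'] = (numerator.loc[common_times, numerator_column]
                      / denominator.loc[common_times, denominator_column])
    return ratio

# Equivalent to the hard-coded calculation above:
# ratio = observation_ratio(data[1], data[0], 'CO2-flux-a', 'SO2-flux-a')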