In [1]:
import pandas as pd
import numpy as np
import latools as la
from IPython.display import HTML
from comparison_tools import helpers, stats, plots
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
HTML(filename="./Parameter_Tables/fossil_forams.html")
Out[2]:
In [3]:
dat = la.analyse('./raw_data/downcore_foram', srm_identifier='610')
In [4]:
sample = '412_1'
_ = dat.data[sample].tplot()
In [5]:
dat.despike()
In [6]:
_ = dat.data[sample].tplot()
In [7]:
dat.autorange(off_mult=[1, 3], on_mult=[3, 1], gwin=7)
# 'gwin' increases the size of the window used to calculate the gradient.
# Increasing it effectively broadens the widths of the transitions.
In [8]:
_ = dat.data[sample].tplot(ranges=True)
In [9]:
dat.bkg_calc_weightedmean(weight_fwhm=1000, bkg_filter=True)
In [10]:
dat.bkg_plot()
Out[10]:
The gap in analyses centred around 2000 s makes the uncertainties in the background increase a lot. Don't worry about this: there are no data in this region, so these backgrounds will not be used.
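To see why gaps inflate the uncertainty, it helps to look at how an FWHM-weighted mean behaves: far from any background measurement, all weights become small, so the estimate is poorly constrained. Below is a minimal sketch of a Gaussian-weighted mean (an illustration only, not the latools implementation; the arrays `t_bkg`, `v_bkg` and `t_eval` are hypothetical):

```python
import numpy as np

def gauss_weighted_mean(t_bkg, v_bkg, t_eval, fwhm=1000):
    """Estimate background at times t_eval by weighting each background
    measurement (t_bkg, v_bkg) with a Gaussian of the given FWHM."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))  # convert FWHM to std dev
    out = np.empty_like(t_eval, dtype=float)
    for i, t in enumerate(t_eval):
        w = np.exp(-0.5 * ((t_bkg - t) / sigma) ** 2)
        out[i] = np.sum(w * v_bkg) / np.sum(w)  # weights shrink with distance
    return out
```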
In [11]:
dat.bkg_subtract()
In [12]:
_ = dat.data[sample].tplot(ranges=True)
In [13]:
dat.ratio()
In [14]:
_ = dat.data[sample].tplot(ranges=True)
In [15]:
dat.calibrate(srms_used='NIST610')
In [16]:
_ = dat.calibration_plot()
In [17]:
_ = dat.data[sample].tplot(ranges=True)
In [18]:
# make plots of all samples
dat.trace_plots()
In [19]:
fig, axs = dat.crossplot()
In [20]:
dat.filter_clear()
In [21]:
# calculate concentration thresholds, based on maximum feasible compositions of cultured O. universa.
dat.filter_threshold('Al27', 0.1e-3)  # remove Al/Ca > 100 umol/mol
dat.filter_threshold('Mn55', 0.2e-3)  # remove Mn/Ca > 200 umol/mol
dat.filter_threshold('Ba138', 10e-6)  # remove Ba/Ca > 10 umol/mol
Next, we know that certain elements, like Ba and Sr, should be homogeneous throughout the test. We can remove regions where they aren't homogeneous using a gradient filter. But what gradient thresholds should we use? The specific values are obviously dependent on factors like the size of the moving window used to calculate the gradient, so the best way to work out gradient cutoffs is to actually look at the gradients.
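To make the window-size dependence concrete, here's a minimal sketch of a moving-window gradient (an illustration only, not the latools implementation): the slope of a least-squares line fitted to each `win`-point window. A larger `win` smooths the gradient, which is why the cutoffs only make sense for a given window size.

```python
import numpy as np

def rolling_gradient(t, y, win=21):
    """Least-squares slope of y vs t in a centred moving window.
    Half a window at each edge is returned as NaN."""
    half = win // 2
    grad = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        ts = t[i - half:i + half + 1]
        ys = y[i - half:i + half + 1]
        grad[i] = np.polyfit(ts, ys, 1)[0]  # slope, in y-units per t-unit
    return grad
```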
In [22]:
dat.gradient_histogram(['Ba138', 'Sr88'], win=21)
Out[22]:
The high and low tails of these histograms represent the variable regions of the data, and we can see that reasonable thresholds would be a Ba/Ca of ~230 nmol/mol/s, and Sr/Ca of ~50 umol/mol/s. Let's calculate these:
In [23]:
# calculate gradient thresholds (note all units are mol/mol/s)
dat.filter_gradient_threshold('Ba138', 250e-9, win=21)
dat.filter_gradient_threshold('Sr88', 50e-6, win=21)
Now we have a range of calculated filters, but none of them are active yet.
In [24]:
dat.filter_status()
Threshold filters alone can be inherently 'fragmented' - i.e. if a region of data oscillates either side of the threshold value, this will produce lots of filter 'fragments'. These aren't particularly useful, and we should remove them. These small 'fragments' can be removed (or small excluded gaps filled in) using the filter_defragment function.
Threshold filters are also rather naive. For example, we know that in spot measurements that ablate into the sample there is a degree of signal mixing from the side walls of the ablation pit. Therefore, as soon as you hit a contaminant phase, there will be a trace of this contaminant signal in all data collected afterwards. To get around this, we can exclude data that are 'down-hole' of contaminant signals, using the filter_exclude_downhole function.
In these samples, there are two types of elements - those that we know should be homogeneous (Ba, Sr, Mn, Al) and those that we expect to be heterogeneous (Mg). By looking at the trace plots, it's evident that the main source of Mg variability is natural heterogeneity, as there's relatively little correlation with the contaminant indicators (Mn, Al, Ba). Therefore, we're going to use different filter combinations for these two types of elements.
For homogeneous-type elements, we'll apply all the filters, defragment, and exclude down-hole regions.
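The logic of these two clean-up steps can be sketched on a boolean data-selection mask (a simplified illustration, not the latools code; `True` means a point is kept):

```python
import numpy as np

def defragment(mask, threshold):
    """Remove 'kept' runs shorter than `threshold` points from a boolean mask."""
    out = mask.copy()
    i, n = 0, len(out)
    while i < n:
        if out[i]:
            j = i
            while j < n and out[j]:
                j += 1
            if j - i < threshold:  # run too short to be a real signal region
                out[i:j] = False
            i = j
        else:
            i += 1
    return out

def exclude_downhole(mask, threshold):
    """Once `threshold` consecutive points are excluded, exclude all later
    data too (mimicking down-hole contamination of the ablation pit)."""
    out = mask.copy()
    run = 0
    for i, keep in enumerate(out):
        run = 0 if keep else run + 1
        if run >= threshold:
            out[i:] = False
            break
    return out
```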
In [25]:
# make a filter for contaminant-prone elements
dat.filter_defragment(7, filt='Al27_thresh_below & Mn55_thresh_below & Ba138_thresh_below & Ba138_gthresh_below & Sr88_gthresh_below')
dat.filter_exclude_downhole(5, '8_defrag_include_7')
# activate filter
dat.filter_on('9_downhole_excl_5', ['Ba138', 'Al27', 'Mn55'], show_status=True)
In [26]:
_ = dat.data[sample].tplot(filt=True)
For the heterogeneous Mg, which seems relatively insensitive to the other 'contaminants', we're only going to exclude high Al/Ca regions, which are indicative of ablating through the test into the underlying Al-rich carbon tape the samples are mounted on.
In [27]:
# make a filter for Mg (less strict)
dat.filter_defragment(15, filt='Al27_thresh_below')
dat.filter_exclude_downhole(5, '12_defrag_include_15')
# activate filter
dat.filter_on('13_exclude_downhole_5', ['Mg24', 'Mg25', 'Sr88'], show_status=True)
In [28]:
_ = dat.data[sample].tplot(filt=True)
We now have two filters, one active for the homogeneous elements, and one for the heterogeneous Mg/Ca. This will do a pretty good job of selecting appropriate data regions.
Let's check what the filters are actually doing:
In [29]:
dat.trace_plots([a for a in dat.analytes if 'Ca' not in a], filt=True)
In these plots, data that are excluded by filters are greyed out.
The filters are doing a pretty good job, but there's still some variability in the Ba, which suggests some contaminant signals have not been removed.
To fix this, we can go one step further:
Ba/Ca is particularly contaminant-prone in these samples. To remove these contaminants, we will optimise the data selection such that both the mean concentration and the RSD of Ba/Ca are minimised. This should identify contaminant-free regions.
We apply this optimised selection to Ba/Ca, Al/Ca, Mn/Ca and Sr/Ca. Sr/Ca is not commonly treated as a contaminant-prone element. However, we know from culture work that 'pristine' Sr/Ca should be homogeneous through the test. Looking at the traces for these samples, Sr/Ca exhibits distinct variability, which tends to correlate with Ba/Ca, Mn/Ca and Al/Ca, and increase with ablation time. Thus, it seems that Sr/Ca is prone to contamination in these samples, and we will select it using our contaminant filter.
(This takes a few minutes to run.)
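The idea behind this optimisation can be sketched as a brute-force search over contiguous windows (an illustration only: the score here, the product of mean and relative standard deviation, is a stand-in for the actual latools objective, and there is no equivalent of the `x_bias` parameter):

```python
import numpy as np

def optimise_window(y, min_points=9):
    """Return (start, stop) of the contiguous window of at least
    `min_points` that minimises mean * RSD of y (low and stable)."""
    best, best_score = None, np.inf
    n = len(y)
    for start in range(n):
        for length in range(min_points, n - start + 1):
            seg = y[start:start + length]
            m = seg.mean()
            score = m * (seg.std() / m)  # mean * relative standard deviation
            if score < best_score:
                best_score, best = score, (start, start + length)
    return best
```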
In [30]:
# Optimise Ba138 selection to minimise mean and rsd
dat.optimise_signal(['Ba138'], 9, x_bias=0.2)
A few warnings appear here, telling us that it can't find a good region for some of the samples. It's a good idea to look at the optimisation plots, to check things look sensible.
In [31]:
dat.optimisation_plots()
In [32]:
dat.filter_on('optimise_Ba138', ['Ba138'], show_status=True)
In [33]:
_ = dat.data[sample].tplot(filt=True)
In [34]:
dat.sample_stats(stats=['mean'])
In [35]:
ld = dat.getstats().loc['mean'] * 1e3
In [36]:
dat.minimal_export(path='raw_data/downcore_foram_export/minimal_export.zip')
In [37]:
# load reference and test data
rd = helpers.load_reference_data('downcore_reference')
td = helpers.load_reference_data('downcore_test')
# combine all data
df = rd.join(td, lsuffix='_r', rsuffix='_t')
df = df.join(ld)
In [38]:
# number of unique samples
np.unique(rd.index.levels[0]).size
Out[38]:
In [39]:
# in reference data
_, rep_dists, rep_stats, _ = stats.pairwise_reproducibility(rd, plot=True)
In [40]:
rep = pd.DataFrame(rep_stats).T
rep.columns = ['50%', '95%']
rep = rep.loc[['Mg/Ca', 'Sr/Ca', 'Ba/Ca', 'Al/Ca', 'Mn/Ca'],:]
In [41]:
rep.to_csv('reproducibility_quants.csv')
In [42]:
# in test data
_ = stats.pairwise_reproducibility(td, plot=True)
In [43]:
# in latools data
_ = stats.pairwise_reproducibility(ld, plot=True)
In [44]:
fig, axs = plots.bland_altman_plots(df, rep_stats)
fig.savefig('Figures/downcore_comparison.pdf')
fig.savefig('Figures/downcore_comparison.png', dpi=200)
In [45]:
# fig, axs = plots.comparison_plots(df)
In [46]:
# fig, axs = plots.residual_plots(df, rep_stats)
In [47]:
stat = stats.comparison_stats(df)
In [48]:
stat.to_csv('Stats/downcore_stats.csv')
In [49]:
stat['Test User']
Out[49]:
In [50]:
stat['LAtools']
Out[50]: