In [1]:
import pandas as pd
import numpy as np

import latools as la

from IPython.display import HTML

from comparison_tools import helpers, stats, plots

import matplotlib.pyplot as plt
%matplotlib inline

Data Collection Parameters


In [2]:
HTML(filename="./Parameter_Tables/fossil_forams.html")


Out[2]:
Parameter Value
Laser Instrument Photon Machines G2
Energy Samples: 1.46 J cm-2
NIST: 3.08 J cm-2
Wavelength 193 nm
Pulse Length 4 nm
Repetition Rate 4 Hz
Spot Size 30x30 μm square
Beam Optics Optical Attenuator
Interface Sample Cell Type 2 Volume Helex
Gas Flows Not Recorded
Transfer Devices Squid
Mass Spec Type Agilent 7700x ICP-MS
24Mg 20 ms
25Mg 30 ms
27Al 10 ms
43Ca 30 ms
44Ca 20 ms
55Mn 20 ms
88Sr 30 ms
138Ba 60 ms
Cycle Time 236.2 ms

LAtools Data Processing

First, import the data and perform the basic processing steps and calibration.


In [3]:
dat = la.analyse('./raw_data/downcore_foram', srm_identifier='610')


Loading Data:  14%|█▍        | 8/58 [00:00<00:00, 77.17it/s]
Starting analysis using "UCD-AGILENT" configuration:
Loading Data: 100%|██████████| 58/58 [00:00<00:00, 127.26it/s]
  58 Data Files Loaded: 6 standards, 52 samples
  Analytes: Mg24 Mg25 Al27 Ca43 Ca44 Mn55 Sr88 Ba138
  Internal Standard: Ca43


In [4]:
sample = '412_1'
_ = dat.data[sample].tplot()


Data Processing


In [5]:
dat.despike()


Despiking: 100%|██████████| 58/58 [00:00<00:00, 809.94it/s]

In [6]:
_ = dat.data[sample].tplot()



In [7]:
dat.autorange(off_mult=[1, 3], on_mult=[3, 1], gwin=7)
# 'gwin' increases the size of the window used to calculate the gradient.
# Increasing it effectively broadens the widths of the transitions.


AutoRange: 100%|██████████| 58/58 [00:04<00:00, 14.11it/s]

In [8]:
_ = dat.data[sample].tplot(ranges=True)



In [9]:
dat.bkg_calc_weightedmean(weight_fwhm=1000, bkg_filter=True)

In [10]:
dat.bkg_plot()


Plotting backgrounds: 100%|██████████| 8/8 [00:00<00:00, 15.48it/s]
Out[10]:
(<Figure size 626.4x360 with 1 Axes>,
 <matplotlib.axes._axes.Axes at 0x7f36f53cf908>)

the gap in analyses centered aroun 2000s makes the uncertainties in the background increase a lot. Don't worry about this, as there are no data in this region, so these background will not be used.


In [11]:
dat.bkg_subtract()


Background Subtraction: 100%|██████████| 58/58 [00:04<00:00, 12.78it/s]

In [12]:
_ = dat.data[sample].tplot(ranges=True)



In [13]:
dat.ratio()


Ratio Calculation: 100%|██████████| 58/58 [00:02<00:00, 20.54it/s]

In [14]:
_ = dat.data[sample].tplot(ranges=True)



In [15]:
dat.calibrate(srms_used='NIST610')


Applying Calibrations: 100%|██████████| 58/58 [00:06<00:00,  8.50it/s]

In [16]:
_ = dat.calibration_plot()



In [17]:
_ = dat.data[sample].tplot(ranges=True)



In [18]:
# make plots of all samples
dat.trace_plots()


Drawing Plots: 100%|██████████| 58/58 [00:17<00:00,  3.63it/s]

Data Filtering


In [19]:
fig, axs = dat.crossplot()


Drawing Plots: 100%|██████████| 21/21 [00:00<00:00, 117.48it/s]

In [20]:
dat.filter_clear()

Threshold Filters

Our 'first pass' filters are based on element concentrations, with thresholds defined from a-priori knowledge of reasonable O. universa compositions from culture studies. We'll calculate filters for Al/Ca = 0.1 mmol/mol, Mn/Ca = 0.2 mmol/mol and Ba/Ca = 10 $\mu$mol/mol.


In [21]:
# calculate concentration thresholds, based on maximum feasible compositions from cultured O universa.
dat.filter_threshold('Al27', 0.1e-3)  # remove Al/Ca > 100 mmol/mol
dat.filter_threshold('Mn55', 0.2e-3)  # remove Mn/Ca > 200 mmol/mol
dat.filter_threshold('Ba138', 10e-6)  # remove Ba/Ca > 10 umol/mol


Threshold Filter: 100%|██████████| 52/52 [00:00<00:00, 2828.77it/s]
Threshold Filter: 100%|██████████| 52/52 [00:00<00:00, 2645.51it/s]
Threshold Filter: 100%|██████████| 52/52 [00:00<00:00, 2431.78it/s]

Next, we know that certain elements, like Ba and Sr, should be homogeneous throughout the test. We can remove regions where they aren't homogeneous using a gradient filter. But what gradients should we use in the filter? The specific values are obviously depenent on factors like the size of the moving-window used to calculate the gradient, so the best way to work out gradient cutoffs is is to actually look at the gradients.


In [22]:
dat.gradient_histogram(['Ba138', 'Sr88'], win=21)


Calculating Gradients: 100%|██████████| 52/52 [00:08<00:00,  6.34it/s]
Out[22]:
(<Figure size 864x180 with 4 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f36b2b6e390>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f36b2a80630>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f36b2a9b908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f36b2a39e48>],
       dtype=object))

The high and low tails of these histograms represent the variable regions of the data, and we can see that reasonable thresholds would be a Ba/Ca of ~230 nmol/mol/s, and Sr/Ca of ~50 umol/mol/s. Let's calculate these:


In [23]:
# calculate gradient thresholds (note all units are mol/mol/s)
dat.filter_gradient_threshold('Ba138', 250e-9, win=21)
dat.filter_gradient_threshold('Sr88', 50e-6, win=21)


Gradient Threshold Filter: 100%|██████████| 52/52 [00:04<00:00, 12.29it/s]
Gradient Threshold Filter: 100%|██████████| 52/52 [00:04<00:00, 13.52it/s]

Now we have a range of calculated filters, but none of them are active yet.


In [24]:
dat.filter_status()


Subset: All Samples

n  Filter Name            Mg24   Mg25   Al27   Ca43   Ca44   Mn55   Sr88   Ba138  
0  Al27_thresh_below      False  False  False  False  False  False  False  False  
1  Al27_thresh_above      False  False  False  False  False  False  False  False  
2  Mn55_thresh_below      False  False  False  False  False  False  False  False  
3  Mn55_thresh_above      False  False  False  False  False  False  False  False  
4  Ba138_thresh_below     False  False  False  False  False  False  False  False  
5  Ba138_thresh_above     False  False  False  False  False  False  False  False  
6  Ba138_gthresh_below    False  False  False  False  False  False  False  False  
7  Ba138_gthresh_above    False  False  False  False  False  False  False  False  
8  Sr88_gthresh_below     False  False  False  False  False  False  False  False  
9  Sr88_gthresh_above     False  False  False  False  False  False  False  False  

Creating Useful Filters

Threshold filters alone can be inherently 'fragmented' - i.e. if a region of data oscilates either side of the threshold value, this will produce lots of filter 'fragments'. These aren't particularly useful, and we should remove them. These 'fragmented' regions can be either removed or excluded, using the filter_defragment function.

Threshold filters are also rather naive. For example, we know that in spot measurements that ablate into the sample there is a degree of signal mixing from the side walls of the ablation pit. Therefore, as soon as you hit a contaminant phase, there will be trace of this contaminant signal in all data collected after you've hit it. To get around this, we can exclude data thatare 'down-hole' of contaminant signals. We can do this using the filter_exclude_downhole function.

Filter Selection

In these samples, there are two types of elements - those that we know whould be homogeneous (Ba, Sr, Mn, Al) and those that we are expecting to be heterogeneous (Mg). By looking at the trace plots, it's evident that the main source of Mg variability is natural heterogeneity, as there's relatively little correlation with the contaminant indicators (Mn, Al, Ba). Therefore, we're going to use different filter combinations for these types of elements.

For homogeneous-type elements, we'll apply all the filters, defragment, and exclude down-hole regions.|


In [25]:
# make a filter for contaminant-prone elements
dat.filter_defragment(7, filt='Al27_thresh_below & Mn55_thresh_below & Ba138_thresh_below & Ba138_gthresh_below & Sr88_gthresh_below')
dat.filter_exclude_downhole(5, '8_defrag_include_7')
# activate filter
dat.filter_on('9_downhole_excl_5', ['Ba138', 'Al27', 'Mn55'], show_status=True)


Subset: All Samples

n  Filter Name            Mg24   Mg25   Al27   Ca43   Ca44   Mn55   Sr88   Ba138  
0  Al27_thresh_below      False  False  False  False  False  False  False  False  
1  Al27_thresh_above      False  False  False  False  False  False  False  False  
2  Mn55_thresh_below      False  False  False  False  False  False  False  False  
3  Mn55_thresh_above      False  False  False  False  False  False  False  False  
4  Ba138_thresh_below     False  False  False  False  False  False  False  False  
5  Ba138_thresh_above     False  False  False  False  False  False  False  False  
6  Ba138_gthresh_below    False  False  False  False  False  False  False  False  
7  Ba138_gthresh_above    False  False  False  False  False  False  False  False  
8  Sr88_gthresh_below     False  False  False  False  False  False  False  False  
9  Sr88_gthresh_above     False  False  False  False  False  False  False  False  
10 defrag_include_7       False  False  False  False  False  False  False  False  
11 downhole_excl_5        False  False  True   False  False  True   False  True   


In [26]:
_ = dat.data[sample].tplot(filt=True)


For heterogeneous Mg, which seems to be relatively insensitive to the other 'contaminants', we're only going to exclude high Al/Ca regions, which is indicative of ablating through the test into the underlying Al-rich carbon tape they are mounted on.


In [27]:
# make a filter for Mg (less strict)
dat.filter_defragment(15, filt='Al27_thresh_below')
dat.filter_exclude_downhole(5, '12_defrag_include_15')
# # activate filter
dat.filter_on('13_exclude_downhole_5', ['Mg24', 'Mg25', 'Sr88'], show_status=True)


Subset: All Samples

n  Filter Name            Mg24   Mg25   Al27   Ca43   Ca44   Mn55   Sr88   Ba138  
0  Al27_thresh_below      False  False  False  False  False  False  False  False  
1  Al27_thresh_above      False  False  False  False  False  False  False  False  
2  Mn55_thresh_below      False  False  False  False  False  False  False  False  
3  Mn55_thresh_above      False  False  False  False  False  False  False  False  
4  Ba138_thresh_below     False  False  False  False  False  False  False  False  
5  Ba138_thresh_above     False  False  False  False  False  False  False  False  
6  Ba138_gthresh_below    False  False  False  False  False  False  False  False  
7  Ba138_gthresh_above    False  False  False  False  False  False  False  False  
8  Sr88_gthresh_below     False  False  False  False  False  False  False  False  
9  Sr88_gthresh_above     False  False  False  False  False  False  False  False  
10 defrag_include_7       False  False  False  False  False  False  False  False  
11 downhole_excl_5        False  False  True   False  False  True   False  True   
12 defrag_include_15      False  False  False  False  False  False  False  False  
13 downhole_excl_5        True   True   False  False  False  False  True   False  


In [28]:
_ = dat.data[sample].tplot(filt=True)


We now have two filters, one active for the homogeneous elements, and one for the heterogeneous Mg/Ca. This will do a pretty good job of selecting appropriate data regions.

Let's check what the filters are actually doing:


In [29]:
dat.trace_plots([a for a in dat.analytes if 'Ca' not in a], filt=True)


Drawing Plots: 100%|██████████| 58/58 [00:13<00:00,  4.53it/s]

In these plots, data that are excluded by filters are greyed out.

The filters are doign a pretty good job, but there's still some variability in the Ba, which suggests some contaminant signals have not been removed.

To fix this, we can go one step further:

Optimise Selection

Ba/Ca is particularly contaminant-prone in these samples. To remove these contaminants, we will optimise the data selection such that both the mean concentration and RSD of Ba/Ca is minimised. This should identify contaminant-free regions.

We apply this optimised selection to Ba/Ca, Al/Ca, Mn/Ca and Sr/Ca. Sr/Ca is not commonly treated as a contaminant-prone element. However, we know from culture work that 'pristine' Sr/Ca should be homogeneous through the test. Looking at the traces for these samples, Sr/Ca exhibits distinct variability, which tends to correlate with Ba/Ca, Mn/Ca and Al/Ca, and increase with ablation time. Thus, it seems that Sr/Ca is prone to contamination in these samples, and we will select it using our contaminant filter.

(This takes a few minutes to run.)


In [30]:
# Optimise Ba138 selection to minimise mean and rsd
dat.optimise_signal(['Ba138'], 9, x_bias=0.2)


Optimising Data selection: 100%|██████████| 52/52 [03:55<00:00,  4.34s/it]
A Few Problems:
412_8_2: Optmisation failed. No contiguous data regions longer than 9 points.
412_11_2: Optmisation failed. No contiguous data regions longer than 9 points.
412_11_3: Optmisation failed. No contiguous data regions longer than 9 points.
412_14_1: Optmisation failed. No contiguous data regions longer than 9 points.
412_14_3: optimisation failed using threshold_mode='kde_first_max', falling back to 'kde_max'
412_17_3: Optmisation failed. No contiguous data regions longer than 9 points.
414_1_2: Optmisation failed. No contiguous data regions longer than 9 points.
414_12_1: Optmisation failed. No contiguous data regions longer than 9 points.
414_12_2: Optmisation failed. No contiguous data regions longer than 9 points.
414_12_3: Optmisation failed. No contiguous data regions longer than 9 points.
414_12_4: Optmisation failed. No contiguous data regions longer than 9 points.
414_13_4: Optmisation failed. No contiguous data regions longer than 9 points.
424_1_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_1_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_2_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_3_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_4_3: Optmisation failed. No contiguous data regions longer than 9 points.
424_5_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_5_3: optimisation failed using threshold_mode='kde_first_max', falling back to 'kde_max'
424_9_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_11_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_11_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_11_3: Optmisation failed. No contiguous data regions longer than 9 points.
424_11_4: Optmisation failed. No contiguous data regions longer than 9 points.
424_12_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_14_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_15_2: Optmisation failed. No contiguous data regions longer than 9 points.
424_19_1: Optmisation failed. No contiguous data regions longer than 9 points.
424_20_2: Optmisation failed. No contiguous data regions longer than 9 points.

  *** Check Optimisation Plots ***

A few warnings here, telling us that it can't find a good region for some of the samples. A good idea to have a look at the optimisation plots, to check things look sensible.


In [31]:
dat.optimisation_plots()


Drawing Plots:  65%|██████▌   | 34/52 [00:52<00:28,  1.59s/it]
Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=9, top=9.0
Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=9, top=9.0
Drawing Plots: 100%|██████████| 52/52 [01:15<00:00,  1.40s/it]

In [32]:
dat.filter_on('optimise_Ba138', ['Ba138'], show_status=True)


Subset: All Samples

n  Filter Name            Mg24   Mg25   Al27   Ca43   Ca44   Mn55   Sr88   Ba138  
0  Al27_thresh_below      False  False  False  False  False  False  False  False  
1  Al27_thresh_above      False  False  False  False  False  False  False  False  
2  Mn55_thresh_below      False  False  False  False  False  False  False  False  
3  Mn55_thresh_above      False  False  False  False  False  False  False  False  
4  Ba138_thresh_below     False  False  False  False  False  False  False  False  
5  Ba138_thresh_above     False  False  False  False  False  False  False  False  
6  Ba138_gthresh_below    False  False  False  False  False  False  False  False  
7  Ba138_gthresh_above    False  False  False  False  False  False  False  False  
8  Sr88_gthresh_below     False  False  False  False  False  False  False  False  
9  Sr88_gthresh_above     False  False  False  False  False  False  False  False  
10 defrag_include_7       False  False  False  False  False  False  False  False  
11 downhole_excl_5        False  False  True   False  False  True   False  True   
12 defrag_include_15      False  False  False  False  False  False  False  False  
13 downhole_excl_5        True   True   False  False  False  False  True   False  
14 optimise_Ba138         False  False  False  False  False  False  False  True   


In [33]:
_ = dat.data[sample].tplot(filt=True)



In [34]:
dat.sample_stats(stats=['mean'])


Calculating Stats: 100%|██████████| 58/58 [00:00<00:00, 113.14it/s]

In [35]:
ld = dat.getstats().loc['mean'] * 1e3

Save minimal export

This produces a .zip file containing all the raw data, the SRM values you used, and a record of everything you did to the data.

the zip file can be opened with latools.reproduce, to recreate your entire analysis.


In [36]:
dat.minimal_export(path='raw_data/downcore_foram_export/minimal_export.zip')

Data Comparison


In [37]:
# load reference and test data
rd = helpers.load_reference_data('downcore_reference')
td = helpers.load_reference_data('downcore_test')

# combine all data
df = rd.join(td, lsuffix='_r', rsuffix='_t')
df = df.join(ld)

In [38]:
# number of unique samples
np.unique(rd.index.levels[0]).size


Out[38]:
53

Characterise inter-replicate reproducibility in Reference data


In [39]:
# in reference data
_, rep_dists, rep_stats, _ = stats.pairwise_reproducibility(rd, plot=True)



In [40]:
rep = pd.DataFrame(rep_stats).T
rep.columns = ['50%', '95%']
rep = rep.loc[['Mg/Ca', 'Sr/Ca', 'Ba/Ca', 'Al/Ca', 'Mn/Ca'],:]

In [41]:
rep.to_csv('reproducibility_quants.csv')

In [42]:
# in test data
_ = stats.pairwise_reproducibility(rd, plot=True)



In [43]:
# in latools data
_ = stats.pairwise_reproducibility(ld, plot=True)


Compare Data


In [44]:
fig, axs = plots.bland_altman_plots(df, rep_stats)

fig.savefig('Figures/downcore_comparison.pdf')
fig.savefig('Figures/downcore_comparison.png', dpi=200)



In [45]:
# fig, axs = plots.comparison_plots(df)

In [46]:
# fig, axs = plots.residual_plots(df, rep_stats)

Comparison Statistics


In [47]:
stat = stats.comparison_stats(df)

In [48]:
stat.to_csv('Stats/downcore_stats.csv')

In [49]:
stat['Test User']


Out[49]:
Residual Summary Residual Regression Kolmogorov-Smirnov
N Median LQ IQR UQ Slope Slope t Slope p Intercept Intercept t Intercept p R2 KS p
Mg/Ca 149 0.278483 0.0947975 0.499073 0.593871 0.0273585 1.20041 0.231912 0.260619 1.87873 0.0622613 0.00970743 0.167785 0.0264765
Sr/Ca 149 -0.00562857 -0.0196323 0.0266532 0.00702097 -0.12196 -3.39093 0.00089479 0.164621 3.28086 0.00129257 0.072546 0.127517 0.164421
Ba/Ca 149 -0.072966 -0.106512 0.189704 0.0831927 0.375683 2.07146 0.0400641 -0.465135 -1.50099 0.135503 0.0283621 0.0939597 0.50647
Al/Ca 149 0.0336077 0.00764255 0.107416 0.115058 0.411013 4.54974 1.11622e-05 0.0554501 3.40844 0.000843259 0.123435 0.248322 0.000153847
Mn/Ca 149 0.0107187 -0.00044312 0.0384314 0.0379883 -0.0649723 -3.1429 0.00202379 0.0289588 5.34251 3.41845e-07 0.0629651 0.228188 0.000671689

In [50]:
stat['LAtools']


Out[50]:
Residual Summary Residual Regression Kolmogorov-Smirnov
N Median LQ IQR UQ Slope Slope t Slope p Intercept Intercept t Intercept p R2 KS p
Mg/Ca 140 0.0162506 -0.32826 0.55705 0.228791 -0.0751463 -1.67476 0.0962464 0.408752 1.51985 0.130837 0.0199199 0.0571429 0.972362
Sr/Ca 140 -0.012362 -0.0468864 0.050416 0.00352955 -0.0832921 -1.18669 0.237389 0.0879236 0.896282 0.371663 0.0101015 0.178571 0.0199533
Ba/Ca 123 -0.0612877 -0.243261 0.289596 0.0463353 -0.320938 -7.03437 1.29397e-10 0.392867 5.23268 7.11755e-07 0.290249 0.170732 0.0490051
Al/Ca 133 -0.0056361 -0.043066 0.0494936 0.00642762 -0.896963 -61.7919 9.10544e-99 0.0333512 13.9097 1.41613e-27 0.966829 0.24812 0.000424295
Mn/Ca 133 -0.0129255 -0.0335687 0.0301464 -0.00342228 -0.563056 -11.8196 2.1683e-22 0.027133 5.74961 5.98648e-08 0.516074 0.293233 1.4807e-05