A python library called pySPM was developped in order to read and parse data from SPM and ToF-SIMS instruments. The first thing is to have python installed.
The quick way is to install anaconda: https://www.anaconda.com/download/ It will install python and all important packages for you.
I personally prefer to install the original python from https://www.python.org/downloads/
Make sure during the installation of python that you click the checkbox "add Python to PATH" or something similar.
The library can be installed with pip. On windows open a terminal (click on ⊞ Win+R and type cmd
). Inside the terminal type:
pip install pySPM
or if you already have it, you can update it to the latest stable version
pip install -U pySPM
The library is also available on github https://github.com/scholi/pySPM
OK, now that the library is installed we will need an interface to type python code. There is many different possibilities depending on the tase of the users. Please have a look at:
Probably the easiest way for newcomer and the best way for data analysis. If you installed python via anaconda you might find a shortcut for jupyter notebook. Otherwise get a windows terminal (click on ⊞ Win+R and type cmd
) and then type:
jupyter notebook
If an error occurs, please try to type pip install notebook
first.
This should opens a new page on your browser. Then click on the top right corner on the button "New ▼", then choose python 3.
This will open a new notebook where you can type python code and run it.
This tutorial is compatible with user having NO experience at all with python. So let's start with the basics.
Python understand each line as a statement. So there is no special character needed to terminate a command.
Tabulations or spaces defines block (we will see that later).
Finally and importantly everything written after a #
is treated as a comment till the end of the line.
Another important thing to undestand with python is a list. They are defined by the square brackets and can be of mixed types:
mylist = [1, 2, "test", 3.2]
You can acces the element of a list by it's index (first element has index 0, the second 1, …)
second_element = mylist[1]
print(second_element)
and the output is 2
Python have a limited number of function which can be runned and they can be extended by libraries, like the pySPM. In order to use them we should import them. There is three different ways of importing a library:
import library1
you can then access all functions of library1 as follow:
library1.func1()
for example in order to access the function func1 of library1
Sometime the name of the library is too long to type it all the time or the name might conflicts with other funtions. In this case you can import a library with an alias.
import librar2 as lib
you can then access all functions of library2 as follow:
lib.func1()
Sometime you might want to avoid to have to type the name of the library at all or you don't want to load the whole library, but only some modules.
from library3 import func1
from library3 import func2 as f
from library4 import *
so now you can dirrectly type func1()
in order to run the function func1 from library3, you can type f()
in order to run func2 from library3 and finally if library4 had a function called fun3, func4 and fun5, you can acces them by typing directly func3()
, func4()
or func5()
There are 3 main libraries that we are going to use in order to handle ToF-SIMS data. Those are:
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import pySPM
print(pySPM.__version__)
%matplotlib inline
Some dummy data are avaible with the pySPM_data library here: https://sftpweb.empa.ch/main.html?download&weblink=f5cddf602b6e8b20325b49a9ec61c48c&realfilename=pySPM_data-0.2.4-py3-none-any.whl You can install it with
pip install -U pySPM_data-0.2.4-py3-none-any.whl
In [2]:
from pySPM_data import get_data
Let's first focus on a simple spectra saved on an ITA file. For the example we are going to investigate the measurement performed on a Cysteine amino acid with Bi₃⁺ with a positive polarity. But you can change the filename with the data of your choice. Please not that in python, like most programming language, the backslash \
is treated as a special character and thus "\n" will be understood as a new line and not as "backslash n". If you run windows and want to copy paste the name of a file, add the letter r in front of the quotes as follow:
filename = r"C:\Users\ols\AppData\Local\Programs\Python\Python36\Lib\site-packages\pySPM_data\data\Cysteine_B3p_p_01_0.ita"
In [3]:
# Let's retriev the filename from the pySPM_Data library with the get_data command
filename = get_data("Cysteine_B3p_p_01_0.ita")
A = pySPM.ITA(filename)
In [4]:
A.show_summary()
You will get the measurement time in H:M:S as well as other important infomations. Then we get 3 to 4 plots on the top. The Secondary Ion image (total ions), then if available the "Video Snapshot" (not present in this particular measurement), then plots of the main parameters during the measurement if available. Those are:
On the very right you should get an image of the stage used with a red cross of the measurement position. The bottom plot should display the mass spectra.
Now let's have a deeper look at the spectra.
In case the mass callibration was perform on the iontof software, you can then plot the spectra directly:
In [5]:
# Create 3 subplots plots with a specific size. 21 seems to be the width of the notebook
fig, ax = plt.subplots(3, 1, figsize=(21, 3*3.5)) # ax is then a list of all axes
A.show_spectrum(ax=ax[0]) # plot it on top. With no other arguments, it plots the whole spectrum
A.show_spectrum(ax=ax[1], low=11.9, high=15.2) # We can zoom in by specifying the low and high limits
A.show_spectrum_around("C", ax=ax[2]); # We can zoom directly at a specific nominal mass with showSpectrumAround.
#It also displays known elements from a non-complete database
Let's try to figure out if the mass calibration was correctly performed. Let's look at well know peaks. The molecule Cysteine has formula HS-CH₂-CH-(NH₂)-COOH Instead of the nominal mass we can also use the formula
In [6]:
# Create 3 subplots plots with a specific size. 21 seems to be the width of the notebook
fig, ax = plt.subplots(2, 3, figsize=(21, 2*3.5)) # ax is then a list of all axes
ax = np.ravel(ax) # ax is a 2D array. Ravel makes it flat which allows us to type ax[0] instead of ax[0,0], ax[1] instead of ax[0,1], etc.
A.show_spectrum_around("C", ax=ax[0]);
A.show_spectrum_around("CH3", ax=ax[1]);
A.show_spectrum_around("HS", ax=ax[2]);
A.show_spectrum_around("CH3S", ax=ax[3]);
A.show_spectrum_around("COOH", ax=ax[4]);
A.show_spectrum_around("NH2COOH", ax=ax[5], include_only=["NH2COOH"]); # Sometimes there are too many peaks and you want to display only some
We can test if the peaks of various isotopes respect the abundancy ratio in order to check if the peak attribution was done correctly. Nevertheless we notice that the mass calibration is not optimale as the peak of C has a center of lower mass while CHS is the opposite.
Fortunately, the pySPM library has an auto callibration procedure to find the sf and k0 term for the mass callibration
In [7]:
fig, ax = plt.subplots(2, 2, figsize=(21,6))
sf, k0 = A.auto_mass_cal(fitting_peaks=["C","CH","CH2","CHS"])
A.show_spectrum_around(12, ax=ax[0,1], sf=sf, k0=k0);
A.show_spectrum_around(45.03, ax=ax[1,1], sf=sf, k0=k0, include=["^13CCH6N"]);
A.show_spectrum_around(12, ax=ax[0,0]);
A.show_spectrum_around(45.03, ax=ax[1,0], include=["^13CCH6N"]);
ax[0,0].set_title("With mass calibration from ITA file")
ax[0,1].set_title("With auto mass calibration");
In [8]:
fig, ax = plt.subplots(1, 2, figsize=(21, 3.5))
A.show_spectrum_around("CHS", ax=ax[0], include_only="CHS", sf=sf, k0=k0)
A.show_spectrum_around("CH^34S", ax=ax[1], include_only="CHS", sf=sf, k0=k0)
pySPM.utils.plot_isotopes("CHS", ax=ax[0]);
pySPM.utils.plot_isotopes("CHS", ax=ax[1], main=ax[0]); # main indicates which axis contains the main peak which will be used to fit amplitude and sig if not in ax
Here we can confirm that the peak CH₃S is well assigned as the amplitude of CH₃³⁴S match well with the isotopic abundancy
In [9]:
fig, ax = plt.subplots(1, 1, figsize=(21, 3.5))
v = A.show_spectrum_around(45.03, sf=sf, k0=k0, include=["^13CCH6N"], exclude=["BO2H2","C2FH2"], dofit=True, ax=ax);
You can now retrieve the fitting parameters (here the variable v). You can also use a pandas DataFram to work more efficiently with the result. Here the Peak intensity will be refected by the Area
In [10]:
import pandas as pd
values = pd.DataFrame(v).transpose()
values
Out[10]:
First we adjust slightly the mass callibration as it was performed only for low masses. Here sf-5. We look at the mass M+H as we measure with a positive polarity. Two molecules can also bind in otder to form cystine which has the form 2(M-H) thus in the positive we should see a peak at 2(M-2)+H=2M-H
In [11]:
from pySPM.utils import Molecule, H
M = pySPM.utils.Molecule("C3H7O2NS")
fig, (ax, ax2) = plt.subplots(2, 1, figsize=(21,7))
A.show_spectrum(low=121, high=125, ax=ax, label="Cycteine: M+H", sf=sf-5, k0=k0);
pySPM.utils.plot_isotopes(M+H, ax=ax);
ax.legend();
A.show_spectrum(low=240.5, high=243.5, ax=ax2, label="Cyctine: 2M-H", sf=sf-5, k0=k0);
pySPM.utils.plot_isotopes(2*M-H, ax=ax2);
ax2.legend();
print(A.polarity)
In [12]:
A.get_intensity().show();
In [13]:
fig, ax = plt.subplots(1,2,figsize=(15,7))
SI = A.get_intensity()
ROI1 = 1-SI.get_bin_threshold(.3)
ROI2 = SI.get_bin_threshold(.6)
SI.show(ax=ax[0],flip=True)
p1 = ax[1].imshow(ROI1, cmap='viridis')
p2 = pySPM.utils.plotMask(ax[1], ROI2, 'g')
lines = [
mpl.lines.Line2D([0], [0], lw=4, color=mpl.cm.viridis.colors[-1]),
mpl.lines.Line2D([0], [0], lw=4, color='g'),
mpl.lines.Line2D([0], [0], lw=4, color=mpl.cm.viridis.colors[0])]
ax[1].legend(lines, ["ROI1", "ROI2","Ignored"],loc='upper center');
In [14]:
AI = pySPM.ITM(get_data("Cysteine_B3p_p_01.itm"))
sf, k0 = A.auto_mass_cal()
try:
# Reconstruction is a slow porcess, so the data are stored in order to speed up the running speed of this notebook
res = pySPM.utils.load("Intro_pySPM_tofsims", 'Cysteine_ROI')
except:
res = AI.get_raw_spectrum(ROI=[ROI1, ROI2], prog=True, sf=sf, k0=k0)
pySPM.utils.save("Intro_pySPM_tofsims", Cysteine_ROI=res)
In [15]:
# Spectrum of ROI1/ROI2 normalized by the number of pixels it contains
m = res[0]
s1 = res[1][:, 0]/np.sum(ROI1)
s2 = res[1][:, 1]/np.sum(ROI2)
fig, ax = plt.subplots(3,1, figsize=(21,7))
mask1 = s1>s2
mask2 = s2>s1
ax[0].plot(m[mask1], s1[mask1], label="ROI1")
ax[0].plot(m[mask2], s2[mask2], alpha=.5, label="ROI2")
ax[0].set_xlim((0, 130));
ax[0].axvline(pySPM.utils.get_mass("Si"), color='r', ls=':');
ax[0].axvline(pySPM.utils.get_mass("C3H8O2NS"), color='r', ls=':');
ax[0].legend()
pySPM.utils.show_peak(m, s2, 28, .05, ax=ax[1], curve_color='C1', label="ROI2");
pySPM.utils.show_peak(m, s1, 28, .05, ax=ax[1], showElts=False, curve_color='C0', label="ROI1");
ax[1].legend()
pySPM.utils.show_peak(m, s2, "C3H8O2NS", .05, ax=ax[2], curve_color='C1', label="ROI2");
pySPM.utils.show_peak(m, s1, "C3H8O2NS", .05, ax=ax[2], showElts=False, curve_color='C0', label="ROI1");
ax[2].legend();
In [16]:
filename2 = get_data("BigSmiley.ita")
B = pySPM.ITA(filename2)
In [17]:
B.show_summary()
We see a list of peaks above. We can also retrieve only the peak list as follow:
In [18]:
B.show_masses()
the ITA file save one image per scan and per channel as well as the sum of all sacan for each channel. The sum of all scan can be retrieved by the getAddedImage functions. Either by mass, by name or by channel ID. ⚠ If you have two overlapping peaks with the same mass, only the first one found will be selected.
The getAddedImageByName is a search function and will use all channels having the pattern given in argument.
In the example below you can see that A.getAddedImageByName("Si")
will return the Si AND the SIH channels.
If one whish to has only the Si, the argument strict=True
can be passed to the function
In [19]:
O = B.get_added_image_by_mass(16)
Six, ch1 = B.get_added_image_by_name("Si") # retrieve the image for Si and the list of all channel selected
Si, ch2 = B.get_added_image_by_name("Si", strict=True)
print("=== channels for image Six ===")
B.show_masses(ch1)
print("\n=== Channels for image Si ===")
B.show_masses(ch2)
fig, ax = plt.subplots(1,3,figsize=(21,7))
O.show(ax=ax[0])
Six.show(ax=ax[1], flip=True); # flip=True, flips the image upside-down
Si.show(ax=ax[2], flip=True, cmap='hot'); # cmap can be used to define the colormap. Try 'hot', 'inferno', 'viridis' and 'gray' are the most common ones
In [20]:
col = pySPM.ITA_collection(filename2)
col.show(ncols=5)
In [21]:
overlay, rgb = col.overlay(["CHO_2-","Au-","Cl-","Si-"], colors=[[1,0,0],[0,1,0],[0,0,1],[.5,.5,.5]])
fig, ax = plt.subplots(1,5,figsize=(21,5))
overlay.show(ax=ax[0], flip=True);
for i, ch in enumerate(rgb):
ch.show(ax=ax[i+1], flip=True);
In [22]:
fig, ax = plt.subplots(1,3,figsize=(21,7))
Au = col['Au-']
Au.show(ax=ax[0], flip=True, cmap='viridis')
Au.plot_profile(20,20,80,80,ax=ax[1],img=ax[0],pixels=False);
p = Au.plot_profile(45,30,70,45,ax=ax[2],img=ax[0],pixels=False,label="profile");
p0 = [0, 54, 6, .5, -54, 18, 1]
ax[2].plot(p['l'], pySPM.utils.fit.CDF(p['l'], *p0), 'C2:', label="hand pre-fit")
p0, _ = pySPM.utils.fit.CDF_fit(p['l'], p['z'], p0)
ax[2].plot(p['l'], pySPM.utils.fit.CDF(p['l'], *p0), 'r', label="fit")
ax[2].legend();
# pySPM.const.gfact is the factor between the σ value and the FWHM of a Gaussian
print("FWHM left-edge:{:.1f}nm right-edge: {:.1f}nm".format(p0[3]*1000*pySPM.const.gfact, p0[6]*1000*pySPM.const.gfact))
As we can see here we have a different edge profile on the left and on the right. Maybe this is due to thermal drift? Let's investigate that by looking at a cross-secion per scan
In [23]:
fig, ax = plt.subplots(2, 3, figsize=(21, 9))
elt = "Au"
B.get_added_image_by_name(elt)[0].show(ax=ax[0,0], flip=True, cmap='viridis', pixels=True);
x1, y1 = 50, 320
x2, y2 = 400, y1
xsec = B.get_xsection_by_mass(x1, y1, x2, y2, pySPM.utils.get_mass(elt), ax=ax[0,0]);
ax[1,0].imshow(xsec)
ax[1,0].set_xlabel("distance along profile [px]")
ax[1,0].set_ylabel("Scan number");
elt = "CHO_2"
B.get_added_image_by_name(elt)[0].show(ax=ax[0,1], flip=True, cmap='viridis', pixels=True);
xsec = B.get_xsection_by_mass(x1, y1, x2, y2, pySPM.utils.get_mass(elt), ax=ax[0,1]);
ax[1,1].imshow(xsec)
ax[1,1].set_xlabel("distance along profile [px]")
ax[1,1].set_ylabel("Scan number");
elt = "Cl"
B.get_added_image_by_name(elt)[0].show(ax=ax[0,2], flip=True, cmap='viridis', pixels=True);
xsec = B.get_xsection_by_mass(x1, y1, x2, y2, pySPM.utils.get_mass(elt), ax=ax[0,2]);
ax[1,2].imshow(xsec)
ax[1,2].set_xlabel("distance along profile [px]")
ax[1,2].set_ylabel("Scan number");
We don't see a real drift during the measurement. We can see that the asymmetry is stronger for the first few scans for the Au channel than for the rest. Let's investigate it by performing a circular_profile
In [24]:
fig, ax = plt.subplots(1, 3, figsize=(21, 7))
Au = col['Au-']
Au.show(ax=ax[0], flip=True, pixels=True)
axP = plt.subplot(133, projection='polar')
Au.circular_profile(280, 320, Ra=45, width=2, axImg=ax[0], axProfile=ax[1], axPolar=axP, A=-130, B=130, N=50, plotProfileEvery=3);
Let's try to have a look at other channels
In [25]:
substrate = B.get_added_image_by_mass([60,61,76,77])
fig, ax = plt.subplots(1,3,figsize=(21,7))
substrate.show(ax=ax[0], flip=True);
axP = plt.subplot(132, projection='polar')
axP2 = plt.subplot(133, projection='polar')
substrate.circular_profile(280, 320, Ra=45, width=3, axImg=ax[0], axPolar=axP, N=50, plotProfileEvery=3);
substrate.circular_profile(190, 320, Ra=45, width=3, axImg=ax[0], axPolar=axP2, N=50, plotProfileEvery=3);
s = 700
phi = np.radians(np.linspace(0,360, 100))
axP.plot(phi, pySPM.utils.ellipse(s*np.sqrt(2),s,phi),'--')
axP.plot(phi, pySPM.const.gfact*pySPM.utils.ellipse(s*np.sqrt(2),s,phi),'--')
axP2.plot(phi, pySPM.utils.ellipse(s*np.sqrt(2),s,phi),'--')
axP2.plot(phi, pySPM.const.gfact*pySPM.utils.ellipse(s*np.sqrt(2),s,phi),'--');
The profiles looks fine with a √2 ratio between the x and y axis for the other channel, so we can assume that it comes from the Au channel. maybe a fabrication articats? In order to prove it the measurement should be redone with sample rotated. There is so much other functions to look at that we are going to investigate something else for now.
In [26]:
# The Cs channel is just noise. So we will remove it for the PCA
if 'Cs-' in col:
del col['Cs-']
L = col.show_pca(4, loadings=True, flip=True)
PCA is very handy in order to identify defects and contamination from a sample. For example here the PC1 clearly separates the substrate Si/SiOₓ with the reste. The PC2 can separate the gold from the smiley with the contaminations The PC3 can distinguish two main contaminants: Chloride (seen in blue) and Carbone based contamination (seen in red). PC4 is just noise (substrate irregularities), so we can assume that we have 4 main composent on the sample: