Example of processing FT-ICR data

First load some libraries


In [1]:
import array
import os.path as op
import numpy as np

import matplotlib as mpl
import matplotlib.pylab as plt
%matplotlib inline

locate the data-set which should be in the repository


In [2]:
ls FTICR-Files/ESI_pos_Ubiquitin_000006.d


ESI_pos_150_3000.m/ desktop.ini*        fid*

so now we can read it


In [3]:
BASE = 'FTICR-Files/ESI_pos_Ubiquitin_000006.d'

In [4]:
F = open(  op.join(BASE, 'fid'), 'rb')
tbuf = F.read()
fid = np.array(array.array('i',tbuf))
F.close()

In [5]:
len(fid)


Out[5]:
4194304

In [6]:
fid[1000:1010]


Out[6]:
array([ 1137,   125,  1623,   952,  -138,  -366,  -857, -1464, -1604, -1247])

In [7]:
plt.plot(fid)


Out[7]:
[<matplotlib.lines.Line2D at 0x10825dad0>]

Fourier processing

small complication

There are 2 fft packages available

  • numpy.fft
    • simple, basic, efficient
  • scipy.fftpack
    • slightly faster, more complex

we are going to use numpy.fft


In [8]:
sp = np.fft.fft(fid)

In [9]:
print sp[0]
len(sp)


(-2681443+0j)
Out[9]:
4194304

In [10]:
# default size of the figures
mpl.rcParams['figure.figsize'] = [10.0,8.0]
# this helps matplotlib to draw the huge vectors we're using (4-8Mpoints)
mpl.rcParams['agg.path.chunksize'] = 100000

In [11]:
plt.plot(sp)


/Users/mad/anaconda/lib/python2.7/site-packages/numpy/core/numeric.py:462: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order)
Out[11]:
[<matplotlib.lines.Line2D at 0x10e63ba10>]

3 remarks

  • imaginary discarded (eventually silently)
  • phase
  • symetry

1 syntax remark


In [12]:
from numpy.fft import fft, rfft
sp = rfft(fid)
print sp[0]
len(sp)


(-2681443+0j)
Out[12]:
2097153

In [13]:
plt.plot(abs(sp))     # absolute value is modulus


Out[13]:
[<matplotlib.lines.Line2D at 0x10e73d0d0>]

In [14]:
# let's zoom in
plt.plot(abs(sp[715000:725000]))


Out[14]:
[<matplotlib.lines.Line2D at 0x10e7daed0>]

In [15]:
# without the abs() we see the phase rotating
plt.plot(sp[720000:722000])
plt.figure()
# we rotate by 90° the phase
plt.plot(1j*sp[720000:722000])


Out[15]:
[<matplotlib.lines.Line2D at 0x11bb8e710>]

2 possible improvements

  • improved resolution by apodisation
  • recovery of the full resolution by zerofilling

    ### apodisation


In [16]:
plt.plot(np.hamming(100))


Out[16]:
[<matplotlib.lines.Line2D at 0x11bfcc590>]

there is also blackman hanning bartlett...


In [17]:
np.blackman.func_name


Out[17]:
'blackman'

In [18]:
for apod in (np.blackman, np.hamming, np.hanning, np.bartlett):
    plt.plot(apod(100),label=apod.func_name)
plt.legend()


Out[18]:
<matplotlib.legend.Legend at 0x11be57f10>

In [19]:
sp2 = rfft( np.hamming(len(fid))*fid )

In [20]:
plt.plot(abs(sp2))
plt.figure()
plt.plot(abs(sp2[720000:722000]))


Out[20]:
[<matplotlib.lines.Line2D at 0x11c497c50>]

Notice how lineshape is improved, and the isotopic pattern intensities restored

We can superimpose both processing, zooming on a line you can see that FWHM is sligthly worse when apodized.

Theory is 1.5 for hamming window, but does not really show here.


In [21]:
plt.plot(abs(sp)/max(sp))
plt.plot(abs(sp2)/max(sp2))
plt.xlim(721050,721080)


Out[21]:
(721050, 721080)

zerofilling

zerofilling consists in adding zero at the end of the FID to interpolate points in the spectrum and smooth the result

( and an example of slighlty different numpy arithmetics)


In [22]:
n = len(fid)
fidzf = np.zeros(2*n)

In [23]:
fidzf[:n] = fid[:]
fidzf[:n] *= np.hamming(n)
spzf = abs(rfft( fidzf ))
plt.plot(abs(spzf[2*720000:2*722000]))


Out[23]:
[<matplotlib.lines.Line2D at 0x11c4028d0>]

lineshape and isotopic pattern intensities further improved !


In [24]:
plt.subplot(211)
plt.plot(abs(sp2[721100:721200]), label='direct')
plt.legend()
plt.subplot(212)
plt.plot(abs(spzf[2*721100:2*721200]), label='with zerofilling')
plt.legend()


Out[24]:
<matplotlib.legend.Legend at 0x11f0a5f10>

m/z Calibration

In FT-ICR, the frequency $f$ depends on the magnetic field $B_o$ the charge $z$ and the mass $m$ : $$ f = \frac{B_o z}{m} \quad \Rightarrow \quad m/z = \frac{B_o}{f}$$ We are going to use a sligthly extended equation which takes cares of experimental imperfections: $$ m/z = \frac{M_1}{M_2+f}$$ where $M_1$ and $M_2$ are constants determined precisely by a calibration procedure

parameter files

FT-ICR parameters are stored in the .method files


In [25]:
ls 'FTICR-Files/ESI_pos_Ubiquitin_000006.d/ESI_pos_150_3000.m/apexAcquisition.method'


FTICR-Files/ESI_pos_Ubiquitin_000006.d/ESI_pos_150_3000.m/apexAcquisition.method*

utility

parameter are stored in a XML file, we can look through the file and get the parameters manually, but it is also possible to simply build a dictionary from the entries.

Here a simple mini parser, using standard python library


In [26]:
from xml.dom import minidom
def read_param(filename):
    """
        Open the given file and retrieve all parameters from apexAcquisition.method
        NC is written when no value for value is found
        
        structure : <param name = "AMS_ActiveExclusion"><value>0</value></param>
       
        read_param returns  values in a dictionnary
    """
    xmldoc = minidom.parse(filename)
    
    x = xmldoc.documentElement
    pp = {}
    children = x.childNodes
    for child in children:
        if (child.nodeName == 'paramlist'):
            params = child.childNodes
            for param in params:
                if (param.nodeName == 'param'):
                    k = str(param.getAttribute('name'))
                    for element in param.childNodes:
                       if element.nodeName == "value":
                           try:
                               v = str(element.firstChild.toxml())
                               #print v
                           except: 
                               v = "NC"
                    pp[k] = v
    return pp

In [27]:
pfile = op.join(BASE, 'ESI_pos_150_3000.m', 'apexAcquisition.method')
param = read_param(pfile)
print (param['TD'])


4194304

In [28]:
# print the 10 first entries
print(param.keys()[:10])


['IN_23', 'IN_22', 'IN_21', 'IN_20', 'IN_27', 'IN_26', 'IN_25', 'IN_24', 'VPV2_5', 'VPV2_15']

In [29]:
# we find the parameters in the parameter file
sw = float(param['SW_h_Broadband'])
ml1 = float(param['ML1'])
ml2 = float(param['ML2'])
print('Spectral width: {}'.format(sw) )
print('constant M_1: {}'.format(ml1))
print('constant M_2: {}'.format(ml2))


Spectral width: 625000.0
constant M_1: 184270017.834
constant M_2: 5.03916110231

In [30]:
faxis = np.linspace(0, sw, len(spzf))    # the freq axis from 0 to sw

In [31]:
mzaxis = ml1/(ml2+faxis)    # and the mzaxis

In [32]:
plt.plot(mzaxis, spzf)
plt.xlim(856.5, 858.5)
plt.xlabel("$m/z$")


Out[32]:
<matplotlib.text.Text at 0x11f635c10>

we can zoom on the monoisotopic peak, and try to determine precisely its value.

We know the theorical mass is 856.969496 (for the $z=10$ state).


In [33]:
plt.plot(mzaxis, spzf)
plt.xlim(856.9, 857.0)
plt.ylim(0,6E7)
plt.xlabel("$m/z$")


Out[33]:
<matplotlib.text.Text at 0x11f6fd090>

In [34]:
# these functions convert back and forth from index to m/z
def itomz(val, N):
    """transform index to m/z for N points,
    using current ml1 ml2 and sw
    """
    f = sw*val/N
    return ml1/(ml2+f)
def mztoi(m, N):
    """transform m/z to index for N points,
    using current ml1 ml2 and sw
    """
    f = ml1/m - ml2
    return N*f/sw
theo = 856.9694962104804
def ppm(theorical, measured):
    return 1E6*(measured-theorical)/measured

we can compute the vector coordinates of the previous zoom window


In [35]:
start = int( mztoi(857.0, len(spzf)))
end = int( mztoi(856.9, len(spzf)))
print("start: {}  - end: {}".format(start,end))
plt.plot(spzf[start:end])
top = spzf[start:end].argmax() + start
meas1 = itomz(top,len(spzf))
print("maximum is at: {}, for m/z: {}".format(top,meas1))
print("For an error of {:.3f} ppm".format(ppm(theo, meas1)))


start: 1442924  - end: 1443093
maximum is at: 1442977, for m/z: 856.968940676
For an error of -0.648 ppm

In [36]:
# peak barycenter
bary = 0.0
s = 0
for i in range(-3, +4):
    bary += i*spzf[i+top]
    s += spzf[i+top]
mbary = itomz(bary/s+top, len(spzf))
print ("peak barycenter is at {}".format(mbary))
print("For an error of {:.3f} ppm".format(ppm(theo, mbary)))


peak barycenter is at 856.969219465
For an error of -0.323 ppm

packing it up

Now we can simply build a function doing all this at once :


In [37]:
import glob
def read_fticr(folder):
    """
    load and process the data Solarix Apex FTICR file found in folder
    uses the calibration from the parameter file
    
    eg:
    spectrum,axis = read_fticr('FTICR/Files/bruker ubiquitin file/ESI_pos_Ubiquitin_000006.d')
    """
    # find and load parameters
    L = glob.glob(op.join(folder,"*","apexAcquisition.method"))
    if len(L)>1:
        raise Exception( "You have more than 1 apexAcquisition.method file in the %s folder, using the first one"%folder )
    elif len(L) == 0: 
        raise Exception( "You don't have any apexAcquisition.method file in the  %s folder, please double check the path"%folder )
    param = read_param(L[0])
    
    # load data
    n = int(param['TD'])
    fidzf = np.zeros(2*n)
    with  open(  op.join(BASE, 'fid'), 'r') as F:  # 'with' a better way of reading a file
        tbuf = F.read(4*int(param['TD']))
        fidzf[:n] = np.array(array.array('i',tbuf)) # [:] is less memory intensive

    # process
    fidzf[:n] *= np.hamming(n)
    spectrum = abs( rfft( fidzf ) )

    # calibrate
    sw = float(param['SW_h_Broadband'])
    ml1 = float(param['ML1'])
    ml2 = float(param['ML2'])
    faxis = np.linspace(0, sw, len(spectrum))    # the freq axis from 0 to sw
    mzaxis = ml1/(ml2+faxis)    # and the mzaxis
    return spectrum, mzaxis

In [38]:
spectrum,axis = read_fticr('FTICR-Files/ESI_pos_Ubiquitin_000006.d')

In [39]:
# %matplotlib
plt.plot(axis, spectrum)
plt.xlabel("$m/z$")
plt.xlim(200,2500)


Out[39]:
(200, 2500)

In [40]:
plt.plot(axis, spectrum)
plt.xlabel("$m/z$")
plt.xlim(650,1300)
plt.title("Zooming in")


Out[40]:
<matplotlib.text.Text at 0x11fb4e0d0>

In [41]:
plt.plot(axis, spectrum)
plt.xlabel("$m/z$")
plt.xlim(855,865)
plt.title("Zooming further")


Out[41]:
<matplotlib.text.Text at 0x11f45b310>

Doing the same thing with SPIKE

SPIKE is a complete software suite we're building to do this, and much more

It is distributed here : https://bitbucket.org/delsuc/spike

However, the program is not fully operational yet, stay tuned !


In [42]:
import sys
sys.path.append('/Users/mad/NPKV2/')  # adapt this to your own set-up
import spike
from spike.File import Solarix
%matplotlib inline


    ========================
          SPIKE
    ========================
    Version     : 0.6.4
    Date        : 12-03-2015
    ========================
Importing plugin << bcorr >>
Importing plugin << Bruker_NMR_FT >>
Importing plugin << Bucketing >>
Importing plugin << DOSY >>
Importing plugin << fastclean >>
Importing plugin << Linear_prediction >>
Importing plugin << Peaks >>
Importing plugin << rem_ridge >>
Importing plugin << sg >>
Importing plugin << test >>
Importing plugin << urQRd >>

In [43]:
File = 'FTICR-Files/ESI_pos_Ubiquitin_000006.d'
dd = Solarix.Import_1D(File)

In [44]:
dd.unit = "sec"
dd.display()


Out[44]:
Dim 1
Axis F1 : FT-ICR axis at 625.000000 kHz,  4194304 real points,  from mz = 3000.000   to m/z =  231.024  R max (M=400) = 2422464

In [45]:
dd.hamming().zf(2).rfft().modulus()
dd.unit = "m/z"
dd.display(zoom=(500,2000))


Out[45]:
Dim 1
Axis F1 : FT-ICR axis at 625.000000 kHz,  4194304 real points,  from mz = 3000.000   to m/z =  231.024  R max (M=400) = 2422464

In [46]:
# wrong calibration !
dd.axis1.calibA = float(dd.params["ML1"])
dd.axis1.calibB = float(dd.params["ML2"])
dd.display(zoom=(500,2000))


Out[46]:
Dim 1
Axis F1 : FT-ICR axis at 625.000000 kHz,  4194304 real points,  from mz = 3000.000   to m/z =  294.830  R max (M=400) = 3091537

In [47]:
dd.pp(threshold=1E7)
print ("detected %d peaks"%len(dd.peaks))


10000000.0
detected 709 peaks

In [48]:
dd.display(zoom=(856.5, 858.5))
dd.display_peaks(zoom=(856.5, 858.5))



In [49]:
p = dd.peaks[603]

In [50]:
p.report()


Out[50]:
'603, 603, 1438613.00000, 19899454.35191, 0.00'

In [51]:
p.label = "Peak 103"
p.report()  # in index
p.report(f=dd.axis1.itomz)  # provides in m/z
# string is :
# id, label, m/z, intensity, width (0 if not determined)


Out[51]:
'603, Peak 103, 859.56807, 19899454.35191, 0.00'

In [52]:
dd.centroid()
for p in dd.peaks:
    p.label = "%.5f"%(dd.axis1.itomz(p.pos))


/Users/mad/anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.py:604: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)

In [53]:
# this switches to a different display mode
dd.display(zoom=(856.5, 858.5))
dd.display_peaks(zoom=(856.5, 858.5), peak_label=True)