This notebook provides a demonstration of the spatial metabolomics pipeline applied to a single molecule from a database. During deployment the process is repeated for all 10K database entries.

The pipeline can be broadly split into two stages:

  1. Predict spectral pattern of a molecule
  2. Search data and score results

In [35]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import sys
sys.path.append('/Users/palmer/Documents/python_codebase/')

Predict Spatial Pattern

An entry from the database is parsed and the sumformula is extracted. We take this and the database key so we can link back later.

We also define a set of 'adducts' these are molecules that can become attached during the mass spectrometry process and so change the mass of the molecule. It's possible that some or all of the adducts are detected in the data but there's no way of knowing a priori which ones.


In [36]:
sum_formula = 'C23H45NO4' #Glycocholic acid: http://89.223.39.196:2347/substance/00138
adducts = ('H','Na','K')
charge = 1

We simulate a mass spectrum for each sum formula/adduct combination. This generates a set of isotope patterns (see http://www.mi.fu-berlin.de/wiki/pub/ABI/QuantProtP4/isotope-distribution.pdf) which can provide additional informaiton on the molecule detected. This gives us a list of m/z centres for the molecule


In [37]:
from pyMS.pyisocalc import pyisocalc
leg_str=[]
isotope_ms={}
for adduct in adducts:
    isotope_ms[adduct] = pyisocalc.isodist(sum_formula+adduct,plot=False,sigma=0.01,charges=charge,resolution=200000.0,do_centroid=True)

# show spectra that we just produced
plt.figure(figsize=(20,10))
for adduct in adducts:
    plt.plot(isotope_ms[adduct].get_spectrum(source='profile')[0],isotope_ms[adduct].get_spectrum(source='profile')[1])
    plt.plot(isotope_ms[adduct].get_spectrum(source='centroids')[0],isotope_ms[adduct].get_spectrum(source='centroids')[1],'x')
    leg_str.append('Predicted spectrum: '+adduct)
    leg_str.append('Centroid points: '+adduct)
plt.xlabel('m/z')
plt.ylabel('Intenisty')
plt.legend(leg_str)
plt.show()


Search data and score results

We parse a dataset and then search it one adduct at a time using the following sequence:

1. generate images for the peaks
2. score this image using a measure of spatial chaos (checks if image is likley noise)
3. score the distributions (should correlate with the monoisotopic)
4. score the isotope ratio (average image intensity, should match predicted intensities)
5. use scores to filter whether molecule is present 

In [38]:
# Parse data
from pyIMS.hdf5.IMSdataset import IMSdataset
from pyIMS.image_measures import level_sets_measure
filename_in = '/Users/palmer/Documents/tmp_data/Ctrl3s2_SpheroidsCtrl_DHBSub_centroids_IMS.hdf5' #using a temporary hdf5 based format
IMS_dataset=IMSdataset(filename_in)


False
file loaded

In [57]:
ppm = 2.; #parts per million -  a measure of how accuracte the mass spectrometer is
nlevels = 30 # parameter for measure of chaos
measure_tol = 0.9 #heuristic tolerances for filter
iso_corr_tol = 0.6
iso_ratio_tol = 0.6
measure_value_score={}
iso_correlation_score = {}
iso_ratio_score = {}
for adduct in adducts:
    # 1. Geneate ion images
    mz_list = np.asarray(isotope_ms[adduct].get_spectrum(source='centroids')[0]) #get centroid mz values
    tol = mz_list*ppm/1e6 # turn ppm accuracy into a mass range
    ion_datacube = IMS_dataset.get_ion_image(mz_list,tol) #for each spectrum, sum the intensity of all peaks within tol of mz_list
    
    # 2. Spatial Chaos 
    measure_value_score[adduct] = 1-level_sets_measure.measure_of_chaos(ion_datacube.xic_to_image((0,)),nlevels)
    
    # 3. Score correlation with monoiso
    notnull_monoiso = ion_datacube.xic[:,0] > 0 # only compare pixels with values in the monoisotopic (otherwise high correlation for large empty areas)
    print isotope_ms[adduct].get_spectrum(source='centroids')[1][1:]
    iso_correlation_score[adduct] = np.average([np.corrcoef(ion_datacube.xic[notnull_monoiso,0],ion_datacube.xic[notnull_monoiso,ii])[0,1] for ii in range(1,len(mz_list))] ,weights=isotope_ms[adduct].get_spectrum(source='centroids')[1][1:])
    
    # 4. Score isotope ratio
    isotope_intensity = np.asarray(isotope_ms[adduct].get_spectrum(source='centroids')[1])
    image_intensities = [sum(ion_datacube.xic[:,ii]) for ii in range(0,len(mz_list))]
    iso_ratio_score[adduct] = 1-np.mean(abs( isotope_intensity/np.linalg.norm(isotope_intensity) - image_intensities/np.linalg.norm(image_intensities)))
    
    ##-- Show results --##
    fig, ax = plt.subplots(1,len(mz_list))
    fig.set_figwidth(20)
    for ii in range(0,len(mz_list)):
        ax[ii].imshow(ion_datacube.xic_to_image((ii,)))
        ax[ii].set_title('m/z: {:3.4f}'.format(mz_list[ii]))
    plt.show()
    print '{} [M+{}]. Measure of Chaos: {:3.4f}, Image Correlation: {:3.4f} Isotope Score: {:3.4f}'.format(sum_formula,adduct,measure_value_score[adduct],iso_correlation_score[adduct],iso_ratio_score[adduct])
    if measure_value_score[adduct] > measure_tol and iso_correlation_score[adduct] > iso_corr_tol and iso_ratio_score[adduct] > iso_ratio_tol:
        print 'Passed filters: annotate data as present'
    else:
        print 'Failed filters: not present'


inconsistent
Warning:     No more knots can be added because the number of B-spline coefficients
    already exceeds the number of data points m. Probably causes: either
    s or m too small. (fp>s)
	kx,ky=1,1 nx,ny=42,42 m=1580 fp=35860932.355238 s=0.000000
[ 24.96822204   3.64074852   0.34201998]
C23H45NO4 [M+H]. Measure of Chaos: 0.9997, Image Correlation: 0.6912 Isotope Score: 0.9858
Passed filters: annotate data as present
inconsistent
Warning:     No more knots can be added because the number of B-spline coefficients
    already exceeds the number of data points m. Probably causes: either
    s or m too small. (fp>s)
	kx,ky=1,1 nx,ny=19,19 m=280 fp=22432.146756 s=0.000000
[ 24.9583166    3.64074561   0.34202009]
C23H45NO4 [M+Na]. Measure of Chaos: 0.9992, Image Correlation: -0.0020 Isotope Score: 0.6839
Failed filters: not present
inconsistent
Warning:     No more knots can be added because the number of B-spline coefficients
    already exceeds the number of data points m. Probably causes: either
    s or m too small. (fp>s)
	kx,ky=1,1 nx,ny=19,16 m=229 fp=3351.638609 s=0.000000
[  2.62606183e+01   1.02700166e+01   2.12985507e+00   2.69403259e-01
   1.47568951e-02]
C23H45NO4 [M+K]. Measure of Chaos: 0.9994, Image Correlation: 0.0292 Isotope Score: 0.6247
Failed filters: not present