This notebook demonstrates the spatial metabolomics pipeline applied to a single molecule from a database; during deployment the process is repeated for all 10K database entries.
The pipeline can be broadly split into two stages: simulating an isotope pattern for each sum formula/adduct combination, and then searching the dataset for those patterns and scoring the results.
In [35]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import sys
sys.path.append('/Users/palmer/Documents/python_codebase/')
An entry from the database is parsed and the sum formula is extracted. We keep the database key alongside it so we can link back later.
We also define a set of 'adducts': molecules that can become attached during the mass spectrometry process and therefore change the mass of the molecule. Some or all of the adducts may be detected in the data, but there is no way of knowing a priori which ones.
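As a quick illustration of how an adduct shifts the observed m/z, the sketch below computes approximate [M+H]+, [M+Na]+ and [M+K]+ values for the sum formula used in this notebook. The monoisotopic atomic masses are hardcoded here purely for illustration; the pipeline itself derives everything via pyisocalc.

```python
# Sketch: approximate m/z of singly charged adduct ions of C23H45NO4.
# Monoisotopic atomic masses hardcoded for illustration only.
atom_mass = {'C': 12.0, 'H': 1.0078250319, 'N': 14.0030740052,
             'O': 15.9949146221, 'Na': 22.98976928, 'K': 38.96370649}
ELECTRON_MASS = 0.00054858

composition = {'C': 23, 'H': 45, 'N': 1, 'O': 4}  # C23H45NO4
neutral_mass = sum(atom_mass[el] * n for el, n in composition.items())

def adduct_mz(neutral_mass, adduct, charge=1):
    """m/z of [M+X]+: add the adduct mass, subtract one electron mass per charge."""
    return (neutral_mass + atom_mass[adduct] - charge * ELECTRON_MASS) / charge

for a in ('H', 'Na', 'K'):
    print('[M+{}]+ -> m/z {:.4f}'.format(a, adduct_mz(neutral_mass, a)))
```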
In [36]:
sum_formula = 'C23H45NO4' #Glycocholic acid: http://89.223.39.196:2347/substance/00138
adducts = ('H','Na','K')
charge = 1
We simulate a mass spectrum for each sum formula/adduct combination. This generates a set of isotope patterns (see http://www.mi.fu-berlin.de/wiki/pub/ABI/QuantProtP4/isotope-distribution.pdf), which can provide additional information on the molecule detected. This gives us a list of m/z centres for each ion.
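To see where an isotope pattern comes from, here is a deliberately simplified sketch that models only the carbon-13 contribution with a binomial distribution (a real calculator such as pyisocalc combines the isotopes of all elements; the numbers below are for the 23 carbons of C23H45NO4):

```python
from math import comb  # Python 3.8+

# Simplified sketch: isotope pattern from 13C alone, via a binomial distribution.
p13 = 0.0107   # natural abundance of carbon-13
n_c = 23       # carbon atoms in C23H45NO4

# P(k of the n_c carbons are 13C) = C(n_c, k) * p^k * (1-p)^(n_c-k)
probs = [comb(n_c, k) * p13**k * (1 - p13)**(n_c - k) for k in range(4)]
rel = [p / probs[0] for p in probs]  # peak heights relative to the monoisotopic peak
print(['{:.4f}'.format(r) for r in rel])
```

The M+1 peak comes out at roughly a quarter of the monoisotopic intensity, which is about what you expect for a 23-carbon molecule.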
In [37]:
from pyMS.pyisocalc import pyisocalc
leg_str = []
isotope_ms = {}
for adduct in adducts:
    isotope_ms[adduct] = pyisocalc.isodist(sum_formula+adduct, plot=False, sigma=0.01, charges=charge, resolution=200000.0, do_centroid=True)

# show the spectra we just produced
plt.figure(figsize=(20,10))
for adduct in adducts:
    plt.plot(isotope_ms[adduct].get_spectrum(source='profile')[0], isotope_ms[adduct].get_spectrum(source='profile')[1])
    plt.plot(isotope_ms[adduct].get_spectrum(source='centroids')[0], isotope_ms[adduct].get_spectrum(source='centroids')[1], 'x')
    leg_str.append('Predicted spectrum: '+adduct)
    leg_str.append('Centroid points: '+adduct)
plt.xlabel('m/z')
plt.ylabel('Intensity')
plt.legend(leg_str)
plt.show()
We parse a dataset and then search it one adduct at a time using the following sequence:
1. generate ion images for the predicted peaks
2. score each image using a measure of spatial chaos (checks whether the image is likely noise)
3. score the spatial distributions (the isotope images should correlate with the monoisotopic image)
4. score the isotope ratio (average image intensities should match the predicted intensities)
5. use the scores to decide whether the molecule is present
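On synthetic data, steps 3-5 can be sketched as follows. All arrays below are made-up stand-ins for real ion images, and the thresholds mirror the heuristic tolerances used in the search cell; the real pipeline scores images extracted from the dataset.

```python
import numpy as np

# Toy sketch of scoring steps 3-5 on synthetic data (all values made up).
rng = np.random.default_rng(0)
predicted = np.array([1.0, 0.25, 0.04])   # hypothetical centroid intensities (M, M+1, M+2)
mono = rng.random(100)                    # flattened monoisotopic "image"
images = np.stack([mono * p + rng.normal(0.0, 0.01, 100) for p in predicted])

# 3. correlation of each isotope image with the monoisotopic one,
#    weighted by predicted intensity
corrs = [np.corrcoef(images[0], images[i])[0, 1] for i in range(1, len(predicted))]
iso_corr = np.average(corrs, weights=predicted[1:])

# 4. compare normalised total image intensities against the predicted pattern
totals = images.sum(axis=1)
iso_ratio = 1 - np.mean(np.abs(predicted / np.linalg.norm(predicted)
                               - totals / np.linalg.norm(totals)))

# 5. threshold the scores
passed = iso_corr > 0.6 and iso_ratio > 0.6
print(iso_corr, iso_ratio, passed)
```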
In [38]:
# Parse data
from pyIMS.hdf5.IMSdataset import IMSdataset
from pyIMS.image_measures import level_sets_measure
filename_in = '/Users/palmer/Documents/tmp_data/Ctrl3s2_SpheroidsCtrl_DHBSub_centroids_IMS.hdf5' #using a temporary hdf5 based format
IMS_dataset=IMSdataset(filename_in)
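The search below matches peaks within a ppm mass tolerance. As a quick illustration of the conversion from ppm to an absolute m/z window (the same `mz * ppm / 1e6` formula used in the search loop; the m/z values here are made up):

```python
import numpy as np

ppm = 2.0
mz_list = np.array([400.3422, 423.3241, 439.2980])  # hypothetical centroid m/z values
tol = mz_list * ppm / 1e6  # absolute half-window per peak, in m/z units
for mz, t in zip(mz_list, tol):
    print('m/z {:.4f} +/- {:.6f}'.format(mz, t))
```

Note the window scales with m/z: a 2 ppm tolerance at m/z 400 is only about 0.0008 m/z units.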
In [57]:
ppm = 2. # parts per million - a measure of how accurate the mass spectrometer is
nlevels = 30 # parameter for measure of chaos
measure_tol = 0.9 # heuristic tolerances for the filter
iso_corr_tol = 0.6
iso_ratio_tol = 0.6
measure_value_score = {}
iso_correlation_score = {}
iso_ratio_score = {}
for adduct in adducts:
    # 1. Generate ion images
    mz_list = np.asarray(isotope_ms[adduct].get_spectrum(source='centroids')[0]) # centroid m/z values
    tol = mz_list*ppm/1e6 # turn ppm accuracy into a mass range
    ion_datacube = IMS_dataset.get_ion_image(mz_list,tol) # for each spectrum, sum the intensity of all peaks within tol of mz_list
    # 2. Spatial chaos
    measure_value_score[adduct] = 1-level_sets_measure.measure_of_chaos(ion_datacube.xic_to_image((0,)),nlevels)
    # 3. Score correlation with the monoisotopic image
    notnull_monoiso = ion_datacube.xic[:,0] > 0 # only compare pixels with values in the monoisotopic image (otherwise large empty areas give a high correlation)
    iso_correlation_score[adduct] = np.average(
        [np.corrcoef(ion_datacube.xic[notnull_monoiso,0],ion_datacube.xic[notnull_monoiso,ii])[0,1] for ii in range(1,len(mz_list))],
        weights=isotope_ms[adduct].get_spectrum(source='centroids')[1][1:])
    # 4. Score isotope ratio
    isotope_intensity = np.asarray(isotope_ms[adduct].get_spectrum(source='centroids')[1])
    image_intensities = np.asarray([sum(ion_datacube.xic[:,ii]) for ii in range(0,len(mz_list))])
    iso_ratio_score[adduct] = 1-np.mean(abs(isotope_intensity/np.linalg.norm(isotope_intensity) - image_intensities/np.linalg.norm(image_intensities)))
    ##-- Show results --##
    fig, ax = plt.subplots(1,len(mz_list))
    fig.set_figwidth(20)
    for ii in range(0,len(mz_list)):
        ax[ii].imshow(ion_datacube.xic_to_image((ii,)))
        ax[ii].set_title('m/z: {:3.4f}'.format(mz_list[ii]))
    plt.show()
    print '{} [M+{}]. Measure of Chaos: {:3.4f}, Image Correlation: {:3.4f}, Isotope Score: {:3.4f}'.format(sum_formula,adduct,measure_value_score[adduct],iso_correlation_score[adduct],iso_ratio_score[adduct])
    # 5. Filter on the scores
    if measure_value_score[adduct] > measure_tol and iso_correlation_score[adduct] > iso_corr_tol and iso_ratio_score[adduct] > iso_ratio_tol:
        print 'Passed filters: annotate data as present'
    else:
        print 'Failed filters: not present'
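The spatial-chaos score in step 2 comes from pyIMS's level_sets_measure. As a rough, self-contained sketch of the idea (not the library's actual algorithm), one can count connected components of the image across a stack of intensity thresholds: a structured image yields few components per level, while noise yields many. All names and parameters below are for this illustration only.

```python
import numpy as np
from collections import deque

def count_components(mask):
    """Count 4-connected components in a boolean 2-D mask (simple BFS)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            q.append((ny, nx))
    return current

def measure_of_chaos_sketch(img, nlevels):
    """Mean component count over nlevels thresholds, normalised by the number
    of non-zero pixels -- a simplified stand-in for the level-sets measure."""
    thresholds = np.linspace(img.min(), img.max(), nlevels + 2)[1:-1]
    counts = [count_components(img > t) for t in thresholds]
    return np.mean(counts) / max(np.count_nonzero(img), 1)

rng = np.random.default_rng(1)
structured = np.outer(np.hanning(32), np.hanning(32))  # one smooth blob
noisy = rng.random((32, 32))                           # salt-and-pepper noise
print(measure_of_chaos_sketch(structured, 20), measure_of_chaos_sketch(noisy, 20))
```

A low value indicates a spatially structured (plausibly real) ion image; the pipeline's score is inverted (`1 - measure`) so that higher is better.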