A demonstration of how to use the MS2LDA annotation function


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
# path to a local checkout of ms2ldaviz; adjust this for your own machine
basedir = '/Users/joewandy/git/ms2ldaviz/ms2ldaviz/'
sys.path.append(basedir)

import annotation.annotate_methods as annot

Batch spectra annotation

Prepare some example spectra


In [2]:
parentmass_1 = 282.1194
spectrum_1 = [
    (55.0294, 1621769.0), 
    (57.0450, 1439267.4),
    (82.0399, 2838900.0),
    (92.0243, 1328631.9),  
    (94.0398, 3758106.0),
    (106.0400, 1654438.4),  
    (108.0431, 1235484.9),  
    (108.0552, 1236242.4),
    (109.0509, 9854405.0),  
    (119.0351, 3175437.8),  
    (123.0665, 1405464.4),  
    (133.0509, 8247930.5),  
    (135.0538, 2464020.5),
    (150.0774, 152486528.0),
    (282.1199, 1746824.8),
]

# spectrum 2: an adenine substructure is expected here
# (the fragment at m/z 136.0618 matches protonated adenine, C5H6N5)
parentmass_2 = 382.1357
spectrum_2 = [
    (67.0544, 26619.8), 
    (94.0400, 29748.1), 
    (95.0490, 22403.4), 
    (119.0352, 120205.6), 
    (136.0618, 1572447.9), 
    (137.0458, 79233.2), 
    (188.1280, 44683.2), 
]

spectra = {
    parentmass_1: spectrum_1,
    parentmass_2: spectrum_2
}
# db_name is either 'MASSBANK' or 'GNPS'
db_name = 'MASSBANK'
# db_name = 'GNPS'
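Before sending spectra for annotation, it can help to sanity-check the input dictionary. The sketch below is a hypothetical helper (`check_spectra` is not part of `annotate_methods`). It assumes the input layout used above, a dictionary mapping each parent mass to a list of (m/z, intensity) tuples, and it assumes that a fragment should not be heavier than its parent mass beyond a small tolerance.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2 and 3


def check_spectra(spectra, tol=0.01):
    """Return a list of (parent_mass, message) problems.

    Hypothetical helper for illustration only. `tol` is an assumed
    mass tolerance, not a value taken from MS2LDA.
    """
    problems = []
    for parent_mass, peaks in spectra.items():
        if not peaks:
            problems.append((parent_mass, 'no peaks'))
            continue
        for mz, intensity in peaks:
            if mz <= 0 or intensity <= 0:
                problems.append(
                    (parent_mass, 'non-positive peak (%s, %s)' % (mz, intensity)))
            elif mz > parent_mass + tol:
                problems.append(
                    (parent_mass, 'fragment %s above parent mass' % mz))
    return problems


example = {
    # two peaks copied from spectrum_2 above
    382.1357: [(119.0352, 120205.6), (136.0618, 1572447.9)],
    # deliberately wrong, made-up entry: fragment heavier than the parent
    100.0: [(150.0, 10.0)],
}
problems = check_spectra(example)
for parent_mass, message in problems:
    print(parent_mass, message)
```

An empty list means no problems were found under these assumptions.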

Call the batch annotation function on these spectra


In [3]:
results = annot.batch_annotate(spectra, db_name)

Annotation results are produced as a JSON object, which can be converted into a dictionary. The keys are:

  • A status field ('status')
  • The parent mass of each spectrum
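As a rough illustration of this layout, the sketch below parses a mock JSON payload and separates the status field from the per-spectrum keys. The payload is a stand-in built by hand: the key names and the two match counts are copied from this notebook, the remaining fields are omitted, and it is an assumption that parent masses arrive as string keys.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2 and 3
import json

# Mock payload, not real server output; most fields are left out on purpose.
raw = json.dumps({
    'status': 'OK',
    '282.1194': {'fragment_match': 13, 'loss_match': 11},
    '382.1357': {},  # fields omitted in this sketch
})

results = json.loads(raw)
status = results['status']

# Every key other than 'status' is a parent mass; sort numerically so the
# order does not depend on how the dictionary happens to be stored.
spectrum_keys = sorted((k for k in results if k != 'status'), key=float)

print(status, spectrum_keys)
```

Selecting spectra through a sorted key list avoids relying on dictionary ordering.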

In [4]:
for key in results:
    if key == 'status':
        print key, ' --> ', results[key]
    else:
        print key, ' --> ', results[key].keys()


status  -->  OK
382.1357  -->  [u'sub_term_probs', u'loss_match', u'taxa_term_probs', u'fragment_match', u'fragment_intensity_match', u'motif_theta_overlap', u'loss_intensity_match']
282.1194  -->  [u'sub_term_probs', u'loss_match', u'taxa_term_probs', u'fragment_match', u'fragment_intensity_match', u'motif_theta_overlap', u'loss_intensity_match']

Here we retrieve the annotation result for the first spectrum (parent mass 282.1194)


In [5]:
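# dictionary order is arbitrary, so select the key explicitly instead of by position
spectrum_keys = sorted(k for k in results if k != 'status')
first = str(spectrum_keys[0])
spectra_annotations = results[first]
print first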


282.1194

Print how many fragment and loss features can be matched for the annotation of that spectrum


In [6]:
print spectra_annotations['fragment_match']
print spectra_annotations['loss_match']


13
11

Print the taxonomy terms for this spectrum with probability > 0.5


In [7]:
for taxa_term, prob in spectra_annotations['taxa_term_probs']:
    if prob > 0.5:
        print taxa_term, prob


Organic compounds 0.942111481445
Chemical entities 0.642534684883
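The same thresholding pattern recurs for several result fields, so it may be convenient to wrap it in a small function. The sketch below is a hypothetical helper (`terms_above` is not part of `annotate_methods`). It assumes the field is an iterable of (term, probability) pairs, as the loop above suggests, and additionally sorts the surviving terms by descending probability.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2 and 3


def terms_above(term_probs, threshold=0.5):
    """Keep (term, prob) pairs with prob > threshold, highest first.

    Hypothetical helper for illustration only.
    """
    kept = [(term, prob) for term, prob in term_probs if prob > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# The first two pairs are copied from the output above; the last one is a
# made-up placeholder included only to show a term being filtered out.
taxa_term_probs = [
    ('Chemical entities', 0.642534684883),
    ('Organic compounds', 0.942111481445),
    ('Placeholder term', 0.10),
]

top_terms = terms_above(taxa_term_probs)
for term, prob in top_terms:
    print(term, prob)
```

The same function would apply unchanged to any other field that holds (term, probability) pairs.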

Print the substituent terms for this spectrum with probability > 0.5


In [8]:
for sub_term, prob in spectra_annotations['sub_term_probs']:
    if prob > 0.5:
        print sub_term, prob


Hydrocarbon derivative 0.942112752614
Organooxygen compound 0.914505774734
Organic oxygen compound 0.91396304375
Azacycle 0.803701133233
Heteroaromatic compound 0.79799235299
Organonitrogen compound 0.792012537802
Organic nitrogen compound 0.791767736784
Organopnictogen compound 0.771861894106
Aromatic heteropolycyclic compound 0.66936486296
Azole 0.651194199143
Pyrimidine 0.649651442994
Oxacycle 0.629137459498
Imidolactam 0.58782404204
Organoheterocyclic compound 0.517486236384

Print the Mass2Motif annotations for this spectrum with probability (theta) > 0.01


In [9]:
for motif, annotation, theta, overlap in spectra_annotations['motif_theta_overlap']:
    if theta > 0.01:
        print motif, annotation, theta, overlap


motif_126 None 0.0219303194989 0.0601832876239
motif_204 None 0.0256772367995 0.0238084786768
motif_209 None 0.0292949560077 0.0148136916315
motif_23 [Pentose (C5-sugar)-H2O] related loss –  indicative for conjugated pentose sugar - EF fits 0.406109737823 1.0
motif_231 None 0.0109650855138 0.00976855670289
motif_39 Fragments indicative adenine (C5H6N5) substructure 0.0659020425061 0.041003574207
motif_45 None 0.4061083935 0.00173971685779
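Many motifs in this output carry no annotation (`None`), so one possible follow-up is to keep only the annotated ones and rank them by theta. The sketch below is a hypothetical post-processing step, not part of `annotate_methods`. It assumes each entry is a 4-item sequence of (motif, annotation, theta, overlap), as unpacked in the loop above; the rows are copied from the output, with the motif_23 annotation shortened.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2 and 3
from collections import namedtuple

# Named fields make the 4-item rows easier to read than positional indexing.
MotifHit = namedtuple('MotifHit', ['motif', 'annotation', 'theta', 'overlap'])

rows = [
    ('motif_126', None, 0.0219303194989, 0.0601832876239),
    ('motif_23', '[Pentose (C5-sugar)-H2O] related loss', 0.406109737823, 1.0),
    ('motif_39', 'Fragments indicative adenine (C5H6N5) substructure',
     0.0659020425061, 0.041003574207),
    ('motif_45', None, 0.4061083935, 0.00173971685779),
]


def annotated_hits(rows, min_theta=0.01):
    """Return annotated motifs with theta > min_theta, highest theta first."""
    hits = [MotifHit(*row) for row in rows]
    kept = [h for h in hits if h.theta > min_theta and h.annotation is not None]
    return sorted(kept, key=lambda h: h.theta, reverse=True)


for hit in annotated_hits(rows):
    print(hit.motif, hit.theta, hit.annotation)
```

Unannotated motifs are dropped here purely for readability; they may still be informative and can be kept by removing the `annotation is not None` condition.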
