SBML annotation

Annotation of model components with meta-information is an important step during model building. Annotation is the process of adding metadata to the model and model components. These metadata are mostly from biological ontologies or biological databases.

sbmlutils provides functionality for annotating SBML models which can be used during model creation or later on to add annotations to SBML models. Annotations have the form of RDF triples consisting of the model component to annotate (subject), the relationship between model component and annotation term (predicate), and a term which describes the meaning of the component (object), which often comes from an ontology of defined terms.

The predicates come from a clearly defined set of predicates, the MIRIAM qualifiers (https://co.mbine.org/standards/qualifiers). Ideally the objects, i.e. annotations, are defined in an ontology which is registered at https://identifiers.org (see https://registry.identifiers.org/registry for available resources).

For more information of the importance of model annotations and best practises we refer to

Neal, M.L., König, M., Nickerson, D., Mısırlı, G., Kalbasi, R., Dräger, A., Atalag, K., Chelliah, V., Cooling, M.T., Cook, D.L. and Crook, S., 2019. Harmonizing semantic annotations for computational models in biology. Briefings in bioinformatics, 20(2), pp.540-550. 10.1093/bib/bby087

Le Novère, N., Finney, A., Hucka, M., Bhalla, U.S., Campagne, F., Collado-Vides, J., Crampin, E.J., Halstead, M., Klipp, E., Mendes, P. and Nielsen, P., 2005. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature biotechnology, 23(12), pp.1509-1515. https://www.nature.com/articles/nbt1156

Annotations in sbmlutils consist of associating (predictate, object) tuples to model components. For instance to describe that a species in the model is a certain entry from CHEBI, we associate (BQB.IS, "chebi/CHEBI:28061") with the species. In addition the special subset of annotations to the Systems Biology Ontology (SBO) can be directly set on all model components via the sboTerm attribute.


In [1]:
%load_ext autoreload
%autoreload 2

Annotate during model creation

In the example the model is annotated during the model creation process. Annotations are encoded as simple tuples consisting of MIRIAM identifiers terms and identifiers.org parts. The list of tuples is provided on object creation. In the example we annotate a species


In [4]:
from sbmlutils.units import *
from sbmlutils.factory import *
from sbmlutils.annotation import *
from sbmlutils.modelcreator.creator import CoreModel
from sbmlutils.validation import check_doc

model_dict = {
    'mid': 'example_annotation',
    'compartments': [
        Compartment(sid="C", value=1.0, sboTerm=SBO_PHYSICAL_COMPARTMENT)
    ],
    'species': [
        Species(sid='gal', compartment='C', initialConcentration=3.0,
                name='D-galactose', sboTerm=SBO_SIMPLE_CHEMICAL,
                annotations=[
                    (BQB.IS, "bigg.metabolite/gal"),  # galactose
                    (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
                    (BQB.IS, "vmhmetabolite/gal"),
                ]
        )
    ]
}

# create model and print SBML
core_model = CoreModel.from_dict(model_dict=model_dict)
print(core_model.get_sbml())

# validate model
check_doc(core_model.doc, units_consistency=False);


ERROR:root:Model units should be provided for a model, i.e., set the 'model_units' field on model.
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1" level="3" version="1" comp:required="true">
  <model metaid="meta_example_annotation" id="example_annotation" name="example_annotation">
    <listOfCompartments>
      <compartment sboTerm="SBO:0000290" id="C" spatialDimensions="3" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species metaid="meta_gal" sboTerm="SBO:0000247" id="gal" name="D-galactose" compartment="C" initialConcentration="3" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#meta_gal">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="https://identifiers.org/bigg.metabolite/gal"/>
                  <rdf:li rdf:resource="https://identifiers.org/chebi/CHEBI:28061"/>
                  <rdf:li rdf:resource="https://identifiers.org/vmhmetabolite/gal"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>


--------------------------------------------------------------------------------
<SBMLDocument>
valid                    : TRUE
check time (s)           : 0.001
--------------------------------------------------------------------------------

For a more complete example see model_with_annotations.py which creates annotations of the form

    Species(sid='e__gal', compartment='ext', initialConcentration=3.0,
                substanceUnit=UNIT_KIND_MOLE, boundaryCondition=True,
                name='D-galactose', sboTerm=SBO_SIMPLE_CHEMICAL,
                annotations=[
                    (BQB.IS, "bigg.metabolite/gal"),  # galactose
                    (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
                    (BQB.IS, "vmhmetabolite/gal"),
                ]
            ),

In [5]:
import os
from sbmlutils.modelcreator.creator import Factory
factory = Factory(modules=['model_with_annotations'],
                  target_dir='./models')
[_, _, sbml_path] = factory.create()

# check the annotations on the species
import libsbml
doc = libsbml.readSBMLFromFile(sbml_path)  # type: libsbml.SBMLDocument
model = doc.getModel()  # type: libsbml.Model
s1 = model.getSpecies('e__gal')  # type: libsbml.Species
print(s1.toSBML())


WARNING:sbmlutils.annotation.annotator:https://en.wikipedia.org/wiki/Cytosol does not conform to http(s)://identifiers.org/collection/id

--------------------------------------------------------------------------------
/home/mkoenig/git/sbmlutils/docs_builder/notebooks/models/annotation_example_8.xml
valid                    : TRUE
check time (s)           : 0.011
--------------------------------------------------------------------------------

SBML report created: ./models/annotation_example_8.html
<species metaid="meta_e__gal" sboTerm="SBO:0000247" id="e__gal" name="D-galactose" compartment="ext" initialConcentration="3" substanceUnits="mole" hasOnlySubstanceUnits="false" boundaryCondition="true" constant="false">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#meta_e__gal">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="https://identifiers.org/bigg.metabolite/gal"/>
            <rdf:li rdf:resource="https://identifiers.org/chebi/CHEBI:28061"/>
            <rdf:li rdf:resource="https://identifiers.org/vmhmetabolite/gal"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>

Annotate existing model

An alternative approach is to annotate existing models from external annotation files. For instance we can define the annotations in an external file which we then add to the model based on identifier matching. The following annotations are written to the ./annotations/demo.xml based on pattern matching.

Annotations are written for the given sbml_type for all SBML identifiers which match the given pattern.


In [6]:
from sbmlutils.annotation.annotator import ModelAnnotator
df = ModelAnnotator.read_annotations_df("./annotations/demo_annotations.xlsx", format="xlsx")
df


Out[6]:
pattern sbml_type annotation_type qualifier resource name
0 NaN document rdf BQM_IS sbo/SBO:0000293 non-spatial continuous framework
1 ^demo_\d+$ model rdf BQM_IS go/GO:0008152 metabolic process
3 e compartment rdf BQB_IS sbo/SBO:0000290 physical compartment
4 e compartment rdf BQB_IS go/GO:0005615 extracellular space
5 e compartment rdf BQB_IS fma/FMA:70022 extracellular space
7 m compartment rdf BQB_IS sbo/SBO:0000290 physical compartment
8 m compartment rdf BQB_IS go/GO:0005886 plasma membrane
9 m compartment rdf BQB_IS fma/FMA:63841 plasma membrane
11 c compartment rdf BQB_IS sbo/SBO:0000290 physical compartment
12 c compartment rdf BQB_IS go/GO:0005623 cell
13 c compartment rdf BQB_IS fma/FMA:68646 cell
15 ^Km_\w+$ parameter rdf BQB_IS sbo/SBO:0000027 Michaelis constant
16 ^Keq_\w+$ parameter rdf BQB_IS sbo/SBO:0000281 equilibrium constant
17 ^Vmax_\w+$ parameter rdf BQB_IS sbo/SBO:0000186 maximal velocity
19 ^\w{1}__A$ species rdf BQB_IS sbo/SBO:0000247 simple chemical
20 ^\w{1}__B$ species rdf BQB_IS sbo/SBO:0000247 simple chemical
21 ^\w{1}__C$ species rdf BQB_IS sbo/SBO:0000247 simple chemical
22 ^\w{1}__\w+$ species formula NaN C6H12O6 NaN
23 ^\w{1}__\w+$ species charge NaN 0 NaN
24 ^b\w{1}$ reaction rdf BQB_IS sbo/SBO:0000185 transport reaction
25 ^v\w{1}$ reaction rdf BQB_IS sbo/SBO:0000176 biochemical reaction

In [7]:
from sbmlutils.annotation.annotator import annotate_sbml_file

# create SBML report without performing units checks
annotate_sbml_file(f_sbml="./annotations/demo.xml", 
                   f_annotations="./annotations/demo_annotations.xlsx", 
                   f_sbml_annotated="./annotations/demo_annotated.xml")

In [ ]: