In medusa
, ensembles of genome-scale metabolic network reconstructions (GENREs) are represented using the medusa.Ensemble
class. To demonstrate the functionality and attributes of this class, we'll, load a test ensemble. Here, we use a function that takes the E. coli core metabolism reconstruction from cobrapy and randomly removes components to generate ensemble members.
In [1]:
import medusa
from medusa.test.test_ensemble import construct_textbook_ensemble
example_ensemble = construct_textbook_ensemble()
Each Ensemble
has three key attributes that specify the structure of the ensemble, which we'll describe below. This schematic also summarizes the structure of Ensemble
and how each attribute relates to cobrapy objects:
In [2]:
from IPython.display import Image
Image(filename='medusa_structure.png', width=500)
Out[2]:
The first is the base_model
, which is a cobra.Model
object that represents all the possible states of an individual member within the ensemble. Any reaction, metabolite, or gene that is only present in a subset of ensemble members will be present in the base_model
for an Ensemble
. You can inspect the base_model
and manipulate it just like any other cobra.Model
object.
In [3]:
extracted_base_model = example_ensemble.base_model
extracted_base_model
Out[3]:
The second attribute that each Ensemble
has is a structure called members
. Ensemble.members
maps an identifier for each individual GENRE in the ensemble to a medusa.Member
object, which holds information about a single member (where a "single member" is an individual GENRE within an ensemble).
Ensemble.members
is represented by a custom class implemented in cobrapy called a DictList, which is essentially a standard dictionary in python that can also be accessed using integer indices like a list (e.g. dictlist[0] returns the first element in the dictlist).
In [4]:
# looks like a list when we print it
example_ensemble.members
Out[4]:
In [5]:
# Get the first member with integer indexing
first_member = example_ensemble.members[0]
Each Member
within the Ensemble.members
DictList has a handful of attributes as well. You can check the ensemble that the member belongs to, the id of the member, and the network states for that member (we'll discuss states more below).
In [6]:
print(first_member.ensemble)
print(first_member.id)
print(first_member.states)
The states printed above are directly connected to the third attribute that Ensemble
contains, Ensemble.features
, which is also a DictList object. Ensemble.features
contains medusa.Feature
entries, which specify the components of the Ensemble.base_model
that vary across the entire ensemble.
In [7]:
example_ensemble.features
Out[7]:
Here, we see that this Ensemble
has 8 features. Each Feature
object specifies a network component that has a variable parameter value in at least one member of the ensemble (e.g. at least one ensemble member is missing the reaction).
In this case, there are features for 4 reactions, ACALDt
,ACKr
,ACONTb
, and ACt2r
. There are two Feature
objects for each reaction, corresponding to the lower and upper bound for that reaction. A feature will be generated for any component of a cobra.Model
(e.g. Reaction
, Gene
) that has an attribute value (e.g. Reaction.lower_bound
, Reaction.gene_reaction_rule
) that varies across the ensemble. As you can see from this result, a feature is created at the level of the specific attribute that varies, not the model component (e.g. we created a Feature
for each bound of each Reaction
, not for the Reaction
objects themselves).
This information can be inferred from feature ID (medusa.Feature.id
), but each Feature
also has a set of attributes that encode the information. Some useful attributes, described in the order printed below: getting the Ensemble
that the Feature
belongs to, the component in the Ensemble.base_model
that the Feature
describes, the attribute of the component in the Ensemble.base_model
whose value the Feature
specifies, and the ID of the Feature
:
In [8]:
first_feature = example_ensemble.features[0]
print(first_feature.ensemble)
print(first_feature.base_component)
print(first_feature.component_attribute)
print(first_feature.id)
Just as each member
has an attribute, states
, that returns the value of every feature for that member
, each feature
has a states
dictionary that maps each member.id
to the value of the feature
in the corresponding member
, e.g.:
In [9]:
print(first_feature.states)
Where possible, we use conventions from cobrapy for accessing information about attributes. In cobrapy, the Model
object has multiple containers in the form of DictLists: Model.reactions
,Model.metabolites
,Model.genes
. Equivalently in medusa, each Ensemble
has similarly constructed containers: Ensemble.members
and Ensemble.features
.
As such, information about specific Member
and Feature
objects can be accessed just like Reaction
, Metabolite
, and Gene
objects in cobrapy:
In [10]:
# Remember, our Ensemble holds a normal cobrapy Model in base_model
extracted_base_model = example_ensemble.base_model
# Accessing object by id is common in cobrapy
rxn = extracted_base_model.reactions.get_by_id('ACALDt')
# We can do the same thing for features:
feat = example_ensemble.features.get_by_id('ACALDt_lower_bound')
print(rxn)
print(feat.base_component)
print(feat.component_attribute)
# And for members:
memb = example_ensemble.members.get_by_id('first_textbook')
print('\nHere are the states for this member:')
print(memb.states)
These DictList objects are all iterables, meaning that any python operation that acts on an iterable can take them as input. This is often convenient when working with either cobrapy Model
s or medusa Ensemble
s. For example, suppose we are interested in getting the list of all components described by features in the Ensemble
:
In [11]:
components = []
for feat in example_ensemble.features:
components.append(feat.base_component)
print(components)
# or, use the one-liner which gives the same result:
components = [feat.base_component for feat in example_ensemble.features]
print(components)