**Authors**: Thierry D.G.A Mondeel, Stefania Astrologo, Ewelina Weglarz-Tomczak & Hans V. Westerhoff
University of Amsterdam
2017
Note: Some of the material in this tutorial is inspired by and adapted from the cell factory design course from The Novo Nordisk Foundation Center for Biosustainability https://biosustain.github.io/cell-factory-design-course/
Questions
Objectives
See the publication: http://doi.org/10.1038/nbt.2488
**Assignment (5 min):** To refresh your memory of the lecture read the abstract and the first paragraph of the introduction of this paper.
In [ ]:
import cobra
from cobra.flux_analysis import pfba
import pandas as pd # for tables
pd.set_option('display.max_colwidth', -1) # don't constrain the content of the tables
pd.options.display.max_rows = 9999
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
%matplotlib inline
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
M = cobra.io.load_json_model('models/recon_2_2_simple_medium.json')
model = M.copy() # this way we can edit model but leave M unaltered
Above we loaded the human metabolic reconstruction model. You can think of this model as having two key components:
As a consequence these models allow you to investigate the contents of the network and perturb them by doing gene knockouts. Since the model contains the information linking a specific gene to the reactions its enzyme catalyzes it becomes possible to see how such knockouts interfere with the metabolic network.
In Python you can interact with the model through broadly 3 different kinds of object: metabolites, reactions and genes. We will look at these below.
The stoichiometry matrix completely defines the network. The model for the human metabolic network contains 5324 metabolites and roughly 7785 reactions. The stoichiometry matrix therefore has 5324 rows and 7785 columns. You can see that below in the image. A blue dot in a certain row & column means that the metabolite that row refers to partakes in the reaction that column refers to.
If metabolite number 5 is involved in reaction 99 with a stoichiometry coefficient of -1 (i.e. it is a substrate) in that reaction, then the stoichiometry matrix will have the value -1 in the rows 5 column 99.
One metabolite like g3p maybe engages in 10 reactions or so. Whereas ATP might engage in hundreds of reactions. However, no metabolite gets even close to 7000 reactions. As a consequence the stoichiometry matrix mostly consists of zeros.
In [ ]:
S = model.copy().to_array_based_model().S.todense()
S
In [ ]:
import matplotlib.pyplot as plt
plt.spy(S, precision=0.01, markersize=.1)
plt.show()
In [ ]:
model.metabolites[0:10]
How many metabolites are there?
In [ ]:
len(model.metabolites)
One can access a specific metabolite using dot notation.
In [ ]:
model.metabolites.atp_c
Warning: One cannot use dot notation to access metabolites, reactions, or genes if their identifiers do not resemble proper Python variable names.
In [ ]:
model.metabolites.10fthf_c
Solution: Use the method get_by_id
instead!
In [ ]:
model.metabolites.get_by_id('10fthf_c')
Metabolites are associated with compartments in the cell. Glyceraldehyde 3-phosphate (g3p_c) is associated with the c (Cytosol) compartment.
In [ ]:
model.metabolites.g3p_c.compartment
The reactions in the human metabolic reconstruction are categorized in various compartments.
In [ ]:
model.compartments
**Assignment (3 min):** Do you know the biological role of all these compartments?
See here for more information: http://www.ncbi.nlm.nih.gov/books/NBK26907/
Some metabolites (like Glucose for example) can be associated with multiple compartments.
In [ ]:
model.metabolites.glc_D_c.compartment
model.metabolites.glc_D_g.compartment
The full name of the metabolite is available via the .name attribute.
In [ ]:
model.metabolites.glc_D_c.name
One can look up the molecular formula of glucose and its weight.
The .elements attribute returns a dictionary representation of the formula.
In [ ]:
model.metabolites.glc_D_c.formula
model.metabolites.glc_D_c.formula_weight
model.metabolites.g3p_c.elements
Metabolites are not isolated things. They participate in reactions as substrates and products.
In [ ]:
model.metabolites.g3p_c.reactions
If you are not sure which ID the metabolite you are looking has, you can use a query to look for it. Suppose we want to find lactate
In [ ]:
for metabolite in model.metabolites.query('lactate','name'):
print(metabolite.id, metabolite.name,metabolite.compartment)
In [ ]:
len(model.reactions)
In [ ]:
for reaction in model.reactions[1000:1010]:
print(reaction.id,reaction.name)
Let's take a closer look at the reactions associated with Glyceraldehyde 3-phosphate (g3p). Reactions like metabolites have both a short ID, a longer name attribute and an attribute called reaction that contains the chemical reactions equation. Lets see all of these for the reactions g3p engages in.
In [ ]:
for reaction in model.metabolites.g3p_c.reactions:
print(reaction.id, reaction.name, reaction.reaction)
**Assignment (3 min):** In the cell above try to also add the "subsystem" attribute of the reactions to the print statement. You may find this under reaction.subsystem.
This will give you some more clarity on the pathway these reactions function in.
In constraint-based (FBA) models reactions are associated with two bounds, a lower bound and an upper bound. These constrain the amount of flux that is allowed to run through a certain reaction.
The most important way these are used is to take into account thermodynamics of reactions. If a certain reaction has a very high negative $\Delta G^{'0}$ then this reaction is practically irreversible. As such its lower bound is often made zero, meaning that the flux cannot run in the backward direction because it goes against thermodynamics. This makes the model predictions more realistic.
This page has all you need to know: https://www.khanacademy.org/science/chemistry/thermodynamics-chemistry/gibbs-free-energy/a/gibbs-free-energy-and-spontaneity
Let's look at an example: pyruvate kinase has a lower bound of zero in our model.
In [ ]:
model.reactions.PYK.name
model.reactions.PYK.reaction
model.reactions.PYK.bounds
**Assignment (3 min):** Check that this makes sense thermodynamically. Go to: http://equilibrator.weizmann.ac.il/ and type in "pyruvate kinase". Click the reaction corresponding to the one above.
Look at the estimates for the $\Delta G^{'0}$. Does this match with the model?
In [ ]:
len(model.genes)
**Assignment (1 min):** Why is this such a small number? What about all the other human genes? How many genes do humans actually have in total?
Glyceraldehyde-3-phosphate dehydrogenase is associated with two genes.
In [ ]:
model.reactions.GAPD.gene_reaction_rule
These gene identifiers refer to the Hugo Gene Nomenclature Committee database: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=4141
**Assignment (3 min):** Is this annotated correctly. Why two genes?
Here a very complicated gene to reaction mapping (ATP synthase).
In [ ]:
model.reactions.ATPS4m.name
model.reactions.ATPS4m.reaction
model.reactions.ATPS4m.gene_reaction_rule
The next key component of the model is the "medium", i.e. the set of metabolites that is allowed to be taken up into the cell. For human cells this should entail the essential amino acids, glucose, oxygen etc. The way this is done in the model is by having so-called "exchange reactions" that transport such a metabolite into the extracellular space (the "e" compartment we came across above). Typically each of these reactions start with 'EX_' which stands for exchange reaction. The word exchange refers to the fact that this is an exchange of metabolites with the environment or the biofluids surrounding the cell.
Exchange reactions are defined as: X <=> . So a negative (to the left) flux means uptake of metabolite X. Positive (to the right) flux would mean X is produced by the cell. This would typically be the case for lactate or CO2 etc.
**Assignment (3 min):** Make sure you understand the difference between an exchange reaction and a transport reaction. They are not the same thing!
Exchange $$X <=> $$ Transport $$ X_c <=> X_e$$
Below we print the medium of the model. The table shows the bound on each such inward flux. The bounds on these fluxes are actually negative but are here shown as positive values.
In [ ]:
pd.DataFrame(list(model.medium.items()),columns=['Reaction','Inward flux']).set_index('Reaction')
You may ignore the biomass_other reaction.
**Assignment (3 min):** Would you say this is a reasonable medium for a human cell to live in? What are the limiting medium components currently? Is there an oxygen or carbon limitation? What is the carbon source?
We will use the constraint-based modeling technique called flux balance analysis (FBA) (http://doi.org/10.1038/nbt.1614). Given the metabolic network this method computationally calculates the flux distribution, meaning the fluxes through all reactions in the model at steady-state, that is optimal according to a specified objective function. In our case, the objective function will usually be the flux through the biomass reaction. This reaction contains (when known) experimentally determined ratio's of metabolites that make of the composition of a cell.
FBA has traditionally been applied to metabolism as we are doing here. FBA primarily makes use of the stoichiometry matrix $S$. This matrix is of size $m \times r$ where $m$ is the number of species or metabolites in the system and $r$ is the number of reactions. Every row of $S$ specifies for a specific metabolite in what quantity it partakes in each reaction. Therefore, each element $(i,j)$ of $S$ contains the stoichiometric coefficient of metabolite $i$ in reaction $j$.
The word flux is used to describe a reaction rate at steady state. In FBA the aim is to find the flux distributions of the network that satisfy (1) the steady-state condition, this is based on the assumption that metabolism occurs on a fast time-scale compared to gene regulatory events and thus reaction rates are constant, (2) thermodynamic feasibility , i.e. some reactions are known to be irreversible, (3) maximal flux constraints when these are known and (4) maximize some objective like growth.
Mathematically, this can be concisely summarized as \begin{align*} \text{Maximize } Z &=c^T v, \text{ such that } \\ Sv &= 0 \\ \alpha_k \leq & ~ v_k \leq \beta_k, \text{ for all k}. \end{align*} Here $Z$ is the objective, $c$ is a column vector with coefficients for reactions in the objecte, $v$ is the column vector of fluxes representing all reactions in the model. As such $c^T v$ is a linear combination of fluxes, the objective. $S$ is the stoichiometry matrix, $\alpha_k$ is the lower bound for reaction $k$ and similarly $\beta_k$ is the upper bound for reaction $k$.
This can be thought of as first constraining all possible solutions to the ones that allow a steady state and satisfy the bounds ( this results in a multi dimensional cone within the null space of S) and then finding the optimal solution among the remaining degrees of freedom.
We will perform in-silico experiments with genetic manipulations by changing the third line in the above equation, i.e. the lower and upper bounds for the reactions. By setting both the lower bound $\alpha_k$ and the upper bound $\beta_k$ to zero we in effect knock out a reaction. Similarly, by setting it to a specific value we can fix a certain flux level for a specific reaction.
Let us first check what objective is set for our model.
In [ ]:
model.objective.expression
So the objective is biomass. Don't get confused by the subtraction of biomass_reaction_reverse. The details do not matter now, but internally the two directions are handled separately and if the flux is "reversed" with a flux of 5 you would want the objective score to be -5. That is why it is subtracted.
**Assignment (5 min):** Is optimal growth a realistic assumption for human cells? For bacterial cells? Discuss...
In [ ]:
solution = model.optimize()
"solution" now contains the result of our simulation. Let's see what we can do with this. The most important attributes here are:
In [ ]:
solution.status
solution.objective_value
solution.fluxes['GAPD']
A very nice summary of what goes in and out of the cell in this solution can be viewed with the summary attribute of the model object.
In [ ]:
model.summary()
**Assignment (3 min):** Investigate the metabolites that are produced (i.e. in the OUT FLUXES column). Are these what you would have expected?
**Assignment (5 min):** What would happen if we force the cell to produce something else than lactate? Try this by setting the upper_bound of the exchange reaction of lactate 'EX_lac_D_e' to zero. Do this using the . notation to access the upper_bound attribute of the lactate exchange reaction. Then rerun model.optimize() and look at the model.summary() again.
What new products have been produced? Did the growth rate remain the same?
If you want repeat this process a couple times. Be sure to block multiple metabolites simultaneously.
In [ ]:
model = M.copy()
# change some upper bounds here
model.optimize()
model.summary()
**Assignment (5 min):** Above you might have realized that the model can grow optimally while only producing CO2 as a carbon product.
Can you imagine why this happens? What could we change about the medium to force the model into producing other carbon exits? Try something below.
What happens now when you block the carbon exits like lactate?
In [ ]:
model = M.copy()
# change some upper/lower bounds here
model.optimize()
model.summary()
Below we first remove the amino acids from the medium in order to just ask for ATP production from glucose and oxygen.
Then we make the ATP maintenance reaction: atp -> adp + pi, the objective reaction.
**Assignment (3 min):** Before running the cell below. Based on your textbook knowledge: what would you expect the maximum number of ATP to be given 1 unit of glucose and unlimited oxygen? How much oxygen would be used for this?
Now execute the cell and compare the results.
In [ ]:
model = M.copy()
for rxn in ['EX_his_L_e','EX_ile_L_e','EX_leu_L_e','EX_lys_L_e','EX_met_L_e','EX_phe_L_e','EX_thr_L_e','EX_trp_L_e','EX_val_L_e']:
model.reactions.get_by_id(rxn).lower_bound = 0
model.objective = model.reactions.ATPM
model.optimize()
model.summary()
print()
model.metabolites.atp_c.summary()
print()
model.metabolites.atp_m.summary()
print()
model.metabolites.h_i.summary()
In [ ]:
model = M.copy()
model.optimize()
model.summary()
print()
model.metabolites.atp_c.summary()
print()
model.metabolites.atp_m.summary()
print()
model.metabolites.h_i.summary()
**Assignment (3 min):** Reflect for a moment on the differences compared to the ATPM objective. Why is the ATPS4 flux down? Why is the oxygen/glucose ratio down?
(Lewis et al., 2010) http://doi.org/10.1038/msb.2010.47
pFBA is an often-used extension of normal flux balance balance analysis. It performs FBA as usual but with an additional optimization that minimizes the total flux in the network. Intuitively this means that if there are two solutions that achieve the same growth rate but one solution is a lot longer than the other, i.e. it has more steps and therefore the total flux is higher, pFBA will return the shorter solution.
Biologically, this may also be interpreted as giving you an optimal solution that is efficient in its enzyme usage.
Note: There is one pitfall with pFBA, the 'objective_value' returned refers to the total flux in the network not the objective flux. You can get the fluxes, as with model.optimize() through the 'fluxes' attribute
In [ ]:
model_pfba = M.copy()
sol = pfba(model_pfba)
print('Solution status: {}'.format(sol.status))
print('Objective value: {}'.format(sol.fluxes['biomass_reaction']))
print('Total flux in pFBA solution: {}'.format(sol.objective_value))
**Assignment (3 min):** The cell below calculates the total flux, i.e. the sum of all fluxes in a regular FBA solution. How does that compare to the pFBA solution. Does this make sense?
In [ ]:
model_fba = M.copy()
sol = model_fba.optimize()
# total_flux = calculate the total flux here, make use of sol.fluxes
# make sure you take the absolute value of the flux (use abs()) so that negative fluxes (going in the backward direction)
# are not reducing the total flux value
print('Total flux in FBA solution: {}'.format(total_flux))
**Assignment (3 min):** Using the cell below inspect the input and output of the networks in the FBA and pFBA solutions. Are the growth rates still the same? What happens to the amount of flux going in and out of the network
In [ ]:
model_fba.summary()
print() # empty line
model_pfba.summary()
Questions
Objectives
Flux balance analysis gives you one optimal steady state solution for the model you are simulating. Above you saw that you could alter the network, e.g. by blocking one lactate exit, and still get an optimal solution. This means that the FBA solution is not uniquely determined.
(Mahadevan and Schilling, 2003) http://doi.org/10.1016/j.ymben.2003.09.002
Flux variability analysis is closely related to flux balance analysis.
Remember that FBA calculate one flux distribution such that the whole network is at steady state, mass-balanced, thermodynamically feasible and optimal.
In contrast, FVA calculates for each reaction the range of feasible flux that is consistent with these constraints FBA uses. So for a given reaction $v$, FVA will tell you what the minimal and maximal attainable flux is through that reaction while keeping the solution at steady state and optimal.
Technically, FVA for one reaction, for instance pyruvate kinase, involves once performing FBA with the objective of maximizing the flux through pyruvate kinase and once performing FBA to minimize its flux. It then returns this interval [minimum_feasible flux, maximum_feasible_flux].
cobra.io.flux_variability_analysis calculates all the minimum and maximum fluxes that all reactions in a model can attain.
flux_variability_analysis has a keyword argument fraction_of_optimum that allows one to specify a fraction of the model’s objective that needs to be achieved, by default set to 1, i.e. 100% (optimal).
In [ ]:
model = M.copy()
# a short list of reactions to calculate to reduce computational time. Feel free to adjust the list
interesting_reactions = ['EX_bhb_e','EX_acac_e','EX_urea_e','PHETHPTOX2','PYK','SUCD1m','ATPS4m',]
result = cobra.flux_analysis.flux_variability_analysis(model,reaction_list=interesting_reactions)
result[['minimum','maximum']]
In [ ]:
result.plot.bar(stacked=True)
One of the uses of FVA is to predict essential reactions. If we require growth to be non-zero then reactions that are essential will have an associated FVA interval that does not contain zero.
**Assignment (3 min):** Make sure you understand the statement made above. A helpful example is to think of glucuse exchange reaction. What would happen to this flux when the cell tries to grow optimally? Does this reaction need to carry flux? What does that say about the FVA interval?
The cell below returns a table with essential reactions predicted by FVA. It might take a while to run...
In [ ]:
model = M.copy()
# first perform pFBA to find a list of potentially essential reactions
# reasoning: if reaction v is essential it ALWAYS has to carry flux. So also in this pFBA solution.
# doing this first allows us to perform FVA on a smaller set of reactions
sol = pfba(model)
# take only those reactions that carry flux
reactions_with_flux = sol.fluxes[sol.fluxes.abs() > 1e-9].index.tolist()
# perform fba on the flux carrying reactions
fva_sol = cobra.flux_analysis.flux_variability_analysis(model,reaction_list=reactions_with_flux,fraction_of_optimum=0.8)
# filter out
fva_sol = fva_sol.loc[(fva_sol['minimum'] > 1e-9) | (fva_sol['maximum'] < -1e-9) ]
df = fva_sol.copy()
for r in df.index.tolist():
df = df.set_value(r,'reaction',model.reactions.get_by_id(r).reaction)
df = df.set_value(r,'genes',model.reactions.get_by_id(r).gene_reaction_rule)
df
**Assignment (2 min):** Search to see if the glucose exchange reaction is in the table above. Is it essential? What is the FVA interval allowed under the condition of optimality?
**Assignment (5 min):** Rerun the cell above but change the 'fraction_of_optimum' parameter to less than 1. 1 means exactly optimal. 0.8 would mean you allow the solution to be at least 80% of the optimal growth rate. etc. What happens to the glucose exchange reaction interval when you decrease the 'fraction_of_optimum' parameter? Is there a point where it bomes non-essential?
Make sure you are able to explain in your own words:
In [ ]: