Introduction to the human metabolic reconstruction and FBA and FVA

**Authors**: Thierry D.G.A Mondeel, Stefania Astrologo, Ewelina Weglarz-Tomczak & Hans V. Westerhoff
University of Amsterdam
2017

Note: Some of the material in this tutorial is inspired by and adapted from the cell factory design course from The Novo Nordisk Foundation Center for Biosustainability https://biosustain.github.io/cell-factory-design-course/

Questions

How do I explore the content (metabolites, reactions, genes, …) in a model?
How do I perform flux balance analysis?
What kind of questions can I answer by performing FBA experiments?
What is FVA and what can I use it for?

Objectives

Understand the basic data structures of models.
Set a biological objective and simulate a model with FBA.
Manipulate bounds for defining media conditions and knocking out reactions/genes
Find essential reactions with FVA

Refresher: the human metabolic reconstruction: RECON 2

See the publication: http://doi.org/10.1038/nbt.2488

**Assignment (5 min):** To refresh your memory of the lecture read the abstract and the first paragraph of the introduction of this paper.

Loading the model

This cell loads the COBRA python module and load the human metabolic reconstruction. The specifics of the code do not matter. Just execute this this cell.



In [ ]:

    
import cobra
from cobra.flux_analysis import pfba
import pandas as pd # for tables
pd.set_option('display.max_colwidth', -1) # don't constrain the content of the tables
pd.options.display.max_rows = 9999

import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

M = cobra.io.load_json_model('models/recon_2_2_simple_medium.json')
model = M.copy() # this way we can edit model but leave M unaltered

Model components

Above we loaded the human metabolic reconstruction model. You can think of this model as having two key components:

The metabolic network of metabolites that are connected through reactions catalyzed by enzymes. This is captured mathematically as the "stoichiometry matrix".
Coupling of reactions to an identifier for the gene(s) that encode the enzyme catalyzing the reaction.

As a consequence these models allow you to investigate the contents of the network and perturb them by doing gene knockouts. Since the model contains the information linking a specific gene to the reactions its enzyme catalyzes it becomes possible to see how such knockouts interfere with the metabolic network.

In Python you can interact with the model through broadly 3 different kinds of object: metabolites, reactions and genes. We will look at these below.

The stoichiometry matrix

The stoichiometry matrix completely defines the network. The model for the human metabolic network contains 5324 metabolites and roughly 7785 reactions. The stoichiometry matrix therefore has 5324 rows and 7785 columns. You can see that below in the image. A blue dot in a certain row & column means that the metabolite that row refers to partakes in the reaction that column refers to.

If metabolite number 5 is involved in reaction 99 with a stoichiometry coefficient of -1 (i.e. it is a substrate) in that reaction, then the stoichiometry matrix will have the value -1 in the rows 5 column 99.

One metabolite like g3p maybe engages in 10 reactions or so. Whereas ATP might engage in hundreds of reactions. However, no metabolite gets even close to 7000 reactions. As a consequence the stoichiometry matrix mostly consists of zeros.



In [ ]:

    
S = model.copy().to_array_based_model().S.todense()
S



In [ ]:

    
import matplotlib.pyplot as plt
plt.spy(S, precision=0.01, markersize=.1)
plt.show()

Metabolites

The model contains a list of metabolites. Here are the first ten.



In [ ]:

    
model.metabolites[0:10]

How many metabolites are there?



In [ ]:

    
len(model.metabolites)

One can access a specific metabolite using dot notation.



In [ ]:

    
model.metabolites.atp_c

Warning: One cannot use dot notation to access metabolites, reactions, or genes if their identifiers do not resemble proper Python variable names.



In [ ]:

    
model.metabolites.10fthf_c

Solution: Use the method get_by_id instead!



In [ ]:

    
model.metabolites.get_by_id('10fthf_c')

Metabolites are associated with compartments in the cell. Glyceraldehyde 3-phosphate (g3p_c) is associated with the c (Cytosol) compartment.



In [ ]:

    
model.metabolites.g3p_c.compartment

The reactions in the human metabolic reconstruction are categorized in various compartments.



In [ ]:

    
model.compartments

**Assignment (3 min):** Do you know the biological role of all these compartments?

See here for more information: http://www.ncbi.nlm.nih.gov/books/NBK26907/

Some metabolites (like Glucose for example) can be associated with multiple compartments.



In [ ]:

    
model.metabolites.glc_D_c.compartment
model.metabolites.glc_D_g.compartment

The full name of the metabolite is available via the .name attribute.



In [ ]:

    
model.metabolites.glc_D_c.name

One can look up the molecular formula of glucose and its weight.

The .elements attribute returns a dictionary representation of the formula.



In [ ]:

    
model.metabolites.glc_D_c.formula
model.metabolites.glc_D_c.formula_weight

model.metabolites.g3p_c.elements

Metabolites are not isolated things. They participate in reactions as substrates and products.



In [ ]:

    
model.metabolites.g3p_c.reactions

If you are not sure which ID the metabolite you are looking has, you can use a query to look for it. Suppose we want to find lactate



In [ ]:

    
for metabolite in model.metabolites.query('lactate','name'):
    print(metabolite.id, metabolite.name,metabolite.compartment)

Reactions

Reactions have similar properties as metabolites

How many metabolites are there?



In [ ]:

    
len(model.reactions)



In [ ]:

    
for reaction in model.reactions[1000:1010]:
    print(reaction.id,reaction.name)

Let's take a closer look at the reactions associated with Glyceraldehyde 3-phosphate (g3p). Reactions like metabolites have both a short ID, a longer name attribute and an attribute called reaction that contains the chemical reactions equation. Lets see all of these for the reactions g3p engages in.



In [ ]:

    
for reaction in model.metabolites.g3p_c.reactions:
    print(reaction.id, reaction.name, reaction.reaction)

**Assignment (3 min):** In the cell above try to also add the "subsystem" attribute of the reactions to the print statement. You may find this under reaction.subsystem.

This will give you some more clarity on the pathway these reactions function in.

Reaction bounds

In constraint-based (FBA) models reactions are associated with two bounds, a lower bound and an upper bound. These constrain the amount of flux that is allowed to run through a certain reaction.

The most important way these are used is to take into account thermodynamics of reactions. If a certain reaction has a very high negative $\Delta G^{'0}$ then this reaction is practically irreversible. As such its lower bound is often made zero, meaning that the flux cannot run in the backward direction because it goes against thermodynamics. This makes the model predictions more realistic.

If you need a refresher on thermodynamics

This page has all you need to know: https://www.khanacademy.org/science/chemistry/thermodynamics-chemistry/gibbs-free-energy/a/gibbs-free-energy-and-spontaneity

Let's look at an example: pyruvate kinase has a lower bound of zero in our model.



In [ ]:

    
model.reactions.PYK.name
model.reactions.PYK.reaction
model.reactions.PYK.bounds

**Assignment (3 min):** Check that this makes sense thermodynamically. Go to: http://equilibrator.weizmann.ac.il/ and type in "pyruvate kinase". Click the reaction corresponding to the one above.

Look at the estimates for the $\Delta G^{'0}$. Does this match with the model?

Genes and their link to reactions

Remember that the core of the model is made up of the stoichiometry matrix which links metabolites and reactions. The second layer of the model consists of gene coupling to reactions.



In [ ]:

    
len(model.genes)

**Assignment (1 min):** Why is this such a small number? What about all the other human genes? How many genes do humans actually have in total?

See: http://doi.org/10.1126/science.1058040

Glyceraldehyde-3-phosphate dehydrogenase is associated with two genes.



In [ ]:

    
model.reactions.GAPD.gene_reaction_rule

These gene identifiers refer to the Hugo Gene Nomenclature Committee database: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=4141

**Assignment (3 min):** Is this annotated correctly. Why two genes?

Here a very complicated gene to reaction mapping (ATP synthase).



In [ ]:

    
model.reactions.ATPS4m.name
model.reactions.ATPS4m.reaction
model.reactions.ATPS4m.gene_reaction_rule

The medium

The next key component of the model is the "medium", i.e. the set of metabolites that is allowed to be taken up into the cell. For human cells this should entail the essential amino acids, glucose, oxygen etc. The way this is done in the model is by having so-called "exchange reactions" that transport such a metabolite into the extracellular space (the "e" compartment we came across above). Typically each of these reactions start with 'EX_' which stands for exchange reaction. The word exchange refers to the fact that this is an exchange of metabolites with the environment or the biofluids surrounding the cell.

Exchange reactions are defined as: X <=> . So a negative (to the left) flux means uptake of metabolite X. Positive (to the right) flux would mean X is produced by the cell. This would typically be the case for lactate or CO2 etc.

**Assignment (3 min):** Make sure you understand the difference between an exchange reaction and a transport reaction. They are not the same thing!

Exchange $$X <=> $$ Transport $$ X_c <=> X_e$$

Below we print the medium of the model. The table shows the bound on each such inward flux. The bounds on these fluxes are actually negative but are here shown as positive values.



In [ ]:

    
pd.DataFrame(list(model.medium.items()),columns=['Reaction','Inward flux']).set_index('Reaction')

You may ignore the biomass_other reaction.

**Assignment (3 min):** Would you say this is a reasonable medium for a human cell to live in? What are the limiting medium components currently? Is there an oxygen or carbon limitation? What is the carbon source?

Performing Flux Balance Analysis

What is FBA again?

We will use the constraint-based modeling technique called flux balance analysis (FBA) (http://doi.org/10.1038/nbt.1614). Given the metabolic network this method computationally calculates the flux distribution, meaning the fluxes through all reactions in the model at steady-state, that is optimal according to a specified objective function. In our case, the objective function will usually be the flux through the biomass reaction. This reaction contains (when known) experimentally determined ratio's of metabolites that make of the composition of a cell.

FBA has traditionally been applied to metabolism as we are doing here. FBA primarily makes use of the stoichiometry matrix $S$. This matrix is of size $m \times r$ where $m$ is the number of species or metabolites in the system and $r$ is the number of reactions. Every row of $S$ specifies for a specific metabolite in what quantity it partakes in each reaction. Therefore, each element $(i,j)$ of $S$ contains the stoichiometric coefficient of metabolite $i$ in reaction $j$.

The word flux is used to describe a reaction rate at steady state. In FBA the aim is to find the flux distributions of the network that satisfy (1) the steady-state condition, this is based on the assumption that metabolism occurs on a fast time-scale compared to gene regulatory events and thus reaction rates are constant, (2) thermodynamic feasibility , i.e. some reactions are known to be irreversible, (3) maximal flux constraints when these are known and (4) maximize some objective like growth.

For the brave

Mathematically, this can be concisely summarized as \begin{align*} \text{Maximize } Z &=c^T v, \text{ such that } \\ Sv &= 0 \\ \alpha_k \leq & ~ v_k \leq \beta_k, \text{ for all k}. \end{align*} Here $Z$ is the objective, $c$ is a column vector with coefficients for reactions in the objecte, $v$ is the column vector of fluxes representing all reactions in the model. As such $c^T v$ is a linear combination of fluxes, the objective. $S$ is the stoichiometry matrix, $\alpha_k$ is the lower bound for reaction $k$ and similarly $\beta_k$ is the upper bound for reaction $k$.

This can be thought of as first constraining all possible solutions to the ones that allow a steady state and satisfy the bounds ( this results in a multi dimensional cone within the null space of S) and then finding the optimal solution among the remaining degrees of freedom.

What you will do

We will perform in-silico experiments with genetic manipulations by changing the third line in the above equation, i.e. the lower and upper bounds for the reactions. By setting both the lower bound $\alpha_k$ and the upper bound $\beta_k$ to zero we in effect knock out a reaction. Similarly, by setting it to a specific value we can fix a certain flux level for a specific reaction.

What is our in-silico cell optimizing?

Let us first check what objective is set for our model.



In [ ]:

    
model.objective.expression

So the objective is biomass. Don't get confused by the subtraction of biomass_reaction_reverse. The details do not matter now, but internally the two directions are handled separately and if the flux is "reversed" with a flux of 5 you would want the objective score to be -5. That is why it is subtracted.

**Assignment (5 min):** Is optimal growth a realistic assumption for human cells? For bacterial cells? Discuss...

Doing FBA is simple...

Actually performing FBA with cobrapy once you have loaded a model is really easy and fast.



In [ ]:

    
solution = model.optimize()

"solution" now contains the result of our simulation. Let's see what we can do with this. The most important attributes here are:

status => did we find an optimial solution?
objective_value => what is the biomass flux?
fluxes => the flux through every reaction in the model solution



In [ ]:

    
solution.status
solution.objective_value
solution.fluxes['GAPD']

A very nice summary of what goes in and out of the cell in this solution can be viewed with the summary attribute of the model object.



In [ ]:

    
model.summary()

**Assignment (3 min):** Investigate the metabolites that are produced (i.e. in the OUT FLUXES column). Are these what you would have expected?

**Assignment (5 min):** What would happen if we force the cell to produce something else than lactate? Try this by setting the upper_bound of the exchange reaction of lactate 'EX_lac_D_e' to zero. Do this using the . notation to access the upper_bound attribute of the lactate exchange reaction. Then rerun model.optimize() and look at the model.summary() again.

What new products have been produced? Did the growth rate remain the same?

If you want repeat this process a couple times. Be sure to block multiple metabolites simultaneously.



In [ ]:

    
model = M.copy()

# change some upper bounds here

model.optimize()
model.summary()

**Assignment (5 min):** Above you might have realized that the model can grow optimally while only producing CO2 as a carbon product.

Can you imagine why this happens? What could we change about the medium to force the model into producing other carbon exits? Try something below.

What happens now when you block the carbon exits like lactate?



In [ ]:

    
model = M.copy()

# change some upper/lower bounds here


model.optimize()
model.summary()

How is ATP generated?

When maximizing ATPM using only glucose and o2

Below we first remove the amino acids from the medium in order to just ask for ATP production from glucose and oxygen.

Then we make the ATP maintenance reaction: atp -> adp + pi, the objective reaction.

**Assignment (3 min):** Before running the cell below. Based on your textbook knowledge: what would you expect the maximum number of ATP to be given 1 unit of glucose and unlimited oxygen? How much oxygen would be used for this?

Now execute the cell and compare the results.



In [ ]:

    
model = M.copy()

for rxn in ['EX_his_L_e','EX_ile_L_e','EX_leu_L_e','EX_lys_L_e','EX_met_L_e','EX_phe_L_e','EX_thr_L_e','EX_trp_L_e','EX_val_L_e']:
    model.reactions.get_by_id(rxn).lower_bound = 0

model.objective = model.reactions.ATPM
model.optimize()

model.summary()
print()
model.metabolites.atp_c.summary()
print()
model.metabolites.atp_m.summary()
print()
model.metabolites.h_i.summary()

When maximizing biomass

Below we do the same again but for biomass as the objective reaction.



In [ ]:

    
model = M.copy()

model.optimize()

model.summary()
print()
model.metabolites.atp_c.summary()
print()
model.metabolites.atp_m.summary()
print()
model.metabolites.h_i.summary()

**Assignment (3 min):** Reflect for a moment on the differences compared to the ATPM objective. Why is the ATPS4 flux down? Why is the oxygen/glucose ratio down?

Parsimonious enzyme-usage flux balance analysis (pFBA)

(Lewis et al., 2010) http://doi.org/10.1038/msb.2010.47

pFBA is an often-used extension of normal flux balance balance analysis. It performs FBA as usual but with an additional optimization that minimizes the total flux in the network. Intuitively this means that if there are two solutions that achieve the same growth rate but one solution is a lot longer than the other, i.e. it has more steps and therefore the total flux is higher, pFBA will return the shorter solution.

Biologically, this may also be interpreted as giving you an optimal solution that is efficient in its enzyme usage.

Note: There is one pitfall with pFBA, the 'objective_value' returned refers to the total flux in the network not the objective flux. You can get the fluxes, as with model.optimize() through the 'fluxes' attribute



In [ ]:

    
model_pfba = M.copy()
sol = pfba(model_pfba)
print('Solution status: {}'.format(sol.status))
print('Objective value: {}'.format(sol.fluxes['biomass_reaction']))
print('Total flux in pFBA solution: {}'.format(sol.objective_value))

**Assignment (3 min):** The cell below calculates the total flux, i.e. the sum of all fluxes in a regular FBA solution. How does that compare to the pFBA solution. Does this make sense?



In [ ]:

    
model_fba = M.copy()

sol = model_fba.optimize()

# total_flux =  calculate the total flux here, make use of sol.fluxes
# make sure you take the absolute value of the flux (use abs()) so that negative fluxes (going in the backward direction) 
# are not reducing the total flux value

print('Total flux in FBA solution: {}'.format(total_flux))

**Assignment (3 min):** Using the cell below inspect the input and output of the networks in the FBA and pFBA solutions. Are the growth rates still the same? What happens to the amount of flux going in and out of the network



In [ ]:

    
model_fba.summary()
print() # empty line
model_pfba.summary()

Flux variability analysis (FVA)

Questions

How uniquely determined is the flux distribution returned by flux balance analysis?

Objectives

Learn how to determine flux capacities using flux variability analysis.
Learn how to generate phenotypic phase planes for different fluxes.

Flux balance analysis gives you one optimal steady state solution for the model you are simulating. Above you saw that you could alter the network, e.g. by blocking one lactate exit, and still get an optimal solution. This means that the FBA solution is not uniquely determined.

What is flux variability analysis?

(Mahadevan and Schilling, 2003) http://doi.org/10.1016/j.ymben.2003.09.002

Flux variability analysis is closely related to flux balance analysis.

Remember that FBA calculate one flux distribution such that the whole network is at steady state, mass-balanced, thermodynamically feasible and optimal.

In contrast, FVA calculates for each reaction the range of feasible flux that is consistent with these constraints FBA uses. So for a given reaction $v$, FVA will tell you what the minimal and maximal attainable flux is through that reaction while keeping the solution at steady state and optimal.

Technically, FVA for one reaction, for instance pyruvate kinase, involves once performing FBA with the objective of maximizing the flux through pyruvate kinase and once performing FBA to minimize its flux. It then returns this interval [minimum_feasible flux, maximum_feasible_flux].

Why would we do FVA?

Flux Balance Analysis solutions are not necessariliy unique. Flux Variablity Analysis is a good tool for estimating the space of alternative optimal solutions. If you see that there is an entire range of feasible fluxes in the optimal growth state for pyruvate kinase. It means there are multiple solutions that are equally optimal but with different pyruvate kinase fluxes.
Below we will look at how to find essential reactions with FVA
We will see later in the tutorial that FVA may be used to predict biomarkers of network perturbations such as inborn-errors of metabolism.

How to perform FVA with cobrapy?

cobra.io.flux_variability_analysis calculates all the minimum and maximum fluxes that all reactions in a model can attain.

flux_variability_analysis has a keyword argument fraction_of_optimum that allows one to specify a fraction of the model’s objective that needs to be achieved, by default set to 1, i.e. 100% (optimal).



In [ ]:

    
model = M.copy()

# a short list of reactions to calculate to reduce computational time. Feel free to adjust the list
interesting_reactions = ['EX_bhb_e','EX_acac_e','EX_urea_e','PHETHPTOX2','PYK','SUCD1m','ATPS4m',]

result = cobra.flux_analysis.flux_variability_analysis(model,reaction_list=interesting_reactions)
result[['minimum','maximum']]



In [ ]:

    
result.plot.bar(stacked=True)

Are essential amino acids essential?

We will now look into which reactions the model predicts to be essential for growth. Are the "essential" amino acids predicted to be essential?

Calculating the essential genes using flux variability analysis

One of the uses of FVA is to predict essential reactions. If we require growth to be non-zero then reactions that are essential will have an associated FVA interval that does not contain zero.

**Assignment (3 min):** Make sure you understand the statement made above. A helpful example is to think of glucuse exchange reaction. What would happen to this flux when the cell tries to grow optimally? Does this reaction need to carry flux? What does that say about the FVA interval?

The cell below returns a table with essential reactions predicted by FVA. It might take a while to run...



In [ ]:

    
model = M.copy()

# first perform pFBA to find a list of potentially essential reactions
# reasoning: if reaction v is essential it ALWAYS has to carry flux. So also in this pFBA solution. 
# doing this first allows us to perform FVA on a smaller set of reactions
sol = pfba(model)

# take only those reactions that carry flux
reactions_with_flux = sol.fluxes[sol.fluxes.abs() > 1e-9].index.tolist()

# perform fba on the flux carrying reactions
fva_sol = cobra.flux_analysis.flux_variability_analysis(model,reaction_list=reactions_with_flux,fraction_of_optimum=0.8)

# filter out 
fva_sol = fva_sol.loc[(fva_sol['minimum'] > 1e-9) | (fva_sol['maximum'] < -1e-9) ]
df = fva_sol.copy()
for r in df.index.tolist():
    df = df.set_value(r,'reaction',model.reactions.get_by_id(r).reaction)
    df = df.set_value(r,'genes',model.reactions.get_by_id(r).gene_reaction_rule)
df

**Assignment (2 min):** Search to see if the glucose exchange reaction is in the table above. Is it essential? What is the FVA interval allowed under the condition of optimality?

**Assignment (5 min):** Rerun the cell above but change the 'fraction_of_optimum' parameter to less than 1. 1 means exactly optimal. 0.8 would mean you allow the solution to be at least 80% of the optimal growth rate. etc. What happens to the glucose exchange reaction interval when you decrease the 'fraction_of_optimum' parameter? Is there a point where it bomes non-essential?

Self-check questions

Make sure you are able to explain in your own words:

What the two core components of the human metabolic reconstruction model are
How thermodynamics of reactions may be taken into account in the reaction bounds
What flux balance analysis is
What FBA actually computates
How pFBA differs from FBA
How FVA is different from FBA
What you can use FVA for



In [ ]: