In [1]:
from cameo import config
config.default_view = config.SequentialView()
from pandas import options
options.display.max_rows = 8


Using meta-heuristics to search for knockout strategies.

Cameo uses evolutionary algorithms to find near-optimal solutions quickly. Linear programming is used to simulate flux distributions, and the evolutionary algorithm searches for combinations of knockouts that improve, for example, the yield of a desired product. The OptGene[1] method uses this approach.

Evolutionary algorithms are iterative. In each iteration, multiple solutions are evaluated and assigned a fitness value. The solutions that improve the objective are kept, and some of them are altered and passed on to the next iteration. After some rounds the objective improves, and if the algorithm is run long enough it will, in principle, explore all possible solutions.
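The loop described above can be sketched in a few lines of plain Python. This is only an illustration of the evaluate, select and alter cycle, not cameo's or inspyred's implementation; the function `evolve`, its parameters and the toy fitness function are all hypothetical.

```python
import random

def evolve(n_targets, fitness, pop_size=20, generations=30, max_size=3, seed=0):
    """Toy evolutionary search over sets of knockout indices (illustrative only)."""
    rng = random.Random(seed)

    def random_solution():
        size = rng.randint(1, max_size)
        return frozenset(rng.sample(range(n_targets), size))

    def mutate(solution):
        # Drop one member and add a random target (which may already be present).
        members = set(solution)
        members.discard(rng.choice(sorted(members)))
        members.add(rng.randrange(n_targets))
        return frozenset(members)

    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation: rank every candidate by its fitness value.
        ranked = sorted(population, key=fitness, reverse=True)
        # Selection: keep the better half unchanged.
        survivors = ranked[: pop_size // 2]
        # Variation: fill the next generation with altered copies of survivors.
        children = [mutate(rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

def toy_fitness(solution):
    # Hypothetical score: reward hitting targets 2 and 5, penalize extra knockouts.
    return len(solution & {2, 5}) - 0.1 * len(solution)

best = evolve(n_targets=10, fitness=toy_fitness)
print(sorted(best), toy_fitness(best))
```

In cameo, the fitness evaluation is where the linear programming simulation takes place; here it is replaced by a cheap toy function so the sketch runs without a solver.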

Cameo builds on the inspyred library and provides a low-level interface that allows the implementation of strategies more elaborate than OptGene.

Load model

The first step is to load a model as usual.


In [4]:
from cameo import models
iJO1366 = models.bigg.iJO1366

Define objective functions

An objective function defines how the fitness of a solution is computed in the evaluation phase.

Cameo comes with prebuilt objective functions, e.g., Biomass-Product coupled yield[1].


In [5]:
from cameo.strain_design.heuristic.evolutionary.objective_functions import biomass_product_coupled_yield
of = biomass_product_coupled_yield(iJO1366.reactions.BIOMASS_Ec_iJO1366_core_53p95M,
                                   iJO1366.reactions.EX_ac_e,
                                   iJO1366.reactions.EX_glc__D_e)
of


Out[5]:
$$bpcy = \frac{(BIOMASS\_Ec\_iJO1366\_core\_53p95M * EX\_ac\_e)}{EX\_glc\_\_D\_e}$$
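As a worked example of the formula above, the following sketch computes the biomass-product coupled yield from three flux values. The function name `bpcy` and the numbers are hypothetical, and taking the absolute value of the substrate flux (uptake fluxes are usually negative) is an assumption of this sketch rather than a statement about cameo's internals.

```python
def bpcy(biomass_flux, product_flux, substrate_flux):
    """Biomass-product coupled yield: (biomass * product) / substrate uptake."""
    if substrate_flux == 0:
        # No substrate uptake: define the yield as zero to avoid dividing by zero.
        return 0.0
    # Assumption: uptake is reported as a negative flux, so use its magnitude.
    return (biomass_flux * product_flux) / abs(substrate_flux)

# Hypothetical fluxes: growth rate 0.5, acetate export 8.0, glucose uptake -10.0.
print(bpcy(0.5, 8.0, -10.0))
```

A solution that produces a lot of product but cannot grow scores zero, which is what couples production to biomass.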

Other fitness functions such as yield or number of knockouts are also available in cameo.


In [6]:
from cameo.strain_design.heuristic.evolutionary.objective_functions import product_yield, number_of_knockouts

Customized objectives can be implemented by extending the base class ObjectiveFunction.

During the evaluation phase, all objective functions will be called with the following parameters: of(model, flux_distribution, decoded_solution).
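To make the call signature concrete, here is a minimal stand-in written as a plain callable class. It deliberately does not derive from cameo's ObjectiveFunction, so it runs without cameo installed. The class name `ProductPerKnockout` is hypothetical, and the data shapes (a mapping from reaction ids to fluxes, and a collection of knocked-out targets) are assumptions made for this sketch.

```python
class ProductPerKnockout:
    """Hypothetical objective: product flux divided by (1 + number of knockouts)."""

    def __init__(self, product_id):
        self.product_id = product_id

    def __call__(self, model, flux_distribution, decoded_solution):
        # Assumption: flux_distribution maps reaction ids to flux values.
        product_flux = flux_distribution[self.product_id]
        # Assumption: decoded_solution is a collection of knocked-out targets.
        return product_flux / (1 + len(decoded_solution))

of_sketch = ProductPerKnockout("EX_ac_e")
# The model argument is unused in this toy objective, so None is passed.
print(of_sketch(None, {"EX_ac_e": 6.0}, ["b3708", "b4388"]))
```

A real custom objective would subclass ObjectiveFunction and follow whatever data types cameo actually passes in; check the class in your installed version before relying on the shapes assumed here.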

Search for gene knockouts with single objective

In this example, we are looking for gene knockouts leading to biomass-coupled acetate production in E. coli, using the iJO1366 model. This is very similar to OptGene.

Setup optimization strategy for gene knockouts

There are multiple configurations for this strategy. The most basic configuration requires a model and an objective function. More parameters can be set, such as the method used to simulate the flux distributions (FBA is the default) and the evolutionary computation method (the genetic algorithm inspyred.ec.GA is the default). The implementation removes essential genes from the search, as knocking them out cannot yield viable solutions. More genes can be excluded from the search if defined (either due to biological knowledge or user strategy).
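The pruning of the search space can be pictured with the following sketch. It is not cameo's code: `build_search_space`, the `is_essential` predicate and the gene ids are hypothetical. In practice essentiality would be determined by simulating each single knockout on the model.

```python
def build_search_space(all_genes, is_essential, user_excluded=()):
    """Return the genes that remain valid knockout candidates."""
    excluded = set(user_excluded)
    return [g for g in all_genes if not is_essential(g) and g not in excluded]

# Hypothetical gene ids and essentiality calls, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
essential = {"g2"}
candidates = build_search_space(genes, essential.__contains__, user_excluded=["g4"])
print(candidates)  # ['g1', 'g3']
```

Shrinking the candidate list this way reduces the number of evaluations wasted on solutions that cannot grow.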


In [8]:
from cameo.strain_design.heuristic.evolutionary import GeneKnockoutOptimization
ko = GeneKnockoutOptimization(model=iJO1366, objective_function=of)

Run optimization for gene knockouts with single objective


In [9]:
res1 = ko.run(max_evaluations=15000, view=config.default_view)


Starting optimization at Thu, 14 Jan 2016 13:49:55
Using saved session configuration for http://localhost:5006/
To override, pass 'load_from_config=False' to Session
/Users/joao/.virtualenvs/cameo-py3/lib/python3.4/site-packages/bokeh/session.py:318 UserWarning: You need to start the bokeh-server to see this example.
 0%
Finished after 01:59:48

In [10]:
res1


Out[10]:

Result:

  • model: iJO1366
  • heuristic: GA
  • objective function: $$bpcy = \frac{(BIOMASS\_Ec\_iJO1366\_core\_53p95M * EX\_ac\_e)}{EX\_glc\_\_D\_e}$$
  • simulation method: pfba
  • type: gene
    • reactions knockouts fitness
      0 (PETNT161pp, ATPS4rpp, TRPAS2, PSP_L, PETNT181pp) (b0752, b3708, b4485, b4268, b3546, b4388, b43... 0.575018
      1 (ATPS4rpp, MDH3, TRPAS2, GTHRDHpp, MDH2, PSP_L... (b3447, b2498, b3708, b4268, b2210, b4388, b43... 0.575018
      2 (PETNT161pp, ATPS4rpp, TRPAS2, PSP_L, UPPRT, P... (b2498, b4485, b3708, b3917, b3546, b4388, b43... 0.575018
      3 (PETNT161pp, ATPS4rpp, TRPAS2, PSP_L, UPPRT, P... (b2498, b4485, b3708, b3679, b3546, b4388, b43... 0.575018
      ... ... ... ...
      96 (UDPGD, METabcpp, ECA4OALpp, ATPS4rpp, TRPAS2,... (b4485, b3708, b0198, b3622, b4388, b1533, b43... 0.575018
      97 (NADH18pp, FCLK, PETNT161pp, ATPS4rpp, TRPAS2,... (b2498, b2285, b3708, b3546, b4388, b2803, b38... 0.575018
      98 (RNDR2b, NTRIR2x, ATPS4rpp, RNDR4b, RNDR1b, RN... (b3708, b2676, b4388, b3875, b3365, b4390, b3731) 0.575018
      99 (ATPS4rpp, TRPAS2, PSP_L) (b3708, b4138, b4388, b3731) 0.575018

      100 rows × 3 columns

Search for gene knockouts with multiple objectives

In this example, we are looking for gene knockouts for biomass-coupled succinate production in S. cerevisiae, using the iMM904 model. The number of mutations necessary to generate the predicted strain is the secondary objective. This allows searching for strategies that minimize the number of changes.

Load model


In [6]:
iMM904 = models.bigg.iMM904

Define objective function for search (multiple objectives)


In [7]:
from cameo.strain_design.heuristic.evolutionary.objective_functions import number_of_knockouts

# Primary objective: biomass-product coupled yield for succinate on glucose.
objective1 = biomass_product_coupled_yield(iMM904.reactions.BIOMASS_SC5_notrace,
                                           iMM904.reactions.EX_succ_e,
                                           iMM904.reactions.EX_glc__D_e)

# Secondary objective: number of knockouts in the solution.
objective2 = number_of_knockouts()

multi_objective = [objective1, objective2]

Setup optimization strategy for gene knockouts

Because we are using more than one objective, we need an evolutionary algorithm designed for multi-objective optimization; several are implemented in inspyred. In this example we use the Pareto archived evolution strategy (PAES)[2].
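Multi-objective algorithms such as PAES compare solutions by Pareto dominance rather than by a single fitness value. The sketch below illustrates only that comparison, for a pair of objectives where yield is maximized and the number of knockouts is minimized. It is not PAES itself, and the function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in both objectives and better in at least one.

    Each solution is a (bpcy, n_knockouts) pair: bpcy is maximized,
    the number of knockouts is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical (bpcy, number of knockouts) pairs.
candidates = [(0.57, 7), (0.57, 4), (0.40, 2), (0.30, 3), (0.10, 1)]
print(pareto_front(candidates))  # [(0.57, 4), (0.4, 2), (0.1, 1)]
```

The result of a multi-objective run is therefore a set of trade-offs, from which a strain design can be picked according to how many knockouts are acceptable.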


In [ ]:
import inspyred

ko = GeneKnockoutOptimization(model=iMM904,
                              objective_function=multi_objective,
                              heuristic_method=inspyred.ec.emo.PAES)

Run optimization for gene knockouts with multiple objectives


In [ ]:
res2 = ko.run(max_evaluations=15000)

In [ ]:
res2

Run other simulation methods

The implemented approach makes use of linear programming in the evaluation phase, which means that different methods can be used to compute the flux distributions.

All methods found in cameo.flux_analysis.simulation can be used as a simulation method.

Alternatively, users can supply any method as long as it follows the signature simulation_method(model, **kwargs).

The required keyword arguments can be preset on the Optimization class. Besides those arguments, a ProblemCache will be passed as cache=cache_object for better performance.


In [8]:
from cameo.flux_analysis.simulation import lmoma

In [ ]:
ko.simulation_method = lmoma

In [ ]:
ko.simulation_kwargs

In [ ]:
res3 = ko.run(max_evaluations=15000)

Search for reaction knockouts with single objective

In this example, we are looking for reaction knockouts for biomass-coupled succinate production in E. coli, using the iJO1366 model.

Select model and define objective function for reaction knockout search (single objective)


In [ ]:
model = iJO1366.copy()
of = biomass_product_coupled_yield(model.reactions.BIOMASS_Ec_iJO1366_core_53p95M,
                                   model.reactions.EX_succ_e,
                                   model.reactions.EX_glc__D_e)

Setup the optimization - ATP maintenance reaction is removed from targets


In [ ]:
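import inspyred
from cameo.strain_design.heuristic.evolutionary import ReactionKnockoutOptimization

ko = ReactionKnockoutOptimization(model=model,
                                  objective_function=of,
                                  heuristic_method=inspyred.ec.GA,
                                  essential_reactions=["ATPM"])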

By default, a reaction knockout search may try to remove the ATP maintenance reaction (ATPM), which represents the non-growth-associated ATP cost. This reaction is not essential for growth, but it is needed to keep the model's predictions realistic. For that reason it is added to the essential reactions, which excludes it from the search.

Run reaction knockout optimization

The optimization can be run with several parameters. The mutation_rate and indel_rate, for example, control how often solutions are altered by the evolutionary algorithm.
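The sketch below gives one plausible reading of these two rates on a set of knockout targets: a mutation swaps a target for another one, and an indel inserts or deletes a target. It is an assumption-laden illustration, not cameo's actual variator code; the function names, the reaction ids and the 50/50 choice between insertion and deletion are all hypothetical.

```python
import random

def mutate(solution, targets, mutation_rate, rng):
    """Replace each knocked-out target by a random one with probability mutation_rate."""
    return {rng.choice(targets) if rng.random() < mutation_rate else t
            for t in sorted(solution)}

def indel(solution, targets, indel_rate, rng):
    """With probability indel_rate, insert or delete a single target."""
    result = set(solution)
    if rng.random() < indel_rate:
        if len(result) > 1 and rng.random() < 0.5:
            # Deletion: drop one existing knockout.
            result.remove(rng.choice(sorted(result)))
        else:
            # Insertion: add a random target (no effect if already present).
            result.add(rng.choice(targets))
    return result

rng = random.Random(42)
targets = ["r1", "r2", "r3", "r4", "r5"]  # hypothetical reaction ids
parent = {"r1", "r2"}
child = indel(mutate(parent, targets, 0.15, rng), targets, 0.185, rng)
print(sorted(child))
```

Higher rates make the search explore more aggressively, at the cost of disrupting good solutions more often.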


In [ ]:
results_3 = ko.run(max_evaluations=5000, mutation_rate=0.15, indel_rate=0.185)

In [ ]:
results_3

Search for reaction knockouts with multiple objectives

In this example, we are looking for reaction knockouts for high-yield acetate production in E. coli, using the iJO1366 model. As before, the number of knockouts necessary to construct the strain is a secondary objective.


In [ ]:
of1 = product_yield(model.reactions.EX_ac_e.id, model.reactions.EX_glc__D_e.id)
of2 = number_of_knockouts()

Setup the optimization


In [ ]:
import inspyred
from cameo.flux_analysis.simulation import fba

ko = ReactionKnockoutOptimization(model=model,
                                  objective_function=[of1, of2],
                                  simulation_method=fba,
                                  heuristic_method=inspyred.ec.emo.NSGA2)

Run reaction knockout optimization

More parameters can be set on the run method. For more information, see the inspyred documentation.


In [ ]:
results_4 = ko.run(max_evaluations=5000, n=1, mutation_rate=0.3, populations_size=100, crossover_rate=0.2)

In [ ]:
results_4

References

[1] Patil, K. R., Rocha, I., Förster, J., & Nielsen, J. (2005). Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinformatics, 6, 308. doi:10.1186/1471-2105-6-308

[2] Knowles, J., & Corne, D. (1999). The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimisation. In Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406) (pp. 98–105). IEEE. doi:10.1109/CEC.1999.781913

