How does micom model communities

There are many frameworks for microbial modeling, and each makes its own specific assumptions in order to fit microbial communities into a mathematical representation. This is an overview of the assumptions micom makes and how it translates the community into mathematical terms. As it happens, all formulations here agree with the formulations used in the OptCom and SteadyCom papers.

Exchanges and community growth rate

One of the things that can easily be overlooked when using FBA for communities is that growth rates and fluxes are usually given in units of mass/(abundance time), for instance mmol/(gDW h). Thus, all fluxes are relative to the abundance of the single bacterium they describe. However, in a community the bacteria may all have different abundances, so we have to take care to balance fluxes. For instance, let us take a system containing only one bacterium $i$ which imports metabolite X from the external medium. In the medium, metabolite X enters the system with an unscaled flux $v^m_x$ which has units mmol/h. Within the bacterium the metabolite is consumed with the scaled flux $v^i_x$ which has units mmol/(gDW h). $v^i_x$ describes the flux that can be realized by 1 gDW of the respective bacterium. However, the abundance of bacterium $i$, $b_i$ (in gDW), might be different from that. In order to have balanced fluxes we have to enforce that $v^m_x = b_i \cdot v^i_x$ (overall influx equals overall consumption). If many bacteria import the respective metabolite we thus need to enforce

$$ v^m_x = \sum_i b_i \cdot v^i_x $$

The actual abundances $b_i$ are usually not known, but we can divide the equation by the total bacterial abundance $B = \sum_i b_i$ and obtain

$$ \tilde{v}^m_x = v^m_x/B = \sum_i b_i/B \cdot v^i_x $$

$\tilde{v}^m_x$ is now a scaled flux in the medium relative to the overall bacterial biomass in the community. The relative abundances $\tilde{b}_i = b_i/B$ can be taken from metagenomic studies such as 16S rRNA quantities. This is how micom uses abundance data.
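The scaling above can be checked with a few lines of plain Python. This is only a minimal sketch with made-up numbers; the function name `scaled_medium_flux` is hypothetical and not part of the micom API.

```python
def scaled_medium_flux(abundances, fluxes):
    """Return the medium flux relative to total community biomass.

    abundances: b_i, absolute (gDW) or relative; only the ratios b_i / B matter
    fluxes:     v^i_x, per-taxon fluxes in mmol/(gDW h)
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return sum(b / total * v for b, v in zip(abundances, fluxes))


# Hypothetical example: two taxa importing metabolite X
b = [3.0, 1.0]  # absolute abundances in gDW (illustrative values)
v = [2.0, 6.0]  # per-taxon import fluxes in mmol/(gDW h)

unscaled = sum(bi * vi for bi, vi in zip(b, v))  # v^m_x in mmol/h
scaled = scaled_medium_flux(b, v)                # v^m_x / B in mmol/(gDW h)
print(unscaled, scaled)
```

Passing the relative abundances `[0.75, 0.25]` instead of `[3.0, 1.0]` gives the same scaled flux, which is why relative abundance data is sufficient here.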

In a similar manner the unscaled community growth rate (total biomass production) is given by

$$ v_{biomass} = \sum_i b_i\cdot v^i_{biomass} $$

and dividing by the total biomass $B$ yields

$$ \tilde{v}_{biomass} = \sum_i b_i/B v^i_{biomass} $$

where $\tilde{v}_{biomass}$ now again is a scaled biomass flux relative to the total community biomass.

Steady states in the community

micom has been designed with the gut microbiota in mind. One of the major problems when trying to apply flux balance analysis (FBA) to microbial community data, especially metagenomic data, is a set of paradoxical assumptions about community growth. Flux balance analysis usually assumes a maximization of the growth rate, or at least the realization of one particular growth rate $\mu$. Metagenomic experiments, however, usually only quantify the microbial composition at one particular time point, assuming that the microbial abundances do not change. Those two assumptions are not compatible per se. If members of the community grow at a constant rate they will accumulate exponentially over time. However, we know that this is not the case in systems such as the intestine (otherwise we would probably explode due to an overpopulation of bacteria and fungi). This disagreement can be alleviated by accounting for dilution of the microbiota. For instance, in the gut bacteria are constantly removed in small amounts by death (when reaching the end of their specific life span) and in larger amounts by defecation. The sum of all processes removing bacteria from the system is what we call dilution here. In micom we assume the following about the dilution process:

  1. It is relative to the bacterial abundance (the more you have in your system the more is removed by dilution)
  2. It may be specific to the bacterial strain
  3. It may be specific to the sample

(1) is known to be true for the gut microbiome since we know that a higher concentration of a bacterium in the gut is usually associated with a higher concentration in stool samples (which is the major dilution contributor). This is the same assumption made in the recent SteadyCom publication. (2) is based on the observation that bacteria may have distinct spatial arrangements which make it easier or harder for them to be diluted. (3) is based on the assumption that the respective systems may be different (no gut is the same :D), however it is one of the assumptions we are currently trying to validate. As a consequence we assume that the abundance $b_i$ of bacterium $i$ grows with a growth rate $\mu_i$ and is balanced by a linear dilution process with rate $d_i$ as

$$ \frac{d b_i(t)}{dt} = \mu_i b_i(t) - d_i b_i(t) = (\mu_i - d_i)b_i(t) $$

As we can see absolute bacterial abundance can only be in steady state if $\mu_i = d_i$ for all $i$. Additionally one could also formulate the problem in terms of relative concentrations to the total community abundance $B = \sum_i b_i$ using the quotient rule, which yields:

$$ \frac{d \tilde{b}_i}{dt} = \frac{d b_i/B}{dt} = \tilde{b}_i\cdot\left(\mu_i - d_i - \sum_k (\mu_k - d_k)\tilde{b}_k \right) $$
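For reference, the intermediate steps behind this expression can be written out as follows, using only the growth and dilution equation for $b_i$ and $\frac{dB}{dt} = \sum_k \frac{d b_k}{dt} = \sum_k (\mu_k - d_k) b_k$:

$$ \begin{align} \frac{d \tilde{b}_i}{dt} &= \frac{1}{B}\frac{d b_i}{dt} - \frac{b_i}{B^2}\frac{dB}{dt}\\ &= (\mu_i - d_i)\tilde{b}_i - \tilde{b}_i \sum_k (\mu_k - d_k)\frac{b_k}{B}\\ &= \tilde{b}_i\cdot\left(\mu_i - d_i - \sum_k (\mu_k - d_k)\tilde{b}_k \right) \end{align} $$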

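As one can see, this equation has a steady state if all differences are the same, that is $\mu_i - d_i = C$ for all $i$ and some constant $C$. Since the relative abundances sum to one, $\sum_k \tilde{b}_k = 1$, we get: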

\begin{align} \frac{d \tilde{b}_i}{dt} &= \tilde{b}_i\cdot\left(C - \sum_k C \tilde{b}_k \right)\\ & = \tilde{b}_i\cdot\left(C - C\cdot 1 \right) = 0 \end{align}

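However, in this case the total abundance $B$ would increase indefinitely over time whenever $C > 0$. In particular it holds that $\frac{dB}{dt} = \sum_i (\mu_i - d_i) b_i = C\cdot B$, so $B$ changes exponentially unless $C = 0$.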

Okay, what can we conclude from that? Most importantly that any abundance $b_i$ or relative abundance $\tilde{b}_i$ can be a valid steady state abundance as long as the respective growth and dilution rates are balanced. This means one cannot directly derive the growth rate of a bacterium just from its abundance (e.g. large abundance does not mean large growth rate or vice versa).
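The steady state argument can be illustrated numerically. The following is a rough sketch that integrates the growth and dilution equation with explicit Euler steps; the helper names `simulate` and `relative` and all rate values are made up for illustration.

```python
import math


def simulate(b0, mu, d, t_end=4.0, steps=10000):
    """Integrate db_i/dt = (mu_i - d_i) * b_i with explicit Euler steps."""
    dt = t_end / steps
    b = list(b0)
    for _ in range(steps):
        b = [bi + dt * (m - di) * bi for bi, m, di in zip(b, mu, d)]
    return b


def relative(b):
    """Convert absolute abundances b_i into relative abundances b_i / B."""
    total = sum(b)
    return [bi / total for bi in b]


b0 = [1.0, 3.0]  # illustrative starting abundances in gDW

# Case 1: growth equals dilution for every taxon (mu_i = d_i),
# so absolute abundances do not change at all.
balanced = simulate(b0, mu=[0.5, 0.25], d=[0.5, 0.25])

# Case 2: same net rate C = mu_i - d_i = 0.25 for every taxon,
# so relative abundances stay constant while the total keeps growing.
same_net = simulate(b0, mu=[0.75, 0.5], d=[0.5, 0.25])

print(balanced, relative(same_net), sum(same_net) / sum(b0))
```

Note that the two taxa in the first case have different growth rates yet identical dynamics, which matches the point that abundance alone does not determine growth rate.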

Optimization

micom follows the idea from the original OptCom publication in that there are two major sources of pressure to dictate the community growth:

  1. the community growth rate $\tilde{v}_{biomass}$
  2. individual (egoistic) growth rates $v^i_{biomass}$

The major challenge is to find the tradeoff between the two. One could optimize only the community growth, however that might yield many bacterial species that do not grow (competition), which might stand in contrast to metagenomic studies where those bacteria were actually found in the sample. Also, bacteria have seldom evolved to sacrifice their own growth in favour of community growth.

The original OptCom algorithm solves the multi-objective problem:

$$ \begin{align} \text{maximize } & \tilde{v}_{biomass} = \sum_i b_i/B v^i_{biomass}\\ & s.t. \forall i: \text{ maximize } v^i_{biomass} \end{align} $$

micom can solve this problem using a dualization approach where the dual formulation is appended to the linear problem and strong duality is enforced to transform objectives into additional constraints. However, that is slow and the resulting solutions may only be weakly Pareto optimal.

As a quick alternative one could define lower bounds for the individual biomass objectives and enforce those, yielding a single objective

$$ \begin{align} \text{maximize } & \tilde{v}_{biomass} = \sum_i b_i/B v^i_{biomass}\\ & s.t. \forall i: v^i_{biomass} \geq lb^i_{biomass} \end{align} $$

In order to simplify the problem a bit, micom defines an additional partial objective called "cooperativity cost" which quantifies how much of their own growth the entire population has to sacrifice in order to optimize the growth of the community. Given the individual maximal growth $v^i_{max}$ for each member of the community cooperativity cost is defined as:

$$ \sum_i (v^i_{max} - v^i_{biomass})^2. $$

The quadratic term favours small deviations over larger ones. Alternatively, the cooperativity cost can also be formulated as a linear term, where the absolute value can be dropped because $v^i_{biomass} \leq v^i_{max}$ by definition:

$$ \sum_i \left|v^i_{max} - v^i_{biomass}\right| = \sum_i (v^i_{max} - v^i_{biomass}). $$
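To make the difference between the two formulations concrete, here is a small sketch in plain Python. The function name `cooperativity_cost` and the growth rates are hypothetical and only serve as an illustration.

```python
def cooperativity_cost(v_max, v, linear=False):
    """Sum of deviations of each taxon's growth from its individual maximum.

    v_max:  individual maximal growth rates v^i_max
    v:      realized growth rates v^i_biomass, each assumed <= v^i_max
    linear: use the linear instead of the quadratic formulation
    """
    deviations = [vm - vi for vm, vi in zip(v_max, v)]
    if any(dev < 0 for dev in deviations):
        raise ValueError("growth rates must not exceed their individual maxima")
    if linear:
        return sum(deviations)
    return sum(dev ** 2 for dev in deviations)


v_max = [1.0, 0.5, 0.25]        # illustrative individual maxima

concentrated = [0.5, 0.5, 0.25]  # one taxon gives up 0.5
spread = [0.75, 0.25, 0.25]      # two taxa give up 0.25 each

for name, v in [("concentrated", concentrated), ("spread", spread)]:
    print(name,
          cooperativity_cost(v_max, v, linear=True),
          cooperativity_cost(v_max, v))
```

Both scenarios give up the same total growth, so the linear cost cannot tell them apart, whereas the quadratic cost is lower when the sacrifice is spread over several taxa.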

micom can again solve the multi-objective case where community growth is maximized while the cooperativity cost is minimized at the same time. However, this again uses dualization and is slow. A faster approach can be obtained by using a scalarization approach. Here, we combine both objectives into a single one:

$$ \text{maximize } (1 - \alpha)\cdot\tilde{v}_{biomass} - \alpha\cdot\sum_i (v^i_{max} - v^i_{biomass})^2 $$

Here, $0 \leq \alpha \leq 1$ defines the tradeoff between individual and community growth. Again one could also use the linear term for the cooperativity cost. This does not require dualization and is fast. In fact, even trying out several possible values for $\alpha$ will usually be faster than the dualized version, which allows plotting the community growth against the cooperativity cost for each $\alpha$ (thus evaluating the entire Pareto front).
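The effect of $\alpha$ can be seen on a toy problem. The sketch below is not micom's implementation: it assumes a made-up two-taxon community whose growth rates are limited by individual maxima and by a single shared resource, and it replaces the LP/QP solver with a brute-force grid search. All names and numbers (`scalarized_optimum`, `resource_use`, `supply`) are hypothetical.

```python
def scalarized_optimum(alpha, b, v_max, resource_use, supply, n=200):
    """Maximize (1 - alpha) * community growth - alpha * cooperativity cost.

    Brute-force grid search over a toy two-taxon feasible region:
      0 <= v_i <= v_max[i]  and  sum_i resource_use[i] * v_i <= supply
    Returns (v, community_growth, cost) for the best grid point.
    """
    best_score, best = None, None
    for i in range(n + 1):
        for j in range(n + 1):
            v = (v_max[0] * i / n, v_max[1] * j / n)
            used = resource_use[0] * v[0] + resource_use[1] * v[1]
            if used > supply + 1e-9:  # skip infeasible grid points
                continue
            growth = b[0] * v[0] + b[1] * v[1]
            cost = (v_max[0] - v[0]) ** 2 + (v_max[1] - v[1]) ** 2
            score = (1 - alpha) * growth - alpha * cost
            if best_score is None or score > best_score:
                best_score, best = score, (v, growth, cost)
    return best


b = [0.5, 0.5]             # relative abundances (illustrative)
v_max = [1.5, 1.0]         # individual maximal growth rates (illustrative)
resource_use = [0.5, 1.0]  # shared resource needed per unit of growth
supply = 1.0               # available amount of the shared resource

# Scan a few tradeoff values to trace out points on the Pareto front
front = {a: scalarized_optimum(a, b, v_max, resource_use, supply)
         for a in (0.0, 0.5, 1.0)}
for alpha, (v, growth, cost) in front.items():
    print(f"alpha={alpha}: v={v}, community growth={growth}, cost={cost}")
```

With $\alpha = 0$ the search favours the taxon that uses the shared resource most efficiently, and as $\alpha$ approaches 1 growth is shifted toward the other taxon at the expense of community growth. For real models one would use a proper solver rather than a grid.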