Shen-Orr, S. S., et al. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64-8. http://dx.doi.org/10.1038/ng881
Alon, U. 2007. Network motifs: theory and experimental approaches. Nat Rev Genet, 8(6):450–61. http://dx.doi.org/10.1038/nrg2102.
To gain some intuition for how systems biologists build mathematical models of gene networks we're going to use computer simulations to explore the dynamical behavior of simple transcriptional networks.
In each of our simulations we will keep track of the the concentration of a different genes of interest as they change over time. The basic approach we will use to calculate changes in the quantity of different molecules are differential equations, which are simply a way of describing the instanteous change in a quantity of interest.
All of our differential equations will be of this form:
\begin{eqnarray*} \frac{dY}{dt} = \mbox{rate of production} - \mbox{rate of decay} \end{eqnarray*}To state this in words -- the amount of gene $Y$ changes over time is a function of two things: 1) a growth term which represents the rate at which the gene is being transcribed and translated; and 2) a decay term which gives the rate at which $Y$ trascsripts and protein are being degraded.
In general we will assume that the "rate of production" is a function of the concentration of the genes that regulate $Y$(i.e. it's inputs in the transcriptional network), while the "rate of decay" is a proportional to the amount of $Y$ that is present. So the above formula will take the following structure:
$$ \frac{dY}{dt} = f(X_1, X_2, \ldots) - \alpha Y $$The $f(X_1, X_2, \ldots)$ term represents the growth term and is a function of the transcription factors that regulate $Y$. The term, $\alpha Y$ represents the rate at which $Y$ is being broken down or diluted. Notice that the decay rate is a proportional to the amount of $Y$ that is present.
If $\frac{dy}{dt}$ is positive than the concentration of gene $Y$ is increasing, if $\frac{dy}{dt}$ is negative the concentration of $Y$ is decreasing, and if $\frac{dy}{dt} = 0$ than $Y$ is at steady state.
An appropriate approach for modeling the rate of production of a protein, $Y$, as a function of it's inputs, $X_1, X_2,..$, is a with the "Hill Function". The Hill Function for a single transcriptional activator is:
$$ f(X) = \frac{\beta X^n}{K^n + X^n} $$$X$ represents the concentration of a transcriptional activator and $f(X)$ represents the the combined transcription and translation of the gene $Y$ that is regulated by $X$.
In [ ]:
# import statements to make numeric and plotting functions available
%matplotlib inline
from numpy import *
from matplotlib.pyplot import *
In [ ]:
def hill_activating(X, B, K, n):
""" Hill function for an activator"""
return (B * X**n)/(K**n + X**n)
In [ ]:
## generate a plot using the hill_activating function defined above
# setup paramters for our simulation
# CHANGE THESE VALUES TO EXPLORE THE EFFECT OF THESE PARAMTERS
# (see questions below)
n = 1
B = 5
K = 10
# generate a range of x values, representing a range of concentrations of our
# transcription factor X
x = linspace(0,30,200) # generate 200 evenly spaced points between 0 and 30
# calculating corresponding rates of production of Y
y = hill_activating(x, B, K, n)
# plot the hill fxn with the user set parameters
plot(x, y, label='B = {}, K = {}, n={}\n(user specified)'.format(B, K, n))
# plot the hill fxn with a set of reference parameters to facilitate comparison
plot(x, hill_activating(x, 5, 10, 1), label='B = 5, K = 10, n=1\n(reference)', alpha=0.75)
xlabel('Concentration of X')
ylabel('Rate of production of Y')
legend(loc='best')
pass # suppress further output
If rather than stimulating the production of $Y$, $X$ "represses" $Y$, we can write the corresponding Hill function as:
$$ f(X) = \frac{\beta}{1 + (X/K)^n} $$Remember that both of these Hill functions (activating and repressing) describe the production of $Y$ as a function of the levels of $X$, not the temporal dynamics of $Y$ which we'll look at after developing a few more ideas.
In [ ]:
## Python implementation of repressive Hill function
def hill_repressing(X, B, K, n):
return B/(1.0 + (X/K)**n)
In [ ]:
## generate a plot using the hill_repressing function defined above
# CHANGE THESE VALUES TO EXPLORE THE EFFECT OF THESE PARAMTERS
# (see questions below)
n = 1
B = 5
K = 10
# generate a range of x values, representing a range of concentrations of our
# transcription factor X
x = linspace(0,30,200) # generate 200 evenly spaced points between 0 and 30
# calculating corresponding rates of production of Y
y = hill_repressing(x, B, K, n)
# plot the hill fxn with the user set parameters
plot(x, y, label='B = {}, K = {}, n={}\n(user specified)'.format(B, K, n))
# plot the hill fxn with a set of reference parameters to facilitate comparison
plot(x, hill_repressing(x, 5, 10, 1), label='B = 5, K = 10, n=1\n(reference)')
xlabel('Concentration of X')
ylabel('Rate of production of Y')
legend(loc='best')
pass # suppress further output
As before change the values of the simulation to answer these questions
To simplify analysis it's often convenient to approximate step-like sigmoidal functions like those produced by the Hill equation with functions using logic approximations.
We'll assume that when the transcription factor, $X$, is above a threshold, $K$, then gene $Y$ is transcribed at a rate, $\beta$. When $X$ is below the threshold, $K$, gene $Y$ is not be transcribed. To represent this situation, we can rewrite the formula for $Y$ as:
$$ f(X) = \beta\ \Theta(X > K) $$where the function $\Theta = 0$ if the statement inside the parentheses is false and $\Theta = 1$ if the statement is true. An alternate way to write this is: $$ f(X) = \begin{cases} 0, &\text{if $X > K$;} \\ \beta, &\text{otherwise.} \end{cases} $$
When $X$ is a repressor we can write:
$$ f(X) = \beta\ \Theta(X < K) $$
In [ ]:
def logic_activating(X, B, K):
if X > K:
theta = 1
else:
theta = 0
return B*theta
def logic_repressing(X, B, K):
if X < K:
theta = 1
else:
theta = 0
return B*theta
And now we can generate some plot to compare the logic approximation to the Hill function, for the activating case:
In [ ]:
## generate plots using your hill_activating and logic_activating functions defined above
## For X values range from 0 to 30
# CHANGE THESE VALUES TO EXPLORE THE EFFECT OF THESE PARAMTERS
n = 4
B = 5
K = 10
x = linspace(0, 30, 200)
plot(x, hill_activating(x, B, K, n),
label='B = {}, K = {}, n={}'.format(B, K, n))
logicx = [logic_activating(i, B, K) for i in x]
plot(x, logicx, label='logic approximation')
xlabel('Concentration of X')
ylabel('Rate of production of Y')
legend(loc='best')
ylim(-0.5, B*1.1)
pass
As before change the values of the simulation to answer these questions
What if a gene needs two or more activator proteins to be transcribed? We can describe the amount of $Z$ transcribed as a function of active forms of $X$ and $Y$ with a function like:
$$ f(X,Y) = \beta\ \Theta(X > K_x \land Y > K_y) $$The above equation describes "AND" logic (i.e. both X and Y have to be above their threshold levels, $K_x$ and $K_y$, for Z to be transcribed). In a similar manner we can define "OR" logic:
$$ f(X,Y) = \beta\ \Theta(X > K_x \lor Y > K_y) $$A SUM function would be defined like this:
$$ f(X,Y) = \beta_x \Theta(X > K_x) + \beta_y \Theta (Y > K_y) $$Up until this point we've been considering how the rate of production of a protein $Y$ changes with the concentration of a transcriptional activator/repressor that regulates $Y$. Now we want to turn to the question of how the absolute amount of $Y$ changes over time.
As we discussed at the beginning of this notebook, how the amount of $Y$ changes over time is a function of two things: 1) a growth term which represents the rate of production of $Y$; and 2) a decay term which gives the rate at which $Y$ is degraded. A differential equation describing this as follows:
$$ \frac{dY}{dt} = f(X_1, X_2, \ldots) - \alpha Y $$The $f(X_1, X_2, \ldots)$ term represents the growth term and is a function of the transcription factors that regulate $Y$. We've already seen a couple of ways to model the rate of producting -- using the Hill function or its logic approximation. For the sake of simplicity we'll use the logic approximation to model the growth term. For example, in the case $Y$ is regulated by a single input we might use $f(X) = \beta \theta(X > K_1)$. For the equivalent function where $Y$ was regulated by two transcription factor, $X_1$ and $X_2$, and both are required to be above the respective threshold, we could use the function $f(X_1, X_2) = \beta \theta (X_1 > K_1 \land X_2 > K_2)$.
The second term, $\alpha Y$ represents the rate at which $Y$ is being broken down or diluted. Notice that the decay rate is a proportional to the amount of $Y$ that is present.
Now let's explore a simple model of regulation for the two gene network, $X \longrightarrow Y$. Here we assume that at time 0 the activator, $X$, rises above the threshold, $K$, necessary to induce transcription of $Y$ at the rate $\beta$. $X$ remains above this threshold for the entire simulation. Therefore, we can write $dY/dt$ as:
$$ \frac{dY}{dt} = \beta - \alpha Y $$Write a Python function to represent the change in $Y$ in a given time increment, under this assumption of constant activation:
In [ ]:
## write a function to represent the simple differential equation above
def dYdt(B, K, a, X, Y):
production = logic_activating(X, B, K)
decay = a*Y
return production - decay
In [ ]:
## generate a plot of conc of Y over time using your dY function defined above
## Evaluated over 200 time units
B = 5
K = 10
X = K + 1
Y = [0] # initial value of Y
a = 0.05
nsteps = 200
for i in range(nsteps):
yprevious = Y[-1]
deltay = dYdt(B, K, a, X, yprevious)
ynew = Y[-1] + deltay
Y.append(ynew)
plot(Y)
xlabel('Time units')
ylabel('Concentration of Y')
ylim(0, (B/a)*1.1)
pass
The concentration of $Y$ eventually reaches a steady state, $Y_{st}$. How does $Y_{st}$ relate to $\beta$ and $\alpha$?
The response time of a dynamical system, $T_{1/2}$is defined as the time it takes for it to go half-way between it's initial and final value.
Estimate the response time as you vary the parameter $\alpha$ and see if you can create a plot (in Python, or R, or Excel) showing the relationship between $\alpha$ and response time, for $0.01 \leq \alpha \leq 0.1$.
In [ ]:
B = 5
K = 10
a = 0.05
# setup pulse of X
# off (0) for first 50 steps, on for next 100 steps, off again for last 100 steps
X = [0]*50 + [3*K]*100 + [0]*100
Y = [0] # initial value of Y
nsteps = 250
for i in range(1, nsteps):
xnow = X[i]
yprevious = Y[-1]
deltay = dYdt(B, K, a, xnow, yprevious)
ynew = yprevious + deltay
Y.append(ynew)
plot(X, color='red', linestyle='dashed', label="X")
plot(Y, color='blue', label="Y")
ylim(0, max(max(X)*1.1, (B/a)*1.1))
xlabel('Time units')
ylabel('Concentration')
legend(loc="best")
pass
We're now going to use some of these tools to look at a class of network motifs (small network topologies), called Feed Forward Loops (FFLs), found in signaling and regulatory networks. FFLs involve interactions between three components, with the basic topology illustrated below. Depending on the signs of the edges (whether activating or repressing) we can classify FFLs as "coherent" or "incoherent." We'll take a look at an example of each class.
The most common type of coherent FFL is illustrated in the figure below. In this system $X$ is an activator of $Y$ and both $X$ and $Y$ regulate the production of $Z$ with AND logic (i.e. both $X$ and $Y$ must be above particular thresholds in order to trigger the production of $Z$).
In [ ]:
## We'll specify the behavior of X as a series of pulse of different length
## so we'll define a function to generate pulses
def pulse(ontime, offtime, ntimes, onval=1):
if ontime >= offtime:
raise Exception("Invalid on/off times.")
signal = np.zeros(ntimes)
signal[ontime:offtime] = onval
return signal
In [ ]:
nsteps = 150
short_pulse = pulse(20, 23, nsteps) # 5 sec pulse
long_pulse = pulse(50, 100, nsteps) # 50 sec pulse
X = short_pulse + long_pulse # we can then add the pulses to create
# a single time trace
plot(X, color='black')
xlabel('Time units')
ylabel('Amount of Gene Product')
ylim(0, 1.5)
pass
In [ ]:
def dYdt(B, K, a, X, Y):
if X > K:
theta = 1
else:
theta = 0
return B * theta - a * Y
def dZdt(B, Kx, Ky, a, X, Y, Z):
theta = 0
if (X > Kx) and (Y > Ky):
theta = 1
return B * theta - a * Z
In [ ]:
## Plot X, Y, and Z on the same time scale
nsteps = 150
short_pulse = pulse(20, 23, nsteps) # 5 sec pulse
long_pulse = pulse(50, 100, nsteps) # 50 sec pulse
X = short_pulse + long_pulse
# setup parameters for Y and Z
Y = [0]
betay, alphay = 0.2, 0.1
Kxy = 0.5
Z = [0]
betaz, alphaz = 0.2, 0.1
Kxz = 0.5
Kyz = 1
for i in range(nsteps):
xnow = X[i]
ynow, znow = Y[-1], Z[-1]
ynew = ynow + dYdt(betay, Kxy, alphay, xnow, ynow)
znew = znow + dZdt(betaz, Kxz, Kyz, alphaz, xnow, ynow, znow)
Y.append(ynew)
Z.append(znew)
plot(X, 'k--', label='X', linewidth=1.5)
plot(Y, 'b', label='Y')
plot(Z, 'r', label='Z')
ylim(-0.1, 2.5)
xlabel("Time")
ylabel("Concentration")
legend()
pass
How do the dynamics of $Y$ and $Z$ differ in the simulation above?
Try varying the length of the first (short) pulse. How does changing the length of the pulse affect the dynamics of $Y$ and $Z$?
In [ ]:
nsteps = 150
p1start = 10
p1duration = 5
p2start = 50
p2duration = 50
short_pulse = pulse(p1start, p1start + p1duration, nsteps) # short pulse
long_pulse = pulse(p2start, p2start + p2duration, nsteps) # long pulse
X = short_pulse + long_pulse
# change this `scale` argument to increase/decrease noise
noise = np.random.normal(loc=0, scale=0.2, size=nsteps) # mean=0, sd=0.2
X = X + noise
# setup parameters for Y and Z
Y = [0]
betay, alphay = 0.2, 0.1
Kxy = 0.5
Z = [0]
betaz, alphaz = 0.2, 0.1
Kxz = 0.5
Kyz = 1
for i in range(nsteps):
xnow = X[i]
ynow, znow = Y[-1], Z[-1]
ynew = ynow + dYdt(betay, Kxy, alphay, xnow, ynow)
znew = znow + dZdt(betaz, Kxz, Kyz, alphaz, xnow, ynow, znow)
Y.append(ynew)
Z.append(znew)
# draw each trace as a subfigure
# subfigures stacked in a vertical grid
subplot2grid((3,1),(0,0))
plot(X, 'k', label='X', linewidth=1)
legend()
subplot2grid((3,1),(1,0))
plot(Y, 'b', label='Y', linewidth=2)
legend()
subplot2grid((3,1),(2,0))
plot(Z, 'r', label='Z', linewidth=2)
vlines(p1start, min(Z),max(Z)*1.1,color='black',linestyle='dashed')
annotate("pulse 1 on", xy=(p1start,1),xytext=(40,20),
textcoords='offset points',
horizontalalignment="center",
verticalalignment="bottom",
arrowprops=dict(arrowstyle="->",color='black',
connectionstyle='arc3,rad=0.5',
linewidth=1))
vlines(p2start, min(Z),max(Z)*1.1,color='black',linestyle='dashed')
annotate("pulse 2 on", xy=(p2start,1),xytext=(-40,0),
textcoords='offset points',
horizontalalignment="center",
verticalalignment="bottom",
arrowprops=dict(arrowstyle="->",color='black',
connectionstyle='arc3,rad=0.5',
linewidth=1))
legend()
pass
In the code cell above, try changing the duration of the first pulse and the scale
of the noise (see comments in code) to get a sense of how good a filter the FFL is. Is there a bias to the filtering with respect to Z turning on versus Z turning off?
As before we can solve for Y as a function of time and calculate what its steady state value will be:
$$ Y(t) = Y_{st}(1-e^{-\alpha_{y}t}) $$and
$$ Y_{st}=\frac{\beta_y}{\alpha_y} $$Since $Z$ is governed by an AND function it needs both $X$ and $Y$ to be above their respective thresholds, $K_{xz}$ and $K_{yz}$. For the sake of simplicity let's assume that both $Y$ and $Z$ have the same threshold with respect to $X$, i.e. $K_{xy} = K_{xz}$. This allows us just to consider how long it takes for $Y$ to reach the threshold value $K_{yz}$. Given this we can calculate the delay before $Z$ turns on, $T_{\mathrm{on}}$ as follows.
$$ Y(T_{\mathrm{on}}) = Y_{st}(1-e^{-\alpha_y T_{\mathrm{on}}}) = K_{yz} $$and solving for $T_{\mathrm{on}}$ we find:
$$ T_{\mathrm{on}} = \frac{1}{\alpha_y} \log\left[\frac{1}{(1-K_{yz}/Y_{st})}\right] $$Thus we see that the delay before $Z$ turns on is a function of the degradation rate of $Y$ and the ratio between $Y_{st}$ and $K_{yz}$.
From the above formula, we see that there are two parameters that affect the turn-on time of $Z$ -- $\alpha_y$ (the scaling factor for the decay rate of $Y$) and the compound parameter $K_{yz}/Y_{st}$ (the threshold concentration where $Y$ activate $Z$ relative to the steady state of $Y$). To explore the two-dimensional parameter space of $Z's$ $T_on$ we can create a contour plot.
In [ ]:
def Ton(alpha, KYratio):
return (1.0/alpha) * log(1.0/(1.0-KYratio))
## Create a contour plot for a range of alpha and Kyz/Yst
x = alpha = linspace(0.01, 0.2, 100)
y = KYratio = linspace(0.01, 0.99, 100)
X,Y = meshgrid(x, y)
Z = Ton(X,Y)
levels = MaxNLocator(nbins=20).tick_values(Z.min(), Z.max())
im = contourf(X,Y,Z, cmap=cm.afmhot_r, levels=levels)
contour(X, Y, Z, levels,
colors=('k',),
linewidths=(0.5,))
colorbar(im)
xlabel('alpha')
ylabel("Kyz/Yst")
pass
As discussed in the article by Shen-Orr et al. a feed forward loop of the type we've just discussed can act as a type of filter -- a sign-sensitive delay that keeps $Z$ from firing in response to transient noisy signals from $X$, but shuts down $Z$ immediately once the signal from $X$ is removed.
Consider the FFL illustrated in the figure below.
In this incoherent FFL, the logic function that regulates $Z$ is "X AND NOT Y". That is $Z$ turns on once $X$ is above a given threshold, but only stays on fully as long as $Y$ is below another threshold. Again for simplicity we assume $K_{xy} = K_{yz}$.
To describe $Z$ we consider two phases - 1) while $Y < K_{yz}$ and 2) while $Y > K_{yz}$.
For the first phase:
$$ \frac{dZ}{dt} = \beta_z\ \Theta(X > K_{xz}) - \alpha_{z}Z $$and
$$ Z(t) = Z_{m}(1-e^{-\alpha_{z}t}) $$As we did in the case of the coherent FFL, we can calculate the time until $Y$ reaches the treshold $K_{yz}$. We'll call this $T_{\mathrm{rep}}$ and it is the same formula we found for $T_{\mathrm{on}}$ previously.
$$ T_{\mathrm{rep}} = \frac{1}{\alpha_y \log[\frac{1}{1-K_{yz}/Y_{st}}]} $$After a delay, $T_{\mathrm{rep}}$, $Y$ starts to repress the transcription of $Z$ and $Z$ decays to a new lower steady state, $Z_{st} = \beta_{z}^{'}/\alpha$. The value of $\beta_{z}^{'}$ depends on how leaky the repression of $Z$ is by $Y$.
The dynamics of $Z$ in Phase 2 is given by:
$$ Z(t) = Z_{st} + (Z_0 - Z_{st})e^{-\alpha_{z}(t-T_{\mathrm{rep}})} $$where $$ Z_0 = Z_{m}(1-e^{-\alpha_{z}T_{\mathrm{rep}}}) $$
In [ ]:
## A Python function that represents dZ/dt for the Incoherent FFL
## our dY function previously defined stays the same
def dZ_incoh(B1,B2,Kx,Ky,a,X,Y,Z):
pass # define the function here
def dZ_incoh(B1,B2,Kx,Ky,a,X,Y,Z):
theta = 0
B = 0
if (X > Kx) and (Y < Ky):
theta = 1
B = B1
elif (X > Kx) and (Y >= Ky):
theta = 1
B = B2
return B * theta - a * Z
In [ ]:
nsteps = 150
short_pulse = pulse(20, 25, nsteps) # 5 sec pulse
long_pulse = pulse(50, 100, nsteps) # 50 sec pulse
X = short_pulse + long_pulse
# setup parameters for Y and Z
Y = [0]
betay, alphay = 0.2, 0.1
Kxy = 0.5
Z = [0]
betaz1, betaz2 = 0.2, 0.001
alphaz = 0.1
Kxz = 0.5
Kyz = 0.5
for i in range(nsteps):
xnow = X[i]
ynow, znow = Y[-1], Z[-1]
ynew = ynow + dYdt(betay, Kxy, alphay, xnow, ynow)
znew = znow + dZ_incoh(betaz1, betaz2, Kxz, Kyz, alphaz, xnow, ynow, znow)
Y.append(ynew)
Z.append(znew)
# draw each trace as a subfigure
# subfigures stacked in a vertical grid
subplot2grid((3,1),(0,0))
plot(X, 'k', label='X', linewidth=1)
legend()
ylim(0,1.1)
subplot2grid((3,1),(1,0))
plot(Y, 'b', label='Y', linewidth=2)
legend()
ylim(0,2.1)
subplot2grid((3,1),(2,0))
plot(Z, 'r', label='Z', linewidth=2)
legend()
ylim(0,0.7)
pass
Note that the stimulus amount of $Z$ in the system initially increases, but then decreases to a lower steady even when the initial stimulus persists. This system thus generates pulse-like dynamics to a persistent signal. How pulse-like the signal is depends on the ratio of $\beta_z$ to $\beta_{z}^{'}$. We define the repression factor, $F$, as follows:
$$ F = \frac{\beta_z}{\beta_{z}^{'}} = \frac{Z_m}{Z_{st}} $$Alon (2007) illustrates the possible dynamics of several FFLs coupled together (see figure below):
Answer the questions below with short-answers and/or figures where helpful.
1) In the network above (left subfigure), there are several coupled Feed-Forward Loops.
2) Alon writes "The FFLs in this network are combined in a way that utilizes their delay and pulse-generating features to generate a temporal programme of gene expression." This is illustrated in the figure on the right. For the questions below, assume there is a constant level of the transcription factor $X_1$.
3) Gene knockouts
4) Additional regulation
In [ ]: