Causal discovery with TIGRAMITE

TIGRAMITE is a Python module for time series analysis. It allows reconstructing graphical models (conditional independence graphs) from discrete or continuously valued time series based on the PCMCI method and creating high-quality plots of the results.

PCMCI is described here: J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, D. Sejdinovic, Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv. 5, eaau4996 (2019) https://advances.sciencemag.org/content/5/11/eaau4996

This tutorial explains the LinearMediation class. See the following paper for theoretical background: Runge, Jakob, Vladimir Petoukhov, Jonathan F. Donges, Jaroslav Hlinka, Nikola Jajcay, Martin Vejmelka, David Hartman, Norbert Marwan, Milan Paluš, and Jürgen Kurths. 2015. "Identifying Causal Gateways and Mediators in Complex Spatio-Temporal Systems." Nature Communications 6: 8502. https://doi.org/10.1038/ncomms9502.

Last, the following Nature Communications Perspective paper provides an overview of causal inference methods in general, identifies promising applications, and discusses methodological challenges (exemplified in Earth system sciences): https://www.nature.com/articles/s41467-019-10105-3


In [2]:
# Imports
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline     
## use `%matplotlib notebook` for interactive figures
# plt.style.use('ggplot')
import sklearn

import tigramite
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr, GPDC, CMIknn, CMIsymb
from tigramite.models import LinearMediation, Prediction

Causal effects and mediation

The preceding sections were concerned with estimating causal links. In this section we discuss how the estimated time series graph can be used to evaluate causal effects and causal mediation in a linear framework, as discussed in more detail in Runge et al., Nature Communications (2015). Consider the following model of a causal chain $X \to Y \to Z$ in which each variable is also autocorrelated:

\begin{align*} X_t &= 0.8 X_{t-1} + \eta^X_t \\ Y_t &= 0.8 Y_{t-1} + 0.5 X_{t-1} + \eta^Y_t \\ Z_t &= 0.8 Z_{t-1} + 0.5 Y_{t-1} + \eta^Z_t \end{align*}

In [3]:
np.random.seed(42)
# links_coeffs = {0: [],
#                 1: [((0, -1), 0.5)],
#                 2: [((1, -1), 0.5)],
#                 }
links_coeffs = {0: [((0,-1), 0.8)],
                1: [((1,-1), 0.8), ((0, -1), 0.5)],
                2: [((2,-1), 0.8), ((1, -1), 0.5)],
                }
var_names = [r"$X$", r"$Y$", r"$Z$"]
    
data, true_parents = pp.var_process(links_coeffs, T=1000)

# Initialize dataframe object, specify time axis and variable names
dataframe = pp.DataFrame(data, 
                         var_names=var_names)
med = LinearMediation(dataframe=dataframe)
med.fit_model(all_parents=true_parents, tau_max=4)

We fit the linear mediation model based on the true parents here; in practice, these would be estimated with PCMCI. If the assumption of a linear model is justified and causal sufficiency is fulfilled, the link coefficient of $X_{t-\tau}\to Y_t$ estimated from standardized time series (the default in the LinearMediation class) corresponds to the change in the expected value of $Y_t$ (in units of its standard deviation) caused by a perturbation of one standard deviation in $X_{t-\tau}$. Let's check the link coefficient of $X_{t-2}\to Z_t$:


In [4]:
print ("Link coefficient (0, -2) --> 2: ", med.get_coeff(i=0, tau=-2, j=2))


Link coefficient (0, -2) --> 2:  0.0
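
As an aside, we can sanity-check the standardization described above on the direct link $X_{t-1}\to Y_t$: since both variables are standardized before fitting, the estimated coefficient should be close to $0.5\,\sigma_X/\sigma_Y$, where $0.5$ is the coefficient of the data-generating model. A minimal sketch (agreement only up to estimation error on this particular realization):


In [ ]:
# Sketch: compare the estimated standardized coefficient of X_{t-1} --> Y_t
# with the value implied by the data-generating coefficient 0.5 and the
# empirical standard deviations of X and Y.
print("Estimated standardized coefficient:", med.get_coeff(i=0, tau=-1, j=1))
print("Implied value 0.5*std(X)/std(Y):   ", 0.5 * data[:, 0].std() / data[:, 1].std())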

The link coefficient is non-zero only for direct links, which is why the value above is zero. The causal effect also accounts for indirect effects and is computed by summing the products of link coefficients along all causal paths between the two variables. For example, here the causal effect of $X_{t-2}\to Z_t$ is


In [5]:
print ("Causal effect (0, -2) --> 2: ", med.get_ce(i=0, tau=-2, j=2))


Causal effect (0, -2) --> 2:  0.06386220811405569
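
Since the only causal path from $X_{t-2}$ to $Z_t$ runs through $Y_{t-1}$, this value can be checked by hand as the product of the two link coefficients along that path. A minimal sketch:


In [ ]:
# Sketch of a manual check: multiply the link coefficients along the single
# causal path X_{t-2} --> Y_{t-1} --> Z_t and compare with get_ce.
print("Product of link coefficients:", med.get_coeff(i=0, tau=-1, j=1) * med.get_coeff(i=1, tau=-1, j=2))
print("Causal effect (0, -2) --> 2: ", med.get_ce(i=0, tau=-2, j=2))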

The mediated causal effect (MCE) quantifies the contribution of an intermediate variable to the causal effect. For example, let's look at the contribution of $Y$ to the causal effect of $X_{t-2}\to Z_t$:


In [6]:
print ("Mediated Causal effect (0, -2) --> 2 through 1: ", med.get_mce(i=0, tau=-2, j=2, k=1))


Mediated Causal effect (0, -2) --> 2 through 1:  0.06386220811405569

Since $Y$ mediates all of the causal effect, the MCE equals the CE here. Mediation analysis is a powerful tool to quantify not only direct causal links, but also indirect pathways. It can answer questions such as "How important is one process for the causal mechanism between two others?". The tigramite.plotting module provides functions to visualize causal pathways, both in the aggregated network and in the time series graph. Here we look at all causal pathways between $X_{t-4}$ and $Z_t$:


In [7]:
i=0; tau=4; j=2
graph_data = med.get_mediation_graph_data(i=i, tau=tau, j=j)
tp.plot_mediation_time_series_graph(
    var_names=var_names,
    path_node_array=graph_data['path_node_array'],
    tsg_path_val_matrix=graph_data['tsg_path_val_matrix']
    )
tp.plot_mediation_graph(
                    var_names=var_names,
                    path_val_matrix=graph_data['path_val_matrix'], 
                    path_node_array=graph_data['path_node_array'],
                    )


In both plots, the node color depicts the mediated causal effect through each variable and the link colors depict the link coefficients. The aggregated network plot in the bottom panel is easier to read for more complex pathways, but it is harder to see how the pathways unfold across variables and in time.

Causal effect and mediation analysis can also be used for more aggregate measures. The Average Causal Effect (ACE) quantifies how strongly a single variable affects the whole system, and the Average Causal Susceptibility (ACS) quantifies how strongly a single variable is affected by perturbations entering elsewhere in the system. Finally, the Average Mediated Causal Effect (AMCE) quantifies how much a single variable mediates causal effects between any two other processes. In Runge et al., Nature Communications (2015), these measures are compared with conventional complex network measures to show that the causal effect measures are more interpretable alternatives. For example, betweenness centrality counts the shortest paths passing through a particular node, but causal effects do not necessarily travel along shortest paths, and betweenness does not properly account for the causal effect weights.


In [8]:
print ("Average Causal Effect X=%.2f, Y=%.2f, Z=%.2f " % tuple(med.get_all_ace()))
print ("Average Causal Susceptibility X=%.2f, Y=%.2f, Z=%.2f " % tuple(med.get_all_acs()))
print ("Average Mediated Causal Effect X=%.2f, Y=%.2f, Z=%.2f " % tuple(med.get_all_amce()))


Average Causal Effect X=0.39, Y=0.24, Z=0.00 
Average Causal Susceptibility X=0.00, Y=0.27, Z=0.36 
Average Mediated Causal Effect X=0.00, Y=0.12, Z=0.00 
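
To build intuition for these aggregate measures, it can help to look at the individual pairwise causal effects that they summarize. A rough sketch, shown here only at a fixed lag of 2 (the aggregate measures combine such pairwise effects across lags and variable pairs; see Runge et al. 2015 for the exact definitions):


In [ ]:
# Rough sketch: pairwise causal effects at lag 2 between all distinct
# variable pairs. The aggregate measures above summarize such pairwise
# effects across lags and pairs (see Runge et al. 2015 for definitions).
for i in range(3):
    for j in range(3):
        if i != j:
            print("Causal effect (%d, -2) --> %d: %.3f" % (i, j, med.get_ce(i=i, tau=-2, j=j)))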

Note that, by default, self-loops are excluded from these measures.

