Causal discovery with TIGRAMITE

TIGRAMITE is a time series analysis Python module. It allows one to reconstruct graphical models (conditional independence graphs) from discrete or continuous-valued time series based on the PCMCI method, and to create high-quality plots of the results.

PCMCI is described here: J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, D. Sejdinovic, Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv. 5, eaau4996 (2019) https://advances.sciencemag.org/content/5/11/eaau4996

This tutorial explains the causal assumptions and gives walk-through examples. See the following paper for theoretical background: Runge, Jakob. 2018. “Causal Network Reconstruction from Time Series: From Theoretical Assumptions to Practical Estimation.” Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (7): 075310.

Finally, the following Nature Communications Perspective paper provides an overview of causal inference methods in general, identifies promising applications, and discusses methodological challenges (exemplified in Earth system sciences): https://www.nature.com/articles/s41467-019-10105-3


In [24]:
# Imports
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline     
## use `%matplotlib notebook` for interactive figures
# plt.style.use('ggplot')
import sklearn

import tigramite
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr, GPDC, CMIknn, CMIsymb
from tigramite.models import LinearMediation, Prediction

Causal assumptions

Having introduced the basic functionality, we now turn to a discussion of the assumptions underlying a causal interpretation:

  • Faithfulness / Stableness: Independencies in the data do not arise from coincidence, but from causal structure. Expressed differently: if two variables are independent given some subset of other variables, then they are not connected by a causal link in the graph.

  • Causal Sufficiency: Measured variables include all of the common causes.

  • Causal Markov Condition: All the relevant probabilistic information that can be obtained from the system is contained in its direct causes. Expressed differently: if two variables are not connected in the causal graph given some set of conditions (see Runge Chaos 2018 for further definitions), then they are conditionally independent.

  • No contemporaneous effects: There are no causal effects at lag zero.

  • Stationarity: The causal links and their dependencies do not change over time.

  • Parametric assumptions of the independence tests (these were already discussed in the basic tutorial)

Faithfulness

Faithfulness, as stated above, expresses the assumption that the independencies we measure come from the causal structure, i.e., the time series graph, and cannot occur due to some fine-tuning of the parameters. Another unfaithful case is a process containing purely deterministic dependencies, i.e., $Y=f(X)$ without any noise. We illustrate these cases in the following.

Fine tuning

Suppose we have two ways in which $X^0$ causes $X^2$: a direct one, and an indirect effect $X^0\to X^1 \to X^2$, as realized in the following model:

\begin{align*} X^0_t &= \eta^0_t\\ X^1_t &= 0.6 X^0_{t-1} + \eta^1_t\\ X^2_t &= 0.6 X^1_{t-1} - 0.36 X^0_{t-2} + \eta^2_t\\ \end{align*}

In [2]:
np.random.seed(1)
data = np.random.randn(500, 3)
for t in range(1, 500):
#     data[t, 0] += 0.6*data[t-1, 1]
    data[t, 1] += 0.6*data[t-1, 0]
    data[t, 2] += 0.6*data[t-1, 1] - 0.36*data[t-2, 0]
    
var_names = [r'$X^0$', r'$X^1$', r'$X^2$', r'$X^3$']  # the 4th name is reused (as 'Sun') in a later example
dataframe = pp.DataFrame(data, var_names=var_names)
# tp.plot_timeseries(dataframe)

Since here $X^2_t = 0.6 X^1_{t-1} - 0.36 X^0_{t-2} + \eta^2_t = 0.6 (0.6 X^0_{t-2} + \eta^1_{t-1}) - 0.36 X^0_{t-2} + \eta^2_t = 0.36 X^0_{t-2} - 0.36 X^0_{t-2} + \ldots$, the direct and indirect effects cancel exactly: there is no unconditional dependency between $X^0_{t-2}$ and $X^2_t$, and the link is not detected in the condition-selection step:


In [3]:
parcorr = ParCorr()
pcmci_parcorr = PCMCI(
    dataframe=dataframe, 
    cond_ind_test=parcorr,
    verbosity=1)
all_parents = pcmci_parcorr.run_pc_stable(tau_max=2, pc_alpha=0.2)


##
## Running Tigramite PC algorithm
##

Parameters:
independence test = par_corr
tau_min = 1
tau_max = 2
pc_alpha = 0.2
max_conds_dim = None
max_combinations = 1



## Variable $X^0$

## Variable $X^1$

## Variable $X^2$

## Resulting condition sets:

    Variable $X^0$ has 0 parent(s):

    Variable $X^1$ has 1 parent(s):
        ($X^0$ -1): max_pval = 0.00000, min_val = 0.534

    Variable $X^2$ has 1 parent(s):
        ($X^1$ -1): max_pval = 0.00000, min_val = 0.434

However, since the other parent of $X^2$, namely $X^1_{t-1}$, is detected, the MCI step conditions on $X^1_{t-1}$ and can reveal the true underlying graph (in this particular case):


In [4]:
results = pcmci_parcorr.run_pcmci(tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)


##
## Running Tigramite PC algorithm
##

Parameters:
independence test = par_corr
tau_min = 1
tau_max = 2
pc_alpha = 0.2
max_conds_dim = None
max_combinations = 1



## Variable $X^0$

## Variable $X^1$

## Variable $X^2$

## Resulting condition sets:

    Variable $X^0$ has 0 parent(s):

    Variable $X^1$ has 1 parent(s):
        ($X^0$ -1): max_pval = 0.00000, min_val = 0.534

    Variable $X^2$ has 1 parent(s):
        ($X^1$ -1): max_pval = 0.00000, min_val = 0.434

##
## Running Tigramite MCI algorithm
##

Parameters:

independence test = par_corr
tau_min = 0
tau_max = 2
max_conds_py = None
max_conds_px = None

## Significant links at alpha = 0.05:

    Variable $X^0$ has 0 link(s):

    Variable $X^1$ has 1 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.536 | conf = (0.000, 0.000)

    Variable $X^2$ has 2 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.543 | conf = (0.000, 0.000)
        ($X^0$ -2): pval = 0.00000 | val = -0.365 | conf = (0.000, 0.000)

## Significant links at alpha = 0.01:

    Variable $X^0$ has 0 link(s):

    Variable $X^1$ has 1 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.536

    Variable $X^2$ has 2 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.543
        ($X^0$ -2): pval = 0.00000 | val = -0.365

Note, however, that such a recovery is not guaranteed: cancellation, even though a pathological case, can present a problem, especially for smaller sample sizes.
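
As a quick numerical check of the cancellation (a minimal sketch in plain numpy on the data generated above, not using tigramite): the unconditional correlation between $X^0_{t-2}$ and $X^2_t$ is close to zero, while the partial correlation given $X^1_{t-1}$ recovers the direct negative link.

# Sanity check of the cancellation (plain numpy, data from the model above)
x0_past = data[:-2, 0]   # X^0_{t-2}
x1_mid  = data[1:-1, 1]  # X^1_{t-1}
x2_now  = data[2:, 2]    # X^2_t
print(np.corrcoef(x0_past, x2_now)[0, 1])  # ~ 0: direct and indirect paths cancel
# "Condition" on X^1_{t-1} by regressing it out of both variables
res_x0 = x0_past - np.polyval(np.polyfit(x1_mid, x0_past, 1), x1_mid)
res_x2 = x2_now - np.polyval(np.polyfit(x1_mid, x2_now, 1), x1_mid)
print(np.corrcoef(res_x0, res_x2)[0, 1])   # clearly negative: the direct link is revealed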

Deterministic dependencies

Another violation of faithfulness can happen due to purely deterministic dependencies, as shown here:


In [5]:
np.random.seed(1)
data = np.random.randn(500, 3)
for t in range(1, 500):
    data[t, 0] = 0.4*data[t-1, 1]
    data[t, 2] += 0.3*data[t-2, 1] + 0.7*data[t-1, 0]
dataframe = pp.DataFrame(data, var_names=var_names)
tp.plot_timeseries(dataframe)


Out[5]:
[Figure with 3 axes: time series plots of $X^0$, $X^1$, $X^2$]

In [7]:
parcorr = ParCorr()
pcmci_parcorr = PCMCI(
    dataframe=dataframe, 
    cond_ind_test=parcorr,
    verbosity=2)
results = pcmci_parcorr.run_pcmci(tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )


##
## Running Tigramite PC algorithm
##

Parameters:
independence test = par_corr
tau_min = 1
tau_max = 2
pc_alpha = 0.2
max_conds_dim = None
max_combinations = 1



## Variable $X^0$

Iterating through pc_alpha = [0.2]:

# pc_alpha = 0.2 (1/1):

Testing condition sets of dimension 0:

    Link ($X^0$ -1) --> $X^0$ (1/6):
    Combination 0:  --> pval = 0.14650 / val = -0.065
    No conditions of dimension 0 left.

    Link ($X^0$ -2) --> $X^0$ (2/6):
    Combination 0:  --> pval = 0.82012 / val = -0.010
    Non-significance detected.

    Link ($X^1$ -1) --> $X^0$ (3/6):
    Combination 0:  --> pval = 0.00000 / val = 1.000
    No conditions of dimension 0 left.

    Link ($X^1$ -2) --> $X^0$ (4/6):
    Combination 0:  --> pval = 0.14650 / val = -0.065
    No conditions of dimension 0 left.

    Link ($X^2$ -1) --> $X^0$ (5/6):
    Combination 0:  --> pval = 0.58756 / val = -0.024
    Non-significance detected.

    Link ($X^2$ -2) --> $X^0$ (6/6):
    Combination 0:  --> pval = 0.92794 / val = 0.004
    Non-significance detected.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^0$ has 3 parent(s):
        ($X^1$ -1): max_pval = 0.00000, min_val = 1.000
        ($X^0$ -1): max_pval = 0.14650, min_val = 0.065
        ($X^1$ -2): max_pval = 0.14650, min_val = 0.065

Testing condition sets of dimension 1:

    Link ($X^1$ -1) --> $X^0$ (1/3):
    Combination 0: ($X^0$ -1)  --> pval = 0.00000 / val = 1.000
    No conditions of dimension 1 left.

    Link ($X^0$ -1) --> $X^0$ (2/3):
    Combination 0: ($X^1$ -1)  --> pval = 0.79018 / val = 0.012
    Non-significance detected.

    Link ($X^1$ -2) --> $X^0$ (3/3):
    Combination 0: ($X^1$ -1)  --> pval = 0.79018 / val = 0.012
    Non-significance detected.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^0$ has 1 parent(s):
        ($X^1$ -1): max_pval = 0.00000, min_val = 1.000

Algorithm converged for variable $X^0$

## Variable $X^1$

Iterating through pc_alpha = [0.2]:

# pc_alpha = 0.2 (1/1):

Testing condition sets of dimension 0:

    Link ($X^0$ -1) --> $X^1$ (1/6):
    Combination 0:  --> pval = 0.76828 / val = -0.013
    Non-significance detected.

    Link ($X^0$ -2) --> $X^1$ (2/6):
    Combination 0:  --> pval = 0.90544 / val = -0.005
    Non-significance detected.

    Link ($X^1$ -1) --> $X^1$ (3/6):
    Combination 0:  --> pval = 0.16411 / val = -0.063
    No conditions of dimension 0 left.

    Link ($X^1$ -2) --> $X^1$ (4/6):
    Combination 0:  --> pval = 0.76828 / val = -0.013
    Non-significance detected.

    Link ($X^2$ -1) --> $X^1$ (5/6):
    Combination 0:  --> pval = 0.93898 / val = -0.003
    Non-significance detected.

    Link ($X^2$ -2) --> $X^1$ (6/6):
    Combination 0:  --> pval = 0.13766 / val = -0.067
    No conditions of dimension 0 left.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^1$ has 2 parent(s):
        ($X^2$ -2): max_pval = 0.13766, min_val = 0.067
        ($X^1$ -1): max_pval = 0.16411, min_val = 0.063

Testing condition sets of dimension 1:

    Link ($X^2$ -2) --> $X^1$ (1/2):
    Combination 0: ($X^1$ -1)  --> pval = 0.13879 / val = -0.067
    No conditions of dimension 1 left.

    Link ($X^1$ -1) --> $X^1$ (2/2):
    Combination 0: ($X^2$ -2)  --> pval = 0.16543 / val = -0.062
    No conditions of dimension 1 left.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^1$ has 2 parent(s):
        ($X^2$ -2): max_pval = 0.13879, min_val = 0.067
        ($X^1$ -1): max_pval = 0.16543, min_val = 0.062

Algorithm converged for variable $X^1$

## Variable $X^2$

Iterating through pc_alpha = [0.2]:

# pc_alpha = 0.2 (1/1):

Testing condition sets of dimension 0:

    Link ($X^0$ -1) --> $X^2$ (1/6):
    Combination 0:  --> pval = 0.00000 / val = 0.504
    No conditions of dimension 0 left.

    Link ($X^0$ -2) --> $X^2$ (2/6):
    Combination 0:  --> pval = 0.64311 / val = -0.021
    Non-significance detected.

    Link ($X^1$ -1) --> $X^2$ (3/6):
    Combination 0:  --> pval = 0.65325 / val = -0.020
    Non-significance detected.

    Link ($X^1$ -2) --> $X^2$ (4/6):
    Combination 0:  --> pval = 0.00000 / val = 0.504
    No conditions of dimension 0 left.

    Link ($X^2$ -1) --> $X^2$ (5/6):
    Combination 0:  --> pval = 0.61544 / val = -0.023
    Non-significance detected.

    Link ($X^2$ -2) --> $X^2$ (6/6):
    Combination 0:  --> pval = 0.45829 / val = 0.033
    Non-significance detected.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^2$ has 2 parent(s):
        ($X^0$ -1): max_pval = 0.00000, min_val = 0.504
        ($X^1$ -2): max_pval = 0.00000, min_val = 0.504

Testing condition sets of dimension 1:

    Link ($X^0$ -1) --> $X^2$ (1/2):
    Combination 0: ($X^1$ -2)  --> pval = 0.58589 / val = 0.025
    Non-significance detected.

    Link ($X^1$ -2) --> $X^2$ (2/2):
    Combination 0: ($X^0$ -1)  --> pval = 0.74355 / val = -0.015
    Non-significance detected.

    Sorting parents in decreasing order with 
    weight(i-tau->j) = min_{iterations} |I_{ij}(tau)| 

Updating parents:

    Variable $X^2$ has 0 parent(s):

Algorithm converged for variable $X^2$

## Resulting condition sets:

    Variable $X^0$ has 1 parent(s):
        ($X^1$ -1): max_pval = 0.00000, min_val = 1.000

    Variable $X^1$ has 2 parent(s):
        ($X^2$ -2): max_pval = 0.13879, min_val = 0.067
        ($X^1$ -1): max_pval = 0.16543, min_val = 0.062

    Variable $X^2$ has 0 parent(s):

##
## Running Tigramite MCI algorithm
##

Parameters:

independence test = par_corr
tau_min = 0
tau_max = 2
max_conds_py = None
max_conds_px = None

        link ($X^0$ -1) --> $X^0$ (1/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ($X^1$ -2) ]
        pval = 0.00000 | val = 0.708

        link ($X^0$ -2) --> $X^0$ (2/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ($X^1$ -3) ]
        pval = 0.00000 | val = 0.489

        link ($X^1$ 0) --> $X^0$ (3/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ($X^2$ -2) ($X^1$ -1) ]
        pval = 0.59467 | val = -0.024

        link ($X^1$ -1) --> $X^0$ (4/8):
        with conds_y = [ ]
        with conds_x = [ ($X^2$ -3) ($X^1$ -2) ]
        pval = 0.00000 | val = 1.000

        link ($X^1$ -2) --> $X^0$ (5/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ($X^2$ -4) ($X^1$ -3) ]
        pval = 0.56285 | val = -0.026

        link ($X^2$ 0) --> $X^0$ (6/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.58354 | val = 0.025

        link ($X^2$ -1) --> $X^0$ (7/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.98187 | val = 0.001

        link ($X^2$ -2) --> $X^0$ (8/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.70043 | val = -0.017

        link ($X^0$ 0) --> $X^1$ (1/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ($X^1$ -1) ]
        pval = 0.46324 | val = -0.033

        link ($X^0$ -1) --> $X^1$ (2/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ($X^1$ -2) ]
        pval = 0.99243 | val = 0.000

        link ($X^0$ -2) --> $X^1$ (3/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ($X^1$ -3) ]
        pval = 0.98942 | val = -0.001

        link ($X^1$ -1) --> $X^1$ (4/8):
        with conds_y = [ ($X^2$ -2) ]
        with conds_x = [ ($X^2$ -3) ($X^1$ -2) ]
        pval = 0.14761 | val = -0.065

        link ($X^1$ -2) --> $X^1$ (5/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ($X^2$ -4) ($X^1$ -3) ]
        pval = 0.66044 | val = -0.020

        link ($X^2$ 0) --> $X^1$ (6/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.58606 | val = -0.025

        link ($X^2$ -1) --> $X^1$ (7/8):
        with conds_y = [ ($X^2$ -2) ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.87857 | val = -0.007

        link ($X^2$ -2) --> $X^1$ (8/8):
        with conds_y = [ ($X^1$ -1) ]
        with conds_x = [ ]
        pval = 0.13879 | val = -0.067

        link ($X^0$ 0) --> $X^2$ (1/8):
        with conds_y = [ ]
        with conds_x = [ ($X^1$ -1) ]
        pval = 0.58369 | val = 0.025

        link ($X^0$ -1) --> $X^2$ (2/8):
        with conds_y = [ ]
        with conds_x = [ ($X^1$ -2) ]
        pval = 0.58589 | val = 0.025

        link ($X^0$ -2) --> $X^2$ (3/8):
        with conds_y = [ ]
        with conds_x = [ ($X^1$ -3) ]
        pval = 0.96453 | val = -0.002

        link ($X^1$ 0) --> $X^2$ (4/8):
        with conds_y = [ ]
        with conds_x = [ ($X^2$ -2) ($X^1$ -1) ]
        pval = 0.58606 | val = -0.025

        link ($X^1$ -1) --> $X^2$ (5/8):
        with conds_y = [ ]
        with conds_x = [ ($X^2$ -3) ($X^1$ -2) ]
        pval = 0.82268 | val = 0.010

        link ($X^1$ -2) --> $X^2$ (6/8):
        with conds_y = [ ]
        with conds_x = [ ($X^2$ -4) ($X^1$ -3) ]
        pval = 0.00000 | val = 0.501

        link ($X^2$ -1) --> $X^2$ (7/8):
        with conds_y = [ ]
        with conds_x = [ ]
        pval = 0.61544 | val = -0.023

        link ($X^2$ -2) --> $X^2$ (8/8):
        with conds_y = [ ]
        with conds_x = [ ]
        pval = 0.45829 | val = 0.033

## Significant links at alpha = 0.05:

    Variable $X^0$ has 3 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 1.000 | conf = (0.000, 0.000)
        ($X^0$ -1): pval = 0.00000 | val = 0.708 | conf = (0.000, 0.000)
        ($X^0$ -2): pval = 0.00000 | val = 0.489 | conf = (0.000, 0.000)

    Variable $X^1$ has 0 link(s):

    Variable $X^2$ has 1 link(s):
        ($X^1$ -2): pval = 0.00000 | val = 0.501 | conf = (0.000, 0.000)

## Significant links at alpha = 0.01:

    Variable $X^0$ has 3 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 1.000
        ($X^0$ -1): pval = 0.00000 | val = 0.708
        ($X^0$ -2): pval = 0.00000 | val = 0.489

    Variable $X^1$ has 0 link(s):

    Variable $X^2$ has 1 link(s):
        ($X^1$ -2): pval = 0.00000 | val = 0.501
tigramite/independence_tests.py:1181: RuntimeWarning: divide by zero encountered in double_scalars
  trafo_val = value * np.sqrt(deg_f/(1. - value*value))

Here the partial correlation $X^1_{t-1} \to X^0_t$ is exactly 1 (hence the divide-by-zero warning above). Since $X^0_t$ is a deterministic copy of $X^1_{t-1}$, they effectively represent the same variable, and the true link $X^0_{t-1} \to X^2_t$ can no longer be detected once we condition on $X^1_{t-2}$. Deterministic copies of other variables should therefore be excluded from the analysis.
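
One simple safeguard (a hypothetical pre-screening helper, not part of tigramite) is to flag lagged variable pairs whose linear relationship is near-deterministic before running PCMCI:

# Hypothetical pre-screening: flag (near-)deterministic lagged dependencies
def flag_deterministic_pairs(data, tau_max=2, r2_thresh=0.99):
    T, N = data.shape
    for j in range(N):
        for i in range(N):
            for tau in range(1, tau_max + 1):
                r2 = np.corrcoef(data[:-tau, i], data[tau:, j])[0, 1]**2
                if r2 > r2_thresh:
                    print("X^%d (lag -%d) -> X^%d: R^2 = %.3f" % (i, tau, j, r2))

flag_deterministic_pairs(data)  # here this flags X^1 (lag -1) -> X^0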

Causal sufficiency

Causal sufficiency demands that the set of variables contains all common causes of any two variables. This assumption is mostly violated when analyzing open complex systems outside a confined experimental setting, and any link estimated by a causal discovery algorithm could become non-significant if more variables are included in the analysis. Observational causal inference assuming causal sufficiency should therefore be seen as one step towards a physical process understanding. There exist, however, algorithms that take confounded links into account and can explicitly represent them (e.g., the FCI algorithm). Causal discovery can greatly help in an explorative model-building analysis to get an idea of potential drivers. In particular, the absence of a link allows for a more robust conclusion: if there is no evidence for a statistical dependency, then a physical mechanism is less likely (assuming that the other assumptions hold).

See Runge, Jakob. 2018. “Causal Network Reconstruction from Time Series: From Theoretical Assumptions to Practical Estimation.” Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (7): 075310 for alternative approaches that do not require causal sufficiency.

Unobserved driver / latent variable

For the common driver process, consider the case where the common driver $U$ is not measured:


In [8]:
np.random.seed(1)
data = np.random.randn(10000, 5)
a = 0.8
for t in range(5, 10000):
    data[t, 0] += a*data[t-1, 0]
    data[t, 1] += a*data[t-1, 1] + 0.5*data[t-1, 0]
    data[t, 2] += a*data[t-1, 2] + 0.5*data[t-1, 1] + 0.5*data[t-1, 4]
    data[t, 3] += a*data[t-1, 3] + 0.5*data[t-2, 4]
    data[t, 4] += a*data[t-1, 4]

# tp.plot_timeseries(dataframe)
obsdata = data[:,[0, 1, 2, 3]]
var_names_lat = ['W', 'Y', 'X', 'Z', 'U']

In [9]:
for data_here in [data, obsdata]:
    dataframe = pp.DataFrame(data_here)
    parcorr = ParCorr()
    pcmci_parcorr = PCMCI(
        dataframe=dataframe, 
        cond_ind_test=parcorr,
        verbosity=0)
    results = pcmci_parcorr.run_pcmci(tau_max=5, pc_alpha=0.1)
    pcmci_parcorr.print_significant_links(
            p_matrix = results['p_matrix'], 
            val_matrix = results['val_matrix'],
            alpha_level = 0.01)
    link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                            val_matrix=results['val_matrix'], alpha_level=0.001)['link_matrix']
    tp.plot_graph(
        val_matrix=results['val_matrix'],
        link_matrix=link_matrix,
        var_names=var_names_lat,
        link_colorbar_label='cross-MCI',
        node_colorbar_label='auto-MCI',
        )


## Significant links at alpha = 0.01:

    Variable 0 has 2 link(s):
        (0 -1): pval = 0.00000 | val = 0.632
        (2 0): pval = 0.00493 | val = -0.028

    Variable 1 has 2 link(s):
        (1 -1): pval = 0.00000 | val = 0.624
        (0 -1): pval = 0.00000 | val = 0.442

    Variable 2 has 4 link(s):
        (2 -1): pval = 0.00000 | val = 0.626
        (4 -1): pval = 0.00000 | val = 0.461
        (1 -1): pval = 0.00000 | val = 0.450
        (0 0): pval = 0.00493 | val = -0.028

    Variable 3 has 3 link(s):
        (3 -1): pval = 0.00000 | val = 0.625
        (4 -2): pval = 0.00000 | val = 0.454
        (1 -3): pval = 0.00828 | val = -0.026

    Variable 4 has 3 link(s):
        (4 -1): pval = 0.00000 | val = 0.628
        (4 -5): pval = 0.00562 | val = 0.028
        (2 -2): pval = 0.00702 | val = -0.027

## Significant links at alpha = 0.01:

    Variable 0 has 2 link(s):
        (0 -1): pval = 0.00000 | val = 0.632
        (2 0): pval = 0.00661 | val = -0.027

    Variable 1 has 2 link(s):
        (1 -1): pval = 0.00000 | val = 0.624
        (0 -1): pval = 0.00000 | val = 0.442

    Variable 2 has 10 link(s):
        (2 -1): pval = 0.00000 | val = 0.715
        (1 -1): pval = 0.00000 | val = 0.385
        (3 0): pval = 0.00000 | val = 0.174
        (3 -1): pval = 0.00000 | val = 0.105
        (1 -2): pval = 0.00000 | val = -0.073
        (2 -2): pval = 0.00000 | val = -0.050
        (2 -3): pval = 0.00002 | val = -0.043
        (1 -3): pval = 0.00009 | val = -0.039
        (3 -5): pval = 0.00034 | val = -0.036
        (0 0): pval = 0.00661 | val = -0.027

    Variable 3 has 12 link(s):
        (3 -1): pval = 0.00000 | val = 0.717
        (2 -1): pval = 0.00000 | val = 0.278
        (2 0): pval = 0.00000 | val = 0.174
        (1 -3): pval = 0.00000 | val = -0.125
        (1 -2): pval = 0.00000 | val = -0.113
        (0 -5): pval = 0.00000 | val = -0.086
        (1 -4): pval = 0.00000 | val = -0.068
        (0 -4): pval = 0.00000 | val = -0.057
        (1 -5): pval = 0.00000 | val = -0.055
        (3 -2): pval = 0.00000 | val = -0.048
        (0 -3): pval = 0.00060 | val = -0.034
        (2 -2): pval = 0.00241 | val = -0.030

The upper plot shows the true causal graph when all variables are observed. The lower graph shows the case where variable $U$ is hidden. Several spurious links then appear: (1) $X\to Z$, and (2) links from $Y$ and $W$ to $Z$, which is counterintuitive because there is no possible indirect pathway (see the upper graph). What is the reason? The culprit is the collider $X$: MCI (like FullCI and any other causal measure that conditions on the entire past) between $Y$ and $Z$ conditions on the parents of $Z$, which in the lower latent graph include $X$. But conditioning on a collider opens up the paths from $Y$ and $W$ to $Z$ and makes them dependent.
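
The collider effect can be illustrated in isolation (a minimal sketch in plain numpy, independent of the model above): two unconditionally independent variables become dependent once their common effect is conditioned on.

# Collider illustration: Y and W are independent, X is their common effect
rng = np.random.RandomState(42)
Y = rng.randn(10000)
W = rng.randn(10000)
X = Y + W + 0.5*rng.randn(10000)
print(np.corrcoef(Y, W)[0, 1])  # ~ 0: unconditionally independent
# Condition on the collider X by regressing it out of Y and W
res_Y = Y - np.polyval(np.polyfit(X, Y, 1), X)
res_W = W - np.polyval(np.polyfit(X, W, 1), X)
print(np.corrcoef(res_Y, res_W)[0, 1])  # clearly negative: the path is opened up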

Solar forcing

In a geoscientific context, solar forcing is typically a strong common driver of many processes. To remove this trivial effect, time series are usually anomalized, that is, the average seasonal cycle is subtracted. But one can also include the solar forcing explicitly, as shown here via a sine wave for an artificial example. We have also made the time series more realistic by adding an auto-dependency on their past values.


In [10]:
np.random.seed(1)
data = np.random.randn(1000, 4)
# Simple sun
data[:,3] = np.sin(np.arange(1000)*20/np.pi)
var_names[3] = 'Sun'
c = 0.8
for t in range(1, 1000):
    data[t, 0] += 0.4*data[t-1, 0] + 0.4*data[t-1, 1] + c*data[t-1,3]
    data[t, 1] += 0.5*data[t-1, 1] + c*data[t-1,3]
    data[t, 2] += 0.6*data[t-1, 2] + 0.3*data[t-2, 1] + c*data[t-1,3]
dataframe = pp.DataFrame(data, var_names=var_names)
tp.plot_timeseries(dataframe)


Out[10]:
[Figure with 4 axes: time series plots of $X^0$, $X^1$, $X^2$, Sun]

If we do not account for the common solar forcing, there will be many spurious links:


In [11]:
parcorr = ParCorr()
dataframe_nosun = pp.DataFrame(data[:,[0,1,2]], var_names=var_names)
pcmci_parcorr = PCMCI(
    selected_variables = [0,1,2],
    dataframe=dataframe_nosun, 
    cond_ind_test=parcorr,
    verbosity=0)
results = pcmci_parcorr.run_pcmci(tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)


## Significant links at alpha = 0.01:

    Variable $X^0$ has 4 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.420
        ($X^1$ -1): pval = 0.00000 | val = 0.403
        ($X^2$ -1): pval = 0.00027 | val = 0.116
        ($X^2$ 0): pval = 0.00488 | val = 0.089

    Variable $X^1$ has 3 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.478
        ($X^0$ -1): pval = 0.00002 | val = 0.135
        ($X^2$ 0): pval = 0.00330 | val = 0.093

    Variable $X^2$ has 5 link(s):
        ($X^2$ -1): pval = 0.00000 | val = 0.559
        ($X^1$ -2): pval = 0.00000 | val = 0.197
        ($X^1$ -1): pval = 0.00274 | val = 0.095
        ($X^1$ 0): pval = 0.00330 | val = 0.093
        ($X^0$ 0): pval = 0.00488 | val = 0.089

However, if we explicitly include the solar forcing variable (which we assume is known in this case), we can identify the correct causal graph. Since we are not interested in the drivers of the solar forcing variable itself, we do not attempt to reconstruct its parents. This is achieved by setting selected_variables = [0,1,2].


In [12]:
parcorr = ParCorr()
pcmci_parcorr = PCMCI(
    selected_variables = [0,1,2],
    dataframe=dataframe, 
    cond_ind_test=parcorr,
    verbosity=0)
results = pcmci_parcorr.run_pcmci(tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )


## Significant links at alpha = 0.01:

    Variable $X^0$ has 3 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.378
        ($X^1$ -1): pval = 0.00000 | val = 0.356
        (Sun -1): pval = 0.00707 | val = 0.085

    Variable $X^1$ has 2 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.429
        (Sun -1): pval = 0.00000 | val = 0.258

    Variable $X^2$ has 5 link(s):
        ($X^2$ -1): pval = 0.00000 | val = 0.560
        (Sun -1): pval = 0.00000 | val = 0.309
        (Sun 0): pval = 0.00000 | val = 0.307
        (Sun -2): pval = 0.00000 | val = 0.302
        ($X^1$ -2): pval = 0.00000 | val = 0.205
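
If the forcing cannot be included explicitly, the standard alternative mentioned above is anomalization. A minimal sketch, assuming a known integer period (the period=12 here is hypothetical, e.g., for monthly data; it is not the period of the sine wave above):

# Minimal anomalization sketch: subtract the average cycle of a known period
def anomalize(data, period=12):
    anomalies = data.astype(float).copy()
    for phase in range(period):
        # subtract the mean over all time steps sharing the same phase of the cycle
        anomalies[phase::period] -= anomalies[phase::period].mean(axis=0)
    return anomalies

# e.g.: dataframe = pp.DataFrame(anomalize(data), var_names=var_names)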

Time sub-sampling

Sometimes a time series might be sub-sampled, that is, the measurements are less frequent than the time scale of the underlying dependencies. Consider the following process:


In [13]:
np.random.seed(1)
data = np.random.randn(1000, 3)
for t in range(1, 1000):
    data[t, 0] += 0.*data[t-1, 0] + 0.6*data[t-1,2]
    data[t, 1] += 0.*data[t-1, 1] + 0.6*data[t-1,0]
    data[t, 2] += 0.*data[t-1, 2] + 0.6*data[t-1,1]
dataframe = pp.DataFrame(data, var_names=var_names)
tp.plot_timeseries(dataframe)


Out[13]:
[Figure with 3 axes: time series plots of $X^0$, $X^1$, $X^2$]

With the original time sampling we obtain the correct causal graph:


In [14]:
pcmci_parcorr = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci_parcorr.run_pcmci(tau_min=0,tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)


## Significant links at alpha = 0.01:

    Variable $X^0$ has 1 link(s):
        ($X^2$ -1): pval = 0.00000 | val = 0.529

    Variable $X^1$ has 1 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.515

    Variable $X^2$ has 1 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.532

In [15]:
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )


If we sub-sample the data, very counterintuitive links can appear. The true causal loop $X^0\to X^1\to X^2\to X^0$ gets detected in the wrong direction: one lag in the sub-sampled data corresponds to two steps of the original process, so the strongest lag-1 dependence now runs via the unobserved intermediate variable, e.g., from $X^1$ to $X^0$:


In [16]:
sampled_data = data[::2]
pcmci_parcorr = PCMCI(dataframe=pp.DataFrame(sampled_data, var_names=var_names), 
                      cond_ind_test=ParCorr(), verbosity=0)
results = pcmci_parcorr.run_pcmci(tau_min=0, tau_max=2, pc_alpha=0.2)
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )


If causal lags are smaller than the time sampling, such problems may occur. Causal inference for sub-sampled data is still an active area of research.
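
A direct check on the sub-sampled data (plain numpy) confirms the reversal: at the new lag 1, which corresponds to two steps of the original process, the dependence runs from $X^1$ to $X^0$ rather than the other way around.

# Lag-1 cross-correlations in the sub-sampled data (1 lag = 2 original steps)
print(np.corrcoef(sampled_data[:-1, 1], sampled_data[1:, 0])[0, 1])  # X^1 -> X^0: strong
print(np.corrcoef(sampled_data[:-1, 0], sampled_data[1:, 1])[0, 1])  # X^0 -> X^1: ~ 0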

Causal Markov condition

The Markov condition can be rephrased as assuming that the noises driving each variable are mutually independent and independent in time (i.i.d.). This is violated in the following example, where each variable is driven by 1/f noise ("1/f" refers to the scaling of the power spectrum). 1/f noise can be generated by averaging AR(1) processes (http://www.scholarpedia.org/article/1/f_noise), which means that the noise is no longer independent in time (even though the noise terms of the individual variables are still independent of each other). Note that this constitutes a violation of the Markov condition of the observed process only, so one might rather call it a violation of causal sufficiency.


In [17]:
np.random.seed(1)
T = 10000
# Generate 1/f noise by averaging AR1-process with wide range of coeffs 
# (http://www.scholarpedia.org/article/1/f_noise)
def one_over_f_noise(T, n_ar=20):
    whitenoise = np.random.randn(T, n_ar)
    ar_coeffs = np.linspace(0.1, 0.9, n_ar)
    for t in range(1, T):  # start at t=1: t=0 would wrap around to the last row
        whitenoise[t] += ar_coeffs*whitenoise[t-1]
    return whitenoise.sum(axis=1)

data = np.random.randn(T, 3)
data[:,0] += one_over_f_noise(T)
data[:,1] += one_over_f_noise(T)
data[:,2] += one_over_f_noise(T)

for t in range(1, T):
    data[t, 0] +=  0.4*data[t-1, 1] 
    data[t, 2] +=  0.3*data[t-2, 1] 
dataframe = pp.DataFrame(data, var_names=var_names)
tp.plot_timeseries(dataframe)
# plt.psd(data[:,0],return_line=True)[2]
# plt.psd(data[:,1],return_line=True)[2]
# plt.psd(data[:,2],return_line=True)[2]
# plt.gca().set_xscale("log", nonposx='clip')
# plt.gca().set_yscale("log", nonposy='clip')


Out[17]:
[Figure with 3 axes: time series plots of $X^0$, $X^1$, $X^2$]

Here PCMCI detects many spurious links, especially auto-dependencies, since the process has long memory and the present state is not independent of the more distant past given any finite set of parents.


In [18]:
parcorr = ParCorr()
pcmci_parcorr = PCMCI(
    dataframe=dataframe, 
    cond_ind_test=parcorr,
    verbosity=0)
results = pcmci_parcorr.run_pcmci(tau_max=5, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)


## Significant links at alpha = 0.01:

    Variable $X^0$ has 7 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.470
        ($X^1$ -1): pval = 0.00000 | val = 0.388
        ($X^1$ -2): pval = 0.00000 | val = -0.183
        ($X^0$ -2): pval = 0.00000 | val = 0.100
        ($X^1$ -3): pval = 0.00000 | val = -0.096
        ($X^1$ -4): pval = 0.00193 | val = -0.031
        ($X^0$ -3): pval = 0.00202 | val = 0.031

    Variable $X^1$ has 2 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.468
        ($X^1$ -2): pval = 0.00000 | val = 0.065

    Variable $X^2$ has 8 link(s):
        ($X^2$ -1): pval = 0.00000 | val = 0.468
        ($X^1$ -2): pval = 0.00000 | val = 0.288
        ($X^1$ -3): pval = 0.00000 | val = -0.145
        ($X^2$ -2): pval = 0.00000 | val = 0.082
        ($X^1$ -4): pval = 0.00000 | val = -0.076
        ($X^2$ -3): pval = 0.00018 | val = 0.037
        ($X^0$ -1): pval = 0.00696 | val = 0.027
        ($X^0$ -5): pval = 0.00831 | val = 0.026
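
The long memory can be made explicit (a minimal sketch using the noise generator above): for a Markov process of order 1, the partial autocorrelation at lag 2 given lag 1 would vanish, but for the averaged AR(1) processes it stays clearly away from zero at this sample size.

# Partial autocorrelation of 1/f noise at lag 2 given lag 1 (plain numpy)
x = one_over_f_noise(10000)
x_t, x_lag1, x_lag2 = x[2:], x[1:-1], x[:-2]
res_t = x_t - np.polyval(np.polyfit(x_lag1, x_t, 1), x_lag1)
res_2 = x_lag2 - np.polyval(np.polyfit(x_lag1, x_lag2, 1), x_lag1)
print(np.corrcoef(res_t, res_2)[0, 1])  # nonzero: lag 1 does not screen off the past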

Time aggregation

An important choice is how to aggregate measured time series. For example, climate time series might have been measured daily, but one might be interested in a less noisy time-scale and analyze monthly aggregates. Consider the following process:


In [19]:
np.random.seed(1)
data = np.random.randn(1000, 3)
for t in range(1, 1000):
    data[t, 0] += 0.7*data[t-1, 0] 
    data[t, 1] += 0.6*data[t-1, 1] + 0.6*data[t-1,0]
    data[t, 2] += 0.5*data[t-1, 2] + 0.6*data[t-1,1]
dataframe = pp.DataFrame(data, var_names=var_names)
tp.plot_timeseries(dataframe)


Out[19]:
[Figure with 3 axes: time series plots of $X^0$, $X^1$, $X^2$]

With the original time aggregation we obtain the correct causal graph:


In [20]:
pcmci_parcorr = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci_parcorr.run_pcmci(tau_min=0,tau_max=2, pc_alpha=0.2)
pcmci_parcorr.print_significant_links(
        p_matrix = results['p_matrix'], 
        val_matrix = results['val_matrix'],
        alpha_level = 0.01)


## Significant links at alpha = 0.01:

    Variable $X^0$ has 1 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.574

    Variable $X^1$ has 2 link(s):
        ($X^0$ -1): pval = 0.00000 | val = 0.514
        ($X^1$ -1): pval = 0.00000 | val = 0.512

    Variable $X^2$ has 2 link(s):
        ($X^1$ -1): pval = 0.00000 | val = 0.534
        ($X^2$ -1): pval = 0.00000 | val = 0.440

In [21]:
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )


If we aggregate the data, we detect a contemporaneous dependency, for which no causal direction can be assessed in this framework, and we also obtain several spurious lagged links. Essentially, direct causal effects now appear contemporaneous on the aggregated time scale. Causal inference for time-aggregated data is likewise still an active area of research. Note again that this constitutes a violation of the Markov condition of the observed process only, so one might rather call it a violation of causal sufficiency.
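
Why aggregation induces contemporaneous dependence can be seen in a minimal sketch (plain numpy, separate from the data above): a lag-1 effect whose cause and effect fall into the same time bin shows up at lag 0 after aggregation.

# Sketch: a lag-1 effect X -> Y becomes contemporaneous after time-binning
rng = np.random.RandomState(0)
x = rng.randn(1000)
y = np.zeros(1000)
y[1:] = 0.8*x[:-1] + rng.randn(999)
x_agg = x.reshape(-1, 4).mean(axis=1)  # time_bin_length = 4, as below
y_agg = y.reshape(-1, 4).mean(axis=1)
print(np.corrcoef(x_agg, y_agg)[0, 1])  # clearly nonzero at lag 0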


In [22]:
aggregated_data = pp.time_bin_with_mask(data, time_bin_length=4)
pcmci_parcorr = PCMCI(dataframe=pp.DataFrame(aggregated_data[0], var_names=var_names), cond_ind_test=ParCorr(), 
                      verbosity=0)
results = pcmci_parcorr.run_pcmci(tau_min=0, tau_max=2, pc_alpha=0.2)
# pcmci_parcorr.print_significant_links(
#         p_matrix = results['p_matrix'], 
#         val_matrix = results['val_matrix'],
#         alpha_level = 0.01)

In [23]:
link_matrix = pcmci_parcorr.return_significant_parents(pq_matrix=results['p_matrix'],
                        val_matrix=results['val_matrix'], alpha_level=0.01)['link_matrix']
# Plot time series graph
tp.plot_time_series_graph(
    figsize=(6, 3),
    val_matrix=results['val_matrix'],
    link_matrix=link_matrix,
    var_names=var_names,
    link_colorbar_label='MCI',
    )