Matrix $M$ has $N$ columns, one per strategy under evaluation for potential overfitting, and $T$ rows, $t=1,2,...,T$. Thus each row holds the performance of the $N$ strategies at one time. Keep in mind that the dates ($t=1,2,...,T$) have to be synchronised, i.e. if a performance measurement for the strategy labelled $1$ is made on date $x$, then for all other strategies $2,3,...,N$ a measurement has to be made on the same date.
The performance measurements at $t=1,2,...,T$ are assumed to be independent. Thus you cannot simply run the $N$ strategies and sample their performance at arbitrary time points, because those samples would be autocorrelated. You can get around this by defining non-overlapping "windows": for example, take the starting and ending equity between $t=1$ and $t=2$ and compute one Sharpe ratio or annual return per window. This usually works. However, in special cases where the strategy would have performed completely differently (due to some kind of smoothing, for example), this is not feasible, and one would be required to run $T$ independent backtests for each of the $N$ strategies to adhere to the independence assumption.
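As a rough sketch of the "windows" idea (the equity curve, window count and per-window Sharpe below are illustrative assumptions, not part of the original procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical equity curve for illustration only
equity = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, size=1000))

# one performance measurement per non-overlapping window, so the T
# measurements do not share any underlying observations
n_windows = 10
windows = np.array_split(equity, n_windows)

def window_sharpe(e):
    r = np.diff(e) / e[:-1]      # simple returns inside the window
    return r.mean() / r.std()    # per-window, unannualised Sharpe

measurements = np.array([window_sharpe(w) for w in windows])
print(measurements.shape)  # (10,) -- one row of M per window
```

Each window contributes one row of $M$; the rows share no raw observations, which is what the independence assumption asks for.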
Partition $M$ row-wise into $S$ blocks, where $S$ must be even. You end up with $S$ sub-matrices $M_s \in \mathbb{R}^{T/S \times N}$.
Form all $C_S$ combinations of $M_s$ matrices (taken in groups of $S/2$). So suppose that $M$ is something like the following: $$M= \begin{bmatrix} [ ---- M_1 ---- ] \\ [ ---- M_2 ---- ] \\ [ ---- M_3 ---- ] \\ [ ---- M_4 ---- ] \end{bmatrix}$$
Where $M \in \mathbb{R}^{T \times N}$ and each $M_s \in \mathbb{R}^{T/S \times N}$. Note that:
$$ {S \choose S/2} = \frac{(S-1)! S}{(S-S/2)!(S/2-1)! S/2} = {S-1 \choose S/2-1} \frac{S}{S/2} = ... = \prod_{i=0}^{S/2-1} \frac{S-i}{S/2-i} $$And so according to the equation above, and to the example matrix, $S=4$, so the number of combinations is ${4 \choose 2 }= 6$, and these are:
$$C_4 = \{\{M_1,M_2\},\{M_1,M_3\},\{M_1,M_4\},\{M_2,M_3\},\{M_2,M_4\},\{M_3,M_4\}\}$$Thus: $$|\{C_S\}| = {S \choose S/2}$$
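A quick sanity check of the count (the names `M1`..`M4` are just stand-in labels for the sub-matrices):

```python
import itertools
from math import comb

S = 4
blocks = [f"M{i}" for i in range(1, S + 1)]        # stand-ins for M_1..M_4
combos = list(itertools.combinations(blocks, S // 2))
print(len(combos))       # 6
print(comb(S, S // 2))   # 6, matching |C_S| = C(S, S/2)
```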
For each combination $c \in C_S$ (e.g.: $\{M_1,M_2\},\{M_1,M_3\},...)$:
a) Form $J$ (training set).
If $c=\{M_1,M_2\}$, then
$$J= \begin{bmatrix} [ ---- M_1 ---- ] \\ [ ---- M_2 ---- ] \end{bmatrix}$$$J \in \mathbb{R}^{T/2 \times N}$
b) Form $\bar{J}$ (testing set).
If $c=\{M_1,M_2\}$, then
$$\bar{J}= \begin{bmatrix} [ ---- M_3 ---- ] \\ [ ---- M_4 ---- ] \end{bmatrix}$$$\bar{J} \in \mathbb{R}^{T/2 \times N}$
Note that the order is preserved! For some performance measures, e.g. Sharpe, order is not important. But for drawdown, it certainly is.
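A minimal sketch of why order matters for some measures but not others (hypothetical returns; `sharpe` and `max_drawdown` are illustrative helpers, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, 250)   # hypothetical daily returns
shuffled = rng.permutation(returns)    # same numbers, different order

def max_drawdown(r):
    equity = np.cumprod(1 + r)
    peak = np.maximum.accumulate(equity)
    return np.max(1 - equity / peak)   # worst peak-to-trough loss

def sharpe(r):
    return r.mean() / r.std()

# Sharpe depends only on the set of returns, not their order;
# max drawdown depends on the path, i.e. on the order of the rows.
print(np.isclose(sharpe(returns), sharpe(shuffled)))  # True
print(max_drawdown(returns), max_drawdown(shuffled))  # generally different
```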
c) Form vector $R^c$ (IS performance): compute the chosen performance statistic (e.g. the Sharpe ratio) for each of the $N$ strategies over $J$, giving $R^c \in \mathbb{R}^N$. Derive $r^c$, the vector of ranks of the entries of $R^c$.
d) Form vector $\bar{R}^c$ (OOS performance) by computing the same statistic for each strategy over $\bar{J}$.
Derive the rank vector $\bar{r}^c$.
e) Determine the element $n^*$ such that $r^c_{n^*} \in \Omega^*_{n^*}$, i.e. $n^*$ is the index of the best-performing strategy IS.
In other words, if $r^c=(1,4,2,3)$ then the best-performing strategy is at index $2$, so $n^*=2$. This is because, for example, $\Omega_3^*$ when $N=4$ is $\{f \in \Omega \mid f_3=4 \} = \{ (1,2,4,3),(2,1,4,3), (1,3,4,2), (3,1,4,2), ... \}$, i.e. the set of rank permutations in which the third strategy is the best.
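Using the example rank vector, $n^*$ can be found with a simple argmax (0-based indexing in code versus the 1-based counting of the text):

```python
import numpy as np

r_c = np.array([1, 4, 2, 3])   # IS rank vector from the example
n_star = int(np.argmax(r_c))   # 0-based index of the best IS strategy
print(n_star)                  # 1, i.e. "index 2" in the 1-based counting of the text
```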
f) Define the relative rank OOS of the best-performing strategy IS (i.e. the relative rank of $\bar{r}^c_{n^*}$).
$\bar{\omega}_c := \bar{r}^c_{n^*}/(N+1) \in (0,1)$
If the strategy-optimization procedure is not overfitting, the strategy selected as best IS should also tend to perform well OOS, i.e. $\bar{\omega}_c$ should systematically exceed $1/2$.
g) Define/compute the logit $\lambda_c = \ln{\frac{\bar{\omega_c}}{1-\bar{\omega_c}}}$.
High logit values imply a consistency between IS and OOS performances, which indicates a low level of overfitting
To summarize:
1) Form training set $J$
2) Form testing set $\bar{J}$
3) Form performances vector IS $R^c$, derive $r^c$
4) Form performances vector OOS $\bar{R}^c$, derive $\bar{r}^c$
5) Determine element $n^*$
6) Compute relative rank: $\bar{\omega}_c$
7) Compute logit: $\lambda_c$
Compute the distribution of ranks OOS by collecting all the $\lambda_c$, for $c \in C_S$:
$$f(\lambda) = \sum_{c \in C_S} \frac{\chi_{\{\lambda\}} \left( \lambda_c\right)}{|\{C_S\}|}$$Also note
$$\int_{-\infty}^{\infty} f(\lambda)\, d\lambda = 1$$The PBO is the probability that the model configuration selected as optimal IS will underperform the median of the $N$ model configurations OOS.
$$\text{PBO} = \sum_{n=1}^N P\left[ \bar{r_n} < N/2 | r \in \Omega^*_{n} \right] P\left[ r \in \Omega^*_n \right] = \int_{-\infty}^0 f(\lambda) d\lambda$$Where,
$$f(\lambda) = \sum_{c \in C_S} \frac{\chi_{\{ \lambda\}} \left( \lambda_c \right)}{|\{ C_S \}|}$$i.e. the frequency (PDF) of the logits. The logits take discrete values, so the statement above makes sense ($\chi_{\{\cdot\}}(\cdot)$ is the indicator function).
This determines to what extent greater performance IS leads to lower performance OOS, an occurrence associated with the memory effects discussed in Bailey et al. [1]
Perform a regression:
$$\bar{R_{n^*}}^c = \alpha + \beta R_{n^*}^c + \epsilon^c$$The probability that the model selected as optimal IS will deliver a loss OOS.
Compute:
$$P \left[ \bar{R_{n^*}}^c < 0 \right]$$This analysis determines whether the procedure used to select a strategy IS is preferable to randomly choosing one model configuration among the $N$ alternatives.
First-order stochastic dominance holds if:
$$P \left[ \bar{R_{n^*}} \geq x \right] \geq P \left[\text{Mean}(\bar{R}) \geq x \right] \ \ \forall x$$and $$P \left[ \bar{R_{n^*}} \geq x \right] > P \left[\text{Mean}(\bar{R}) \geq x \right] \ \ \text{for some} \ \ x$$
A less demanding criterion is second-order stochastic dominance. This requires that:
$$\text{SD}2[x] = \int_{-\infty}^x \left( P\left[ \text{Mean}(\bar{R}) \leq y \right] - P\left[ \bar{R_{n^*}} \leq y \right] \right) dy \geq 0 \ \ \forall x$$and that
$$\text{SD}2[x] > 0 \ \ \text{at some} \ \ x$$
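A numerical sketch of the SD2 check on hypothetical OOS samples (the names, sample sizes and distributions below are assumptions; empirical CDFs stand in for the probabilities):

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical OOS performance samples, for illustration only
R_best = rng.normal(0.1, 1.0, 500)   # OOS performance of the IS-optimal strategy
R_all = rng.normal(0.0, 1.0, 500)    # OOS performance pooled over all N strategies

# empirical CDFs on a common grid, then SD2[x] as the running integral
grid = np.linspace(min(R_best.min(), R_all.min()),
                   max(R_best.max(), R_all.max()), 1000)
cdf_best = np.searchsorted(np.sort(R_best), grid, side='right') / len(R_best)
cdf_all = np.searchsorted(np.sort(R_all), grid, side='right') / len(R_all)

sd2 = np.cumsum(cdf_all - cdf_best) * (grid[1] - grid[0])
dominates = bool(np.all(sd2 >= 0) and np.any(sd2 > 0))
print(dominates)
```

With finite samples the criterion can fail near the tails even when the underlying distributions dominate, so this should be read as a diagnostic rather than a test.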
In [148]:
%matplotlib inline
from numpy import inf
import matplotlib.pyplot as plt
import seaborn as sns # not required
sns.set_style('darkgrid') # not required
import numpy as np
import itertools
import pandas as pd
import scipy.stats
In [84]:
T = 16
S = 4
N = 30
M = np.random.rand(T, N)
M
Out[84]:
In [85]:
subMatrices = []
rows = T // S  # each sub-matrix M_s has T/S rows, as in the text
for s in range(S):
    subMatrices.append(M[rows*s : rows*(s+1)])
subMatrices
Out[85]:
In [86]:
combinations = itertools.combinations(subMatrices, int(S/2))
In [87]:
# Step 4 - 1) form J
J = np.array(next(combinations))
J
Out[87]:
In [88]:
# Step 4 - 2) form J_bar
# list comprehension to preserve the order! Compare whole sub-matrices with
# np.array_equal rather than `x not in J`, whose numpy semantics ((J == x).any())
# can give a false positive whenever any single element happens to match
J_bar = [x for x in subMatrices if not any(np.array_equal(x, j) for j in J)]
J_bar
Out[88]:
In [89]:
# Step 4 - 3) form R^c
# !!! be careful here: the performance statistic is simply the sum of returns
# per strategy; in practice substitute e.g. a Sharpe ratio computed over J
R = np.sum(np.sum(J, axis=1), axis=0)
R
Out[89]:
In [90]:
# Step 4 - 4) form R_bar^c
R_bar = np.sum(np.sum(J_bar,axis=1),axis=0)
R_bar
Out[90]:
In [91]:
# Step 4 - 5) Determine element n*
n_star = np.where(R == np.max(R))  # !!! ties (two or more equal max(R)) are possible in principle !!!
n_star[0][0]  # takes the first maximiser; with continuous performance data ties are unlikely
Out[91]:
In [92]:
# Step 4 - 6) Compute relative rank
# double argsort gives ranks 0..N-1; add 1 so that ranks run 1..N and
# omega_bar lies strictly inside (0,1), as the logit requires
intermediate = R_bar.argsort()
ranks = intermediate.argsort() + 1
omega_bar = ranks[n_star[0][0]]/(N+1)
omega_bar
Out[92]:
In [93]:
# Step 4 - 7) Compute logit
logit = np.log(omega_bar/(1-omega_bar)) # assert that omega_bar is in (0,1)
logit
Out[93]:
In [94]:
# Visualization of the ln(x/(1-x)) function:
x = np.arange(0.01,1,0.05)
y = np.log(x/(1-x))
f = plt.figure(figsize=(10,5))
ax = f.add_subplot(111)
ax.plot(x,y)
Out[94]:
In [95]:
M = np.random.rand(256,40)
S = 16
In [180]:
def produceLogits(M, S):
    T = M.shape[0]
    N = M.shape[1]
    rows = T // S                       # each sub-matrix M_s has T/S rows
    subMatrices = []
    all_R_bar = []
    performance_degradation_R = []
    performance_degradation_R_bar = []
    for s in range(S):
        subMatrices.append(M[rows*s : rows*(s+1)])
    combinations = itertools.combinations(subMatrices, int(S/2))
    logits = []
    for J in combinations:
        J = np.array(J)
        J_bar = [x for x in subMatrices if not any(np.array_equal(x, j) for j in J)]
        R = np.sum(np.sum(J, axis=1), axis=0)
        R_bar = np.sum(np.sum(J_bar, axis=1), axis=0)
        all_R_bar.extend(R_bar)                  # required for stochastic dominance
        n_star = np.where(R == np.max(R))
        ranks = R_bar.argsort().argsort() + 1    # ranks 1..N, so omega_bar is in (0,1)
        performance_degradation_R.append(np.max(R))                # required for performance degradation
        performance_degradation_R_bar.append(R_bar[n_star[0][0]])  # required for performance degradation
        omega_bar = ranks[n_star[0][0]]/(N+1)
        logit = np.log(omega_bar/(1-omega_bar))  # omega_bar strictly in (0,1): no division by zero
        logits.append(logit)
    return logits, [performance_degradation_R, performance_degradation_R_bar, all_R_bar]
In [181]:
logits, [R, R_bar, all_R_bar] = produceLogits(M, S)
In [182]:
logits = [x for x in logits if (x != -inf)]
logits = pd.DataFrame(data=logits,columns=['logits'])
In [183]:
PBO = len(logits.loc[logits.logits < 0])/len(logits.logits)
In [169]:
f = plt.figure(figsize=(20,10))
ax = f.add_subplot(111)
p = sns.regplot(x=np.array(R), y=np.array(R_bar), ax=ax)  # keyword args required by recent seaborn
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x=np.array(R), y=np.array(R_bar))  # regress on the data directly
ax.set_title('Performance Degradation', fontsize=20)
ax.set_xlabel('SR IS', fontsize=20)
ax.set_ylabel('SR OOS', fontsize=20)
formatter = '{0:.2f}'
ax.text(np.median(np.array(R)), np.max(np.array(R_bar)),r'[SROOS]='+formatter.format(intercept)+'+'+formatter.format(slope)+'*[SRIS]+err',fontsize=25)
ax.text(np.median(np.array(R)), np.min(np.array(R_bar)),r'$P[\overline{R_{n^*}}^c < 0]\equiv$P[SROOS<0]='+formatter.format(len(np.where(np.array(R_bar)<0)[0])/len(np.array(R_bar))),fontsize=25)
Out[169]:
In [174]:
f = plt.figure(figsize=(20,10))
ax = f.add_subplot(111)
minLogit = np.min(logits.logits)
sns.ecdfplot(logits.logits, ax=ax)  # empirical CDF of the logits (sns.distplot is deprecated)
ax.text(minLogit, 0.9, r'[PBO]:$\sum_{n=1}^N P[\overline{r_n} < N/2 | r \in \Omega^{*}_n] P[r \in \Omega^{*}_n]=\int_{-\infty}^{0}f(\lambda) d\lambda$=' + '{:.4f}'.format(PBO), fontsize=20)
ax.text(minLogit, 0.8, r'[SROOS]:$\overline{R_{n^*}}^c = \alpha + \beta R^c_{n^*}+\epsilon^c$; $\overline{R_{n^*}}^c$=' + formatter.format(intercept) + '+' + formatter.format(slope) + r'*$R^c_{n^*}$+err', fontsize=20)
ax.text(minLogit, 0.7, r'[Prob. of loss]:$P[\overline{R_{n^*}}^c < 0]$=' + formatter.format(len(np.where(np.array(R_bar) < 0)[0])/len(np.array(R_bar))), fontsize=20)
ax.set_title('PBO and Summary',fontsize=25)
Out[174]:
In [199]:
f = plt.figure(figsize=(15,10))
ax = f.add_subplot(111)
best_R = np.sort(np.array(R_bar))
cdf_1 = np.arange(len(best_R))/float(len(best_R))
ax.plot(best_R, cdf_1,label=r'$\bar{R_{n^*}}$')
all_R = np.sort(np.array(all_R_bar))
cdf_2 = np.arange(len(all_R_bar))/float(len(all_R_bar))
ax.plot(all_R, cdf_2,label=r'$\bar{R}$')
ax.set_title('Stochastic Dominance', fontsize=25)
plt.legend(fontsize=25)
Out[199]: