Recall that we fit on the training set and tested (validated) on the test set.
To "stabilize" the test error we can repeat this over several train/test splits and average the resulting test errors; this is called cross-validation.
K-fold cross-validation: partition the data into $K$ folds; each fold serves once as the holdout (test) set while the model is fit on the remaining $K-1$ folds, and the $K$ test errors are averaged.
Leave-one-out CV: Set $K = n$, so each sample is its own holdout.
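As a concrete illustration, here is a minimal sketch of K-fold CV written against generic `fit` and `predict` callables (these names, and the squared-error loss, are placeholders, not anything defined in this notebook):
In [ ]:
import numpy as np

def kfold_cv_error(X, y, fit, predict, K=5, seed=0):
    ## Average held-out squared error over K folds.
    ## `fit(X_tr, y_tr)` returns a fitted model and `predict(model, X_te)`
    ## returns predictions -- placeholder callables for whatever estimator
    ## is being validated.
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)          # K roughly equal holdout index sets
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])
        resid = y[test] - predict(model, X[test])
        errors.append(np.mean(resid**2))
    return np.mean(errors)                  # CV estimate of the test error
Setting `K = n` in this sketch recovers leave-one-out CV.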
Def A convex optimization problem is one of finding a minimizer of a convex function subject to convex constraints: $$ \min_{x \in \mathbb R^p} f(x) $$ subject to $$ g_i(x) \le 0, \quad i=1,\ldots,m \tag{Inequality cons.} $$ $$ h_j(x) = 0, \quad j = 1,\ldots,r \tag{Equality cons.} $$ where $f,g_i$ are convex and each $h_j$ is affine (i.e. $h_j(x) = A_j x + b_j$).
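For instance, projecting a point onto the probability simplex is a problem of exactly this form. A minimal sketch using cvxpy (an assumption: cvxpy is not used elsewhere in this notebook):
In [ ]:
import numpy as np
import cvxpy as cp

## Projection of a point c onto the probability simplex, written in the
## generic form above: f(x) = ||x - c||^2, inequality constraints -x_i <= 0,
## and the affine equality constraint sum(x) = 1.
c = np.array([0.9, -0.2, 0.5])
x = cp.Variable(3)
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - c)),
                  [x >= 0, cp.sum(x) == 1])
prob.solve()
print(x.value)   # the projection of c onto the simplex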
In [133]:
## Plot a convex function (-log) and a nonconvex one (3 sin) on (0, 2*pi]
T = np.linspace(0.01,2*np.pi,100)
plt.plot(T,-np.log(T),label='Convex')
plt.plot(T,3*np.sin(T),label='Nonconvex')
plt.legend()
Out[133]:
1st Order Condition If $f$ is differentiable then it is convex if and only if $$ f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0), \quad \forall x,x_0. $$ 2nd Order Condition If $f$ is twice differentiable then it is convex if and only if its Hessian is positive semi-definite everywhere, $$ v^\top (\nabla^2 f(x))\, v \ge 0, \quad \forall v, x. \tag{positive semi-definite} $$
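As a quick sanity check (illustrative only, not part of the original notes), the 1st order condition can be probed numerically for a known convex function such as $f(x) = e^x$:
In [ ]:
import numpy as np

## Probe the 1st order condition for the convex function f(x) = exp(x):
## f(x) >= f(x0) + f'(x0) (x - x0) should hold for every sampled pair (x, x0).
rng = np.random.default_rng(0)
x, x0 = rng.uniform(-3, 3, size=(2, 1000))
gap = np.exp(x) - (np.exp(x0) + np.exp(x0) * (x - x0))
print(gap.min() >= -1e-12)   # True: the linearization never lies above f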
Operations that preserve convexity: addition, partial minimization (minimizing over a subset of the variables), and composition $h \circ g$ where $g$ is convex and $h$ is convex and non-decreasing.
Most compositions do not preserve convexity (neural nets!).
For differentiable $f$, gradient descent iterates $$ x \gets x - \eta \nabla f(x) $$ with a possibly changing learning rate $\eta$.
Gradient descent has a fixed point $x_0$ if and only if $\nabla f(x_0) = 0$.
Recall 1st Order Condition. If $f$ is differentiable then it is convex if $$ f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0), \quad \forall x,x_0, $$ and when $\nabla f(x_0) = 0$ then $$ f(x) \ge f(x_0), \quad \forall x, $$ so any fixed point of gradient descent is a global minimum (for convex, differentiable $f$).
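A minimal sketch of gradient descent on a convex quadratic, where the matrix, vector, and step size are illustrative choices rather than anything from these notes:
In [ ]:
import numpy as np

## Gradient descent on the convex quadratic f(x) = 0.5 * ||A x - b||^2,
## whose gradient is A^T (A x - b); A, b, and eta are illustrative choices.
A = np.array([[2., 0.], [1., 3.]])
b = np.array([1., -2.])
grad = lambda x: A.T @ (A @ x - b)

x = np.zeros(2)
eta = 0.05
for _ in range(500):
    x = x - eta * grad(x)

print(x, np.linalg.solve(A, b))   # the iterate approaches the exact minimizer A^{-1} b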
Suppose $f$ is convex but not differentiable, but instead of the gradient we have some other $g(x)$ that satisfies the 1st order condition, $$ f(x) \ge f(x_0) + g(x_0)^\top (x - x_0), \forall x,x_0. $$ We will call such a $g$ a subgradient.
Then we can define subgradient descent to be $$ x \gets x - \eta g(x) $$ with a possibly changing learning rate $\eta$. Then $x_0$ is a fixed point iff $g(x_0) = 0$, and by the inequality above any such $x_0$ is a global minimizer.
Example
Minimize $|x|$ (we know there is a minimizer at $x=0$).
Attempt 1. $$ g(x) = \left\{ \begin{array}{cc} 1,&x \ge 0\\ -1,&x < 0\end{array} \right. $$ One can verify that this satisfies the 1st order condition (it is a subgradient), $$ |x| \ge |x_0| + g(x_0)(x - x_0), $$ but $0$ is not a fixed point of subgradient descent: $0 - \eta g(0) = -\eta \ne 0$.
Attempt 2. $$ g(x) = \left\{ \begin{array}{cc} 1,&x > 0\\0,& x = 0\\-1,&x < 0\end{array} \right. $$ This is also a subgradient, and now $0 - \eta g(0) = 0$, so the minimizer $0$ is a fixed point.
Subgradients are not unique, and not all subgradients are equally useful.
Ex. $f(x) = |x|$; then the subdifferential (the set of all subgradients) is $$ \partial f(x) = \left\{ \begin{array}{ll} \{1\}, & x > 0 \\ {[-1,1]}, & x = 0 \\ \{-1\}, & x < 0 \end{array} \right. $$
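A minimal sketch of subgradient descent on $f(x) = |x|$ using the Attempt 2 subgradient $g(x) = \mathrm{sign}(x)$ and a decaying step size (the schedule $\eta_t = 1/t$ is an illustrative choice):
In [ ]:
import numpy as np

## Subgradient descent on f(x) = |x| with the Attempt 2 subgradient
## g(x) = sign(x) (so g(0) = 0 and x = 0 is a genuine fixed point).
g = lambda x: np.sign(x)

x = 2.5
for t in range(1, 1001):
    eta = 1.0 / t          # decaying step size; illustrative choice
    x = x - eta * g(x)

print(abs(x))              # close to the minimizer x = 0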
Ex. Soft-thresholding $y,\beta \in \mathbb R^n$,
$$f(\beta) = \frac 12 \sum_{i=1}^n (y_i - \beta_i)^2 + \lambda \sum_{i=1}^n |\beta_i|$$
The problem is separable,
$$
\min_\beta f(\beta) = \sum_{i=1}^n \min_{\beta_i} f_i(\beta_i),
$$
where $f_i(\beta_i) = \frac 12 (y_i - \beta_i)^2 + \lambda |\beta_i|$.
Focus on minimizing $f_i(b) = \frac 12 (y_i - b)^2 + \lambda |b|$, which has subdifferential $$ \partial f_i(b) = \left\{ \begin{array}{ll} \{b - y_i + \lambda\}, & b > 0 \\ {[-y_i - \lambda,\; -y_i + \lambda]}, & b = 0 \\ \{b - y_i - \lambda\}, & b < 0 \end{array} \right. $$
When is $0 \in \partial f_i(b)$? For $b > 0$ this forces $b = y_i - \lambda$, which is consistent only when $y_i > \lambda$; for $b < 0$ it forces $b = y_i + \lambda$, consistent only when $y_i < -\lambda$; and $b = 0$ works exactly when $0 \in [-y_i - \lambda, -y_i + \lambda]$, i.e. $|y_i| \le \lambda$.
Hence, $f_i(b) = \frac 12 (y_i - b)^2 + \lambda |b|$ is minimized at
$$
b = \left\{ \begin{array}{ll} y_i - \lambda,& y_i > \lambda \\ y_i + \lambda,& y_i < -\lambda\\0,& |y_i| \le \lambda, \end{array}\right.
$$
which is called soft thresholding of $y_i$ at level $\lambda$.
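For example, with $\lambda = 1$: $y_i = 3$ is shrunk to $b = 2$, $y_i = -4$ to $b = -3$, and $y_i = 0.5$ is set to $b = 0$.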
Soft thresholding is commonly used to denoise the coefficients of a signal in an orthonormal basis (e.g. a wavelet basis).
Let $W \in \mathbb R^{T \times T}$ be an orthonormal basis, $W^\top W = W W^\top = I$. We want to minimize $$ \frac 12 \sum_{i=1}^T (y - W \beta)_i^2 + \lambda \sum_{i=1}^T |\beta_i|. $$
Consider the least squares term: $$ \| y - W \beta\|^2 = (y - W \beta)^\top W W^\top(y - W \beta) = \| W^\top y - W^\top W \beta\|^2 = \| W^\top y - \beta\|^2. $$ So we equivalently minimize $$ \frac 12 \sum_{i=1}^T \left((W^\top y)_i - \beta_i\right)^2 + \lambda \sum_{i=1}^T |\beta_i|, $$ which is solved by soft thresholding $(W^\top y)_i$ at $\lambda$. Our denoised version of $y$ is then $\hat y = W \hat\beta$.
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [161]:
## Explore Turkish stock exchange dataset
tse = pd.read_excel('data_akbilgic.xlsx',skiprows=1)
tse = tse.rename(columns={'ISE':'TLISE','ISE.1':'USDISE'})
tse.head()
Out[161]:
In [162]:
tse.describe()
Out[162]:
In [163]:
tse.info()
In [164]:
## Plot the Turkish lira based ISE index (TLISE)
tse = tse.set_index('date')
tse['TLISE'].plot()
Out[164]:
In [23]:
## Autocorrelation plot
pd.plotting.autocorrelation_plot(tse['TLISE'])
Out[23]:
In [24]:
## USD
tse['USDISE'].plot()
Out[24]:
In [25]:
pd.plotting.autocorrelation_plot(tse['USDISE'])
Out[25]:
In [26]:
## NIKKEI index
tse['NIKKEI'].plot()
Out[26]:
In [27]:
pd.plotting.autocorrelation_plot(tse['NIKKEI'])
Out[27]:
In [35]:
## Volatility seems most interesting
## will construct a local measure of volatility:
## remove a rolling window estimate (local centering)
## and square the residuals
tse_trem = tse - tse.rolling("7D").mean()
tse_vol = tse_trem**2.
In [36]:
tse_trem['NIKKEI'].plot()
Out[36]:
In [39]:
tse_vol['NIKKEI'].plot()
Out[39]:
In [136]:
T,p = tse.shape
In [57]:
def const_wave(T,a,b):
    ## Haar-type mother wavelet on [a,b): constant positive on the first half,
    ## constant negative on the second half, scaled to have unit norm
    wave = np.zeros(T)
    s1 = (b-a) // 2
    s2 = (b-a) - s1
    norm_C = (s1*s2 / (s1+s2))**0.5
    wave[a:a+s1] = norm_C / s1
    wave[a+s1:b] = -norm_C / s2
    return wave
In [142]:
def _const_wave_basis(T,a,b):
    ## Recursively build mother wavelets by splitting [a,b) in half
    if b-a < 2:
        return []
    wave_basis = []
    wave_basis.append(const_wave(T,a,b))
    mid_pt = a + (b-a)//2
    wave_basis += _const_wave_basis(T,a,mid_pt)
    wave_basis += _const_wave_basis(T,mid_pt,b)
    return wave_basis
In [143]:
def const_wave_basis(T,a,b):
    ## Prepend the constant "father" wavelet so the basis spans constants
    father = np.ones(T) / T**0.5
    return [father] + _const_wave_basis(T,a,b)
In [165]:
# Construct discrete Haar wavelet basis
wave_basis = const_wave_basis(T,0,T)
W = np.array(wave_basis).T
W.shape
Out[165]:
In [151]:
plt.plot(W[:,1])
Out[151]:
In [152]:
plt.plot(W[:,2])
Out[152]:
In [153]:
plt.plot(W[:,3])
Out[153]:
In [158]:
plt.plot(W[:,T//2+1])
Out[158]:
In [160]:
## Verify that it is orthonormal
np.abs(W.T @ W - np.eye(W.shape[1])).sum(), np.abs(W @ W.T - np.eye(W.shape[1])).sum()
Out[160]:
In [144]:
def soft(y,lamb):
    ## Soft thresholding: shrink y toward 0 by lamb, zeroing entries with |y| <= lamb
    pos_part = (y - lamb) * (y > lamb)
    neg_part = (y + lamb) * (y < -lamb)
    return pos_part + neg_part
In [114]:
## Make wavelet transformation and soft threshold
tse_wave = W.T @ tse_vol.values
lamb = .001
tse_soft = soft(tse_wave,lamb)
tse_rec = W @ tse_soft
tse_den = tse_vol.copy()
tse_den.iloc[:,:] = tse_rec
In [125]:
_ = tse_vol.plot(subplots=True,figsize=(10,10))
In [124]:
_ = tse_den.plot(subplots=True,figsize=(10,10))