Mean squared error of KRR at the training points:
$$
\require{cancel}
\begin{aligned}
\frac{1}{N} \|\hat{y} - y \|^2 &= \frac{1}{N}\| K\alpha - y \|^2 \\
&= \frac{1}{N} \| K(K + \sigma^2 I)^{-1}y - y \|^2 \\
&= \frac{1}{N} \| \cancel{(K + \sigma^2 I)(K + \sigma^2 I)^{-1}y} - \sigma^2 (K + \sigma^2 I)^{-1}y - \cancel{y} \|^2 \\
&= \frac{\sigma^4}{N} \| (K + \sigma^2 I)^{-1}y \|^2 \\
&= \frac{\sigma^4}{N} y^\top(K + \sigma^2 I)^{-2}y \\
&= \frac{\sigma^4}{N} \| \alpha \|^2
\end{aligned}
$$

The predictive covariance of a $\mathcal{GP}$ at the training points $\mathcal{D}=(X,y)$ is:

$$
\begin{aligned}
\operatorname{Cov}(\hat{y}) &= K+\sigma^2 I - K(K+\sigma^2 I)^{-1} K \\
&= K+\sigma^2 I - (K+\sigma^2 I - \sigma^2 I )(K+\sigma^2 I)^{-1} K \\
&= K+\sigma^2 I - \left( \cancel{(K+\sigma^2 I)(K+\sigma^2 I)^{-1}} K - \sigma^2 (K+\sigma^2 I)^{-1} K \right) \\
&= \cancel{K}+\sigma^2 I \cancel{- K} + \sigma^2 (K+\sigma^2 I)^{-1} (K+\sigma^2 I - \sigma^2 I) \\
&= \sigma^2 I + \sigma^2 \cancel{(K+\sigma^2 I)^{-1} (K+\sigma^2 I)} - \sigma^4 (K+\sigma^2 I)^{-1} \\
&= 2\sigma^2 I - \sigma^4 (K+\sigma^2 I)^{-1} \\
&= 2\sigma^2 \left( I - \frac{\sigma^2}{2} (K+\sigma^2 I)^{-1}\right)
\end{aligned}
$$
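Both closed forms are easy to sanity-check numerically. Below is a small NumPy sketch (random illustrative data; `sigma2` stands for $\sigma^2$) that verifies the training MSE equals $\frac{\sigma^4}{N}\|\alpha\|^2$ and the covariance equals $2\sigma^2 I - \sigma^4 (K+\sigma^2 I)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 8, 0.3  # sigma2 plays the role of sigma^2

# Random symmetric PSD kernel matrix K and targets y (illustrative data)
A = rng.standard_normal((N, N))
K = A @ A.T
y = rng.standard_normal(N)

# KRR dual coefficients: alpha = (K + sigma^2 I)^{-1} y
alpha = np.linalg.solve(K + sigma2 * np.eye(N), y)

# Training MSE two ways: directly, and via sigma^4/N * ||alpha||^2
mse_direct = np.mean((K @ alpha - y) ** 2)
mse_closed = sigma2**2 / N * np.sum(alpha**2)
assert np.isclose(mse_direct, mse_closed)

# Predictive covariance two ways: definition vs. the derived closed form
M = np.linalg.inv(K + sigma2 * np.eye(N))
cov_direct = K + sigma2 * np.eye(N) - K @ M @ K
cov_closed = 2 * sigma2 * np.eye(N) - sigma2**2 * M
assert np.allclose(cov_direct, cov_closed)
```

The assertions pass for any PSD $K$, $\sigma^2 > 0$, and $y$, since the derivations above are purely algebraic.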
Let's try to reduce the first matrix term (writing $\lambda = \sigma^2$, and assuming $K$ is invertible): $K(K+\lambda I)^{-1} = \left(K^{-1}\right)^{-1}(K+\lambda I)^{-1} =\left((K+\lambda I) K^{-1} \right)^{-1} =\left(I + \lambda K^{-1}\right)^{-1}$
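This reduction can also be checked numerically; a minimal sketch, using a random positive-definite $K$ (so that $K^{-1}$ exists):

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 6, 0.5

# Random symmetric positive definite K (diagonal shift makes it invertible)
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)

# K (K + lam I)^{-1}  vs.  (I + lam K^{-1})^{-1}
lhs = K @ np.linalg.inv(K + lam * np.eye(n))
rhs = np.linalg.inv(np.eye(n) + lam * np.linalg.inv(K))
assert np.allclose(lhs, rhs)
```

Note the identity requires invertibility of $K$ only for the intermediate steps; the left-hand side $K(K+\lambda I)^{-1}$ is well defined for any PSD $K$ with $\lambda > 0$.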