A notebook that explores time-series data and techniques to analyze it.

Resources:

"A time-series is a sequence of measurements from a system that varies in time"

A time-series is generally decomposed into three major components:

- **Trend**: persistent change over time.
- **Seasonality**: regular periodic variation. There can be multiple seasonalities, each spanning a different time-frame (day, week, month, year, etc.).
- **Noise**: random variation.
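As a quick illustration (a synthetic example, not data from this notebook), such a series can be built by summing the three components:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)

trend = 0.5 * t                                 # persistent change over time
seasonality = 10 * np.sin(2 * np.pi * t / 12)   # regular variation, period 12
noise = rng.normal(0, 1, size=t.size)           # random variation

series = trend + seasonality + noise
```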

In [ ]:
```python
%matplotlib notebook
import numpy as np
import seaborn as sns
import pandas as pd
sns.set_context("paper")
```

The **moving average** (also rolling/running average or moving/running mean) is a technique that helps extract the trend from a series. It reduces noise and decreases the impact of outliers. It consists of dividing the series into overlapping windows of fixed size $N$ and taking the average value of each window. It follows that the first $N-1$ values will be undefined, since they don't have enough predecessors to compute the average.

**Exponentially-Weighted Moving Average (EWMA)** is an alternative that gives more importance to recent values.

In [ ]:
```python
# rolling mean basic example
series = np.arange(10)
pd.Series(series).rolling(3).mean()
```

In [ ]:
```python
# ewma basic example
series = np.arange(10)
pd.Series(series).ewm(3).mean()
```
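To make the weighting explicit, here is a minimal sketch of the recursive form pandas uses with `adjust=False` (with the default `adjust=True`, pandas instead normalizes by the sum of the weights):

```python
import numpy as np
import pandas as pd

def ewma(values, alpha):
    # y[0] = x[0]; y[t] = alpha * x[t] + (1 - alpha) * y[t-1]
    # recent values get weight alpha, older ones decay geometrically
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out

x = np.arange(10, dtype=float)
manual = ewma(x, alpha=0.5)
reference = pd.Series(x).ewm(alpha=0.5, adjust=False).mean().values
```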

In [ ]:
```python
# ewm on partial long series of 0s
series = [1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0]
pd.Series(series).ewm(span=2).mean()
```

In [ ]:
```python
pd.Series(series).ewm(span=2).mean().plot()
```

Basic Pandas methods:

- pad/ffill: fill values forward
- bfill/backfill: fill values backward
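A minimal example of the two directions (values chosen for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
forward = s.ffill()   # propagate the last valid value forward: 1, 1, 1, 4
backward = s.bfill()  # propagate the next valid value backward: 1, 4, 4, 4
```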

In [ ]:
```python
# Random arrays to play with
a = np.arange(20)
b = a*a
b_empty = np.array(a*a).astype('float')
# Add missing values and get a Pandas Series
b_empty[[0, 5, 6, 15]] = np.nan
c = pd.Series(b_empty)
```

In [ ]:
```python
import matplotlib.pyplot as plt

# Visualize how the filling method works
fig, axes = plt.subplots(2)
sns.pointplot(x=np.arange(20), y=c, ax=axes[0])
sns.pointplot(x=np.arange(20), y=c.bfill(), ax=axes[1])
plt.show()
```

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html

From the documentation: 'linear' ignores the index and treats the values as equally spaced, while 'time' uses the actual time interval between index values (it works on daily and higher-resolution data).
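A small sketch of the difference, using a deliberately irregular datetime index:

```python
import numpy as np
import pandas as pd

# three observations with an irregular gap: Jan 1, Jan 2, Jan 5
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05"])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

linear = s.interpolate(method="linear")  # treats points as equally spaced -> 2.0
timed = s.interpolate(method="time")     # Jan 2 is 1/4 of the way to Jan 5 -> 1.0
```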

In [ ]:
```python
df = pd.read_csv("time_series.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index("datetime", inplace=True)
df.ffill(inplace=True)
df.head()
```

In [ ]:
```python
import matplotlib.pyplot as plt

# Determine rolling statistics
rolmean = new_df.rolling(window=12).mean()
rolstd = new_df.rolling(window=12).std()
# Plot rolling statistics
plt.plot(new_df, color='blue', label='Original')
plt.plot(rolmean, color='red', label='Rolling Mean')
plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend()
plt.show()
```

In [ ]:
```python
# Remove the trend by subtracting the exponentially-weighted moving average
new_df = df.copy()
new_df['val'] = new_df['val'] - new_df.ewm(halflife=12).mean()['val']
new_df.plot()
```

In [ ]:
```python
df = pd.read_csv("time_series.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index("datetime", inplace=True)
df.head()
```

In [ ]:
```python
df.plot()
```

In [ ]:
```python
df.ffill().plot()
```

In [ ]:
```python
# positions of the missing values
null_indexes = [i for i, isnull in enumerate(pd.isnull(df['val'].values)) if isnull]
```

In [ ]:
```python
# missing_values_correct: the true values at the missing positions
y = [32.69,32.15,32.61,29.3,28.96,28.78,31.05,29.58,29.5,30.9,31.26,31.48,29.74,29.31,29.72,28.88,30.2,27.3,26.7,27.52]
```

In [ ]:
```python
filled = df['val'].interpolate(method='time').values
predict = filled[null_indexes]
len(predict)==len(y)
```

In [ ]:
```python
# total absolute relative error between true and interpolated values
d = sum([abs((y[i]-predict[i])/y[i]) for i in range(len(y))])
```

In [ ]:
```python
d
```

There exist different methods to analyze correlation for time-series. If we compare two different time-series we are talking about **cross-correlation**, while in **auto-correlation** a time-series is compared with itself (which can detect seasonality). Both categories can be normalized, which is useful for example when the series have different scales or contain zero values.

Correlation between two time-series $x$ and $y$ of length $N$ is defined as

$$corr(x, y) = \sum_{n=0}^{N-1} x[n] \cdot y[n]$$

while normalized correlation is defined as

$$norm\_corr(x,y)=\dfrac{\sum_{n=0}^{N-1} x[n] \cdot y[n]}{\sqrt{\sum_{n=0}^{N-1} x[n]^2 \cdot \sum_{n=0}^{N-1} y[n]^2}}$$

For auto-correlation we shift the time-series by an interval called the **lag**, and then compare the shifted version with the original one to measure the strength of the correlation (a process sometimes called serial correlation, especially when lag=1). The idea is that a series' values are not random independent events, but have some level of dependency on preceding values. This dependency is the pattern we are trying to discover.
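A minimal sketch of lagged auto-correlation using the normalized formula above (the helper name `autocorrelation` is ours): a series with an exact period correlates strongly at that lag, and anti-correlates at half the period.

```python
import numpy as np

def autocorrelation(x, lag):
    # normalized correlation between the series and a copy shifted by `lag`
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    return np.dot(a, b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

t = np.arange(48)
season = np.sin(2 * np.pi * t / 12)  # exact period of 12 steps

r12 = autocorrelation(season, 12)  # in phase: close to 1
r6 = autocorrelation(season, 6)    # half a period out of phase: close to -1
```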

Suggestions: check correlation after removing the trend, and understand which seasonality is most appropriate for your case.
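To see why detrending matters, here is a synthetic sketch (values and seed are ours): on the raw series the trend inflates correlation at every lag, while after differencing the half-period lag reveals the negative correlation the seasonality implies.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(120)
# trend + seasonality (period 12) + noise
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

def norm_corr(a, b):
    return np.dot(a, b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

# raw series: the trend dominates, so even the half-period lag looks positive
raw_lag6 = norm_corr(series[:-6], series[6:])

# after removing the trend by differencing, the half-period lag is negative
detrended = np.diff(series)
detrended_lag6 = norm_corr(detrended[:-6], detrended[6:])
```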

In [ ]:
```python
a = np.array([1,2,-2,4,2,3,1,0])
b = np.array([2,3,-2,3,2,4,1,-1])
c = np.array([-2,0,4,0,1,1,0,-2])
```

In [ ]:
```python
print("a and b correlate value = {}".format(np.correlate(a, b)[0]))
print("a and c correlate value = {}".format(np.correlate(a, c)[0]))
```

In [ ]:
```python
def normalized_cross_correlation(a, v):
    # cross-correlation is simply the dot product of our arrays
    cross_cor = np.dot(a, v)
    norm_term = np.sqrt(np.sum(a**2) * np.sum(v**2))
    return cross_cor/norm_term
```

In [ ]:
```python
normalized_cross_correlation(a, c)
```

In [ ]:
```python
print("a and a/2 correlate value = {}".format(np.correlate(a, a/2)[0]))
print("a and a/2 normalized correlate value = {}".format(normalized_cross_correlation(a, a/2)))
```
