# Basic statistical analysis of time series

This example demonstrates basic statistical analysis of a time series.

``````

In :

import numpy as np
from scipy import stats

import matplotlib.pyplot as plt
%matplotlib inline

import evapy

``````

## Generating a Gaussian stochastic signal

Let's start by generating a Gaussian time series signal. The duration is set to 3 hours and the sampling rate to 10 Hz.

``````

In :

t = np.arange(0., 3*3600., 0.1)

``````

The signal will be generated from a rectangular power spectrum spanning periods between 5 and 15 seconds. It is important to use a sufficiently fine frequency discretization so that the approximation does not repeat itself within the signal duration.

``````

In :

start_period = 15.
stop_period = 5.

discrete_size = 1000

``````
``````

In :

amp_norm = stats.norm.rvs(size=discrete_size)
phase = stats.uniform.rvs(size=discrete_size)*2.*np.pi

signal_mean = 0.

``````
``````

In :

freq = np.linspace(2.*np.pi*1./start_period, 2.*np.pi*1./stop_period, num=discrete_size)

``````

The signal is generated by superposing a large number of cosine functions with stochastic amplitudes and phases.

``````

In :

signal = np.zeros(len(t)) + signal_mean

for i, freq_i in enumerate(freq):
    signal = signal + amp_norm[i]*np.cos(freq_i*t + phase[i])

``````
``````

In :

plt.figure('Snapshot of signal', figsize=(16,9))
plt.title('Snapshot of the signal')
plt.plot(t[1500:2000], signal[1500:2000])
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.grid()
plt.show()

``````
``````

[figure: snapshot of the signal]

``````

### Statistical analysis of the signal

Let's see if the signal resembles a Gaussian signal with zero mean.

``````

In :

descr_stats = stats.describe(signal)

print('mean: {}'.format(descr_stats.mean))
print('std: {}'.format(np.sqrt(descr_stats.variance)))
print('skewness: {}'.format(descr_stats.skewness))

``````
``````

mean: -0.0003558229724597345
std: 22.36264071482929
skewness: 0.001286657211675139

``````

Estimate the normal distribution parameters from the data and plot the fitted normal PDF together with the data histogram.

``````

In :

params = stats.norm.fit(signal)
signal_dist = stats.norm(*params)

``````
``````

In :

x = np.linspace(-100, 100, num=200)

``````
``````

In :

plt.figure('Signal histogram', figsize=(16,9))
plt.title('Signal histogram and Normal PDF')
plt.plot(x, signal_dist.pdf(x), color='r')
plt.hist(signal, bins=50, density=True)
plt.show()

``````
``````

[figure: signal histogram with fitted normal PDF]

``````

It is clear that the signal is indeed Gaussian!

### Extreme value analysis

#### Finding the peaks

The `evapy` package provides robust and fast methods to identify upcrossings and peaks. Declustered peaks refer to the largest peak per upcrossing. Thus, for a narrow-banded signal, all peaks are declustered. In this example, declustering is done per mean-upcrossing.

``````

In :

peak_index = evapy.evstats.argrelmax(signal)
peak = signal[peak_index]

peak_dc_index = evapy.evstats.argrelmax_decluster(signal, x_up=np.mean(signal))
peak_dc = signal[peak_dc_index]

``````
``````

In :

plt.figure('Finding the peaks', figsize=(16,9))
plt.title('Finding the peaks')
plt.plot(t, signal, 'k-', label='signal')
plt.plot(t[peak_dc_index], peak_dc, 'r^', markersize=12, label='peak declustered')
plt.plot(t[peak_index], peak, 'bo', label='peak')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.xlim(100,200)
plt.grid()
plt.legend()
plt.show()

``````
``````

[figure: signal with peaks and declustered peaks marked]

``````

#### Counting upcrossings

Let's find the number of upcrossings of different levels using the `evapy` package.

``````

In :

uc_0 = evapy.evstats.argupcross(signal, x_up=np.mean(signal))

print('number of upcrossings above the mean: {}'.format(len(uc_0)))

``````
``````

number of upcrossings above the mean: 1514

``````

Since the signal is 10800 seconds long, this means that the mean zero-crossing period Tz of the signal is

``````

In :

t[-1]/len(uc_0)

``````
``````

Out:

7.133355350066052

``````

Now, let's find the upcrossing rate at the mean + std level.

``````

In :

uc_1 = evapy.evstats.argupcross(signal, x_up=np.mean(signal)+np.std(signal))

``````
``````

In :

print('number of upcrossings above the mean+std: {}'.format(len(uc_1)))

``````
``````

number of upcrossings above the mean+std: 933

``````

The upcrossing rate is then:

``````

In :

len(uc_1)/t[-1]

``````
``````

Out:

0.08638968879341474

``````

#### Statistical analysis of peaks

Now that we have established the peaks, let's investigate their distribution using the different distribution functions the `evapy` package offers.

``````

In :

peak_index = evapy.evstats.argrelmax(signal)
peak = signal[peak_index]

peak_dc_index = evapy.evstats.argrelmax_decluster(signal, x_up=np.mean(signal))
peak_dc = signal[peak_dc_index]

r = np.linspace(-10, 100, num=100)

``````

Using Rayleigh distribution:

``````

In :

params_ray = evapy.distributions.rayleigh.fit(peak_dc)
peak_dc_ray = evapy.distributions.rayleigh(*params_ray)

``````
``````

In :

print('location: {}'.format(params_ray[0]))
print('scale: {}'.format(params_ray[1]))

``````
``````

location: -1.6435433313514798
scale: 23.33482891334915

``````
``````

In :

plt.figure('Peaks histogram and Rayleigh distribution', figsize=(16,9))
plt.plot(r, peak_dc_ray.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()

``````
``````

[figure: declustered-peak histogram with fitted Rayleigh PDF]

``````

Using the Weibull distribution:

Note: sometimes the estimation algorithm fails if the starting value for the scale parameter is too far off. This is a known issue and we are working on it. In the meantime, you may provide a reasonable starting value for the scale parameter.

``````

In :

params_wb = evapy.distributions.weibull.fit(peak_dc, scale=20.)
peak_dc_wb = evapy.distributions.weibull(*params_wb)

``````
``````

In :

print('shape: {}'.format(params_wb[0]))
print('location: {}'.format(params_wb[1]))
print('scale: {}'.format(params_wb[2]))

``````
``````

shape: 2.064018486408185
location: -2.102492679918142
scale: 33.617562038180125

``````

The shape parameter should be close to 2. Sometimes the estimation algorithm fails and diverges for the three-parameter Weibull when using MLE.

Let's see how the histogram looks.

``````

In :

plt.figure('Peaks histogram and Weibull distribution', figsize=(16,9))
plt.plot(r, peak_dc_wb.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()

``````
``````

[figure: declustered-peak histogram with fitted Weibull PDF]

``````

Often, it helps to provide a good starting value for the scale parameter, or to lock the location parameter based on prior knowledge. In our case, the signal mean is a reasonable choice for the location.

``````

In :

params_wb = evapy.distributions.weibull.fit(peak_dc, floc=np.mean(signal))
peak_dc_wb = evapy.distributions.weibull(*params_wb)

``````
``````

In :

print('shape: {}'.format(params_wb[0]))
print('location: {}'.format(params_wb[1]))
print('scale: {}'.format(params_wb[2]))

``````
``````

shape: 1.8455102728096229
location: -0.0003558229724597345
scale: 31.01194237617016

``````
``````

In :

plt.figure('Peaks histogram and Weibull distribution', figsize=(16,9))
plt.plot(r, peak_dc_wb.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()

``````
``````

[figure: declustered-peak histogram with fitted Weibull PDF, location fixed at the signal mean]

``````

Using the generalized exponential tail distribution:

Note that the generalized exponential tail distribution is particularly suitable for truncated data. Let's truncate our peak data at 35.

``````

In :

peak_dc_trunc = peak_dc[peak_dc>=35.]

``````

We have now discarded most of our data:

``````

In :

print('original data size: {}'.format(len(peak_dc)))
print('truncated data size: {}'.format(len(peak_dc_trunc)))

``````
``````

original data size: 1514
truncated data size: 468

``````

WARNING: at this point, the estimation algorithm for the generalized exponential tail distribution is very unstable. It is recommended to lock the location parameter and provide a good starting value for the scale parameter. We are working on it!

``````

In :

params_get = evapy.distributions.genexptail.fit(peak_dc_trunc, scale=20., floc=np.mean(signal))
peak_dc_get = evapy.distributions.genexptail(*params_get)

``````
``````

In :

print('shape: {}'.format(params_get[0]))
print('q: {}'.format(params_get[1]))
print('location: {}'.format(params_get[2]))
print('scale: {}'.format(params_get[3]))

``````
``````

shape: 1.672369041261915
q: 6.055141376708667
location: -0.0003558229724597345
scale: 25.014685329726746

``````
``````

In :

plt.figure('Peaks histogram and generalized exponential tail distribution', figsize=(16,9))
plt.plot(r, peak_dc_get.pdf(r), color='r')
plt.hist(peak_dc_trunc, bins=15, density=True)
plt.show()

``````
``````

[figure: truncated-peak histogram with fitted generalized exponential tail PDF]

``````

#### Estimating the largest maxima

We can use the relationship between the peak distribution and the distribution of the maxima to estimate the largest maximum. First, let's see what the largest observed value in our signal is.

``````

In :

max_obs = np.max(signal)
print('largest maxima: {}'.format(max_obs))

``````
``````

largest maxima: 91.11669792144568

``````

Using Rayleigh:

``````

In :

rloc, rscale = evapy.distributions.rayleigh.fit(peak_dc, floc=0.)
rN = len(peak_dc)

``````
``````

In :

maxima_ray_dist = evapy.distributions.acer_o1(2., rN, loc=rloc, scale=np.sqrt(2.)*rscale)

``````
``````

In :

median_maxima = maxima_ray_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))

``````
``````

median maxima: 87.47910129889922

``````

Using Weibull:

``````

In :

wshape, wloc, wscale = evapy.distributions.weibull.fit(peak_dc, floc=0.)
wN = len(peak_dc)

``````
``````

In :

maxima_wb_dist = evapy.distributions.acer_o1(wshape, wN, loc=wloc, scale=wscale)

``````
``````

In :

median_maxima = maxima_wb_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))

``````
``````

median maxima: 93.65923694847491

``````

Using generalized exponential tail:

``````

In :

gshape, qn, gloc, gscale = evapy.distributions.genexptail.fit(peak_dc_trunc, floc=0., scale=20.)
gN = len(peak_dc_trunc)

``````
``````

In :

maxima_get_dist = evapy.distributions.acer_o1(gshape, gN*qn, loc=gloc, scale=gscale)

``````
``````

In :

median_maxima = maxima_get_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))

``````
``````

median maxima: 88.76801208798118

``````