In this example, basic time series statistical analysis is demonstrated.
In [1]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
import evapy
Lets start by generating a Gaussian time series signal. The duration is set to 3 hours and sampling rate is 10Hz.
In [2]:
t = np.arange(0., 3*3600., 0.1)
The signal will be generated using a rectangular power spectrum between 5 and 15 seconds of period. It is important to use sufficiently many discretizations to obtain a non-repeating approximation.
In [3]:
start_period = 15.
stop_period = 5.
discrete_size = 1000
In [4]:
amp_norm = stats.norm.rvs(size=discrete_size)
phase = stats.uniform.rvs(size=discrete_size)*2.*np.pi
signal_mean = 0.
In [5]:
freq = np.linspace(2.*np.pi*1./start_period, 2.*np.pi*1./stop_period, num=discrete_size)
Generating the signal by super-positioning a large number of cosine functions with stochastic amplitudes and phases.
In [6]:
signal = np.zeros(len(t)) + signal_mean
for i, freq_i in enumerate(freq):
signal = signal + amp_norm[i]*np.cos(freq_i*t + phase[i])
In [7]:
plt.figure('Snapshot of signal', figsize=(16,9))
plt.title('Snapshot of the signal')
plt.plot(t[1500:2000], signal[1500:2000])
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.grid()
plt.show()
Lets see if the signal resambles a Guassian signal with zero mean.
In [10]:
descr_stats = stats.describe(signal)
print('mean: {}'.format(descr_stats.mean))
print('std: {}'.format(np.sqrt(descr_stats.variance)))
print('skewness: {}'.format(descr_stats.skewness))
Estimate the normal distribution paramters from data and plot the normal pdf with the histogram for data.
In [11]:
params = stats.norm.fit(signal)
signal_dist = stats.norm(*params)
In [12]:
x = np.linspace(-100, 100, num=200)
In [14]:
plt.figure('Signal histogram', figsize=(16,9))
plt.title('Signal histogram and Normal PDF')
plt.plot(x, signal_dist.pdf(x), color='r')
plt.hist(signal, bins=50, density=True)
plt.show()
It is clear that the signal is indeed Gaussian!
evapy
package provides robust and fast methods to identify upcrossings and peaks. The declustered peaks refer to the largest peak per upcrossing. hus, for a narrow-banded signal, all peaks are declustered. In this example, declustering will occur for mean-upcrossing.
In [15]:
peak_index = evapy.evstats.argrelmax(signal)
peak = signal[peak_index]
peak_dc_index = evapy.evstats.argrelmax_decluster(signal, x_up=np.mean(signal))
peak_dc = signal[peak_dc_index]
In [16]:
plt.figure('Finding the peaks', figsize=(16,9))
plt.title('Finding the peaks')
plt.plot(t, signal, 'k-', label='signal')
plt.plot(t[peak_dc_index], peak_dc, 'r^', markersize=12, label='peak declustered')
plt.plot(t[peak_index], peak, 'bo', label='peak')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.xlim(100,200)
plt.grid()
plt.legend()
plt.show()
Lets find the number of upcrossings for different levels using the evapy
package.
In [17]:
uc_0 = evapy.evstats.argupcross(signal, x_up=np.mean(signal))
print('number of upcrossings above the mean: {}'.format(len(uc_0)))
Since the signal is 10800 seconds long, this mean that the Tz ((mean or) zero crossing period) of the signal is
In [18]:
t[-1]/len(uc_0)
Out[18]:
Now, lets find the upcrossing rate at mean+std level.
In [19]:
uc_1 = evapy.evstats.argupcross(signal, x_up=np.mean(signal)+np.std(signal))
In [20]:
print('number of upcrossings above the mean+std: {}'.format(len(uc_1)))
The upcrossing rate is then:
In [21]:
len(uc_1)/t[-1]
Out[21]:
Now that we have established the peaks, lets investigate their distribution. Using different distribution functions the evapy
package offers.
In [22]:
eak_index = evapy.evstats.argrelmax(signal)
peak = signal[peak_index]
peak_dc_index = evapy.evstats.argrelmax_decluster(signal, x_up=np.mean(signal))
peak_dc = signal[peak_dc_index]
r = np.linspace(-10, 100, num=100)
Using Rayleigh distribution:
In [23]:
params_ray = evapy.distributions.rayleigh.fit(peak_dc)
peak_dc_ray = evapy.distributions.rayleigh(*params_ray)
In [24]:
print('location: {}'.format(params_ray[0]))
print('scale: {}'.format(params_ray[1]))
In [26]:
plt.figure('Peaks histogram and Rayleigh distribution', figsize=(16,9))
plt.plot(r, peak_dc_ray.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()
Using the Weibull distribution:
Note: Sometimes, the estimation alogorithm fails if the starting values for scale parameter is too far off. This is a known issue and we are working on it. In the mean time, you may provice a reasonable starting value for the scale parameter.
In [27]:
params_wb = evapy.distributions.weibull.fit(peak_dc, scale=20.)
peak_dc_wb = evapy.distributions.weibull(*params_wb)
In [28]:
print('shape: {}'.format(params_wb[0]))
print('location: {}'.format(params_wb[1]))
print('scale: {}'.format(params_wb[2]))
The shape parameter should be close to 2. Sometimes the estimation alogorithm fails and divereges for 3 parameter Weibull using the MLE.
Lets see how the histogram looks.
In [30]:
plt.figure('Peaks histogram and Weibull distribution', figsize=(16,9))
plt.plot(r, peak_dc_wb.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()
Often, the to provide a good starting value for the scale parameter. Or to lock the location parameter based on prior knowledge. In our case, the signal mean can be a reasonable choice.
In [31]:
params_wb = evapy.distributions.weibull.fit(peak_dc, floc=np.mean(signal))
peak_dc_wb = evapy.distributions.weibull(*params_wb)
In [32]:
print('shape: {}'.format(params_wb[0]))
print('location: {}'.format(params_wb[1]))
print('scale: {}'.format(params_wb[2]))
In [34]:
plt.figure('Peaks histogram and Weibull distribution', figsize=(16,9))
plt.plot(r, peak_dc_wb.pdf(r), color='r')
plt.hist(peak_dc, bins=25, density=True)
plt.show()
Using the generalized exponential tail distribution:
Note that the generalized exponential tail distribution is particularly suitable for truncated data. Lets truncate our peak data at 35.
In [35]:
peak_dc_trunc = peak_dc[peak_dc>=35.]
We have now discareded most of our data:
In [36]:
print('original data size: {}'.format(len(peak_dc)))
print('truncated data size: {}'.format(len(peak_dc_trunc)))
WARNING: At this point, the estimation alogrithm for the generalized exponential tail distribution is very unstable. It is recommended to lock the location parameter and provide a good starting value for the scale parameter. We are working on it!
In [37]:
params_get = evapy.distributions.genexptail.fit(peak_dc_trunc, scale=20., floc=np.mean(signal))
peak_dc_get = evapy.distributions.genexptail(*params_get)
In [38]:
print('shape: {}'.format(params_get[0]))
print('q: {}'.format(params_get[1]))
print('location: {}'.format(params_get[2]))
print('scale: {}'.format(params_get[3]))
In [40]:
plt.figure('Peaks histogram and generalized exponential tail distribution', figsize=(16,9))
plt.plot(r, peak_dc_get.pdf(r), color='r')
plt.hist(peak_dc_trunc, bins=15, density=True)
plt.show()
We can use the the relationship between the peak distributions and the maxima distributions to establish an estimate for the largest maxima. First, lets see whats the largest observed value in our signal.
In [41]:
max_obs = np.max(signal)
print('largest maxima: {}'.format(max_obs))
Using Rayleig:
In [42]:
rloc, rscale = evapy.distributions.rayleigh.fit(peak_dc, floc=0.)
rN = len(peak_dc)
In [43]:
maxima_ray_dist = evapy.distributions.acer_o1(2., rN, loc=rloc, scale=np.sqrt(2.)*rscale)
In [44]:
median_maxima = maxima_ray_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))
Using Weibull:
In [45]:
wshape, wloc, wscale = evapy.distributions.weibull.fit(peak_dc, floc=0.)
wN = len(peak_dc)
In [46]:
maxima_wb_dist = evapy.distributions.acer_o1(wshape, wN, loc=wloc, scale=wscale)
In [47]:
median_maxima = maxima_wb_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))
Using generalized exponential tail:
In [48]:
gshape, qn, gloc, gscale = evapy.distributions.genexptail.fit(peak_dc_trunc, floc=0., scale=20.)
gN = len(peak_dc_trunc)
In [49]:
maxima_get_dist = evapy.distributions.acer_o1(gshape, gN*qn, loc=gloc, scale=gscale)
In [50]:
median_maxima = maxima_get_dist.ppf(0.5)
print('median maxima: {}'.format(median_maxima))
In [ ]: