Analisis del viento

Docente: Dr. Daniel Forchetti @danielforchetti
Alumnos: Franco Bellomo @fnbellomo, Lucas Bellomo @LucasEBellomo

Marco teórico

La velocidad de los vientos sigue una distribución de $Weibull(x;\lambda,\kappa)$, y las podemos medir con un anemómetro. Si obtenemos varias velocidades, entoces vamos a poder hacer un ajuste por mínimos cuadrados (OLS) y obtener los valores caracteristicos de la distribución $\lambda$ y $\kappa$.
La Función de Densidad de Probabilidad (PDF en ingles) de una variable aleatoria $x$ que sigue la distribución de Weibull es:

$$f\left(x;\lambda,\kappa\right)=\begin{cases} \dfrac{\kappa}{\lambda}\left(\dfrac{x}{\lambda}\right)^{\kappa-1}e^{-\left(x/\lambda\right)^{\kappa}} & x\geq0\\ \\0 & x<0 \end{cases}$$

En python la PDF que esta implementa es la Weibull exponencial:

$$f\left(x;\lambda,\kappa,\alpha\right)=\begin{cases}\alpha\dfrac{\kappa}{\lambda}\left(\dfrac{x}{\lambda}\right)^{\kappa-1}\left[1-e^{-\left(x/\lambda\right)^{\kappa}}\right]^{\alpha-1}e^{-\left(x/\lambda\right)^{\kappa}} & x\geq0\\ \\0 & x<0 \end{cases}$$

Donde $\kappa > 0$ y $\alpha > 0$ son el primer y segundo parámetro de forma (shape parameter) respectivamente, y $\lambda > 0$ en el parámetro de escala de la distribución (scale paramenter).
En particular si $\alpha = 1$ obtenemos la distribución de Weibull.

Simulando la distribución

Imports necesarios



In [1]:

    
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

#para pasar el datashet de r a pandas y al reves tambien
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import pandas.rpy.common as com

#Para que los plot queden en el notebook
%matplotlib inline

#this needs rpy2 package
%load_ext rmagic

Algunas distribuciones de Weibull típicas



In [2]:

    
#Defino los parámetros de la distribución para hacer el plot
shape_values = [1, 1.6, 3.6, 15]
scale_values = [3, 4, 6, 17]
lineStyles = ['-', '--', ':', '-.']
colors = ['black', 'red', 'blue', 'orange']
x = np.linspace(-1, 20, 2000)
 
#Ploteo la distribución
fig, ax = plt.subplots(figsize=(10, 8))
 
for (shape, scale, lineStyle, color) in zip(shape_values, scale_values, lineStyles, colors):
    distribution = stats.exponweib(1, shape, loc = 0, scale = scale)
    plt.plot(x, distribution.pdf(x), ls=lineStyle, c=color, linewidth = 3, 
             label=r'$\kappa=%.1f,\ \lambda=%i$' % (shape, scale))
 
plt.xlim(0 , 20)
plt.ylim(0, 0.4)
plt.xlabel('$x$', fontsize=25)
plt.ylabel(r'$f(x| \kappa,\lambda)$', fontsize=25)
plt.title('PDF two parameter Weibull distribution', fontsize=15)
plt.legend(loc = "best", fontsize=20)
plt.show()

Primer ajuste (scipy.stats)

Generamos n-sample de valores aleatorios que siguen una distribución de Weibull con $\kappa = 1.6$, $\lambda = 4$ y realizamos un primer ajuste con SciPy. Luegos graficamos los datos (el histograma en azul) y la curva que obtenemos del ajuste (rojo).



In [3]:

    
#Genero datos con la distribución de Weibull

#Parámetros y datos generados
kappa, lam = 1.6, 4
nsample = 10000
data = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = nsample)

#Ajusto los datos
shape_a, shape_b, loc, scale = stats.exponweib.fit(data, loc=0)
x = np.linspace(0, 20, 1000)
distribution = stats.exponweib(shape_a, shape_b, loc = loc, scale = scale)

#Ploteo
plt.figure(figsize=(10, 8))
plt.hist(data, normed=True, histtype='bar', alpha=0.2, 
         label=r'$\kappa=%.2f,\ \lambda=%.2f$' % (kappa, lam))
plt.plot(x, distribution.pdf(x), color = "red", 
         label=r'$\kappa=%.2f,\ \lambda=%.2f$' % (shape_b, scale))
plt.title('Azul: mediciones   Rojo: ajuste', fontsize=15)
plt.xlabel('$vel \ viento$', fontsize=25)
plt.ylabel(r'$f(x| \kappa,\lambda)$', fontsize=25)
plt.legend(loc = "best", fontsize=20)
plt.show()

Ajustando con R (fitdistrplus)

Si bien el ajuste anterior nos da valores adecuados, la idea ahora es poder hacer otro ajuste más completo donde obtengamos mayor información (por ej, intervalos de confianza, error estandar, ...)

Magic de R (EDITAR)

IPython has an rmagic extension that contains a some magic functions for working with R via rpy2.
The line magic %Rpush copies its arguments to variables of the same name in rpy2. The %R line magic evaluates the string in rpy2 and returns the results.
There are two more line magics, %Rpull and %Rget. Both are useful after some R code has been executed and there are variables in the rpy2 namespace that one would like to retrieve. The main difference is that one returns the value (%Rget), while the other pulls it to self.shell.user_ns (%Rpull). Imagine we've stored the results of some calculation in the variable "a" in rpy2's namespace. By using the %R magic, we can obtain these results and store them in b. We can also pull them directly to user_ns with %Rpull. They are both views on the same data.



In [16]:

    
#Genero algunos datos para ajsutar.
kappa, lam, nsample = 1.6, 4, 1000
data = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = nsample)

Para pasar un array de numpy a R y que lo entienda este, lo tengo que copiar con rep()



In [19]:

    
%%R -i data -o fit2

library(fitdistrplus)

#Generate fake data
shape <- 1.9
x <- rweibull(n=10, shape=shape, scale=1)

data2 <- rep(data)

#Fit x data with fitdist
fit2 <- fitdist(data2, "weibull")
fit.w <- fitdist(x, "weibull")
plot(fit2)
summary(fit2)









    





Fitting of the distribution ' weibull ' by maximum likelihood 
Parameters : 
      estimate Std. Error
shape 1.634158 0.04040220
scale 4.151027 0.08455037
Loglikelihood:  -2157.334   AIC:  4318.667   BIC:  4328.483 
Correlation matrix:
          shape     scale
shape 1.0000000 0.3120903
scale 0.3120903 1.0000000



In [5]:

    
print fit2[0][0], fit2[2][0]









    



1.64195848513 0.0401311311781

Generando datos

Para simplificar, supongo que todos los meses tienen 26 días. Ademas, los datos estar guardados de la siguiente forma:

              dia-mes-año hora:minuto, velcodad viento



In [3]:

    
#constantes de la distribución
kappa, lam = 1.6, 4
hora = []

for i in range(0, 24):
    for j in np.arange(0, 60, 10):
        hora.append(i)

minuto = 12*30*24*[0, 10, 20, 30, 40, 50]
dias = []
meses = []
anio = [2014]
medicion = []
        
for mes in range(1, 13):
    for dia in range(1, 25):
        for i in range(144):
            val = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = 1)
            medicion.append(val[0])
            dias.append(dia)
            meses.append(mes)

hora = 30*12*hora
anio = 41472*anio

fecha = []
for i in range(len(anio)):
    fec = '%d-%d-%d %d:%d' %(dias[i], meses[i], anio[i], hora[i], minuto[i])
    fecha.append(fec)

#datosDataSheet = zip(dias, meses, anio, hora, minuto, medicion)
datosDataSheet = zip(fecha, medicion)
df = DataFrame(data = datosDataSheet, columns = ['Fecha', 'VelViento'])
df.to_csv('datos3.csv', index = False)

Ahora, escribo una función en R con la cual hago el analisis de los datos, y lo devueldo a un dataframe de python



In [2]:

    
base = importr('base')
fitdistrplus = importr('fitdistrplus')

r = robjects.r

r('''
    path <- "/home/franco/Desktop/Proyectos/GitHub/Molinos/datos3.csv"
    myData <- read.csv(path, header = TRUE)
    names(myData)[1] <- "Fecha"
    names(myData)[2] <- "VelViento"
    myData$time <-strptime(myData$Fecha, "%d-%m-%Y %H:%M")

    fit_year <- fitdist(myData$VelViento, "weibull")

    shape <- c()
    scale <- c()
    error_shape <- c()
    error_scale <- c()
    vel_min <- c()
    vel_max <- c()
    vel_prom <- c()

    for(i in 0:11){

      fit <- fitdist(myData$VelViento[myData$time$mon == i], "weibull") 
      shape <- c(shape, fit$estimate[1])
      scale <- c(scale, fit$estimate[2])
      error_shape <- c(error_shape, fit$sd[1])
      error_scale <- c(error_scale, fit$sd[2])
      vel_min <- c(vel_min, min(myData$VelViento[myData$time$mon == i]))
      vel_max <- c(vel_max, max(myData$VelViento[myData$time$mon == i]))
      vel_prom <- c(vel_prom, mean(myData$VelViento[myData$time$mon == i]))
    }

    mes <- c(1:12)
    
    datos_meses <- data.frame(mes, shape, scale, error_shape, 
                          error_scale, vel_max, vel_min, vel_prom)
                          
    shape_year <- c(fit_year$estimate[1], fit_year$sd[1])
    scale_year <- c(fit_year$estimate[2],fit_year$sd[2])

    dt_anio <- data.frame(shape_year, scale_year)

  ''')

df_meses = com.load_data('datos_meses')
df_anio = com.load_data('dt_anio')

Entonces, los datos analizados nos quedan:



In [3]:

    
df_anio



In [4]:

    
df_meses









    Out[4]:






  
    
      
      mes
      shape
      scale
      error_shape
      error_scale
      vel_max
      vel_min
      vel_prom
    
  
  
    
      1 
        1
       1.624777
       4.011322
       0.021667
       0.044197
       13.308705
       0.008624
       3.593553
    
    
      2 
        2
       1.597732
       3.982284
       0.021192
       0.044624
       17.214456
       0.005929
       3.572638
    
    
      3 
        3
       1.642403
       4.023454
       0.021768
       0.043890
       14.102925
       0.033276
       3.597913
    
    
      4 
        4
       1.584287
       3.973859
       0.020828
       0.044951
       16.752868
       0.002588
       3.564112
    
    
      5 
        5
       1.576450
       3.977964
       0.020913
       0.045193
       14.555497
       0.024818
       3.571988
    
    
      6 
        6
       1.589433
       4.015068
       0.021070
       0.045232
       15.362640
       0.016314
       3.603369
    
    
      7 
        7
       1.630204
       4.058354
       0.021361
       0.044635
       18.458527
       0.069668
       3.628294
    
    
      8 
        8
       1.613670
       4.039624
       0.021341
       0.044854
       15.086976
       0.025377
       3.617638
    
    
      9 
        9
       1.577981
       3.947073
       0.020774
       0.044838
       13.934380
       0.013052
       3.540237
    
    
      10
       10
       1.619034
       3.978038
       0.021374
       0.044010
       17.572493
       0.023480
       3.562839
    
    
      11
       11
       1.575198
       3.966355
       0.020781
       0.045128
       14.776303
       0.043549
       3.558880
    
    
      12
       12
       1.606712
       3.995986
       0.021363
       0.044533
       13.397509
       0.008582
       3.582613



In [6]:

    
fig, axes = plt.subplots(nrows=2, figsize=(10,10))

axes[1].plot(df_meses['mes'], df_meses['shape'], label=r'$\lambda$')
axes[1].plot(df_meses['mes'], df_meses['scale'], label=r'$\kappa$')
axes[0].plot(df_meses['mes'], df_meses['vel_max'], label=r'$Vel.\; Max$')
axes[1].plot(df_meses['mes'], df_meses['vel_min'], label=r'$Vel.\; Min$')
axes[1].plot(df_meses['mes'], df_meses['vel_prom'], label=r'$Vel.\; Prom$')

for ax in axes.flatten():
    ax.grid(True)
    ax.legend(loc = "best", fontsize=20)



In [8]:

    
#Execute this cell to load the notebook's style sheet, then ignore it
#Css diseñando por el grupo de @LorenaABarba.
from IPython.core.display import HTML
css_file = '../../css/IPython_personal_style.css'
HTML(open(css_file, "r").read())









    Out[8]:

	shape_year	scale_year
1	1.602511	3.997672
2	0.006118	0.012901

	mes	shape	scale	error_shape	error_scale	vel_max	vel_min	vel_prom
1	1	1.624777	4.011322	0.021667	0.044197	13.308705	0.008624	3.593553
2	2	1.597732	3.982284	0.021192	0.044624	17.214456	0.005929	3.572638
3	3	1.642403	4.023454	0.021768	0.043890	14.102925	0.033276	3.597913
4	4	1.584287	3.973859	0.020828	0.044951	16.752868	0.002588	3.564112
5	5	1.576450	3.977964	0.020913	0.045193	14.555497	0.024818	3.571988
6	6	1.589433	4.015068	0.021070	0.045232	15.362640	0.016314	3.603369
7	7	1.630204	4.058354	0.021361	0.044635	18.458527	0.069668	3.628294
8	8	1.613670	4.039624	0.021341	0.044854	15.086976	0.025377	3.617638
9	9	1.577981	3.947073	0.020774	0.044838	13.934380	0.013052	3.540237
10	10	1.619034	3.978038	0.021374	0.044010	17.572493	0.023480	3.562839
11	11	1.575198	3.966355	0.020781	0.045128	14.776303	0.043549	3.558880
12	12	1.606712	3.995986	0.021363	0.044533	13.397509	0.008582	3.582613