Docente: Dr. Daniel Forchetti @danielforchetti
Alumnos: Franco Bellomo @fnbellomo, Lucas Bellomo @LucasEBellomo
La velocidad de los vientos sigue una distribución de $Weibull(x;\lambda,\kappa)$, y las podemos medir con un anemómetro. Si obtenemos varias velocidades, entoces vamos a poder hacer un ajuste por mínimos cuadrados (OLS) y obtener los valores caracteristicos de la distribución $\lambda$ y $\kappa$.
La Función de Densidad de Probabilidad (PDF en ingles) de una variable aleatoria $x$ que sigue la distribución de Weibull es:
En python la PDF que esta implementa es la Weibull exponencial:
$$f\left(x;\lambda,\kappa,\alpha\right)=\begin{cases}\alpha\dfrac{\kappa}{\lambda}\left(\dfrac{x}{\lambda}\right)^{\kappa-1}\left[1-e^{-\left(x/\lambda\right)^{\kappa}}\right]^{\alpha-1}e^{-\left(x/\lambda\right)^{\kappa}} & x\geq0\\ \\0 & x<0 \end{cases}$$Donde $\kappa > 0$ y $\alpha > 0$ son el primer y segundo parámetro de forma (shape parameter) respectivamente, y $\lambda > 0$ en el parámetro de escala de la distribución (scale paramenter).
En particular si $\alpha = 1$ obtenemos la distribución de Weibull.
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
#para pasar el datashet de r a pandas y al reves tambien
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import pandas.rpy.common as com
#Para que los plot queden en el notebook
%matplotlib inline
#this needs rpy2 package
%load_ext rmagic
In [2]:
#Defino los parámetros de la distribución para hacer el plot
shape_values = [1, 1.6, 3.6, 15]
scale_values = [3, 4, 6, 17]
lineStyles = ['-', '--', ':', '-.']
colors = ['black', 'red', 'blue', 'orange']
x = np.linspace(-1, 20, 2000)
#Ploteo la distribución
fig, ax = plt.subplots(figsize=(10, 8))
for (shape, scale, lineStyle, color) in zip(shape_values, scale_values, lineStyles, colors):
distribution = stats.exponweib(1, shape, loc = 0, scale = scale)
plt.plot(x, distribution.pdf(x), ls=lineStyle, c=color, linewidth = 3,
label=r'$\kappa=%.1f,\ \lambda=%i$' % (shape, scale))
plt.xlim(0 , 20)
plt.ylim(0, 0.4)
plt.xlabel('$x$', fontsize=25)
plt.ylabel(r'$f(x| \kappa,\lambda)$', fontsize=25)
plt.title('PDF two parameter Weibull distribution', fontsize=15)
plt.legend(loc = "best", fontsize=20)
plt.show()
In [3]:
#Genero datos con la distribución de Weibull
#Parámetros y datos generados
kappa, lam = 1.6, 4
nsample = 10000
data = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = nsample)
#Ajusto los datos
shape_a, shape_b, loc, scale = stats.exponweib.fit(data, loc=0)
x = np.linspace(0, 20, 1000)
distribution = stats.exponweib(shape_a, shape_b, loc = loc, scale = scale)
#Ploteo
plt.figure(figsize=(10, 8))
plt.hist(data, normed=True, histtype='bar', alpha=0.2,
label=r'$\kappa=%.2f,\ \lambda=%.2f$' % (kappa, lam))
plt.plot(x, distribution.pdf(x), color = "red",
label=r'$\kappa=%.2f,\ \lambda=%.2f$' % (shape_b, scale))
plt.title('Azul: mediciones Rojo: ajuste', fontsize=15)
plt.xlabel('$vel \ viento$', fontsize=25)
plt.ylabel(r'$f(x| \kappa,\lambda)$', fontsize=25)
plt.legend(loc = "best", fontsize=20)
plt.show()
IPython has an rmagic extension that contains a some magic functions for working with R via rpy2.
The line magic %Rpush copies its arguments to variables of the same name in rpy2. The %R line magic evaluates the string in rpy2 and returns the results.
There are two more line magics, %Rpull and %Rget. Both are useful after some R code has been executed and there are variables in the rpy2 namespace that one would like to retrieve. The main difference is that one returns the value (%Rget), while the other pulls it to self.shell.user_ns (%Rpull). Imagine we've stored the results of some calculation in the variable "a" in rpy2's namespace. By using the %R magic, we can obtain these results and store them in b. We can also pull them directly to user_ns with %Rpull. They are both views on the same data.
In [16]:
#Genero algunos datos para ajsutar.
kappa, lam, nsample = 1.6, 4, 1000
data = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = nsample)
Para pasar un array de numpy a R y que lo entienda este, lo tengo que copiar con rep()
In [19]:
%%R -i data -o fit2
library(fitdistrplus)
#Generate fake data
shape <- 1.9
x <- rweibull(n=10, shape=shape, scale=1)
data2 <- rep(data)
#Fit x data with fitdist
fit2 <- fitdist(data2, "weibull")
fit.w <- fitdist(x, "weibull")
plot(fit2)
summary(fit2)
In [5]:
print fit2[0][0], fit2[2][0]
In [3]:
#constantes de la distribución
kappa, lam = 1.6, 4
hora = []
for i in range(0, 24):
for j in np.arange(0, 60, 10):
hora.append(i)
minuto = 12*30*24*[0, 10, 20, 30, 40, 50]
dias = []
meses = []
anio = [2014]
medicion = []
for mes in range(1, 13):
for dia in range(1, 25):
for i in range(144):
val = stats.exponweib.rvs(1,kappa, loc = 0, scale = lam, size = 1)
medicion.append(val[0])
dias.append(dia)
meses.append(mes)
hora = 30*12*hora
anio = 41472*anio
fecha = []
for i in range(len(anio)):
fec = '%d-%d-%d %d:%d' %(dias[i], meses[i], anio[i], hora[i], minuto[i])
fecha.append(fec)
#datosDataSheet = zip(dias, meses, anio, hora, minuto, medicion)
datosDataSheet = zip(fecha, medicion)
df = DataFrame(data = datosDataSheet, columns = ['Fecha', 'VelViento'])
df.to_csv('datos3.csv', index = False)
Ahora, escribo una función en R con la cual hago el analisis de los datos, y lo devueldo a un dataframe de python
In [2]:
base = importr('base')
fitdistrplus = importr('fitdistrplus')
r = robjects.r
r('''
path <- "/home/franco/Desktop/Proyectos/GitHub/Molinos/datos3.csv"
myData <- read.csv(path, header = TRUE)
names(myData)[1] <- "Fecha"
names(myData)[2] <- "VelViento"
myData$time <-strptime(myData$Fecha, "%d-%m-%Y %H:%M")
fit_year <- fitdist(myData$VelViento, "weibull")
shape <- c()
scale <- c()
error_shape <- c()
error_scale <- c()
vel_min <- c()
vel_max <- c()
vel_prom <- c()
for(i in 0:11){
fit <- fitdist(myData$VelViento[myData$time$mon == i], "weibull")
shape <- c(shape, fit$estimate[1])
scale <- c(scale, fit$estimate[2])
error_shape <- c(error_shape, fit$sd[1])
error_scale <- c(error_scale, fit$sd[2])
vel_min <- c(vel_min, min(myData$VelViento[myData$time$mon == i]))
vel_max <- c(vel_max, max(myData$VelViento[myData$time$mon == i]))
vel_prom <- c(vel_prom, mean(myData$VelViento[myData$time$mon == i]))
}
mes <- c(1:12)
datos_meses <- data.frame(mes, shape, scale, error_shape,
error_scale, vel_max, vel_min, vel_prom)
shape_year <- c(fit_year$estimate[1], fit_year$sd[1])
scale_year <- c(fit_year$estimate[2],fit_year$sd[2])
dt_anio <- data.frame(shape_year, scale_year)
''')
df_meses = com.load_data('datos_meses')
df_anio = com.load_data('dt_anio')
Entonces, los datos analizados nos quedan:
In [3]:
df_anio
Out[3]:
In [4]:
df_meses
Out[4]:
In [6]:
fig, axes = plt.subplots(nrows=2, figsize=(10,10))
axes[1].plot(df_meses['mes'], df_meses['shape'], label=r'$\lambda$')
axes[1].plot(df_meses['mes'], df_meses['scale'], label=r'$\kappa$')
axes[0].plot(df_meses['mes'], df_meses['vel_max'], label=r'$Vel.\; Max$')
axes[1].plot(df_meses['mes'], df_meses['vel_min'], label=r'$Vel.\; Min$')
axes[1].plot(df_meses['mes'], df_meses['vel_prom'], label=r'$Vel.\; Prom$')
for ax in axes.flatten():
ax.grid(True)
ax.legend(loc = "best", fontsize=20)
In [8]:
#Execute this cell to load the notebook's style sheet, then ignore it
#Css diseñando por el grupo de @LorenaABarba.
from IPython.core.display import HTML
css_file = '../../css/IPython_personal_style.css'
HTML(open(css_file, "r").read())
Out[8]: