Notes for the Docker container:
Docker command to run this notebook locally:
Note: replace <path to my directory>
with the path to the directory you want to map to /datos
inside the Docker container.
docker run --rm -v <path to my directory>:/datos --name jupyterlab_r_kernel_local -p 8888:8888 -d palmoreck/jupyterlab_r_kernel:1.1.0
Password for JupyterLab: qwerty
Stop the Docker container:
docker stop jupyterlab_r_kernel_local
Documentation for the Docker image palmoreck/jupyterlab_r_kernel:1.1.0
is available at this link.
In this notebook we consider solving problems of the form:
$$\min f_o(x)$$
with $f_o:\mathbb{R}^n \rightarrow \mathbb{R}$ strongly convex and $f_o \in \mathcal{C}^2(\text{dom}f_o)$, in order to find local optima. It is also assumed that the initial points $x^{(0)}$ of the iterative methods lie in $\text{dom}f_o$ and that the $f_o(x^{(0)})$-sublevel sets are closed. See 1.4.Polinomios_de_Taylor_y_diferenciacion_numerica and 4.1.Optimizacion_numerica_y_machine_learning for the definitions used in this notebook.
It is further assumed that an optimal point $x^*$ exists, so the problem has a solution and the optimal value is denoted by $p^* = f_o(x^*) = \inf f_o(x)$.
The assumption that $f_o$ is convex guarantees that a necessary and sufficient condition for $x^*$ to be optimal is $\nabla f_o(x^*) = 0$, which in general is a system of $n$ nonlinear equations in $n$ variables; solving it solves the optimization problem stated above.
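For instance (an illustrative case, not taken from the original notes), for the strongly convex function $f_o(x) = (x_1-2)^2 + (2-x_2)^2$ the optimality condition reads
$$\nabla f_o(x) = \begin{pmatrix} 2(x_1-2) \\ -2(2-x_2) \end{pmatrix} = 0 \quad \Longleftrightarrow \quad x^* = (2,2)^T,$$
which matches minimizing each squared term separately.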
Gradient descent algorithm
Given an initial point $x$ in $\text{dom}f_o$
Repeat the following block for $k=0,1,2,...$
- Compute the descent direction $\Delta x = - \nabla f_o(x)$.
- Line search. Choose a step size $t > 0$.
- Update: $x = x + t\Delta x$.
until convergence (the stopping criterion is satisfied).
Note: the steps of the algorithm above are a guide for the implementation. Describing the steps of an algorithm does not mean they must be implemented one right after the other exactly as written; as long as an implementation respects the logic of the method, the steps may be carried out in a different order or form.
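As a rough illustration of the three steps above, here is a minimal sketch (not part of the original notes) that uses an analytic gradient and a fixed step size on a toy quadratic; the implementation developed below replaces these with a numerical gradient and a backtracking line search.
f_sketch <- function(x) (x[1]-2)^2 + (x[2]-1)^2    # toy quadratic with minimizer (2, 1)
gf_sketch <- function(x) c(2*(x[1]-2), 2*(x[2]-1)) # analytic gradient
x_k <- c(0, 0)  # initial point in dom f
t <- 0.1        # fixed step size (the notes use backtracking instead)
for (k in 1:100) {
    dx <- -gf_sketch(x_k)                # descent direction
    if (sqrt(sum(dx*dx)) < 1e-8) break   # stopping criterion: small gradient norm
    x_k <- x_k + t*dx                    # update
}
x_k   # approximately c(2, 1)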
In [1]:
install.packages("latex2exp",lib="/usr/local/lib/R/site-library/",
repos="https://cran.itam.mx/")
In [1]:
library(ggplot2)
library(latex2exp)
In [2]:
inc_index<-function(vec,index,h){
'
Auxiliary function for gradient and Hessian computation.
Args:
vec (double): vector
index (int): index.
h (float): quantity that vec[index] will be increased.
Returns:
vec (double): vector with vec[index] increased by h.
'
vec[index]<-vec[index]+h
vec
}
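For example, a quick (illustrative) call of this helper:
inc_index(c(1, 2, 3), 2, 0.5)   # returns 1.0 2.5 3.0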
In [3]:
gradient_approximation<-function(f,x,h=1e-8){
'
Numerical approximation of gradient for function f using forward differences.
Args:
f (expression): definition of function f.
x (double): vector that holds values where gradient will be computed.
h (float): step size for forward differences, typically h=1e-8
Returns:
gf (array): numerical approximation to gradient of f.
'
n<-length(x)
gf<-vector("double",n)
for(i in 1:n){
gf[i]=(f(inc_index(x,i,h))-f(x))
}
gf/h
}
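Each entry approximates a partial derivative by forward differences, $\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + he_i) - f(x)}{h}$, where $e_i$ is the $i$-th canonical vector. A quick check (illustrative, not part of the original notes) against an analytic gradient:
f_test <- function(x) x[1]^2 + 3*x[2]^2   # analytic gradient: c(2*x[1], 6*x[2])
gradient_approximation(f_test, c(1, 1))   # approximately c(2, 6)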
In [4]:
Hessian_approximation<-function(f,x,h=1e-6){
'
Numerical approximation of Hessian for function f using forward differences.
Args:
f (expression): definition of function f.
x (double): vector that holds values where Hessian will be computed.
h (float): step size for forward differences, typically h=1e-6
Returns:
Hf (double): matrix of numerical approximation to Hessian of f.
'
n<-length(x)
Hf<-matrix(rep(0,n^2),nrow=n,ncol=n)
f_x<-f(x)
for(i in 1:n){
x_inc_in_i<-inc_index(x,i,h)
f_x_inc_in_i<-f(x_inc_in_i)
for(j in i:n){
dif<-f(inc_index(x_inc_in_i,j,h))-f_x_inc_in_i-f(inc_index(x,j,h))+f_x
Hf[i,j]<-dif
if(j!=i)
Hf[j,i]<-dif
}
}
Hf/h^2
}
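Each entry uses the forward-difference identity $\frac{\partial^2 f}{\partial x_i \partial x_j}(x) \approx \frac{f(x+he_i+he_j)-f(x+he_i)-f(x+he_j)+f(x)}{h^2}$, and symmetry fills the lower triangle. A quick check (illustrative) against a constant analytic Hessian:
f_test <- function(x) x[1]^2 + 3*x[2]^2 + x[1]*x[2]   # Hessian: matrix(c(2, 1, 1, 6), 2, 2)
Hessian_approximation(f_test, c(1, 1))                # approximately that matrix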
In [5]:
line_search_by_backtracking<-function(f,dir_desc,x,
der_direct, alpha=.15, beta=.5){
'
Line search that sufficiently decreases f restricted to a ray in the direction dir_desc.
Args:
alpha (float): parameter in line search with backtracking, typically .15
beta (float): parameter in line search with backtracking, typically .5
f (expression): definition of function f.
dir_desc (double): vector of descent direction.
x (double): vector that holds values where line search will be performed.
der_direct (float): directional derivative of f.
Returns:
t (float): positive number for stepsize along dir_desc that sufficiently decreases f.
'
t<-1
if (alpha > 1/2){
print('alpha must be less than or equal to 1/2')
t <- -1
}
if (beta>1){
print('beta must be less than 1')
t <- -1
}
if (t!=-1){
eval1 <- f(x+t*dir_desc)
eval2 <- f(x) + alpha*t*der_direct
while (eval1 > eval2){
t<-beta*t
eval1 <- f(x+t*dir_desc)
eval2 <- f(x)+alpha*t*der_direct
}
}
t
}
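The loop multiplies $t$ by beta (halving it with the default value) until the sufficient decrease condition $f(x + t\Delta x) \leq f(x) + \alpha t \nabla f(x)^T \Delta x$ holds. A quick check (illustrative) on $f(x)=x_1^2+x_2^2$ at $x=(1,1)^T$, where the descent direction is $(-2,-2)^T$ and the directional derivative equals $-8$:
f_test <- function(x) x[1]^2 + x[2]^2
line_search_by_backtracking(f_test, c(-2, -2), c(1, 1), -8)   # returns 0.5 in this case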
In [6]:
Euclidian_norm<-function(vec){
#Euclidean norm (2-norm) of vector vec
sqrt(sum(vec*vec))
}
In [7]:
compute_error<-function(x_obj,x_approx){
'
Relative or absolute error between x_obj and x_approx.
'
if (Euclidian_norm(x_obj) > .Machine$double.eps){
Err<-Euclidian_norm(x_obj-x_approx)/Euclidian_norm(x_obj)
}else
Err<-Euclidian_norm(x_obj-x_approx)
Err
}
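In symbols, the value returned is
$$\text{Err} = \begin{cases} \dfrac{\|x_{\text{obj}} - x_{\text{approx}}\|_2}{\|x_{\text{obj}}\|_2} & \text{if } \|x_{\text{obj}}\|_2 > \varepsilon_{\text{mach}}, \\ \|x_{\text{obj}} - x_{\text{approx}}\|_2 & \text{otherwise,} \end{cases}$$
so the absolute error is used when the reference value is numerically zero, as happens with $p^* = 0$ in the examples below.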
In [8]:
gradient_descent<-function(f, x_0, tol,
tol_backtracking, x_ast, p_ast, maxiter){
'
Method of gradient descent to numerically approximate solution of min f.
Args:
f (expression): definition of function f.
x_0 (double): vector of initial point for gradient descent method.
tol (float): tolerance that will halt method. Controls norm of gradient of f.
tol_backtracking (float): tolerance that will halt method. Controls value of line search by backtracking.
x_ast (double): vector solution of min f, currently it is required that the user knows the solution...
p_ast (double): value of f(x_ast), currently it is required that the user knows the solution...
maxiter (int): maximum number of iterations
Returns:
x (double): vector approximation of x_ast.
iteration (int): number of iterations.
Err_plot (double): array with the error (relative or absolute, see compute_error) between p_ast and f(x),
where x is the approximation of x_ast. Useful for plotting.
x_plot (double): array that contains in its columns the vectors of approximations. The last column
contains x, the approximation of the solution. Useful for plotting.
'
iteration <- 1
x <- x_0
feval <- f(x)
gfeval <- gradient_approximation(f,x)
normgf <- Euclidian_norm(gfeval)
Err_plot_aux <- vector("double",maxiter)
Err_plot_aux[iteration] <- compute_error(p_ast,feval)
Err <- compute_error(x_ast,x)
n <- length(x)
x_plot <- matrix(0,nrow=n,ncol=maxiter)
x_plot[,iteration] <- x
cat(sprintf("I\tNormagf\t\tError x_ast\tError p_ast\tline search\n"))
cat(sprintf("%d\t%.2e\t%0.2e\t%0.2e\t%s\n",iteration,normgf,Err,Err_plot_aux[iteration],"---"))
iteration<-iteration + 1
while(normgf>tol && iteration <= maxiter){
dir_desc <- -gfeval
der_direct <- sum(gfeval*dir_desc)
t <- line_search_by_backtracking(f,dir_desc,x,der_direct)
x <- x + t*dir_desc
feval <- f(x)
gfeval <- gradient_approximation(f,x)
normgf <- Euclidian_norm(gfeval)
Err_plot_aux[iteration] <- compute_error(p_ast,feval)
x_plot[,iteration] <- x
Err <- compute_error(x_ast,x)
cat(sprintf("%d\t%.2e\t%0.2e\t%0.2e\t%0.2e\n",iteration,normgf,Err,Err_plot_aux[iteration],t))
if (t<tol_backtracking){ #if t is less than tol_backtracking we stop and flag the run for review
iter_salida <- iteration
iteration <- maxiter #so the post-loop check below reports the backtracking exit
break
}
iteration <- iteration + 1
} #while
cat(sprintf("Error of x with respect to x_ast: %.2e\n",Err))
cat(sprintf("Approximate solution:"))
print(x)
cond <- Err_plot_aux > .Machine$double.eps*10**(-2)
Err_plot <- Err_plot_aux[cond]
if (iteration == maxiter && t < tol_backtracking){
print("Backtracking value less than tol_backtracking, check approximation")
iteration<-iter_salida
x_plot <- x_plot[,1:iteration]
iteration <- iteration -1
}
else{
if (normgf > tol) print("Reached maximum number of iterations, check approximation")
iteration <- iteration - 1
x_plot <- x_plot[,1:iteration]
}
list(x,iteration,Err_plot,x_plot)
}
1) $$\min \quad (x_1-2)^2+(2-x_2)^2+x_3^2+x_4^4$$
The solution of the problem above is $x^* = (2,2,0,0)^T$.
Gradient descent method with $x_0=(5,5,1,0)^T$:
In [9]:
fo <-function(x)(x[1]-2)**2 + (2-x[2])**2 + x[3]**2 + x[4]**4
In [10]:
x_ast <- c(2,2,0,0)
x_0 <- c(5,5,1,0)
tol <- 1e-8
tol_backtracking <- 1e-14
maxiter <- 5
p_ast <- fo(x_ast)
In [11]:
l<-gradient_descent(fo, x_0, tol, tol_backtracking, x_ast, p_ast, maxiter)
In [12]:
x <- l[[1]]
total_of_iterations <- l[[2]]
Err_plot <- l[[3]]
x_plot <- l[[4]]
In [13]:
gg <- ggplot()
In [14]:
gg +
geom_line(aes(x=1:total_of_iterations,y=Err_plot)) +
xlab('Iterations') + ylab(TeX('Error between f_o(x^k) and p^*'))
2) The gradient descent method is highly sensitive to the shape of the level curves of the objective function $f_o$. To observe this, consider the problem: $$\min \frac{1}{2}\left(x_1^2+Cx_2^2 \right)$$
The solution of the problem above is $x^*=(0,0)^T$.
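A classical result (stated here as context; it is not derived in these notes) quantifies this sensitivity: for this quadratic with $C \geq 1$, gradient descent with exact line search satisfies
$$f_o(x^{(k+1)}) - p^* \leq \left(\frac{C-1}{C+1}\right)^2 \left(f_o(x^{(k)}) - p^*\right),$$
so the more elongated the level curves (the larger $C$), the closer the contraction factor is to $1$ and the slower the convergence. Backtracking line search, as used below, shows the same qualitative behavior.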
Gradient descent method with $x_0=(0.5,0.5)^T$.
Gradient descent for the case $C=10$
In [15]:
cte<-10
In [16]:
fo <- function(x) 1/2*(x[1]**2+cte*x[2]**2)
In [17]:
x_ast <- c(0,0)
x_0 <- c(0.5,0.5)
tol <- 1e-8
tol_backtracking <- 1e-14
maxiter <- 50
p_ast <- fo(x_ast)
l<-gradient_descent(fo, x_0, tol, tol_backtracking, x_ast, p_ast, maxiter)
x <- l[[1]]
total_of_iterations <- l[[2]]
Err_plot <- l[[3]]
x_plot <- l[[4]]
In [18]:
gg <- ggplot()
In [19]:
gg +
geom_line(aes(x=1:total_of_iterations,y=Err_plot)) +
xlab('Iterations') + ylab(TeX('Error between f_o(x^k) and p^*'))
In [20]:
gg +
geom_point(aes(x=x_plot[1,],y=x_plot[2,]),size=2) +
annotate(geom='text', x=0.5, y=0.47,
label=TeX("x^{(0)}", output='character'), parse=TRUE) +
xlab('x') + ylab('y') +
ggtitle(TeX('Iterations of gradient descent for $f_o$ with $C=10$'))
References: