Optimizers are among the most important libraries of a Machine Learning (ML) framework. Most ML models are formulated as optimization problems, i.e. minimizing a given cost function. Historically, Gradient Descent was one of the first optimization methods used to accomplish this task.
Gradient Descent is based on the observation that if a multivariate function $f$ is defined and differentiable in a neighborhood of a given point $a$, then for $\alpha$ small enough
$$f(a) \geq f \big( a-\alpha \nabla f(a) \big)$$
Moreover, if $f(a) = f\big( a-\alpha \nabla f(a) \big)$ holds only for $\alpha=0$, then $a$ is a local minimum. Also, if $f$ is a convex function, then $a$ is also a global minimum.
One of the main problems with Gradient Descent is that at each iteration you need to provide $\alpha$, also known as the learning rate. If $\alpha$ is too small you will converge too slowly. On the other hand, if $\alpha$ is too large you might diverge from the local optimum.
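As a quick illustration (our own sketch, not part of any library), here is plain gradient descent on the parabola $f(x) = (x-2)^2 + 1.5$ that we will minimize below, with three different choices of $\alpha$:

def gradient_descent(grad, x0, alpha, n_iter=100):
    ## plain gradient descent: x <- x - alpha * grad(x)
    x = x0
    for _ in range(n_iter):
        x = x - alpha * grad(x)
    return x

grad = lambda x: 2.0 * (x - 2.0)  ## derivative of f(x) = (x - 2)**2 + 1.5, minimum at x = 2

print(gradient_descent(grad, x0=3.0, alpha=0.1))    ## ~2.0  : converges
print(gradient_descent(grad, x0=3.0, alpha=0.001))  ## ~2.82 : too small, converges too slowly
print(gradient_descent(grad, x0=3.0, alpha=1.1))    ## ~8e7  : too large, diverges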
There are more sophisticated algorithms than Gradient Descent that are able to select the best learning rate automatically and, hence, for which you only need to provide an initial guess.
The most popular of these algorithms are available in Python through the package scipy.optimize, where the following methods (among others) can be used to minimize a function: Nelder-Mead, Powell, CG, BFGS, L-BFGS-B, TNC, COBYLA and SLSQP.
Let's see these algorithms in action.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as spo
def parab(X):
    ## X = 2 is the min
    Y = (X - 2)**2 + 1.5
    return Y

initial_guess = 3
opt_methods_no_Jacobian = ['Nelder-Mead','Powell','CG','BFGS','L-BFGS-B','TNC','COBYLA','SLSQP']
mins = [spo.minimize(parab, initial_guess, method=om) for om in opt_methods_no_Jacobian]
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian,
                       'min' : [m.x for m in mins]
                      }).set_index('Method')
min_df
Out[1]:
In [2]:
def plot_mins(f, min_df, xmin=-5, xmax=5, ticks=1000, title='', xguess=3, legend_pos='upper left'):
    x = np.linspace(xmin, xmax, ticks)
    y = f(x)
    plt.plot(x, y, color='b')
    plt.plot(min_df, f(min_df), 'g', linewidth=13, marker="+", label='solutions')
    plt.plot(xguess, f(xguess), 'black', linewidth=13, marker="*", label='initial guess')
    plt.axis([xmin-1, xmax+1, y.min()-1, y.max()+1])
    plt.title(title)
    plt.grid(True)
    plt.legend(loc=legend_pos)
    plt.show()
plot_mins(f=parab,min_df=min_df,xmin=-3,xmax=8,title='Parabola')
In [3]:
def non_convex_1(X):
    if type(X) is float:
        ## scalar input
        if X <= 0 or X >= 4:
            return 5.5
        else:
            return (X - 2)**2 + 1.5
    else:
        ## array input
        Y = (X - 2)**2 + 1.5
        Y[np.logical_or(X <= 0, X >= 4)] = 5.5
        return Y
initial_guess = 4.5
mins = [spo.minimize(non_convex_1, initial_guess, method=om) for om in opt_methods_no_Jacobian]
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian,
                       'min' : [m.x for m in mins]
                      }).set_index('Method')
min_df
Out[3]:
In [4]:
plot_mins(f=non_convex_1,min_df=min_df,xmin=-2,xmax=9,title='Non-Convex example N.1 - initial guess = 4.5',
xguess=4.5,legend_pos='lower right')
In [5]:
initial_guess = 3.9
mins = [spo.minimize(non_convex_1, initial_guess, method=om) for om in opt_methods_no_Jacobian]
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian,
                       'min' : [m.x for m in mins]
                      }).set_index('Method')
min_df
Out[5]:
In [6]:
plot_mins(f=non_convex_1,min_df=min_df,xmin=-2,xmax=9,title='Non-Convex example N.1 - initial guess = 3.9',
xguess=3.9,legend_pos='lower right')
The problem with this function is that it is not convex. Starting from the initial guess x = 3.9 the above optimizers are able to find the global minimum; on the contrary, starting from the initial guess x = 4.5 no optimizer is able to find the global minimum.
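We can verify why numerically: the region around x = 4.5 is a flat plateau at 5.5, so the gradient the optimizers estimate there is zero and there is no descent direction to follow. A quick check with scipy's finite-difference helper approx_fprime (the scalar wrapper f is ours):

from scipy.optimize import approx_fprime
f = lambda x: non_convex_1(float(x[0]))  ## scalar wrapper for approx_fprime
print(approx_fprime(np.array([4.5]), f, 1e-8))  ## [0.]   : flat plateau, nothing to follow
print(approx_fprime(np.array([3.9]), f, 1e-8))  ## [~3.8] : slope leading back towards x = 2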
To solve this optimization problem we need something more than the above optimization methods.
For instance, Simulated Annealing is one of the most widely used algorithms for finding the global minimum of a multivariable function in complex systems. In scipy.optimize, Simulated Annealing has been replaced by Basin Hopping, a global optimization framework particularly suited for multivariable, multimodal optimization problems.
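Conceptually, each Basin Hopping iteration perturbs the current point randomly, runs a local minimizer from the perturbed point, and then accepts or rejects the resulting local minimum. The following simplified sketch (ours, with a greedy acceptance rule instead of the Metropolis criterion scipy uses) conveys the idea:

def basin_hopping_sketch(f, x0, stepsize=1.0, niter=100, seed=0):
    ## simplified Basin Hopping: perturb -> local minimization -> greedy accept
    rng = np.random.RandomState(seed)
    x_best = spo.minimize(f, x0, method='BFGS').x
    f_best = f(x_best)
    for _ in range(niter):
        x_try = x_best + rng.uniform(-stepsize, stepsize, size=np.shape(x_best))
        x_new = spo.minimize(f, x_try, method='BFGS').x  ## local minimization step
        if f(x_new) < f_best:                            ## greedy accept (scipy uses Metropolis)
            x_best, f_best = x_new, f(x_new)
    return x_best

## e.g. basin_hopping_sketch(non_convex_1, 4.5, stepsize=2.0) should escape the plateau and find x ~ 2

With a step size large enough to jump off the plateau, this loop escapes the local minima that defeat the local methods above.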
In [7]:
from scipy.optimize import basinhopping
res = basinhopping(non_convex_1, 4.5, minimizer_kwargs={"method": "BFGS"},niter=200)
In [8]:
res.x
Out[8]:
We can see that Basin Hopping is able to find the global minimum, unlike the previous methods.
Let us now consider the problem of minimizing the Rosenbrock function. This function (and its derivatives) is implemented in scipy.optimize as rosen:
In [9]:
from scipy.optimize import minimize, rosen, rosen_der
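For reference, the N-dimensional Rosenbrock function is
$$f(x) = \sum_{i=1}^{N-1} \big[ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \big]$$
and its global minimum is $f(1,\ldots,1) = 0$, which we can verify directly:

print(rosen(np.ones(5)))  ## 0.0 at the global minimum (1,1,1,1,1)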
In [10]:
mins = [spo.minimize(rosen,[1.3, 0.7, 0.8, 1.9, 1.2],method=om) for om in opt_methods_no_Jacobian]
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian,
                       'min' : [m.x for m in mins],
                       'min_value': [rosen(m.x) for m in mins]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[10]:
We can see that all the optimizers find close minima, although Powell does a better job this time. Let's test Basin Hopping.
In [11]:
res = basinhopping(rosen, [1.3, 0.7, 0.8, 1.9, 1.2], minimizer_kwargs={"method": "BFGS"},niter=200)
print(">>> basinhopping min: "+str(res.x)+" - min_value:"+str(rosen(res.x)))
Pretty close to the other ones. Let's concatenate the results.
In [12]:
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian+['Basin Hopping'],
                       'min' : [m.x for m in mins]+[res.x],
                       'min_value': [rosen(m.x) for m in mins]+[rosen(res.x)]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[12]:
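Now let us add a constraint: no component of the solution may be negative. A first, naive approach, often seen in the literature (we will come back to this), is to inject the constraint directly into the cost function, returning a huge penalty value outside the feasible region.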
In [13]:
def my_rosen(x, out_value=10e10):
    ## no negative values
    if np.any(x < 0.0):
        return out_value
    return rosen(x)
In [14]:
mins = [spo.minimize(my_rosen,[1.3, 0.7, 0.8, 1.9, 1.2],method=om) for om in opt_methods_no_Jacobian]
res = basinhopping(my_rosen, [1.3, 0.7, 0.8, 1.9, 1.2], minimizer_kwargs={"method": "BFGS"},niter=200)
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian+['Basin Hopping'],
                       'min' : [m.x for m in mins]+[res.x],
                       'min_value': [rosen(m.x) for m in mins]+[rosen(res.x)]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[14]:
We know that the global minimum is (1,1,1,1,1), so we should not obtain a sub-optimal solution by also requiring that the point lies on the line where all vector components have the same value, i.e. $x_1=x_2=x_3=x_4=x_5$.
In [15]:
def my_rosen(x, out_value=10e10, approx=0.1):
    ## no negative values
    if np.any(x < 0.0):
        return out_value
    ## x1 = x2 = x3 = x4 (within tolerance approx)
    if type(x) is list:
        x = np.array(x)
    if (np.absolute(x[0] - x[1]) < approx and np.absolute(x[0] - x[2]) < approx and
            np.absolute(x[0] - x[3]) < approx and np.absolute(x[1] - x[2]) < approx and
            np.absolute(x[1] - x[3]) < approx and np.absolute(x[2] - x[3]) < approx):
        return rosen(x)
    else:
        return out_value
In [16]:
mins = [spo.minimize(my_rosen,[1.3, 0.7, 0.8, 1.9, 1.2],method=om) for om in opt_methods_no_Jacobian]
res = basinhopping(my_rosen, [1.3, 0.7, 0.8, 1.9, 1.2], minimizer_kwargs={"method": "BFGS"},niter=200)
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian+['Basin Hopping'],
                       'min' : [m.x for m in mins]+[res.x],
                       'min_value': [rosen(m.x) for m in mins]+[rosen(res.x)]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[16]:
We can see that none of the optimizers handled this constraint properly: the hard penalty makes the cost function discontinuous and flat outside the feasible region, so the optimizers get no useful gradient information from it.
We need to change approach, restricting the list of methods to the ones able to handle constraints and bounds natively.
In [17]:
opt_methods_no_Jacobian_bounds = ['SLSQP','L-BFGS-B','TNC']
## handling bounds for SLSQP, L-BFGS-B, TNC
bnds = ((0,1),(0,1),(0,1),(0,1),(0,1))
mins = [spo.minimize(rosen,[1.3, 0.7, 0.8, 1.9, 1.2],method=om,bounds=bnds) for om in opt_methods_no_Jacobian_bounds]
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian_bounds,
                       'min' : [m.x for m in mins],
                       'min_value': [rosen(m.x) for m in mins]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[17]:
In [20]:
## handling constraints for SLSQP (the only method here supporting equality constraints; COBYLA supports only inequalities)
cons = ({'type': 'eq','fun' : lambda x: np.array([x[0]- x[1]]),'jac' : lambda x: np.array([1,-1,0,0,0])},
{'type': 'eq','fun' : lambda x: np.array([x[0]- x[2]]),'jac' : lambda x: np.array([1,0,-1,0,0])},
{'type': 'eq','fun' : lambda x: np.array([x[0]- x[3]]),'jac' : lambda x: np.array([1,0,0,-1,0])},
{'type': 'eq','fun' : lambda x: np.array([x[0]- x[4]]),'jac' : lambda x: np.array([1,0,0,0,-1])},
{'type': 'eq','fun' : lambda x: np.array([x[1]- x[2]]),'jac' : lambda x: np.array([0,1,-1,0,0])},
{'type': 'eq','fun' : lambda x: np.array([x[1]- x[3]]),'jac' : lambda x: np.array([0,1,0,-1,0])},
{'type': 'eq','fun' : lambda x: np.array([x[1]- x[4]]),'jac' : lambda x: np.array([0,1,0,0,-1])},
{'type': 'eq','fun' : lambda x: np.array([x[2]- x[3]]),'jac' : lambda x: np.array([0,0,1,-1,0])},
{'type': 'eq','fun' : lambda x: np.array([x[2]- x[4]]),'jac' : lambda x: np.array([0,0,1,0,-1])},
{'type': 'eq','fun' : lambda x: np.array([x[3]- x[4]]),'jac' : lambda x: np.array([0,0,0,1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(rosen,[0.5, 0.7, 0.8, 0.7, 1.2],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Method' : 'SLSQP',
                       'min' : [mins.x],
                       'min_value': rosen(mins.x)
                      }).set_index('Method')
min_df
Out[20]:
This is a very interesting and disastrous result. The constraints lead us to a sub-optimal solution even though we know that the optimal solution satisfies them; a possible explanation is that the ten pairwise equalities are redundant (four would suffice), which makes the constraint Jacobian rank-deficient. Let's try to relax the constraint, transforming the equalities into a single inequality.
In [21]:
cons = ({'type': 'ineq','fun' : lambda x: np.array(4-np.sum(np.array(x))),'jac' : lambda x: np.array([-1,-1,-1,-1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(rosen,[0.5, 0.7, 0.8, 0.7, 1.2],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Method' : 'SLSQP',
                       'min' : [mins.x],
                       'min_value': rosen(mins.x)
                      }).set_index('Method')
min_df
Out[21]:
Better ... even if not the optimal solution (1,1,1,1,1). Note that with five variables the optimum has $\sum_i x_i = 5$, so the bound of 4 used above actually makes it infeasible. This is pretty important for the use case of optimizing a portfolio: we know that the sum of the allocation percentages must be equal to 1, $\sum_i{AP_i}=1$. Hence, it seems better to express this constraint as $\sum_i{AP_i} \leq 1$.
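For reference, here is how both forms are expressed in scipy's constraint-dictionary format for SLSQP (a generic sketch with n variables; scipy expects 'ineq' constraints as fun(x) >= 0):

n = 5  ## number of variables, as in the Rosenbrock example above
## equality: sum(x) == 4
cons_eq = {'type': 'eq',
           'fun' : lambda x: np.sum(x) - 4,
           'jac' : lambda x: np.ones(n)}
## inequality: sum(x) <= 4, rewritten as 4 - sum(x) >= 0
cons_ineq = {'type': 'ineq',
             'fun' : lambda x: 4 - np.sum(x),
             'jac' : lambda x: -np.ones(n)}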
We can see anyway that this result is much better than the one we get by injecting the constraint directly inside the cost function, as often happens in the literature:
In [22]:
def my_rosen(x, out_value=10e10, approx=0.1):
    ## no negative values
    if np.any(x < 0.0):
        return out_value
    if type(x) is list:
        x = np.array(x)
    ## sum of the components must be (approximately) 4
    if np.absolute(np.sum(x) - 4) < approx:
        return rosen(x)
    else:
        return out_value
mins = [spo.minimize(my_rosen,[1.3, 0.7, 0.8, 1.9, 1.2],method=om) for om in opt_methods_no_Jacobian]
res = basinhopping(my_rosen, [1.3, 0.7, 0.8, 1.9, 1.2], minimizer_kwargs={"method": "BFGS"},niter=200)
min_df = pd.DataFrame({'Method' : opt_methods_no_Jacobian+['Basin Hopping'],
                       'min' : [m.x for m in mins]+[res.x],
                       'min_value': [rosen(m.x) for m in mins]+[rosen(res.x)]
                      }).set_index('Method')
min_df.sort_values(by = 'min_value' , inplace=True)
min_df
Out[22]:
Finally, let's see how Basin Hopping behaves when working with bounds and constraints.
In [23]:
class MyBounds(object):
    def __init__(self, xmax=[1.0,1.0,1.0,1.0,1.0], xmin=[0,0,0,0,0]):
        self.xmax = np.array(xmax)
        self.xmin = np.array(xmin)
    def __call__(self, **kwargs):
        x = kwargs["x_new"]
        tmax = bool(np.all(x <= self.xmax))
        tmin = bool(np.all(x >= self.xmin))
        cons = bool(np.sum(x) <= 4)
        return tmax and tmin and cons

mybounds = MyBounds()
res = basinhopping(rosen, [1.3, 0.7, 0.8, 1.9, 1.2], minimizer_kwargs={"method": "BFGS"},
                   niter=200, accept_test=mybounds)
print(">>> basinhopping min: "+str(res.x)+" - min_value:"+str(rosen(res.x)))
In [25]:
import matplotlib.pyplot as plt
## note: pandas.io.data is deprecated; in newer environments use the pandas_datareader
## package instead (from pandas_datareader import data as web)
import pandas.io.data as web

def get_data(symbols,
             add_ref=True,
             data_source='yahoo',
             price='Adj Close',
             start='1/21/2010',
             end='4/15/2016'):
    """Read stock data (adjusted close) for given symbols from the given data source."""
    if add_ref and 'SPY' not in symbols:  # add SPY for reference, if absent
        symbols.insert(0, 'SPY')
    df = web.DataReader(symbols,
                        data_source=data_source,
                        start=start,
                        end=end)
    ## Panel indexing: keep only the chosen price item as a dates x symbols DataFrame
    return df[price, :, :]

def compute_daily_returns(df):
    """Compute and return the daily return values."""
    # Note: returned DataFrame must have the same number of rows
    daily_returns = (df / df.shift(1)) - 1
    daily_returns.ix[0, :] = 0  # the first day has no previous day: set its return to 0
    return daily_returns

def fill_missing_values(df_data):
    """Fill missing values in data frame, in place."""
    df_data.fillna(method='ffill', inplace=True)
    df_data.fillna(method='backfill', inplace=True)
    return df_data

def cumulative_returns(df):
    return df / df.ix[0, :] - 1

df = fill_missing_values(get_data(symbols=['GOOG','SPY','IBM','GLD'],
                                  start='4/21/2015',
                                  end='7/15/2016'))
dr = compute_daily_returns(df)
In [26]:
def cumulative_returns_obj(alloc, df=df):
    if type(alloc) is list:
        alloc = np.array(alloc)
    ## negated so that minimizing it maximizes the portfolio cumulative return
    cr = cumulative_returns(df).ix[-1, :]
    return -1 * np.dot(cr, alloc)
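Note that this objective is linear in alloc (the final cumulative returns are fixed numbers), so with the bounds and the sum constraint used below the maximum must lie at a vertex of the feasible region, i.e. all the money on a single asset.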
In [46]:
cons = ({'type': 'ineq','fun' : lambda x: np.array(1-np.sum(np.array(x))),'jac' : lambda x: np.array([-1,-1,-1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(cumulative_returns_obj,[0.25, 0.25, 0.25, 0.25],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Asset' : df.columns,
                       'allocation' : mins.x,
                       'portfolio cumul. ret': -1 * cumulative_returns_obj(mins.x)
                      }).set_index('Asset')
min_df
Out[46]:
According to this allocation all the money should be put on GOOG, as expected, since GOOG has the highest cumulative return.
In [34]:
cumulative_returns(df).ix[-1,:]
Out[34]:
As an exercise, let's check whether Basin Hopping gets the same result.
In [35]:
class MyBoundsP(object):
    def __init__(self, xmax=[1,1,1,1], xmin=[0,0,0,0]):
        self.xmax = np.array(xmax)
        self.xmin = np.array(xmin)
    def __call__(self, **kwargs):
        x = kwargs["x_new"]
        tmax = bool(np.all(np.array(x) <= self.xmax))
        tmin = bool(np.all(np.array(x) >= self.xmin))
        conss = bool(np.sum(np.array(x)) <= 1)
        return tmax and tmin and conss
In [36]:
myboundsp = MyBoundsP()

def print_fun(x, f, accepted):
    print("at minimum %.4f accepted %d" % (f, int(accepted)))

res = basinhopping(cumulative_returns_obj, [0.25, 0.25, 0.25, 0.25], minimizer_kwargs={"method": "BFGS"},
                   niter=10, accept_test=myboundsp, callback=print_fun, T=-1.0, stepsize=-500000)
print(">>> basinhopping max: "+str(res.x)+" - max_value:"+str(-1 * cumulative_returns_obj(res.x)))
res
Out[36]:
This behaviour is very disappointing. It turns out that basinhopping rejected the step at every iteration (the unconstrained BFGS local minimizer pushes this linear objective towards extreme allocations, which accept_test then refuses), yet the bottom line is still a crazy portfolio allocation. In the following we discard this optimization method.
In [43]:
def average_daily_return_obj(alloc, dr=dr):
    if type(alloc) is list:
        alloc = np.array(alloc)
    ## the factor 100 rescales the tiny daily returns (see the note below)
    return -100 * np.dot(dr.mean(), alloc)
In [47]:
cons = ({'type': 'ineq','fun' : lambda x: np.array(1-np.sum(np.array(x))),'jac' : lambda x: np.array([-1,-1,-1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(average_daily_return_obj,[0.25, 0.25, 0.25, 0.25],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Asset' : df.columns,
                       'allocation' : mins.x,
                       'portfolio avg daily ret': -1 * average_daily_return_obj(mins.x)/100
                      }).set_index('Asset')
min_df
Out[47]:
According to this allocation all the money should be put on GOOG, as expected, since GOOG has the highest average daily return. Notice that we needed to multiply by 100 to avoid numerical problems.
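The root cause is scale: average daily returns are of the order of $10^{-3}$, so without the factor 100 the objective varies less than SLSQP's default convergence tolerance ($10^{-6}$). As an alternative to scaling by hand, one could tighten the tolerance instead (a sketch, not tested here):

def average_daily_return_obj_unscaled(alloc, dr=dr):
    ## same objective as above, without the factor 100
    return -1 * np.dot(dr.mean(), np.asarray(alloc))

mins = spo.minimize(average_daily_return_obj_unscaled, [0.25, 0.25, 0.25, 0.25],
                    method='SLSQP', bounds=bnds, constraints=cons,
                    options={'ftol': 1e-12})  ## SLSQP's ftol defaults to 1e-6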
In [50]:
dr.mean()
Out[50]:
In [51]:
def std_daily_return_obj(alloc, dr=dr):
    if type(alloc) is list:
        alloc = np.array(alloc)
    ## the large factor rescales the tiny daily standard deviations
    return 10000000 * np.dot(dr.std(), alloc)
In [53]:
cons = ({'type': 'eq','fun' : lambda x: np.array(1-np.sum(np.array(x))),'jac' : lambda x: np.array([-1,-1,-1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(std_daily_return_obj,[0.25, 0.25, 0.25, 0.25],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Asset' : df.columns,
                       'allocation' : mins.x,
                       'portfolio risk': std_daily_return_obj(mins.x) / 10000000
                      }).set_index('Asset')
min_df
Out[53]:
In order to make the optimizer get the right solution, i.e. all the money on GLD, we had to multiply by 10000000. Otherwise ~50% of the money would have gone on GLD and 50% on SPY: a sub-optimal solution, because we know that GLD has the minimum risk (lowest standard deviation of daily returns).
In [54]:
dr.std()
Out[54]:
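Finally, let's maximize the Sharpe ratio of the portfolio,
$$SR = \sqrt{k} \cdot \frac{\mathrm{mean}(R_p - R_f)}{\mathrm{std}(R_p)}$$
where $R_p$ are the daily portfolio returns, $R_f$ is the risk-free rate and $k$ is the annualization factor (252 trading days for daily samples, 52 for weekly, 12 for monthly). The following objective implements exactly this, negated so that minimizing it maximizes the Sharpe ratio.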
In [73]:
def sharpe_ratio_obj(alloc, dr=dr, sample_freq='d', risk_free_rate=0.0):
    if type(alloc) is list:
        alloc = np.array(alloc)
    ## daily portfolio returns: weighted sum of the assets' daily returns
    sr = (np.sum(alloc * dr, axis=1) - risk_free_rate).mean() / np.sum(alloc * dr, axis=1).std()
    if sample_freq == 'd':
        sr = sr * np.sqrt(252)
    elif sample_freq == 'w':
        sr = sr * np.sqrt(52)
    elif sample_freq == 'm':
        sr = sr * np.sqrt(12)
    else:
        raise Exception('unknown sample frequency: ' + str(sample_freq))
    return -1 * sr
In [76]:
cons = ({'type': 'ineq','fun' : lambda x: np.array(1-np.sum(np.array(x))),'jac' : lambda x: np.array([-1,-1,-1,-1])})
bnds = ((0,1),(0,1),(0,1),(0,1))
mins = spo.minimize(sharpe_ratio_obj,[0.25, 0.25, 0.25, 0.25],method='SLSQP',bounds=bnds,constraints=cons)
min_df = pd.DataFrame({'Asset' : df.columns,
                       'max_point' : mins.x,
                       'max_value': -1 * sharpe_ratio_obj(mins.x)  ## the Sharpe ratio itself
                      }).set_index('Asset')
min_df
Out[76]:
According to this allocation, ~50% of the money should be put on GLD and ~50% on GOOG: maximizing the Sharpe ratio trades off return (where GOOG dominates) against risk (where GLD dominates).