Optimizing a function with probability simplex constraints

This notebook arose in response to a question on StackOverflow about how to optimize a function with probability simplex constraints in Python (see http://stackoverflow.com/questions/32252853/optimization-with-python-scipy-optimize). This is a topic I've thought about a lot for our paper on optimal immune repertoires, so I was interested to see what other people had to say about it.

Problem statement

For given $\boldsymbol y$ and $\gamma$, find the $\boldsymbol x^\star$ that maximizes the following expression over the probability simplex:

$$\max_{x_i \geq 0, \, \sum_i x_i = 1} \left[\sum_i \left(\frac{x_i}{y_i}\right)^\gamma\right]^{1/\gamma}$$

Solution using scipy.optimize's SLSQP algorithm (user58925)


In [135]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from scipy.optimize import minimize

In [136]:
def objective_function(x, y, gamma=0.2):
    # negative sign because scipy minimizes and we want to maximize
    return -((x/y)**gamma).sum()**(1.0/gamma)

# equality constraint: the components of x must sum to one
cons = ({'type': 'eq', 'fun': lambda x: np.array([sum(x) - 1])})

y = np.array([0.5, 0.3, 0.2])
initial_x = np.array([0.2, 0.3, 0.5])

opt = minimize(objective_function, initial_x, args=(y,), method='SLSQP',
               constraints=cons, bounds=[(0, 1)] * len(initial_x))
opt


/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py:2: RuntimeWarning: invalid value encountered in power
  from ipykernel import kernelapp as app
Out[136]:
  status: 0
 success: True
    njev: 7
    nfev: 38
     fun: -265.27701765861417
       x: array([ 0.29466743,  0.33480631,  0.37052625])
 message: 'Optimization terminated successfully.'
     jac: array([-265.27723694, -265.27757263, -265.27632904,    0.        ])
     nit: 7

Works on my machine (the poster on StackOverflow reported issues with this) and actually requires a surprisingly small number of function evaluations. The RuntimeWarning presumably arises because the solver evaluates the fractional power at trial points with slightly negative components; it is harmless here.
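
If the warning bothers you, one option (my addition, not from the StackOverflow thread) is to clip the argument to the non-negative orthant before evaluating; since the feasible region is untouched, this should silence the warning without changing the optimum:

In [ ]:
def objective_function_clipped(x, y, gamma=0.2):
    # hypothetical variant: clip negative trial values to zero so that
    # the fractional power is always defined; the feasible set and the
    # optimum are unchanged
    x = np.clip(x, 0.0, None)
    return -((x/y)**gamma).sum()**(1.0/gamma)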

Alternative solution using Nelder-Mead on transformed variables (CT Zhu)


In [137]:
def trans_x(x):
    # map two unconstrained variables into [0, 1) and let the third
    # component absorb the remainder, so the result sums to one
    x1 = x**2/(1.0+x**2)
    z  = np.hstack((x1, 1-sum(x1)))
    return z

def F(x, y, gamma=0.2):
    # objective evaluated on the transformed variables
    z = trans_x(x)
    return -(((z/y)**gamma).sum())**(1./gamma)

In [142]:
opt = minimize(F, np.array([1., 1.]), args=(np.array(y),),
               method='Nelder-Mead')
trans_x(opt.x), opt


/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py:8: RuntimeWarning: invalid value encountered in power
Out[142]:
(array([ 0.29467183,  0.33481782,  0.37051034]),
   status: 0
    nfev: 73
 success: True
     fun: -265.27701755536299
       x: array([ 0.64635885,  0.70946991])
 message: 'Optimization terminated successfully.'
     nit: 39)

This also works, but needs somewhat more function evaluations to converge (73 versus 38 for SLSQP).


In [144]:
opt = minimize(F, np.array([0., 1.]), args=(np.array([0.2, 0.1, 0.8]), 2.0),
               method='Nelder-Mead')
trans_x(opt.x), opt


Out[144]:
(array([ 1.,  1., -1.]),
   status: 0
    nfev: 299
 success: True
     fun: -11.249999999999998
       x: array([ -3.10743912e+07,   3.96029532e+10])
 message: 'Optimization terminated successfully.'
     nit: 103)

In general, though, this method can fail, as the transformation does not enforce the non-negativity constraint on the third variable: here, with $\gamma = 2$, the optimizer happily drives it to $x_3 = -1$.
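
One way to enforce the constraint by construction (my addition, not part of the original answers) is a softmax reparametrization, which maps any unconstrained vector to a strictly positive point on the simplex:

In [ ]:
def softmax_x(z):
    # softmax: strictly positive and sums to one by construction
    e = np.exp(z - np.max(z))
    return e / e.sum()

def F_softmax(z, y, gamma=0.2):
    # objective evaluated on the softmax-transformed variables
    return -(((softmax_x(z)/y)**gamma).sum())**(1.0/gamma)

# e.g.: minimize(F_softmax, np.zeros(3), args=(y,), method='Nelder-Mead')

The parametrization is redundant (adding a constant to $z$ leaves the softmax unchanged), which Nelder-Mead tolerates; note, however, that it can never reach the boundary of the simplex exactly.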

Analytical solution

It turns out the problem is solvable analytically. Since $u \mapsto u^{1/\gamma}$ is monotonically increasing for $\gamma > 0$, it suffices to maximize the inner sum. One can start by writing down the Lagrangian of the (equality-constrained) optimization problem:

$$L = \sum_i \left(\frac{x_i}{y_i}\right)^\gamma - \lambda \left(\sum_i x_i - 1\right)$$

The optimum is found by setting the partial derivatives of this Lagrangian to zero (for $0 < \gamma < 1$ the inner sum is concave, so the stationary point is indeed the maximum):

$$0 = \frac{\partial L}{\partial x_i} = \gamma \frac{x_i^{\gamma-1}}{y_i^\gamma} - \lambda \quad\Rightarrow\quad x_i \propto y_i^{\gamma/(\gamma-1)}$$

Using this insight, the optimization problem can be solved simply and efficiently; normalizing $\boldsymbol x$ to unit sum fixes the proportionality constant:


In [140]:
def analytical(y, gamma=0.2):
    # stationarity condition: x_i proportional to y_i**(gamma/(gamma-1))
    x = y**(gamma/(gamma-1.0))
    # normalize onto the simplex
    x /= np.sum(x)
    return x
xanalytical = analytical(np.array(y))
xanalytical, objective_function(xanalytical, np.array(y))


Out[140]:
(array([ 0.29466774,  0.33480719,  0.37052507]), -265.27701765929692)

Solution using projected gradient algorithm

This problem can also be solved using a projected gradient algorithm; a full treatment will have to wait for another time.
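
In the meantime, here is a minimal sketch (my addition), assuming a fixed step size and the standard sorting-based Euclidean projection onto the simplex (Duchi et al., ICML 2008); the step size and iteration count below are hand-picked, not tuned:

In [ ]:
def project_simplex(v):
    # Euclidean projection onto the probability simplex
    # (sorting-based algorithm, see Duchi et al., ICML 2008)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def projected_gradient(y, gamma=0.2, eta=0.001, steps=2000):
    # gradient ascent on the objective, projecting back onto the
    # simplex after every step; assumes the optimum is interior
    x = np.ones_like(y) / len(y)
    for _ in range(steps):
        s = ((x/y)**gamma).sum()
        grad = s**(1.0/gamma - 1.0) * x**(gamma - 1.0) / y**gamma
        x = project_simplex(x + eta * grad)
    return x

# e.g.: projected_gradient(np.array([0.5, 0.3, 0.2]))

At the optimum the gradient components are all equal (see the jac entries in the SLSQP output above), so the gradient step moves the point along the all-ones direction, which the projection removes; the fixed point of this iteration is therefore the constrained optimum.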

