A simple introduction to gradient based optimization of parameterized models with Paramz.
Paramz is a Python framework for parameterized modelling. It handles parameterization, printing, randomizing and many other parameter based operations on a parameterized model.
In this example we make use of the Rosenbrock function from scipy. We will write a Paramz model that uses scipy's rosen function as its objective and rosen_der for its gradients, and use it to show the features of Paramz.
In [1]:
import paramz, numpy as np
from scipy.optimize import rosen_der, rosen
The starting position for the optimization of the rosen function is set to $$ x_0 = [-1,1] $$
In [2]:
x = np.array([-1,1])
For Paramz to understand your model, there are three steps involved when initializing it in the __init__() function: first, call the super class so that Paramz can set up the model structure; then set up the parameters contained in this model as Param objects; and lastly tell the model that it has those parameters by linking them to self.
In [3]:
class Rosen(paramz.Model):                        # Inherit from paramz.Model to ensure all model functionality.
    def __init__(self, x, name='rosen'):          # Initialize the Rosen model with a numpy array `x` and name `name`.
        super(Rosen, self).__init__(name=name)    # Call to super to make sure the structure is set up.
        self.x = paramz.Param('position', x)      # Set up a Param object for the position parameter.
        self.link_parameter(self.x)               # Tell the model that the parameter `x` exists.
The class created above only holds the information about the parameters; we still have to implement the objective function to optimize. For now the class can be instantiated, but it is not functional yet.
In [4]:
r = Rosen(x)
try:
    print(r)
except NotImplementedError as e:
    print(e)
Gradient based optimization of a mathematical model requires an objective function to optimize. The paramz framework expects the objective_function method to be overridden, returning the current objective of the model. It can make use of all parameters inside the model, and you can rely on the parameters being up to date when the objective function is called. This function does not take any arguments.
In [5]:
class Rosen(paramz.Model):
    def __init__(self, x, name='rosen'):
        super(Rosen, self).__init__(name=name)
        self.x = paramz.Param('position', x)
        self.link_parameter(self.x)
    def objective_function(self):                 # The function to override for the framework to know about the objective to optimize
        return rosen(self.x)                      # Call the Rosenbrock function of scipy as objective function.
This model is now functional, except for optimization. The gradients are not initialized and an optimization will stagnate, as there are no gradients to consider. The optimization of parameters requires the gradients of the parameters to be updated. For this, paramz provides an inversion of control based approach, in which we update the parameters and set their gradients. The gradient of a parameter is stored in the gradient attribute of the parameter itself. The model handles the distribution and collection of the correct gradients to the optimizer.
To implement this we override the parameters_changed(self) function on the class. This function contains the expensive bits of computation, as it is only called when an update is actually necessary. We also compute the objective for the current parameter set and store it in a variable, so that a call to objective_function() can be answered lazily and computational overhead is avoided:
In [6]:
class Rosen(paramz.Model):
    def __init__(self, x, name='rosen'):
        super(Rosen, self).__init__(name=name)
        self.x = paramz.Param('position', x)
        self.link_parameter(self.x)
    def objective_function(self):
        return self._obj
    def parameters_changed(self):                 # Override the parameters_changed function for model updates
        self._obj = rosen(self.x)                 # Lazy evaluation of the rosen function only when there is an update
        self.x.gradient[:] = rosen_der(self.x)    # Computation and storing of the gradients for the position parameter
In [7]:
r = Rosen(x)
This Rosen model is a fully working parameterized model for gradient based optimization of scipy's rosen function.
In [8]:
print(r)
Or use the notebook representation:
In [9]:
r
Out[9]:
Note that the model only prints the shape (in the value column) of the parameters, as parameters can be arrays or matrices of any size (with arbitrary numbers of dimensions).
We can print the actual values of the parameters directly, either through the programmatically assigned attribute:
In [10]:
r.x
Out[10]:
Or by name:
In [11]:
r.position
Out[11]:
We can rename a parameter freely, as long as the new name does not already exist:
In [12]:
r.x.name = 'pos'
In [13]:
r
Out[13]:
Now r.position
will not be accessible anymore!
In [14]:
try:
    r.position
except AttributeError as v:
    print("Attribute Error: " + str(v))
Param objects represent the parameters for the model. We told the model in the initialization that the position parameter, (re-)named pos, is a parameter of the model. Thus the model listens to changes of the parameter values and updates accordingly. We will now set one element of the parameter and see what happens to the model:
In [15]:
print("Objective before change: {}".format(r._obj))
r.x[0] = 1
print("Objective after change: {}".format(r._obj))
Note that we never actually told the model to update. It listened to changes to any of its parameters and updated accordingly. This update chain is based on the hierarchy of the model structure. Specific values of parameters can be accessed through indexing, just like indexing numpy arrays. In fact, Param is a subclass of ndarray and inherits all its traits. Thus, Param can be used in any computation involving numpy. Importantly, when a Param is used inside a computation, the result is a normal numpy array. This prevents unwanted side effects and aliasing errors.
In [18]:
2 * r.x
Out[18]:
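A minimal check of the two claims above (that Param inherits from ndarray, and that computations with it return plain numpy arrays):

print(isinstance(r.x, np.ndarray))   # Param is a subclass of numpy's ndarray
print(type(2 * r.x))                 # computations with a Param yield a plain numpy array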
The optimization routine for the model can be accessed through the optimize() function. A call to optimize() sets up the optimizer and runs the iterations, getting and setting the parameters in an efficient in-memory fashion. By supplying messages=1 as an optional argument we can print the progress of the optimization itself.
In [20]:
r.x[:] = [100,5] # Set to a difficult starting position to show the messages of the optimization.
In [23]:
r.optimize(messages=1) # Call the optimization and show the progress.
Out[23]:
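Further options can be passed to optimize(); the sketch below assumes the 'lbfgsb' optimizer name and the max_iters keyword are available in your paramz version:

r.optimize(optimizer='lbfgsb', max_iters=500, messages=1)   # illustrative settings; optimizer name and iteration cap are assumptions about the installed paramz API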
To show the values of the positions themselves, we directly print the Param object:
In [25]:
r.x
Out[25]:
We can also randomize the model by calling the convenience function randomize() on the part we want to randomize. This can be any part of the model, and the whole model can be randomized as well:
In [34]:
np.random.seed(100)
r.randomize()
In [35]:
r.x
Out[35]:
In [36]:
r.x.randomize()
In [37]:
r.x
Out[37]:
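By default randomize() draws new values from a standard normal distribution. Assuming the rand_gen argument of recent paramz versions, which forwards extra keyword arguments to the supplied generator, a different numpy random generator could be used as a sketch:

r.x.randomize(np.random.uniform, low=-2, high=2)   # assumption: randomize forwards the keyword arguments low/high to np.random.uniform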
When implementing gradient based optimization it is important to make sure that the implemented gradients match the numerical gradients of the objective function. This can be checked using the checkgrad() function in paramz. It computes a two-sided numerical gradient estimate around the current position of the parameter. The verbosity of the gradient checker can be adjusted with the verbose option. If verbose is False, only one bool is returned, specifying whether the analytical gradients match the numerical gradients or not. With verbose=True, a full report is returned, checking each parameter individually. checkgrad() can also be called on each subpart of the model.
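Conceptually, the numerical estimate for each parameter $x_i$ is a central difference of the form (a sketch of the idea; the exact step size handling is internal to paramz)
$$ \frac{\partial f}{\partial x_i} \approx \frac{f(x + \epsilon e_i) - f(x - \epsilon e_i)}{2\epsilon}, $$
where $e_i$ is the unit vector for parameter $i$ and $\epsilon$ is a small step size; this estimate is compared against the analytical gradient stored by parameters_changed().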
Here we can either directly call it on the parameter:
In [43]:
r.x.checkgrad(verbose=1)
Out[43]:
Or on the whole model (verbose or not):
In [44]:
r.checkgrad()
Out[44]:
In [45]:
r.checkgrad(verbose=1)
Out[45]:
Or on individual entries of a parameter; note that numpy indexing is used:
In [46]:
r.x[[0]].checkgrad(verbose=1)
Out[46]:
In many optimization scenarios it is necessary to constrain parameters to only take on certain ranges of values, be it bounded in a region (between two numbers), fixed, or constrained to only positive or negative numbers. This is achieved in paramz by applying a transformation to a parameter. For convenience the most common constraints are available as dedicated methods, discoverable by typing r.constrain_<tab>:
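A quick way to list these methods programmatically (the exact set depends on your paramz version):

print([m for m in dir(r) if m.startswith('constrain')])   # list the available constrain_* convenience methods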
Each parameter can be constrained individually by subindexing the Param object, or Parameterized objects can be constrained as a whole. Note that indexing works like numpy indexing, so we need to keep the array structure when indexing single elements. Next we constrain $x_0$ to lie between $-10$ and $-1$ and $x_1$ to take only positive values:
In [47]:
r.x[[0]].constrain_bounded(-10,-1)
In [48]:
r.x[[1]].constrain_positive()
The printout contains the constraints, either shown directly on the object, or listed as the constraints contained within a parameter. If a parameter has multiple constraints spread across the Param object, constraints that only cover part of the Param object are indicated with {<partial constraint>}:
In [49]:
r
Out[49]:
To show the individual constraints, we look at the Param
object of interest directly:
In [50]:
r.x
Out[50]:
The constraints (and other indexed properties) are held by each parameter in a dictionary-like structure. For example, the constraints are held in a constraints mapping, where the keys are the constraints and the values are the indices each constraint refers to. You can either ask for the constraints of the whole model:
In [53]:
list(r.constraints.items())
Out[53]:
Or the constraints of individual Parameterized
objects:
In [55]:
list(r.x.constraints.items())
Out[55]:
The constraints of subparts of the model are only views into the actual constraints held by the root of the model hierarchy.
The hierarchy of a Paramz model is a tree, where the inner nodes of the tree are Parameterized objects and the leaves are Param objects. The Model class is itself Parameterized and can thus serve as a child of another model. This opens up the possibility of combining models into a bigger model. As a simple example, we will just add two Rosen models together into a single model:
In [59]:
class DoubleRosen(paramz.Model):
    def __init__(self, x1, x2, name='silly_double'):
        super(DoubleRosen, self).__init__(name=name)   # Call super to initiate the structure of the model
        self.r1 = Rosen(x1)                            # Instantiate the underlying Rosen classes
        self.r2 = Rosen(x2)
        # Tell this model which parameters it has. Models are treated just the same as parameters:
        self.link_parameters(self.r1, self.r2)
    def objective_function(self):
        return self._obj                               # Lazy evaluation of the objective
    def parameters_changed(self):
        self._obj = self.r1._obj + self.r2._obj        # Just add both objectives together to optimize both models.
The keen eyed will have noticed that we did not set any gradients in the above definition. That is because the underlying Rosen models handle their gradients themselves!
In [62]:
dr = DoubleRosen(np.random.normal(size=2), np.random.normal(size=2))
All options listed above are now available for this model. No additional steps need to be taken!
In [63]:
dr.checkgrad(verbose=1)
Out[63]:
To show the different ways constraints are displayed, we constrain different parts of the model and fix parts of it too:
In [66]:
dr.r1.constrain_negative()
dr.r1.x[[0]].fix()
dr.r2.x[[1]].constrain_bounded(-30, 5)
dr.r2.x[[0]].constrain_positive()
In [67]:
dr
Out[67]:
First, we can see that because two models with the same name were added to dr, the framework renamed the second model to give it a unique name. This only happens when two children of the same parent share a name. If two children that are not under the same parent share a name, that is fine, as you can see from the name of x in both models: position.
Second, the constraints are displayed in curly brackets {} if they do not span all underlying parameters. If a constraint spans all parameters, however, it is shown without curly brackets, such as -ve for the first Rosen model.
We can now, just as before, perform all actions paramz supports on this model, as well as on its sub models. For example, we can check the gradients of only one part of the model:
In [69]:
dr.r2.checkgrad(verbose=1)
Out[69]:
Or print only one model:
In [70]:
dr.r1
Out[70]:
We can show that constraints are remapped to each parameter directly. We can either access the constraints of the whole model:
In [71]:
print(dr.constraints)
Or those of individual parameters directly:
In [72]:
print(dr.r2.constraints)
Note that the constraints are remapped to index the parameters locally. This leads directly to the in-memory handling of parameters. The root node of the hierarchy holds one parameter array param_array comprising all parameters. The same goes for the gradient array gradient:
In [73]:
dr.param_array
Out[73]:
Each child parameter (and all parameters below it) has its own view into the memory of the root node:
In [75]:
dr.r2.param_array
Out[75]:
When changing the param_array of a parameter, we directly edit the memory of the root node. This is a big part of what makes paramz efficient, as getting and setting parameters works directly in memory and does not need any Python routines (such as loops or traversal).
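A minimal check of this view behaviour (assuming a numpy version that provides np.shares_memory):

print(np.shares_memory(dr.param_array, dr.r2.param_array))   # the child's array shares memory with the root's param_array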
The constraints described above directly index the param_array of their Parameterized or Param object. That is why the remapping exists.
This param_array has a counterpart for the optimizer, which holds the parameters as remapped by the constraints. The constraints are transformation mappings, which transform the model parameters param_array into the optimizer parameters optimizer_array. This optimizer array is what is presented to the optimizer, and the constraints framework handles the mapping directly.
In [76]:
print(dr.param_array)
print(dr.optimizer_array)
Note that the optimizer array only contains three values. This is because the first element of the first Rosen model is fixed and is not presented to the optimizer. The transformed gradients can be computed by the root node directly:
In [77]:
dr._transform_gradients(dr.gradient)
Out[77]:
These gradients and parameters are presented to the optimizer and mapped back into model parameters after each iteration.
For more details on the paramz framework there is the API documentation hosted here. The source code can be found on GitHub, where discussions and other issues take place in the GitHub issues.