A simple introduction to gradient based optimization of parameterized models with Paramz.
Paramz is a Python framework for parameterized modelling. It handles parameterization, printing, randomizing and many other parameter based operations on a parameterized model.
In this example we make use of the Rosenbrock function from scipy. We will write a Paramz model that uses scipy's rosen function as its objective and rosen_der for its gradients, and use it to show the features of Paramz.
In [1]:
import paramz, numpy as np
from scipy.optimize import rosen_der, rosen
The starting position for the optimization of the rosen function is set to $$ x_0 = [-1,1] $$
In [2]:
x = np.array([-1,1])
For Paramz to understand your model, there are three steps involved when initializing it in the __init__() function: first, call the super class so that Paramz can set up the model structure; then set up the parameters contained in this model as Param objects; and lastly tell the model that it has those parameters by linking them to self.
In [3]:
class Rosen(paramz.Model):                        # Inherit from paramz.Model to ensure all model functionality.
    def __init__(self, x, name='rosen'):          # Initialize the Rosen model with a numpy array `x` and name `name`.
        super(Rosen, self).__init__(name=name)    # Call to super to make sure the structure is set up.
        self.x = paramz.Param('position', x)      # Set up a Param object for the position parameter.
        self.link_parameter(self.x)               # Tell the model that the parameter `x` exists.
The class created above only holds the information about the parameters; we still have to implement the objective function to optimize. For now the class can be instantiated, but it is not functional yet.
In [4]:
r = Rosen(x)
try:
    print(r)
except NotImplementedError as e:
    print(e)
Gradient based optimization of a mathematical model requires an objective function to optimize. The paramz framework expects the objective_function method to be overridden, returning the current objective of the model. It can make use of all parameters inside the model, and you can rely on the parameters being up to date when the objective function is called. This function does not take any arguments.
In [5]:
class Rosen(paramz.Model):
    def __init__(self, x, name='rosen'):
        super(Rosen, self).__init__(name=name)
        self.x = paramz.Param('position', x)
        self.link_parameter(self.x)
    def objective_function(self):                 # The function to override for the framework to know about the objective to optimize
        return rosen(self.x)                      # Call the Rosenbrock function of scipy as objective function.
This model is now functional, except for optimization. The gradients are not initialized and an optimization will stagnate, as there are no gradients to consider. The optimization of parameters requires the gradients of the parameters to be updated. For this, paramz provides an inversion of control based approach, in which we update the parameters and set their gradients. The gradient of a parameter is stored in the gradient attribute of the parameter itself. The model handles the distribution and collection of the correct gradients to the optimizer.
To implement this we override the parameters_changed(self) function on the class. This function contains the expensive bits of computation, as it is only called when an update is actually necessary. We also compute the objective for the current parameter set and store it in a variable, so that a call to objective_function() can be answered lazily and computational overhead is avoided:
In [6]:
class Rosen(paramz.Model):
    def __init__(self, x, name='rosen'):
        super(Rosen, self).__init__(name=name)
        self.x = paramz.Param('position', x)
        self.link_parameter(self.x)
    def objective_function(self):
        return self._obj
    def parameters_changed(self):                 # Override the parameters_changed function for model updates
        self._obj = rosen(self.x)                 # Lazy evaluation of the rosen function only when there is an update
        self.x.gradient[:] = rosen_der(self.x)    # Computation and storing of the gradients for the position parameter
In [7]:
r = Rosen(x)
This Rosen model is a fully working parameterized model for gradient based optimization of scipy's rosen function.
In [8]:
print(r)
Or use the notebook representation:
In [9]:
r
Out[9]:
Note that the model only prints the shape (in the value column) of the parameters, as parameters can be arrays or matrices of any size (with arbitrary numbers of dimensions).
We can print the actual values of the parameters directly, either through the programmatically assigned attribute:
In [10]:
r.x
Out[10]:
Or by name:
In [11]:
r.position
Out[11]:
We can rename a parameter freely, as long as the new name does not already exist:
In [12]:
r.x.name = 'pos'
In [13]:
r
Out[13]:
Now r.position
will not be accessible anymore!
In [14]:
try:
    r.position
except AttributeError as v:
    print("Attribute Error: " + str(v))
Param objects represent the parameters for the model. We told the model in the initialization that the position parameter, (re-)named pos, is a parameter of the model. Thus the model listens to changes of the parameter values and updates accordingly. We will now set one element of the parameter and see what happens to the model:
In [15]:
print("Objective before change: {}".format(r._obj))
r.x[0] = 1
print("Objective after change: {}".format(r._obj))
Note that we never actually told the model to update. It listened to changes to any of its parameters and updated accordingly. This update chain is based on the hierarchy of the model structure. Specific values of parameters can be accessed through indexing, just like indexing numpy arrays. In fact, Param is a subclass of ndarray and inherits all its traits. Thus, Param can be used in any computation involving numpy. Importantly, when a Param is used inside a computation, the result is a normal numpy array. This prevents unwanted side effects and aliasing errors.
In [18]:
2 * r.x
Out[18]:
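A minimal check of the two claims above (that Param inherits from ndarray, and that computations with it return plain numpy arrays):

print(isinstance(r.x, np.ndarray))   # Param is a subclass of numpy's ndarray
print(type(2 * r.x))                 # computations with a Param yield a plain numpy array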
The optimization routine for the model can be accessed through the optimize() function. A call to optimize() sets up the optimizer and runs the iterations, getting and setting the parameters in an efficient in-memory fashion. By supplying messages=1 as an optional argument we can print the progress of the optimization itself.
In [20]:
r.x[:] = [100,5] # Set to a difficult starting position to show the messages of the optimization.
In [23]:
r.optimize(messages=1) # Call the optimization and show the progress.
Out[23]:
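Further options can be passed to optimize(); the sketch below assumes the 'lbfgsb' optimizer name and the max_iters keyword are available in your paramz version:

r.optimize(optimizer='lbfgsb', max_iters=500, messages=1)   # illustrative settings; optimizer name and iteration cap are assumptions about the installed paramz API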
To show the values of the positions themselves, we directly print the Param object:
In [25]:
r.x
Out[25]:
We can also randomize the model by calling the convenience function randomize() on the part we want to randomize. This can be any part of the model, and the whole model can be randomized as well:
In [34]:
np.random.seed(100)
r.randomize()
In [35]:
r.x
Out[35]:
In [36]:
r.x.randomize()
In [37]:
r.x
Out[37]:
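By default randomize() draws new values from a standard normal distribution. Assuming the rand_gen argument of recent paramz versions, which forwards extra keyword arguments to the supplied generator, a different numpy random generator could be used as a sketch:

r.x.randomize(np.random.uniform, low=-2, high=2)   # assumption: randomize forwards the keyword arguments low/high to np.random.uniform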
When implementing gradient based optimization it is important to make sure that the implemented gradients match the numerical gradients of the objective function. This can be checked using the checkgrad() function in paramz. It computes a two-sided numerical gradient estimate around the current position of the parameter. The verbosity of the gradient checker can be adjusted with the verbose option. If verbose is False, only one bool is returned, specifying whether the analytical gradients match the numerical gradients or not. With verbose=True, a full report is returned, checking each parameter individually. checkgrad() can also be called on each subpart of the model.
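Conceptually, the numerical estimate for each parameter $x_i$ is a central difference of the form (a sketch of the idea; the exact step size handling is internal to paramz)
$$ \frac{\partial f}{\partial x_i} \approx \frac{f(x + \epsilon e_i) - f(x - \epsilon e_i)}{2\epsilon}, $$
where $e_i$ is the unit vector for parameter $i$ and $\epsilon$ is a small step size; this estimate is compared against the analytical gradient stored by parameters_changed().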
Here we can either directly call it on the parameter:
In [43]:
r.x.checkgrad(verbose=1)
Out[43]:
Or on the whole model (verbose or not):
In [44]:
r.checkgrad()
Out[44]:
In [45]:
r.checkgrad(verbose=1)
Out[45]:
Or on individual entries of a parameter; note that numpy indexing is used:
In [46]:
r.x[[0]].checkgrad(verbose=1)
Out[46]:
In many optimization scenarios it is necessary to constrain parameters to only take on certain ranges of values, be it bounded in a region (between two numbers), fixed, or constrained to only positive or negative numbers. This is achieved in paramz by applying a transformation to a parameter. For convenience the most common constraints are available as dedicated methods, discoverable by typing r.constrain_<tab>:
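A quick way to list these methods programmatically (the exact set depends on your paramz version):

print([m for m in dir(r) if m.startswith('constrain')])   # list the available constrain_* convenience methods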
Each parameter can be constrained individually by subindexing the Param object, or Parameterized objects can be constrained as a whole. Note that indexing works like numpy indexing, so we need to keep the array structure when indexing single elements. Next we constrain $x_0$ to lie between $-10$ and $-1$ and $x_1$ to take only positive values:
In [47]:
r.x[[0]].constrain_bounded(-10,-1)
In [48]:
r.x[[1]].constrain_positive()
The printout contains the constraints, either shown directly on the object, or listed as the constraints contained within a parameter. If a parameter has multiple constraints spread across the Param object, constraints that only cover part of the Param object are indicated with {<partial constraint>}:
In [49]:
r
Out[49]:
To show the individual constraints, we look at the Param
object of interest directly:
In [50]:
r.x
Out[50]:
The constraints (and other indexed properties) are held by each parameter in a dictionary-like structure. For example, the constraints are held in a constraints mapping, where the keys are the constraints and the values are the indices each constraint refers to. You can either ask for the constraints of the whole model:
In [53]:
list(r.constraints.items())
Out[53]:
Or the constraints of individual Parameterized
objects:
In [55]:
list(r.x.constraints.items())
Out[55]:
The constraints of subparts of the model are only views into the actual constraints held by the root of the model hierarchy.
The hierarchy of a Paramz model is a tree, where the inner nodes of the tree are Parameterized objects and the leaves are Param objects. The Model class is itself Parameterized and can thus serve as a child of another model. This opens up the possibility of combining models into a bigger model. As a simple example, we will just add two Rosen models together into a single model:
In [59]:
class DoubleRosen(paramz.Model):
    def __init__(self, x1, x2, name='silly_double'):
        super(DoubleRosen, self).__init__(name=name)   # Call super to initiate the structure of the model
        self.r1 = Rosen(x1)                            # Instantiate the underlying Rosen classes
        self.r2 = Rosen(x2)
        # Tell this model which parameters it has. Models are treated just the same as parameters:
        self.link_parameters(self.r1, self.r2)
    def objective_function(self):
        return self._obj                               # Lazy evaluation of the objective
    def parameters_changed(self):
        self._obj = self.r1._obj + self.r2._obj        # Just add both objectives together to optimize both models.
The keen eyed will have noticed that we did not set any gradients in the above definition. That is because the underlying Rosen models handle their gradients themselves!
In [62]:
dr = DoubleRosen(np.random.normal(size=2), np.random.normal(size=2))
All options listed above are now available for this model. No additional steps need to be taken!
In [63]:
dr.checkgrad(verbose=1)
Out[63]:
To show the different ways constraints are displayed, we constrain different parts of the model and fix parts of it too:
In [66]:
dr.r1.constrain_negative()
dr.r1.x[[0]].fix()
dr.r2.x[[1]].constrain_bounded(-30, 5)
dr.r2.x[[0]].constrain_positive()
In [67]:
dr
Out[67]:
First, we can see that because two models with the same name were added to dr, the framework renamed the second model to give it a unique name. This only happens when two children of the same parent share a name. If two children that are not under the same parent share a name, that is fine, as you can see from the name of x in both models: position.
Second, the constraints are displayed in curly brackets {} if they do not span all underlying parameters. If a constraint spans all parameters, however, it is shown without curly brackets, such as -ve for the first Rosen model.
We can now, just as before, perform all actions paramz supports on this model, as well as on its sub models. For example, we can check the gradients of only one part of the model:
In [69]:
dr.r2.checkgrad(verbose=1)
Out[69]:
Or print only one model:
In [70]:
dr.r1
Out[70]:
We can show that constraints are remapped to each parameter directly. We can either access the constraints of the whole model:
In [71]:
print(dr.constraints)
Or those of individual parameters directly:
In [72]:
print(dr.r2.constraints)
Note that the constraints are remapped to index the parameters locally. This leads directly to the in-memory handling of parameters. The root node of the hierarchy holds one parameter array param_array comprising all parameters. The same goes for the gradient array gradient:
In [73]:
dr.param_array
Out[73]:
Each child parameter (and all parameters below it) has its own view into the memory of the root node:
In [75]:
dr.r2.param_array
Out[75]:
When changing the param_array of a parameter, we directly edit the memory of the root node. This is a big part of what makes paramz efficient, as getting and setting parameters works directly in memory and does not need any Python routines (such as loops or traversal).
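A minimal check of this view behaviour (assuming a numpy version that provides np.shares_memory):

print(np.shares_memory(dr.param_array, dr.r2.param_array))   # the child's array shares memory with the root's param_array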
The constraints described above directly index the param_array of their Parameterized or Param object. That is why the remapping exists.
This param_array has a counterpart for the optimizer, which holds the parameters as remapped by the constraints. The constraints are transformation mappings, which transform the model parameters param_array into the optimizer parameters optimizer_array. This optimizer array is what is presented to the optimizer, and the constraints framework handles the mapping directly.
In [76]:
print(dr.param_array)
print(dr.optimizer_array)
Note that the optimizer array only contains three values. This is because the first element of the first Rosen model is fixed and is not presented to the optimizer. The transformed gradients can be computed by the root node directly:
In [77]:
dr._transform_gradients(dr.gradient)
Out[77]:
These gradients and parameters are presented to the optimizer and mapped back into model parameters after each iteration.
For more details on the paramz framework there is the API documentation hosted here. The source code can be found on GitHub, where discussions and other issues take place in the GitHub issues.