Store and load skopt optimization results

Mikhail Pak, October 2016.


In [1]:
import numpy as np
np.random.seed(777)

Problem statement

We often want to store optimization results in a file. This can be useful, for example,

  • if you want to share your results with colleagues;
  • if you want to archive and/or document your work;
  • or if you want to postprocess your results in a different Python instance or on another computer.

The process of converting an object into a byte stream that can be stored in a file is called serialization. Conversely, deserialization means loading an object from a byte stream.
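
As a minimal illustration of these two terms, the sketch below uses Python's standard pickle module on a plain dictionary. The dictionary and its keys are hypothetical stand-ins for a results object and are not part of skopt.

```python
import pickle

# Hypothetical stand-in for an optimization result (not a real skopt object).
result = {"x": [-0.29], "fun": -0.17, "x_iters": [[0.0], [1.5], [-0.29]]}

# Serialization: convert the object into a byte stream.
payload = pickle.dumps(result)

# Deserialization: rebuild an equal, but independent, object from the bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)   # the byte stream is a bytes object
print(restored == result)       # same content
print(restored is result)       # but a different object in memory
```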

Warning: Deserialization is not secure against malicious or erroneous code. Never load serialized data from untrusted or unauthenticated sources!
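
One possible way to check that a file comes from an authenticated source is to sign the byte stream and verify the signature before deserializing. The sketch below uses only the standard library (hmac, hashlib, pickle). The names SECRET_KEY, sign and safe_loads are hypothetical helpers introduced for illustration; they are not part of skopt or joblib.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice it must be kept private.
SECRET_KEY = b"replace-with-a-real-secret"

def sign(payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the serialized payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()

def safe_loads(payload: bytes, tag: bytes):
    """Deserialize only if the tag matches; otherwise refuse to unpickle."""
    if not hmac.compare_digest(sign(payload), tag):
        raise ValueError("signature mismatch: refusing to deserialize")
    return pickle.loads(payload)

payload = pickle.dumps({"fun": -0.17})
tag = sign(payload)

print(safe_loads(payload, tag))        # untampered data loads normally

try:
    safe_loads(payload + b"x", tag)    # modified data is rejected
except ValueError as err:
    print("rejected:", err)
```

This only shows that the data was produced by someone who knows the key. It does not make deserialization itself safe if the key leaks or the sender is not trustworthy.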

Simple example

We will use the same optimization problem as in the bayesian-optimization.ipynb notebook:


In [2]:
from skopt import gp_minimize

noise_level = 0.1

def obj_fun(x, noise_level=noise_level):
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * noise_level

res = gp_minimize(obj_fun,            # the function to minimize
                  [(-2.0, 2.0)],      # the bounds on each dimension of x
                  x0=[0.],            # the starting point
                  acq_func="LCB",     # the acquisition function (optional)
                  n_calls=15,         # the number of evaluations of f including at x0
                  n_random_starts=0,  # the number of random initialization points
                  random_state=777)

As long as your Python session is active, you can access all the optimization results via the res object.

So how can you store this data in a file? skopt conveniently provides functions skopt.dump() and skopt.load() that handle this for you. These functions are essentially thin wrappers around the joblib module's dump() and load().

We will now show how to use skopt.dump() and skopt.load() for storing and loading results.

Using skopt.dump() and skopt.load()

For storing optimization results into a file, call the skopt.dump() function:


In [3]:
from skopt import dump, load

dump(res, 'result.pkl')

To load the results from the file, call skopt.load():


In [4]:
res_loaded = load('result.pkl')

res_loaded.fun


Out[4]:
-0.17493386623950199

You can fine-tune the serialization and deserialization process by calling skopt.dump() and skopt.load() with additional keyword arguments. See the joblib documentation (dump and load) for the additional parameters.

For instance, you can compress the file. Here the .gz file extension selects gzip compression, and compress=9 requests the highest compression level:


In [5]:
dump(res, 'result.gz', compress=9)

from os.path import getsize
print('Without compression: {} bytes'.format(getsize('result.pkl')))
print('Compressed with gz:  {} bytes'.format(getsize('result.gz')))


Without compression: 66278 bytes
Compressed with gz:  17623 bytes
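
To get a feel for the trade-off behind the compression level, the following sketch compares two gzip levels on a pickled object using only the standard library. It is an analogue for illustration, not what skopt.dump() does internally, and the data dictionary is a hypothetical stand-in for a results object.

```python
import gzip
import pickle

# Hypothetical, fairly repetitive payload standing in for a results object.
data = {"x_iters": [[i / 1000.0] for i in range(5000)],
        "func_vals": [0.0] * 5000}
raw = pickle.dumps(data)

fast = gzip.compress(raw, compresslevel=1)    # fastest, least compression
small = gzip.compress(raw, compresslevel=9)   # slowest, most compression

print("Without compression: {} bytes".format(len(raw)))
print("gzip level 1:        {} bytes".format(len(fast)))
print("gzip level 9:        {} bytes".format(len(small)))

# Decompressing and unpickling restores the original data.
roundtrip = pickle.loads(gzip.decompress(small))
print(roundtrip == data)
```

How much space is saved depends strongly on the data, so it is worth measuring on your own results before settling on a level.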

Unserializable objective functions

Note that if your objective function is non-trivial (e.g. it calls the MATLAB engine from Python), it might not be serializable, and skopt.dump() will raise an exception when you try to store the optimization results. In this case, disable storing the objective function by calling skopt.dump() with the keyword argument store_objective=False:


In [6]:
dump(res, 'result_without_objective.pkl', store_objective=False)

The loaded object no longer has a 'func' entry in specs['args'], whereas the local variable res still does:


In [7]:
res_loaded_without_objective = load('result_without_objective.pkl')

print('Loaded object: ', res_loaded_without_objective.specs['args'].keys())
print('Local variable:', res.specs['args'].keys())


Loaded object:  dict_keys(['verbose', 'base_estimator', 'dimensions', 'n_random_starts', 'n_calls', 'x0', 'n_points', 'callback', 'acq_optimizer', 'n_restarts_optimizer', 'kappa', 'acq_func', 'xi', 'random_state', 'y0'])
Local variable: dict_keys(['dimensions', 'n_random_starts', 'n_calls', 'n_restarts_optimizer', 'kappa', 'func', 'random_state', 'callback', 'verbose', 'x0', 'n_points', 'acq_optimizer', 'base_estimator', 'acq_func', 'xi', 'y0'])
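
The failure mode can be reproduced without skopt. In the sketch below, a lambda plays the role of an objective function that cannot be serialized, and the standard pickle module refuses to store it. The dictionary is a hypothetical stand-in for a results object, and dropping the 'func' key mimics the effect of store_objective=False (here with a shallow copy, for brevity).

```python
import pickle

# Hypothetical stand-in for a results object; the lambda plays the role of an
# objective function that cannot be serialized.
result = {"x": [-0.29], "fun": -0.17, "func": lambda x: x[0] ** 2}

try:
    pickle.dumps(result)
    stored_with_objective = True
except (pickle.PicklingError, AttributeError, TypeError) as err:
    stored_with_objective = False
    print("Could not serialize:", type(err).__name__)

# Serialize everything except the objective function.
without_objective = {k: v for k, v in result.items() if k != "func"}
restored = pickle.loads(pickle.dumps(without_objective))

print(sorted(restored))          # 'func' is absent in the restored object
print("func" in result)          # but still present in the local variable
```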

Possible problems

  • Python version incompatibility: In general, objects serialized in Python 2 cannot be deserialized in Python 3 and vice versa.
  • Security issues: Once again, do not load any files from untrusted sources.
  • Extremely large results objects: If your optimization results object is extremely large, calling skopt.dump() with store_objective=False might cause performance issues, because a deep copy of the results object is created and the objective function is removed from that copy. If you do not need the objective function, you can simply delete it before calling skopt.dump(). In this case, no deep copy is created:

In [8]:
del res.specs['args']['func']

dump(res, 'result_without_objective_2.pkl')
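
The difference between the two approaches can be sketched with plain dictionaries and the standard copy module. The nested dictionary below is a hypothetical stand-in that mirrors the specs['args'] layout used above; it is not a real skopt results object.

```python
import copy

# Hypothetical stand-in for a large results object (not a real skopt object).
result = {"x_iters": [[float(i)] for i in range(10000)],
          "specs": {"args": {"func": len, "n_calls": 15}}}

# Option 1: deep copy, then strip the objective from the copy.
# The original stays intact, but all the data is duplicated in memory.
stripped = copy.deepcopy(result)
del stripped["specs"]["args"]["func"]
print(stripped["x_iters"] is result["x_iters"])   # False: the list was copied
print("func" in result["specs"]["args"])          # True: original untouched

# Option 2: delete the objective in place.
# Nothing is duplicated, but the original object loses its 'func' entry.
del result["specs"]["args"]["func"]
print("func" in result["specs"]["args"])          # False
```

The in-place variant avoids the extra memory and copying time, at the cost of modifying the object you are still working with in the current session.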