Introduction:

This material has been used in the past to teach colleagues in our group how to use persistable.

The persistable package provides a general loggable superclass that gives Python users a simple way to persist and load calculations and to track the corresponding calculation parameters.

Inheriting from Persistable automatically spools a logger and attaches the PersistLoad object for easy, reproducible data persistence and loading with parameter tracking. The PersistLoad object is based on setting a workingdatapath, a directory within which all persisted data and logs are stored. Such a directory acts as a home for a specific set of experiments.

For more details, read the docs.

Imports:


In [3]:
# Persistable Class:
from persistable import Persistable

# Set a persistable top path:
from pathlib import Path
LOCALDATAPATH = Path('.').absolute()

Instantiate Persistable:

Each persistable object is instantiated with parameters that should uniquely (or nearly uniquely) define the payload.


In [4]:
params = {
    "hello": "world",
    "another_dict": {
        "test": [1,2,3]
    },
    "a": 1,
    "b": 4
}
p = Persistable(
    payload_name="first_payload",
    params=params,
    workingdatapath=LOCALDATAPATH / "knowledgeshare_20170929" # object will live in this local disk location  
)


2018-03-07 12:27:41,709 - Persistable - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/knowledgeshare_20170929)
2018-03-07 12:27:41,712 - Persistable - __init__ - INFO - Payload named first_payload; Parameters set to {'hello': 'world', 'another_dict': {'test': [1, 2, 3]}, 'a': 1, 'b': 4}

Define Payload:

Payloads are defined by overriding the _generate_payload function.

Simply override _generate_payload to give the Persistable object its generate functionality. Note that "generate" here means to create the payload; the term is not meant to indicate that a Python generator is being produced.


In [18]:
# ML Example (pseudocode kept in a string, not meant to run as written):
"""
def _generate_payload(self):
    X = pd.read_csv(self.params['datafile'])
    model = XGboost(X)
    model.fit()
    self.payload['model'] = model
"""

# Silly Example:
def _generate_payload(self):
    self.payload['sum'] = self.params['a'] + self.params['b']
    self.payload['msg'] = self.params['hello']

Now we monkeypatch the payload generator so that it overrides its counterpart on the Persistable object (only necessary because we defined the generator as a standalone function in this notebook rather than inside a subclass).


In [19]:
def bind(instance, method):
    def binding_scope_fn(*args, **kwargs): 
        return method(instance, *args, **kwargs)
    return binding_scope_fn

p._generate_payload = bind(p, _generate_payload)
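As an aside, the standard library offers types.MethodType for binding a function to a single instance, which does the same job as the hand-written bind helper. The sketch below uses a hypothetical stand-in class named Holder rather than Persistable, so that it runs without the package installed.

```python
import types


class Holder:
    """Hypothetical stand-in for a Persistable instance (illustration only)."""

    def __init__(self):
        self.params = {"a": 1, "b": 4}
        self.payload = {}

    def _generate_payload(self):
        raise NotImplementedError


def _generate_payload(self):
    # Same "silly" payload logic as in the cell above.
    self.payload["sum"] = self.params["a"] + self.params["b"]


h = Holder()
# Bind the standalone function to this one instance only.
h._generate_payload = types.MethodType(_generate_payload, h)
h._generate_payload()
print(h.payload)  # {'sum': 5}
```

Only the patched instance is affected; other Holder objects keep the original method.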

In [20]:
p.generate()


2018-03-07 12:37:34,685 - Persistable - generate - INFO - Now generating first_payload payload...
2018-03-07 12:37:34,689 - PersistLoadWithParameters - _persist_with_params - INFO - PERSISTING first_payload to:
 ---> first_payload{a=1,another_dict={test=[1, 2, 3]},b=4,hello='world'}.pkl <---
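The file name in the log above encodes the payload name together with the parameters, with keys in alphabetical order. The following is a minimal sketch that reproduces that name; it is not the package's own code, and the helper names format_params and build_filename are made up for illustration.

```python
def format_params(params):
    """Render a (possibly nested) dict as {key=value,...} with sorted keys."""
    parts = []
    for key in sorted(params):
        value = params[key]
        if isinstance(value, dict):
            # Nested dicts are rendered recursively in the same style.
            parts.append(f"{key}={format_params(value)}")
        else:
            # Other values use their repr, so strings keep their quotes.
            parts.append(f"{key}={value!r}")
    return "{" + ",".join(parts) + "}"


def build_filename(payload_name, params):
    # Hypothetical helper: payload name + formatted parameters + extension.
    return f"{payload_name}{format_params(params)}.pkl"


params = {
    "hello": "world",
    "another_dict": {"test": [1, 2, 3]},
    "a": 1,
    "b": 4,
}
print(build_filename("first_payload", params))
# first_payload{a=1,another_dict={test=[1, 2, 3]},b=4,hello='world'}.pkl
```

Because the keys are sorted, two dicts with the same content but different insertion order map to the same file name in this sketch.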

Persistable as a Super Class:

The equivalent of what we did above, without monkeypatching, is to subclass Persistable:


In [22]:
class SillyPersistableExample(Persistable):
    def _generate_payload(self):
        self.payload['sum'] = self.params['a'] + self.params['b']
        self.payload['msg'] = self.params['hello']
    
p2 = SillyPersistableExample(payload_name="silly_example", params=params, workingdatapath=LOCALDATAPATH / "knowledgeshare_20170929")
p2.generate()


2018-03-07 12:38:36,282 - SillyPersistableExample - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/knowledgeshare_20170929)
2018-03-07 12:38:36,284 - SillyPersistableExample - __init__ - INFO - Payload named silly_example; Parameters set to {'hello': 'world', 'another_dict': {'test': [1, 2, 3]}, 'a': 1, 'b': 4}
2018-03-07 12:38:36,286 - SillyPersistableExample - generate - INFO - Now generating silly_example payload...
2018-03-07 12:38:36,288 - PersistLoadWithParameters - _persist_with_params - INFO - PERSISTING silly_example to:
 ---> silly_example{a=1,another_dict={test=[1, 2, 3]},b=4,hello='world'}.pkl <---
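The subclassing pattern above can be read as a template method: generate() calls the _generate_payload hook and then persists the result. Below is a rough, self-contained sketch of that idea using pickle and a temporary directory. MiniPersistable and SillyMini are hypothetical classes written for this illustration; they are not persistable's implementation, and the file naming is deliberately simplified.

```python
import pickle
import tempfile
from pathlib import Path


class MiniPersistable:
    """Hypothetical stand-in for illustration; not persistable's implementation."""

    def __init__(self, payload_name, params, workingdatapath):
        self.payload_name = payload_name
        self.params = params
        self.workingdatapath = Path(workingdatapath)
        self.payload = {}

    def _generate_payload(self):
        # Hook that subclasses override to fill self.payload.
        raise NotImplementedError

    def _filepath(self):
        # Simplified naming; the real package also encodes the parameters.
        return self.workingdatapath / f"{self.payload_name}.pkl"

    def generate(self):
        # Template method: run the hook, then persist the payload.
        self._generate_payload()
        self.workingdatapath.mkdir(parents=True, exist_ok=True)
        with open(self._filepath(), "wb") as f:
            pickle.dump(self.payload, f)

    def load(self):
        with open(self._filepath(), "rb") as f:
            self.payload = pickle.load(f)


class SillyMini(MiniPersistable):
    def _generate_payload(self):
        self.payload["sum"] = self.params["a"] + self.params["b"]


with tempfile.TemporaryDirectory() as tmp:
    SillyMini("silly", {"a": 1, "b": 4}, tmp).generate()
    reloaded = SillyMini("silly", {"a": 1, "b": 4}, tmp)
    reloaded.load()  # reads the pickle written by generate()
    result = reloaded.payload

print(result)  # {'sum': 5}
```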

Load:


In [24]:
p_test = Persistable(
    "first_payload",
    params=params,
    workingdatapath=LOCALDATAPATH/"knowledgeshare_20170929"
)
p_test.load()


2018-03-07 12:39:31,099 - Persistable - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/knowledgeshare_20170929)
2018-03-07 12:39:31,102 - Persistable - __init__ - INFO - Payload named first_payload; Parameters set to {'hello': 'world', 'another_dict': {'test': [1, 2, 3]}, 'a': 1, 'b': 4}
2018-03-07 12:39:31,104 - Persistable - load - INFO - Now loading first_payload payload...
2018-03-07 12:39:31,106 - PersistLoadWithParameters - load - INFO - Attempting to LOAD first_payload from:
 <--- first_payload{a=1,another_dict={test=[1, 2, 3]},b=4,hello='world'}.pkl --->
2018-03-07 12:39:31,108 - PersistLoadWithParameters - _load_with_params - INFO - Exact first_payload file found and LOADED!

In [25]:
p_test.payload


Out[25]:
defaultdict(<function persistable.util.dict.recdefaultdict>,
            {'msg': 'world', 'sum': 5})
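The output shows that the payload is a defaultdict whose default factory is persistable.util.dict.recdefaultdict. Assuming from the name that this is a recursive defaultdict, the sketch below shows what such a structure allows: nested keys can be assigned without creating the intermediate levels first. The function here is a local re-creation for illustration, not an import from the package.

```python
from collections import defaultdict


def recdefaultdict():
    # Each missing key yields another recursive defaultdict.
    return defaultdict(recdefaultdict)


payload = recdefaultdict()
# Intermediate levels "model" and "metrics" are created on access.
payload["model"]["metrics"]["rmse"] = 0.5
print(payload["model"]["metrics"]["rmse"])  # 0.5
```

One consequence of this design is that merely reading a missing key creates an empty entry, so membership checks with `in` are safer than indexing when testing whether a key exists.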