Advanced Usage of Domain

Domain and auxiliary classes (KV, Option, ConfigAlias) are used to define combinations of parameters to try in Research.

We start with some useful imports and a helper function.


In [1]:
import sys
import os
import shutil

import matplotlib
%matplotlib inline

In [2]:
sys.path.append('../../..')

from batchflow import NumpySampler as NS
from batchflow.research import KV, Option, Domain

In [3]:
def drop_repetition(config_alias):
    """Pop 'repetition' from each produced item to shorten the cell output."""
    res = []
    for item in config_alias:
        item.pop_alias('repetition')
        res.append(item)
    return res

Basic usage

Option is a class that holds a parameter name and the values to be used for it in a Research. The values can be given as an array or as a Sampler. An Option can easily be converted to a Domain, which provides an iterator that produces configs.


In [4]:
domain = Domain(Option('p', ['v1', 'v2']))

Each instance of the Domain class has an iterator attribute: a generator that produces configs from the domain.


In [5]:
list(domain.iterator)


Out[5]:
[ConfigAlias({'p': 'v1', 'repetition': '0'}),
 ConfigAlias({'p': 'v2', 'repetition': '0'})]

Each item is a ConfigAlias: a wrapper for Config with the methods config and alias. The config method returns the wrapped Config, and alias returns the corresponding dict with str representations of the values. To set or reset the iterator, use the set_iter method. It also accepts some parameters that are described below. If you access the iterator attribute without calling set_iter first, set_iter is called with default parameters.


In [6]:
domain.set_iter()
config = next(domain.iterator)

config.config(), config.alias()


Out[6]:
(Config({'p': 'v1', 'repetition': 0}), {'p': 'v1', 'repetition': '0'})

An alias is a str representation of a value of the domain. Aliases are needed because they are used as folder names and because they give a more readable representation of configs with non-string values. By default, the alias is the __name__ attribute of the value or, if there is none, its str representation. You can define a custom alias with the KV class.


In [7]:
domain = Domain(Option('p', [ KV('v1', 'alias'), NS]))

config = next(domain.iterator)
print('alias: {:14} value: {}'.format(config.alias()['p'], config.config()['p']))

config = next(domain.iterator)
print('alias: {:14} value: {}'.format(config.alias()['p'], config.config()['p']))


alias: alias          value: v1
alias: NumpySampler   value: <class 'batchflow.sampler.NumpySampler'>
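
The default alias rule described above can be mimicked in a few lines of plain Python. This is only a sketch of the rule as stated in the text, not batchflow's implementation; make_alias and MySampler are hypothetical names introduced for illustration.

```python
def make_alias(value):
    """Hypothetical helper: use __name__ when present, else str(value)."""
    name = getattr(value, '__name__', None)
    return name if name is not None else str(value)


class MySampler:
    """Placeholder class, used only to have an object with __name__."""


print(make_alias(MySampler))  # MySampler
print(make_alias('v1'))       # v1
print(make_alias(0.5))        # 0.5
```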

You can set the number of times each item of the domain is produced with the n_reps parameter of set_iter. Each produced ConfigAlias has a 'repetition' key.


In [8]:
domain.set_iter(n_reps=2)

list(domain.iterator)


Out[8]:
[ConfigAlias({'p': 'alias', 'repetition': '0'}),
 ConfigAlias({'p': 'NumpySampler', 'repetition': '0'}),
 ConfigAlias({'p': 'alias', 'repetition': '1'}),
 ConfigAlias({'p': 'NumpySampler', 'repetition': '1'})]

You can also pass the n_iters parameter to set the number of configs produced by the Domain in each repetition. By default it is equal to the actual number of unique elements.


In [9]:
domain.set_iter(n_iters=3, n_reps=2)

list(domain.iterator)


Out[9]:
[ConfigAlias({'p': 'alias', 'repetition': '0'}),
 ConfigAlias({'p': 'NumpySampler', 'repetition': '0'}),
 ConfigAlias({'p': 'alias', 'repetition': '0'}),
 ConfigAlias({'p': 'alias', 'repetition': '1'}),
 ConfigAlias({'p': 'NumpySampler', 'repetition': '1'}),
 ConfigAlias({'p': 'alias', 'repetition': '1'})]
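
The ordering in the outputs above (cycle through the values until n_iters configs are produced, then replay the same block for each repetition) can be reproduced with the standard library. The function iterate below is a hypothetical stand-in written for this explanation; it assumes the behaviour visible in the outputs and does not use batchflow.

```python
from itertools import cycle, islice


def iterate(name, values, n_iters=None, n_reps=1):
    """Yield alias-like dicts in the order shown in the outputs above."""
    if n_iters is None:
        n_iters = len(values)  # default: one pass over the values
    # Take n_iters values, wrapping around when the list is exhausted.
    block = list(islice(cycle(values), n_iters))
    # Replay the same block once per repetition.
    for repetition in range(n_reps):
        for value in block:
            yield {name: value, 'repetition': str(repetition)}


configs = list(iterate('p', ['alias', 'NumpySampler'], n_iters=3, n_reps=2))
for config in configs:
    print(config)
```

The total number of produced configs in this sketch is n_iters * n_reps, which is 6 here.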

Operations

Multiplication

The resulting Domain produces configs from the Cartesian product of values, that is, all possible combinations of Option values. Here and below we pop the 'repetition' key from configs to simplify the cell output, except in cases where n_reps != 1.


In [10]:
domain = Option('p1', ['v1', 'v2']) * Option('p2', ['v3', 'v4'])

drop_repetition(domain.iterator)


Out[10]:
[ConfigAlias({'p1': 'v1', 'p2': 'v3'}),
 ConfigAlias({'p1': 'v1', 'p2': 'v4'}),
 ConfigAlias({'p1': 'v2', 'p2': 'v3'}),
 ConfigAlias({'p1': 'v2', 'p2': 'v4'})]
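
For comparison, the same combinations can be built with itertools.product from the standard library. This is a plain-Python analogue that uses dicts instead of ConfigAlias objects; the variable names are chosen for this example only.

```python
from itertools import product

p1_values = ['v1', 'v2']
p2_values = ['v3', 'v4']

# Every value of p1 is combined with every value of p2.
combinations = [{'p1': a, 'p2': b} for a, b in product(p1_values, p2_values)]

for combination in combinations:
    print(combination)
```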

Sum

The sum concatenates domains: all configs of the first operand are produced, then all configs of the second one.


In [11]:
domain = Option('p1', ['v1', 'v2']) + Option('p2', ['v3', 'v4'])

drop_repetition(domain.iterator)


Out[11]:
[ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p1': 'v2'}),
 ConfigAlias({'p2': 'v3'}),
 ConfigAlias({'p2': 'v4'})]

@ multiplication

The result is an element-wise pairing of options, like a scalar product: the i-th config combines the i-th values of all options.


In [12]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', ['v5', 'v6'])
domain = op1 @ op2 @ op3

drop_repetition(domain.iterator)


Out[12]:
[ConfigAlias({'p1': 'v1', 'p2': 'v3', 'p3': 'v5'}),
 ConfigAlias({'p1': 'v2', 'p2': 'v4', 'p3': 'v6'})]
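
The pairing above behaves like Python's built-in zip applied to the value lists. The sketch below is an analogue with plain dicts, not batchflow code. Note that zip silently truncates to the shortest list; how Domain treats options of unequal length is not covered here.

```python
names = ['p1', 'p2', 'p3']
values = [['v1', 'v2'], ['v3', 'v4'], ['v5', 'v6']]

# zip(*values) yields ('v1', 'v3', 'v5') and then ('v2', 'v4', 'v6').
paired = [dict(zip(names, row)) for row in zip(*values)]

for config in paired:
    print(config)
```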

You can also combine these operations, because each of them can be applied to the resulting domains.


In [13]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', list(range(2)))
op4 = Option('p4', list(range(3, 5)))

domain = (op1 @ op2 + op3) * op4

drop_repetition(domain.iterator)


Out[13]:
[ConfigAlias({'p1': 'v1', 'p2': 'v3', 'p4': '3'}),
 ConfigAlias({'p1': 'v1', 'p2': 'v3', 'p4': '4'}),
 ConfigAlias({'p1': 'v2', 'p2': 'v4', 'p4': '3'}),
 ConfigAlias({'p1': 'v2', 'p2': 'v4', 'p4': '4'}),
 ConfigAlias({'p3': '0', 'p4': '3'}),
 ConfigAlias({'p3': '0', 'p4': '4'}),
 ConfigAlias({'p3': '1', 'p4': '3'}),
 ConfigAlias({'p3': '1', 'p4': '4'})]
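
To see why the combined expression yields these eight configs, it may help to model a domain as a plain list of dicts with the three operators overloaded. ToyDomain below is a hypothetical teaching class, not part of batchflow, and it ignores aliases, repetitions and weights.

```python
from itertools import product


class ToyDomain:
    """Hypothetical stand-in for Domain: just a list of config dicts."""

    def __init__(self, configs):
        self.configs = list(configs)

    @classmethod
    def option(cls, name, values):
        return cls({name: value} for value in values)

    def __add__(self, other):
        # Sum: configs of the left operand, then configs of the right one.
        return ToyDomain(self.configs + other.configs)

    def __mul__(self, other):
        # Multiplication: Cartesian product of the two config lists.
        return ToyDomain({**a, **b} for a, b in product(self.configs, other.configs))

    def __matmul__(self, other):
        # @: element-wise pairing of the two config lists.
        return ToyDomain({**a, **b} for a, b in zip(self.configs, other.configs))


op1 = ToyDomain.option('p1', ['v1', 'v2'])
op2 = ToyDomain.option('p2', ['v3', 'v4'])
op3 = ToyDomain.option('p3', list(range(2)))
op4 = ToyDomain.option('p4', list(range(3, 5)))

# @ binds tighter than +, so this is ((op1 @ op2) + op3) * op4.
domain = (op1 @ op2 + op3) * op4

for config in domain.configs:
    print(config)
```

The pairing op1 @ op2 gives two configs, the sum with op3 gives four, and the product with the two values of op4 gives eight.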

The size attribute returns the size of the resulting domain.


In [14]:
print(domain.size)


8

Note that size is the total number of produced configs. For example, if you have one Option with three values and call set_iter with n_iters=5 and n_reps=2, then the size will be 10.


In [15]:
domain = Domain(Option('p1', list(range(3))))
domain.set_iter(n_iters=5, n_reps=2)
domain.size


Out[15]:
10

Options with Samplers

Instead of array-like options, you can use Sampler instances as Option values. The iterator then produces independent samples from the domain.


In [16]:
domain = Domain(Option('p1', NS('n')))
domain.set_iter(n_iters=3)

drop_repetition(domain.iterator)


Out[16]:
[ConfigAlias({'p1': '-0.18666036858901167'}),
 ConfigAlias({'p1': '-0.7764567425273557'}),
 ConfigAlias({'p1': '0.4112263968670785'})]

If n_reps > 1, the same samples are repeated in each repetition.


In [17]:
domain.set_iter(n_iters=3, n_reps=2)

list(domain.iterator)


Out[17]:
[ConfigAlias({'p1': '0.6559810584264794', 'repetition': '0'}),
 ConfigAlias({'p1': '-0.9878018244023918', 'repetition': '0'}),
 ConfigAlias({'p1': '0.6922709777654654', 'repetition': '0'}),
 ConfigAlias({'p1': '0.6559810584264794', 'repetition': '1'}),
 ConfigAlias({'p1': '-0.9878018244023918', 'repetition': '1'}),
 ConfigAlias({'p1': '0.6922709777654654', 'repetition': '1'})]

If set_iter is called with n_iters=None, the resulting iterator is infinite.


In [18]:
domain.set_iter(n_iters=None)

print('size: ', domain.size)

for _ in range(5):
    print(next(domain.iterator))


size:  None
ConfigAlias({'p1': '1.1794451621376802', 'repetition': '0'})
ConfigAlias({'p1': '-0.8377350091935624', 'repetition': '0'})
ConfigAlias({'p1': '1.2203144314279863', 'repetition': '0'})
ConfigAlias({'p1': '0.7257703418143013', 'repetition': '0'})
ConfigAlias({'p1': '-0.5393569429280896', 'repetition': '0'})

The repeat_each parameter defines how many elements are taken from the infinite generator before they are repeated (by default, repeat_each=100).


In [19]:
domain.set_iter(n_iters=None, n_reps=2, repeat_each=2)

print('Domain size: {} \n'.format(domain.size))

for _ in range(8):
    print(next(domain.iterator))


Domain size: None 

ConfigAlias({'p1': '-0.8045105032749361', 'repetition': '0'})
ConfigAlias({'p1': '0.49516405159733623', 'repetition': '0'})
ConfigAlias({'p1': '-0.8045105032749361', 'repetition': '1'})
ConfigAlias({'p1': '0.49516405159733623', 'repetition': '1'})
ConfigAlias({'p1': '0.20568893464639518', 'repetition': '0'})
ConfigAlias({'p1': '0.8531878953938405', 'repetition': '0'})
ConfigAlias({'p1': '0.20568893464639518', 'repetition': '1'})
ConfigAlias({'p1': '0.8531878953938405', 'repetition': '1'})
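
The block structure visible in this output (draw repeat_each samples, emit them n_reps times, then draw a new block) can be sketched with a generator. The function infinite_samples is a hypothetical illustration based on the output above; it uses the standard random module rather than NumpySampler.

```python
import random
from itertools import islice


def infinite_samples(sample, n_reps=1, repeat_each=100):
    """Draw `repeat_each` samples, emit them `n_reps` times, then draw again."""
    while True:
        block = [sample() for _ in range(repeat_each)]
        for repetition in range(n_reps):
            for value in block:
                yield {'p1': value, 'repetition': repetition}


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
stream = infinite_samples(lambda: rng.gauss(0, 1), n_reps=2, repeat_each=2)

# The generator never stops, so take a finite slice of it.
first_eight = list(islice(stream, 8))
for config in first_eight:
    print(config)
```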

If you multiply array-like options by sampler options, the resulting iterator produces combinations of the array-like options, each with independent samples from the sampler options.


In [20]:
domain = Option('p1', NS('n')) * Option('p2', NS('u')) * Option('p3', [1, 2, 3])

drop_repetition(domain.iterator)


Out[20]:
[ConfigAlias({'p1': '0.027281320022883283', 'p2': '0.676603672080008', 'p3': '1'}),
 ConfigAlias({'p1': '0.9204453494550416', 'p2': '0.7773722425900274', 'p3': '2'}),
 ConfigAlias({'p1': '-1.5074022911301577', 'p2': '0.470577354404912', 'p3': '3'})]
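
A rough plain-Python picture of this behaviour: iterate over the array-like values and draw a fresh sample for every sampler parameter at each step. The sketch assumes that 'n' and 'u' denote a normal and a uniform distribution, and it uses the standard random module in place of NumpySampler.

```python
import random

rng = random.Random(42)  # fixed seed, for reproducibility of the sketch only
p3_values = [1, 2, 3]

# One config per array-like value, each with its own independent samples.
configs = [
    {'p1': rng.gauss(0, 1), 'p2': rng.random(), 'p3': p3}
    for p3 in p3_values
]

for config in configs:
    print(config)
```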

Domains with Weights

By default, configs are produced sequentially from the options in a sum, from left to right.


In [21]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', ['v5', 'v6'])
domain = op1 + op2 + op3

drop_repetition(domain.iterator)


Out[21]:
[ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p1': 'v2'}),
 ConfigAlias({'p2': 'v3'}),
 ConfigAlias({'p2': 'v4'}),
 ConfigAlias({'p3': 'v5'}),
 ConfigAlias({'p3': 'v6'})]

To sample options from a sum independently with given probabilities, multiply the corresponding options by floats.


In [22]:
domain = 0.3 * op1 + 0.2 * op2 + 0.5 * op3

drop_repetition(domain.iterator)


Out[22]:
[ConfigAlias({'p3': 'v5'}),
 ConfigAlias({'p2': 'v3'}),
 ConfigAlias({'p3': 'v6'}),
 ConfigAlias({'p2': 'v4'}),
 ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p1': 'v2'})]
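
One plausible reading of this output is: at every step pick one of the not yet exhausted options with probability proportional to its weight, then take the next config from it. The sketch below implements that reading in plain Python. It is an assumption about the behaviour, not batchflow's code, and weighted_sum is a hypothetical name.

```python
import random
from collections import deque


def weighted_sum(options, weights, seed=None):
    """Interleave configs, choosing the source option by weight at each step."""
    rng = random.Random(seed)
    # Keep only non-empty options, each paired with its weight.
    pairs = [(deque(option), weight) for option, weight in zip(options, weights) if option]
    result = []
    while pairs:
        current_weights = [weight for _, weight in pairs]
        index = rng.choices(range(len(pairs)), weights=current_weights)[0]
        queue, _ = pairs[index]
        result.append(queue.popleft())
        if not queue:
            del pairs[index]  # an exhausted option no longer takes part
    return result


op1 = [{'p1': 'v1'}, {'p1': 'v2'}]
op2 = [{'p2': 'v3'}, {'p2': 'v4'}]
op3 = [{'p3': 'v5'}, {'p3': 'v6'}]

configs = weighted_sum([op1, op2, op3], [0.3, 0.2, 0.5], seed=0)
for config in configs:
    print(config)
```

In this sketch every config appears exactly once and the order inside each option is preserved; only the interleaving between options is random.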

If you sum options with and without weights,

  • they are split into consecutive groups in which either all options have weights or none do,
  • for each group in turn, configs are generated sequentially (for groups without weights) or sampled as described above (for groups with weights).

In [23]:
domain = op1 + 1.0 * op2 + 1.0 * op3

drop_repetition(domain.iterator)


Out[23]:
[ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p1': 'v2'}),
 ConfigAlias({'p2': 'v3'}),
 ConfigAlias({'p3': 'v5'}),
 ConfigAlias({'p3': 'v6'}),
 ConfigAlias({'p2': 'v4'})]

Thus, we first get all configs from op1, then configs sampled uniformly from op2 and op3. Naturally, if one weight is much larger than the others, we will most likely get all configs from the corresponding option first.


In [24]:
domain = op1 + 1.0 * op2 + 100.0 * op3

drop_repetition(domain.iterator)


Out[24]:
[ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p1': 'v2'}),
 ConfigAlias({'p3': 'v5'}),
 ConfigAlias({'p3': 'v6'}),
 ConfigAlias({'p2': 'v3'}),
 ConfigAlias({'p2': 'v4'})]

Consider a more difficult situation. We will get

  • all configs from options[0]
  • configs will be sampled from 1.2 * options[1] + 2.3 * options[2]
  • all configs from options[3]
  • configs will be sampled from 1.7 * options[4] + 3.4 * options[5]

In [25]:
options = [Option('p'+str(i), ['v'+str(i)]) for i in range(6)]
domain = options[0] + 1.2 * options[1] + 2.3 * options[2] + options[3] + 1.7 * options[4] + 3.4 * options[5]

domain.set_iter(n_iters=12)

drop_repetition(domain.iterator)


Out[25]:
[ConfigAlias({'p0': 'v0'}),
 ConfigAlias({'p2': 'v2'}),
 ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p3': 'v3'}),
 ConfigAlias({'p5': 'v5'}),
 ConfigAlias({'p4': 'v4'}),
 ConfigAlias({'p0': 'v0'}),
 ConfigAlias({'p2': 'v2'}),
 ConfigAlias({'p1': 'v1'}),
 ConfigAlias({'p3': 'v3'}),
 ConfigAlias({'p4': 'v4'}),
 ConfigAlias({'p5': 'v5'})]
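
The grouping rule used in this example (consecutive runs of options that either all have weights or all lack them) can be expressed with itertools.groupby. The terms list below is a hypothetical encoding of the sum from cell 25, where None marks an option without a weight; it only illustrates the grouping step, not the sampling.

```python
from itertools import groupby

# (option name, weight) pairs in the order they appear in the sum.
terms = [('p0', None), ('p1', 1.2), ('p2', 2.3),
         ('p3', None), ('p4', 1.7), ('p5', 3.4)]

# groupby merges only adjacent items with the same key,
# so the left-to-right order of the groups is preserved.
groups = [
    (has_weight, [name for name, _ in group])
    for has_weight, group in groupby(terms, key=lambda term: term[1] is not None)
]

for has_weight, names in groups:
    mode = 'sampled by weight' if has_weight else 'produced sequentially'
    print(names, mode)
```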