Primitive generators

This notebook contains tests for tohu's primitive generators.

In [1]:
import tohu
from tohu.v4.primitive_generators import *
from tohu.v4.dispatch_generators import *
from tohu.v4.utils import print_generated_sequence

In [2]:
print(f'Tohu version: {tohu.__version__}')

Tohu version: v0.5.1+5.g734d94f.dirty


Constant simply returns the same, constant value every time.

In [3]:
g = Constant('quux')

In [4]:
print_generated_sequence(g, num=10, seed=12345)

Generated sequence: quux, quux, quux, quux, quux, quux, quux, quux, quux, quux


Boolean returns either True or False, optionally with different probabilities.

In [5]:
g1 = Boolean()
g2 = Boolean(p=0.8)

In [6]:
print_generated_sequence(g1, num=20, seed=12345)
print_generated_sequence(g2, num=20, seed=99999)

Generated sequence: True, True, False, True, True, True, False, True, True, True, False, True, False, True, False, True, False, True, False, True
Generated sequence: True, True, False, True, True, True, True, False, True, False, True, True, True, True, True, True, True, True, False, True
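
To make the role of p concrete, here is a minimal sketch of one common way to draw a boolean that is True with probability p: compare a uniform random number against p. It uses only the standard library; the function name boolean_sequence is hypothetical and this is not necessarily how tohu implements Boolean, so the values will not match the output above.

```python
import random

def boolean_sequence(p=0.5, num=10, seed=None):
    """Return `num` booleans, each True with probability `p` (hypothetical helper)."""
    rng = random.Random(seed)  # private RNG so that seeding is reproducible
    # rng.random() is uniform on [0, 1), so the comparison is True with probability p.
    return [rng.random() < p for _ in range(num)]

sample = boolean_sequence(p=0.8, num=10_000, seed=99999)
print(sum(sample) / len(sample))  # fraction of True values, expected to be close to 0.8
```

With a large sample the observed fraction of True values should approach p.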


Integer returns a random integer between low and high (both inclusive).

In [7]:
g = Integer(low=100, high=200)

In [8]:
print_generated_sequence(g, num=10, seed=12345)

Generated sequence: 153, 193, 101, 138, 147, 124, 134, 172, 155, 120
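
The "both inclusive" convention is worth spelling out, because random integer functions differ on it. The sketch below uses Python's standard random module (not tohu) to contrast an inclusive draw with a half-open one; the variable names are illustrative only.

```python
import random

rng = random.Random(12345)
low, high = 100, 200

# randint includes both endpoints, matching the "both inclusive" convention.
inclusive = [rng.randint(low, high) for _ in range(10_000)]

# randrange excludes the upper endpoint, so `high` itself can never appear.
half_open = [rng.randrange(low, high) for _ in range(10_000)]

print(min(inclusive), max(inclusive))
print(min(half_open), max(half_open))
```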


Float returns a random float between low and high (both inclusive).

In [9]:
g = Float(low=2.3, high=4.2)

In [10]:
print_generated_sequence(g, num=10, sep='\n', fmt='.12f', seed=12345)

Generated sequence:



HashDigest returns hex strings representing hash digest values (or alternatively raw bytes).

HashDigest hex strings (uppercase)

In [11]:
g = HashDigest(length=6)

In [12]:
print_generated_sequence(g, num=10, seed=12345)

Generated sequence: E251FB, E52DE1, 1DFDFD, 810876, A44D15, A9AD2D, FE0F5E, 7E5191, 656D56, 224236

HashDigest hex strings (lowercase)

In [13]:
g = HashDigest(length=6, uppercase=False)

In [14]:
print_generated_sequence(g, num=10, seed=12345)

Generated sequence: e251fb, e52de1, 1dfdfd, 810876, a44d15, a9ad2d, fe0f5e, 7e5191, 656d56, 224236
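
The two outputs above differ only in letter case because the same seed was used. The following sketch reproduces that behaviour with the standard library. The helper hex_digest is hypothetical, it is not tohu's implementation, and its values will not match the ones printed above.

```python
import random

def hex_digest(length, uppercase=True, rng=random):
    """Return a random string of `length` hexadecimal characters (hypothetical helper)."""
    digits = "0123456789ABCDEF" if uppercase else "0123456789abcdef"
    return "".join(rng.choice(digits) for _ in range(length))

rng = random.Random(12345)
upper = [hex_digest(6, rng=rng) for _ in range(3)]

rng.seed(12345)  # same seed, so the same digit positions are drawn again
lower = [hex_digest(6, uppercase=False, rng=rng) for _ in range(3)]

print(upper)
print(lower)
```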

HashDigest byte strings

In [15]:
g = HashDigest(length=10, as_bytes=True)

In [16]:
print_generated_sequence(g, num=5, seed=12345, sep='\n')

Generated sequence:



NumpyRandomGenerator can produce random numbers using any of the random number generators supported by numpy.

In [17]:
g1 = NumpyRandomGenerator(method="normal", loc=3.0, scale=5.0)
g2 = NumpyRandomGenerator(method="poisson", lam=30)
g3 = NumpyRandomGenerator(method="exponential", scale=0.3)

In [18]:
g1.reset(seed=12345); print_generated_sequence(g1, num=4)
g2.reset(seed=12345); print_generated_sequence(g2, num=15)
g3.reset(seed=12345); print_generated_sequence(g3, num=4)

Generated sequence: 1.9764617025764353, 5.394716690287741, 0.40280642471630923, 0.22134847826254989
Generated sequence: 40, 24, 31, 34, 27, 32, 29, 29, 35, 38, 30, 32, 38, 36, 36
Generated sequence: 0.7961371899305246, 0.11410397056571128, 0.060972430042086474, 0.06865806254932436
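
The idea of selecting a distribution by its method name can be sketched with a name-based lookup. To stay dependency-free, the example below dispatches to methods of Python's random.Random rather than numpy, so the method and parameter names (gauss, expovariate, mu, sigma, lambd) are those of the standard library, and the class RandomMethodGenerator is a hypothetical stand-in, not tohu's implementation.

```python
import random

class RandomMethodGenerator:
    """Hypothetical stand-in: calls a named method of random.Random with fixed keyword arguments."""

    def __init__(self, method, **kwargs):
        self._rng = random.Random()
        self._draw = getattr(self._rng, method)  # AttributeError for unknown method names
        self._kwargs = kwargs

    def reset(self, seed=None):
        self._rng.seed(seed)
        return self

    def __next__(self):
        return self._draw(**self._kwargs)

g1 = RandomMethodGenerator(method="gauss", mu=3.0, sigma=5.0)
g2 = RandomMethodGenerator(method="expovariate", lambd=1 / 0.3)

g1.reset(seed=12345)
print([next(g1) for _ in range(4)])
g2.reset(seed=12345)
print([next(g2) for _ in range(4)])
```

Resetting with the same seed replays the same values, which is the property the cells above rely on.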


FakerGenerator gives access to any of the methods supported by the faker module. Here are a couple of examples.

Example: random names

In [19]:
g = FakerGenerator(method='name')

In [20]:
print_generated_sequence(g, num=8, seed=12345)

Generated sequence: Adam Bryan, Jacob Lee, Candice Martinez, Justin Thompson, Heather Rubio, William Jenkins, Brittany Ball, Glenn Johnson

Example: random addresses

In [21]:
g = FakerGenerator(method='address')

In [22]:
print_generated_sequence(g, num=8, seed=12345, sep='\n---\n')

Generated sequence:

453 Ryan Islands
Greenstad, FL 97251
USS Irwin
FPO AA 66552
55075 William Rest
North Elizabeth, NH 38062
926 Alexandra Road
Romanberg, HI 99597
8202 Michelle Branch
Baileyborough, AL 08481
205 William Coves
Alexanderport, WI 72565
821 Patricia Hill Apt. 242
Apriltown, MO 24730
486 Karen Lodge Apt. 205
West Gregory, MT 33130


IterateOver is a generator which simply iterates over a given sequence. Note that once the generator has been exhausted (by iterating over all its elements), it needs to be reset before it can produce elements again.

In [23]:
seq = ['a', 'b', 'c', 'd', 'e']

In [24]:
g = IterateOver(seq)

In [25]:
print([x for x in g])
print([x for x in g])
g.reset()
print([x for x in g])

['a', 'b', 'c', 'd', 'e']
[]
['a', 'b', 'c', 'd', 'e']
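
The exhaust-then-reset behaviour can be illustrated with a small self-contained class. ResettableIterator is a hypothetical name used only for this sketch; it mimics the behaviour described above and is not tohu's code.

```python
class ResettableIterator:
    """Hypothetical sketch: iterates over a sequence once per reset."""

    def __init__(self, seq):
        self._seq = seq
        self.reset()

    def reset(self, seed=None):
        # `seed` is accepted for a uniform interface but has no effect here.
        self._it = iter(self._seq)
        return self

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)  # raises StopIteration once exhausted

g_sketch = ResettableIterator(['a', 'b', 'c'])
first = list(g_sketch)
second = list(g_sketch)          # exhausted, so nothing is produced
third = list(g_sketch.reset())   # reset makes the elements available again
print(first, second, third)      # ['a', 'b', 'c'] [] ['a', 'b', 'c']
```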


In [26]:
some_items = ['aa', 'bb', 'cc', 'dd', 'ee']

In [27]:
g = SelectOne(some_items)

In [28]:
print_generated_sequence(g, num=30, seed=12345)

Generated sequence: dd, aa, cc, cc, bb, cc, ee, dd, bb, cc, aa, dd, cc, ee, bb, ee, ee, bb, cc, aa, ee, dd, ee, ee, bb, bb, bb, aa, bb, cc

By default, all possible values are chosen with equal probability, but this can be changed by passing a distribution as the parameter p.

In [29]:
g = SelectOne(some_items, p=[0.1, 0.05, 0.7, 0.03, 0.12])

In [30]:
print_generated_sequence(g, num=30, seed=99999)

Generated sequence: cc, ee, cc, aa, cc, cc, cc, cc, cc, aa, cc, cc, cc, cc, aa, cc, cc, cc, ee, cc, cc, cc, cc, cc, cc, ee, cc, ee, cc, cc

We can see that the item 'cc' has the highest chance of being selected (70%), followed by 'ee' and 'aa' (12% and 10%, respectively).
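
To check such a claim empirically, we can draw a larger sample and compare observed frequencies with the weights. This sketch uses random.choices from the standard library as a stand-in for SelectOne, so the individual draws will differ from the sequence above.

```python
import random
from collections import Counter

items = ['aa', 'bb', 'cc', 'dd', 'ee']
p = [0.1, 0.05, 0.7, 0.03, 0.12]

rng = random.Random(99999)
draws = rng.choices(items, weights=p, k=10_000)  # weighted sampling with replacement

counts = Counter(draws)
for item, weight in zip(items, p):
    print(f"{item}: expected {weight:.2f}, observed {counts[item] / len(draws):.3f}")
```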


Timestamp produces random timestamps between a start and end time (both inclusive).

In [31]:
g = Timestamp(start='1998-03-01 00:02:00', end='1998-03-01 00:02:15')

In [32]:
print_generated_sequence(g, num=10, sep='\n', seed=99999)

Generated sequence:

1998-03-01 00:02:03
1998-03-01 00:02:09
1998-03-01 00:02:07
1998-03-01 00:02:11
1998-03-01 00:02:13
1998-03-01 00:02:06
1998-03-01 00:02:08
1998-03-01 00:02:12
1998-03-01 00:02:06
1998-03-01 00:02:01

If start or end is a date of the form YYYY-MM-DD (without the exact HH:MM:SS timestamp), it is interpreted as start='YYYY-MM-DD 00:00:00' or end='YYYY-MM-DD 23:59:59', respectively, i.e. as the beginning or the end of the day.

In [33]:
g = Timestamp(start='2018-02-14', end='2018-02-18')

In [34]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)

Generated sequence:

2018-02-16 12:40:28
2018-02-18 10:42:18
2018-02-14 01:28:51
2018-02-18 23:26:47
2018-02-18 20:55:23
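
The day-boundary rule for bare dates can be sketched with the standard datetime module. The helper parse_bound is hypothetical and only illustrates the interpretation described above; tohu's own parsing may be implemented differently.

```python
import random
from datetime import datetime, time, timedelta

def parse_bound(value, is_end):
    """Parse 'YYYY-MM-DD HH:MM:SS' or a bare 'YYYY-MM-DD' date (hypothetical helper)."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        day = datetime.strptime(value, "%Y-%m-%d").date()
        # A bare date expands to the beginning or the end of that day.
        return datetime.combine(day, time(23, 59, 59) if is_end else time(0, 0, 0))

start = parse_bound("2018-02-14", is_end=False)
end = parse_bound("2018-02-18", is_end=True)
print(start, end)

# Draw one timestamp at whole-second resolution, with both endpoints possible.
span = int((end - start).total_seconds())
rng = random.Random(12345)
sample = start + timedelta(seconds=rng.randint(0, span))
print(sample)
```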

For convenience, one can also pass a single date, which will produce timestamps that fall on that date.

In [35]:
g = Timestamp(date='2018-01-01')

In [36]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)

Generated sequence:

2018-01-01 15:10:07
2018-01-01 00:22:12
2018-01-01 10:52:23
2018-01-01 13:24:48
2018-01-01 07:03:03

Note that the generated items are datetime objects (even though they appear as strings when printed above).

In [37]:
g.reset(seed=12345)
[next(g), next(g), next(g)]

[datetime.datetime(2018, 1, 1, 15, 10, 7),
 datetime.datetime(2018, 1, 1, 0, 22, 12),
 datetime.datetime(2018, 1, 1, 10, 52, 23)]

We can use the .strftime() method to create another generator which returns timestamps as strings instead of datetime objects.

In [38]:
h = Timestamp(date='2018-01-01').strftime('%-d %b %Y, %H:%M (%a)')

In [39]:
h.reset(seed=12345)
[next(h), next(h), next(h)]

['1 Jan 2018, 15:10 (Mon)',
 '1 Jan 2018, 00:22 (Mon)',
 '1 Jan 2018, 10:52 (Mon)']
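
Note that the '%-d' directive (day of month without a leading zero) is a platform-specific extension of strftime and is not available everywhere, notably on Windows. A portable way to get the same string with plain datetime objects is sketched below; format_timestamp is a hypothetical helper, and the month and weekday abbreviations assume the default C locale.

```python
from datetime import datetime

def format_timestamp(dt):
    """Format like '%-d %b %Y, %H:%M (%a)' without relying on '%-d' (hypothetical helper)."""
    # dt.day is a plain int, so it is never zero-padded.
    return f"{dt.day} {dt:%b %Y, %H:%M (%a)}"

timestamps = [datetime(2018, 1, 1, 15, 10, 7), datetime(2018, 1, 1, 0, 22, 12)]
formatted = list(map(format_timestamp, timestamps))
print(formatted)
```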


In [40]:
g = CharString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)

Generated sequence: bFj7lCDM5eUVwz8, QG5ThX0t5TMklKn, Qule67xq5QaV597, SA4TteJc6OZuDxy, HxzQkefvT0jmCgC
Generated sequence: Ylx3SYjPqrPO0vC, udVUmJ5f2xi6RRv, 8ZYmUYrEgjY5INZ, B9cgzt0nNwfbstm, h84ObqDckapVKgd

It is possible to explicitly specify the character set.

In [41]:
g = CharString(length=12, charset="ABCDEFG")
print_generated_sequence(g, num=5, sep='\n', seed=12345)

Generated sequence:


There are also a few pre-defined character sets.

In [42]:
g1 = CharString(length=12, charset="<lowercase>")
g2 = CharString(length=12, charset="<alphanumeric_uppercase>")
print_generated_sequence(g1, num=5, sep='\n', seed=12345); print()
print_generated_sequence(g2, num=5, sep='\n', seed=12345)

Generated sequence:


Generated sequence:
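
One way to picture how named character sets could work is a lookup table that falls back to treating the argument as a literal set of characters. In the sketch below, the contents assigned to "<lowercase>" and "<alphanumeric_uppercase>" are assumptions based on their names, and char_string and NAMED_CHARSETS are hypothetical; tohu's actual definitions may differ.

```python
import random
import string

# Assumed contents of the named character sets; tohu's actual definitions may differ.
NAMED_CHARSETS = {
    "<lowercase>": string.ascii_lowercase,
    "<alphanumeric_uppercase>": string.ascii_uppercase + string.digits,
}

def char_string(length, charset, rng):
    """Return `length` characters drawn uniformly from `charset` (hypothetical helper)."""
    chars = NAMED_CHARSETS.get(charset, charset)  # unknown names are used as literal charsets
    return "".join(rng.choice(chars) for _ in range(length))

rng = random.Random(12345)
print(char_string(12, "ABCDEFG", rng))
print(char_string(12, "<lowercase>", rng))
print(char_string(12, "<alphanumeric_uppercase>", rng))
```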



DigitString is the same as CharString with charset='0123456789'.

In [43]:
g = DigitString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)

Generated sequence: 051914469077349, 659717839761152, 631099329607999, 749730509683433, 534610037812414
Generated sequence: 813878162266834, 307715908319673, 988278241189568, 490143826300232, 199602401027500


Sequential generates a sequence of sequentially numbered strings with a given prefix.

In [44]:
g = Sequential(prefix='Foo_', digits=3)

Calling reset() on the generator makes the numbering start from 1 again.

In [45]:
print_generated_sequence(g, num=5)
print_generated_sequence(g, num=5)
print()
g.reset()
print_generated_sequence(g, num=5)

Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
Generated sequence: Foo_006, Foo_007, Foo_008, Foo_009, Foo_010

Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005

Note that the method Sequential.reset() supports the seed argument for consistency with other generators, but its value is ignored - the generator is simply reset to its initial value. This is illustrated here:

In [46]:
g.reset(seed=12345); print_generated_sequence(g, num=5)
g.reset(seed=99999); print_generated_sequence(g, num=5)

Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
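
The behaviour described above (a zero-padded counter behind a prefix, and a reset that ignores its seed) can be sketched in a few lines. SequentialSketch is a hypothetical class written for illustration and is not tohu's implementation.

```python
class SequentialSketch:
    """Hypothetical sketch of a prefix-plus-counter generator."""

    def __init__(self, prefix, digits):
        self.prefix = prefix
        self.digits = digits
        self.reset()

    def reset(self, seed=None):
        # `seed` is accepted for interface consistency but deliberately ignored.
        self._count = 0
        return self

    def __next__(self):
        self._count += 1
        # Zero-pad the counter to `digits` places, e.g. 7 -> '007' for digits=3.
        return f"{self.prefix}{self._count:0{self.digits}d}"

s = SequentialSketch(prefix='Foo_', digits=3)
first = [next(s) for _ in range(3)]
s.reset(seed=99999)
again = [next(s) for _ in range(3)]
print(first, again)
```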
