Primitive generators

This notebook contains tests for tohu's primitive generators.


In [1]:
import tohu
from tohu.v4.primitive_generators import *
from tohu.v4.dispatch_generators import *
from tohu.v4.utils import print_generated_sequence

In [2]:
print(f'Tohu version: {tohu.__version__}')


Tohu version: v0.5.1+5.g734d94f.dirty

Constant

Constant returns the same value every time.


In [3]:
g = Constant('quux')

In [4]:
print_generated_sequence(g, num=10, seed=12345)


Generated sequence: quux, quux, quux, quux, quux, quux, quux, quux, quux, quux

Boolean

Boolean returns either True or False, optionally with different probabilities.


In [5]:
g1 = Boolean()
g2 = Boolean(p=0.8)

In [6]:
print_generated_sequence(g1, num=20, seed=12345)
print_generated_sequence(g2, num=20, seed=99999)


Generated sequence: True, True, False, True, True, True, False, True, True, True, False, True, False, True, False, True, False, True, False, True
Generated sequence: True, True, False, True, True, True, True, False, True, False, True, True, True, True, True, True, True, True, False, True
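
The effect of p can be illustrated with a small standard-library sketch. This is not tohu's implementation: `biased_booleans` is a hypothetical helper, and it assumes that p is the probability of returning True, which is consistent with the mostly-True output for p=0.8 above.

```python
import random

def biased_booleans(p=0.5, num=20, seed=None):
    """Return `num` booleans, each True with probability `p` (illustrative only)."""
    rng = random.Random(seed)
    # random() is uniform on [0.0, 1.0), so the comparison holds with probability p.
    return [rng.random() < p for _ in range(num)]

sample = biased_booleans(p=0.8, num=10_000, seed=12345)
print(sum(sample) / len(sample))  # fraction of True values, close to 0.8
```

The values will not match tohu's sequences for the same seed, since the underlying random number streams are unrelated.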

Integer

Integer returns a random integer between low and high (both inclusive).


In [7]:
g = Integer(low=100, high=200)

In [8]:
print_generated_sequence(g, num=10, seed=12345)


Generated sequence: 153, 193, 101, 138, 147, 124, 134, 172, 155, 120
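
To make "both inclusive" concrete, here is a sketch using the standard library's `random.randint`, which also includes both endpoints. It only illustrates the convention and is not how tohu's Integer is implemented.

```python
import random

rng = random.Random(12345)

# randint(a, b) can return a and b themselves, unlike range(a, b) or randrange(a, b).
draws = [rng.randint(1, 3) for _ in range(1000)]

print(min(draws), max(draws))  # both endpoints show up among the draws
```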

Float

Float returns a random float between low and high (both inclusive).


In [9]:
g = Float(low=2.3, high=4.2)

In [10]:
print_generated_sequence(g, num=10, sep='\n', fmt='.12f', seed=12345)


Generated sequence:

3.091577757836
2.319321421968
3.867892367582
2.867415724879
2.999982210028
2.667956563186
3.375415520585
2.607206865466
2.536107080139
3.122578909219

HashDigest

HashDigest returns hex strings representing hash digest values (or alternatively raw bytes).

HashDigest hex strings (uppercase)


In [11]:
g = HashDigest(length=6)

In [12]:
print_generated_sequence(g, num=10, seed=12345)


Generated sequence: E251FB, E52DE1, 1DFDFD, 810876, A44D15, A9AD2D, FE0F5E, 7E5191, 656D56, 224236

HashDigest hex strings (lowercase)


In [13]:
g = HashDigest(length=6, uppercase=False)

In [14]:
print_generated_sequence(g, num=10, seed=12345)


Generated sequence: e251fb, e52de1, 1dfdfd, 810876, a44d15, a9ad2d, fe0f5e, 7e5191, 656d56, 224236

HashDigest byte strings


In [15]:
g = HashDigest(length=10, as_bytes=True)

In [16]:
print_generated_sequence(g, num=5, seed=12345, sep='\n')


Generated sequence:

b'\xe2Q\xfb\xed\xe5-\xe1\xe3\x1d\xfd'
b'\x81\x08v!\xa4M\x15/\xa9\xad'
b'\xfe\x0f^4~Q\x91\xd3em'
b'"B6\x88\x1d\x9eu\x98\x01\xbb'
b'vl\xea\xf6q\xcd@v;\x9d'
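
The hex and byte outputs are two renderings of the same kind of data: every byte corresponds to two hex characters. The sketch below uses only built-in `bytes` methods (no tohu calls) to convert the first byte string printed above into hex notation.

```python
raw = b'\xe2Q\xfb\xed\xe5-\xe1\xe3\x1d\xfd'  # first item of the byte-string output above

hex_lower = raw.hex()          # two lowercase hex characters per byte
hex_upper = hex_lower.upper()  # same digits in the uppercase style

print(hex_upper)                 # E251FBEDE52DE1E31DFD
print(len(raw), len(hex_upper))  # 10 20
```

The first six characters, E251FB, coincide with the first item of the length=6 hex output generated with the same seed.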

NumpyRandomGenerator

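NumpyRandomGenerator produces random numbers using any of the distributions supported by numpy's random module. The distribution is selected via the method argument, and the remaining keyword arguments are the parameters of that distribution.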


In [17]:
g1 = NumpyRandomGenerator(method="normal", loc=3.0, scale=5.0)
g2 = NumpyRandomGenerator(method="poisson", lam=30)
g3 = NumpyRandomGenerator(method="exponential", scale=0.3)

In [18]:
g1.reset(seed=12345); print_generated_sequence(g1, num=4)
g2.reset(seed=12345); print_generated_sequence(g2, num=15)
g3.reset(seed=12345); print_generated_sequence(g3, num=4)


Generated sequence: 1.9764617025764353, 5.394716690287741, 0.40280642471630923, 0.22134847826254989
Generated sequence: 40, 24, 31, 34, 27, 32, 29, 29, 35, 38, 30, 32, 38, 36, 36
Generated sequence: 0.7961371899305246, 0.11410397056571128, 0.060972430042086474, 0.06865806254932436

FakerGenerator

FakerGenerator gives access to any of the methods supported by the faker module. Here are a couple of examples.

Example: random names


In [19]:
g = FakerGenerator(method='name')

In [20]:
print_generated_sequence(g, num=8, seed=12345)


Generated sequence: Adam Bryan, Jacob Lee, Candice Martinez, Justin Thompson, Heather Rubio, William Jenkins, Brittany Ball, Glenn Johnson

Example: random addresses


In [21]:
g = FakerGenerator(method='address')

In [22]:
print_generated_sequence(g, num=8, seed=12345, sep='\n---\n')


Generated sequence:

453 Ryan Islands
Greenstad, FL 97251
---
USS Irwin
FPO AA 66552
---
55075 William Rest
North Elizabeth, NH 38062
---
926 Alexandra Road
Romanberg, HI 99597
---
8202 Michelle Branch
Baileyborough, AL 08481
---
205 William Coves
Alexanderport, WI 72565
---
821 Patricia Hill Apt. 242
Apriltown, MO 24730
---
486 Karen Lodge Apt. 205
West Gregory, MT 33130

IterateOver

IterateOver is a generator which simply iterates over a given sequence. Note that once the generator has been exhausted (by iterating over all its elements), it needs to be reset before it can produce elements again.


In [23]:
seq = ['a', 'b', 'c', 'd', 'e']

In [24]:
g = IterateOver(seq)

In [25]:
g.reset()
print([x for x in g])
print([x for x in g])
g.reset()
print([x for x in g])


['a', 'b', 'c', 'd', 'e']
[]
['a', 'b', 'c', 'd', 'e']
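
The exhaust-then-reset behaviour can be mimicked with a plain Python iterator. The class below is a hypothetical stand-in written for illustration; it is not tohu's IterateOver, and it ignores the seed argument because there is nothing random to seed.

```python
class ResettableIterator:
    """Iterate over a sequence once; call reset() to start over (illustrative only)."""

    def __init__(self, seq):
        self.seq = seq
        self._it = iter(seq)

    def reset(self, seed=None):
        # Create a fresh iterator over the same sequence.
        self._it = iter(self.seq)
        return self

    def __iter__(self):
        return self

    def __next__(self):
        # Raises StopIteration once the underlying iterator is exhausted.
        return next(self._it)

it = ResettableIterator(['a', 'b', 'c'])
print(list(it))  # ['a', 'b', 'c']
print(list(it))  # []
it.reset()
print(list(it))  # ['a', 'b', 'c']
```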

SelectOne

SelectOne returns one randomly chosen item from a given list of values each time it is called.


In [26]:
some_items = ['aa', 'bb', 'cc', 'dd', 'ee']

In [27]:
g = SelectOne(some_items)

In [28]:
print_generated_sequence(g, num=30, seed=12345)


Generated sequence: dd, aa, cc, cc, bb, cc, ee, dd, bb, cc, aa, dd, cc, ee, bb, ee, ee, bb, cc, aa, ee, dd, ee, ee, bb, bb, bb, aa, bb, cc

By default, all possible values are chosen with equal probability, but this can be changed by passing a distribution as the parameter p.


In [29]:
g = SelectOne(some_items, p=[0.1, 0.05, 0.7, 0.03, 0.12])

In [30]:
print_generated_sequence(g, num=30, seed=99999)


Generated sequence: cc, ee, cc, aa, cc, cc, cc, cc, cc, aa, cc, cc, cc, cc, aa, cc, cc, cc, ee, cc, cc, cc, cc, cc, cc, ee, cc, ee, cc, cc

We can see that the item 'cc' has the highest chance of being selected (70%), followed by 'ee' and 'aa' (12% and 10%, respectively).
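
With only 30 samples the observed frequencies are rough. A larger sample makes the distribution easier to see; the sketch below uses the standard library's `random.choices` as a stand-in for weighted selection (it is not tohu's SelectOne, so the sequence differs from the one above).

```python
import random
from collections import Counter

items = ['aa', 'bb', 'cc', 'dd', 'ee']
weights = [0.1, 0.05, 0.7, 0.03, 0.12]  # same distribution as p above

rng = random.Random(99999)
draws = rng.choices(items, weights=weights, k=10_000)

# Compare the empirical frequency of each item with its nominal weight.
counts = Counter(draws)
for item, weight in zip(items, weights):
    print(f"{item}: expected {weight:.2f}, observed {counts[item] / len(draws):.3f}")
```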

Timestamp

Timestamp produces random timestamps between a start and end time (both inclusive).


In [31]:
g = Timestamp(start='1998-03-01 00:02:00', end='1998-03-01 00:02:15')

In [32]:
print_generated_sequence(g, num=10, sep='\n', seed=99999)


Generated sequence:

1998-03-01 00:02:03
1998-03-01 00:02:09
1998-03-01 00:02:07
1998-03-01 00:02:11
1998-03-01 00:02:13
1998-03-01 00:02:06
1998-03-01 00:02:08
1998-03-01 00:02:12
1998-03-01 00:02:06
1998-03-01 00:02:01

If start or end is a date of the form YYYY-MM-DD (without the HH:MM:SS part), it is interpreted as start='YYYY-MM-DD 00:00:00' or end='YYYY-MM-DD 23:59:59', respectively, i.e., as the beginning or the end of that day.


In [33]:
g = Timestamp(start='2018-02-14', end='2018-02-18')

In [34]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)


Generated sequence:

2018-02-16 12:40:28
2018-02-18 10:42:18
2018-02-14 01:28:51
2018-02-18 23:26:47
2018-02-18 20:55:23
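
The date-only rule can be sketched with the standard library as follows. The helpers `parse_bound` and `random_timestamp` are hypothetical names introduced for this illustration, and the one-second resolution is an assumption based on the printed output; tohu's own parsing and sampling may differ.

```python
import random
from datetime import datetime, timedelta

def parse_bound(value, is_end):
    """Parse 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD'; date-only values snap to day start/end."""
    try:
        return datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    except ValueError:
        day = datetime.strptime(value, '%Y-%m-%d')
        return day + timedelta(hours=23, minutes=59, seconds=59) if is_end else day

def random_timestamp(start, end, rng):
    lo, hi = parse_bound(start, is_end=False), parse_bound(end, is_end=True)
    span = int((hi - lo).total_seconds())
    # randint is inclusive on both ends, so lo and hi themselves can be drawn.
    return lo + timedelta(seconds=rng.randint(0, span))

rng = random.Random(12345)
print(parse_bound('2018-02-18', is_end=True))  # 2018-02-18 23:59:59
print(random_timestamp('2018-02-14', '2018-02-18', rng))
```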

For convenience, one can also pass a single date, which produces timestamps that all fall on that date.


In [35]:
g = Timestamp(date='2018-01-01')

In [36]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)


Generated sequence:

2018-01-01 15:10:07
2018-01-01 00:22:12
2018-01-01 10:52:23
2018-01-01 13:24:48
2018-01-01 07:03:03

Note that the generated items are datetime objects (even though they appear as strings when printed above).


In [37]:
g.reset(seed=12345)
[next(g), next(g), next(g)]


Out[37]:
[datetime.datetime(2018, 1, 1, 15, 10, 7),
 datetime.datetime(2018, 1, 1, 0, 22, 12),
 datetime.datetime(2018, 1, 1, 10, 52, 23)]

We can use the .strftime() method to create another generator which returns timestamps as strings instead of datetime objects.


In [38]:
h = Timestamp(date='2018-01-01').strftime('%-d %b %Y, %H:%M (%a)')

In [39]:
h.reset(seed=12345)
[next(h), next(h), next(h)]


Out[39]:
['1 Jan 2018, 15:10 (Mon)',
 '1 Jan 2018, 00:22 (Mon)',
 '1 Jan 2018, 10:52 (Mon)']
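
Note that `%-d` (day of month without a leading zero) is a platform-specific extension of the C library's strftime; it is generally available on Linux and macOS but not on Windows. When post-processing the datetime objects directly, a portable way to get the same string is to format the day as a plain integer. The sketch below uses only the standard library and the first timestamp from Out[37].

```python
from datetime import datetime

ts = datetime(2018, 1, 1, 15, 10, 7)  # first timestamp from Out[37]

# ts.day is a plain int, so no zero padding is applied; the remaining
# directives (%b, %Y, %H, %M, %a) are part of the portable strftime set.
label = f"{ts.day} {ts:%b %Y, %H:%M (%a)}"
print(label)  # 1 Jan 2018, 15:10 (Mon) under the default C locale
```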

CharString

CharString returns random strings of the given length.


In [40]:
g = CharString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)


Generated sequence: bFj7lCDM5eUVwz8, QG5ThX0t5TMklKn, Qule67xq5QaV597, SA4TteJc6OZuDxy, HxzQkefvT0jmCgC
Generated sequence: Ylx3SYjPqrPO0vC, udVUmJ5f2xi6RRv, 8ZYmUYrEgjY5INZ, B9cgzt0nNwfbstm, h84ObqDckapVKgd

It is possible to explicitly specify the character set.


In [41]:
g = CharString(length=12, charset="ABCDEFG")
print_generated_sequence(g, num=5, sep='\n', seed=12345)


Generated sequence:

ADBGBDDEGAFF
CCGEDGFAFFCG
FEBBEBECBAGG
CBGEAFGGGFDG
FCAEAGEFCDCC

There are also a few pre-defined character sets.


In [42]:
g1 = CharString(length=12, charset="<lowercase>")
g2 = CharString(length=12, charset="<alphanumeric_uppercase>")
print_generated_sequence(g1, num=5, sep='\n', seed=12345); print()
print_generated_sequence(g2, num=5, sep='\n', seed=12345)


Generated sequence:

andyelmqybtt
jkzrnytduvhy
tqeepfrifbyz
jgyratyzzslx
sibpayqvimjk

Generated sequence:

ASF8GQRW7C11
NO9YS70E24L7
0WGGVHYMGC78
NJ7YA1798ZP6
0LCUB8X4MRNN
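
One way to picture the named character sets is as a lookup table that falls back to the literal string when the name is not recognised. The sketch below is an assumption-laden illustration: `CHARSETS` and `char_string` are hypothetical, and the contents assigned to each name are guesses based on the output above rather than tohu's actual definitions.

```python
import random
import string

# Assumed contents of the named sets; tohu's real definitions may differ.
CHARSETS = {
    "<lowercase>": string.ascii_lowercase,
    "<alphanumeric_uppercase>": string.ascii_uppercase + string.digits,
}

def char_string(length, charset, rng):
    # Unrecognised names are treated as an explicit set of characters.
    alphabet = CHARSETS.get(charset, charset)
    return "".join(rng.choices(alphabet, k=length))

rng = random.Random(12345)
print(char_string(12, "<lowercase>", rng))
print(char_string(12, "<alphanumeric_uppercase>", rng))
print(char_string(12, "ABCDEFG", rng))
```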

DigitString

DigitString is the same as CharString with charset='0123456789'.


In [43]:
g = DigitString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)


Generated sequence: 051914469077349, 659717839761152, 631099329607999, 749730509683433, 534610037812414
Generated sequence: 813878162266834, 307715908319673, 988278241189568, 490143826300232, 199602401027500

Sequential

Sequential generates a sequence of sequentially numbered strings with a given prefix. The digits argument sets the width to which the number is zero-padded.


In [44]:
g = Sequential(prefix='Foo_', digits=3)

Calling reset() on the generator makes the numbering start from 1 again.


In [45]:
g.reset()
print_generated_sequence(g, num=5)
print_generated_sequence(g, num=5)
print()
g.reset()
print_generated_sequence(g, num=5)


Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
Generated sequence: Foo_006, Foo_007, Foo_008, Foo_009, Foo_010

Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005

Note that the method Sequential.reset() accepts the seed argument for consistency with other generators, but its value is ignored; the generator is simply reset to its initial value. This is illustrated here:


In [46]:
g.reset(seed=12345); print_generated_sequence(g, num=5)
g.reset(seed=99999); print_generated_sequence(g, num=5)


Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
Generated sequence: Foo_001, Foo_002, Foo_003, Foo_004, Foo_005
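
The behaviour shown above (zero-padded numbering that restarts on reset, with the seed ignored) can be reproduced in a few lines of plain Python. `SequentialSketch` is a hypothetical class written for illustration only; it is not tohu's Sequential.

```python
from itertools import count

class SequentialSketch:
    """Yield prefix + zero-padded counter; reset() restarts at 1 (illustrative only)."""

    def __init__(self, prefix, digits):
        self.prefix = prefix
        self.digits = digits
        self.reset()

    def reset(self, seed=None):
        # The seed is accepted for interface compatibility but deliberately unused.
        self._counter = count(start=1)
        return self

    def __iter__(self):
        return self

    def __next__(self):
        return f"{self.prefix}{next(self._counter):0{self.digits}d}"

s = SequentialSketch(prefix='Foo_', digits=3)
print([next(s) for _ in range(3)])  # ['Foo_001', 'Foo_002', 'Foo_003']
s.reset(seed=99999)
print(next(s))  # Foo_001
```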
