This notebook contains tests for tohu's primitive generators.
In [1]:
import tohu
from tohu.v4.primitive_generators import *
from tohu.v4.dispatch_generators import *
from tohu.v4.utils import print_generated_sequence
In [2]:
print(f'Tohu version: {tohu.__version__}')
Constant simply returns the same constant value every time.
In [3]:
g = Constant('quux')
In [4]:
print_generated_sequence(g, num=10, seed=12345)
Boolean returns either True or False, optionally with different probabilities.
In [5]:
g1 = Boolean()
g2 = Boolean(p=0.8)
In [6]:
print_generated_sequence(g1, num=20, seed=12345)
print_generated_sequence(g2, num=20, seed=99999)
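Below is a minimal sketch of what a Boolean generator does. It uses Python's standard random module rather than tohu. It assumes, though the text above does not say so, that p is the probability of producing True and that 0.5 gives a fair coin. random_booleans is a hypothetical helper.

```python
import random

def random_booleans(num, p=0.5, seed=None):
    """Hypothetical helper: return `num` booleans, each True with probability `p`."""
    rng = random.Random(seed)  # independent, seedable RNG for reproducible output
    # rng.random() is uniform on [0, 1), so the comparison is True with probability p
    return [rng.random() < p for _ in range(num)]

# The same seed always reproduces the same sequence, like seed=... above.
print(random_booleans(20, p=0.8, seed=99999))
```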
Integer returns a random integer between low and high (both inclusive).
In [7]:
g = Integer(low=100, high=200)
In [8]:
print_generated_sequence(g, num=10, seed=12345)
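The standard library's random.randint(a, b) uses the same convention and includes both endpoints. The snippet below is only an illustration, not tohu's implementation.

```python
import random

rng = random.Random(12345)
# randint(a, b) can return both a and b, matching "low and high (both inclusive)".
values = [rng.randint(100, 200) for _ in range(10)]
print(values)
```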
Float returns a random float between low and high (both inclusive).
In [9]:
g = Float(low=2.3, high=4.2)
In [10]:
print_generated_sequence(g, num=10, sep='\n', fmt='.12f', seed=12345)
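A comparable standard-library sketch uses random.uniform. For floats, "both inclusive" is largely nominal. The Python documentation states that uniform's upper endpoint may or may not be produced, depending on floating-point rounding. Whether tohu's Float behaves the same way is an assumption.

```python
import random

rng = random.Random(12345)
# uniform(a, b) computes a + (b - a) * random(); per the Python docs the upper
# endpoint b may or may not be reached, depending on floating-point rounding.
values = [rng.uniform(2.3, 4.2) for _ in range(10)]
for v in values:
    print(f'{v:.12f}')  # same format spec as fmt='.12f' above
```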
HashDigest returns hex strings representing hash digest values (or, alternatively, raw bytes).
In [11]:
g = HashDigest(length=6)
In [12]:
print_generated_sequence(g, num=10, seed=12345)
In [13]:
g = HashDigest(length=6, uppercase=False)
In [14]:
print_generated_sequence(g, num=10, seed=12345)
In [15]:
g = HashDigest(length=10, as_bytes=True)
In [16]:
print_generated_sequence(g, num=5, seed=12345, sep='\n')
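Here is a rough sketch of the idea. It assumes that length counts hex characters and that uppercase output is the default; the uppercase=False example suggests this but does not confirm it. The helpers random_hex_digest and random_digest_bytes are hypothetical. The text above does not say how HashDigest interprets length when as_bytes=True, so the byte helper simply takes a byte count.

```python
import random

def random_hex_digest(rng, length, uppercase=True):
    """Hypothetical helper: a random hex string with exactly `length` characters."""
    value = rng.getrandbits(4 * length)   # 4 random bits per hex character
    digest = f'{value:0{length}x}'        # zero-pad so the width is always `length`
    return digest.upper() if uppercase else digest

def random_digest_bytes(rng, num_bytes):
    """Hypothetical helper: the same idea, returned as raw bytes."""
    return bytes(rng.getrandbits(8) for _ in range(num_bytes))

rng = random.Random(12345)
print([random_hex_digest(rng, 6) for _ in range(3)])
print([random_hex_digest(rng, 6, uppercase=False) for _ in range(3)])
print(random_digest_bytes(rng, 5))
```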
NumpyRandomGenerator can produce random numbers from any of the random distributions supported by numpy.
In [17]:
g1 = NumpyRandomGenerator(method="normal", loc=3.0, scale=5.0)
g2 = NumpyRandomGenerator(method="poisson", lam=30)
g3 = NumpyRandomGenerator(method="exponential", scale=0.3)
In [18]:
g1.reset(seed=12345); print_generated_sequence(g1, num=4)
g2.reset(seed=12345); print_generated_sequence(g2, num=15)
g3.reset(seed=12345); print_generated_sequence(g3, num=4)
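The method name plus keyword arguments (loc, scale, lam) match numpy's distribution methods, which suggests a dispatch on the method name. Below is a minimal sketch of that pattern. It assumes numpy's Generator API (numpy.random.default_rng) and uses a hypothetical helper, sample. tohu may use a different numpy RNG internally, so the numbers will not match the output above.

```python
import numpy as np

def sample(method, num, seed=None, **kwargs):
    """Hypothetical helper: draw `num` values from numpy's `method` distribution."""
    rng = np.random.default_rng(seed)
    draw = getattr(rng, method)  # e.g. rng.normal, rng.poisson, rng.exponential
    return draw(size=num, **kwargs)

print(sample("normal", 4, seed=12345, loc=3.0, scale=5.0))
print(sample("poisson", 15, seed=12345, lam=30))
print(sample("exponential", 4, seed=12345, scale=0.3))
```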
FakerGenerator gives access to any of the methods supported by the faker module. Here are a couple of examples.
In [19]:
g = FakerGenerator(method='name')
In [20]:
print_generated_sequence(g, num=8, seed=12345)
In [21]:
g = FakerGenerator(method='address')
In [22]:
print_generated_sequence(g, num=8, seed=12345, sep='\n---\n')
IterateOver is a generator which simply iterates over a given sequence. Note that once the generator has been exhausted (by iterating over all its elements), it needs to be reset before it can produce elements again.
In [23]:
seq = ['a', 'b', 'c', 'd', 'e']
In [24]:
g = IterateOver(seq)
In [25]:
g.reset()
print([x for x in g])
print([x for x in g])
g.reset()
print([x for x in g])
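The exhaust-then-reset behaviour can be sketched with a plain Python iterator. IterateOverSketch is a hypothetical stand-in, not tohu's class.

```python
class IterateOverSketch:
    """Hypothetical stand-in: iterates over `seq` once, until reset() is called."""

    def __init__(self, seq):
        self.seq = list(seq)
        self.reset()

    def reset(self):
        self._pos = 0  # start again from the first element

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self.seq):
            raise StopIteration  # exhausted; stays exhausted until reset()
        item = self.seq[self._pos]
        self._pos += 1
        return item

g = IterateOverSketch(['a', 'b', 'c', 'd', 'e'])
print([x for x in g])  # ['a', 'b', 'c', 'd', 'e']
print([x for x in g])  # [] (already exhausted)
g.reset()
print([x for x in g])  # ['a', 'b', 'c', 'd', 'e'] again
```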
SelectOne randomly selects one element from a given sequence.
In [26]:
some_items = ['aa', 'bb', 'cc', 'dd', 'ee']
In [27]:
g = SelectOne(some_items)
In [28]:
print_generated_sequence(g, num=30, seed=12345)
By default, all possible values are chosen with equal probability, but this can be changed by passing a distribution as the parameter p.
In [29]:
g = SelectOne(some_items, p=[0.1, 0.05, 0.7, 0.03, 0.12])
In [30]:
print_generated_sequence(g, num=30, seed=99999)
We can see that the item 'cc' has the highest chance of being selected (70%), followed by 'ee' and 'aa' (12% and 10%, respectively).
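Weighted selection of this kind can be sketched with random.choices from the standard library. This is an illustration, not SelectOne's implementation. Over many draws the empirical frequencies approach p, so 'cc' should clearly dominate.

```python
import random
from collections import Counter

items = ['aa', 'bb', 'cc', 'dd', 'ee']
p = [0.1, 0.05, 0.7, 0.03, 0.12]

rng = random.Random(99999)
# choices() draws k items with replacement, weighted by `weights`.
picks = rng.choices(items, weights=p, k=10_000)
print(Counter(picks).most_common())
```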
Timestamp produces random timestamps between a start and end time (both inclusive).
In [31]:
g = Timestamp(start='1998-03-01 00:02:00', end='1998-03-01 00:02:15')
In [32]:
print_generated_sequence(g, num=10, sep='\n', seed=99999)
If start or end are dates of the form YYYY-MM-DD (without the exact HH:MM:SS timestamp), they are interpreted as start='YYYY-MM-DD 00:00:00' and end='YYYY-MM-DD 23:59:59', respectively, i.e. as the beginning and the end of the day.
In [33]:
g = Timestamp(start='2018-02-14', end='2018-02-18')
In [34]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)
For convenience, one can also pass a single date, which produces timestamps on that particular day.
In [35]:
g = Timestamp(date='2018-01-01')
In [36]:
print_generated_sequence(g, num=5, sep='\n', seed=12345)
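The interpretation of date-only bounds described above can be sketched with the datetime module. The helpers parse_bound and random_timestamp are hypothetical, and the sketch assumes whole-second resolution. Passing the same date as both start and end mimics the date= shortcut.

```python
import random
from datetime import datetime, timedelta

def parse_bound(s, is_end):
    """Hypothetical helper: expand 'YYYY-MM-DD' to the start or end of that day."""
    if len(s) == 10:  # date only, no HH:MM:SS part
        s += ' 23:59:59' if is_end else ' 00:00:00'
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S')

def random_timestamp(rng, start, end):
    """Hypothetical helper: random datetime between start and end, both inclusive."""
    lo, hi = parse_bound(start, is_end=False), parse_bound(end, is_end=True)
    offset = rng.randint(0, int((hi - lo).total_seconds()))  # inclusive on both ends
    return lo + timedelta(seconds=offset)

rng = random.Random(12345)
for _ in range(5):
    print(random_timestamp(rng, '2018-02-14', '2018-02-18'))
```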
Note that the generated items are datetime objects (even though they appear as strings when printed above).
In [37]:
g.reset(seed=12345)
[next(g), next(g), next(g)]
Out[37]:
We can use the .strftime() method to create another generator which returns timestamps as strings instead of datetime objects.
In [38]:
h = Timestamp(date='2018-01-01').strftime('%-d %b %Y, %H:%M (%a)')
In [39]:
h.reset(seed=12345)
[next(h), next(h), next(h)]
Out[39]:
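Deriving a string-producing generator from a datetime-producing one amounts to mapping strftime over its output. Below is a sketch with hypothetical helpers random_times_on and formatted. It uses %d rather than %-d because %-d (no zero padding) is a platform-specific extension that is not available on Windows.

```python
import random
from datetime import datetime, timedelta

def random_times_on(date_str, seed=None):
    """Hypothetical helper: endless stream of random datetimes on the given day."""
    rng = random.Random(seed)
    day_start = datetime.strptime(date_str, '%Y-%m-%d')
    while True:
        yield day_start + timedelta(seconds=rng.randint(0, 86399))

def formatted(gen, fmt):
    """Hypothetical helper: a new generator yielding gen's datetimes as strings."""
    return (dt.strftime(fmt) for dt in gen)

h = formatted(random_times_on('2018-01-01', seed=12345), '%d %b %Y, %H:%M (%a)')
print([next(h), next(h), next(h)])
```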
CharString produces random strings of a given length.
In [40]:
g = CharString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)
It is possible to explicitly specify the character set.
In [41]:
g = CharString(length=12, charset="ABCDEFG")
print_generated_sequence(g, num=5, sep='\n', seed=12345)
There are also a few pre-defined character sets.
In [42]:
g1 = CharString(length=12, charset="<lowercase>")
g2 = CharString(length=12, charset="<alphanumeric_uppercase>")
print_generated_sequence(g1, num=5, sep='\n', seed=12345); print()
print_generated_sequence(g2, num=5, sep='\n', seed=12345)
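Here is a standard-library sketch of random fixed-length strings over a character set. The contents of the named character sets are assumptions inferred from their names. CHARSETS and random_char_string are hypothetical.

```python
import random
import string

# Hypothetical mapping, modelled on the named charsets used above.
CHARSETS = {
    '<lowercase>': string.ascii_lowercase,
    '<alphanumeric_uppercase>': string.ascii_uppercase + string.digits,
}

def random_char_string(rng, length, charset):
    """Hypothetical helper: `length` characters drawn uniformly from `charset`."""
    chars = CHARSETS.get(charset, charset)  # fall back to a literal character set
    return ''.join(rng.choices(chars, k=length))

rng = random.Random(12345)
print(random_char_string(rng, 12, 'ABCDEFG'))
print(random_char_string(rng, 12, '<lowercase>'))
print(random_char_string(rng, 12, '<alphanumeric_uppercase>'))
print(random_char_string(rng, 15, string.digits))  # digits only
```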
DigitString is the same as CharString with charset='0123456789'.
In [43]:
g = DigitString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)
Sequential generates sequentially numbered strings with a given prefix.
In [44]:
g = Sequential(prefix='Foo_', digits=3)
Calling reset() on the generator makes the numbering start from 1 again.
In [45]:
g.reset()
print_generated_sequence(g, num=5)
print_generated_sequence(g, num=5)
print()
g.reset()
print_generated_sequence(g, num=5)
Note that the method Sequential.reset() supports the seed argument for consistency with other generators, but its value is ignored: the generator is simply reset to its initial value. This is illustrated here:
In [46]:
g.reset(seed=12345); print_generated_sequence(g, num=5)
g.reset(seed=99999); print_generated_sequence(g, num=5)
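The counter-based behaviour can be sketched as follows, including a reset() that accepts but ignores seed. SequentialSketch is a hypothetical stand-in, not tohu's class.

```python
class SequentialSketch:
    """Hypothetical stand-in: yields prefix + zero-padded counter, starting at 1."""

    def __init__(self, prefix, digits):
        self.prefix = prefix
        self.digits = digits
        self.reset()

    def reset(self, seed=None):
        # `seed` is accepted for interface consistency but ignored.
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._count += 1
        return f'{self.prefix}{self._count:0{self.digits}d}'

g = SequentialSketch(prefix='Foo_', digits=3)
print([next(g) for _ in range(3)])  # ['Foo_001', 'Foo_002', 'Foo_003']
g.reset(seed=99999)
print([next(g) for _ in range(2)])  # ['Foo_001', 'Foo_002']
```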