This notebook contains tests for tohu's primitive generators.
In [1]:
    
import tohu
from tohu.v4.primitive_generators import *
from tohu.v4.dispatch_generators import *
from tohu.v4.utils import print_generated_sequence
    
In [2]:
    
print(f'Tohu version: {tohu.__version__}')
    
    
Constant simply returns the same, constant value every time.
In [3]:
    
g = Constant('quux')
    
In [4]:
    
print_generated_sequence(g, num=10, seed=12345)
    
    
Boolean returns either True or False, optionally with different probabilities.
In [5]:
    
g1 = Boolean()
g2 = Boolean(p=0.8)
    
In [6]:
    
print_generated_sequence(g1, num=20, seed=12345)
print_generated_sequence(g2, num=20, seed=99999)
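
The effect of the p parameter can be sketched without tohu. This is only an illustration using the standard library, not tohu's implementation; the helper name boolean_sequence is hypothetical, and it assumes p is the probability of drawing True.

```python
import random

def boolean_sequence(p=0.5, num=20, seed=None):
    """Return `num` booleans, each True with probability `p` (assumed meaning of p)."""
    rng = random.Random(seed)  # private RNG, so a given seed always yields the same run
    # rng.random() is uniform on [0, 1), so the comparison is True with probability p.
    return [rng.random() < p for _ in range(num)]

fair = boolean_sequence(num=20, seed=12345)
biased = boolean_sequence(p=0.8, num=20, seed=99999)
print(fair)
print(biased)
```

With p=0.8 roughly four out of five values should be True over a long run; a short run of 20 can deviate noticeably.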
    
    
Integer returns a random integer between low and high (both inclusive).
In [7]:
    
g = Integer(low=100, high=200)
    
In [8]:
    
print_generated_sequence(g, num=10, seed=12345)
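
To make "both inclusive" concrete, here is a small standard-library sketch. It does not use tohu; integer_sequence is a hypothetical helper built on random.randint, which also includes both endpoints.

```python
import random

def integer_sequence(low, high, num=10, seed=None):
    """Return `num` random integers n with low <= n <= high."""
    rng = random.Random(seed)
    # randint includes both endpoints, unlike randrange(low, high).
    return [rng.randint(low, high) for _ in range(num)]

values = integer_sequence(100, 200, num=10, seed=12345)
print(values)

# With low == high there is only one possible value.
print(integer_sequence(5, 5, num=3, seed=0))
```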
    
    
Float returns a random float between low and high (both inclusive).
In [9]:
    
g = Float(low=2.3, high=4.2)
    
In [10]:
    
print_generated_sequence(g, num=10, sep='\n', fmt='.12f', seed=12345)
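
The following standard-library sketch mirrors the call above, including the '.12f' formatting. It is an illustration only: float_sequence is a hypothetical helper, and random.uniform may or may not return the upper endpoint exactly because of floating-point rounding.

```python
import random

def float_sequence(low, high, num=10, seed=None):
    """Return `num` random floats drawn uniformly between low and high."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num)]

values = float_sequence(2.3, 4.2, num=10, seed=12345)

# Format each value with 12 decimal places and print one per line.
lines = [format(x, '.12f') for x in values]
print('\n'.join(lines))
```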
    
    
HashDigest returns hex strings representing hash digest values (or alternatively raw bytes).
In [11]:
    
g = HashDigest(length=6)
    
In [12]:
    
print_generated_sequence(g, num=10, seed=12345)
    
    
In [13]:
    
g = HashDigest(length=6, uppercase=False)
    
In [14]:
    
print_generated_sequence(g, num=10, seed=12345)
    
    
In [15]:
    
g = HashDigest(length=10, as_bytes=True)
    
In [16]:
    
print_generated_sequence(g, num=5, seed=12345, sep='\n')
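
The three variants above (default, uppercase=False, as_bytes=True) can be imitated with the standard library. This sketch rests on assumptions that the notebook does not confirm: that uppercase defaults to True, and that length counts hex characters for strings and bytes for raw output. The function hash_digest is hypothetical.

```python
import random

def hash_digest(rng, length, uppercase=True, as_bytes=False):
    """Return a random hex string of `length` characters, or `length` raw bytes."""
    if as_bytes:
        # Assumption: for raw output, `length` is the number of bytes.
        return bytes(rng.getrandbits(8) for _ in range(length))
    alphabet = "0123456789ABCDEF" if uppercase else "0123456789abcdef"
    return "".join(rng.choice(alphabet) for _ in range(length))

rng = random.Random(12345)
upper = hash_digest(rng, 6)
lower = hash_digest(rng, 6, uppercase=False)
raw = hash_digest(rng, 10, as_bytes=True)
print(upper, lower, raw)
```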
    
    
NumpyRandomGenerator can produce random numbers using any of the random number generators supported by numpy.
In [17]:
    
g1 = NumpyRandomGenerator(method="normal", loc=3.0, scale=5.0)
g2 = NumpyRandomGenerator(method="poisson", lam=30)
g3 = NumpyRandomGenerator(method="exponential", scale=0.3)
    
In [18]:
    
g1.reset(seed=12345); print_generated_sequence(g1, num=4)
g2.reset(seed=12345); print_generated_sequence(g2, num=15)
g3.reset(seed=12345); print_generated_sequence(g3, num=4)
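
The idea of selecting a sampling method by name and forwarding keyword arguments to it can be sketched as follows. This is not tohu's code: the class NamedMethodGenerator is hypothetical, and it uses the standard library's random.Random instead of numpy so that it runs without extra dependencies. The method names therefore differ (normalvariate and expovariate rather than normal and exponential), and expovariate takes a rate, hence lambd=1/0.3 for a scale of 0.3.

```python
import random

class NamedMethodGenerator:
    """Hypothetical stand-in: calls an RNG method chosen by name."""

    def __init__(self, method, **kwargs):
        self.rng = random.Random()
        # Fail early if the name is not a callable attribute of the RNG.
        if not callable(getattr(self.rng, method, None)):
            raise ValueError(f"Unknown method: {method!r}")
        self.method = method
        self.kwargs = kwargs

    def reset(self, seed=None):
        self.rng.seed(seed)
        return self

    def __next__(self):
        # Look up the method by name and forward the stored keyword arguments.
        return getattr(self.rng, self.method)(**self.kwargs)

g1 = NamedMethodGenerator("normalvariate", mu=3.0, sigma=5.0)
g3 = NamedMethodGenerator("expovariate", lambd=1 / 0.3)

g1.reset(seed=12345); print([next(g1) for _ in range(4)])
g3.reset(seed=12345); print([next(g3) for _ in range(4)])
```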
    
    
FakerGenerator gives access to any of the methods supported by the faker module. Here are a couple of examples.
In [19]:
    
g = FakerGenerator(method='name')
    
In [20]:
    
print_generated_sequence(g, num=8, seed=12345)
    
    
In [21]:
    
g = FakerGenerator(method='address')
    
In [22]:
    
print_generated_sequence(g, num=8, seed=12345, sep='\n---\n')
    
    
IterateOver is a generator which simply iterates over a given sequence. Note that once the generator has been exhausted (by iterating over all its elements), it needs to be reset before it can produce elements again.
In [23]:
    
seq = ['a', 'b', 'c', 'd', 'e']
    
In [24]:
    
g = IterateOver(seq)
    
In [25]:
    
g.reset()
print([x for x in g])
print([x for x in g])
g.reset()
print([x for x in g])
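
The exhaust-then-reset behaviour described above can be reproduced with a few lines of plain Python. The class ResettableIterator is a hypothetical sketch, not tohu's IterateOver; it only illustrates why the second pass yields nothing until reset() is called.

```python
class ResettableIterator:
    """Iterates over a sequence once; reset() rewinds it to the start."""

    def __init__(self, seq):
        self.seq = seq
        self.reset()

    def reset(self, seed=None):
        # The seed is accepted only for interface symmetry; iteration is deterministic.
        self._it = iter(self.seq)
        return self

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)  # raises StopIteration once exhausted

g = ResettableIterator(['a', 'b', 'c', 'd', 'e'])
first = list(g)    # all five elements
second = list(g)   # empty: the iterator is exhausted
g.reset()
third = list(g)    # all five elements again
print(first, second, third)
```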
    
    
In [26]:
    
some_items = ['aa', 'bb', 'cc', 'dd', 'ee']
    
In [27]:
    
g = SelectOne(some_items)
    
In [28]:
    
print_generated_sequence(g, num=30, seed=12345)
    
    
SelectOne picks a random item from the given sequence. By default, all items are chosen with equal probability, but this can be changed by passing a distribution as the parameter p.
In [29]:
    
g = SelectOne(some_items, p=[0.1, 0.05, 0.7, 0.03, 0.12])
    
In [30]:
    
print_generated_sequence(g, num=30, seed=99999)
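
Weighted selection of this kind can be sketched with random.choices from the standard library. The helper select_one is hypothetical and is not tohu's implementation; the check that p has one entry per item and sums to 1 is an assumption about what a sensible distribution argument looks like.

```python
import math
import random
from collections import Counter

def select_one(items, p=None, num=30, seed=None):
    """Return `num` items drawn with replacement, optionally weighted by `p`."""
    if p is not None:
        if len(p) != len(items):
            raise ValueError("p must have one probability per item")
        if not math.isclose(sum(p), 1.0):
            raise ValueError("probabilities in p must sum to 1")
    rng = random.Random(seed)
    # weights=None means every item is equally likely.
    return rng.choices(items, weights=p, k=num)

some_items = ['aa', 'bb', 'cc', 'dd', 'ee']
weights = [0.1, 0.05, 0.7, 0.03, 0.12]

sample = select_one(some_items, p=weights, num=30, seed=99999)
print(sample)

# Over a large sample the observed frequencies approach the given weights.
counts = Counter(select_one(some_items, p=weights, num=10000, seed=99999))
print(counts.most_common())
```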
    
    
We can see that the item 'cc' has the highest chance of being selected (70%), followed by 'ee' and 'aa' (12% and 10%, respectively).
Timestamp produces random timestamps between a start and end time (both inclusive).
In [31]:
    
g = Timestamp(start='1998-03-01 00:02:00', end='1998-03-01 00:02:15')
    
In [32]:
    
print_generated_sequence(g, num=10, sep='\n', seed=99999)
    
    
If start or end is a date of the form YYYY-MM-DD (without an HH:MM:SS time), it is interpreted as start='YYYY-MM-DD 00:00:00' or end='YYYY-MM-DD 23:59:59', respectively, i.e. as the beginning or the end of the day.
In [33]:
    
g = Timestamp(start='2018-02-14', end='2018-02-18')
    
In [34]:
    
print_generated_sequence(g, num=5, sep='\n', seed=12345)
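
The rule for date-only bounds can be written out explicitly. The sketch below uses only the standard library; parse_bound and random_timestamp are hypothetical helpers, and drawing at whole-second resolution is an assumption rather than something the notebook states.

```python
import random
from datetime import datetime, timedelta

def parse_bound(text, end_of_day=False):
    """Parse 'YYYY-MM-DD HH:MM:SS', or 'YYYY-MM-DD' expanded to start/end of day."""
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        day = datetime.strptime(text, "%Y-%m-%d")  # midnight, i.e. 00:00:00
        if end_of_day:
            return day + timedelta(hours=23, minutes=59, seconds=59)
        return day

def random_timestamp(rng, start, end):
    """Draw a timestamp between start and end, both inclusive (whole seconds)."""
    lo = parse_bound(start)
    hi = parse_bound(end, end_of_day=True)
    span = int((hi - lo).total_seconds())
    return lo + timedelta(seconds=rng.randint(0, span))

rng = random.Random(12345)
stamps = [random_timestamp(rng, '2018-02-14', '2018-02-18') for _ in range(5)]
for ts in stamps:
    print(ts)
```

Passing the same date as both start and end in this sketch gives timestamps within that single day, which corresponds to the single-date convenience form.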
    
    
For convenience, one can also pass a single date, which will produce timestamps during this particular date.
In [35]:
    
g = Timestamp(date='2018-01-01')
    
In [36]:
    
print_generated_sequence(g, num=5, sep='\n', seed=12345)
    
    
Note that the generated items are datetime objects (even though they appear as strings when printed above).
In [37]:
    
g.reset(seed=12345)
[next(g), next(g), next(g)]
    
    Out[37]:
We can use the .strftime() method to create another generator which returns timestamps as strings instead of datetime objects.
In [38]:
    
h = Timestamp(date='2018-01-01').strftime('%-d %b %Y, %H:%M (%a)')
    
In [39]:
    
h.reset(seed=12345)
[next(h), next(h), next(h)]
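
One caveat about the format string above: '%-d' (day of month without a leading zero) is a platform-specific extension of strftime and is not available everywhere, notably not on Windows. A portable alternative, sketched here with plain datetime objects and a hypothetical helper format_timestamp, is to take the day from the .day attribute and let strftime handle the rest. The month and weekday names assume the default C locale.

```python
from datetime import datetime

def format_timestamp(ts):
    """Format like '%-d %b %Y, %H:%M (%a)' without relying on '%-d'."""
    # ts.day is an int, so it carries no leading zero on any platform.
    return f"{ts.day} {ts:%b %Y, %H:%M (%a)}"

example = format_timestamp(datetime(2018, 1, 1, 9, 5))
print(example)
```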
    
    Out[39]:
In [40]:
    
g = CharString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)
    
    
It is possible to explicitly specify the character set that CharString draws from.
In [41]:
    
g = CharString(length=12, charset="ABCDEFG")
print_generated_sequence(g, num=5, sep='\n', seed=12345)
    
    
There are also a few pre-defined character sets.
In [42]:
    
g1 = CharString(length=12, charset="<lowercase>")
g2 = CharString(length=12, charset="<alphanumeric_uppercase>")
print_generated_sequence(g1, num=5, sep='\n', seed=12345); print()
print_generated_sequence(g2, num=5, sep='\n', seed=12345)
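
A plausible way to support both literal character sets and named ones is a lookup table with a fallback. The sketch below is hypothetical and not taken from tohu; in particular, the expansions of "<lowercase>" and "<alphanumeric_uppercase>" shown here are assumptions based on their names.

```python
import random
import string

# Assumed expansions of the named character sets.
CHARSETS = {
    "<lowercase>": string.ascii_lowercase,
    "<alphanumeric_uppercase>": string.ascii_uppercase + string.digits,
}

def char_string(rng, length, charset):
    """Return a random string of `length` characters drawn from `charset`."""
    # A known name is replaced by its expansion; anything else is used literally.
    chars = CHARSETS.get(charset, charset)
    return "".join(rng.choice(chars) for _ in range(length))

rng = random.Random(12345)
s1 = char_string(rng, 12, "ABCDEFG")
s2 = char_string(rng, 12, "<lowercase>")
s3 = char_string(rng, 12, "<alphanumeric_uppercase>")
print(s1, s2, s3, sep='\n')
```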
    
    
DigitString is the same as CharString with charset='0123456789'.
In [43]:
    
g = DigitString(length=15)
print_generated_sequence(g, num=5, seed=12345)
print_generated_sequence(g, num=5, seed=99999)
    
    
Sequential generates a sequence of sequentially numbered strings with a given prefix.
In [44]:
    
g = Sequential(prefix='Foo_', digits=3)
    
Calling reset() on the generator makes the numbering start from 1 again.
In [45]:
    
g.reset()
print_generated_sequence(g, num=5)
print_generated_sequence(g, num=5)
print()
g.reset()
print_generated_sequence(g, num=5)
    
    
Note that the method Sequential.reset() supports the seed argument for consistency with other generators, but its value is ignored - the generator is simply reset to its initial value. This is illustrated here:
In [46]:
    
g.reset(seed=12345); print_generated_sequence(g, num=5)
g.reset(seed=99999); print_generated_sequence(g, num=5)
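
The reset behaviour described above, including the ignored seed, can be sketched in plain Python. The class SequentialSketch is hypothetical, and the zero-padded output format (for example 'Foo_001' for digits=3) is an assumption about how the digits parameter is used.

```python
class SequentialSketch:
    """Yields prefix + zero-padded counter; reset() restarts the count at 1."""

    def __init__(self, prefix, digits):
        self.prefix = prefix
        self.digits = digits
        self.reset()

    def reset(self, seed=None):
        # The seed is accepted for consistency with other generators but ignored.
        self.count = 0
        return self

    def __next__(self):
        self.count += 1
        return f"{self.prefix}{self.count:0{self.digits}d}"

g = SequentialSketch(prefix='Foo_', digits=3)
run_a = [next(g.reset(seed=12345))] + [next(g) for _ in range(4)]
run_b = [next(g.reset(seed=99999))] + [next(g) for _ in range(4)]
print(run_a)
print(run_b)
```

Both runs are identical because the counter, not the seed, determines the output.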
    
    