Solution to a problem posted here:
Say I have a list of valid X = [1, 2, 3, 4, 5] and a list of valid Y = [1, 2, 3, 4, 5].
I need to generate all combinations of every element in X and every element in Y (in this case, 25) and get those combinations in random order.
This in itself would be simple, but there is an additional requirement: In this random order, there cannot be a repetition of the same x in succession. For example, this is okay:
[1, 3]
[2, 5]
[1, 2]
...
[1, 4]
This is not:
[1, 3]
[1, 2] <== the "1" cannot repeat, because there was already one before
[2, 5]
...
[1, 4]
Now, the least efficient idea would be to simply reshuffle the full set until there are no repetitions. My approach is a bit different: repeatedly create a shuffled variant of X and a list of all X * Y pairs, then pick the next pair from that at random. So far, I've come up with this:
[...]
But I'm sure this can be done even more efficiently or in a more succinct way?
Also, my solution first goes through all X values before continuing with the full set again, which is not perfectly random. I can live with that for my particular application case.
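Before showing my solution, here's a minimal sketch of the naive rejection approach the question mentions (reshuffle the full set until no x repeats). The function name is my own illustration, not code from the question:

import itertools
import random

def pairs_by_rejection(xs, ys):
    # Keep reshuffling the full set of pairs until no two
    # consecutive pairs share an x value. Simple, but each
    # attempt succeeds only with some probability, and this
    # loops forever if no valid ordering exists at all.
    pairs = list(itertools.product(xs, ys))
    while True:
        random.shuffle(pairs)
        if all(a[0] != b[0] for a, b in zip(pairs, pairs[1:])):
            return pairs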
My solution uses NumPy.
In [1]:
import numpy as np
Here are some example values for x and y. I assume that there are no repeated values in x.
In [2]:
xs = np.arange(10, 14)
ys = np.arange(20, 25)
print(xs, ys)
indices is the list of row indices I'll choose from at random:
In [3]:
n = len(xs)
m = len(ys)
indices = np.arange(n)
Now I'll make an array to hold the values of y:
In [4]:
array = np.tile(ys, (n, 1))  # one copy of ys for each x
print(array)
And shuffle the rows independently:
In [5]:
for row in array:
    np.random.shuffle(row)
print(array)
I'll keep track of how many unused ys there are in each row:
In [6]:
counts = np.full_like(xs, m)  # every row starts with m unused ys
print(counts)
Now I'll choose a row, using the counts as weights. Since np.random.choice expects probabilities that sum to 1, I normalize the counts first:
In [7]:
weights = np.array(counts, dtype=float)
weights /= np.sum(weights)
print(weights)
i is the row I chose, which corresponds to a value of x.
In [8]:
i = np.random.choice(indices, p=weights)
print(i)
Now I decrement the counter associated with i and assemble a pair by choosing a value of x and a value of y.
I also clobber the array value I used, which is not necessary, but helps with visualization.
In [9]:
counts[i] -= 1
pair = xs[i], array[i, counts[i]]
array[i, counts[i]] = -1
print(pair)
We can check that the counts got decremented:
In [10]:
print(counts)
And that one of the values in array got used:
In [11]:
print(array)
The next time through is almost the same, except that when we assemble the weights, we give zero weight to the index we just used.
In [12]:
weights = np.array(counts, dtype=float)
weights[i] = 0
weights /= np.sum(weights)
print(weights)
Everything else is the same:
In [13]:
i = np.random.choice(indices, p=weights)
counts[i] -= 1
pair = xs[i], array[i, counts[i]]
array[i, counts[i]] = -1
print(pair)
In [14]:
print(counts)
In [15]:
print(array)
Now we can wrap all that up in a function, using a special value for i during the first iteration.
In [16]:
def generate_pairs(xs, ys):
    n = len(xs)
    m = len(ys)
    indices = np.arange(n)

    # One independently shuffled row of ys for each x.
    array = np.tile(ys, (n, 1))
    for row in array:
        np.random.shuffle(row)

    # Number of unused ys remaining in each row.
    counts = np.full_like(xs, m)

    i = -1  # special value: no row is excluded on the first iteration
    for _ in range(n * m):
        weights = np.array(counts, dtype=float)
        if i != -1:
            weights[i] = 0  # never choose the same row twice in succession
        weights /= np.sum(weights)
        i = np.random.choice(indices, p=weights)
        counts[i] -= 1
        pair = xs[i], array[i, counts[i]]
        array[i, counts[i]] = -1  # clobber the used value (for visualization)
        yield pair
And here's how it works:
In [17]:
for pair in generate_pairs(xs, ys):
    print(pair)
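As a quick sanity check (my addition, not part of the original), we can confirm that one run yields every pair exactly once, with no x repeated in succession. One caveat: this greedy scheme can occasionally dead-end when the only remaining ys belong to the row just used; the weights then sum to zero and np.random.choice raises an error. The counts-based weighting makes that unlikely, but not impossible.

pairs = list(generate_pairs(xs, ys))

# Every one of the n * m pairs appears exactly once...
assert len(pairs) == len(set(pairs)) == len(xs) * len(ys)

# ...and no two consecutive pairs share an x value.
assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:]))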
Inside the loop, we have to copy the weights, add them up, and choose a random index using the weights. These are all linear in n, so the overall complexity to generate all pairs is O(n^2 m). For example, with n = 4 and m = 5 we make n * m = 20 draws, each of which does O(n) work.