# Discrete Random Variables and Sampling

### George Tzanetakis, University of Victoria

In this notebook we will explore discrete random variables and sampling. After defining a helper class and associated functions we will be able to create both symbolic and numeric random variables and generate samples from them.

We first define a helper random variable class, built on the SciPy discrete random variable functionality, that supports both numeric and symbolic random variables.

``````

In :

%matplotlib inline
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np

``````
``````

In :

class Random_Variable:

    def __init__(self, name, values, probability_distribution):
        self.name = name
        self.values = values
        self.probability_distribution = probability_distribution
        # numeric RV: integer values are passed to scipy directly
        if all(isinstance(item, (int, np.integer)) for item in values):
            self.type = 'numeric'
            self.rv = stats.rv_discrete(name=name, values=(values, probability_distribution))
        # symbolic RV: scipy samples the indices 0..n-1, which are mapped back to symbols
        elif all(isinstance(item, str) for item in values):
            self.type = 'symbolic'
            self.rv = stats.rv_discrete(name=name, values=(np.arange(len(values)), probability_distribution))
            self.symbolic_values = values
        else:
            self.type = 'undefined'

    def sample(self, size):
        if self.type == 'numeric':
            return self.rv.rvs(size=size)
        elif self.type == 'symbolic':
            numeric_samples = self.rv.rvs(size=size)
            # look up the stored symbols rather than a global `values` variable
            mapped_samples = [self.symbolic_values[x] for x in numeric_samples]
            return mapped_samples

``````

Let's first create some random samples of symbolic random variables corresponding to a coin and a die.

``````

In :

values = ['H', 'T']
probabilities = [0.9, 0.1]
coin = Random_Variable('coin', values, probabilities)
samples = coin.sample(20)
print(samples)

``````
``````

['H', 'H', 'H', 'H', 'H', 'T', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'T', 'H', 'H']

``````
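The coin samples above can be summarized by counting how often each symbol occurs. The following is a minimal sketch of that idea. It does not use the `Random_Variable` class: as a stand-in it draws symbols with NumPy's `Generator.choice`, and the seed, the sample size of 1000, and the names `coin_samples` and `rel_freq` are assumptions made for illustration.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
symbols = ['H', 'T']
probs = [0.9, 0.1]

# stand-in for coin.sample(1000): draw symbols with the given probabilities
coin_samples = rng.choice(symbols, size=1000, p=probs)

# count each symbol and convert the counts to relative frequencies
counts = Counter(coin_samples.tolist())
rel_freq = {s: counts[s] / len(coin_samples) for s in symbols}
print(rel_freq)
```

The relative frequencies should land near 0.9 and 0.1, but they will generally not match those values exactly.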
``````

In :

values = ['1', '2', '3', '4', '5', '6']
probabilities = [1/6.] * 6
dice = Random_Variable('dice', values, probabilities)
samples = dice.sample(10)
print(samples)
# list repetition: [1 / 6.] * 3 builds a list holding three copies of 1/6
[1 / 6.] * 3

``````
``````

['6', '3', '3', '3', '6', '6', '4', '3', '3', '6']

Out:

[0.16666666666666666, 0.16666666666666666, 0.16666666666666666]

``````

Now let's look at a numeric random variable corresponding to a die, so that we can more easily make plots and histograms.

``````

In :

values = np.arange(1,7)
probabilities = [1/6.] * 6
dice = Random_Variable('dice', values, probabilities)
samples = dice.sample(100)
plt.stem(samples, markerfmt= ' ')

``````
``````

Out:

<Container object of 3 artists>

``````

Let's now look at a histogram of these generated samples. Notice that the bars do not have equal height even though every face has the same probability, so the calculated frequencies only approximate the probabilities used to generate the samples.

``````

In :

plt.figure()
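# density=True replaces the `normed` argument, which newer matplotlib versions no longer accept
plt.hist(samples, bins=[1,2,3,4,5,6,7], density=True, rwidth=0.5, align='left');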

``````
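To see how the gap between observed frequencies and the true probabilities behaves, one can repeat the experiment with different sample sizes. The sketch below is an illustration under assumptions: it simulates a fair die with NumPy's `Generator.integers` rather than the `Random_Variable` class, and the helper name `relative_frequencies`, the seed, and the chosen sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for repeatability

def relative_frequencies(n):
    """Roll a simulated fair die n times and return the frequency of each face."""
    rolls = rng.integers(1, 7, size=n)             # upper bound is exclusive: faces 1..6
    counts = np.bincount(rolls, minlength=7)[1:]   # drop the unused bin for value 0
    return counts / n

for n in (100, 1000, 100000):
    freqs = relative_frequencies(n)
    max_dev = np.abs(freqs - 1 / 6).max()
    print(n, np.round(freqs, 3), 'max deviation from 1/6:', round(max_dev, 4))
```

The largest deviation from 1/6 typically shrinks as the number of samples grows, although it does so only gradually.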

Let's plot the cumulative histogram of the samples

``````

In :

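# density=True replaces the `normed` argument, which newer matplotlib versions no longer accept
plt.hist(samples, bins=[1,2,3,4,5,6,7], density=True, rwidth=0.5, align='left', cumulative=True);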

``````
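The cumulative histogram is an estimate of the cumulative distribution function (CDF) of the die. As a rough sketch, the empirical CDF can be computed by hand and compared with the theoretical one returned by `rv_discrete.cdf`. The seed, the sample size, and the variable names here are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

faces = np.arange(1, 7)
die = stats.rv_discrete(name='die', values=(faces, [1 / 6.] * 6))

# draw samples with a fixed seed so the run is repeatable
rolls = die.rvs(size=1000, random_state=2)

# empirical CDF: fraction of samples less than or equal to each face
empirical_cdf = np.array([np.mean(rolls <= k) for k in faces])
# theoretical CDF of a fair die: k/6 for k = 1..6
theoretical_cdf = die.cdf(faces)

for k, e, t in zip(faces, empirical_cdf, theoretical_cdf):
    print(k, round(float(e), 3), round(float(t), 3))
```

Both columns are non-decreasing and end at 1, and they differ only by sampling noise.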

Let's now estimate the probability of the event "roll an even number" in two different ways. First, let's count the even numbers in the generated samples directly. Then let's add up the estimated probabilities of the individual outcomes 2, 4 and 6.

``````

In :

# estimate each probability as the fraction of samples satisfying a predicate
est_even = len([x for x in samples if x%2==0]) / len(samples)
est_2 = len([x for x in samples if x==2]) / len(samples)
est_4 = len([x for x in samples if x==4]) / len(samples)
est_6 = len([x for x in samples if x==6]) / len(samples)
print(est_even)
# Let's print some estimates
print('Estimates of 2,4,6 = ', (est_2, est_4, est_6))
print('Direct estimate = ', est_even)
print('Sum of estimates = ', est_2 + est_4 + est_6)
print('Theoretical value = ', 0.5)

``````
``````

0.495
Estimates of 2,4,6 =  (0.156, 0.169, 0.17)
Direct estimate =  0.495
Sum of estimates =  0.495
Theoretical value =  0.5

``````

Notice that we can always estimate the probability of an event by simply counting how many times it occurs in the samples of an experiment. However, if we are interested in multiple events, it can be easier to estimate the probabilities of the individual values of the random variable and then use the rules of probability to estimate the probabilities of more complex events.
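As one illustration of this idea, the sketch below estimates the probability of the compound event "even or greater than 4" in two ways: by counting directly, and by combining per-face estimates with the addition rule P(A or B) = P(A) + P(B) - P(A and B). The choice of event, the seed, the sample size, and the variable names are assumptions made for this example, and the die is simulated with NumPy rather than the `Random_Variable` class.

```python
import numpy as np

rng = np.random.default_rng(3)         # arbitrary seed, only for repeatability
rolls = rng.integers(1, 7, size=1000)  # simulated fair die, faces 1..6

# estimated probability of each individual face
p_hat = {face: np.mean(rolls == face) for face in range(1, 7)}

# direct estimate: count the samples that are even or greater than 4
direct = np.mean((rolls % 2 == 0) | (rolls > 4))

# same event via the addition rule, using only the per-face estimates
p_even = p_hat[2] + p_hat[4] + p_hat[6]
p_gt4 = p_hat[5] + p_hat[6]
p_both = p_hat[6]                      # even and greater than 4 means a 6
via_rule = p_even + p_gt4 - p_both

print('Direct estimate   = ', direct)
print('Via addition rule = ', via_rule)
print('Theoretical value = ', 2 / 3)
```

The two estimates agree up to floating point rounding because they are computed from the same samples, and both approximate the theoretical value of 2/3.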