In [1]:
from discrete_probability import *

First we call sympy.init_printing so that SymPy outputs nice MathJax LaTeX.


In [2]:
import sympy
sympy.init_printing()

Now we'll define our simple example with data to use later.


In [3]:
E, H, S = variables = list(map(Variable, 'EHS'))  # list() so variables can be reused below (map returns a one-shot iterator on Python 3)
e, e_ = E
h, h_ = H
s, s_ = S
data_header = [H, S, E]
data_samples = [
    [True, False, True],
    [True, False, True],
    [False, True, False],
    [False, False, True],
    [True, False, False],
    [True, False, True],
    [False, False, False],
    [True, False, True],
    [True, False, True],
    [False, False, True],
    [True, False, True],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [True, False, True],
    [True, False, True]]
data_assignments = data_to_assignments(data_header, data_samples)

A quick example of using SymPy with pyDiscreteProbability is getting exact, simplified fractions for the probabilities. For comparison, here is the table from the basic example, computed with the default float number system.


In [4]:
P = Table(variables)
P.learn_from_complete_data(data_assignments)
P


Out[4]:
H     | S     | E     | P(H,S,E)
---------------------------------
True  | True  | True  | 0.125
True  | True  | False | 0.0
True  | False | True  | 0.5625
True  | False | False | 0.0625
False | True  | True  | 0.0
False | True  | False | 0.0625
False | False | True  | 0.125
False | False | False | 0.0625
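
As an aside (not part of the original notebook), these particular decimals happen to be exact binary fractions, so SymPy can recover the underlying ratios directly from the floats:

sympy.nsimplify(0.5625)  # 9/16
sympy.Rational(0.0625)   # 1/16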

By using the SymPy number system we can let SymPy handle and preserve the original fractions instead of relying on floating point numbers.


In [5]:
P = Table(variables, number_system=sympy_number_system)
P.learn_from_complete_data(data_assignments)
P


Out[5]:
H     | S     | E     | P(H,S,E)
---------------------------------
True  | True  | True  | 1/8
True  | True  | False | 0
True  | False | True  | 9/16
True  | False | False | 1/16
False | True  | True  | 0
False | True  | False | 1/16
False | False | True  | 1/8
False | False | False | 1/16
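
As a quick sanity check (not an original cell), the exact fractions are just sample counts over the 16 data samples; for instance, 9 of the 16 rows are [True, False, True]:

from collections import Counter
counts = Counter(tuple(sample) for sample in data_samples)
sympy.Rational(counts[(True, False, True)], len(data_samples))  # 9/16, matching the H=True, S=False, E=True row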

The number system is preserved through operations as well.


In [6]:
P(E,H)


Out[6]:
H     | E     | P(H,E)
-----------------------
True  | True  | 11/16
True  | False | 1/16
False | True  | 1/8
False | False | 1/8

In [7]:
P(E|S,H)


Out[7]:
H     | S     || E     | P(E|H,S)
----------------------------------
True  | True  || True  | 1
True  | True  || False | 0
True  | False || True  | 9/10
True  | False || False | 1/10
False | True  || True  | 0
False | True  || False | 1
False | False || True  | 2/3
False | False || False | 1/3
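
Each conditional entry is just the corresponding joint probability divided by the marginal over the conditioning variables. As a hand-worked check (not an original cell), for the H=True, S=False rows:

p_joint = sympy.Rational(9, 16)                             # P(H=True, S=False, E=True) from Out[5]
p_marginal = sympy.Rational(9, 16) + sympy.Rational(1, 16)  # P(H=True, S=False)
p_joint / p_marginal                                        # 9/10, matching the table above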

We can also use SymPy to store expressions in our probability table. First we'll define the symbols that will form our probability expressions.


In [8]:
x, y, z = sympy.symbols('x y z')

Now we'll create a simple joint marginal Table. We need to tell it to ignore validity, or else it will raise an error because the Table is not "valid": its probabilities do not sum to one. We'll also tell it to use the SymPy number system; by default, Table uses a float number system.


In [9]:
A, B, C = variables = list(map(Variable, 'ABC'))  # list() for Python 3, where map returns an iterator
a, a_ = A
b, b_ = B
c, c_ = C
P = Table(variables, ignore_validity=True, number_system=sympy_number_system)

Now we'll fill the probability table with some simple expressions as a silly demonstration of using SymPy with the table.


In [10]:
P[a , b , c ] = 0.11
P[a , b , c_] = 0.05+z
P[a , b_, c ] = 0.07
P[a , b_, c_] = 0.21
P[a_, b , c ] = 0.32
P[a_, b , c_] = 0.04
P[a_, b_, c ] = sympy.sqrt(x)
P[a_, b_, c_] = y
P


Out[10]:
A     | C     | B     | P(A,C,B)
---------------------------------
True  | True  | True  | 0.11
True  | True  | False | 0.07
True  | False | True  | z + 0.05
True  | False | False | 0.21
False | True  | True  | 0.32
False | True  | False | sqrt(x)
False | False | True  | 0.04
False | False | False | y
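
Because these entries do not sum to one, the table is not normalized, which is exactly why ignore_validity was needed above. A quick check (not an original cell) shows the symbolic total:

P.total_probability()  # sqrt(x) + y + z + 0.8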

Notice that the table output preserves the expressions thanks to SymPy. Now we'll define some constraints so we can demonstrate SymPy's solver. The following two constraints are simply to fully define the system.


In [11]:
constraint_one = sympy.Eq(P(a, b), sympy.S('0.15'))
constraint_one


Out[11]:
$$\frac{z + 0.16}{\sqrt{x} + y + z + 0.8} = 0.15$$
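
For reference, this value works out to the marginal over $C$ divided by the table's (symbolic) total probability, since the table is not normalized:

$$P(a, b) = \frac{0.11 + (z + 0.05)}{0.11 + (z + 0.05) + 0.07 + 0.21 + 0.32 + 0.04 + \sqrt{x} + y} = \frac{z + 0.16}{\sqrt{x} + y + z + 0.8}$$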

In [12]:
constraint_two = sympy.Eq(x, sympy.S('2*y'))
constraint_two


Out[12]:
$$x = 2 y$$

The constraint below, however, is fundamental: it expresses the second fundamental axiom of probability, which states that the probabilities of all possible world instantiations must sum to one.


In [13]:
second_axiom = sympy.Eq(P.total_probability(), sympy.S('1'))
second_axiom


Out[13]:
$$\sqrt{x} + y + z + 0.8 = 1$$

Now we'll use SymPy's solver to find appropriate values for $x$, $y$, and $z$. I will admit that I couldn't get it to behave with the first fundamental axiom of probability, which requires that the individual world probabilities be greater than or equal to zero. Instead, the example is contrived so that the resulting probabilities come out nonnegative.


In [14]:
soln = sympy.solve([second_axiom, constraint_one, constraint_two])[0]
soln


Out[14]:
$$\begin{Bmatrix}x : 0.036724942437403, & y : 0.0183624712187015, & z : -0.01\end{Bmatrix}$$
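
Since the solver was not asked to enforce the first axiom, it is worth checking by hand that the solution keeps every table entry nonnegative. A small post-hoc check (not an original cell) substitutes the solution back into each entry:

entries = [0.11, 0.05 + z, 0.07, 0.21, 0.32, 0.04, sympy.sqrt(x), y]
all(sympy.S(entry).subs(soln) >= 0 for entry in entries)  # True: every probability is nonnegative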

To demonstrate the validity of the solution we'll redefine the table with it. The normalize_number_system method ensures that the number types all match the table's number system. This means it will convert the Python floats to SymPy floats.


In [15]:
P[a , b , c ] = 0.11
P[a , b , c_] = 0.05+soln[z]
P[a , b_, c ] = 0.07
P[a , b_, c_] = 0.21
P[a_, b , c ] = 0.32
P[a_, b , c_] = 0.04
P[a_, b_, c ] = sympy.sqrt(soln[x])
P[a_, b_, c_] = soln[y]
P.normalize_number_system()
print(P)


A     | C     | B     | P(A,C,B)
---------------------------------
True  | True  | True  | 0.110000000000000
True  | True  | False | 0.0700000000000000
True  | False | True  | 0.0400000000000000
True  | False | False | 0.210000000000000
False | True  | True  | 0.320000000000000
False | True  | False | 0.191637528781298
False | False | True  | 0.0400000000000000
False | False | False | 0.0183624712187015

Now we can check that the solution satisfies the second axiom of probability.


In [16]:
print(P.total_probability())
print(P.is_valid)


1.00000000000000
True