In [83]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
In [84]:
from IPython.html.widgets import interact  # deprecated location; in current Jupyter: from ipywidgets import interact
Write a function char_probs that takes a string and computes the probabilities of each character in the string:
In [85]:
# Quick sanity check of letter_prob (defined in the next cell) against a sample string.
o = 'ahjshd'
x, y = letter_prob(list(o))
dict(zip(x, y))
Out[85]:
In [86]:
def letter_prob(data):
    """Return the unique items in data and their relative frequencies."""
    letter_dictionary = {}
    for i in data:
        if i not in letter_dictionary:
            letter_dictionary[i] = 1
        else:
            letter_dictionary[i] = letter_dictionary[i] + 1
    x = list(letter_dictionary)            # unique characters
    y = list(letter_dictionary.values())   # raw counts
    for i in range(len(x)):
        y[i] = y[i] / len(data)            # convert counts to probabilities
    return x, y
In [87]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.

    Parameters
    ----------
    s : str
        A string of characters.

    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    S = list(s)
    letter2, prob2 = letter_prob(S)
    ans = dict(zip(letter2, prob2))
    return ans
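For comparison only, the same probabilities can be computed more compactly with collections.Counter from the standard library; this sketch is not part of the graded solution and the name char_probs_counter is illustrative.
In [ ]:
from collections import Counter

def char_probs_counter(s):
    """Sketch: character probabilities via collections.Counter."""
    counts = Counter(s)                           # character -> count
    n = len(s)
    return {c: k / n for c, k in counts.items()}  # count -> probability

char_probs_counter('aabb')                        # expected: {'a': 0.5, 'b': 0.5}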
In [88]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)
The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:
$$H = -\sum_i P_i \log_2(P_i)$$
In this expression $\log_2$ is the base-2 logarithm (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
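For example, a fair two-outcome distribution with $P_a = P_b = 0.5$ gives $H = -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1$ bit, while a distribution that puts all probability on a single outcome gives $H = 0$.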
Write a function entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.
To compute the entropy, you should:
First convert the values of the dict to a NumPy array of probabilities.
Then use NumPy functions (np.log2, etc.) to compute the entropy.
Don't use any for or while loops in your code.
In [89]:
entropy({'a': 0.5, 'b': 0.5})  # sanity check: a fair two-outcome distribution has entropy 1 bit
Out[89]:
In [90]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    p = np.array(list(d.values()))   # probabilities come directly from the dict values
    H = -np.sum(p * np.log2(p))      # vectorized, no explicit loops
    return H
In [91]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)
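The tests above use only strictly positive probabilities. As an illustrative sketch (not part of the graded solution), a variant that guards against zero entries, using the convention $0 \log_2 0 = 0$, could look like this; the name entropy_nonzero is hypothetical.
In [ ]:
def entropy_nonzero(d):
    """Sketch: entropy that drops zero-probability entries before taking logs."""
    p = np.array(list(d.values()), dtype=float)
    p = p[p > 0]                     # zero entries contribute 0 to H, so drop them before log2
    return -np.sum(p * np.log2(p))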
In [115]:
interact?
Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.
In [117]:
# A text box; the entropy of the typed string is recomputed and displayed on every change.
interact(lambda s: entropy(char_probs(s)), s='')
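Equivalently, interact can be applied as a decorator to a small named helper function (show_entropy is an illustrative name, not part of the original notebook); the body is re-run whenever the text box changes and its return value is displayed.
In [ ]:
@interact(s='type a string')
def show_entropy(s):
    # Entropy of the character distribution of the current text.
    return entropy(char_probs(s))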
In [93]:
assert True # use this for grading the pi digits histogram
In [ ]: