Algorithms Exercise 3

Imports


In [92]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

In [93]:
from ipywidgets import interact

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

  • First do a character count and store the result in a dictionary.
  • Then divide each character count by the total number of characters to compute the normalized probabilities.
  • Return the dictionary of characters (keys) and probabilities (values).
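
The three steps above can be sketched with `collections.Counter` from the standard library. This is only one possible approach, not the required solution; the name `char_probs_counter` is hypothetical and chosen to avoid clashing with the `char_probs` function defined in this notebook.

```python
from collections import Counter

def char_probs_counter(s):
    """Hypothetical variant of char_probs built on collections.Counter."""
    counts = Counter(s)            # step 1: count each character in one pass
    total = sum(counts.values())   # total number of characters in s
    # step 2: divide each count by the total to get a probability
    return {char: n / total for char, n in counts.items()}

print(char_probs_counter('aabb'))  # {'a': 0.5, 'b': 0.5}
```

Counting in a single pass avoids calling `s.count` once per character, which matters only for long strings.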

In [94]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    # Count each unique character once, then normalize by the string length.
    counts = {c: s.count(c) for c in set(s)}
    probs = {c: n / len(s) for c, n in counts.items()}
    return probs
char_probs('aaaa')


Out[94]:
{'a': 1.0}


In [95]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)

The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = - \sum_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
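
As a quick check of the definition, consider the distributions that appear in the tests in this notebook. For the string 'aabb', $P_a = P_b = \tfrac{1}{2}$, so

$$H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$$

For 'aaaa', $P_a = 1$, so

$$H = -1 \cdot \log_2(1) = 0$$

A uniform distribution over four characters, as in 'abcd', gives $H = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2$. With the base 2 log, these values are in bits.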

Write a function entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

  • First convert the values (probabilities) of the dict to a Numpy array of probabilities.
  • Then use other Numpy functions (np.log2, etc.) to compute the entropy.
  • Don't use any for or while loops in your code.
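
The first step is easy to get wrong in Python 3, so here is a minimal sketch of the conversion followed by the vectorized sum. The distribution `d` below is a made-up example, not one of the test cases, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical example distribution (not taken from the exercise tests).
d = {'a': 0.5, 'b': 0.25, 'c': 0.25}

# In Python 3, d.values() is a view object, so wrap it in list() first;
# np.array(d.values()) would give a 0-dimensional object array instead.
p = np.array(list(d.values()), dtype=float)

# Vectorized entropy: elementwise product, then a single sum (no loops).
H = -np.sum(p * np.log2(p))
print(H)  # 1.5
```

This sketch assumes every probability is strictly positive; a zero entry would make `np.log2` return `-inf` and the product `nan`.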

In [96]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    # Convert the dict's values (the probabilities) to a NumPy array.
    probs = np.array(list(d.values()))
    # Vectorized sum of -P_i * log2(P_i); no explicit loops.
    H = -np.sum(probs * np.log2(probs))
    return H
entropy({'a': 0.5, 'b': 0.5})


Out[96]:
1.0

In [97]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.


In [107]:
def string_entropy(s):
    """Compute the entropy of the character probabilities of the string s."""
    # entropy expects a dict of probabilities, so convert the string first.
    return entropy(char_probs(s))

# A string default value makes interact render a text box.
w = interact(string_entropy, s='Enter String Here')

In [ ]:
assert True # use this for grading the entropy interact