Algorithms Exercise 3

Imports


In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

In [2]:
from ipywidgets import interact

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

  • First do a character count and store the result in a dictionary.
  • Then divide each character count by the total number of characters to compute the normalized probabilities.
  • Return the dictionary of characters (keys) and probabilities (values).
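The steps above can also be sketched with the standard library's `collections.Counter`, which does the counting pass in one call. This is a minimal sketch assuming Python 3 (true division); the name `char_probs_counter` is hypothetical and is not part of the exercise.

```python
from collections import Counter

def char_probs_counter(s):
    """Hypothetical variant of char_probs built on collections.Counter."""
    counts = Counter(s)           # character -> number of occurrences
    total = sum(counts.values())  # equals len(s)
    # Normalize each count; an empty string yields an empty dict.
    return {char: n / total for char, n in counts.items()}

print(char_probs_counter('aabb'))  # {'a': 0.5, 'b': 0.5}
```

Iterating over the string directly visits every character, including spaces, so no call to `split` is needed.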

In [12]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    # Count how many times each character occurs.
    counts = {}
    for char in s:
        counts[char] = counts.get(char, 0) + 1

    # Normalize the counts by the total number of characters.
    total = len(s)
    probs = {char: count / total for char, count in counts.items()}
    return probs

In [36]:
?str.splitlines

In [16]:
def count_words(a_string):
    """Return a dictionary mapping each word in a_string to its count."""
    string_dict = {}
    # Split on whitespace, then add 1 to a word's count each time it appears.
    for word in a_string.split():
        string_dict[word] = string_dict.get(word, 0) + 1

    return string_dict

In [17]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)



The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = - \sum_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
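As a quick check of the definition, here are two small distributions chosen purely for illustration: a uniform distribution over two outcomes ($P_1 = P_2 = \tfrac{1}{2}$) and a distribution with a single certain outcome ($P_1 = 1$).

$$H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1, \qquad H = -\left(1 \cdot \log_2 1\right) = 0$$

More generally, a uniform distribution over $N$ outcomes has $H = \log_2 N$, so the entropy grows as the probability is spread over more outcomes.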

Write a function entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

  • First convert the values (probabilities) of the dict to a Numpy array of probabilities.
  • Then use other Numpy functions (np.log2, etc.) to compute the entropy.
  • Don't use any for or while loops in your code.
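One detail the steps above leave open is what happens when a probability is exactly zero: `np.log2(0)` is `-inf`, and `0 * -inf` evaluates to `nan` in NumPy. The sketch below is one possible way to handle that, assuming the usual convention that a zero-probability term contributes nothing to the sum. The name `entropy_safe` is hypothetical and is not required by the exercise.

```python
import numpy as np

def entropy_safe(d):
    """Hypothetical entropy variant that treats 0 * log2(0) as 0."""
    # Convert the dict values to a float array so the math is vectorized.
    p = np.array(list(d.values()), dtype=float)
    # A boolean mask drops zero-probability entries before taking the log.
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log2(nonzero))

print(entropy_safe({'a': 0.5, 'b': 0.5, 'c': 0.0}))  # 1.0
```

The dictionaries produced by char_probs never contain zeros, so the mask only matters for hand-written distributions.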

In [27]:
?np.sum

In [38]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    # Convert the dict values to a NumPy array so the arithmetic is vectorized.
    p = np.array(list(d.values()), dtype=float)
    H = -np.sum(p * np.log2(p))
    return H

In [31]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.


In [40]:
# I am going to go for credit if at all possible here:

def string_entropy(s):
    """Return the entropy of the character probabilities of the string s."""
    return entropy(char_probs(s))

# A string default makes interact render a text box.
interact(string_entropy, s='Text Box');

In [ ]:
assert True # use this for grading the entropy interact