Algorithms Exercise 3

Imports


In [14]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

In [15]:
from IPython.html.widgets import interact

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

  • First do a character count and store the result in a dictionary.
  • Then divide each character counts by the total number of character to compute the normalized probabilties.
  • Return the dictionary of characters (keys) and probabilities (values).

In [52]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    # YOUR CODE HERE
    h =    
    for i in s:
        h.append(str(s))
    e = np.array(h)
    g = char_probs(e)
    y = np.frequency(g)

In [53]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)


---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-53-7884295ffbe7> in <module>()
----> 1 test1 = char_probs('aaaa')
      2 assert np.allclose(test1['a'], 1.0)
      3 test2 = char_probs('aabb')
      4 assert np.allclose(test2['a'], 0.5)
      5 assert np.allclose(test2['b'], 0.5)

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     18         h.append(str(s))
     19     e = np.array(h)
---> 20     g = char_probs(e)
     21     y = np.frequency(g)
     22 

<ipython-input-52-687e58d6cf44> in char_probs(s)
     16     h = [s]
     17     for i in s:
---> 18         h.append(str(s))
     19     e = np.array(h)
     20     g = char_probs(e)

/usr/local/lib/python3.4/dist-packages/numpy/core/numeric.py in array_str(a, max_line_width, precision, suppress_small)
   1713 
   1714     """
-> 1715     return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
   1716 
   1717 def set_string_function(f, repr=True):

/usr/local/lib/python3.4/dist-packages/numpy/core/arrayprint.py in array2string(a, max_line_width, precision, suppress_small, separator, prefix, style, formatter)
    452     else:
    453         lst = _array2string(a, max_line_width, precision, suppress_small,
--> 454                             separator, prefix, formatter=formatter)
    455     return lst
    456 

/usr/local/lib/python3.4/dist-packages/numpy/core/arrayprint.py in _array2string(a, max_line_width, precision, suppress_small, separator, prefix, formatter)
    326     lst = _formatArray(a, format_function, len(a.shape), max_line_width,
    327                        next_line_prefix, separator,
--> 328                        _summaryEdgeItems, summary_insert)[:-1]
    329     return lst
    330 

/usr/local/lib/python3.4/dist-packages/numpy/core/arrayprint.py in _formatArray(a, format_function, rank, max_line_len, next_line_prefix, separator, edge_items, summary_insert)
    498             s, line = _extendLine(s, line, word, max_line_len, next_line_prefix)
    499 
--> 500         word = format_function(a[-1])
    501         s, line = _extendLine(s, line, word, max_line_len, next_line_prefix)
    502         s += line + "]\n"

/usr/local/lib/python3.4/dist-packages/numpy/core/arrayprint.py in repr_format(x)
    229 
    230 def repr_format(x):
--> 231     return repr(x)
    232 
    233 def _array2string(a, max_line_width, precision, suppress_small, separator=' ',

KeyboardInterrupt: 

The entropy is a quantiative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = - \Sigma_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.

Write a funtion entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

  • First convert the values (probabilities) of the dict to a Numpy array of probabilities.
  • Then use other Numpy functions (np.log2, etc.) to compute the entropy.
  • Don't use any for or while loops in your code.

In [50]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    # YOUR CODE HERE
    np.array(d)

In [51]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-2b809aa64202> in <module>()
----> 1 assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
      2 assert np.allclose(entropy({'a': 1.0}), 0.0)

/usr/local/lib/python3.4/dist-packages/numpy/core/numeric.py in allclose(a, b, rtol, atol)
   2217     y = array(y, dtype=dtype, copy=False)
   2218 
-> 2219     xinf = isinf(x)
   2220     yinf = isinf(y)
   2221     if any(xinf) or any(yinf):

TypeError: ufunc 'isinf' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.


In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert True # use this for grading the pi digits histogram