Algorithms Exercise 3

Imports


In [83]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

In [84]:
from IPython.html.widgets import interact  # in newer IPython/Jupyter this lives in ipywidgets

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

  • First do a character count and store the result in a dictionary.
  • Then divide each character count by the total number of characters to compute the normalized probabilities.
  • Return the dictionary of characters (keys) and probabilities (values).

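The steps above can be sketched with the standard library's `collections.Counter`, which replaces the manual count dictionary (a sketch only; the helper name `char_probs_sketch` is mine, not part of the exercise):

```python
from collections import Counter

def char_probs_sketch(s):
    """Count characters in s and normalize the counts to probabilities."""
    counts = Counter(s)          # character -> count
    total = len(s)               # total number of characters
    return {ch: n / total for ch, n in counts.items()}

char_probs_sketch('aabb')  # {'a': 0.5, 'b': 0.5}
```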
In [85]:
o = 'ahjshd'
x, y = letter_prob(list(o))
dict(zip(x, y))


Out[85]:
{'a': 0.16666666666666666,
 'd': 0.16666666666666666,
 'h': 0.3333333333333333,
 'j': 0.16666666666666666,
 's': 0.16666666666666666}

In [86]:
def letter_prob(data):
    """Count each character in data and return parallel lists of
    unique characters and their probabilities."""
    letter_dictionary = {}
    for i in data:
        if i not in letter_dictionary:
            letter_dictionary[i] = 1
        else:
            letter_dictionary[i] += 1
    x = list(letter_dictionary)           # unique characters
    y = list(letter_dictionary.values())  # raw counts
    for i in range(len(x)):
        y[i] = y[i] / len(data)           # normalize counts to probabilities
    return x, y

In [87]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    S=list(s)
    letter2, prob2 =letter_prob(S)
    ans=dict(zip(letter2,prob2))
    return ans

In [88]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)

The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = -\sum_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
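As a quick check of the base convention (my example, not part of the exercise): for a fair coin the base-2 entropy is 1 bit, and the natural-log entropy is that value scaled by $\ln 2$:

```python
import numpy as np

p = np.array([0.5, 0.5])            # fair coin distribution
H_bits = -np.sum(p * np.log2(p))    # entropy using the base-2 log: 1.0 bit
H_nats = -np.sum(p * np.log(p))     # entropy using the natural log
# H_nats == H_bits * ln(2)
```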

Write a function entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

  • First convert the values (probabilities) of the dict to a Numpy array of probabilities.
  • Then use other Numpy functions (np.log2, etc.) to compute the entropy.
  • Don't use any for or while loops in your code.
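Those steps can be sketched with NumPy alone, no loops (a minimal sketch; the helper name `entropy_sketch` is mine):

```python
import numpy as np

def entropy_sketch(probs):
    """Entropy in bits of a dict whose values are probabilities."""
    p = np.array(list(probs.values()))  # probabilities as a NumPy array
    return -np.sum(p * np.log2(p))      # H = -sum_i p_i * log2(p_i)

entropy_sketch({'a': 0.5, 'b': 0.5})  # 1.0
```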

In [89]:
entropy({'a': 0.5, 'b': 0.5})


Out[89]:
1.0

In [90]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    # Take the probabilities directly from the dict's values; calling
    # char_probs on the dict would recompute uniform probabilities over
    # the keys and only pass the tests below by coincidence.
    p = np.array(list(d.values()))
    H = -np.sum(p * np.log2(p))
    return H

In [91]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

In [115]:
interact?

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.


In [117]:
def string_entropy(s):
    """Entropy of the character distribution of the string s."""
    return entropy(char_probs(s))

# interact needs a function of the widget argument; passing char_probs as a
# keyword value raises "cannot be transformed to a Widget", so wrap the two
# steps in a single function of s and give s a string default (a text box).
interact(string_entropy, s='type a string here')

In [93]:
assert True # use this for grading the pi digits histogram

In [ ]: