Algorithms Exercise 3

Imports



In [6]:

    
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np



In [7]:

    
from IPython.html.widgets import interact

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

First do a character count and store the result in a dictionary.
Then divide each character counts by the total number of character to compute the normalized probabilties.
Return the dictionary of characters (keys) and probabilities (values).



In [8]:

    
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    x = []
    for char in s:
        x.append(char)
    r = {x[i]: x.count(x[i])/len(x) for i in range(len(x))}
    return(r)



In [9]:

    
char_probs("abbc")









    Out[9]:





{'a': 0.25, 'b': 0.5, 'c': 0.25}



In [10]:

    
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)

The entropy is a quantiative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = - \Sigma_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.

Write a funtion entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

First convert the values (probabilities) of the dict to a Numpy array of probabilities.
Then use other Numpy functions (np.log2, etc.) to compute the entropy.
Don't use any for or while loops in your code.



In [11]:

    
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    a = np.array(list(d.values()))
    entropy = -(sum(a))*np.log2(a)
    return(entropy)



In [12]:

    
entropy({'a': 0.25, 'b': 0.5, 'c': 0.25})









    Out[12]:





array([ 1.,  2.,  2.])



In [13]:

    
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.



In [14]:

    
from IPython.display import display



In [15]:

    
def new_func(n):
    """Made so interact can take a string and output entropy of char_prob("string")"""
    a = char_probs(n)
    print(entropy(a))



In [16]:

    
interact(new_func, n = "aaa");









    



[ 1.5849625  1.5849625  1.5849625]



In [ ]:

    
assert True # use this for grading the pi digits histogram