Algorithms Exercise 3

Imports


In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

In [2]:
from ipywidgets import interact

Character counting and entropy

Write a function char_probs that takes a string and computes the probabilities of each character in the string:

  • First do a character count and store the result in a dictionary.
  • Then divide each character count by the total number of characters to compute the normalized probabilities.
  • Return the dictionary of characters (keys) and probabilities (values).

In [10]:
def char_probs(s):
    """Find the probabilities of the unique characters in the string s.
    
    Parameters
    ----------
    s : str
        A string of characters.
    
    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    # Tally each character into a dictionary of counts.
    counts = {}
    for char in s:
        counts[char] = counts.get(char, 0) + 1
    # Divide each count by the total number of characters to normalize.
    total = len(s)
    return {char: count / total for char, count in counts.items()}
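For reference, the same computation can be written more compactly with collections.Counter from the standard library (an equivalent sketch, not the graded cell above; char_probs_counter is an illustrative name):

from collections import Counter

def char_probs_counter(s):
    # Counter tallies each character; dividing by len(s) normalizes.
    counts = Counter(s)
    return {char: count / len(s) for char, count in counts.items()}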

In [11]:
test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)



The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities $P_i$, the entropy is defined as:

$$H = - \sum_i P_i \log_2(P_i)$$

In this expression $\log_2$ is the base 2 log (np.log2), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
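As a quick check on the formula (and on the test cases below), two equally likely outcomes give

$$H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = \tfrac{1}{2} + \tfrac{1}{2} = 1 \text{ bit},$$

while a single certain outcome gives $H = -1 \cdot \log_2(1) = 0$.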

Write a function entropy that computes the entropy of a probability distribution. The probability distribution will be passed as a Python dict: the values in the dict will be the probabilities.

To compute the entropy, you should:

  • First convert the values (probabilities) of the dict to a NumPy array of probabilities.
  • Then use other NumPy functions (np.log2, etc.) to compute the entropy.
  • Don't use any for or while loops in your code; a vectorized sketch follows this list.
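A minimal vectorized sketch of the approach described above (entropy_sketch is an illustrative name, and the code assumes every probability in the dict is nonzero, since $0 \log_2 0$ would need special handling):

import numpy as np

def entropy_sketch(d):
    # Gather the dict's values into a NumPy array of probabilities.
    p = np.array(list(d.values()))
    # H = -sum_i p_i * log2(p_i), computed without explicit Python loops.
    return -np.sum(p * np.log2(p))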

In [ ]:
def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

Use IPython's interact function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of that string; one possible wiring is sketched after the cell below.


In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
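One possible wiring (a sketch, assuming char_probs and entropy are implemented as above; show_entropy is a hypothetical helper name):

from ipywidgets import interact

def show_entropy(s='abcd'):
    # Guard against the empty string, which has no character probabilities.
    if s:
        print(entropy(char_probs(s)))

# A string default value makes interact render a text box.
interact(show_entropy, s='abcd')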

In [ ]:
assert True # use this for grading the entropy interact