# Algorithms Exercise 3

## Imports

``````

In [6]:

%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

``````
``````

In [7]:

# Note: in current Jupyter releases this import lives in ipywidgets (from ipywidgets import interact).
from IPython.html.widgets import interact

``````

## Character counting and entropy

Write a function `char_probs` that takes a string and computes the probability of each character in the string:

• First do a character count and store the result in a dictionary.
• Then divide each character count by the total number of characters to compute the normalized probabilities.
• Return the dictionary of characters (keys) and probabilities (values). One possible approach is sketched just after this list.
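
As a point of reference, here is a minimal sketch of one way to do the counting step with `collections.Counter`; the name `char_probs_counter` is purely illustrative and is not the graded solution below.

``````
from collections import Counter

def char_probs_counter(s):
    # Count occurrences of each character, then normalize by the string length.
    counts = Counter(s)
    return {c: n / len(s) for c, n in counts.items()}

char_probs_counter("abbc")   # {'a': 0.25, 'b': 0.5, 'c': 0.25}
``````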
``````

In [8]:

def char_probs(s):
    """Find the probabilities of the unique characters in the string s.

    Parameters
    ----------
    s : str
        A string of characters.

    Returns
    -------
    probs : dict
        A dictionary whose keys are the unique characters in s and whose values
        are the probabilities of those characters.
    """
    x = []
    for char in s:
        x.append(char)
    # Each unique character becomes a key; its probability is its count divided by the length.
    r = {x[i]: x.count(x[i]) / len(x) for i in range(len(x))}
    return r

``````
``````

In [9]:

char_probs("abbc")

``````
``````

Out[9]:

{'a': 0.25, 'b': 0.5, 'c': 0.25}

``````
``````

In [10]:

test1 = char_probs('aaaa')
assert np.allclose(test1['a'], 1.0)
test2 = char_probs('aabb')
assert np.allclose(test2['a'], 0.5)
assert np.allclose(test2['b'], 0.5)
test3 = char_probs('abcd')
assert np.allclose(test3['a'], 0.25)
assert np.allclose(test3['b'], 0.25)
assert np.allclose(test3['c'], 0.25)
assert np.allclose(test3['d'], 0.25)

``````

The entropy is a quantitative measure of the disorder of a probability distribution. It is used extensively in Physics, Statistics, Machine Learning, Computer Science and Information Science. Given a set of probabilities \$P_i\$, the entropy is defined as:

\$\$H = - \sum_i P_i \log_2(P_i)\$\$

In this expression \$\log_2\$ is the base 2 log (`np.log2`), which is commonly used in information science. In Physics the natural log is often used in the definition of entropy.
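
As a quick worked example, the distribution \$P = (0.25, 0.5, 0.25)\$ has entropy

\$\$H = -\left(\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5,\$\$

which is the value `entropy` should return for `{'a': 0.25, 'b': 0.5, 'c': 0.25}` below.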

Write a function `entropy` that computes the entropy of a probability distribution. The probability distribution will be passed as a Python `dict`: the values in the `dict` will be the probabilities.

To compute the entropy, you should:

• First convert the values (probabilities) of the `dict` to a Numpy array of probabilities.
• Then use other Numpy functions (`np.log2`, etc.) to compute the entropy (the vectorized pattern is sketched just after this list).
• Don't use any `for` or `while` loops in your code.
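
As an illustration of that vectorized pattern (not the graded solution), the same computation on a plain array of probabilities might look like this; the names `p` and `H` are just placeholders.

``````
import numpy as np   # already imported above; repeated so the snippet stands alone

p = np.array([0.25, 0.5, 0.25])   # probabilities as a Numpy array
H = -np.sum(p * np.log2(p))       # elementwise log2 and product, then a single sum
print(H)                          # 1.5
``````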
``````

In [11]:

def entropy(d):
    """Compute the entropy of a dict d whose values are probabilities."""
    a = np.array(list(d.values()))
    # H = -sum_i P_i * log2(P_i), computed without any explicit loops.
    entropy = -np.sum(a * np.log2(a))
    return entropy

``````
``````

In [12]:

entropy({'a': 0.25, 'b': 0.5, 'c': 0.25})

``````
``````

Out[12]:

1.5

``````
``````

In [13]:

assert np.allclose(entropy({'a': 0.5, 'b': 0.5}), 1.0)
assert np.allclose(entropy({'a': 1.0}), 0.0)

``````

Use IPython's `interact` function to create a user interface that allows you to type a string into a text box and see the entropy of the character probabilities of the string.
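
A minimal sketch of the pattern, assuming the `interact` import above: passing a string default creates a text box, and the supplied function is re-run with the current text whenever it changes. The `show_length` function is just a placeholder.

``````
def show_length(s):
    # interact calls this with the current contents of the text box.
    print(len(s))

interact(show_length, s="type here");
``````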

``````

In [14]:

from IPython.display import display

``````
``````

In [15]:

def new_func(n):
    """Print the entropy of the character probabilities of the string n (for use with interact)."""
    a = char_probs(n)
    print(entropy(a))

``````
``````

In [16]:

interact(new_func, n="aaa");

``````
``````

-0.0

``````
``````

In [ ]:

assert True # use this for grading the interact with entropy

``````