```
In [ ]:
```from __future__ import print_function, division
from collections import Counter
import numpy as np

```
In [ ]:
```def is_anagram(word1, word2):
"""Checks whether the words are anagrams.
word1: string
word2: string
returns: boolean
"""
return Counter(word1) == Counter(word2)

```
In [ ]:
```is_anagram('tachymetric', 'mccarthyite')

```
In [ ]:
```is_anagram('banana', 'peach')

**Exercise:** The `Counter`

class inherits from `dict`

so all methods and functions that work with a dictionary will also work with a `Counter`

.

Read the documentation of Counter, then use a `Counter`

to find the three most common letters in the word "pneumonoultramicroscopicsilicovolcanoconiosis".

```
In [ ]:
```# Solution
Counter('pneumonoultramicroscopicsilicovolcanoconiosis').most_common(3)

A `Counter`

is a natural representation of a multiset, which is a set where the elements can appear more than once.
You could use multisets for a game like Scrabble to see if a given set of tiles can be used to spell a given word.

**Exercise:** Write a definition for a class called `Multiset`

that inherits from `Counter`

and defines an additional method called `is_subset`

, which should take `self`

and `other`

as parameters, where `other`

is another `Multiset`

.

It should check whether `self`

is a subset of `other`

; for multisets, that means that every element of `self`

appears in `other`

with at least the same frequency. For example, `aa`

is a subset of `aaab`

, but `aabb`

is not.

```
In [ ]:
```# Solution
class Multiset(Counter):
"""A multiset is a set where elements can appear more than once."""
def is_subset(self, other):
"""Checks whether self is a subset of other.
other: Multiset
returns: boolean
"""
for char, count in self.items():
if other[char] < count:
return False
return True

`Multiset.is_subset`

to check whether a particular word can be spelled using a particular set of tiles.

```
In [ ]:
```def can_spell(word, tiles):
"""Checks whether a set of tiles can spell a word.
word: string
tiles: string
returns: boolean
"""
return Multiset(word).is_subset(Multiset(tiles))

```
In [ ]:
```can_spell('SYZYGY', 'AGSYYYZ')

```
In [ ]:
```can_spell('omelette', 'breaking a few eggs')

**Optional Exercise:** If you change the name of `is_subset`

to `__le__`

, you can use the `<=`

operator to test whether one `Multiset`

is a subset of another.

You can also extend `Counter`

to represent a probability mass function (PMF). A PMF is a map from possible outcomes to their probabilities. The probabilities in a PMF are "normalized" if they add up to 1 (and they are all non-negative).

The following `PMF`

class inherits from `Counter`

and adds the following methods:

`normalize`

computes the total of the frequencies and divides through, yielding probabilities that add to 1.`__add__`

enumerates all pairs of outcomes and returns a new Pmf that represents the distribution of the sum.`render`

returns the outcomes and probabilities in a form ready for plotting.

```
In [ ]:
```class Pmf(Counter):
"""A Counter with probabilities."""
def normalize(self):
"""Normalizes the PMF so the probabilities add to 1."""
total = sum(self.values())
for key in self:
self[key] /= total
def __add__(self, other):
"""Adds two distributions.
The result is the distribution of sums of outcomes from the
two distributions.
Note that this method is only correct if the selections from
the two distributions are independent; that is, if the outcome
of the first selection does not affect the probabilities of
the outcomes for the second selection.
other: Pmf
returns: new Pmf
"""
pmf = Pmf()
for key1, prob1 in self.items():
for key2, prob2 in other.items():
pmf[key1 + key2] += prob1 * prob2
return pmf
def render(self):
"""Returns outcomes and their probabilities, suitable for plotting."""
return zip(*sorted(self.items()))

As an example, we can make a Pmf object that represents a 6-sided die.

```
In [ ]:
```d6 = Pmf([1,2,3,4,5,6])
d6.normalize()
d6.name = 'one die'
print(d6)

Using the add operator, we can compute the distribution for the sum of two dice.

```
In [ ]:
```d6_twice = d6 + d6
d6_twice.name = 'two dice'
for key, prob in d6_twice.items():
print(key, prob)

Using `sum`

or `np.sum`

, we can compute the distribution for the sum of three dice.

```
In [ ]:
```# if we use the built-in sum we have to provide a Pmf additive identity
pmf_ident = Pmf([0])
d6_thrice = sum([d6]*3, pmf_ident)

```
In [ ]:
```# with np.sum, we don't need an identity
d6_thrice = np.sum([d6]*3)
d6_thrice.name = 'three dice'

And then plot the results (using Pmf.render)

```
In [ ]:
```import matplotlib.pyplot as plt
%matplotlib inline

```
In [ ]:
```for die in [d6, d6_twice, d6_thrice]:
xs, ys = die.render()
plt.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)
plt.xlabel('Total')
plt.ylabel('Probability')
plt.legend()
plt.show()

**Exercise:** Suppose you are fighting an orc who will die if he suffers 9 or more hit points of damage. You attack successfully with short sword and dagger, so you can roll a d6 and a d4 for total damage. What is the probability that you kill the orc?

```
In [ ]:
```# Solution
d4 = Pmf([1,2,3,4])
d4.normalize()
hp = d6 + d4
hp

```
In [ ]:
```# Solution
hp[9] + hp[10]

A `Suite`

is a `Pmf`

that represents a set of hypotheses and their probabilities; it provides `bayesian_update`

, which updates the probability of the hypotheses based on new data.

`Suite`

is an abstract parent class; child classes should provide a `likelihood`

method that evaluates the likelihood of the data under a given hypothesis. `update_bayesian`

loops through the hypotheses, evaluates the likelihood of the data under each hypothesis, and updates the probabilities accordingly. Then it re-normalizes the PMF.

```
In [ ]:
```class Suite(Pmf):
"""Map from hypothesis to probability."""
def bayesian_update(self, data):
"""Performs a Bayesian update.
Note: called bayesian_update to avoid overriding dict.update
data: result of a die roll
"""
for hypo in self:
like = self.likelihood(data, hypo)
self[hypo] *= like
self.normalize()
def print_probs(self):
for hypo in sorted(self):
print(hypo, self[hypo])

As an example, I'll use `Suite`

to solve the "Dice Problem," from Chapter 3 of *Think Bayes*.

"Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons & Dragons, you know what I am talking about. Suppose I select a die from the box at random, roll it, and get a 6. What is the probability that I rolled each die?"

I'll start by defining `DiceSuite`

, which inherits `bayesian_update`

from Suite and provides `likelihood`

.

`data`

is the observed die roll, 6 in the example.

`hypo`

is the hypothetical number of sides on the die.

If `data > hypo`

, that means the outcome exceeds the number of sides on the die; that's impossible, so it has probability 0.

Otherwise, the probability of any outcome is `1/hypo`

, where `hypo`

is the number of sides on the die.

```
In [ ]:
```class DiceSuite(Suite):
def likelihood(self, data, hypo):
"""Computes the likelihood of the data under the hypothesis.
data: result of a die roll
hypo: integer number of sides on the die
"""
if data > hypo:
return 0
else:
return 1/hypo

Now we can make a `DiceSuite`

object that represents the possible number of sides on the die. By default, all dice have the same prior probability.

Then I update the distribution with the given outcome and print the results:

```
In [ ]:
```dice_suite = DiceSuite([4, 6, 8, 12, 20])
dice_suite.normalize()
dice_suite.print_probs()

```
In [ ]:
```dice_suite.bayesian_update(6)
dice_suite.print_probs()

As expected, the 4-sided die has been eliminated; it now has 0 probability. The 6-sided die is the most likely, but the 8-sided die is still quite possible.

Now suppose I roll the die again and get an 8. We can update the Suite again with the new data.

```
In [ ]:
```dice_suite.bayesian_update(8)
dice_suite.print_probs()

**Exercise:** Draw a UML class diagram that shows the relationships among all classes in this notebook, plus `Counter`

and `dict`

.

**Exercise:** Suppose you know that up to 100 sequentially-numbered raffle tickets have been sold, and you think it is equally likely that the number sold is anywhere from 1 to 100. You find a randomly discarded ticket that is number 37. What is the probability that it is the winning ticket?

Hint: Define a class named `TicketSuite`

that inherits from `Suite`

and provides an appropriate `likelihood`

function. It should also define a method that computes the average probability of winning, weighted by the probability of each outcome.

Note: This exercise is pretty challenging if you are not familiar with Bayesian statistics. But if it intrigues you, consider taking Computational Bayesian Statistics.

```
In [ ]:
```# Solution
class TicketSuite(Suite):
def likelihood(self, data, hypo):
"""Computes the likelihood of the data under the hypothesis.
data: result of a die roll
hypo: integer number of sides on the die
"""
if data > hypo:
return 0
else:
return 1/hypo
def weighted_average(self):
"""Compute the probability of winning.
If the number of tickets is hypo, the probability of
winning is 1/hypo.
"""
total = 0
for hypo, prob in self.items():
total += prob / hypo
return total

```
In [ ]:
```# Solution
ts = TicketSuite(range(1, 101))
ts.bayesian_update(37)
xs, ys = ts.render()
plt.plot(xs, ys, linewidth=3, alpha=0.5)
plt.xlabel('Number of tickets')
plt.ylabel('Probability')
plt.show()

```
In [ ]:
```# Solution
ts.weighted_average()