Q1

In this question, we'll compute some basic probabilities of events using loops, lists, and dictionaries.

Part A

The Polya urn model is popular both in statistics and as a way to illustrate certain mental exercises.

[Figure: Polya urn]

Typically, these exercises involve randomly selecting colored balls, and each selection can change the properties of the urn's remaining contents. A common question to ask is: given some number of colors and some number of balls, what are the chances of randomly selecting a ball of a specific color?

Write a function which:

is named urn_to_dict
takes 1 argument: a list of color names (e.g. "blue", "red", "green", etc)
returns 1 value: a dictionary, with color names for keys and the frequency counts of those colors as values

The contents of the urn will be handed to you as a list (the input argument), where each element of the list represents one ball in the urn and the element's value is that ball's color. You then need to count how many times each color occurs in the list, and assemble those counts in the dictionary that your function should return.

For example, the list ["blue", "blue", "green", "blue"] should result in the dictionary {"blue": 3, "green": 1}. Use the urn_dict dictionary object to store the results.
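The counting step described above can be sketched with a plain loop and a dictionary. This is only one possible approach, not the required solution; the name count_colors_sketch is a hypothetical helper chosen for illustration and is not part of the assignment.

```python
def count_colors_sketch(balls):
    """Hypothetical helper: tally how often each color appears in a list."""
    counts = {}
    for color in balls:
        # dict.get returns 0 the first time a color is seen,
        # so no separate "is this key present?" check is needed.
        counts[color] = counts.get(color, 0) + 1
    return counts


example_counts = count_colors_sketch(["blue", "blue", "green", "blue"])
print(example_counts)  # {'blue': 3, 'green': 1}
```

The same pattern works for any hashable items, since it relies only on looping over the list and updating one dictionary entry per element.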



In [ ]:



In [ ]:

    
u1 = ["green", "green", "blue", "green"]
a1 = set({("green", 3), ("blue", 1)})
assert a1 == set(urn_to_dict(u1).items())



In [ ]:

    
u2 = ["red", "blue", "blue", "green", "yellow", "black", "black", "green", "blue", "yellow", "red", "green", "blue", "black", "yellow", "yellow", "yellow", "green", "blue", "red", "red", "blue", "red", "blue", "yellow", "yellow", "yellow"]
a2 = set({('black', 3), ('blue', 7), ('green', 4), ('red', 5), ('yellow', 8)})
assert a2 == set(urn_to_dict(u2).items())

Part B

In this part, you'll write code to compute the probabilities of certain colors using the dictionary object from the previous part. Your code will receive a dictionary of colors with their counts (i.e., the output of Part A) and a "query" color, and you will need to return the chances of randomly selecting a ball of that query color.

Write a function which:

is named chances_of_color
takes 2 arguments: a dictionary mapping colors to counts (output of Part A), and a string that will contain a query color
returns 1 value: a floating-point number, the probability of selecting the "query" color at random

Remember, probability is a fraction: the numerator is the number of occurrences of the event you're interested in, and the denominator is the number of all possible events. It's kind of like an average.

For example, if the input dictionary is {"red": 3, "blue": 1} and the query color is "blue", then the fraction you would return is 1/4, or 0.25 (probabilities should always be between 0 and 1).
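The fraction above can be sketched as follows. This is a minimal illustration under two assumptions that the assignment does not state explicitly: a color missing from the dictionary is treated as having a count of 0, and an empty urn is treated as giving probability 0.0. The name fraction_of_color_sketch is hypothetical.

```python
def fraction_of_color_sketch(counts, query):
    """Hypothetical helper: share of balls whose color equals `query`."""
    total = sum(counts.values())  # denominator: every ball in the urn
    if total == 0:
        # Assumption: an empty urn yields 0.0 instead of a division error.
        return 0.0
    # Assumption: a color that is absent from the dictionary has count 0.
    return counts.get(query, 0) / total


print(fraction_of_color_sketch({"red": 3, "blue": 1}, "blue"))  # 0.25
```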



In [ ]:



In [ ]:

    
import numpy.testing as t
c1 = {"blue": 3, "red": 1}
t.assert_allclose(chances_of_color(c1, "blue"), 0.75)



In [ ]:

    
import numpy.testing as t
c2 = {"red": 934, "blue": 493859, "yellow": 31, "green": 3892, "black": 487}
t.assert_allclose(chances_of_color(c2, "green"), 0.007796427505443677)



In [ ]:

    
import numpy.testing as t
c3 = {"red": 5, "blue": 5, "yellow": 5, "green": 5, "black": 5}
t.assert_allclose(chances_of_color(c3, "orange"), 0.0)

Part C

In this part, you'll do the opposite of what you implemented in Part B: you'll get a dictionary and a query color, but you'll need to return the chances of drawing a ball that is not the same color as the query.

Write a function which:

is named chances_of_not_color
takes 2 arguments: a dictionary mapping colors to counts (output of Part A), and a string that will contain a query color
returns 1 value: a floating-point number, the probability of NOT selecting the "query" color at random

For example, if the input dictionary is {"red": 3, "blue": 1} and the query color is "blue", then the fraction you would return is 3/4, or 0.75.

HINT: You can use the function you wrote in Part B to help!
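The hint relies on the complement rule: the chance of not drawing a color is 1 minus the chance of drawing it. A minimal, self-contained sketch follows; the name complement_sketch is hypothetical, and the code assumes a non-empty urn.

```python
def complement_sketch(counts, query):
    """Hypothetical helper: probability of NOT drawing `query`."""
    total = sum(counts.values())  # assumed to be greater than 0
    # Probability of drawing the query color (0 if the color is absent).
    p_query = counts.get(query, 0) / total
    # Complement rule: P(not A) = 1 - P(A).
    return 1.0 - p_query


print(complement_sketch({"red": 3, "blue": 1}, "blue"))  # 0.75
```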



In [ ]:



In [ ]:

    
import numpy.testing as t
c1 = {"blue": 3, "red": 1}
t.assert_allclose(chances_of_not_color(c1, "blue"), 0.25)



In [ ]:

    
import numpy.testing as t
c2 = {"red": 934, "blue": 493859, "yellow": 31, "green": 3892, "black": 487}
t.assert_allclose(chances_of_not_color(c2, "blue"), 0.010705063871811693)



In [ ]:

    
import numpy.testing as t
c3 = {"red": 5, "blue": 5, "yellow": 5, "green": 5, "black": 5}
t.assert_allclose(chances_of_not_color(c3, "orange"), 1.0)

Part D

Even more interesting is when we start talking about combinations of colors. Let's say I'm reaching into a Polya urn to pull out two balls; it's valuable to know the chances that at least one of those balls is a certain color.

Write a function which:

is named select_chances
takes 3 arguments: a list of colors of balls in an urn (same as input to Part A), an integer number (number of balls to draw out of the urn), and a string containing a single color
returns 1 value: a floating-point number, the probability that at least one ball from the "number" drawn from the urn is the specified color

Remember, you compute probability exactly as before--the number of events of interest (selecting a certain number of balls with at least one of a certain color) divided by the total number of possible events (all possible draws)--only this time you'll need to account for combinations of multiple balls.

For example, if I give you an urn list of ["blue", "green", "red"], the number 2, and the query color "blue", then you would return 2/3, or roughly 0.66667. (There are three possible groupings of 2 balls: blue-green, blue-red, and green-red. Two of these three combinations contain the query color blue.)

HINT: It will be very, very helpful if you make use of the itertools module for generating combinations of colored balls. If you can't remember how the module works, consult its documentation. Seriously though, it will vastly simplify your life in this question.
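As a rough illustration of the hint, itertools.combinations enumerates every unordered group of a given size, so the probability becomes a count of matching groups divided by the count of all groups. The sketch below is one possible approach rather than the required one; the name at_least_one_sketch is hypothetical, and the code assumes the number of balls drawn is no larger than the urn.

```python
from itertools import combinations


def at_least_one_sketch(balls, k, query):
    """Hypothetical helper: chance that a k-ball draw contains `query`."""
    # combinations works by position, so two balls of the same color
    # are still counted as two distinct balls.
    draws = list(combinations(balls, k))
    # Count the groups that contain the query color at least once.
    hits = sum(1 for draw in draws if query in draw)
    # Assumes k <= len(balls), so that at least one group exists.
    return hits / len(draws)


# Two of the three possible pairs contain "blue".
print(at_least_one_sketch(["blue", "green", "red"], 2, "blue"))
```

Enumerating every group is fine for small urns like the ones in this question; the number of groups grows quickly as the urn gets larger.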



In [ ]:



In [ ]:

    
import numpy.testing as t
q1 = ["blue", "green", "red"]
t.assert_allclose(select_chances(q1, 2, "red"), 2/3)



In [ ]:

    
import numpy.testing as t
q2 = ["red", "blue", "blue", "green", "yellow", "black", "black", "green", "blue", "yellow", "red", "green", "blue", "black", "yellow", "yellow", "yellow", "green", "blue", "red", "red", "blue", "red", "blue", "yellow", "yellow", "yellow"]
t.assert_allclose(select_chances(q2, 3, "red"), 0.4735042735042735)

Part E

One final wrinkle: let's say I'm no longer picking colored balls simultaneously from the urn, but rather in sequence--that is, one right after the other. Now I can ask, for a given urn and a certain number of balls I'm going to pick, what are the chances that I draw a ball of a certain color first?

For example, if I give you an urn list of ["blue", "green", "red"], the number 2, and the query color "blue", then you would return 2/6, or 0.33333.

(There are six possible ways of drawing two balls in sequence:

BLUE then GREEN
BLUE then RED
GREEN then BLUE
GREEN then RED
RED then GREEN
RED then BLUE

and two of those six involve drawing the blue one first)

Write a function which:

is named select_chances_first
takes 3 arguments: a list of colors in the urn (same input as Part A and Part D), an integer number of balls to draw in sequence, and a string containing the query color for the first draw
returns 1 value: a floating-point number, the probability of drawing the query color first in a sequence of draws of the specified length

You are welcome to again use itertools.
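For ordered draws, itertools.permutations plays the role that combinations plays for unordered ones: it enumerates every sequence of a given length. The sketch below counts the sequences whose first ball matches the query color. It is one possible approach; the name first_is_color_sketch is hypothetical, and the code assumes the draw length is at least 1 and no larger than the urn.

```python
from itertools import permutations


def first_is_color_sketch(balls, k, query):
    """Hypothetical helper: chance the first of k ordered draws is `query`."""
    # Every ordered sequence of k distinct balls (distinct by position).
    sequences = list(permutations(balls, k))
    # Keep only the sequences that start with the query color.
    hits = sum(1 for seq in sequences if seq[0] == query)
    # Assumes 1 <= k <= len(balls), so that at least one sequence exists.
    return hits / len(sequences)


# Two of the six ordered pairs start with "blue".
print(first_is_color_sketch(["blue", "green", "red"], 2, "blue"))
```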



In [ ]:



In [ ]:

    
import numpy.testing as t
q1 = ["blue", "green", "red"]
t.assert_allclose(select_chances_first(q1, 2, "red"), 2/6)



In [ ]:

    
import numpy.testing as t
q2 = ["red", "blue", "blue", "green", "yellow", "black", "black", "green", "blue", "yellow", "red", "green", "blue", "black", "yellow", "yellow", "yellow", "green", "blue", "red", "red", "blue", "red", "blue", "yellow", "yellow", "yellow"]
t.assert_allclose(select_chances_first(q2, 3, "red"), 0.18518518518518517)