1. Making Plots (15 Points)

  1. Using numpy and matplotlib, plot the exponential distribution with $\lambda = 1$, $1.5$, and $2$. Include a legend that uses $\LaTeX$.
  2. Now plot the binomial distribution for $N = 10$, $p = 0.2$ and $N = 10$, $p = 0.5$. Include a legend.
  3. Plot $2^x$ and $x^2$ from $x = 1$ to $x = 8$.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

In [2]:
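#Question 1.1
# Exponential PDF: p(x) = lambda * exp(-lambda * x) for x >= 0
x = np.linspace(0, 3, 1000)
exp1 = 1.0 * np.exp(-1.0 * x)
exp2 = 1.5 * np.exp(-1.5 * x)
exp3 = 2.0 * np.exp(-2.0 * x)

# Raw strings keep the backslash in \lambda from being read as an escape
plt.plot(x, exp1, label=r'$\lambda = 1.0$')
plt.plot(x, exp2, label=r'$\lambda = 1.5$')
plt.plot(x, exp3, label=r'$\lambda = 2.0$')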
plt.legend(loc='upper right')
plt.show()
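As a sanity check that the curves above are normalized exponential densities (and not arbitrarily scaled exponentials), the hand-written formula can be compared against `scipy.stats.expon` and integrated numerically. This is a minimal sketch; the helper name `exponential_pdf` is introduced here for illustration, and it relies on SciPy parameterizing the exponential by `scale = 1 / lambda`.

```python
import numpy as np
from scipy import integrate, stats


def exponential_pdf(x, lam):
    """Exponential probability density lam * exp(-lam * x) for x >= 0."""
    return lam * np.exp(-lam * x)


x = np.linspace(0, 3, 1000)
for lam in (1.0, 1.5, 2.0):
    # SciPy's expon takes scale = 1 / lambda instead of the rate lambda
    reference = stats.expon(scale=1.0 / lam).pdf(x)
    # A proper density integrates to 1 over [0, infinity)
    area, _ = integrate.quad(exponential_pdf, 0, np.inf, args=(lam,))
    matches = np.allclose(exponential_pdf(x, lam), reference)
    print(f"lambda = {lam}: matches scipy = {matches}, area = {area:.6f}")
```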



In [3]:
#Question 1.2
from scipy.special import comb
N = 10
p = 0.2
x = np.arange(0, N + 1)
# Binomial PMF: C(N, x) * p^x * (1 - p)^(N - x)
b1 = comb(N, x) * p**(x) * (1 - p)**(N - x)

p = 0.5
b2 = comb(N, x) * p**(x) * (1 - p)**(N - x)

plt.plot(x, b1, 'o-', label="$N = 10$, $p = 0.2$")
plt.plot(x, b2, 'o-', label="$N = 10$, $p = 0.5$")
plt.legend()
plt.show()
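The binomial probabilities computed by hand above can be cross-checked against `scipy.stats.binom.pmf`, which takes the arguments `(k, n, p)`. In this sketch, `binomial_pmf` is a hypothetical helper that repeats the formula from the cell above. It also confirms each distribution sums to 1 and reports its most likely outcome.

```python
import numpy as np
from scipy import stats
from scipy.special import comb


def binomial_pmf(k, n, p):
    """Binomial PMF written out by hand: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


N = 10
k = np.arange(0, N + 1)
for p in (0.2, 0.5):
    manual = binomial_pmf(k, N, p)
    # Agreement with SciPy and normalization over k = 0..N
    assert np.allclose(manual, stats.binom.pmf(k, N, p))
    assert np.isclose(manual.sum(), 1.0)
    print(f"p = {p}: most likely number of successes = {k[np.argmax(manual)]}")
```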



In [4]:
#Question 1.3
x = np.linspace(1, 8,100)
plt.plot(x, 2**x, label='$2^x$')
plt.plot(x, x**2, label='$x^2$')
plt.legend(loc='upper left')
plt.show()


2. Customizing Plots I (8 Points)

If you execute the cell below, it will write the contents of the cell to a file called che116.mplstyle. We will use this file in the future, so hold onto it. To load this style and use it, execute plt.style.use('che116.mplstyle'). Make the following changes to the file, write it, and then plot three interesting lines on the same graph:

  1. Make the first line plotted be colored red.
  2. Make figures be 10.4 by 7.15 inches.
  3. Make the grid visible.
  4. Make two other changes and document them using a comment in the file

View the comments in this file to learn what all the parameters do.


In [5]:
%%writefile che116.mplstyle

#set the font-size and size of things
figure.figsize: 5, 3
axes.labelsize: 14.3
axes.titlesize: 15.6
xtick.labelsize: 13
ytick.labelsize: 13
legend.fontsize: 13

grid.linewidth: 1.3
lines.linewidth: 2.275
patch.linewidth: 0.39
lines.markersize: 9.1
lines.markeredgewidth: 0

xtick.major.width: 1.3
ytick.major.width: 1.3
xtick.minor.width: 0.65
ytick.minor.width: 0.65

xtick.major.pad: 9.1
ytick.major.pad: 9.1

    
axes.xmargin        : 0
axes.ymargin        : 0

#setup our colorscheme

patch.facecolor: 348ABD  # blue
patch.edgecolor: EEEEEE
patch.antialiased: True

font.size: 12.0
text.color: black

axes.facecolor: E5E5E5
axes.edgecolor: bcbcbc
axes.linewidth: 1
axes.grid: False
axes.labelcolor: 555555
axes.axisbelow: True       # grid/ticks are below elements (e.g., lines, text)

axes.prop_cycle: cycler('color', ['444444', '348ABD', '988ED5', '777777', 'FBC15E', '8EBA42', 'FFB5B8'])
# E24A33 : red
# 348ABD : blue
# 988ED5 : purple
# 777777 : gray
# FBC15E : yellow
# 8EBA42 : green
# FFB5B8 : pink

xtick.color: 555555
xtick.direction: out

ytick.color: 555555
ytick.direction: out

grid.color: white
grid.linestyle: -    # solid line

figure.facecolor: white
figure.edgecolor: 0.50

#animation settings
animation.html : html5


Writing che116.mplstyle

In [6]:
#Incorrect answer -> no changes
plt.style.use('che116.mplstyle')
from math import pi

x = np.linspace(0, 2 * pi, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.sin(x)**2)
plt.plot(x, np.fabs(np.sin(x)))
plt.show()


Answer

The cell below contains the corrected style file.


In [7]:
%%writefile che116.mplstyle

#set the font-size and size of things
figure.figsize: 10.4, 7.15
axes.labelsize: 14.3
axes.titlesize: 15.6
xtick.labelsize: 13
ytick.labelsize: 13
legend.fontsize: 13

grid.linewidth: 1.3
lines.linewidth: 2.275
patch.linewidth: 0.39
lines.markersize: 9.1
lines.markeredgewidth: 0

xtick.major.width: 1.3
ytick.major.width: 1.3
xtick.minor.width: 0.65
ytick.minor.width: 0.65

xtick.major.pad: 9.1
ytick.major.pad: 9.1

    
axes.xmargin        : 0
axes.ymargin        : 0

#setup our colorscheme

patch.facecolor: 348ABD  # blue
patch.edgecolor: EEEEEE
patch.antialiased: True

font.size: 12.0
text.color: black

axes.facecolor: E5E5E5
axes.edgecolor: bcbcbc
axes.linewidth: 1
axes.grid: True
axes.labelcolor: 555555
axes.axisbelow: True       # grid/ticks are below elements (e.g., lines, text)

axes.prop_cycle: cycler('color', ['E24A33', '348ABD', '988ED5', '777777', 'FBC15E', '8EBA42', 'FFB5B8'])
# E24A33 : red
# 348ABD : blue
# 988ED5 : purple
# 777777 : gray
# FBC15E : yellow
# 8EBA42 : green
# FFB5B8 : pink

xtick.color: 555555
xtick.direction: out

ytick.color: 555555
ytick.direction: out

grid.color: white
grid.linestyle: -    # solid line

figure.facecolor: white
figure.edgecolor: 0.50

#animation settings
animation.html : html5


Overwriting che116.mplstyle

In [8]:
#Correct Answer
plt.style.use('che116.mplstyle')
from math import pi

x = np.linspace(0, 2 * pi, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.sin(x)**2)
plt.plot(x, np.fabs(np.sin(x)))
plt.show()
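The three required changes (red first line, 10.4 by 7.15 inch figures, visible grid) can also be applied temporarily from Python with `matplotlib.rc_context`, which is handy for checking them without editing the file. This is a sketch: `CHE116_CHANGES` and `check_che116_changes` are hypothetical names, and the colors are written with a leading `#` for clarity.

```python
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
from matplotlib import colors, rc_context

# Hypothetical dict holding the three required rcParams changes
CHE116_CHANGES = {
    "figure.figsize": (10.4, 7.15),
    "axes.grid": True,
    # Red first, followed by the rest of the original color cycle
    "axes.prop_cycle": cycler(color=["#E24A33", "#348ABD", "#988ED5", "#777777",
                                     "#FBC15E", "#8EBA42", "#FFB5B8"]),
}


def check_che116_changes():
    """Plot one line under the overrides; return (first color, figure size, grid flag)."""
    with rc_context(CHE116_CHANGES):
        fig, ax = plt.subplots()
        x = np.linspace(0, 2 * np.pi, 100)
        (line,) = ax.plot(x, np.sin(x))
        first_color = colors.to_hex(line.get_color())
        size = tuple(float(v) for v in fig.get_size_inches())
        grid_on = bool(plt.rcParams["axes.grid"])
    plt.close(fig)
    return first_color, size, grid_on


print(check_che116_changes())
```

Outside the `with` block, the previously active style (for example, the one loaded by `plt.style.use`) is restored.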


Customizing Plots II (5 Points)

Create a plot with the following properties WITHOUT using a style file:

  1. Make the figure size 8 by 6
  2. Plot $\cos(x)$ and $\sin(x)$ from $x = 0$ to $x = 2\pi$.
  3. Create lines at $y = -1$ and $y = 1$. Make sure they are visible.
  4. Create a title that says something funny
  5. Put the equation $E = mc^2$ somewhere in the graph (not in the legend).

In [9]:
plt.figure(figsize=(8,6))
x = np.linspace(0, 2 * pi, 100)
plt.plot(x, np.cos(x))
plt.plot(x, np.sin(x))
plt.xlim(0, 2 * pi)
plt.ylim(-1.5, 1.5)
plt.hlines([-1, 1], 0, 2 * pi)
plt.title('Funny Title')
plt.text(pi, 0.5, '$E = mc^2$', fontdict={'fontsize': 24})
plt.show()


3. Preparing for next problem (5 Points)

Write out the following probability theorems:

  1. Definition of conditional
  2. Definition of marginal
  3. Marginalization of the conditional
  4. What is $\sum_x P(X = x | Y = 2)$?
  5. The definition of conditional independence
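For reference, one standard way to write these five results uses generic random variables $X$, $Y$, and $Z$ (introduced here for illustration, not taken from an answer key):

Definition of conditional:
$$P(X = x\,|\,Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Definition of marginal:
$$P(X = x) = \sum_y P(X = x, Y = y)$$

Marginalization of the conditional:
$$P(X = x) = \sum_y P(X = x\,|\,Y = y) P(Y = y)$$

A conditional distribution is normalized:
$$\sum_x P(X = x\,|\,Y = 2) = 1$$

Conditional independence of $X$ and $Y$ given $Z$:
$$P(X = x, Y = y\,|\,Z = z) = P(X = x\,|\,Z = z) P(Y = y\,|\,Z = z)$$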

4. Mammogram Screening (25 Points)

Mammograms are a testing procedure for breast cancer. The diagnosis procedure after a positive mammogram is incredibly complex, so we'll simplify a little bit here. If a mammogram is positive, a woman will always return for a biopsy, which is the removal and analysis of a small amount of breast tissue. A positive biopsy may, depending on the diagnosis, lead to a mastectomy. The statistics from here on out are mostly correct, but in real life a biopsy does not always follow a mammogram. From ages 40 to 50, 45% of women who receive annual mammograms will have a false positive and 25% will have a false negative. A false negative means that a woman had invasive breast cancer but the test did not show it. A false positive means that a woman had no cancer or benign cancer but the test was positive. A large study of biopsies shows that biopsies are correctly diagnosed 75% of the time (the state of cancer matches the state of the biopsy), with false positives being twice as likely as false negatives. You may assume that biopsies and mammograms are conditionally independent given the presence or absence of cancer. After a positive finding from both a mammogram and a biopsy, a mastectomy is performed, which has a 0.24% probability of mortality. The overall probability of having invasive breast cancer between the ages of 40 and 50 is 1.5%. Answer the following questions:

  1. What is the probability of a positive mammogram result?
  2. Given a mammogram is positive, what's the probability a woman has invasive cancer?
  3. What's the probability of dying from a mastectomy?
  4. The mastectomy and treatment have a 97% survival rate. That number is for women with invasive breast cancer. We can assume a near 100% survival rate for those who did not have invasive cancer but underwent treatment. If a woman with cancer is not diagnosed from a mammogram or biopsy, she will be diagnosed later due to symptoms, with an overall survival rate of 93%. What's the probability of dying from cancer?
  5. Given that there are 20 million women aged 40 to 50, what's the expected number of deaths from cancer and mastectomy? I would like to emphasize that there are many more dangers from cancer treatment than mastectomy and that even though it appears cancer is a larger problem, there are other mortality risks and quality of life changes from unnecessary cancer treatment.

Answer 3.1

Let's start by writing out what we know for this problem. Let $M$ be the rv for the mammogram, with 0 being a negative screening and 1 being a positive screening. Let $C$ be 0 for no cancer or benign cancer and 1 for invasive cancer. We are given:

$$P(C = 1) = 0.015$$$$P(M = 1\,|\, C = 0) = 0.45$$$$P(M = 0\,|\, C = 1) = 0.25$$

We are being asked $$P(M = 1)$$

We can use marginalization of the conditional, with $P(M = 1\,|\,C = 1) = 1 - P(M = 0\,|\,C = 1) = 0.75$:

$$P(M = 1) = \sum_c P(M = 1\,|\,C = c) P(C = c)$$
$$P(M = 1) = 0.45 \times 0.985 + 0.75 \times 0.015 = 0.4545$$

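The marginalization above is a two-term sum over $C$, so it can be checked in a few lines of Python. The variable names are chosen here for illustration only.

```python
# Givens from the problem statement
p_c1 = 0.015                       # P(C = 1), invasive cancer
p_m1_given_c0 = 0.45               # P(M = 1 | C = 0), false positive rate
p_m0_given_c1 = 0.25               # P(M = 0 | C = 1), false negative rate
p_m1_given_c1 = 1 - p_m0_given_c1  # P(M = 1 | C = 1)

# Marginalization of the conditional: sum over both states of C
p_m1 = p_m1_given_c0 * (1 - p_c1) + p_m1_given_c1 * p_c1
print(f"P(M = 1) = {p_m1:.4f}")
```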
Answer 3.2

We are being asked for $$P(C = 1\,|\,M = 1)$$ We can use Bayes' theorem, since the given conditionals run in the opposite direction ($M$ given $C$).

$$P(C = 1\,|\,M = 1) = \frac{P(M = 1\, | \, C = 1) P(C = 1)}{P(M = 1)}$$

We can plug in the numbers from above.

$$= \frac{0.75 \times 0.015}{0.4545} = 0.02475$$

The probability that a woman has invasive cancer given a positive mammogram is only about 2.5%.

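The Bayes' theorem step can be sketched the same way: compute the denominator by marginalizing over $C$, then divide. The variable names are illustrative.

```python
p_c1 = 0.015           # P(C = 1)
p_m1_given_c0 = 0.45   # P(M = 1 | C = 0)
p_m1_given_c1 = 0.75   # P(M = 1 | C = 1) = 1 - 0.25

# Denominator: P(M = 1), marginalized over C
p_m1 = p_m1_given_c0 * (1 - p_c1) + p_m1_given_c1 * p_c1

# Bayes' theorem: P(C = 1 | M = 1) = P(M = 1 | C = 1) P(C = 1) / P(M = 1)
p_c1_given_m1 = p_m1_given_c1 * p_c1 / p_m1
print(f"P(C = 1 | M = 1) = {p_c1_given_m1:.5f}")
```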
Answer 3.3

To die from a mastectomy, a woman must have a positive mammogram, a positive biopsy, and a fatal mastectomy. I will now use $B$ as the biopsy rv; if $B$ is 1, a mastectomy is performed. This question asks for $P(M = 1, B = 1, D = 1)$, where $D$ is the rv for dying during a mastectomy. $D$ is independent of $M$, $B$, and $C$, and $M$ and $B$ are conditionally independent given $C$. This means we need the biopsy statistics, and the information we are given about the biopsy is not so straightforward. In particular, we know that:

$$P(B = 1, C = 1) + P(B = 0, C = 0) = 0.75$$

These are joint probabilities because the statement is that 75% of all biopsies match the cancer rv; there is no conditioning in that sentence. Furthermore, we know that:

$$\frac{P(B = 1\, | \, C = 0)}{P(B = 0 \, | \, C = 1)} = 2$$

Writing each conditional as a joint divided by a marginal, and using $P(C = 0) = 0.985$ and $P(C = 1) = 0.015$, we can rewrite that as:

$$\frac{P(B = 1, C = 0)}{P(B = 0, C = 1)} = 2\times \frac{P(C = 0)}{P(C = 1)} = 2 \times \frac{0.985}{0.015} = 131.3$$

We know from marginalization that:

$$P(B = 1, C = 0) + P(B = 0, C = 0) = P(C = 0) = 0.985$$
$$P(B = 1, C = 1) + P(B = 0, C = 1) = P(C = 1) = 0.015$$

Normalization (the four joint probabilities sum to 1) together with the 75% accuracy gives $P(B = 1, C = 0) + P(B = 0, C = 1) = 0.25$. Combining this with the ratio above, you can solve for all four quantities, giving:

$$P(B = 0, C = 0) = 0.7369$$
$$P(B = 1, C = 0) = 0.2481$$
$$P(B = 0, C = 1) = 0.001889$$
$$P(B = 1, C = 1) = 0.01311$$

Let's now rearrange $P(M = 1, B = 1, D = 1)$:

$$P(M = 1, B = 1, D = 1) = P(B = 1, M = 1) P(D = 1)$$
$$P(B = 1, M = 1) = \sum_c P(B = 1, M = 1 \,|\, C = c) P(C = c)$$
$$P(B = 1, M = 1) = P(B = 1 \,|\, C = 0)P(M = 1 \,|\, C = 0)P(C = 0) + P(B = 1 \,|\, C = 1)P(M = 1 \,|\, C = 1)P(C = 1)$$
$$P(B = 1, M = 1) = P(B = 1, C = 0)P(M = 1 \,|\, C = 0) + P(B = 1, C = 1)P(M = 1 \,|\, C = 1)$$
$$P(B = 1, M = 1) = 0.2481\times 0.45 + 0.01311 \times 0.75 = 0.1215$$

Multiplying by the mastectomy mortality probability:

$$P(B = 1, M = 1, D = 1) = 0.1215\times 0.0024 = 0.000292 = 0.0292\%$$

Answer 3.4

There are two survival probabilities: one for women who had a mastectomy following a positive mammogram and biopsy, and one for women who had a false negative but were treated later. We first need to find the probability of being in each of these groups. We do not need to consider the $C = 0$ groups, since they will always survive cancer. Let $S$ be survival after treatment following screening and $S'$ be survival after a later, symptom-based diagnosis (0 meaning death in both cases). The first group is $P(B = 1, M = 1, C = 1, S = 0)$. The second group is made up of $P(B = 0, M = 1, C = 1, S' = 0)$ and $P(M = 0, C = 1, S' = 0)$.

We'll start with the first term:

$$P(B = 1, M = 1, C = 1, S = 0) = P(B = 1, M = 1\, |\, C = 1) P(C = 1) P(S = 0)$$
$$ = P(B = 1, C = 1) P(M = 1\, | \, C = 1) P(S = 0)$$
$$ = 0.01311 \times 0.75 \times (1 - 0.97) = 0.000295 = 0.0295\%$$

The next term:

$$P(B = 0, M = 1, C = 1, S' = 0) = P(B = 0, M = 1\, |\, C = 1) P(C = 1) P(S' = 0)$$
$$ = P(B = 0, C = 1) P(M = 1\, | \, C = 1) P(S' = 0)$$
$$ = 0.001889\times 0.75\times (1 - 0.93) = 0.0000992 = 0.00992\%$$

and lastly:

$$P(M = 0, C = 1, S' = 0) = P(M = 0\, |\, C = 1) P(C = 1) P(S' = 0)$$
$$ = 0.25 \times 0.015 \times 0.07 = 0.000263 = 0.0263\%$$

The probability of dying from cancer is the sum of these three terms: $0.000295 + 0.0000992 + 0.000263 = 0.000657 \approx 0.066\%$
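The biopsy constraints form a small linear system in the four joint probabilities $P(B, C)$, so one way to double-check the hand solution is to solve it numerically with `numpy.linalg.solve` and then recompute both mortality figures. This is a sketch under the same modeling assumptions as the answers (conditional independence of $M$ and $B$ given $C$, and $D$ independent of everything else); all variable names are illustrative. It should reproduce roughly 0.0292% and 0.066%.

```python
import numpy as np

p_c1 = 0.015                 # P(C = 1)
p_c0 = 1 - p_c1              # P(C = 0)
p_m1_given_c0 = 0.45         # P(M = 1 | C = 0)
p_m1_given_c1 = 0.75         # P(M = 1 | C = 1)
p_death_mastectomy = 0.0024  # P(D = 1)
p_survive_treated = 0.97     # survival after screening-based treatment
p_survive_late = 0.93        # survival after a later, symptom-based diagnosis

# Biopsy false positives twice as likely as false negatives (as conditionals),
# expressed as a ratio of joint probabilities
ratio = 2 * p_c0 / p_c1

# Unknowns: [P(B=0,C=0), P(B=1,C=0), P(B=0,C=1), P(B=1,C=1)]
A = np.array([
    [1.0, 0.0, 0.0, 1.0],     # biopsy matches cancer state 75% of the time
    [1.0, 1.0, 0.0, 0.0],     # marginal P(C = 0)
    [0.0, 0.0, 1.0, 1.0],     # marginal P(C = 1)
    [0.0, 1.0, -ratio, 0.0],  # P(B=1,C=0) - ratio * P(B=0,C=1) = 0
])
rhs = np.array([0.75, p_c0, p_c1, 0.0])
p_b0_c0, p_b1_c0, p_b0_c1, p_b1_c1 = np.linalg.solve(A, rhs)

# Mastectomy mortality: P(M = 1, B = 1) * P(D = 1)
p_b1_m1 = p_b1_c0 * p_m1_given_c0 + p_b1_c1 * p_m1_given_c1
p_mastectomy_death = p_b1_m1 * p_death_mastectomy

# Cancer mortality, split by how the cancer is found
p_cancer_death = (
    p_b1_c1 * p_m1_given_c1 * (1 - p_survive_treated)       # M=1, B=1: treated
    + p_b0_c1 * p_m1_given_c1 * (1 - p_survive_late)        # M=1, B=0: found later
    + (1 - p_m1_given_c1) * p_c1 * (1 - p_survive_late)     # M=0: found later
)

print(f"P(B,C) table: {p_b0_c0:.4f}, {p_b1_c0:.4f}, {p_b0_c1:.6f}, {p_b1_c1:.5f}")
print(f"P(die from mastectomy) = {p_mastectomy_death:.6f}")
print(f"P(die from cancer)     = {p_cancer_death:.6f}")
```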

Answer 3.5

Deaths from cancer: $2 \times 10^7 \times 0.066\% = 13200$

Deaths from mastectomy: $2 \times 10^7 \times 0.0292\% = 5840$
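The expected counts are just population times probability. A quick check using the rounded probabilities from the answers above (0.066% for cancer and 0.0292% for mastectomy); the helper name is illustrative.

```python
population = 20_000_000        # women aged 40 to 50
p_cancer_death = 0.00066       # 0.066%
p_mastectomy_death = 0.000292  # 0.0292%


def expected_deaths(pop, prob):
    """Expected number of deaths in a population with per-person probability prob."""
    return pop * prob


print(f"cancer: {expected_deaths(population, p_cancer_death):.0f}")
print(f"mastectomy: {expected_deaths(population, p_mastectomy_death):.0f}")
```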