Homework 4

CHE 116: Numerical Methods and Statistics

Prof. Andrew White

Version 1.1 (2/2/2016)


0. Revise a Problem (5 Bonus Points)

Revisit a problem you got wrong on homework 2. If you got a perfect score on homework 2, state that fact. Go through each part you missed and state what your answer was and what your mistake was. For example:

Problem 1.1

My answer used the scipy comb function instead of factorial.

Problem 1.2-1.4

No mistakes

Problem 1.5

I used this equation:

$$\frac{100!}{5!(100 - 5)!} \frac{95!}{5!(95 - 5)!}$$

which is too high by a factor of $2$. I did not consider that which team picks first is irrelevant.

1. Making Plots (15 Points)

  1. Using numpy and matplotlib, plot the exponential distribution with $\lambda = 1$, $1.5$, and $2$. Include a legend that uses $\LaTeX$.
  2. Now plot the binomial distribution for $N = 10$, $p = 0.2$ and $N = 10$, $p = 0.5$. Include a legend.
  3. Plot $2^x$ and $x^2$ from $x = 1$ to $x = 8$.
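
As a starting point, here is a minimal sketch of one way to put $\LaTeX$ in a legend. It assumes the exponential density $\lambda e^{-\lambda x}$ for $x \geq 0$; the helper name `exponential_pdf` and the plotting range are illustrative choices, not requirements of the assignment.

```python
import matplotlib.pyplot as plt
import numpy as np


def exponential_pdf(x, lam):
    """Exponential density lam * exp(-lam * x), assumed valid for x >= 0."""
    return lam * np.exp(-lam * x)


x = np.linspace(0, 5, 200)  # plotting range chosen for illustration
fig, ax = plt.subplots()
for lam in (1, 1.5, 2):
    # Text between dollar signs is rendered as math, so \lambda becomes a Greek letter.
    ax.plot(x, exponential_pdf(x, lam), label=rf"$\lambda = {lam}$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
ax.legend()
```

The raw-string prefix `r` keeps Python from treating the backslash in `\lambda` as an escape character.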

2. Customizing Plots I (8 Points)

If you execute the cell below, it will write the contents of the cell to a file called che116.mplstyle. We will use this file in the future, so hold onto it. To load this style and use it, execute plt.style.use('che116.mplstyle'). Make the following changes to the file, write it, and then plot three interesting lines on the same graph:

  1. Make the first line plotted be colored red.
  2. Make figures be 10.4 by 7.15 inches.
  3. Make a grid be visible.
  4. Make two other changes and document them using a comment in the file.

View the comments in this file to learn what all the parameters do.


In [5]:
%%writefile che116.mplstyle

#set the font-size and size of things
figure.figsize: 5, 3
axes.labelsize: 14.3
axes.titlesize: 15.6
xtick.labelsize: 13
ytick.labelsize: 13
legend.fontsize: 13

grid.linewidth: 1.3
lines.linewidth: 2.275
patch.linewidth: 0.39
lines.markersize: 9.1
lines.markeredgewidth: 0

xtick.major.width: 1.3
ytick.major.width: 1.3
xtick.minor.width: 0.65
ytick.minor.width: 0.65

xtick.major.pad: 9.1
ytick.major.pad: 9.1

    
axes.xmargin        : 0
axes.ymargin        : 0

#setup our colorscheme

patch.facecolor: 348ABD  # blue
patch.edgecolor: EEEEEE
patch.antialiased: True

font.size: 12.0
text.color: black

axes.facecolor: E5E5E5
axes.edgecolor: bcbcbc
axes.linewidth: 1
axes.grid: False
axes.labelcolor: 555555
axes.axisbelow: True       # grid/ticks are below elements (e.g., lines, text)

axes.prop_cycle: cycler('color', ['444444', '348ABD', '988ED5', '777777', 'FBC15E', '8EBA42', 'FFB5B8'])
# E24A33 : red
# 348ABD : blue
# 988ED5 : purple
# 777777 : gray
# FBC15E : yellow
# 8EBA42 : green
# FFB5B8 : pink

xtick.color: 555555
xtick.direction: out

ytick.color: 555555
ytick.direction: out

grid.color: white
grid.linestyle: -    # solid line

figure.facecolor: white
figure.edgecolor: 0.50

#animation settings
animation.html : html5


Overwriting che116.mplstyle
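
To see how a style file affects a plot, the sketch below writes a tiny style file and applies it temporarily. The file name `demo.mplstyle` and its two settings are hypothetical and used only for this demonstration; the same approach should work with che116.mplstyle.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# A tiny, hypothetical style file used only for this demonstration.
style_text = """
# make every line thicker
lines.linewidth: 3
# show the grid
axes.grid: True
"""

style_path = os.path.join(tempfile.mkdtemp(), "demo.mplstyle")
with open(style_path, "w") as f:
    f.write(style_text)

# The settings apply only inside the with-block and revert afterwards.
with plt.style.context(style_path):
    grid_inside = plt.rcParams["axes.grid"]
    width_inside = plt.rcParams["lines.linewidth"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```

Unlike plt.style.use, which changes the settings for the rest of the session, plt.style.context limits the change to one block, which can be convenient when comparing styles.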

2. Customizing Plots II (5 Points)

Create a plot with the following properties WITHOUT using a style file:

  1. Make the figure size 8 by 6 inches.
  2. Plot $\cos(x)$ and $\sin(x)$ from $x = 0$ to $x = 2\pi$.
  3. Create lines at $y = -1$ and $y = 1$. Make sure they are visible.
  4. Create a title that says something funny.
  5. Put the equation $E = mc^2$ somewhere in the graph (not in the legend).
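
The calls involved can be sketched on a different curve. The example below is an illustration with made-up content (a parabola, a 6 by 4 inch figure, one reference line), not a solution to this problem.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 100)
fig, ax = plt.subplots(figsize=(6, 4))  # (width, height) in inches
ax.plot(x, x**2, label="$x^2$")
ax.axhline(y=1, color="black", linestyle="--")  # horizontal reference line at y = 1
ax.set_ylim(-0.5, 4.5)  # leave room so the curve and the line are not on the border
ax.set_title("A parabola")
# Place math text at data coordinates (0, 3), centered horizontally.
ax.text(0, 3, r"$y = x^2$", ha="center", fontsize=14)
ax.legend()
```

Setting the axis limits slightly beyond the data is one way to keep lines drawn at the extremes of the data from being hidden by the plot frame.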

3. Preparing for next problem (5 Points)

Write out the following probability theorems in markdown:

  1. Definition of conditional
  2. Definition of marginal
  3. Marginalization of the conditional
  4. What is $\sum_x P(X = x | Y = 2)$?
  5. The definition of conditional independence
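
As a reminder of the syntax, a markdown cell renders $\LaTeX$ placed between dollar signs. For instance, Bayes' theorem (which is not one of the five items requested above) could be typed as follows; the event names $A$ and $B$ are placeholders.

```latex
$$P(A \,|\, B) = \frac{P(B \,|\, A)\, P(A)}{P(B)}$$
```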

4. Mammogram Screening (25 Points)

This problem is testing your ability to turn sentences into equations, to rearrange probabilities, and to be aware of sample spaces and rv dependence.

Mammograms are a testing procedure for breast cancer. The diagnosis procedure after a positive mammogram is incredibly complex, so we'll simplify a little here. If a mammogram is positive, a woman will always return for a biopsy, which is the removal and analysis of a small amount of breast tissue. If a biopsy is positive, we will assume it leads to a mastectomy. The statistics from here on are mostly correct, but in real life a biopsy does not always follow a mammogram.

From ages 40 to 50, 45% of women who receive annual mammograms will have a false positive and 25% will have a false negative. A false negative means that a woman had invasive breast cancer but the test did not show it. A false positive means that the test was positive but the woman had no cancer or only benign cancer. A large study of biopsies shows that biopsies are correctly diagnosed 75% of the time (the state of cancer matches the state of the biopsy), with false positives being twice as likely as false negatives. You may assume that biopsies and mammograms are conditionally independent given the presence or absence of cancer.

After a positive finding from both a mammogram and a biopsy, a mastectomy is performed, which has a 0.24% probability of mortality. That mortality probability is independent of all other factors. The overall probability of having invasive breast cancer between the ages of 40 and 50 is 1.5%. Answer the following questions:

  1. What is the probability of a positive mammogram result?
  2. Given a mammogram is positive, what's the probability a woman has invasive cancer?
  3. What's the probability of dying from a mastectomy?
  4. The mastectomy and treatment have a 97% survival rate. That number is for women with invasive breast cancer. We can assume a near 100% survival rate for those who did not have invasive cancer but underwent treatment. If a woman with cancer is not diagnosed from a mammogram or biopsy, she will be diagnosed later due to symptoms, with an overall survival rate of 93%. What's the probability of dying from cancer?
  5. Given that there are 20 million women aged 40 to 50, what's the expected number of deaths from cancer and mastectomy? I would like to emphasize that there are many more dangers from cancer treatment than mastectomy and that even though it appears cancer is a larger problem, there are other mortality risks and quality of life changes from unnecessary cancer treatment.

Answer each question first with equations and markdown (symbolically) as far as possible. You can use previous answers or given parameters here, but do not compute arithmetic until the end. The final answer may be given in Python or markdown.
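
The "symbols first, arithmetic last" workflow might look like the sketch below for a generic diagnostic test. All three numbers are hypothetical and are NOT the values given in this problem; the variable names are likewise only illustrative.

```python
# Hypothetical numbers for illustration only; they are NOT the values in this problem.
p_disease = 0.01            # P(D): prior probability of disease
p_pos_given_disease = 0.9   # P(+ | D)
p_pos_given_healthy = 0.1   # P(+ | not D)

# Marginalize over the disease state:
# P(+) = P(+ | D) P(D) + P(+ | not D) P(not D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(+) = {p_pos:.4f}")
print(f"P(D | +) = {p_disease_given_pos:.4f}")
```

Each line of arithmetic mirrors one symbolic equation in the comment above it, which makes it easier to check which sample space each probability refers to.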

