Homework 7

CHE 116: Numerical Methods and Statistics

3/8/2018


1. Conceptual Questions (8 Points)

Answer these in Markdown

  1. [1 point] In problem 4 from HW 3 we discussed probabilities of having HIV and results of a test being positive. What was the sample space for this problem?
  2. [4 points] One of the notations in the answer key is a random variable $H$, which indicates whether a person has HIV. Make a table showing this function's inputs and outputs for the sample space. See "Making Markdown Tables" for the table syntax.
  3. [1 point] A probability density function is used for what types of probability distributions?
  4. [2 points] What is the probability of $t > 4$ in an exponential distribution with $\lambda = 1$? Leave your answer in terms of an exponential.
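As a reminder of the reasoning behind the last question, the probability of exceeding a threshold under an exponential density $\lambda e^{-\lambda t}$ is the integral of that density from the threshold to infinity. The symbol $a$ below is a placeholder for the threshold and is not part of the question itself:

$$
P(t > a) = \int_a^{\infty} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_a^{\infty} = e^{-\lambda a}
$$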

2. The Nile (10 Points)

Answer in Python

  1. [4 points] Load the Nile dataset and convert to a numpy array. It contains measurements of the annual flow of the river Nile at Aswan. Make a scatter plot of the year vs flow rate. If you get an error when loading pydataset that says No Module named 'pydataset', then execute this code in a new cell once: !pip install pydataset

  2. [2 points] Report the correlation coefficient between year and flow rate.

  3. [4 points] Create a histogram of the flow rates and show the median with a vertical line. Label your axes and make a legend indicating what the vertical line is.
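The mechanics of the correlation coefficient and the median line can be sketched as follows. This is only an illustration under stated assumptions: the array is synthetic stand-in data rather than the real Nile measurements, and the column layout (year in column 0, flow in column 1) is assumed, so check the shape of your own array after loading.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the real Nile measurements).
# With pydataset installed, the real array would come from something like
# data('Nile').values after `from pydataset import data` (not run here).
rng = np.random.default_rng(0)
years = np.arange(1871, 1971)
flow = 1100 - 2.5 * (years - 1871) + rng.normal(0, 120, size=years.size)
data = np.column_stack((years, flow))  # assumed layout: year, flow

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
r = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
print(f"correlation coefficient: {r:.3f}")

median_flow = np.median(data[:, 1])

fig, ax = plt.subplots()
ax.hist(data[:, 1], bins=15)
# The label passed to axvline is what the legend displays
ax.axvline(median_flow, color="C1", linestyle="--",
           label=f"median = {median_flow:.0f}")
ax.set_xlabel("Annual flow")
ax.set_ylabel("Number of years")
ax.legend()
```

In a notebook the figure renders inline at the end of the cell. The number of bins is an arbitrary choice here.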

3. Insect Spray (10 Points)

Answer in Python

  1. [2 points] Load the 'InsectSpray' dataset, convert to a numpy array and print the number of rows and columns. Recall that numpy arrays can only hold one type of data (e.g., string, float, int). What is the data type of the loaded dataset?

  2. [2 points] Using np.unique, print out the list of insect sprays used. This data is a count of insects on a crop field treated with various insect sprays.

  3. [4 points] Create a violin plot of the data. Label your axes.

  4. [2 points] Which insect spray worked best? What is the mean number of insects for the best insect spray?
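One possible way to group a mixed-type array by label before plotting is sketched below. The values are made up for illustration, the column layout (count in column 0, spray label in column 1) is an assumption, and treating the lowest mean count as "best" is one interpretation of the question rather than a given.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in array (made-up values, not the real dataset).
# dtype=object mimics what a table with mixed column types converts to.
data = np.array(
    [[10, "A"], [7, "A"], [20, "A"], [14, "A"],
     [11, "B"], [17, "B"], [21, "B"], [16, "B"],
     [0, "C"], [1, "C"], [7, "C"], [2, "C"]],
    dtype=object,
)
print(data.shape, data.dtype)

# Sorted, de-duplicated spray labels
sprays = np.unique(data[:, 1])
print(sprays)

# One float array of counts per spray, selected with a boolean mask
groups = [data[data[:, 1] == s, 0].astype(float) for s in sprays]
means = [g.mean() for g in groups]

fig, ax = plt.subplots()
ax.violinplot(groups, showmeans=True)
# violinplot places the bodies at positions 1, 2, 3, ...
ax.set_xticks(np.arange(1, len(sprays) + 1))
ax.set_xticklabels(sprays)
ax.set_xlabel("Insect spray")
ax.set_ylabel("Insect count")

# Assumption: fewest insects on average means the spray worked best
best = sprays[int(np.argmin(means))]
print(best, min(means))
```

The `.astype(float)` conversion is needed because the counts are stored as generic Python objects in the mixed-type array.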

4. NY Air Quality (6 Points)

Load the 'airquality' dataset and convert it to a numpy array. Make a scatter plot of wind (column 2, mph) and ozone concentration (column 0, ppb). Using the plt.text command, display the correlation coefficient in the plot. This data has NaNs, which stands for "not a number". You can select non-NaN values using x[~numpy.isnan(x)]. You'll need to remove these to calculate the correlation coefficient; drop a row if either of its two values is NaN so that the wind and ozone values stay paired.
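A minimal sketch of the NaN handling, assuming made-up values in place of the real dataset: filtering each column separately can leave the two arrays with different lengths or mismatched pairs, so the example builds a single mask that keeps only rows where both values are present.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in values (not the real airquality data)
ozone = np.array([40.0, 30.0, np.nan, 20.0, 25.0, 10.0])  # ppb
wind = np.array([5.0, 8.0, 12.0, 11.0, np.nan, 15.0])     # mph

# Keep a row only if BOTH measurements are present
mask = ~np.isnan(ozone) & ~np.isnan(wind)
ozone_ok = ozone[mask]
wind_ok = wind[mask]

r = np.corrcoef(wind_ok, ozone_ok)[0, 1]

plt.scatter(wind_ok, ozone_ok)
plt.xlabel("Wind (mph)")
plt.ylabel("Ozone (ppb)")
# plt.text takes x, y in data coordinates, then the string to draw
plt.text(wind_ok.max(), ozone_ok.max(), f"r = {r:.2f}", ha="right", va="top")
```

Placing the text at the upper-right data point is an arbitrary choice; any coordinates inside the axes limits work.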

