Discussion 5: Prediction and Inference

Relevant lectures: 8

In today's discussion, you'll get practice with inference concepts and dive deeper into the work we did in lecture 8.

This discussion will not be turned in. You will not need to write any code; all your answers go in the text cells below.

The purpose of this exercise is to think about and communicate your point of view, so please work through these problems together in groups of 2 or 3.

Traffic Data Problem

Recall from lecture: J drives a daily commute to UC Berkeley from Beaumont Ave. in Oakland.

He wants to know which lane is best to take.

Specifically, he wants to know: is Lane 4 (the rightmost lane) better than Lane 1 (the leftmost lane)?

Dataset

Our dataset contains the workday flows, each measured over the 60-minute interval from 7 to 8am, near Beaumont Ave.

Here's a plot of the flows from 7-8am over the time period in our data:

And here are the distributions of the flows:

Question 1: Recap

First, let's walk through the steps we took during lecture to create our confidence interval.

Question 1a:

How did we change J's question into a more precise statistical question?

Write your answer here, replacing this text.

Question 1b:

What were our null and alternative hypotheses for this question?

Write your answer here, replacing this text.

Question 1c:

Let's suppose that we took our data and found the mean flow for Lane 1 was 1000 and the mean flow for Lane 4 was 980.

This results in a (Lane 1 - Lane 4) flow difference of 20.

At this point, why can't we conclude that Lane 1 has a different mean flow than Lane 4?

Write your answer here, replacing this text.

Question 1d:

To tell whether our difference is significant, we bootstrapped the mean difference between Lanes 1 and 4.

This is the distribution we got:

According to this distribution, estimate the probability that we get a flow difference of 0 or more extreme if the lane flows fluctuated by chance.

Why can we look at this distribution and find a probability?

Is our probability a p-value?

Finally, why did we look at the probability of getting 0 or more extreme rather than getting 20 (our previously computed mean difference) or more extreme?

Write your answer here, replacing this text.
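
For reference, the bootstrap described in Question 1d can be sketched roughly as follows. This is a minimal sketch, not the lecture's code: the function name `bootstrap_mean_differences`, the synthetic flows, and the choice to resample whole days (treating each day's Lane 1 and Lane 4 flows as a pair) are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_mean_differences(lane1, lane4, n_resamples=10_000, seed=0):
    """Bootstrap the mean of the daily (Lane 1 - Lane 4) flow differences.

    Assumption: lane1[i] and lane4[i] were measured on the same day,
    so whole days are resampled rather than each lane separately.
    """
    lane1 = np.asarray(lane1, dtype=float)
    lane4 = np.asarray(lane4, dtype=float)
    diffs = lane1 - lane4
    rng = np.random.default_rng(seed)
    # Each row of idx is one resample of days, drawn with replacement.
    idx = rng.integers(0, len(diffs), size=(n_resamples, len(diffs)))
    return diffs[idx].mean(axis=1)

# Synthetic stand-in data (NOT the lecture dataset).
data_rng = np.random.default_rng(42)
lane1_flows = data_rng.normal(1000, 60, size=200)
lane4_flows = data_rng.normal(980, 60, size=200)

boot_means = bootstrap_mean_differences(lane1_flows, lane4_flows)
observed_diff = lane1_flows.mean() - lane4_flows.mean()

# Fraction of bootstrap mean differences that are at or below 0.
frac_at_or_below_zero = np.mean(boot_means <= 0)
```

The histogram of `boot_means` plays the role of the distribution shown in this question; the numbers it produces here come from made-up data and say nothing about the real lanes.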

Question 1e:

Use the distribution above to roughly estimate the bounds of a 95% confidence interval for this problem. (Remember to construct the correct type of interval for this problem, not just what was on the lecture slides.)

Write your answer here, replacing this text.
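
If you had the bootstrapped values as an array rather than a plot, a two-sided percentile interval could be read off along these lines. This is a sketch under stated assumptions: `percentile_interval` is a hypothetical helper, and `boot_means` is filled with made-up numbers standing in for a real bootstrap distribution.

```python
import numpy as np

def percentile_interval(boot_stats, confidence=0.95):
    """Two-sided percentile interval from an array of bootstrap statistics."""
    alpha = 1 - confidence
    lower, upper = np.percentile(
        boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return float(lower), float(upper)

# Made-up stand-in for a bootstrap distribution of mean differences
# (the center and spread are invented, not taken from the lecture data).
rng = np.random.default_rng(0)
boot_means = rng.normal(20, 6, size=10_000)

lower, upper = percentile_interval(boot_means)
# Whether 0 falls inside the interval is what matters for the lane question.
contains_zero = lower <= 0 <= upper
```

On the plot, the same idea amounts to finding the two values that cut off 2.5% of the area in each tail.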

Question 1f:

Does our confidence interval suggest that J should prefer one lane over the other?

Why did we say that the confidence interval probably wasn't the right tool for the job?

Write your answer here, replacing this text.

Question 2

One good way to check whether you understand something is to tweak the problem and see if you can still figure it out. Let's do that!

Question 2a:

Let's suppose we didn't bootstrap the differences. Instead, we bootstrap the mean flow for Lane 1 and Lane 4 separately. Can we still answer our original question? If so, how? If not, explain why not.

Write your answer here, replacing this text.

Question 2b:

Rephrase the question, null, and alternative hypotheses so that you would construct a one-sided confidence interval instead of the two-sided one above.

Then, use the plot above to estimate your one-sided confidence interval. How do the bounds of this interval compare with your previous bounds?

Write your answer here, replacing this text.
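
To compare the two kinds of interval numerically, one possible sketch puts all 5% of the excluded area in a single tail. The helper name `one_sided_lower_bound` and the stand-in `boot_means` values are assumptions for illustration, and which tail you cut depends on how you phrased your alternative hypothesis; this version assumes the alternative is that the (Lane 1 - Lane 4) difference is greater than 0.

```python
import numpy as np

def one_sided_lower_bound(boot_stats, confidence=0.95):
    """Lower bound of a one-sided percentile interval [bound, +infinity)."""
    return float(np.percentile(boot_stats, 100 * (1 - confidence)))

# Made-up stand-in for a bootstrap distribution of mean differences.
rng = np.random.default_rng(0)
boot_means = rng.normal(20, 6, size=10_000)

one_sided_lower = one_sided_lower_bound(boot_means)        # 5th percentile
two_sided_lower = float(np.percentile(boot_means, 2.5))    # 2.5th percentile
```

Comparing `one_sided_lower` with `two_sided_lower` shows how moving the entire excluded area into one tail shifts the bound.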

Question 2c:

Let's suppose we constructed the interval, then looked at our EDA and decided to cut out the data from Oct to Nov 2016, then recreated the confidence interval.

What new assumption did we implicitly make in this process?

Write your answer here, replacing this text.

Question 2d:

Let's suppose we didn't have the bootstrap. How else could we estimate the sampling distribution of mean differences?

There is an answer that is easy to state. There is also an answer that you might have learned if you've taken other Stats classes.

Write your answer here, replacing this text.
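
One approach sometimes covered in other statistics courses, and not necessarily the answer intended here, is a normal approximation to the sampling distribution of the mean. The sketch below assumes the daily differences are roughly independent and that the sample is large enough for the approximation to be reasonable; `normal_approx_interval` and the synthetic `daily_diffs` are hypothetical.

```python
import numpy as np

def normal_approx_interval(diffs, z=1.96):
    """Approximate 95% interval for the mean using a normal approximation.

    z=1.96 is the usual two-sided 95% critical value of the standard normal.
    """
    diffs = np.asarray(diffs, dtype=float)
    mean = diffs.mean()
    # Standard error of the mean, using the sample standard deviation.
    se = diffs.std(ddof=1) / np.sqrt(len(diffs))
    return float(mean - z * se), float(mean + z * se)

# Synthetic daily (Lane 1 - Lane 4) differences, NOT the lecture dataset.
rng = np.random.default_rng(1)
daily_diffs = rng.normal(20, 85, size=200)

lower, upper = normal_approx_interval(daily_diffs)
```

Unlike the bootstrap, this relies on a formula for the standard error rather than on resampling, so it is only as good as the assumptions behind that formula.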


