This notebook is due on Friday, October 28th, 2016 at 11:59 p.m. Please make sure to get started early, and come by the instructors' office hours if you have any questions. Office hours and locations can be found in the course syllabus. IMPORTANT: While it's fine if you talk to other people in class about this homework - and in fact we encourage it! - you are responsible for creating the solutions for this homework on your own, and each student must submit their own homework assignment.
Some links that you may find helpful:
FOR THIS HOMEWORK and for all future homework assignments: We will be grading you on:
To that end:
Put your name here!
We're going to finish up the "random walk" project that we started in class. In this section, we're going to do it in two dimensions, x and y. You need to write a program that performs a random walk that starts at the origin (x=y=0), picks a random direction (up, down, left, or right), and takes one step in that direction. The walker then randomly picks a new direction, takes a step, and so on, for a total of $N_{step}$ = 1000 steps.
First: Write the code to do this for a single random walk, and keep track of the (x,y) position of your walker for each step in the random walk. Make a plot showing the path for this random walk. Make sure to write a function that decides what direction the walker will go on each step, and returns that information to your program.
In [ ]:
# put your code here
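As a starting point, here is a minimal sketch of one way to structure the single 2D walk. It is not the required solution: the names pick_step and random_walk_2d, the use of Python's built-in random module, and the fixed seed are all assumptions made for illustration.

```python
import random

def pick_step(rng):
    """Return a (dx, dy) unit step chosen uniformly from up, down, left, right."""
    return rng.choice([(0, 1), (0, -1), (-1, 0), (1, 0)])

def random_walk_2d(n_step, rng):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(n_step):
        dx, dy = pick_step(rng)
        x += dx
        y += dy
        path.append((x, y))
    return path

rng = random.Random(0)            # fixed seed (an assumption) so the run is repeatable
path = random_walk_2d(1000, rng)  # N_step = 1000
xs, ys = zip(*path)               # separate x and y lists, ready for plotting
print("final position:", path[-1])
```

The xs and ys lists can be handed to a plotting call such as matplotlib's plt.plot(xs, ys) to draw the path.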
Second: Modify your code to only keep track of the final distance from the origin (magnitude, not x and y components - in other words, $d = (x^2 + y^2)^{1/2}$), and do the experiment $N_{trial} = 1000$ times, keeping track of the final distances for all of the trials. Plot the distribution of distances from the origin in a histogram, and calculate the mean value.
In [ ]:
# put your code here
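One possible sketch of the repeated experiment is shown below. It keeps only the final distance $d = (x^2 + y^2)^{1/2}$ for each trial. The function names and the seed are assumptions for illustration, not part of the assignment.

```python
import math
import random

def pick_step(rng):
    """Return a (dx, dy) unit step chosen uniformly from up, down, left, right."""
    return rng.choice([(0, 1), (0, -1), (-1, 0), (1, 0)])

def final_distance(n_step, rng):
    """Walk n_step steps from the origin and return the final distance from it."""
    x = y = 0
    for _ in range(n_step):
        dx, dy = pick_step(rng)
        x += dx
        y += dy
    return math.hypot(x, y)  # sqrt(x**2 + y**2)

rng = random.Random(1)  # fixed seed (an assumption) so the run is repeatable
n_step, n_trial = 1000, 1000
distances = [final_distance(n_step, rng) for _ in range(n_trial)]
mean_distance = sum(distances) / len(distances)
print("mean final distance:", mean_distance)
```

The distances list can be passed directly to a histogram call such as matplotlib's plt.hist(distances).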
Question: How does the 2D random walk behave similarly to, and differently from, the 1D random walk that you explored in class? Compare them below.
Put your answer here!
Now we want to see what happens in the 1D random walk when the "coin toss" is biased - in other words, when you're more likely to take a step in one direction than in the other (i.e., the probability of stepping to the right is $p_{step}$, of stepping to the left is $1-p_{step}$, and $p_{step} \neq 0.5$).
Modify the function for the 1D random walk that you wrote in class so that it takes as an argument the probability of stepping in one direction (say, to the right) and uses that probability to decide which direction to go. Use that to calculate a distribution of distances from the origin for $N_{step} = 1000$ and $N_{trial} = 1000$, as well as the mean distance from the origin. Plot a histogram of the distribution. Answer the following two questions:
Put your answers here
In [ ]:
# put your code here
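A minimal sketch of a biased step function follows. The in-class 1D function is not shown in this notebook, so the names biased_step and final_distance_1d, the seed, and the value $p_{step} = 0.6$ are all assumptions chosen for illustration.

```python
import random

def biased_step(p_step, rng):
    """Return +1 (right) with probability p_step, otherwise -1 (left)."""
    return 1 if rng.random() < p_step else -1

def final_distance_1d(n_step, p_step, rng):
    """Walk n_step biased steps from the origin and return |final position|."""
    position = 0
    for _ in range(n_step):
        position += biased_step(p_step, rng)
    return abs(position)

rng = random.Random(2)  # fixed seed (an assumption) so the run is repeatable
p_step = 0.6            # illustrative bias only; try several values
n_step, n_trial = 1000, 1000
distances = [final_distance_1d(n_step, p_step, rng) for _ in range(n_trial)]
mean_distance = sum(distances) / len(distances)
print("mean final distance:", mean_distance)
```

Comparing rng.random() against p_step works because rng.random() returns a uniform value in [0, 1), so the comparison is true with probability p_step.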
The final part of this homework is the creation of a model of the 2016 Presidential Election that's similar to those used at election prediction sites such as FiveThirtyEight. You've been provided with a link to a CSV (comma-separated value) file containing information about each of the 50 states as well as the District of Columbia, and you'll use this to make election predictions.
A quick civics lesson: Voters in the United States do not directly elect the President and Vice President. Instead, they choose "Electors" that are apportioned to each state based on the most recent U.S. census results (a state's number of Electors is equal to the number of members of Congress to which that state is entitled). Electors are typically required to vote for the candidate with the majority of the popular vote in their state. At the moment, there are 538 Electors in the "Electoral College," and to win the presidency a candidate needs to receive half plus one of those votes, or 270 votes.
The data file: Each row of the provided CSV file contains information about one of the 50 states or Washington D.C. This information includes: the state name and abbreviation, the number of electoral votes that state receives in the Electoral College, the state's polling information, the number of people surveyed in the last poll, and then information for each of the three candidates with significant popular support: Hillary Clinton, Donald Trump, and Gary Johnson. For each candidate, the file gives the percentage of the voting population that intends to vote for them according to the polls, as well as the margin of error of the poll. Note that we are only using the most recent poll for each state.
Margin of error: The "margin of error" of the poll represents uncertainty in polling data - typically only hundreds of people are polled at any given time, and pollsters are attempting to extrapolate from that sample of people to all of the likely voters in the state. This error is typically calculated assuming that the uncertainty follows a "normal" (or "Gaussian") distribution, and the reported error is the "standard deviation". For example, polls in Michigan indicate that Hillary Clinton is currently favored by 42% of the voting population, with a margin of error of 1.71%. This means that the most likely outcome is for 42% of the population to vote for Clinton, with a 68% likelihood that the real percentage is between 40.29% and 43.71%, and a 95% likelihood that the real percentage is between 38.58% and 45.42%.
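The ranges in the Michigan example can be checked with a short sketch. It computes the one- and two-standard-deviation ranges directly, then draws from a Gaussian to see how often a sample lands inside each. The variable names, the seed, and the number of draws are assumptions for illustration.

```python
import random

mean_pct = 42.0   # Clinton's polled support in Michigan, from the example above
sigma_pct = 1.71  # margin of error, treated as one standard deviation

# Ranges covering one and two standard deviations around the mean.
one_sigma = (mean_pct - sigma_pct, mean_pct + sigma_pct)
two_sigma = (mean_pct - 2 * sigma_pct, mean_pct + 2 * sigma_pct)
print(f"one-sigma range: {one_sigma[0]:.2f} to {one_sigma[1]:.2f}")
print(f"two-sigma range: {two_sigma[0]:.2f} to {two_sigma[1]:.2f}")

# Empirical check: fraction of Gaussian draws that fall inside each range.
rng = random.Random(3)  # fixed seed (an assumption) so the run is repeatable
n_draws = 100_000
draws = [rng.normalvariate(mean_pct, sigma_pct) for _ in range(n_draws)]
frac_one = sum(one_sigma[0] <= d <= one_sigma[1] for d in draws) / n_draws
frac_two = sum(two_sigma[0] <= d <= two_sigma[1] for d in draws) / n_draws
print("fraction within one sigma:", frac_one)
print("fraction within two sigma:", frac_two)
```

The two fractions should come out close to the 68% and 95% figures quoted above, which are rounded values for a normal distribution.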
We are going to attempt to duplicate the FiveThirtyEight election predictions, which predict not just who will win the election but what the range of likely outcomes will be. In particular, we're going to reproduce the expected distribution of Electoral College votes for each candidate. To do this, we need to run large numbers of model elections - say, $N_{elec} = 10,000$ - and keep track of the results for each one. To do so, you need to do the following for each of your model elections:
Hint: Python's random.normalvariate() function, or the NumPy random module's random.normal() function, can be used to calculate the possible outcomes from a Gaussian distribution, given the mean and standard deviation.
You will then be asked to answer several questions, as described below.
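A rough sketch of how one model election might be organized is given below. Several things here are assumptions rather than part of the assignment: the three states and all of their numbers are placeholders (NOT real polling data), the dictionary keys are hypothetical and are not the column names of the provided CSV file, each candidate's percentage is drawn independently, and the candidate with the largest drawn percentage is assumed to take all of a state's electoral votes.

```python
import random

# Placeholder records for illustration only: NOT real polling data.
# Each candidate maps to (polled percentage, margin of error).
states = [
    {"name": "State A", "electoral_votes": 10,
     "polls": {"Candidate X": (46.0, 2.0), "Candidate Y": (44.0, 2.0), "Candidate Z": (6.0, 1.0)}},
    {"name": "State B", "electoral_votes": 6,
     "polls": {"Candidate X": (40.0, 3.0), "Candidate Y": (48.0, 3.0), "Candidate Z": (8.0, 1.5)}},
    {"name": "State C", "electoral_votes": 20,
     "polls": {"Candidate X": (45.0, 2.5), "Candidate Y": (45.0, 2.5), "Candidate Z": (5.0, 1.0)}},
]

def run_model_election(states, rng):
    """Run one model election and return {candidate: electoral votes won}."""
    totals = {cand: 0 for state in states for cand in state["polls"]}
    for state in states:
        # Draw one possible outcome per candidate from a Gaussian distribution.
        draws = {cand: rng.normalvariate(mean, sigma)
                 for cand, (mean, sigma) in state["polls"].items()}
        winner = max(draws, key=draws.get)  # assumed winner-take-all rule
        totals[winner] += state["electoral_votes"]
    return totals

rng = random.Random(4)  # fixed seed (an assumption) so the run is repeatable
n_elec = 10_000
results = [run_model_election(states, rng) for _ in range(n_elec)]
for cand in results[0]:
    mean_ev = sum(r[cand] for r in results) / n_elec
    print(cand, "mean electoral votes:", mean_ev)
```

With the real data, the placeholder list would be replaced by values read from the all_states data frame.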
In [ ]:
import pandas
# reads in the CSV file and puts it into a Pandas data frame called "all_states"
all_states = pandas.read_csv('https://raw.githubusercontent.com/bwoshea/2016_election_info/master/State_polling_info.csv')
In [ ]:
# put your code here!
Question 1: Plot the histogram of expected Electoral College votes for the three candidates. Who do you expect to win the election?
Put your answer here!
In [ ]:
# put your code and plots here. Add additional cells if necessary.
Question 2: In what percentage of the model elections does Hillary Clinton win the Presidency? How about Donald Trump and Gary Johnson?
Put your answer here!
In [ ]:
# put your code and plots here. Add additional cells if necessary.
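One way to turn a list of per-election Electoral College totals into a win percentage is sketched here. The function name win_percentage and the toy input list are assumptions for illustration. The toy numbers are made up and are not model results.

```python
def win_percentage(electoral_votes, votes_needed=270):
    """Return the percentage of elections with at least votes_needed electoral votes."""
    wins = sum(1 for ev in electoral_votes if ev >= votes_needed)
    return 100.0 * wins / len(electoral_votes)

# Made-up totals for four imaginary model elections (not real results).
toy_totals = [301, 269, 270, 240]
pct = win_percentage(toy_totals)
print("win percentage:", pct)
```

The threshold of 270 comes from the civics lesson above: half of the 538 Electors plus one.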
Question 3: Let's look at the difference between popular vote and Electoral College vote for one of the two main-party candidates - let's use Hillary Clinton as our example. Make a scatter plot of the expected Electoral College votes vs. the fraction of popular vote received for all of the elections, and add lines indicating 50% of the popular vote as well as the needed 270 Electoral College votes. Is it possible to win more than half of the Electoral College vote but get less than half of the popular vote, or vice versa? Why might this be true?
Put your answer here!
In [ ]:
# put your code and plots here. Add additional cells if necessary.
Question 4: Take a look at the results on the FiveThirtyEight election forecast page, and read their description of how they create these forecasts. How well does the distribution that you calculated in Question 1 agree with FiveThirtyEight's Electoral College forecast? If it is different, why do you think that might be?
Put your answer here!
In [ ]:
from IPython.display import HTML
HTML(
"""
<iframe
src="https://goo.gl/forms/q2zZVDznls9zeqJo2?embedded=true"
width="80%"
height="1200px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)