Apply logistic regression to categorize whether a county had high mortality rate due to contamination

1. Import the necessary packages to read in the data, plot, and create a logistic regression model


In [ ]:
import pandas as pd
%matplotlib inline
import numpy as np
from sklearn.linear_model import LogisticRegression

2. Read in the hanford.csv file in the data/ folder


In [ ]:

3. Calculate the basic descriptive statistics on the data


In [ ]:

4. Find a reasonable threshold to say exposure is high and recode the data


In [ ]:


In [ ]:


In [ ]:


In [ ]:

5. Create a logistic regression model


In [ ]:


In [ ]:


In [ ]:

6. Predict whether the mortality rate (Cancer per 100,000 man years) will be high at an exposure level of 50


In [ ]:


In [ ]: