Let's find out how we did!

Before we can evaluate anything, we need to rebuild the classifiers you trained in ML-4. The code below loads the same data used in ML-4, and you should be able to copy and paste your own training code where noted further down. First, we load the data needed to train your models again.


In [0]:
# numpy provides python tools to easily load comma separated files.
import numpy as np

# use numpy to load disease #1 data
d1 = np.loadtxt("../31_Data_ML-IV/D1.csv", delimiter=",")

# features are all rows of the first 200 columns (indices 0-199)
# The canonical way to name this is that X is our matrix of
# examples by features.
X1 = d1[:,:200]

# labels are all rows of the column at index 200 (the 201st column)
# The canonical way to name this is that y is our vector of
# labels.
y1 = d1[:,200]

# use numpy to load disease #2 data
d2 = np.loadtxt("../31_Data_ML-IV/D2.csv", delimiter=",")

# features are all rows of the first 200 columns (indices 0-199)
X2 = d2[:,:200]
# labels are all rows of the column at index 200 (the 201st column)
y2 = d2[:,200]
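
The slicing used above can be checked on a toy example. This is a minimal sketch, assuming a file laid out like the course data but with only 3 feature columns followed by 1 label column; the names toy_csv, n_features, X_toy, and y_toy are hypothetical and the values are made up for illustration.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file: 2 examples, 3 features, then 1 label.
toy_csv = io.StringIO("0.1,0.2,0.3,1\n0.4,0.5,0.6,0\n")
n_features = 3  # the course files use 200 here

toy = np.loadtxt(toy_csv, delimiter=",")

# Columns before n_features are the features; the column at index
# n_features holds the labels.
X_toy = toy[:, :n_features]
y_toy = toy[:, n_features]

print(X_toy.shape, y_toy.shape)
```

Here X_toy is a 2 by 3 matrix of examples by features and y_toy is a vector of 2 labels, mirroring X1 and y1 above.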

Train your classifiers

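Next, copy and paste the code from your ML-4 homework to train your classifiers on this data. Name the trained models d1_classifier and d2_classifier, because the evaluation code at the end of this notebook expects those names. Because we're using exactly the same data and you set a random seed, we can expect to get the same classifiers.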
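
To illustrate why a fixed seed makes retraining reproducible, here is a minimal sketch. It assumes scikit-learn is the library in use and picks a random forest on synthetic data purely as a stand-in for whatever algorithm you chose; the names X_demo, y_demo, and train_demo are hypothetical and not part of the homework.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the course data: 100 examples, 200 features.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 200))
y_demo = (X_demo[:, 0] > 0).astype(float)


def train_demo(seed):
    """Train a small random forest with a fixed random seed."""
    clf = RandomForestClassifier(n_estimators=20, random_state=seed)
    return clf.fit(X_demo, y_demo)


# Two independent training runs with the same data and the same seed.
first = train_demo(seed=42)
second = train_demo(seed=42)

same_predictions = np.array_equal(
    first.predict_proba(X_demo), second.predict_proba(X_demo)
)
print("Identical predicted probabilities:", same_predictions)
```

If your ML-4 code did not fix a seed for a randomized algorithm, the retrained classifier may differ slightly from the one you originally reported on.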


In [0]:
# DATASET 1 CLASSIFIER CODE GOES HERE

In [0]:
# DATASET 2 CLASSIFIER CODE GOES HERE

Evaluate your classifiers

If you've followed the instructions thus far, you should be able to simply run the code below to report your results. It loads the test data for each disease and prints the accuracy of each classifier on it.


In [0]:
d1_test = np.loadtxt("../32_Data_ML-V/D1_test.csv", delimiter=",")
X1_test = d1_test[:,:200]
y1_test = d1_test[:,200]

d2_test = np.loadtxt("../32_Data_ML-V/D2_test.csv", delimiter=",")
X2_test = d2_test[:,:200]
y2_test = d2_test[:,200]

d1_score = d1_classifier.score(X1_test, y1_test)
print("D1 Testing Accuracy: " + str(d1_score))

d2_score = d2_classifier.score(X2_test, y2_test)
print("D2 Testing Accuracy: " + str(d2_score))
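
If your classifiers are scikit-learn estimators (an assumption, since the notebook does not name the library), score returns the mean accuracy: the fraction of test examples whose predicted label matches the true label. The sketch below checks this on synthetic data with a hypothetical demo_classifier; the decision tree and the data are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic train/test split standing in for the course data.
rng = np.random.RandomState(1)
X_train = rng.normal(size=(80, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(float)
X_test = rng.normal(size=(40, 5))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(float)

demo_classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
demo_classifier.fit(X_train, y_train)

# Accuracy as reported by score() ...
reported = demo_classifier.score(X_test, y_test)
# ... and the same quantity computed by hand.
manual = np.mean(demo_classifier.predict(X_test) == y_test)

print("score():", reported)
print("manual :", manual)
```

The two numbers agree, so the accuracies you report are simply the proportion of test examples each classifier labeled correctly.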

Record your results!

Once you've got your accuracies in hand, head over to the reporting form: https://goo.gl/forms/a6t9mxVGwYpdQAhH3

Use the same code name and enter the actual accuracies that you observed for D1 and D2.

Homework

Q1: How did the class do (see the combined results at http://bit.ly/GCB535-Combinator)? Did we generally overestimate, underestimate, or accurately estimate performance? (2 pts)

Q2: Did you personally find it easier to get good accuracy for D1 or D2? Which one required more tries to get good performance? (2 pts)

Q3: In your final classifier, what type of algorithm did you use and what parameters did you supply? (2 pts)

Q4: What did you expect your own accuracy to be for each dataset? What did you observe? Was this surprising? (2 pts)

Q5: What are two items of feedback that you'd like to give on this exercise? (2 pts)