In [0]:
# NumPy provides tools to easily load comma-separated files.
import numpy as np
# use numpy to load disease #1 data
# passing the path directly lets numpy open and close the file itself
d1 = np.loadtxt("../31_Data_ML-IV/D1.csv", delimiter=",")
# features are all rows for columns before 200
# The canonical way to name this is that X is our matrix of
# examples by features.
X1 = d1[:,:200]
# labels are in all rows at column index 200 (the 201st column)
# The canonical way to name this is that y is our vector of
# labels.
y1 = d1[:,200]
# use numpy to load disease #2 data
d2 = np.loadtxt("../31_Data_ML-IV/D2.csv", delimiter=",")
# features are all rows for columns before 200
X2 = d2[:,:200]
# labels are in all rows at column index 200 (the 201st column)
y2 = d2[:,200]
In [0]:
# DATASET 1 CLASSIFIER CODE GOES HERE
In [0]:
# DATASET 2 CLASSIFIER CODE GOES HERE
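The scoring cell further down calls `d1_classifier.score(...)` and `d2_classifier.score(...)`, so whatever goes in the two cells above needs to produce fitted objects with a `.score(X, y)` method. scikit-learn classifiers provide `fit` and `score` with this shape. The block below is only a minimal sketch of that interface, not a suggested answer: `NearestCentroidSketch`, `X_demo`, `y_demo`, and `demo_classifier` are hypothetical names, and the data is synthetic rather than D1 or D2.

```python
import numpy as np


class NearestCentroidSketch:
    """Hypothetical stand-in classifier exposing fit / predict / score."""

    def fit(self, X, y):
        # remember each class label and the mean feature vector of its examples
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # squared Euclidean distance from every example to every class centroid
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        # each example gets the label of its closest centroid
        return self.classes_[dists.argmin(axis=1)]

    def score(self, X, y):
        # fraction of examples whose predicted label matches the true label
        return float(np.mean(self.predict(X) == y))


# synthetic stand-in data (an assumption for illustration): 100 examples by 200 features
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 200)),
    rng.normal(2.0, 1.0, size=(50, 200)),
])
y_demo = np.array([0.0] * 50 + [1.0] * 50)

demo_classifier = NearestCentroidSketch().fit(X_demo, y_demo)
print("Demo training accuracy: " + str(demo_classifier.score(X_demo, y_demo)))
```

In the notebook itself, the equivalent step would assign the fitted object to `d1_classifier` (trained on `X1`, `y1`) and `d2_classifier` (trained on `X2`, `y2`).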
In [0]:
# load the test data for disease #1
d1_test = np.loadtxt("../32_Data_ML-V/D1_test.csv", delimiter=",")
# features are the columns before 200; labels are at column index 200
X1_test = d1_test[:,:200]
y1_test = d1_test[:,200]
# load the test data for disease #2
d2_test = np.loadtxt("../32_Data_ML-V/D2_test.csv", delimiter=",")
X2_test = d2_test[:,:200]
y2_test = d2_test[:,200]
# score each classifier on its test set and report the result
d1_score = d1_classifier.score(X1_test, y1_test)
print("D1 Testing Accuracy: " + str(d1_score))
d2_score = d2_classifier.score(X2_test, y2_test)
print("D2 Testing Accuracy: " + str(d2_score))
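If `d1_classifier` and `d2_classifier` follow the scikit-learn classifier convention (an assumption, since the classifier cells are left for you to fill in), `.score(X, y)` returns mean accuracy: the fraction of test examples whose predicted label equals the true label. The sketch below computes that quantity by hand on a small made-up example; `accuracy`, `y_true_demo`, and `y_pred_demo` are hypothetical names used only for illustration.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# made-up labels: four of the five predictions match the truth
y_true_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred_demo = np.array([0.0, 1.0, 0.0, 0.0, 1.0])

demo_accuracy = accuracy(y_true_demo, y_pred_demo)
print("Demo accuracy: " + str(demo_accuracy))
```

Under that assumption, comparing `accuracy(y1_test, d1_classifier.predict(X1_test))` with `d1_score` should give the same number, which is a quick sanity check on the value you report.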
Once you've got your accuracies in hand, head over to the reporting form: https://goo.gl/forms/a6t9mxVGwYpdQAhH3
Use the same code name and enter the actual accuracies that you observed for D1 and D2.
Q1: How did the class do (see http://bit.ly/GCB535-Combinator)? Did we generally overestimate, underestimate, or accurately estimate performance? (2 pts)
Q2: Did you personally find it easier to get a good accuracy for D1 or D2? Which one required more tries to get good performance? (2 pts)
Q3: In your final classifier, what type of algorithm did you use and what parameters did you supply? (2 pts)
Q4: What did you expect your own accuracy to be for each dataset? What did you observe? Was this surprising? (2 pts)
Q5: What are two items of feedback that you'd like to give on this exercise? (2 pts)