Much as you went out on your own in the Python module to implement a guessing game, now we're going to head off into the wilds of machine learning! Given what you learned in the prelab (feel free to borrow code from there, or from earlier work), use your time in class to try to build an accurate model from these data.
The code below will load the data again.
In [46]:
# NumPy provides Python tools to easily load comma-separated files.
import numpy as np

# Use numpy to load the disease #1 data.
d1 = np.loadtxt(open("../31_Data_ML-IV/D1.csv", "rb"), delimiter=",")
# Features are all rows of the columns before 200. The canonical
# name for this is X: our matrix of examples by features.
X1 = d1[:, :200]
# Labels are all rows of the 200th column. The canonical name for
# this is y: our vector of labels.
y1 = d1[:, 200]

# Use numpy to load the disease #2 data.
d2 = np.loadtxt(open("../31_Data_ML-IV/D2.csv", "rb"), delimiter=",")
# Features are all rows of the columns before 200.
X2 = d2[:, :200]
# Labels are all rows of the 200th column.
y2 = d2[:, 200]
In [0]:
# CODE FOR DATASET 1 GOES HERE
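One possible starting point (a sketch, not the required solution): hold out a test set, fit a simple classifier, and check it with cross-validation. The classifier choice here (a linear SVM) and the split sizes are assumptions; the synthetic `make_classification` data stands in for `X1`/`y1` so the sketch runs on its own, but in the lab you would use the variables loaded above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for X1/y1 so this sketch is self-contained;
# in the lab, use the X1 and y1 loaded from D1.csv above.
X1, y1 = make_classification(n_samples=300, n_features=200, random_state=0)

# Hold out 25% of examples to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X1, y1, test_size=0.25, random_state=0)

# A first guess at a classifier; kernel and C are knobs to explore.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
print("train score:", clf.score(X_train, y_train))
print("test score:", clf.score(X_test, y_test))

# 5-fold cross-validation on the full dataset.
scores = cross_val_score(clf, X1, y1, cv=5)
print("cv accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

From here, the usual move is to vary the classifier and its parameters and compare cross-validation scores rather than trusting a single train/test split.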
In [0]:
# CODE FOR DATASET 2 GOES HERE
Once you have parameters that you're happy with for each dataset, go ahead and make one final classifier for each. You must set the random state, and your code must produce a classifier named d1_classifier for the disease 1 dataset and d2_classifier for the disease 2 dataset. Each should assume that the features have already been loaded into X1 and X2 respectively, and the labels into y1 and y2 (just as we did above). You don't have to report everything you try below, but for each final classifier include the training score and testing score (using classifier.score) as well as the accuracy ± 2x standard deviation from cross validation (as in the prelab).
In [0]:
# CODE TO CREATE THE d1_classifier
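A sketch of what a final-classifier cell might look like, assuming scikit-learn and a random forest (the model choice and its parameters are placeholders for whatever you settled on). Synthetic `make_classification` data stands in for the real `X1`/`y1` so the sketch runs on its own; it sets the random state and reports the three required numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for X1/y1; in the lab these are already loaded.
X1, y1 = make_classification(n_samples=300, n_features=200, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X1, y1, test_size=0.25, random_state=0)

# random_state is set, as required, so results are reproducible.
d1_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
d1_classifier.fit(X_train, y_train)

print("training score:", d1_classifier.score(X_train, y_train))
print("testing score:", d1_classifier.score(X_test, y_test))

# Accuracy +/- 2x standard deviation from 5-fold cross validation.
scores = cross_val_score(d1_classifier, X1, y1, cv=5)
print("cv accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```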
In [0]:
# CODE TO CREATE THE d2_classifier
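The d2_classifier cell follows the same pattern; here is a sketch with a different assumed model (logistic regression) to show that the structure is independent of the classifier you pick. Again, synthetic data stands in for `X2`/`y2` so the sketch is runnable on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for X2/y2; in the lab these are already loaded.
X2, y2 = make_classification(n_samples=300, n_features=200, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X2, y2, test_size=0.25, random_state=0)

# random_state is set, as required.
d2_classifier = LogisticRegression(max_iter=1000, random_state=0)
d2_classifier.fit(X_train, y_train)

print("training score:", d2_classifier.score(X_train, y_train))
print("testing score:", d2_classifier.score(X_test, y_test))

# Accuracy +/- 2x standard deviation from 5-fold cross validation.
scores = cross_val_score(d2_classifier, X2, y2, cv=5)
print("cv accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```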