We covered a lot of information today, and I'd like you to practice developing classification trees on your own. For each exercise, work through the problem, determine the result, and provide the requested interpretation in comments alongside the code. The point is to build classifiers, not necessarily good classifiers (that will hopefully come later).

1. Load the iris dataset and create a holdout split with 50% of the data in training and 50% in test. Fit a classification tree on the training half and output its results on the test half (don't worry about creating the tree visual unless you'd like to). Discuss the results briefly: are they good or not?
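If you are unsure where to start, here is a minimal sketch of a 50/50 holdout evaluation with scikit-learn. The `random_state=0` values and the `stratify=y` option are my own choices, not requirements of the exercise; they only make the split repeatable and keep the three species balanced across both halves.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the 150-row iris data: 4 numeric features, 3 species labels.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out half of the rows for testing.
# random_state and stratify are assumptions made for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Fit a classification tree with default settings on the training half only.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Score the tree on rows it has never seen.
y_pred = tree.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print("Test accuracy:", accuracy)
print("Confusion matrix (rows = true class, columns = predicted class):")
print(matrix)
```

The confusion matrix is worth reading alongside the accuracy, because it shows which species get mistaken for each other.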


In [ ]:

2. Redo the model with a 75%/25% training/test split and compare the results. Are they better or worse than before? Discuss why this may be.
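A single holdout accuracy on 150 rows can shift by a few points depending on which rows land in the test set, so one way to make the comparison fairer is to repeat each split over several random seeds. The sketch below does that; the helper name `holdout_accuracy`, the 20 repetitions, and the use of `stratify` are assumptions of mine rather than part of the exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def holdout_accuracy(X, y, test_size, seed):
    """Fit a tree on one random holdout split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    tree = DecisionTreeClassifier(random_state=seed)
    tree.fit(X_train, y_train)
    return tree.score(X_test, y_test)  # mean accuracy on the test rows


# Repeat each split ratio with 20 different seeds (an arbitrary choice).
results = {}
for test_size in (0.5, 0.25):
    results[test_size] = [
        holdout_accuracy(X, y, test_size, seed) for seed in range(20)
    ]
    scores = results[test_size]
    print(
        f"test_size={test_size}: "
        f"mean={np.mean(scores):.3f}, std={np.std(scores):.3f}, "
        f"min={min(scores):.3f}, max={max(scores):.3f}"
    )
```

When you write up your discussion, look at the spread as well as the mean: a smaller test set gives the tree more training data but makes each accuracy estimate noisier.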


In [ ]:

3. Perform 10-fold cross-validation on the data and take the average of the 10 fold accuracies. Compare this to the holdout results from exercises 1 and 2. What do you notice about the accuracy measures in each of these?
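As a starting point, 10-fold cross-validation can be sketched with `cross_val_score`. Shuffling the folds with a fixed `random_state` is my own choice here; the iris rows are stored sorted by species, so I use `StratifiedKFold` to keep every fold balanced.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Ten folds, each holding out 15 of the 150 rows in turn.
# shuffle and random_state are assumptions made for repeatability.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# cross_val_score refits a fresh copy of the tree for every fold
# and returns one accuracy per fold.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())
```

Each fold's test set is small, so individual fold accuracies will jump around; the mean is the number to compare against your holdout results.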


In [ ]:

4. Open seeds_dataset.txt and perform basic exploratory analysis. What attributes do we have? What are we trying to predict?

For context on the data, see the documentation here: https://archive.ics.uci.edu/ml/datasets/seeds
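Here is one possible way to get the file into a DataFrame. It assumes the file has no header row, is whitespace-delimited, and has eight columns with the class label last, which is how the UCI documentation describes it. The column names in `SEED_COLUMNS` and the helper `load_seeds` are names I made up for illustration, so check them against the documentation before relying on them.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column names based on the attribute list in the UCI docs.
SEED_COLUMNS = [
    "area",
    "perimeter",
    "compactness",
    "kernel_length",
    "kernel_width",
    "asymmetry_coefficient",
    "groove_length",
    "class",
]


def load_seeds(source):
    """Read the seeds data from a path or file-like object.

    sep=r"\\s+" treats any run of tabs or spaces as one delimiter,
    which matters if some rows contain repeated tabs.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=SEED_COLUMNS)


# Assumes the file sits next to the notebook; adjust the path if not.
path = Path("seeds_dataset.txt")
if path.exists():
    seeds = load_seeds(path)
    print(seeds.shape)                     # rows and columns
    print(seeds.head())                    # first few records
    print(seeds.describe())                # range and spread of each attribute
    print(seeds["class"].value_counts())   # how many rows per seed type
    print(seeds.isna().sum())              # missing values per column
```

From there, `seeds.drop(columns="class")` gives you the attributes and `seeds["class"]` gives you the target for exercise 5.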


In [ ]:

5. Using seeds_dataset.txt, create a classifier to predict the type of seed. Repeat the evaluations from exercises 1 to 3 (50-50 holdout, 75-25 holdout, and 10-fold cross-validation) and discuss the results.


In [ ]: