Assignment 1

  • Use the pseudocode you came up with in class to write your own 5-fold cross-validation function that splits the data set into 5 equal-sized sets
  • Don't forget to shuffle the input before assigning to sets
  • You can use the fit(), predict(), and score() functions of your model in your functions
  • Check your results against sklearn's cross_val_score
  • In your PR, discuss what challenges you had creating this function and whether it helped you better understand cross-validation

Pseudocode

function for 5-fold cross validation:

    Shuffle the index of the data. 

    Split the data into 5 parts.

    Loop, repeated 5 times:

        Train the model on four parts.

        Compute the accuracy on the remaining (held-out) part.

        Store it in a dictionary.

    Compute the average of the 5 accuracy values from the loop.

In [50]:
import pandas as pd
%matplotlib inline
from sklearn import datasets
from sklearn import tree
import matplotlib.pyplot as plt
from random import shuffle

In [157]:
def lets_shake(x,y):
    zipped = zip(x,y) # Let's bind the x and the y together before shakin'.
    zipped_list = list(zipped) 
    shuffle(zipped_list)
    x, y = zip(*zipped_list)
    return x, y
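
The pseudocode talks about shuffling the index of the data, while lets_shake shuffles the (x, y) pairs directly. A minimal sketch of the index-based variant follows; shuffled_indices and its seed parameter are hypothetical names introduced here for illustration, not part of the notebook.

```python
import random

def shuffled_indices(n, seed=None):
    """Hypothetical helper: return the indices 0..n-1 in random order."""
    idx = list(range(n))
    # A private Random instance makes the order reproducible for a given seed.
    random.Random(seed).shuffle(idx)
    return idx

x = [[1, 1], [2, 2], [3, 3], [4, 4]]
y = [0, 2, 1, 1]

order = shuffled_indices(len(x), seed=42)
# Applying the same index order to x and y keeps every sample with its label.
x_shuffled = [x[i] for i in order]
y_shuffled = [y[i] for i in order]
```

Passing a fixed seed gives the same order on every run, which makes it easier to compare scores between runs.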

In [222]:
def the_splitter(x,y,folds):
    divider = len(x) // folds # Size of each group (integer division).
    group_dict = {}
    
    for i in range(0,folds):
        temp_list = list()
        for j in range(i*divider,(i+1)*divider):
            temp_list.append((x[j], y[j]))
        group_dict['group' + str(i)] = temp_list
    return group_dict
# Note: if len(x) is not divisible by folds, the last len(x) % folds values are left out of every group.
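
One possible way to avoid dropping the leftover values is to give the first few groups one extra element each. The sketch below works on indices only; split_indices is a hypothetical helper, not something the notebook defines.

```python
def split_indices(n, folds):
    """Hypothetical helper: split range(n) into `folds` contiguous index lists."""
    base, extra = divmod(n, folds)
    parts = []
    start = 0
    for i in range(folds):
        # The first `extra` groups receive one additional index.
        size = base + (1 if i < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts

sizes = [len(part) for part in split_indices(12, 5)]  # [3, 3, 2, 2, 2]
```

With this scheme the group sizes differ by at most one, and every index lands in exactly one group.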

In [246]:
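def the_test(groups,folds):
    scores = {} # Accuracy of each held-out group.
    for held_out in groups:
        # Train on every group except the held-out one.
        train = [pair for name in groups if name != held_out for pair in groups[name]]
        x_train, y_train = zip(*train)
        x_test, y_test = zip(*groups[held_out])
        dt = tree.DecisionTreeClassifier()
        dt = dt.fit(list(x_train), list(y_train))
        scores[held_out] = dt.score(list(x_test), list(y_test))
    return scores, sum(scores.values())/folds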

In [231]:
x = [[1, 1], [2, 2], [3, 3], [4, 4], [5,5], [6,6], [7,7], [8,8], [9,9], [10,10]]
y = [0,2,1,1,0,1,2,1,1,0]

In [232]:
x,y = lets_shake(x,y)
groups = the_splitter(x,y,5)

In [236]:
groups


Out[236]:
{'group0': [([3, 3], 1), ([4, 4], 1)],
 'group1': [([7, 7], 2), ([5, 5], 0)],
 'group2': [([9, 9], 1), ([10, 10], 0)],
 'group3': [([8, 8], 1), ([1, 1], 0)],
 'group4': [([2, 2], 2), ([6, 6], 1)]}
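
The assignment asks for a comparison with sklearn's cross_val_score, which the notebook does not reach. A rough sketch of that check on the same toy data might look like this. The KFold settings and random_state values are assumptions chosen here. With a plain integer cv and a classifier, cross_val_score uses stratified folds without shuffling, so an explicit shuffled KFold is closer to the hand-written split.

```python
from sklearn import tree
from sklearn.model_selection import KFold, cross_val_score

# Same toy data as in the notebook.
x = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [8, 8], [9, 9], [10, 10]]
y = [0, 2, 1, 1, 0, 1, 2, 1, 1, 0]

# Assumption: shuffled 5-fold split with a fixed seed for reproducibility.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = tree.DecisionTreeClassifier(random_state=0)

reference_scores = cross_val_score(model, x, y, cv=cv)
print(reference_scores, reference_scores.mean())
```

Because the folds are drawn randomly, the individual scores will generally not match a hand-rolled split exactly; on a data set this small the averages can also differ noticeably.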
