Machine Learning Problems

Coding Patterns

Your Data

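The scikit-learn API is well-designed and consistent. The inputs to the different models are, almost invariably, a feature matrix X of shape (n_samples, n_features) and, for supervised models, a target array Y with one entry per sample.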
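
A minimal sketch of those inputs, assuming NumPy arrays and made-up values (the numbers are purely illustrative): rows of X are samples, columns are features, and Y holds one target per row.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 3 features (columns).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
])

# One target value per sample, in the same row order as X.
Y = np.array([0, 0, 1, 1])

n_samples, n_features = X.shape
print(n_samples, n_features)
print(Y.shape)
```

The one thing to check before fitting is that X and Y agree on the number of samples.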

Model Training/Fitting and Predictions

With the ML models implemented in scikit-learn, the usual pattern is:

mdl = ModelNameHere(params_here)
mdl.fit(X, Y)
preds = mdl.predict(new_data)
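
As one concrete instance of this pattern, here is a sketch that assumes a k-nearest-neighbours classifier and a hypothetical toy dataset; any other scikit-learn estimator could be swapped in for `KNeighborsClassifier`.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated clusters in 2-D.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
Y = [0, 0, 0, 1, 1, 1]

# 1. Instantiate the model with its hyperparameters.
mdl = KNeighborsClassifier(n_neighbors=3)

# 2. Fit on the features and targets.
mdl.fit(X, Y)

# 3. Predict labels for unseen rows with the same number of columns.
new_data = [[0.05, 0.05], [5.05, 5.0]]
preds = mdl.predict(new_data)
print(preds)
```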

Cross-Validation

Cross-validation is always a good idea. In its simplest form, split your data into training and testing sets, train on the training set, and test on the testing set.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
mdl.fit(X_train, Y_train)
preds = mdl.predict(X_test)  # predict on the held-out features

Finally, evaluate your model based on testing set performance.

your_metric_here(Y_test, preds)  # scikit-learn metrics take (y_true, y_pred)
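
Putting the split, fit, predict, and evaluate steps together, a sketch might look like the following. The dataset (`load_iris`), the model (`KNeighborsClassifier`), the metric (`accuracy_score`), and the fixed `random_state` are all assumptions chosen for illustration. The last lines show k-fold cross-validation with `cross_val_score`, which repeats the train/test cycle over several splits instead of relying on a single one.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)

# Hold-out split: 70% train, 30% test; random_state fixes the shuffle.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

mdl = KNeighborsClassifier(n_neighbors=5)
mdl.fit(X_train, Y_train)
preds = mdl.predict(X_test)

# scikit-learn metrics take the true labels first, then the predictions.
holdout_acc = accuracy_score(Y_test, preds)
print(f"hold-out accuracy: {holdout_acc:.3f}")

# k-fold cross-validation: 5 train/test splits, one score per fold.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, Y, cv=5)
print(f"5-fold mean accuracy: {cv_scores.mean():.3f}")
```

A single hold-out score depends on which rows land in the test set; averaging over folds gives a steadier estimate at the cost of fitting the model several times.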

scikit-learn's pages

famous people/heroes

Please stick around until after the draw for a $5 Starbucks card is announced!



In [2]:

    
from random import randint



In [3]:

    
n_entries = 25
randint(1, n_entries)  # randint includes both endpoints









    Out[3]:





5


