Machine Learning Problems

  • What are the 3 types of machine learning problems that we went through today?
  • How do we evaluate the different types of problems?
  • What are some statistical practices that may help in your ML analysis?

Coding Patterns

Your Data

The scikit-learn API is well designed and consistent. The inputs to the different models are, almost invariably:

  • An X matrix, which is n_samples (rows) by n_features (columns)
  • A Y matrix, which is n_samples (rows) by m_outputs (columns)
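As a small illustration of these shapes, here is a sketch using NumPy arrays. The numbers are made-up toy data, not from the tutorial, and the single-output case is shown with a 1-D Y, which is the form most scikit-learn estimators accept when there is only one output.

```python
import numpy as np

# Hypothetical toy data: 6 samples (rows), 2 features (columns).
X = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [2.0, 2.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.0]])

# One output per sample, stored as a 1-D array of length n_samples.
Y = np.array([0, 0, 1, 1, 1, 0])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 6 2
print(Y.shape)                # (6,)
```

The one thing to check before fitting is that X and Y have the same number of rows.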

Model Training/Fitting and Predictions

With the ML models implemented in scikit-learn, you would usually write:

mdl = ModelNameHere(params_here)
mdl.fit(X, Y)
preds = mdl.predict(new_data)
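To make the pattern concrete, here is a minimal sketch with LinearRegression standing in for ModelNameHere. The model choice and the toy data (points on the line y = 2x + 1) are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (4, 1)
Y = np.array([1.0, 3.0, 5.0, 7.0])          # shape (4,)

mdl = LinearRegression()   # construct the model with its parameters
mdl.fit(X, Y)              # learn from the training data

new_data = np.array([[4.0], [5.0]])
preds = mdl.predict(new_data)
print(preds)  # close to [9, 11], since the data are exactly linear
```

Swapping in a different estimator generally changes only the import and the constructor line; the fit and predict calls stay the same.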


Cross-Validation

Evaluating on data the model has not seen is always a good idea. The simplest version is a single hold-out split: divide your data into training and testing sets, train on the training set, and test on the testing set.

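from sklearn.model_selection import train_test_split
# Hold out 30% of the samples for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
mdl.fit(X_train, Y_train)
preds = mdl.predict(X_test)  # predict on the held-out data; do not refit on it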
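A single split gives one estimate that depends on which samples happened to land in the test set. A k-fold version repeats the split k times and averages the scores. The sketch below uses scikit-learn's cross_val_score; the iris dataset, the LogisticRegression model, and cv=5 are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Bundled example dataset: 150 samples, 4 features, 3 classes.
X, Y = load_iris(return_X_y=True)

mdl = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: fit on 4 folds, score on the remaining
# fold, and rotate so every fold is used for testing exactly once.
scores = cross_val_score(mdl, X, Y, cv=5)
print(scores)         # one score per fold
print(scores.mean())  # average score across folds
```

For classifiers, cross_val_score uses the estimator's default score (accuracy) unless a scoring argument is given.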


Finally, evaluate your model based on testing set performance.

your_metric_here(Y_test, preds)  # scikit-learn metrics take (y_true, y_pred)
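Putting the steps together, here is an end-to-end sketch for a classification problem. The dataset (iris), the model (KNeighborsClassifier), the metric (accuracy_score), and the fixed random_state are all assumptions for illustration; substitute whatever fits your problem.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)

# Hold out 30% of the data; random_state makes the split repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

mdl = KNeighborsClassifier(n_neighbors=5)
mdl.fit(X_train, Y_train)      # train on the training set only
preds = mdl.predict(X_test)    # predict on the held-out test set

# Metric functions take the true labels first, then the predictions.
acc = accuracy_score(Y_test, preds)
print(acc)
```

For a regression problem, the same structure would apply with a regressor and a metric such as mean_squared_error.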

Other Resources

scikit-learn's pages

famous people/heroes

Feedback is much appreciated!

http://goo.gl/forms/n6x3gaaA57

Please stick around until after the draw for a $5 Starbucks card is announced!


In [2]:
from random import randint

In [3]:
n_entries = 25
randint(1, n_entries)  # randint includes both endpoints, so this draws from 1..25


Out[3]:
5
