Your Data
The scikit-learn API is a well-designed, consistent API. The inputs to the different models are, almost invariably:
X matrix, which is n_samples (rows) by n_features (columns)
Y matrix, which is n_samples (rows) by m_outputs (columns)
Model Training/Fitting and Predictions
With the ML models implemented in scikit-learn, the usual pattern is:
mdl = ModelNameHere(params_here)
mdl.fit(X, Y)
preds = mdl.predict(new_data)
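As a concrete instance of the pattern above, here is a minimal sketch. It assumes `LinearRegression` as the model and synthetic NumPy data; the array sizes, the `coefs` matrix, and the random seed are made up for illustration and are not part of the original material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# X: n_samples (rows) by n_features (columns)
X = rng.normal(size=(100, 3))

# Y: n_samples (rows) by m_outputs (columns). Here the two outputs are
# exact linear functions of the features (illustrative data only).
coefs = np.array([[1.0, -2.0],
                  [0.5, 0.0],
                  [0.0, 3.0]])
Y = X @ coefs

mdl = LinearRegression()
mdl.fit(X, Y)

# New data must have the same number of columns (features) as X.
new_data = rng.normal(size=(5, 3))
preds = mdl.predict(new_data)
print(preds.shape)  # (5, 2): one row per new sample, one column per output
```

Swapping in a different estimator should only change the import and the constructor line; the `fit` and `predict` calls stay the same.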
Cross-Validation
Cross-validation is almost always a good idea. In its simplest form, split your data into training and testing sets, train on the training set, and test on the testing set.
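from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
mdl.fit(X_train, Y_train)
preds = mdl.predict(X_test)  # predict on the held-out set; do not refit on it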
Finally, evaluate your model based on testing set performance.
your_metric_here(Y_test, preds)  # scikit-learn metrics expect (y_true, y_pred)
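The hold-out workflow above, end to end, might look like the sketch below. The iris dataset, `LogisticRegression`, `accuracy_score`, and the `random_state` and `max_iter` values are assumptions chosen for illustration. A single train/test split is a hold-out evaluation; the last lines show k-fold cross-validation via `cross_val_score`, which repeats the split several times and returns one score per fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Small built-in dataset: 150 samples, 4 features, 3 classes.
X, Y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

mdl = LogisticRegression(max_iter=1000)
mdl.fit(X_train, Y_train)
preds = mdl.predict(X_test)

# Metrics take the true values first, then the predictions.
holdout_acc = accuracy_score(Y_test, preds)

# 5-fold cross-validation: fits a fresh copy of the model on each fold.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, Y, cv=5)

print(holdout_acc, cv_scores.mean())
```

Averaging over folds gives a less split-dependent estimate than a single hold-out score, at the cost of fitting the model several times.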
scikit-learn's pages
scikit-learn flowchart
scikit-learn's own resources page
famous people/heroes
http://goo.gl/forms/n6x3gaaA57
Please stick around until after the draw for a $5 Starbucks card is announced!
In [2]:
from random import randint
In [3]:
n_entries = 25
randint(1, n_entries)  # randint includes both endpoints, so this draws from 1..n_entries