An Introduction to scikit-learn: Machine Learning in Python

Goals of this Tutorial

  • Introduce the basics of Machine Learning, and some skills useful in practice.
  • Introduce the syntax of scikit-learn, so that you can make use of the rich toolset available.

Schedule:

Preliminaries: Setup & introduction (15 minutes)

  • Making sure your computer is set up

Basic Principles of Machine Learning and the scikit-learn Interface (45 minutes)

  • What is Machine Learning?
  • Machine learning data layout
  • Supervised Learning
    • Classification
    • Regression
    • Measuring performance
  • Unsupervised Learning
    • Clustering
    • Dimensionality Reduction
    • Density Estimation
  • Evaluation of Learning Models
  • Choosing the right algorithm for your dataset

Supervised learning in-depth (15 minutes)

  • Decision Trees and Random Forests

Unsupervised learning in-depth (15 minutes)

  • Principal Component Analysis
  • K-means Clustering

Model Validation (15 minutes)

  • Validation and Cross-validation
  • GridSearchCV

Other topics

  • Pipeline

Preliminaries

This tutorial requires the following packages:

  • numpy
  • scipy
  • matplotlib
  • scikit-learn
  • IPython, with the notebook interface

The easiest way to get these is with the conda package and environment manager. I suggest downloading and installing Miniconda.

The following command will install all required packages:

$ conda install numpy scipy matplotlib scikit-learn ipython-notebook

Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.

Checking your installation

You can run the following code to check the versions of the packages on your system:

(In the IPython notebook, press Shift and Return together to execute the contents of a cell.)


In [ ]:
# Makes print() a function on Python 2; has no effect on Python 3.
from __future__ import print_function

import IPython
print('IPython:', IPython.__version__)

import numpy
print('numpy:', numpy.__version__)

import scipy
print('scipy:', scipy.__version__)

import matplotlib
print('matplotlib:', matplotlib.__version__)

import sklearn
print('scikit-learn:', sklearn.__version__)
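
Version numbers alone do not show that the packages work together. As an optional extra check, the minimal sketch below fits a small model on a dataset that ships with scikit-learn. The choice of the iris dataset and of DecisionTreeClassifier is an assumption made for illustration; any bundled dataset and estimator would serve the same purpose.

```python
# Smoke test: load a bundled dataset, fit a model, and make predictions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target  # feature matrix and label vector

# random_state is fixed only so that repeated runs behave the same way.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

predictions = model.predict(X)
train_accuracy = model.score(X, y)  # accuracy on the data the model was fit on

print('data shape:', X.shape)
print('predictions shape:', predictions.shape)
print('training accuracy:', train_accuracy)
```

If this cell runs without an ImportError or other exception, numpy, scipy, and scikit-learn are installed and able to work together. The training accuracy printed here says nothing about how well the model generalizes; it only confirms that fitting and predicting run.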

Useful Resources

