This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com). Source and license info is on [GitHub](https://github.com/jakevdp/sklearn_tutorial/).

An Introduction to scikit-learn: Machine Learning in Python

Goals of this Tutorial

  • Introduce the basics of machine learning and some skills that are useful in practice.
  • Introduce the syntax of scikit-learn, so that you can make use of the rich toolset available.

Schedule:

Preliminaries: Setup & introduction (15 min)

  • Making sure your computer is set up

Basic Principles of Machine Learning and the scikit-learn Interface (45 min)

  • What is Machine Learning?
  • Machine learning data layout
  • Supervised Learning
    • Classification
    • Regression
    • Measuring performance
  • Unsupervised Learning
    • Clustering
    • Dimensionality Reduction
    • Density Estimation
  • Evaluation of Learning Models
  • Choosing the right algorithm for your dataset

Supervised learning in-depth (1 hr)

  • Support Vector Machines
  • Decision Trees and Random Forests

Unsupervised learning in-depth (1 hr)

  • Principal Component Analysis
  • K-means Clustering
  • Gaussian Mixture Models

Model Validation (1 hr)

  • Validation and Cross-validation

Preliminaries

This tutorial requires the following packages: numpy, scipy, matplotlib, scikit-learn, and IPython with notebook support.

The easiest way to get these is to use the conda environment manager. I suggest downloading and installing Miniconda.

The following command will install all required packages:

$ conda install numpy scipy matplotlib scikit-learn ipython-notebook

Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.

Checking your installation

You can run the following code to check the versions of the packages on your system:

(In the IPython notebook, press Shift and Return together to execute the contents of a cell.)


In [1]:
# Make print() behave the same way under Python 2 and Python 3
from __future__ import print_function

import IPython
print('IPython:', IPython.__version__)

import numpy
print('numpy:', numpy.__version__)

import scipy
print('scipy:', scipy.__version__)

import matplotlib
print('matplotlib:', matplotlib.__version__)

import sklearn
print('scikit-learn:', sklearn.__version__)


IPython: 7.14.0
numpy: 1.18.4
scipy: 1.4.1
matplotlib: 3.2.1
scikit-learn: 0.22.2.post1
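
If one of the imports in the cell above fails, the cell stops at the first error and you do not learn about the remaining packages. A minimal sketch of a more forgiving check is shown here. The helper name `check_packages` and the `REQUIRED` list are hypothetical, introduced only for illustration; the module names are the ones imported in the cell above.

```python
import importlib

# Import names taken from the version check above (illustrative list).
REQUIRED = ["IPython", "numpy", "scipy", "matplotlib", "sklearn"]


def check_packages(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Record the failure instead of stopping at the first missing package.
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(name + ":", version if version is not None else "NOT INSTALLED")
```

Any package reported as not installed can then be added with the conda command given earlier.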

Useful Resources