This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for PyData Seattle 2015. Source and license info is on [GitHub](https://github.com/jakevdp/sklearn_pydata2015/).

An Introduction to scikit-learn: Machine Learning in Python

Goals of this Tutorial

  • Introduce the basics of Machine Learning, and some skills useful in practice.
  • Introduce the syntax of scikit-learn, so that you can make use of the rich toolset available.

Schedule:

10:00 - 10:15 Preliminaries: Setup & introduction

  • Making sure your computer is set up

10:15 - 11:00 Basic Principles of Machine Learning and the scikit-learn Interface

  • What is Machine Learning?
  • Machine learning data layout
  • Supervised Learning
    • Classification
    • Regression
    • Measuring performance
  • Unsupervised Learning
    • Clustering
    • Dimensionality Reduction
    • Density Estimation
  • Evaluation of Learning Models
  • Choosing the right algorithm for your dataset

11:00 - 12:00 Supervised learning in-depth

  • Support Vector Machines
  • Decision Trees and Random Forests

The tutorial repository contains additional material that we will not cover here. I hope you will find it useful to read through on your own if you want to go deeper!

Preliminaries

This tutorial requires the following packages: numpy, scipy, matplotlib, scikit-learn, IPython (with notebook support), and seaborn.

The easiest way to get these is with the conda package and environment manager. I suggest downloading and installing Miniconda.

The following command will install all required packages:

$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn

Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.

Checking your installation

You can run the following code to check the versions of the packages on your system:

(In the IPython notebook, press Shift and Return together to execute the contents of a cell.)


In [1]:
from __future__ import print_function

import IPython
print('IPython:', IPython.__version__)

import numpy
print('numpy:', numpy.__version__)

import scipy
print('scipy:', scipy.__version__)

import matplotlib
print('matplotlib:', matplotlib.__version__)

import sklearn
print('scikit-learn:', sklearn.__version__)

import seaborn
print('seaborn:', seaborn.__version__)


IPython: 2.4.1
numpy: 1.9.2
scipy: 0.15.1
matplotlib: 1.4.3
scikit-learn: 0.15.2
seaborn: 0.5.1
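
If one of the imports in the version check above fails, the cell stops at the first error and you don't learn whether anything else is missing. As a rough sketch (not part of the original tutorial), the following tries each import separately and reports everything that could not be loaded in one pass. The helper name `find_missing` and the `REQUIRED` list are made up for illustration; the module names are simply the ones imported in the cell above.

```python
import importlib

# Import names used by the version check above (an assumption about
# what "required" means here; adjust the list to your own needs).
REQUIRED = ['IPython', 'numpy', 'scipy', 'matplotlib', 'sklearn', 'seaborn']


def find_missing(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError is the usual case; a broken install may raise
            # something else, and is treated as missing too.
            missing.append(name)
    return missing


missing = find_missing(REQUIRED)
if missing:
    print('Missing packages: {0}'.format(', '.join(missing)))
else:
    print('All required packages are importable.')
```

Note that the import name is not always the package name: scikit-learn is imported as `sklearn`, so a missing `sklearn` means you need to install `scikit-learn`.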

Useful Resources