Setting up Python for machine learning: scikit-learn and IPython Notebook

From the video series: Introduction to machine learning with scikit-learn


  • What are the benefits and drawbacks of scikit-learn?
  • How do I install scikit-learn?
  • How do I use the IPython Notebook?
  • What are some good resources for learning Python?

Benefits and drawbacks of scikit-learn


Benefits:

  • Consistent interface to machine learning models
  • Provides many tuning parameters but with sensible defaults
  • Exceptional documentation
  • Rich set of functionality for companion tasks
  • Active community for development and support
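The first two benefits can be illustrated with a minimal sketch, assuming scikit-learn is already installed. The choice of the bundled iris dataset and of these two particular models is an assumption made for illustration, not something taken from the video. Both models are trained and queried through the same fit and predict calls, and each is built with default settings apart from one optional tuning parameter.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Small example dataset bundled with scikit-learn (illustrative choice)
X, y = load_iris(return_X_y=True)

# Two unrelated model types, created mostly with default parameters;
# max_iter is one optional tuning parameter, raised here so the solver converges
models = [KNeighborsClassifier(), LogisticRegression(max_iter=1000)]

predictions = {}
for model in models:
    # Every estimator is trained with fit() and queried with predict()
    model.fit(X, y)
    predictions[type(model).__name__] = model.predict(X[:3])
    print(type(model).__name__, predictions[type(model).__name__])
```

Swapping in a different classifier should only require changing the entries in the models list; the loop itself stays the same.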

Potential drawbacks:

  • Harder (than R) to get started with machine learning
  • Less emphasis (than R) on model interpretability

Installing scikit-learn

Option 1: Install scikit-learn library and dependencies (NumPy and SciPy)

Option 2: Install Anaconda distribution of Python, which includes:

  • Hundreds of useful packages (including scikit-learn)
  • IPython and IPython Notebook
  • conda package manager
  • Spyder IDE
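Whichever option is used, it can help to confirm that the installation worked. The following is a rough sketch using only the standard library; the helper name installed_packages is hypothetical, and the list of packages checked simply mirrors the dependencies named above (scikit-learn is imported under the name sklearn).

```python
import importlib.util


def installed_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# scikit-learn is imported as "sklearn"; NumPy and SciPy are its dependencies
status = installed_packages(["numpy", "scipy", "sklearn"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing usually means the install was made into a different Python environment than the one running this code.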

Using the IPython Notebook


  • IPython interpreter: enhanced version of the standard Python interpreter
  • Browser-based notebook interface: weave together code, formatted text, and plots


Launching the Notebook:

  • Type ipython notebook at the command line to open the dashboard (newer installations use jupyter notebook instead)
  • Don't close the command line window while the Notebook is running

Keyboard shortcuts:

Command mode (gray border)

  • Create new cells above (a) or below (b) the current cell
  • Navigate using the up arrow and down arrow
  • Convert the cell type to Markdown (m) or code (y)
  • See keyboard shortcuts using h
  • Switch to Edit mode using Enter

Edit mode (green border)

  • Ctrl+Enter to run a cell
  • Switch to Command mode using Esc

Resources for learning Python

Comments or Questions?

In [1]:
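from IPython.display import HTML

def css_styling():
    """Load the notebook's custom stylesheet and return it as renderable HTML."""
    # Use a context manager so the file handle is closed after reading
    with open("styles/custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)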