Machine Learning for Health Metricians

Week 4: Evaluating what has been learned


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_context('poster')
sns.set_style('darkgrid')

Any Questions?

Week 4

  • Class 1: review homework, discuss cross-validation, begin exercise
  • Class 2: continue exercise, discuss metrics of success

  • Before this week:
    • Read 2 chapters (DM 5, ISL 5.1 and 5.2)
    • Complete Prediction computational exercise
  • During this week’s classes:
    • Guided tour and hands-on experience of cross-validation with VA data.
  • Outside of classes:
    • Read sections 6.1, 8.1
    • Complete “Cross-validation” computational exercise

Lecture 4a Outline:

  • Homework Solutions
  • Methods of cross-validation
  • Exercise 4: Cross-validation

Any (more) questions?

Anticipated questions: something about the homework, especially predicting categorical variables with a linear regression; train/test vs. train/test/validation splits; retraining on the full data after validation.

Homework Solutions

  • Naïve Bayes
  • Linear Models
  • Decision Trees
  • (Instance-based Models)

Using a student's solutions was a mess last week. This time I will work through my solutions and leave plenty of time to try things out if students did things differently in an interesting way (and are willing to admit to it).

Thought Experiment

  • Which do you expect to work best for predicting cell phone ownership?

Now see if you agree with your neighbors, and, if not, try to convince each other.


In [2]:
import IPython.display

In [3]:
IPython.display.YouTubeVideo('IeVdnxiIwXk')



Follow up on last week's aside on Open Source

Cross-validation

$K$-Fold
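
A minimal sketch of $K$-fold cross-validation, under some loud assumptions: the data are synthetic (not the VA data), the model is ordinary least squares as a stand-in for whatever method is being evaluated, and the helper names `k_fold_indices` and `k_fold_mse` are made up for illustration. The idea is to shuffle the rows, cut them into $K$ folds, and let each fold take one turn as the test set.

```python
import numpy as np

def k_fold_indices(n, k, rng):
    """Shuffle row indices 0..n-1 and cut them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def k_fold_mse(X, y, k=5, seed=0):
    """Estimate out-of-sample MSE of least-squares regression by k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(y), k, rng)
    fold_mse = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - X[test_idx] @ beta
        fold_mse.append(np.mean(resid ** 2))
    return float(np.mean(fold_mse))

# Synthetic data, invented for illustration: y = 1 + 2x + noise (sd 0.5)
rng = np.random.default_rng(12345)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(scale=0.5, size=200)

cv_mse = k_fold_mse(X, y, k=10)
print(cv_mse)  # should land near the noise variance, 0.25
```

Every row is used for testing exactly once, so the average over folds uses all of the data without ever scoring a model on rows it was trained on.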

LOOCV
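
Leave-one-out cross-validation is $K$-fold with $K = n$: each row is held out once and predicted by a model fit to the other $n - 1$ rows. The sketch below assumes synthetic data and a least-squares model (both chosen only for illustration; the function names are hypothetical). For least squares specifically, the same number can be obtained from a single fit using the leverages $h_i$, which the second function checks.

```python
import numpy as np

def loocv_mse(X, y):
    """Brute-force LOOCV: refit n times, predicting the one held-out row each time."""
    n = len(y)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sq_err[i] = (y[i] - X[i] @ beta) ** 2
    return float(sq_err.mean())

def loocv_mse_shortcut(X, y):
    """One-fit shortcut, valid for least squares only: mean of (resid_i / (1 - h_i))^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    Q, _ = np.linalg.qr(X)          # reduced QR; leverages are row sums of Q**2
    h = np.sum(Q ** 2, axis=1)
    return float(np.mean((resid / (1 - h)) ** 2))

# Synthetic data, invented for illustration
rng = np.random.default_rng(2024)
x = rng.normal(size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(scale=0.5, size=50)

brute = loocv_mse(X, y)
fast = loocv_mse_shortcut(X, y)
print(brute, fast)  # the two agree up to floating-point error
```

For models without such a shortcut, LOOCV costs $n$ refits, which is the usual practical argument for a smaller $K$.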

Bootstrap, 0.632
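
The name comes from the fact that a bootstrap sample of size $n$, drawn with replacement, contains about $1 - 1/e \approx 0.632$ of the distinct rows; the rest are "out of bag" and can serve as test data. The sketch below implements one common form of the estimate, $0.368 \times$ (error on the bootstrap training sample) $+\ 0.632 \times$ (error on the out-of-bag rows), averaged over many resamples. It assumes synthetic data, squared error, and a least-squares model, all chosen for illustration; other variants of the estimator exist.

```python
import numpy as np

def bootstrap_632_mse(X, y, n_boot=200, seed=0):
    """Average of 0.368 * training error + 0.632 * out-of-bag error over resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)          # draw n rows with replacement
        oob = np.setdiff1d(np.arange(n), boot)     # rows that were never drawn
        if oob.size == 0:                          # vanishingly rare; skip if it happens
            continue
        beta, *_ = np.linalg.lstsq(X[boot], y[boot], rcond=None)
        err_train = np.mean((y[boot] - X[boot] @ beta) ** 2)
        err_oob = np.mean((y[oob] - X[oob] @ beta) ** 2)
        estimates.append(0.368 * err_train + 0.632 * err_oob)
    return float(np.mean(estimates))

# Synthetic data, invented for illustration: noise variance is 0.25
rng = np.random.default_rng(7)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(scale=0.5, size=200)

est = bootstrap_632_mse(X, y)
frac_in_sample = 1 - (1 - 1 / 200) ** 200   # chance a given row appears in a resample
print(est, frac_in_sample)
```

The out-of-bag error alone tends to be pessimistic because each model sees only about 63% of the distinct rows; blending in the optimistic training error is meant to offset that.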

Challenge Question:

  • To find the best method for predicting mobile-phone ownership, do you need to use a train/test/validation split, or is a train/test split sufficient?

See if you agree with your neighbors, and, if not, try to convince each other.
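
For reference during the discussion, here is a sketch of the mechanics of a three-way split. The 60/20/20 proportions and the function name `three_way_split` are assumptions for illustration, not a recommendation from the readings. The code only produces disjoint sets of row indices; how each set should be used is the question above.

```python
import numpy as np

def three_way_split(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Return disjoint train, validation, and test index arrays covering 0..n-1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]     # whatever remains
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```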

Data: PHMRC VA Validation Study


In [16]:
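from importlib import reload  # reload() is not a builtin in Python 3
import ipynb_style
reload(ipynb_style)  # pick up any edits made to the module since it was first imported
ipynb_style.presentation()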

