Artificial Intelligence for Health Metricians

Lecture 1 Outline:

  • Administrivia
  • What is artificial intelligence, what is machine learning?
  • Exercise 1: Predicting Weather

Administrivia

Class expectations

  • Read
  • Participate in class
  • Do a project

Read

  • Witten, Frank, and Hall, Data Mining: Practical Machine Learning Tools and Techniques.
  • James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning with Applications in R.
  • McKinney, Python for Data Analysis.

Participate in Class

Think of this class like a journal club. I am here to facilitate, but I want you to read (see the previous expectation) and discuss what you are reading with each other. In the age of the MOOC, it is the classroom community that justifies old-fashioned courses.

Do a project

This is the part that I am really excited about...

Where should we meet?

Meeting on campus has benefits; meeting at IHME has benefits (for me). What about for others?

  • Mondays and Wednesdays, 10:30-11:50 AM, in the Health Sciences Library Teaching Lab (LTL First Room)
  • Except Feb 4th, 9th, and 23rd, when we will meet in the eScience Institute WRF Data Science Studio in the Physics/Astronomy Tower (PAT), 6th floor, room 610C

What is artificial intelligence?

What is machine learning?

One/Two/Four exercise, or writing and sharing, depending on class size and dynamics

A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.

---Tom Mitchell
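
To make the three symbols concrete, here is a minimal sketch, not part of Mitchell's text. It assumes the task $T$ is predicting the `play` column of the weather data used later in this lecture, the performance measure $P$ is the fraction of correct predictions, and the experience $E$ is a set of labeled rows. The names `EXAMPLES`, `learn`, and `performance` are hypothetical, only the five rows displayed later in the lecture are used, and performance is measured on those same rows purely for illustration.

```python
from collections import Counter

# Assumption: the five rows shown later in this lecture stand in for the data.
EXAMPLES = [
    {'outlook': 'sunny', 'play': 'no'},
    {'outlook': 'sunny', 'play': 'no'},
    {'outlook': 'overcast', 'play': 'yes'},
    {'outlook': 'rainy', 'play': 'yes'},
    {'outlook': 'rainy', 'play': 'yes'},
]

def learn(experience):
    """Use experience E to build a predictor: always answer the most common label seen."""
    majority = Counter(row['play'] for row in experience).most_common(1)[0][0]
    return lambda row: majority

def performance(predictor, rows):
    """Performance measure P: fraction of rows whose 'play' value is predicted correctly."""
    return sum(predictor(row) == row['play'] for row in rows) / len(rows)

# Measure P on task T after one example and after all five examples.
for n in (1, 5):
    predictor = learn(EXAMPLES[:n])
    print(n, 'examples seen, fraction correct:', performance(predictor, EXAMPLES))
```

With one example the sketch gets 2 of 5 rows right, and with all five it gets 3 of 5, so in Mitchell's sense this very simple program has learned something.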

What is the difference between machine learning and statistics?

The web suggests several criteria: searching hypotheses vs testing hypotheses; prediction vs inference; good marketing vs bad marketing; publishing in conferences vs publishing in journals; or simply sitting in a CS department vs sitting in a Stats department.

I think there is a more fundamental difference, which may betray my CMU math department upbringing: a foundation of mathematical logic vs a foundation of real analysis.


In [2]:
import IPython.display

Can a machine do what we can do?


In [3]:
IPython.display.Image("http://upload.wikimedia.org/wikipedia/en/c/c8/Alan_Turing_photo.jpg")



In [4]:
IPython.display.YouTubeVideo("W7Rq-PEW5qM")



But they [computers] are useless. They can only give you answers.

---Picasso


In [4]:
IPython.display.Image('http://upload.wikimedia.org/wikipedia/en/1/1c/Stravinsky_picasso.png')



What will we do in this class?

Exercise 1: Predicting Something

  • Contact Lenses
  • Weather
  • Iris
  • CPU Performance
  • Labor Negotiations
  • Soybean
  • Wage
  • Smarket
  • NCI60
  • Advertising
  • Income

Exercise 1: Predicting Weather

https://cloud.sagemath.com


In [12]:
import pandas as pd
df = pd.read_csv('https://github.com/aflaxman/AI4HM/raw/master/data/weather-numeric.csv')

In [13]:
df.head()


Out[13]:
outlook temperature humidity windy play
0 sunny 85 85 False no
1 sunny 80 90 True no
2 overcast 83 86 False yes
3 rainy 70 96 False yes
4 rainy 68 80 False yes

Abie's dumb predictor


In [14]:
def predict(s):
    if s['outlook'] == 'sunny':
        return 'no'
    else:
        return 'yes'

In [15]:
predict(df.loc[1]) # loc[1] selects the row whose index label is 1


Out[15]:
'no'

How good is this dumb predictor?


In [16]:
i = 0
predict(df.loc[i]) == df.play[i]


Out[16]:
True

In [17]:
for i in df.index:
    # count how many predictions are correct
    pass
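
One possible way to fill in the loop is sketched below. To keep it runnable without downloading the file, it assumes plain dictionaries standing in for DataFrame rows and uses only the five rows shown in the `df.head()` output above, so the count it prints describes those five rows and not the full dataset. `ROWS`, `correct`, and `accuracy` are hypothetical names.

```python
# Assumption: only the five rows displayed by df.head() above.
ROWS = [
    {'outlook': 'sunny', 'temperature': 85, 'humidity': 85, 'windy': False, 'play': 'no'},
    {'outlook': 'sunny', 'temperature': 80, 'humidity': 90, 'windy': True, 'play': 'no'},
    {'outlook': 'overcast', 'temperature': 83, 'humidity': 86, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 70, 'humidity': 96, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 68, 'humidity': 80, 'windy': False, 'play': 'yes'},
]

def predict(s):
    """The dumb predictor from above: 'no' on sunny days, 'yes' otherwise."""
    if s['outlook'] == 'sunny':
        return 'no'
    else:
        return 'yes'

# Count how many predictions match the recorded 'play' value.
correct = 0
for row in ROWS:
    if predict(row) == row['play']:
        correct += 1

accuracy = correct / len(ROWS)
print(correct, 'of', len(ROWS), 'correct')
```

With the DataFrame from the cells above, the same comparison would be `predict(df.loc[i]) == df.play[i]` for each `i` in `df.index`.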

How much better can you do with a single rule?

Alternatively, how can you learn enough Python to do this?

http://software-carpentry.org/v5/novice/python/index.html

Homework:

  • Find the best "length-two decision list" for the weather data
  • Start thinking about a machine learning project (possibly related to your IHME research)
  • Read
  • Think about an "elevator pitch"
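
For the first homework item, here is a sketch of what a length-two decision list might look like in code. It assumes the term means two if-then rules tried in order, followed by a default answer; the lecture does not define it here, so treat that reading as an assumption. The particular rules in `EXAMPLE_LIST` are hypothetical and are not claimed to be the best ones. Only the five rows displayed earlier are used.

```python
# Assumption: only the five rows displayed by df.head() earlier in the lecture.
ROWS = [
    {'outlook': 'sunny', 'temperature': 85, 'humidity': 85, 'windy': False, 'play': 'no'},
    {'outlook': 'sunny', 'temperature': 80, 'humidity': 90, 'windy': True, 'play': 'no'},
    {'outlook': 'overcast', 'temperature': 83, 'humidity': 86, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 70, 'humidity': 96, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 68, 'humidity': 80, 'windy': False, 'play': 'yes'},
]

# A hypothetical decision list: (attribute, value, label) rules tried in order.
EXAMPLE_LIST = [
    ('outlook', 'sunny', 'no'),   # rule 1: if outlook is sunny, predict 'no'
    ('windy', True, 'no'),        # rule 2: otherwise, if windy, predict 'no'
]
DEFAULT_LABEL = 'yes'             # used when no rule fires

def predict_with_list(row, rules=EXAMPLE_LIST, default=DEFAULT_LABEL):
    """Return the label of the first rule whose condition matches the row."""
    for attribute, value, label in rules:
        if row[attribute] == value:
            return label
    return default

def fraction_correct(rows):
    """Fraction of rows whose 'play' value the decision list predicts correctly."""
    return sum(predict_with_list(row) == row['play'] for row in rows) / len(rows)

print(fraction_correct(ROWS))
```

Finding the best list then amounts to searching over choices for the two rules and the default, and scoring each candidate on the full dataset.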

Experience indicates that some students may feel that they do not yet know enough about the scope of AI/ML to develop a project, let alone an elevator pitch for it. Here is an example of a project that I hope someone does:

This all started with our work on smoking prevalence, the details of which do not fit into an elevator ride. The key point is that we want to know how much of the population exhibits this important risk factor. So we ask a representative sample, via telephone survey, and we start with a screening question: "Have you smoked at least 100 cigarettes in your life?" Here is the problem: there are at least three common interpretations of this question; 50% think it means A, 25% B, and 25% C. This is important to know, but finding it out required hard work using qualitative methods: cognitive interviewing, think-aloud exercises, etc. Wouldn't it be cool if, when you were developing a survey, you could just ask a computer for a list of possible interpretations of your candidate question? Project: make a computer do this, so that survey designers don't have to do all the hard work of cognitive interviewing. Or at least so that they are pretty sure things are going to work when they do the testing...

