Artificial Intelligence for Health Metricians

Lecture 1 Outline:

Administrivia
What is artificial intelligence, what is machine learning?
Exercise 1: Predicting XXX

Administrivia

Class expectations

Read
Participate in class
Do a project

Read

• Witten, Frank, and Hall, Data Mining: Practical Machine Learning Tools and Techniques.
• James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning with Applications in R.
• McKinney, Python for Data Analysis.

Participate in Class

Think of this class as a journal club. I am here and I will facilitate, but I want you to read (see the previous expectation) and discuss what you are reading with each other. In the age of the MOOC, it is the classroom community that justifies old-fashioned courses.

Do a project

This is the part that I am really excited about...

Where should we meet?

Meeting on campus has benefits; meeting at IHME has benefits (for me). What do others prefer?

Monday and Wednesday, 10:30-11:50 AM, in the Health Sciences Library Teaching Lab (LTL First Room), except for Feb 4th, 9th, and 23rd, when we will meet in the eScience Institute WRF Data Science Studio in the Physics/Astronomy Tower (PAT), 6th floor, room 610C.

What is artificial intelligence?

What is machine learning?

One/Two/Four exercise, or writing and sharing, depending on class size and dynamics

A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.

---Tom Mitchell

What is the difference between machine learning and statistics?

The web suggests several criteria: searching hypotheses vs testing hypotheses; prediction vs inference; good marketing vs bad marketing; publishing in conferences vs publishing in journals; or simply sitting in a CS department vs sitting in a Stats department.

I think there is a more fundamental difference, which may betray my CMU math department upbringing: a foundation of mathematical logic vs a foundation of real analysis.



In [2]:

    
import IPython.display

Can a machine do what we can do?



In [3]:

    
IPython.display.Image("http://upload.wikimedia.org/wikipedia/en/c/c8/Alan_Turing_photo.jpg")










In [4]:

    
IPython.display.YouTubeVideo("W7Rq-PEW5qM")










But they [computers] are useless. They can only give you answers.

---Picasso



In [4]:

    
IPython.display.Image('http://upload.wikimedia.org/wikipedia/en/1/1c/Stravinsky_picasso.png')










What will we do in this class?

https://github.com/aflaxman/ai4hm

Exercise 1: Predicting Something

Contact Lenses
Weather
Iris
CPU Performance
Labor Negotiations
Soybean
Wage
Smarket
NCI60
Advertising
Income

Exercise 1: Predicting Weather

https://cloud.sagemath.com



In [12]:

    
import pandas as pd
df = pd.read_csv('https://github.com/aflaxman/AI4HM/raw/master/data/weather-numeric.csv')



In [13]:

    
df.head()









    Out[13]:

    outlook  temperature  humidity  windy play
0     sunny           85        85  False   no
1     sunny           80        90   True   no
2  overcast           83        86  False  yes
3     rainy           70        96  False  yes
4     rainy           68        80  False  yes

Abie's dumb predictor



In [14]:

    
def predict(s):
    if s['outlook'] == 'sunny':
        return 'no'
    else:
        return 'yes'



In [15]:

    
predict(df.loc[1])  # df.loc[1] selects the row whose index label is 1









    Out[15]:





'no'

How good is this dumb predictor?



In [16]:

    
i = 0
predict(df.loc[i]) == df.play[i]









    Out[16]:





True



In [17]:

    
for i in df.index:
    # count how many predictions are correct
    pass
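One possible way to fill in the loop above is sketched here. To stay self-contained it rebuilds only the five rows shown by df.head() rather than downloading the file, so the count it reports is for those five rows, not for the full weather-numeric.csv. The names n_correct and accuracy are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Assumption: only the five rows displayed by df.head() above.
# The full CSV has more rows, so its accuracy may well differ.
df = pd.DataFrame({
    'outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy'],
    'temperature': [85, 80, 83, 70, 68],
    'humidity': [85, 90, 86, 96, 80],
    'windy': [False, True, False, False, False],
    'play': ['no', 'no', 'yes', 'yes', 'yes'],
})


def predict(s):
    # Abie's dumb predictor, copied from the cell above
    if s['outlook'] == 'sunny':
        return 'no'
    else:
        return 'yes'


n_correct = 0
for i in df.index:
    # count how many predictions match the recorded value of 'play'
    if predict(df.loc[i]) == df.loc[i, 'play']:
        n_correct += 1

# float() keeps the division exact under both Python 2 and Python 3
accuracy = float(n_correct) / len(df)
print(n_correct, 'of', len(df), 'correct; accuracy =', accuracy)
```

On these five rows every prediction happens to be correct, which says little about the rule: the rows past the head of the file are where it can go wrong.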

How much better can you do with a single rule?

Alternatively, how can you learn enough Python to do this?

http://software-carpentry.org/v5/novice/python/index.html

Homework:

Find the best "length-two decision list" for the weather data
Start thinking about a machine learning project (possibly related to your IHME research)
Read
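For the first homework item, it may help to see what a decision list could look like in code. The sketch below assumes that "length two" means two ordered rules followed by a default answer; that reading, the helper name make_decision_list, and the particular rules are all illustrative choices, not the intended solution. The rows are the five shown by df.head() above, written as plain dicts. A pandas row works the same way, since both support row['outlook'].

```python
def make_decision_list(rules, default):
    """Build a predictor from ordered (test, label) pairs and a default label."""
    def predict(row):
        for test, label in rules:
            if test(row):
                return label  # the first matching rule wins
        return default  # no rule matched
    return predict


# An arbitrary length-two example, not claimed to be the best one.
example = make_decision_list(
    rules=[
        (lambda row: row['outlook'] == 'overcast', 'yes'),
        (lambda row: row['humidity'] > 85, 'no'),
    ],
    default='yes',
)

# Assumption: only the five rows displayed by df.head() above.
rows = [
    {'outlook': 'sunny', 'temperature': 85, 'humidity': 85, 'windy': False, 'play': 'no'},
    {'outlook': 'sunny', 'temperature': 80, 'humidity': 90, 'windy': True, 'play': 'no'},
    {'outlook': 'overcast', 'temperature': 83, 'humidity': 86, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 70, 'humidity': 96, 'windy': False, 'play': 'yes'},
    {'outlook': 'rainy', 'temperature': 68, 'humidity': 80, 'windy': False, 'play': 'yes'},
]

predictions = [example(row) for row in rows]
n_correct = sum(p == row['play'] for p, row in zip(predictions, rows))
print(predictions, n_correct)
```

Because the rules are data rather than nested if statements, searching for the best list amounts to looping over candidate rule pairs and keeping the one with the highest count of correct predictions.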

Think about an "elevator pitch"

Experience indicates that some students may feel they do not yet know enough about the scope of AI/ML to develop a project, let alone an elevator pitch for it. Here is an example of a project that I hope someone does:

This all started with our work on smoking prevalence, the details of which do not fit into the elevator ride. But the key point is that we want to know how much of the population exhibits this important risk factor. So we ask a representative sample, via telephone survey. And we start the questions off with a screening question: "Have you smoked at least 100 cigarettes in your life?" Here is the problem: there are at least 3 common interpretations of this question: 50% think it means A, 25% B, 25% C. This is important to know, but finding it out required hard work using qualitative methods: cognitive interviewing, think-aloud exercises, etc. Wouldn't it be cool if, when you were developing a survey, you could just ask a computer for a list of possible interpretations of your candidate question? Project: make a computer do this, so that survey designers don't have to do all the hard work of cognitive interviewing. Or at least so that they are pretty sure things are going to work when they do the testing...



