In [25]:
%matplotlib inline
import pandas as pd
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt
import seaborn
import warnings
warnings.filterwarnings('ignore')
The Adult Data Set is commonly used to benchmark machine learning algorithms. The goal is to use demographic features, or variables, to predict whether an individual makes more than \$50,000 per year. The data set is almost 20 years old, so it is not ideal for estimating the probability that I will make more than \$50K today, but it is a nice, simple dataset that can showcase a few benefits of Bayesian logistic regression over its frequentist counterpart.
My motivation for reproducing this piece of work was to learn how to use odds ratios in Bayesian regression.
In [26]:
data = pd.read_csv("data.csv", header=None, skiprows=1,
                   names=['age', 'workclass', 'fnlwgt',
                          'education-categorical', 'educ',
                          'marital-status', 'occupation',
                          'relationship', 'race', 'sex',
                          'capital-gain', 'capital-loss',
                          'hours', 'native-country',
                          'income'])
In [27]:
data
Out[27]:
In [28]:
# Drop rows where the income label is missing
data = data[~pd.isnull(data['income'])]
In [29]:
# Peek at the rows for individuals whose native country is the United States
data[data['native-country']==" United-States"]
Out[29]:
In [30]:
# Binary outcome: 1 if income is greater than $50K, 0 otherwise
income = 1 * (data['income'] == " >50K")
# Squared age term, to allow a non-linear effect of age on income
age2 = np.square(data['age'])
In [31]:
# Keep only the predictors we will model, then attach the constructed columns
data = data[['age', 'educ', 'hours']]
data['age2'] = age2
data['income'] = income
In [32]:
income.value_counts()
Out[32]:
We will use a simple model, which assumes that the probability of making more than \$50K is a function of age, years of education and hours worked per week. We will use PyMC3 to do the inference.
In Bayesian statistics, we treat everything as a random variable, and we want to know the posterior probability distribution of the parameters (in this case, the regression coefficients). The posterior is proportional to the likelihood times the prior, as given by Bayes' theorem: $$p(\theta | D) = \frac{p(D|\theta)p(\theta)}{p(D)}$$
Because the denominator, $p(D) = \int p(D | \theta) p(\theta) \, d\theta$, is a notoriously difficult integral, we would prefer to skip computing it. Fortunately, if we draw samples from the parameter space with probability proportional to the height of the posterior at any given point, we end up with an empirical distribution that converges to the posterior as the number of samples approaches infinity.
What this means in practice is that we only need to worry about the numerator.
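To see why this matters, here is a minimal, hand-rolled random-walk Metropolis sampler for a toy coin-flip problem. It is only a sketch: the data, the flat prior and the step size are made up purely for illustration, and this is not what PyMC3 does internally (PyMC3 uses NUTS). The point is that the sampler only ever evaluates the unnormalized numerator $p(D|\theta)p(\theta)$; the evidence $p(D)$ cancels in the acceptance ratio.
In [ ]:
import numpy as np

np.random.seed(0)
# Synthetic coin-flip data, used only to illustrate the sampler
flips = np.random.binomial(1, 0.7, size=50)

def log_numerator(theta):
    """Unnormalized log-posterior: log-likelihood plus the log of a flat prior on (0, 1)."""
    if not 0 < theta < 1:
        return -np.inf
    return np.sum(flips * np.log(theta) + (1 - flips) * np.log(1 - theta))

# Random-walk Metropolis: the accept/reject step uses only a ratio of numerators,
# so the normalizing constant p(D) cancels and never needs to be computed.
samples, theta = [], 0.5
for _ in range(5000):
    proposal = theta + np.random.normal(scale=0.1)
    if np.log(np.random.uniform()) < log_numerator(proposal) - log_numerator(theta):
        theta = proposal
    samples.append(theta)

print("Posterior mean of theta:", np.mean(samples[1000:]))  # drop burn-in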
Getting back to logistic regression, we need to specify a prior and a likelihood in order to draw samples from the posterior. We could use sociological knowledge about the effects of age and education on income, but instead, let's use the default prior specification for GLM coefficients that PyMC3 gives us, which is $p(\theta)=N(0,10^{12}I)$. This is a very vague prior that will let the data speak for themselves.
The likelihood is the product of $n$ Bernoulli trials, $\prod^{n}_{i=1} p_{i}^{y_{i}} (1 - p_{i})^{1-y_{i}}$, where $p_{i} = \frac{1}{1+e^{-z_{i}}}$,
$z_{i} = \beta_{0} + \beta_{1}(age)_{i} + \beta_2(age)^{2}_{i} + \beta_{3}(educ)_{i} + \beta_{4}(hours)_{i}$ and $y_{i} = 1$ if income is greater than 50K and $y_{i} = 0$ otherwise.
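If you would rather see the prior and likelihood spelled out, here is a rough sketch of the same model written by hand in PyMC3 instead of through the formula interface used below. The variable names (`X`, `beta`, `manual_model`, `y_obs`) are my own, and the prior uses sd = $10^{6}$, i.e. variance $10^{12}$, to match the default mentioned above.
In [ ]:
# Design matrix with an intercept column: [1, age, age^2, educ, hours]
X = np.column_stack([np.ones(len(data)), data['age'], data['age2'],
                     data['educ'], data['hours']])
y = data['income'].values

with pm.Model() as manual_model:
    # Vague prior on every coefficient: N(0, 10^12), i.e. sd = 10^6
    beta = pm.Normal('beta', mu=0, sd=1e6, shape=X.shape[1])
    # Linear predictor z_i with the logistic link: p_i = 1 / (1 + exp(-z_i))
    p = pm.math.sigmoid(pm.math.dot(X, beta))
    # Bernoulli likelihood for the binary income indicator y_i
    pm.Bernoulli('y_obs', p=p, observed=y)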
With the math out of the way, we can get back to the data. Here I use PyMC3 to draw samples from the posterior. The sampling algorithm used is NUTS, a form of Hamiltonian Monte Carlo in which the parameters are tuned automatically. Notice that we get to borrow R's syntax for specifying GLMs, which is very convenient! The last line in this cell tosses out the first 1000 samples, which are taken before the Markov chain has converged and therefore do not come from our target distribution.
In [33]:
with pm.Model() as model:
    # Specify the logistic regression using R-style formula syntax
    pm.glm.glm('income ~ age + age2 + educ + hours', data)
    # Draw 2000 samples with NUTS
    trace = pm.sample(2000, pm.NUTS(), progressbar=True)
# Discard the first 1000 samples as burn-in
trace = trace[1000:]
One of the major benefits that makes Bayesian data analysis worth the extra computational effort in many circumstances is that we can be explicit about our uncertainty. Maximum likelihood returns a number, but how certain can we be that we found the right number? Instead, Bayesian inference returns a distribution over parameter values.
Below, I overlay a kernel density estimate onto a 2-D histogram of samples from $p(\beta_{1},\beta_{3}|D)$.
In [34]:
plt.figure(figsize=(9,7))
plt.hist2d(trace['age'], trace['educ'], alpha=.5, cmap="Reds", bins=25)
plt.colorbar()
seaborn.kdeplot(trace['age'], trace['educ'])
plt.xlabel("beta_age")
plt.ylabel("beta_educ")
plt.show()
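Because the trace is just a collection of posterior samples, we can also summarize each coefficient's marginal distribution numerically; `pm.summary` reports posterior means, standard deviations and interval estimates (the exact output format varies across PyMC3 versions), and `pm.traceplot` doubles as a check of mixing.
In [ ]:
# Posterior mean, standard deviation and interval estimates for each coefficient
pm.summary(trace)
In [ ]:
# Trace plots: one marginal posterior and one sampling trace per coefficient
pm.traceplot(trace)
plt.show()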
So how do age and education affect the probability of making more than \$50K? To answer this question, we can show how the probability of making more than \$50K changes with age for a few different education levels. Here, we assume that the number of hours worked per week is fixed at 50. PyMC3 gives us a convenient way to plot the posterior predictive distribution. We need to give the function a linear model and a set of points at which to evaluate it. We will pass in three different linear models: one with educ == 12 (finished high school), one with educ == 16 (finished undergrad) and one with educ == 19 (three years of grad school).
In [35]:
# Linear model with hours == 50 and educ == 12
lm = lambda x, samples: 1 / (1 + np.exp(-(samples['Intercept'] +
                                           samples['age'] * x +
                                           samples['age2'] * np.square(x) +
                                           samples['educ'] * 12 +
                                           samples['hours'] * 50)))

# Linear model with hours == 50 and educ == 16
lm2 = lambda x, samples: 1 / (1 + np.exp(-(samples['Intercept'] +
                                            samples['age'] * x +
                                            samples['age2'] * np.square(x) +
                                            samples['educ'] * 16 +
                                            samples['hours'] * 50)))

# Linear model with hours == 50 and educ == 19
lm3 = lambda x, samples: 1 / (1 + np.exp(-(samples['Intercept'] +
                                            samples['age'] * x +
                                            samples['age2'] * np.square(x) +
                                            samples['educ'] * 19 +
                                            samples['hours'] * 50)))
Each curve shows how the probability of earning more than \$50K changes with age. The red curve represents 19 years of education, the green curve represents 16 years of education and the blue curve represents 12 years of education. For all three education levels, the probability of making more than \$50K increases with age until approximately age 60, when the probability begins to drop off. Notice that each curve is a little blurry. This is because we are actually plotting 100 different curves for each level of education. Each curve is a draw from our posterior distribution. Because the curves are somewhat translucent, we can interpret dark, narrow portions of a curve as places where we have low uncertainty and light, spread-out portions of the curve as places where we have somewhat higher uncertainty about our coefficient values.
In [36]:
# Plot the posterior predictive distributions of P(income > $50K) vs. age
pm.glm.plot_posterior_predictive(trace, eval=np.linspace(25, 75, 1000), lm=lm, samples=100, color="blue", alpha=.15)
pm.glm.plot_posterior_predictive(trace, eval=np.linspace(25, 75, 1000), lm=lm2, samples=100, color="green", alpha=.15)
pm.glm.plot_posterior_predictive(trace, eval=np.linspace(25, 75, 1000), lm=lm3, samples=100, color="red", alpha=.15)
plt.ylim([.4,.8])
plt.ylabel("P(Income > $50K)")
plt.xlabel("Age")
plt.show()
Finally, let's turn to the odds ratio that motivated this exercise. Exponentiating the education coefficient, $e^{\beta_{3}}$, gives the multiplicative change in the odds of making more than \$50K associated with one additional year of education. Below, we plot the posterior distribution of this odds ratio by exponentiating the posterior samples of $\beta_{3}$.
In [38]:
# Posterior samples of the education coefficient and the implied odds ratio
b = trace['educ']
plt.hist(np.exp(b), bins=25, normed=True)
plt.xlabel("Odds Ratio")
plt.show()
We can also find an interval estimate for this quantity. This may be the best part about Bayesian statistics: we get to interpret the interval the way we've always wanted to interpret a confidence interval. We are 95% sure that the odds ratio lies within it!
In [39]:
# 95% posterior interval for the odds ratio exp(beta_educ)
lb, ub = np.percentile(b, 2.5), np.percentile(b, 97.5)
print("P(%.3f < O.R. < %.3f) = 0.95" % (np.exp(lb), np.exp(ub)))