Exploratory Data Analysis

When placed in Metapack data package, this notebook will load the package and run a variety of common EDA operations on the first resource.



In [1]:

    
import matplotlib.pyplot as plt 
import seaborn as sns
import metapack as mp
import pandas as pd
import numpy as np
from IPython.display import display 

%matplotlib inline
sns.set_context('notebook')



In [2]:

    
pd.__version__









    Out[2]:





'0.23.4'



In [3]:

    
pkg = mp.jupyter.open_package()

# For testing and development
#pkg = mp.open_package('http://s3.amazonaws.com/library.metatab.org/cde.ca.gov-accountability_dashboard-2.zip')

pkg









    Out[3]:




Cornell National Social Survey (CNSS), 2017
Cornell National Social Survey is a random-sample survey of adults aged 18 and over. In 2017, participants were asked their opinions on a range of topics.
cornell.edu-cnss-2017-1 from metapack+file:///Users/eric/proj/virt-proj/data-project/sdrdl-data-projects/cornell-social-survey/cornell.edu-cnss-2017/_packages/cornell.edu-cnss-2017-1/metadata.csv

To use this package, you will have to download the research file manually,
placing the file in the data directory. Cornell requires you to accept terms
and conditions before downloading the file, so we can't redistribute it.
I used Stat Transfer to convert the SPSS file to Stata, but there is one
variable, KRq1, "Number of Dogs Owned", that StatTransfer converts with
duplicate labels.
Documentation

Codebook None
Results and Citation Guidelines None

Contacts
Wrangler: Eric Busboom Civic Knowledge
Resources

cnss_2017 CNSS 2017, with categorical values



References

cnss_2017 Cornell National Social Survey 2017, in converted Stata forma



In [4]:

    
resource_name='cnss_2017'



In [5]:

    
pkg.resource(resource_name)









    Out[5]:




cnss_2017
file:///Users/eric/proj/virt-proj/data-project/sdrdl-data-projects/cornell-social-survey/cornell.edu-cnss-2017/_packages/cornell.edu-cnss-2017-1/data/cnss_2017.csv

Header Type Description
caseid integer Case identification number (assigned by SRI) 
survid integer Case identification number (assigned by SRI) 
timezone string Time zone (provided by MSG) 
state string State (provided by MSG) 
msa string Metropolitan Statistical Area (provided by MSG) 
msc integer Metropolitan Status Code (provided by MSG) 
censusr integer Census Region (provided by MSG) 
censusd integer Census Division (provided by MSG) 
cbsamsa integer CBSA MSA Met Status Code (provided by MSG) 
cbsamcsa integer CBSA MCSA Met Status Code (provided by MSG) 
ahq1 string College degrees for prisoners 
ahq2 string State funded college for prisoners 
ahq3 string Prisoners should repay education costs 
kenq1 string Country needs strong leader 
kenq2 string Courts get in way of leaders 
kenq3 string Media get in way of leaders 
js_version string Randomized text within JSq1 
jsq1 string Contraceptive policy under Trump 
lcq1 string China rise a threat or opportunity 
skq1 string Mens rights participant 
skq2_a string Red pill - Heard of term 
skq2_b string Alt right - Heard of term 
skq3 string Men should be alpha 
jbq1 string Resource to find new surgeon 
jbq2 string Used internet to find physician 
jbq3 string Helpfulness of internet w/ finding physician 
rvq1 string # of healthcare visits in the past year 
rvq2 string Rate customer service last healthcare visit 
rvq3 string Most urgent healthcare issue 
rvq4 string Employer offers wellness program 
rvq5 string Describe workplace wellness program 
rvq6 string Rate workplace wellness program 
jaq1 string Teen access to patient portal 
rq_seq_i string JAq2, JAq3 - Sequence Index Variable 
jaq2_rq string Parent access teen medical record 
jaq3_rq string Sensitive issue avoidance if parent views record 
pdq1 string Cosmetic or health reasons - Bariatric surgery 
pdq2 string Quick fix - Bariatric surgery 
pdq3 string Should insurance cover - Bariatric surgery 
krq2 string Canine clinical study interest 
mmq1 string Aware of bee health concern 
mmq2 string Personal concern about bee health 
mmq3 string Produce protecting bees - Pay more 
mmq4 string Organic food - Pay more 
riq1 string What do antibiotics kill 
riq2 string Cow antibiotics threaten human health 
riq3 string Milk without antibiotics - Pay more 
riq4 string Cow treatment on conventional/organic farms 
szq1 string Attractive natural sights - Neighborhood 
szq2 string People look out for each other - Neighborhood 
szq3 string Mental health a priority - Community 
szq4 string Sufficient mental health services - Community 
szq5 string Physical environment impacts mental health 
ebq1 string Worry about crime - Workplace 
ebq2 string Worry about crime - Neighborhood 
ebq3 string Recidivism due to criminal record 
drugq1 string Marijuana legalization 
mfq1 string Rate distraction at work 
mfq2 string Work from home frequency 
ssq1 string Homework completion rate in high school 
kwq1_a string Work harassment - Experienced 
kwq1_b string Domestic violence - Experienced 
kwq2 string Experience w/ violence impacted work 
skq4 string Harder for men to be successful 
skq5 string Men should protect women 
skq6 string Feminism good or bad for America 
eycq1 string Asked to do favor outside of work 
eycq2 string Who asked for favor outside of work 
eycq3 string Time spent doing favor outside of work 
eycq4 string Rules about doing favors outside of work 
lvq1_a string Restless - Past 30 days 
lvq1_b string Everything an effort - Past 30 days 
mjq1 string Undocumented farmworkers community impact 
idq1 string Occupational category 
idq2 string Labor union member 
idq3 string Voted in 2016 presidential election 
idq4 string Elected officials represent the rich 
idq5 string Minorities get government advantages 
employ string Employed 
jbtype string Main job type 
selfempl string Self-employed 
lkwork string Looking for new work 
lvq2_a string Performance raise - Eligible at work 
lvq2_b string Performance bonus - Eligible at work 
hhsize_a integer # adults 65+ in household 
hhsize_b integer # adults 18-64 in household 
hhsize_c integer # children in household 
hhadults integer  
ph_totl integer # phones for household 
ph_cell string Cell/Landline for survey 
yob integer Year born 
age integer Age (computed from yob) 
borninus string Born in US 
married string Marital status 
ideo string Social ideology 
party string Political party 
educ string Education level 
ownrent string Home ownership status 
hisp string Hispanic or Latino 
race_a string Caucasian - Race 
race_b string African-American - Race 
race_c string Native American - Race 
race_d string Asian - Race 
race_e string Other - Race 
numraces integer  
relig string Religious affiliation 
church string How often attend religious services 
hhince string Exact household income 2016 
hhinc50k string Over/Under $50k - Household income 2016 
hhincu string Range under $50k - Household income 2016 
hhinco string Range over $50k - Household income 2016 
hhinc string Household income 2016 - Coded value 
gender string Gender



In [6]:

    
df = pkg.resource(resource_name).read_csv(parse_dates=True)



In [7]:

    
df.head()









    Out[7]:







  
    
      
      caseid
      survid
      timezone
      state
      msa
      msc
      censusr
      censusd
      cbsamsa
      cbsamcsa
      ...
      race_e
      numraces
      relig
      church
      hhince
      hhinc50k
      hhincu
      hhinco
      hhinc
      gender
    
  
  
    
      0
      80007
      80007
      C
      TX
      2920.0
      1
      3
      7
      3
      5
      ...
      No
      1.0
      No religion / Atheist / Agnostic
      A few times a year
      NaN
      $50,000 or over
      NaN
      $150,000 or more
      $150,000 or more
      Male
    
    
      1
      80027
      80027
      C
      AL
      NaN
      5
      3
      6
      5
      1
      ...
      No
      1.0
      Protestant
      A few times a year
      NaN
      $50,000 or over
      NaN
      50 to under $75,000
      50 to under $75,000
      Female
    
    
      2
      80029
      80029
      C
      LA
      5560.0
      3
      3
      7
      3
      5
      ...
      No
      1.0
      Catholic
      A few times a year
      196000.0
      NaN
      NaN
      NaN
      $150,000 or more
      Male
    
    
      3
      80037
      80037
      C
      IN
      NaN
      5
      2
      3
      5
      1
      ...
      No
      1.0
      No religion / Atheist / Agnostic
      Never
      75000.0
      NaN
      NaN
      NaN
      75 to under $100,000
      Male
    
    
      4
      80041
      80041
      C
      MO
      3760.0
      1
      2
      4
      1
      5
      ...
      No
      1.0
      No religion / Atheist / Agnostic
      Never
      60000.0
      NaN
      NaN
      NaN
      50 to under $75,000
      Female
    
  

5 rows × 113 columns



In [8]:

    
empty_col_names = [cn for cn in df.columns if df[cn].nunique() == 0]
const_col_names= [cn for cn in df.columns if df[cn].nunique() == 1]
ignore_cols = empty_col_names+const_col_names
dt_col_names= list(df.select_dtypes(include=[np.datetime64]).columns)

number_col_names = [ cn for cn in df.select_dtypes(include=[np.number]).columns if cn not in ignore_cols ]
other_col_names = [cn for cn in df.columns if cn not in (empty_col_names+const_col_names+dt_col_names+number_col_names)]



In [9]:

    
pd.DataFrame.from_dict({'empty':[len(empty_col_names)], 
                        'const':[len(const_col_names)],
                        'datetime':[len(dt_col_names)],
                        'number':[len(number_col_names)],
                        'other':[len(other_col_names)],
                       }, 
                       orient='index', columns=['count'])

Constant Columns



In [10]:

    
if const_col_names:
    display(df[const_col_names].drop_duplicates().T)









    







  
    
      
      0
      1
      4
      36
    
  
  
    
      rvq3
      Answer
      Answer
      NaN
      NaN
    
    
      rvq5
      NaN
      Answer
      Answer
      NaN

Empty Columns



In [11]:

    
if empty_col_names:
    display(df[empty_col_names].drop_duplicates().T)

Date and Time Columns



In [12]:

    
if dt_col_names:
    display(df[dt_col_names].info())
    display(df[dt_col_names].describe().T)

Number Columns



In [13]:

    
if number_col_names:
    display(df[number_col_names].info())
    display(df[number_col_names].describe().T)









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
caseid      1000 non-null int64
survid      1000 non-null int64
msa         830 non-null float64
msc         1000 non-null int64
censusr     1000 non-null int64
censusd     1000 non-null int64
cbsamsa     1000 non-null int64
cbsamcsa    1000 non-null int64
hhsize_a    999 non-null float64
hhsize_b    999 non-null float64
hhsize_c    998 non-null float64
hhadults    999 non-null float64
ph_totl     998 non-null float64
yob         1000 non-null int64
age         1000 non-null int64
numraces    991 non-null float64
dtypes: float64(7), int64(9)
memory usage: 125.1 KB






    





None






    







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      caseid
      1000.0
      67918.285000
      31335.857247
      10011.0
      80093.5
      83635.0
      86935.5
      90998.0
    
    
      survid
      1000.0
      67918.285000
      31335.857247
      10011.0
      80093.5
      83635.0
      86935.5
      90998.0
    
    
      msa
      830.0
      4446.293976
      2542.086945
      80.0
      1960.0
      4920.0
      6280.0
      9320.0
    
    
      msc
      1000.0
      2.196000
      1.553060
      1.0
      1.0
      1.0
      3.0
      5.0
    
    
      censusr
      1000.0
      2.590000
      1.005441
      1.0
      2.0
      3.0
      3.0
      4.0
    
    
      censusd
      1000.0
      5.048000
      2.473017
      1.0
      3.0
      5.0
      7.0
      9.0
    
    
      cbsamsa
      1000.0
      1.880000
      1.429561
      1.0
      1.0
      1.0
      3.0
      5.0
    
    
      cbsamcsa
      1000.0
      4.636000
      1.147528
      1.0
      5.0
      5.0
      5.0
      5.0
    
    
      hhsize_a
      999.0
      0.468468
      0.760206
      0.0
      0.0
      0.0
      1.0
      3.0
    
    
      hhsize_b
      999.0
      1.866867
      1.329974
      0.0
      1.0
      2.0
      2.0
      10.0
    
    
      hhsize_c
      998.0
      0.630261
      1.090014
      0.0
      0.0
      0.0
      1.0
      7.0
    
    
      hhadults
      999.0
      2.335335
      1.207302
      1.0
      2.0
      2.0
      3.0
      11.0
    
    
      ph_totl
      998.0
      2.641283
      1.480593
      1.0
      2.0
      2.0
      3.0
      10.0
    
    
      yob
      1000.0
      1968.829000
      18.057459
      1922.0
      1955.0
      1969.0
      1984.0
      1999.0
    
    
      age
      1000.0
      48.171000
      18.057459
      18.0
      33.0
      48.0
      62.0
      95.0
    
    
      numraces
      991.0
      1.109990
      0.374720
      1.0
      1.0
      1.0
      1.0
      4.0

Distributions



In [14]:

    
def plot_histograms(df):

    col_names = list(df.columns)

    n_cols = np.ceil(np.sqrt(len(col_names)))
    n_rows = np.ceil(np.sqrt(len(col_names)))

    #plt.figure(figsize=(3*n_cols,3*n_rows))
    fig, ax = plt.subplots(figsize=(3*n_cols,3*n_rows))
    
    for i in range(0,len(col_names)):
        plt.subplot(n_rows + 1,n_cols,i+1)
        try:
            g = sns.distplot(df[col_names[i]].dropna(),kde=True) 
            g.set(xticklabels=[])
            g.set(yticklabels=[])
        except:
            pass
        
    plt.tight_layout()



In [15]:

    
plot_histograms(df[number_col_names])









    



/Users/eric/proj/virt/data-project/lib/python3.6/site-packages/scipy/stats/stats.py:1706: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Box Plots



In [16]:

    
def plot_boxes(df):

    col_names = list(df.columns)

    n_cols = np.ceil(np.sqrt(len(col_names)))
    n_rows = np.ceil(np.sqrt(len(col_names)))

    #plt.figure(figsize=(2*n_cols,3*n_rows))
    fig, ax = plt.subplots(figsize=(2*n_cols,5*n_rows))
    
    for i in range(0,len(col_names)):
        plt.subplot(n_rows + 1,n_cols,i+1)
        try:
            g = sns.boxplot(df[col_names[i]].dropna(),orient='v') 
        except:
            pass
        
    plt.tight_layout()



In [17]:

    
plot_boxes(df[number_col_names])



In [18]:

    
## Correlations



In [19]:

    
cm = df[number_col_names].corr()

mask = np.zeros_like(cm, dtype=np.bool)
mask[np.triu_indices_from(mask)] = True

plt.figure(figsize=(.5*len(number_col_names),.5*len(number_col_names)))
sns.heatmap(cm, mask=mask, cmap = 'viridis')









    Out[19]:





<matplotlib.axes._subplots.AxesSubplot at 0x124a34f60>

Other Columns



In [20]:

    
if other_col_names:
    display(df[other_col_names].info())
    display(df[other_col_names].describe().T)









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 95 columns):
timezone      1000 non-null object
state         1000 non-null object
ahq1          1000 non-null object
ahq2          998 non-null object
ahq3          999 non-null object
kenq1         998 non-null object
kenq2         996 non-null object
kenq3         997 non-null object
js_version    1000 non-null object
jsq1          989 non-null object
lcq1          983 non-null object
skq1          1000 non-null object
skq2_a        1000 non-null object
skq2_b        1000 non-null object
skq3          987 non-null object
jbq1          998 non-null object
jbq2          999 non-null object
jbq3          450 non-null object
rvq1          999 non-null object
rvq2          999 non-null object
rvq4          998 non-null object
rvq6          356 non-null object
jaq1          998 non-null object
rq_seq_i      1000 non-null object
jaq2_rq       995 non-null object
jaq3_rq       993 non-null object
pdq1          986 non-null object
pdq2          993 non-null object
pdq3          1000 non-null object
krq2          487 non-null object
mmq1          1000 non-null object
mmq2          998 non-null object
mmq3          999 non-null object
mmq4          999 non-null object
riq1          979 non-null object
riq2          983 non-null object
riq3          996 non-null object
riq4          982 non-null object
szq1          997 non-null object
szq2          998 non-null object
szq3          997 non-null object
szq4          979 non-null object
szq5          997 non-null object
ebq1          1000 non-null object
ebq2          1000 non-null object
ebq3          999 non-null object
drugq1        996 non-null object
mfq1          1000 non-null object
mfq2          783 non-null object
ssq1          999 non-null object
kwq1_a        1000 non-null object
kwq1_b        999 non-null object
kwq2          393 non-null object
skq4          999 non-null object
skq5          1000 non-null object
skq6          983 non-null object
eycq1         999 non-null object
eycq2         450 non-null object
eycq3         450 non-null object
eycq4         976 non-null object
lvq1_a        999 non-null object
lvq1_b        999 non-null object
mjq1          997 non-null object
idq1          997 non-null object
idq2          998 non-null object
idq3          969 non-null object
idq4          1000 non-null object
idq5          997 non-null object
employ        1000 non-null object
jbtype        652 non-null object
selfempl      651 non-null object
lkwork        978 non-null object
lvq2_a        650 non-null object
lvq2_b        651 non-null object
ph_cell       999 non-null object
borninus      1000 non-null object
married       999 non-null object
ideo          992 non-null object
party         996 non-null object
educ          999 non-null object
ownrent       998 non-null object
hisp          999 non-null object
race_a        993 non-null object
race_b        993 non-null object
race_c        993 non-null object
race_d        993 non-null object
race_e        991 non-null object
relig         996 non-null object
church        996 non-null object
hhince        502 non-null object
hhinc50k      485 non-null object
hhincu        99 non-null object
hhinco        142 non-null object
hhinc         986 non-null object
gender        1000 non-null object
dtypes: object(95)
memory usage: 742.3+ KB






    





None






    







  
    
      
      count
      unique
      top
      freq
    
  
  
    
      timezone
      1000
      4
      E
      473
    
    
      state
      1000
      48
      CA
      98
    
    
      ahq1
      1000
      5
      Strongly agree
      462
    
    
      ahq2
      998
      5
      Somewhat agree
      258
    
    
      ahq3
      999
      5
      Strongly agree
      450
    
    
      kenq1
      998
      5
      Strongly disagree
      311
    
    
      kenq2
      996
      5
      Strongly disagree
      265
    
    
      kenq3
      997
      5
      Strongly agree
      332
    
    
      js_version
      1000
      2
      Provided no notice
      511
    
    
      jsq1
      989
      4
      Some of the time
      615
    
    
      lcq1
      983
      3
      Equal parts threat and opportunity
      600
    
    
      skq1
      1000
      2
      No
      945
    
    
      skq2_a
      1000
      2
      No
      815
    
    
      skq2_b
      1000
      2
      Yes
      604
    
    
      skq3
      987
      3
      No
      653
    
    
      jbq1
      998
      4
      Talking to your current doctor or another heal...
      585
    
    
      jbq2
      999
      2
      No
      546
    
    
      jbq3
      450
      4
      Somewhat helpful
      237
    
    
      rvq1
      999
      7
      1-2 times
      363
    
    
      rvq2
      999
      5
      Very positive
      518
    
    
      rvq4
      998
      3
      Yes
      357
    
    
      rvq6
      356
      6
      Very positive
      131
    
    
      jaq1
      998
      3
      Only with parental permission
      602
    
    
      rq_seq_i
      1000
      2
      Question order JAq2_rq, JAq3_rq
      532
    
    
      jaq2_rq
      995
      3
      Always
      733
    
    
      jaq3_rq
      993
      2
      Yes
      830
    
    
      pdq1
      986
      2
      Mostly for health reasons
      498
    
    
      pdq2
      993
      2
      No
      604
    
    
      pdq3
      1000
      3
      Yes, but only for health benefits
      729
    
    
      krq2
      487
      2
      Yes
      255
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      idq3
      969
      6
      Hillary Clinton
      322
    
    
      idq4
      1000
      5
      Strongly agree
      312
    
    
      idq5
      997
      5
      Disagree
      239
    
    
      employ
      1000
      5
      Yes
      652
    
    
      jbtype
      652
      5
      Full-time, all year round
      519
    
    
      selfempl
      651
      2
      No
      546
    
    
      lkwork
      978
      2
      No
      800
    
    
      lvq2_a
      650
      2
      Yes
      402
    
    
      lvq2_b
      651
      2
      No
      381
    
    
      ph_cell
      999
      3
      Cell
      764
    
    
      borninus
      1000
      2
      Yes
      898
    
    
      married
      999
      6
      Married
      499
    
    
      ideo
      992
      7
      Moderate or middle of the road
      363
    
    
      party
      996
      8
      Independent (close to Neither)
      245
    
    
      educ
      999
      7
      College graduate (BS, BA, or other 4-year degree)
      266
    
    
      ownrent
      998
      3
      Own
      629
    
    
      hisp
      999
      2
      No
      872
    
    
      race_a
      993
      2
      Yes
      815
    
    
      race_b
      993
      2
      No
      849
    
    
      race_c
      993
      2
      No
      918
    
    
      race_d
      993
      2
      No
      952
    
    
      race_e
      991
      2
      No
      966
    
    
      relig
      996
      7
      Protestant
      474
    
    
      church
      996
      6
      Once a week
      232
    
    
      hhince
      502
      95
      50000.0
      38
    
    
      hhinc50k
      485
      2
      $50,000 or over
      283
    
    
      hhincu
      99
      5
      20 to under $30,000
      26
    
    
      hhinco
      142
      4
      50 to under $75,000
      45
    
    
      hhinc
      986
      9
      50 to under $75,000
      290
    
    
      gender
      1000
      2
      Female
      502
    
  

95 rows × 4 columns

Nulls



In [21]:

    
cols = dt_col_names + number_col_names + other_col_names

fig, ax = plt.subplots(figsize=(15,.5*len(cols)))
sns.heatmap(df[cols].isnull().T,cbar=False,xticklabels=False,cmap = 'viridis', ax=ax )









    Out[21]:





<matplotlib.axes._subplots.AxesSubplot at 0x124cba6a0>

Header	Type	Description
caseid	integer	Case identification number (assigned by SRI)
survid	integer	Case identification number (assigned by SRI)
timezone	string	Time zone (provided by MSG)
state	string	State (provided by MSG)
msa	string	Metropolitan Statistical Area (provided by MSG)
msc	integer	Metropolitan Status Code (provided by MSG)
censusr	integer	Census Region (provided by MSG)
censusd	integer	Census Division (provided by MSG)
cbsamsa	integer	CBSA MSA Met Status Code (provided by MSG)
cbsamcsa	integer	CBSA MCSA Met Status Code (provided by MSG)
ahq1	string	College degrees for prisoners
ahq2	string	State funded college for prisoners
ahq3	string	Prisoners should repay education costs
kenq1	string	Country needs strong leader
kenq2	string	Courts get in way of leaders
kenq3	string	Media get in way of leaders
js_version	string	Randomized text within JSq1
jsq1	string	Contraceptive policy under Trump
lcq1	string	China rise a threat or opportunity
skq1	string	Mens rights participant
skq2_a	string	Red pill - Heard of term
skq2_b	string	Alt right - Heard of term
skq3	string	Men should be alpha
jbq1	string	Resource to find new surgeon
jbq2	string	Used internet to find physician
jbq3	string	Helpfulness of internet w/ finding physician
rvq1	string	# of healthcare visits in the past year
rvq2	string	Rate customer service last healthcare visit
rvq3	string	Most urgent healthcare issue
rvq4	string	Employer offers wellness program
rvq5	string	Describe workplace wellness program
rvq6	string	Rate workplace wellness program
jaq1	string	Teen access to patient portal
rq_seq_i	string	JAq2, JAq3 - Sequence Index Variable
jaq2_rq	string	Parent access teen medical record
jaq3_rq	string	Sensitive issue avoidance if parent views record
pdq1	string	Cosmetic or health reasons - Bariatric surgery
pdq2	string	Quick fix - Bariatric surgery
pdq3	string	Should insurance cover - Bariatric surgery
krq2	string	Canine clinical study interest
mmq1	string	Aware of bee health concern
mmq2	string	Personal concern about bee health
mmq3	string	Produce protecting bees - Pay more
mmq4	string	Organic food - Pay more
riq1	string	What do antibiotics kill
riq2	string	Cow antibiotics threaten human health
riq3	string	Milk without antibiotics - Pay more
riq4	string	Cow treatment on conventional/organic farms
szq1	string	Attractive natural sights - Neighborhood
szq2	string	People look out for each other - Neighborhood
szq3	string	Mental health a priority - Community
szq4	string	Sufficient mental health services - Community
szq5	string	Physical environment impacts mental health
ebq1	string	Worry about crime - Workplace
ebq2	string	Worry about crime - Neighborhood
ebq3	string	Recidivism due to criminal record
drugq1	string	Marijuana legalization
mfq1	string	Rate distraction at work
mfq2	string	Work from home frequency
ssq1	string	Homework completion rate in high school
kwq1_a	string	Work harassment - Experienced
kwq1_b	string	Domestic violence - Experienced
kwq2	string	Experience w/ violence impacted work
skq4	string	Harder for men to be successful
skq5	string	Men should protect women
skq6	string	Feminism good or bad for America
eycq1	string	Asked to do favor outside of work
eycq2	string	Who asked for favor outside of work
eycq3	string	Time spent doing favor outside of work
eycq4	string	Rules about doing favors outside of work
lvq1_a	string	Restless - Past 30 days
lvq1_b	string	Everything an effort - Past 30 days
mjq1	string	Undocumented farmworkers community impact
idq1	string	Occupational category
idq2	string	Labor union member
idq3	string	Voted in 2016 presidential election
idq4	string	Elected officials represent the rich
idq5	string	Minorities get government advantages
employ	string	Employed
jbtype	string	Main job type
selfempl	string	Self-employed
lkwork	string	Looking for new work
lvq2_a	string	Performance raise - Eligible at work
lvq2_b	string	Performance bonus - Eligible at work
hhsize_a	integer	# adults 65+ in household
hhsize_b	integer	# adults 18-64 in household
hhsize_c	integer	# children in household
hhadults	integer
ph_totl	integer	# phones for household
ph_cell	string	Cell/Landline for survey
yob	integer	Year born
age	integer	Age (computed from yob)
borninus	string	Born in US
married	string	Marital status
ideo	string	Social ideology
party	string	Political party
educ	string	Education level
ownrent	string	Home ownership status
hisp	string	Hispanic or Latino
race_a	string	Caucasian - Race
race_b	string	African-American - Race
race_c	string	Native American - Race
race_d	string	Asian - Race
race_e	string	Other - Race
numraces	integer
relig	string	Religious affiliation
church	string	How often attend religious services
hhince	string	Exact household income 2016
hhinc50k	string	Over/Under $50k - Household income 2016
hhincu	string	Range under $50k - Household income 2016
hhinco	string	Range over $50k - Household income 2016
hhinc	string	Household income 2016 - Coded value
gender	string	Gender

	caseid	survid	timezone	state	msa	msc	censusr	censusd	cbsamsa	cbsamcsa	...	race_e	numraces	relig	church	hhince	hhinc50k	hhincu	hhinco	hhinc	gender
0	80007	80007	C	TX	2920.0	1	3	7	3	5	...	No	1.0	No religion / Atheist / Agnostic	A few times a year	NaN	$50,000 or over	NaN	$150,000 or more	$150,000 or more	Male
1	80027	80027	C	AL	NaN	5	3	6	5	1	...	No	1.0	Protestant	A few times a year	NaN	$50,000 or over	NaN	50 to under $75,000	50 to under $75,000	Female
2	80029	80029	C	LA	5560.0	3	3	7	3	5	...	No	1.0	Catholic	A few times a year	196000.0	NaN	NaN	NaN	$150,000 or more	Male
3	80037	80037	C	IN	NaN	5	2	3	5	1	...	No	1.0	No religion / Atheist / Agnostic	Never	75000.0	NaN	NaN	NaN	75 to under $100,000	Male
4	80041	80041	C	MO	3760.0	1	2	4	1	5	...	No	1.0	No religion / Atheist / Agnostic	Never	60000.0	NaN	NaN	NaN	50 to under $75,000	Female

	count	mean	std	min	25%	50%	75%	max
caseid	1000.0	67918.285000	31335.857247	10011.0	80093.5	83635.0	86935.5	90998.0
survid	1000.0	67918.285000	31335.857247	10011.0	80093.5	83635.0	86935.5	90998.0
msa	830.0	4446.293976	2542.086945	80.0	1960.0	4920.0	6280.0	9320.0
msc	1000.0	2.196000	1.553060	1.0	1.0	1.0	3.0	5.0
censusr	1000.0	2.590000	1.005441	1.0	2.0	3.0	3.0	4.0
censusd	1000.0	5.048000	2.473017	1.0	3.0	5.0	7.0	9.0
cbsamsa	1000.0	1.880000	1.429561	1.0	1.0	1.0	3.0	5.0
cbsamcsa	1000.0	4.636000	1.147528	1.0	5.0	5.0	5.0	5.0
hhsize_a	999.0	0.468468	0.760206	0.0	0.0	0.0	1.0	3.0
hhsize_b	999.0	1.866867	1.329974	0.0	1.0	2.0	2.0	10.0
hhsize_c	998.0	0.630261	1.090014	0.0	0.0	0.0	1.0	7.0
hhadults	999.0	2.335335	1.207302	1.0	2.0	2.0	3.0	11.0
ph_totl	998.0	2.641283	1.480593	1.0	2.0	2.0	3.0	10.0
yob	1000.0	1968.829000	18.057459	1922.0	1955.0	1969.0	1984.0	1999.0
age	1000.0	48.171000	18.057459	18.0	33.0	48.0	62.0	95.0
numraces	991.0	1.109990	0.374720	1.0	1.0	1.0	1.0	4.0

	count	unique	top	freq
timezone	1000	4	E	473
state	1000	48	CA	98
ahq1	1000	5	Strongly agree	462
ahq2	998	5	Somewhat agree	258
ahq3	999	5	Strongly agree	450
kenq1	998	5	Strongly disagree	311
kenq2	996	5	Strongly disagree	265
kenq3	997	5	Strongly agree	332
js_version	1000	2	Provided no notice	511
jsq1	989	4	Some of the time	615
lcq1	983	3	Equal parts threat and opportunity	600
skq1	1000	2	No	945
skq2_a	1000	2	No	815
skq2_b	1000	2	Yes	604
skq3	987	3	No	653
jbq1	998	4	Talking to your current doctor or another heal...	585
jbq2	999	2	No	546
jbq3	450	4	Somewhat helpful	237
rvq1	999	7	1-2 times	363
rvq2	999	5	Very positive	518
rvq4	998	3	Yes	357
rvq6	356	6	Very positive	131
jaq1	998	3	Only with parental permission	602
rq_seq_i	1000	2	Question order JAq2_rq, JAq3_rq	532
jaq2_rq	995	3	Always	733
jaq3_rq	993	2	Yes	830
pdq1	986	2	Mostly for health reasons	498
pdq2	993	2	No	604
pdq3	1000	3	Yes, but only for health benefits	729
krq2	487	2	Yes	255
...	...	...	...	...
idq3	969	6	Hillary Clinton	322
idq4	1000	5	Strongly agree	312
idq5	997	5	Disagree	239
employ	1000	5	Yes	652
jbtype	652	5	Full-time, all year round	519
selfempl	651	2	No	546
lkwork	978	2	No	800
lvq2_a	650	2	Yes	402
lvq2_b	651	2	No	381
ph_cell	999	3	Cell	764
borninus	1000	2	Yes	898
married	999	6	Married	499
ideo	992	7	Moderate or middle of the road	363
party	996	8	Independent (close to Neither)	245
educ	999	7	College graduate (BS, BA, or other 4-year degree)	266
ownrent	998	3	Own	629
hisp	999	2	No	872
race_a	993	2	Yes	815
race_b	993	2	No	849
race_c	993	2	No	918
race_d	993	2	No	952
race_e	991	2	No	966
relig	996	7	Protestant	474
church	996	6	Once a week	232
hhince	502	95	50000.0	38
hhinc50k	485	2	$50,000 or over	283
hhincu	99	5	20 to under $30,000	26
hhinco	142	4	50 to under $75,000	45
hhinc	986	9	50 to under $75,000	290
gender	1000	2	Female	502