In [1]:

    
import matplotlib.pyplot as plt 
import metapack as mp
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline



In [2]:

    
%load_ext metapack.jupyter.magic



In [3]:

    
pkg = mp.jupyter.open_source_package()
pkg









    Out[3]:




Cornell National Social Survey (CNSS), 2017
Cornell National Social Survey is a random-sample survey of adults aged 18 and over. In 2017, participants were asked their opinions on a range of topics.
cornell.edu-cnss-2017-1 from metapack+file:///Users/eric/proj/virt-proj/data-project/sdrdl-data-projects/cornell-social-survey/cornell.edu-cnss-2017/metadata.csv

To use this package, you will have to download the research file manually,
placing the file in the data directory. Cornell requires you to accept terms
and conditions before downloading the file, so we can't redistribute it.
I used StatTransfer to convert the SPSS file to Stata, but it converts one
variable, KRq1 ("Number of Dogs Owned"), with duplicate labels.
Documentation

Codebook None
Results and Citation Guidelines None

Contacts
Wrangler: Eric Busboom Civic Knowledge
Resources

cnss_2017 CNSS 2017, with categorical values



References

cnss_2017_dta Cornell National Social Survey 2017, in converted Stata format



In [5]:

    
fp = pkg.reference('cnss_2017_dta').resolved_url.fspath



In [8]:

    
# The variable KRq1, "Number of Dogs Owned", has a duplicate-label problem when
# the file is converted with StatTransfer: StatTransfer separates the number
# from the text "dogs", and pandas interprets this as multiple labels with the
# value "dogs". The easiest way to handle this is to drop the variable.

itr = pd.read_stata(fp, iterator=True)

columns = list(itr.varlist)
columns.remove('KRq1')

df = pd.read_stata(fp, columns = columns)
labels = dict(itr.variable_labels()) # Store variable labels as a dict
df.head()









    Out[8]:







  
    
      
   caseid  survid  timezone  state   msa  msc  censusr  censusd  cbsamsa  cbsamcsa  ...
0   80007   80007         C     TX  2920    1        3        7        3         5  ...
1   80027   80027         C     AL          5        3        6        5         1  ...
2   80029   80029         C     LA  5560    3        3        7        3         5  ...
3   80037   80037         C     IN          5        2        3        5         1  ...
4   80041   80041         C     MO  3760    1        2        4        1         5  ...

   RACE_E  numraces                             relig              church  hhince
0      No       1.0  No religion / Atheist / Agnostic  A few times a year     NaN
1      No       1.0                        Protestant  A few times a year     NaN
2      No       1.0                          Catholic  A few times a year  196000
3      No       1.0  No religion / Atheist / Agnostic               Never   75000
4      No       1.0  No religion / Atheist / Agnostic               Never   60000

          hhinc50k  hhincu               hhinco                 hhinc  gender
0  $50,000 or over     NaN     $150,000 or more      $150,000 or more    Male
1  $50,000 or over     NaN  50 to under $75,000   50 to under $75,000  Female
2              NaN     NaN                  NaN      $150,000 or more    Male
3              NaN     NaN                  NaN  75 to under $100,000    Male
4              NaN     NaN                  NaN   50 to under $75,000  Female
    
  

5 rows × 113 columns
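
The column-dropping approach in cell In [8] can be tried on a small synthetic file. The sketch below is only an illustration under stated assumptions: the frame, the column name `dogs` (standing in for KRq1), the labels, and the temporary file are all made up. It takes the variable names from the keys of `variable_labels()` rather than from `varlist`, assuming those keys cover every variable in the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the CNSS file: two ordinary columns plus one
# column ("dogs") that we pretend is the problematic variable.
sample = pd.DataFrame({
    "caseid": [80007, 80027, 80029],
    "state": ["TX", "AL", "LA"],
    "dogs": [1, 0, 2],
})
sample_labels = {
    "caseid": "Case identification number",
    "state": "State",
    "dogs": "Number of dogs owned",
}

# Write the frame to a temporary Stata file, including variable labels.
path = os.path.join(tempfile.mkdtemp(), "sample.dta")
sample.to_stata(path, write_index=False, variable_labels=sample_labels)

# Open the file as an iterator to read its metadata without loading the rows.
with pd.read_stata(path, iterator=True) as reader:
    var_labels = dict(reader.variable_labels())

# Keep every variable except the one to skip, then load only those columns.
keep = [name for name in var_labels if name != "dogs"]
subset = pd.read_stata(path, columns=keep)

print(list(subset.columns))
```

`read_stata` also accepts `convert_categoricals=False`, which skips value-label conversion for every column. That might avoid the duplicate-label problem without dropping KRq1, at the cost of losing the categorical text for all variables; it has not been tried against this file.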



In [25]:

    
df.church.cat.codes









    Out[25]:





0      3
1      3
2      3
3      5
4      5
5      3
6      5
7      5
8      5
9      0
10     3
11     4
12     3
13     3
14     2
15     5
16     4
17     1
18     0
19     4
20     2
21     3
22     1
23     1
24     1
25     3
26     3
27     2
28     3
29     2
      ..
970    1
971    4
972    1
973    0
974   -1
975    5
976    1
977    1
978    3
979    2
980    4
981    2
982    5
983    5
984    2
985    1
986    3
987    1
988    1
989    1
990    4
991    2
992    1
993    1
994    2
995    1
996    1
997    2
998    5
999    3
Length: 1000, dtype: int8



In [27]:

    
df.loc[974].church









    Out[27]:





nan
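
The `-1` at row 974 of `df.church.cat.codes` and the `nan` returned for that row are two views of the same thing: pandas stores a categorical column as integer codes and reserves `-1` for missing entries. A minimal sketch with made-up attendance values (not the survey data) shows the correspondence:

```python
import pandas as pd

# Hypothetical values; None marks a missing response.
attendance = pd.Series(
    ["Never", "A few times a year", None, "Never"],
    dtype="category",
)

# Integer codes: one per row, with -1 wherever the value is missing.
codes = attendance.cat.codes
print(codes.tolist())
print(attendance.isna().tolist())

# Map each non-missing code back to its category label.
code_to_label = dict(enumerate(attendance.cat.categories))
print(code_to_label)
```

Because `-1` is not a real category, it has no entry in `code_to_label`; filter on `isna()` before treating the codes as ordinal values.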



In [9]:

    
print('\n'.join( '{}\t{}'.format(k,v) for k,v in labels.items() if k != 'KRq1')  )









    



caseid	Case identification number (assigned by SRI)
survid	Case identification number (assigned by SRI)
timezone	Time zone (provided by MSG)
state	State (provided by MSG)
msa	Metropolitan Statistical Area (provided by MSG)
msc	Metropolitan Status Code (provided by MSG)
censusr	Census Region (provided by MSG)
censusd	Census Division (provided by MSG)
cbsamsa	CBSA MSA Met Status Code (provided by MSG)
cbsamcsa	CBSA MCSA Met Status Code (provided by MSG)
AHq1	College degrees for prisoners
AHq2	State funded college for prisoners
AHq3	Prisoners should repay education costs
KENq1	Country needs strong leader
KENq2	Courts get in way of leaders
KENq3	Media get in way of leaders
JS_version	Randomized text within JSq1
JSq1	Contraceptive policy under Trump
LCq1	China rise a threat or opportunity
SKq1	Mens rights participant
SKq2_a	Red pill - Heard of term
SKq2_b	Alt right - Heard of term
SKq3	Men should be alpha
JBq1	Resource to find new surgeon
JBq2	Used internet to find physician
JBq3	Helpfulness of internet w/ finding physician
RVq1	# of healthcare visits in the past year
RVq2	Rate customer service last healthcare visit
RVq3	Most urgent healthcare issue
RVq4	Employer offers wellness program
RVq5	Describe workplace wellness program
RVq6	Rate workplace wellness program
JAq1	Teen access to patient portal
rq_seq_i	JAq2, JAq3 - Sequence Index Variable
JAq2_rq	Parent access teen medical record
JAq3_rq	Sensitive issue avoidance if parent views record
PDq1	Cosmetic or health reasons - Bariatric surgery
PDq2	Quick fix - Bariatric surgery
PDq3	Should insurance cover - Bariatric surgery
KRq2	Canine clinical study interest
MMq1	Aware of bee health concern
MMq2	Personal concern about bee health
MMq3	Produce protecting bees - Pay more
MMq4	Organic food - Pay more
RIq1	What do antibiotics kill
RIq2	Cow antibiotics threaten human health
RIq3	Milk without antibiotics - Pay more
RIq4	Cow treatment on conventional/organic farms
SZq1	Attractive natural sights - Neighborhood
SZq2	People look out for each other - Neighborhood
SZq3	Mental health a priority - Community
SZq4	Sufficient mental health services - Community
SZq5	Physical environment impacts mental health
EBq1	Worry about crime - Workplace
EBq2	Worry about crime - Neighborhood
EBq3	Recidivism due to criminal record
DRUGq1	Marijuana legalization
MFq1	Rate distraction at work
MFq2	Work from home frequency
SSq1	Homework completion rate in high school
KWq1_a	Work harassment - Experienced
KWq1_b	Domestic violence - Experienced
KWq2	Experience w/ violence impacted work
SKq4	Harder for men to be successful
SKq5	Men should protect women
SKq6	Feminism good or bad for America
EYCq1	Asked to do favor outside of work
EYCq2	Who asked for favor outside of work
EYCq3	Time spent doing favor outside of work
EYCq4	Rules about doing favors outside of work
LVq1_a	Restless - Past 30 days
LVq1_b	Everything an effort - Past 30 days
MJq1	Undocumented farmworkers community impact
IDq1	Occupational category
IDq2	Labor union member
IDq3	Voted in 2016 presidential election
IDq4	Elected officials represent the rich
IDq5	Minorities get government advantages
employ	Employed
jbtype	Main job type
selfempl	Self-employed
lkwork	Looking for new work
LVq2_a	Performance raise - Eligible at work
LVq2_b	Performance bonus - Eligible at work
HHSIZE_A	# adults 65+ in household
HHSIZE_B	# adults 18-64 in household
HHSIZE_C	# children in household
hhadults	
ph_totl	# phones for household
ph_cell	Cell/Landline for survey
yob	Year born
age	Age (computed from yob)
borninus	Born in US
married	Marital status
ideo	Social ideology
party	Political party
educ	Education level
ownrent	Home ownership status
hisp	Hispanic or Latino
RACE_A	Caucasian - Race
RACE_B	African-American - Race
RACE_C	Native American - Race
RACE_D	Asian - Race
RACE_E	Other - Race
numraces	
relig	Religious affiliation
church	How often attend religious services
hhince	Exact household income 2016
hhinc50k	Over/Under $50k - Household income 2016
hhincu	Range under $50k - Household income 2016
hhinco	Range over $50k - Household income 2016
hhinc	Household income 2016 - Coded value
gender	Gender
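
With the labels stored in a dict, a keyword search is a quick way to find the variables on a topic. The sketch below uses a handful of entries copied from the listing above; `find_variables` is a hypothetical helper, not part of pandas or metapack.

```python
# A few entries copied from the label listing; the real dict has one
# entry per variable in the file.
labels = {
    "hhince": "Exact household income 2016",
    "hhinc50k": "Over/Under $50k - Household income 2016",
    "church": "How often attend religious services",
    "relig": "Religious affiliation",
    "MMq1": "Aware of bee health concern",
}


def find_variables(labels, keyword):
    """Return the variable names whose label contains keyword, ignoring case."""
    needle = keyword.lower()
    return [name for name, label in labels.items() if needle in label.lower()]


print(find_variables(labels, "income"))
print(find_variables(labels, "relig"))
```

The returned names can be passed straight to `df[...]` to select the matching columns.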


