In [1]:
import matplotlib.pyplot as plt 
import metapack as mp
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline

In [2]:
%load_ext metapack.jupyter.magic

In [3]:
pkg = mp.jupyter.open_source_package()
pkg


Out[3]:

Cornell National Social Survey (CNSS), 2017

Cornell National Social Survey is a random-sample survey of adults aged 18 and over. In 2017, participants were asked their opinions on a range of topics.

cornell.edu-cnss-2017-1 from metapack+file:///Users/eric/proj/virt-proj/data-project/sdrdl-data-projects/cornell-social-survey/cornell.edu-cnss-2017/metadata.csv

To use this package, you will have to download the research file manually, placing the file in the data directory. Cornell requires you to accept terms and conditions before downloading the file, so we can't redistribute it.

I used StatTransfer to convert the SPSS file to Stata, but one variable, KRq1 ("Number of Dogs Owned"), comes out of the conversion with duplicate value labels.

Documentation

Contacts

Wrangler: Eric Busboom Civic Knowledge

Resources

  1. cnss_2017 CNSS 2017, with categorical values

References

  1. cnss_2017_dta Cornell National Social Survey 2017, in converted Stata format


In [5]:
fp = pkg.reference('cnss_2017_dta').resolved_url.fspath

In [8]:
# The variable KRq1 ("Number of Dogs Owned") has a duplicate-label problem when
# the file is converted with StatTransfer: StatTransfer separates the number
# from the text "dogs", and pandas interprets the result as multiple labels that
# all have the value "dogs". The easiest way to handle this is to drop the variable.

# Open the file as an iterator to get at the variable list and labels
itr = pd.read_stata(fp, iterator=True)

# Keep every column except KRq1
columns = list(itr.varlist)
columns.remove('KRq1')

df = pd.read_stata(fp, columns=columns)
labels = dict(itr.variable_labels())  # Store variable labels as a dict
df.head()


Out[8]:
caseid survid timezone state msa msc censusr censusd cbsamsa cbsamcsa ... RACE_E numraces relig church hhince hhinc50k hhincu hhinco hhinc gender
0 80007 80007 C TX 2920 1 3 7 3 5 ... No 1.0 No religion / Atheist / Agnostic A few times a year NaN $50,000 or over NaN $150,000 or more $150,000 or more Male
1 80027 80027 C AL 5 3 6 5 1 ... No 1.0 Protestant A few times a year NaN $50,000 or over NaN 50 to under $75,000 50 to under $75,000 Female
2 80029 80029 C LA 5560 3 3 7 3 5 ... No 1.0 Catholic A few times a year 196000 NaN NaN NaN $150,000 or more Male
3 80037 80037 C IN 5 2 3 5 1 ... No 1.0 No religion / Atheist / Agnostic Never 75000 NaN NaN NaN 75 to under $100,000 Male
4 80041 80041 C MO 3760 1 2 4 1 5 ... No 1.0 No religion / Atheist / Agnostic Never 60000 NaN NaN NaN 50 to under $75,000 Female

5 rows × 113 columns
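
The drop-one-column pattern above can be tried without the CNSS file. The following is a minimal sketch, not the notebook's own method: it writes a small synthetic Stata file (the values and the shortened labels are made up), reads the variable names from the keys of `variable_labels()` rather than from `varlist`, and uses the reader as a context manager so the file handle is closed. It assumes that the keys of `variable_labels()` cover every variable in the file.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for the CNSS file; all values are invented for illustration.
sample = pd.DataFrame({"caseid": [80007, 80027], "KRq1": [1, 2], "age": [34, 51]})
sample_labels = {
    "caseid": "Case identification number",
    "KRq1": "Number of Dogs Owned",
    "age": "Age",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dta")
    sample.to_stata(path, write_index=False, variable_labels=sample_labels)

    # Open the file only to collect the variable labels, then close it.
    with pd.read_stata(path, iterator=True) as reader:
        var_labels = reader.variable_labels()

    # Read every column except the problematic one.
    keep = [name for name in var_labels if name != "KRq1"]
    subset = pd.read_stata(path, columns=keep)

print(list(subset.columns))
print(var_labels["KRq1"])
```

The label for KRq1 is still available in `var_labels` even though the column itself is never loaded into the DataFrame.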


In [25]:
df.church.cat.codes


Out[25]:
0      3
1      3
2      3
3      5
4      5
5      3
6      5
7      5
8      5
9      0
10     3
11     4
12     3
13     3
14     2
15     5
16     4
17     1
18     0
19     4
20     2
21     3
22     1
23     1
24     1
25     3
26     3
27     2
28     3
29     2
      ..
970    1
971    4
972    1
973    0
974   -1
975    5
976    1
977    1
978    3
979    2
980    4
981    2
982    5
983    5
984    2
985    1
986    3
987    1
988    1
989    1
990    4
991    2
992    1
993    1
994    2
995    1
996    1
997    2
998    5
999    3
Length: 1000, dtype: int8

In [27]:
df.loc[974].church


Out[27]:
nan
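
The -1 at row 974 is how pandas encodes a missing value in a categorical column's integer codes, which is why `df.loc[974].church` comes back as nan. A small sketch with a made-up Series (a hypothetical stand-in for `df.church`, using two of the category names shown earlier) illustrates the same behavior:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.church; the third value is missing.
church = pd.Series(
    ["Never", "A few times a year", np.nan, "Never"],
    dtype="category",
)

# Categories are inferred in sorted order, so
# "A few times a year" -> 0 and "Never" -> 1; NaN gets the code -1.
codes = church.cat.codes
print(codes.tolist())            # [1, 0, -1, 1]

# Rows with code -1 are exactly the rows that isna() reports as missing.
print((codes == -1).tolist())    # [False, False, True, False]
print(church.isna().tolist())    # [False, False, True, False]
```

When working with the codes directly, it is worth filtering out -1 first so that missing answers are not mistaken for a real category.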

In [9]:
# Print each variable name with its label, skipping the dropped KRq1
print('\n'.join('{}\t{}'.format(k, v) for k, v in labels.items() if k != 'KRq1'))


caseid	Case identification number (assigned by SRI)
survid	Case identification number (assigned by SRI)
timezone	Time zone (provided by MSG)
state	State (provided by MSG)
msa	Metropolitan Statistical Area (provided by MSG)
msc	Metropolitan Status Code (provided by MSG)
censusr	Census Region (provided by MSG)
censusd	Census Division (provided by MSG)
cbsamsa	CBSA MSA Met Status Code (provided by MSG)
cbsamcsa	CBSA MCSA Met Status Code (provided by MSG)
AHq1	College degrees for prisoners
AHq2	State funded college for prisoners
AHq3	Prisoners should repay education costs
KENq1	Country needs strong leader
KENq2	Courts get in way of leaders
KENq3	Media get in way of leaders
JS_version	Randomized text within JSq1
JSq1	Contraceptive policy under Trump
LCq1	China rise a threat or opportunity
SKq1	Mens rights participant
SKq2_a	Red pill - Heard of term
SKq2_b	Alt right - Heard of term
SKq3	Men should be alpha
JBq1	Resource to find new surgeon
JBq2	Used internet to find physician
JBq3	Helpfulness of internet w/ finding physician
RVq1	# of healthcare visits in the past year
RVq2	Rate customer service last healthcare visit
RVq3	Most urgent healthcare issue
RVq4	Employer offers wellness program
RVq5	Describe workplace wellness program
RVq6	Rate workplace wellness program
JAq1	Teen access to patient portal
rq_seq_i	JAq2, JAq3 - Sequence Index Variable
JAq2_rq	Parent access teen medical record
JAq3_rq	Sensitive issue avoidance if parent views record
PDq1	Cosmetic or health reasons - Bariatric surgery
PDq2	Quick fix - Bariatric surgery
PDq3	Should insurance cover - Bariatric surgery
KRq2	Canine clinical study interest
MMq1	Aware of bee health concern
MMq2	Personal concern about bee health
MMq3	Produce protecting bees - Pay more
MMq4	Organic food - Pay more
RIq1	What do antibiotics kill
RIq2	Cow antibiotics threaten human health
RIq3	Milk without antibiotics - Pay more
RIq4	Cow treatment on conventional/organic farms
SZq1	Attractive natural sights - Neighborhood
SZq2	People look out for each other - Neighborhood
SZq3	Mental health a priority - Community
SZq4	Sufficient mental health services - Community
SZq5	Physical environment impacts mental health
EBq1	Worry about crime - Workplace
EBq2	Worry about crime - Neighborhood
EBq3	Recidivism due to criminal record
DRUGq1	Marijuana legalization
MFq1	Rate distraction at work
MFq2	Work from home frequency
SSq1	Homework completion rate in high school
KWq1_a	Work harassment - Experienced
KWq1_b	Domestic violence - Experienced
KWq2	Experience w/ violence impacted work
SKq4	Harder for men to be successful
SKq5	Men should protect women
SKq6	Feminism good or bad for America
EYCq1	Asked to do favor outside of work
EYCq2	Who asked for favor outside of work
EYCq3	Time spent doing favor outside of work
EYCq4	Rules about doing favors outside of work
LVq1_a	Restless - Past 30 days
LVq1_b	Everything an effort - Past 30 days
MJq1	Undocumented farmworkers community impact
IDq1	Occupational category
IDq2	Labor union member
IDq3	Voted in 2016 presidential election
IDq4	Elected officials represent the rich
IDq5	Minorities get government advantages
employ	Employed
jbtype	Main job type
selfempl	Self-employed
lkwork	Looking for new work
LVq2_a	Performance raise - Eligible at work
LVq2_b	Performance bonus - Eligible at work
HHSIZE_A	# adults 65+ in household
HHSIZE_B	# adults 18-64 in household
HHSIZE_C	# children in household
hhadults	
ph_totl	# phones for household
ph_cell	Cell/Landline for survey
yob	Year born
age	Age (computed from yob)
borninus	Born in US
married	Marital status
ideo	Social ideology
party	Political party
educ	Education level
ownrent	Home ownership status
hisp	Hispanic or Latino
RACE_A	Caucasian - Race
RACE_B	African-American - Race
RACE_C	Native American - Race
RACE_D	Asian - Race
RACE_E	Other - Race
numraces	
relig	Religious affiliation
church	How often attend religious services
hhince	Exact household income 2016
hhinc50k	Over/Under $50k - Household income 2016
hhincu	Range under $50k - Household income 2016
hhinco	Range over $50k - Household income 2016
hhinc	Household income 2016 - Coded value
gender	Gender
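
With more than a hundred variables, it can be easier to search the labels than to scan the listing. The helper below is a hypothetical sketch (`find_variables` is not part of pandas or metapack); it is shown on a handful of entries copied from the listing above and would work the same way on the full `labels` dict.

```python
def find_variables(labels, keyword):
    """Return the {name: label} pairs whose label contains keyword, ignoring case."""
    needle = keyword.lower()
    return {name: text for name, text in labels.items() if needle in text.lower()}


# A few entries copied from the listing above; numraces has an empty label.
sample_labels = {
    "church": "How often attend religious services",
    "hhince": "Exact household income 2016",
    "hhinc": "Household income 2016 - Coded value",
    "gender": "Gender",
    "numraces": "",
}

for name, text in find_variables(sample_labels, "income").items():
    print('{}\t{}'.format(name, text))
```

Variables with an empty label, such as hhadults and numraces, never match a keyword search, so they still need to be looked up by name.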
