Example usage for weightedcalcs

The example below uses weightedcalcs to analyze a slice of the American Community Survey's 2015 data for Wyoming.


In [1]:
import weightedcalcs as wc
import pandas as pd

Load the ACS data into a pandas.DataFrame


In [2]:
responses = pd.read_csv("../data/acs-2015-pums-wy-simple.csv")

In [3]:
responses.head()


Out[3]:
   SERIALNO  PWGTP  age  gender                      marriage_status   income
0      1990    148   67    Male  Never married or under 15 years old  27000.0
1      2253    371   93  Female                              Widowed      0.0
2      2861    288   46  Female                             Divorced  44000.0
3      4537     58   59    Male                             Divorced  35000.0
4      4797    130   70    Male                              Married      0.0

In addition to the full list of responses, let's create a subset including only adult respondents, since we'll be focusing on income later.


In [4]:
adults = responses[responses["age"] >= 18]

In [5]:
adults.head()


Out[5]:
   SERIALNO  PWGTP  age  gender                      marriage_status   income
0      1990    148   67    Male  Never married or under 15 years old  27000.0
1      2253    371   93  Female                              Widowed      0.0
2      2861    288   46  Female                             Divorced  44000.0
3      4537     58   59    Male                             Divorced  35000.0
4      4797    130   70    Male                              Married      0.0

Create an instance of weightedcalcs.Calculator

The ACS's PWGTP variable is each respondent's Census-assigned survey weight. All our weighted calculations will use this variable.


In [6]:
calc = wc.Calculator("PWGTP")

Basic weighted calculations

Weighted mean income


In [7]:
calc.mean(adults, "income").round()


Out[7]:
30709.0
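To make the weighting concrete, here is a minimal sketch of what a weighted mean computes. It uses only the five rows shown in the head() output above, rebuilt by hand, so its result will not match the full-data figure. It illustrates the arithmetic and is not a claim about how weightedcalcs implements it internally.

```python
import numpy as np
import pandas as pd

# The five rows from responses.head(); PWGTP is the survey weight.
sample = pd.DataFrame({
    "PWGTP": [148, 371, 288, 58, 130],
    "income": [27000.0, 0.0, 44000.0, 35000.0, 0.0],
})

# Weighted mean: sum(weight * value) / sum(weight).
manual = (sample["PWGTP"] * sample["income"]).sum() / sample["PWGTP"].sum()

# np.average computes the same quantity when given weights.
via_numpy = np.average(sample["income"], weights=sample["PWGTP"])

# The unweighted mean treats every respondent as representing one person.
unweighted = sample["income"].mean()

print(manual, via_numpy, unweighted)
```

The weighted and unweighted means differ here because the two zero-income respondents carry about half of the total weight in this small sample.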

Weighted standard deviation of income


In [8]:
calc.std(adults, "income").round()


Out[8]:
46093.0

Weighted median income


In [9]:
calc.median(adults, "income")


Out[9]:
18000.0

Weighted 75th percentile of income


In [10]:
calc.quantile(adults, "income", 0.75)


Out[10]:
45000.0
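One common way to define a weighted quantile is to sort the values and take the first one whose cumulative share of the total weight reaches the requested fraction. The sketch below assumes that definition. weighted_quantile is a hypothetical helper written for this illustration, and weightedcalcs may handle ties or interpolation differently.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")      # sort by value
    cum_share = np.cumsum(weights[order]) / weights.sum()
    idx = np.searchsorted(cum_share, q, side="left")
    idx = min(idx, len(values) - 1)                # guard against rounding at q == 1
    return values[order][idx]

# The five rows from responses.head(), rebuilt by hand.
income = [27000.0, 0.0, 44000.0, 35000.0, 0.0]
pwgtp = [148, 371, 288, 58, 130]

weighted_median = weighted_quantile(income, pwgtp, 0.5)
weighted_p75 = weighted_quantile(income, pwgtp, 0.75)
unweighted_median = float(np.median(income))

print(weighted_median, weighted_p75, unweighted_median)
```

Under this definition the median is simply the 0.5 quantile, which is consistent with calc.median and calc.quantile being shown side by side above.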

Weighted distribution of marriage statuses

~43% of Wyoming residents are married:


In [11]:
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)


Out[11]:
marriage_status
Married                                0.425
Never married or under 15 years old    0.421
Divorced                               0.097
Widowed                                0.046
Separated                              0.012
Name: PWGTP, dtype: float64

~56% of adult Wyoming residents are married:


In [12]:
calc.distribution(adults, "marriage_status").round(3).sort_values(ascending=False)


Out[12]:
marriage_status
Married                                0.557
Never married or under 15 years old    0.240
Divorced                               0.127
Widowed                                0.060
Separated                              0.016
Name: PWGTP, dtype: float64
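A weighted distribution can be read as each category's share of the total survey weight. The sketch below reproduces that idea with plain pandas on the five rows from the head() output above. It is an illustration under that assumption, not weightedcalcs' own code, and the shares it prints describe only this tiny sample.

```python
import pandas as pd

# The five rows from responses.head(), rebuilt by hand.
sample = pd.DataFrame({
    "PWGTP": [148, 371, 288, 58, 130],
    "marriage_status": [
        "Never married or under 15 years old",
        "Widowed",
        "Divorced",
        "Divorced",
        "Married",
    ],
})

# Sum the weights within each category, then divide by the total weight.
weight_by_status = sample.groupby("marriage_status")["PWGTP"].sum()
shares = weight_by_status / weight_by_status.sum()

print(shares.round(3).sort_values(ascending=False))
```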

Grouped weighted calculations

Below, we perform calculations similar to those above, but now take advantage of the fact that weightedcalcs can handle DataFrameGroupBy objects. In the examples below, we group by the ACS's marriage status categories and gender.


In [13]:
grp_marriage_sex = adults.groupby(["marriage_status", "gender"])

For reference, here's how many responses fall into each category:


In [14]:
grp_marriage_sex.size().unstack()


Out[14]:
gender                               Female  Male
marriage_status
Divorced                                292   279
Married                                1337  1337
Never married or under 15 years old     382   535
Separated                                25    18
Widowed                                 232    75

Weighted mean income


In [15]:
calc.mean(grp_marriage_sex, "income").round().astype(int)


Out[15]:
gender                               Female   Male
marriage_status
Divorced                              27803  38884
Married                               22592  50263
Never married or under 15 years old   15625  27531
Separated                             15443  18553
Widowed                                5890  15421
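For comparison, a grouped weighted mean of this shape can be approximated in plain pandas by applying np.average within each group and unstacking the result. The sketch below uses a small hypothetical frame (toy, with made-up weight, status, gender, and income values), not the ACS data, and is not meant to describe weightedcalcs' internals.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
toy = pd.DataFrame({
    "weight": [1, 3, 2, 2],
    "status": ["Married", "Married", "Divorced", "Divorced"],
    "gender": ["Female", "Female", "Male", "Male"],
    "income": [10000.0, 30000.0, 20000.0, 40000.0],
})

def weighted_mean(group):
    # Weighted mean of income within a single group.
    return np.average(group["income"], weights=group["weight"])

# One weighted mean per (status, gender) pair, as a MultiIndex Series.
by_group = (
    toy.groupby(["status", "gender"])[["income", "weight"]]
    .apply(weighted_mean)
)

# Pivot gender into columns; combinations with no rows become NaN.
table = by_group.unstack()
print(table)
```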

Weighted standard deviation of income


In [16]:
calc.std(grp_marriage_sex, "income").round()


Out[16]:
gender                                Female     Male
marriage_status
Divorced                             40039.0  40916.0
Married                              33602.0  63959.0
Never married or under 15 years old  19885.0  34576.0
Separated                            14822.0  25867.0
Widowed                              17113.0  55463.0

Weighted median income


In [17]:
calc.median(grp_marriage_sex, "income")


Out[17]:
gender                                Female     Male
marriage_status
Divorced                             21000.0  29000.0
Married                              11000.0  40200.0
Never married or under 15 years old   8300.0  16000.0
Separated                            10000.0      0.0
Widowed                                  0.0      0.0

Weighted 75th percentile of income


In [18]:
calc.quantile(grp_marriage_sex, "income", 0.75)


Out[18]:
gender                                Female     Male
marriage_status
Divorced                             39000.0  65000.0
Married                              35000.0  70000.0
Never married or under 15 years old  25000.0  38000.0
Separated                            32400.0  30000.0
Widowed                                  0.0      0.0