Example usage for `weightedcalcs`

The example below uses `weightedcalcs` to analyze a slice of the American Community Survey's 2015 data for Wyoming.



In [1]:

    
import weightedcalcs as wc
import pandas as pd

Load the ACS data into a `pandas.DataFrame`



In [2]:

    
responses = pd.read_csv("../data/acs-2015-pums-wy-simple.csv")



In [3]:

    
responses.head()









    Out[3]:







  
    
      
   SERIALNO  PWGTP  age  gender                      marriage_status   income
0      1990    148   67    Male  Never married or under 15 years old  27000.0
1      2253    371   93  Female                              Widowed      0.0
2      2861    288   46  Female                             Divorced  44000.0
3      4537     58   59    Male                             Divorced  35000.0
4      4797    130   70    Male                              Married      0.0

In addition to the full list of responses, let's create a subset including only adult respondents, since we'll be focusing on income later.



In [4]:

    
adults = responses[responses["age"] >= 18]



In [5]:

    
adults.head()









    Out[5]:







  
    
      
   SERIALNO  PWGTP  age  gender                      marriage_status   income
0      1990    148   67    Male  Never married or under 15 years old  27000.0
1      2253    371   93  Female                              Widowed      0.0
2      2861    288   46  Female                             Divorced  44000.0
3      4537     58   59    Male                             Divorced  35000.0
4      4797    130   70    Male                              Married      0.0

Create an instance of `weightedcalcs.Calculator`

The ACS's `PWGTP` variable is the Census-assigned survey weight for each respondent. All our weighted calculations will use this variable.



In [6]:

    
calc = wc.Calculator("PWGTP")

Basic weighted calculations

Weighted mean income



In [7]:

    
calc.mean(adults, "income").round()









    Out[7]:





30709.0
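
For intuition, a weighted mean divides the weighted sum of the values by the sum of the weights. The sketch below reproduces that arithmetic with NumPy on only the five rows shown by `responses.head()`, so its result differs from the full-data figure above. It illustrates the definition and is not necessarily how `weightedcalcs` computes the value internally.

```python
import numpy as np

# The five rows shown by responses.head() above (not the full data set).
income = np.array([27000.0, 0.0, 44000.0, 35000.0, 0.0])
pwgtp = np.array([148, 371, 288, 58, 130])

# Weighted mean: sum(weight * value) / sum(weight).
by_hand = (income * pwgtp).sum() / pwgtp.sum()

# np.average applies the same formula when given weights.
with_numpy = np.average(income, weights=pwgtp)

print(round(by_hand, 2))  # 18791.96
```

The unweighted mean of the same five incomes is 21200.0, which shows how much the survey weights can shift an estimate.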

Weighted standard deviation of income



In [8]:

    
calc.std(adults, "income").round()









    Out[8]:





46093.0
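
One common definition of a weighted standard deviation is the square root of the weighted mean of squared deviations from the weighted mean. The sketch below implements that definition with NumPy; `weighted_std` is a hypothetical helper written for illustration. Some implementations divide by the sum of the weights minus one instead, and this example does not claim to match the exact convention `weightedcalcs` uses.

```python
import numpy as np

def weighted_std(values, weights):
    """Weighted standard deviation, dividing by the sum of the weights.

    Illustrative helper only; other conventions apply a small-sample
    correction and give slightly different results.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    # Weighted mean of the squared deviations from the weighted mean.
    variance = np.average((values - mean) ** 2, weights=weights)
    return np.sqrt(variance)

# Toy data: the value 0 carries three times the weight of the value 4,
# so the weighted mean is 1 and the weighted variance is 3.
print(weighted_std([0, 4], [3, 1]))
```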

Weighted median income



In [9]:

    
calc.median(adults, "income")









    Out[9]:





18000.0

Weighted 75th percentile of income



In [10]:

    
calc.quantile(adults, "income", 0.75)









    Out[10]:





45000.0
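
A weighted quantile can be read as the value at which the cumulative weight, taken over the sorted values, first reaches the requested share of the total weight. The sketch below implements that simple rule without interpolation; `weighted_quantile` is a hypothetical helper, and `weightedcalcs` may handle ties and boundaries differently. It again uses only the five rows shown by `responses.head()`.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q.

    Illustrative helper only; no interpolation is performed.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort the values and carry their weights along.
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cumulative = np.cumsum(weights)
    cutoff = q * cumulative[-1]
    # First position where the cumulative weight reaches the cutoff.
    index = np.searchsorted(cumulative, cutoff, side="left")
    return values[index]

# The five rows shown by responses.head() above (not the full data set).
income = [27000.0, 0.0, 44000.0, 35000.0, 0.0]
pwgtp = [148, 371, 288, 58, 130]

print(weighted_quantile(income, pwgtp, 0.5))   # 0.0
print(weighted_quantile(income, pwgtp, 0.75))  # 44000.0
```

The two zero-income rows carry 501 of the 995 total weight, which is why the weighted median of this small sample is 0.0.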

Weighted distribution of marriage statuses

~43% of Wyoming residents are married:



In [11]:

    
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)









    Out[11]:





marriage_status
Married                                0.425
Never married or under 15 years old    0.421
Divorced                               0.097
Widowed                                0.046
Separated                              0.012
Name: PWGTP, dtype: float64

~56% of adult Wyoming residents are married:



In [12]:

    
calc.distribution(adults, "marriage_status").round(3).sort_values(ascending=False)









    Out[12]:





marriage_status
Married                                0.557
Never married or under 15 years old    0.240
Divorced                               0.127
Widowed                                0.060
Separated                              0.016
Name: PWGTP, dtype: float64
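
The weighted distributions above can be understood as each category's share of the total weight. The sketch below computes those shares with plain pandas on the five rows shown by `responses.head()`; it illustrates the idea under that assumption and is not taken from the `weightedcalcs` source.

```python
import pandas as pd

# The five rows shown by responses.head() above (not the full data set).
sample = pd.DataFrame({
    "PWGTP": [148, 371, 288, 58, 130],
    "marriage_status": [
        "Never married or under 15 years old",
        "Widowed",
        "Divorced",
        "Divorced",
        "Married",
    ],
})

# Sum the weights within each category, then divide by the total weight.
weight_by_status = sample.groupby("marriage_status")["PWGTP"].sum()
distribution = weight_by_status / weight_by_status.sum()

print(distribution.round(3).sort_values(ascending=False))
```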

Grouped weighted calculations

Below, we repeat the calculations above, this time taking advantage of the fact that `weightedcalcs` can handle `DataFrameGroupBy` objects. In the examples below, we group by the ACS's marriage status categories and by gender.



In [13]:

    
grp_marriage_sex = adults.groupby(["marriage_status", "gender"])

For reference, here's how many responses fall into each category:



In [14]:

    
grp_marriage_sex.size().unstack()









    Out[14]:







  
    
gender                               Female  Male
marriage_status
Divorced                                292   279
Married                                1337  1337
Never married or under 15 years old     382   535
Separated                                25    18
Widowed                                 232    75

Weighted mean income



In [15]:

    
calc.mean(grp_marriage_sex, "income").round().astype(int)









    Out[15]:







  
    
gender                               Female   Male
marriage_status
Divorced                              27803  38884
Married                               22592  50263
Never married or under 15 years old   15625  27531
Separated                             15443  18553
Widowed                                5890  15421
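
A grouped weighted mean applies the same weighted-mean formula within each group. The sketch below shows one way to do this with plain pandas: sum `weight * value` and the weights per group, then divide. It groups by `gender` only, because the five rows from `responses.head()` are too few to group by two columns meaningfully; the `weighted_income` column is a helper name introduced for illustration.

```python
import pandas as pd

# The five rows shown by responses.head() above (not the full data set).
sample = pd.DataFrame({
    "PWGTP": [148, 371, 288, 58, 130],
    "gender": ["Male", "Female", "Female", "Male", "Male"],
    "income": [27000.0, 0.0, 44000.0, 35000.0, 0.0],
})

# Precompute weight * value so both sums can be taken in one groupby.
weighted = sample.assign(weighted_income=sample["income"] * sample["PWGTP"])
totals = weighted.groupby("gender")[["weighted_income", "PWGTP"]].sum()

# Per-group weighted mean: sum(weight * value) / sum(weight).
grouped_mean = totals["weighted_income"] / totals["PWGTP"]

print(grouped_mean.round())
```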

Weighted standard deviation of income



In [16]:

    
calc.std(grp_marriage_sex, "income").round()









    Out[16]:







  
    
gender                                Female     Male
marriage_status
Divorced                             40039.0  40916.0
Married                              33602.0  63959.0
Never married or under 15 years old  19885.0  34576.0
Separated                            14822.0  25867.0
Widowed                              17113.0  55463.0

Weighted median income



In [17]:

    
calc.median(grp_marriage_sex, "income")









    Out[17]:







  
    
gender                                Female     Male
marriage_status
Divorced                             21000.0  29000.0
Married                              11000.0  40200.0
Never married or under 15 years old   8300.0  16000.0
Separated                            10000.0      0.0
Widowed                                  0.0      0.0

Weighted 75th percentile of income



In [18]:

    
calc.quantile(grp_marriage_sex, "income", 0.75)









    Out[18]:







  
    
gender                                Female     Male
marriage_status
Divorced                             39000.0  65000.0
Married                              35000.0  70000.0
Never married or under 15 years old  25000.0  38000.0
Separated                            32400.0  30000.0
Widowed                                  0.0      0.0

