bit.ly/osb2015pandas

Introduction to data munging with Jupyter and pandas

Open Source Bridge 2015


In [1]:
from __future__ import division

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn
import rpy2

from IPython.display import display, Image, YouTubeVideo

%matplotlib inline

The case for open source data tools

IPython/Jupyter Notebook: some cool tips for beginners

1. holy moly tab completion


In [3]:
# type "pd.read_" and hit Tab to see the available readers
pd.read_csv


Out[3]:
<function pandas.io.parsers.read_csv>

2. built-in documentation


In [12]:
?pd.read_csv

3. markup!

Markdown and HTML



LaTeX

$\bar{x} = \frac{\sum_{i=1}^{n}w_i\cdot x_i}{\sum_{i=1}^{n}w_i}$

Syntax highlighting for other languages

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

4. magics!


In [14]:
%quickref

In [15]:
%load_ext rmagic


The rmagic extension is already loaded. To reload it, use:
  %reload_ext rmagic

In [17]:
%R x <- c(0:10, 50)
%R xm <- mean(x)
%R c(xm, mean(x, trim = 0.10))


Out[17]:
array([ 8.75,  5.5 ])

5. embed rich media like video


In [8]:
YouTubeVideo("L4Hbv4ugUWk")


Out[8]:
(embedded YouTube video)

6. Notebooks are now viewable directly on GitHub, and can also be exported as PDF or HTML.
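
One way to export, as a sketch: the notebook filename below is just a placeholder, and depending on your version the command may be ipython nbconvert or jupyter nbconvert. Run it from a terminal, or from a notebook cell with a leading "!":

!ipython nbconvert --to html my_notebook.ipynb
!ipython nbconvert --to pdf my_notebook.ipynb   # PDF export needs a LaTeX installation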

pandas

Input

Source: IRS.gov


In [22]:
?pd.read_csv

In [18]:
# read in a CSV
# specify that zipcode should be treated as a string rather than an int!
AGI = pd.read_csv('12zpallagi.csv',dtype={'zipcode': str})
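
Why the dtype matters: read as integers, ZIP codes lose their leading zeros. A tiny illustration with a made-up two-row CSV (not the IRS file):

from StringIO import StringIO   # Python 2; on Python 3 use io.StringIO

demo_csv = "zipcode,N1\n00501,100\n35004,1600\n"
pd.read_csv(StringIO(demo_csv))['zipcode'].tolist()                          # [501, 35004] -- leading zero gone
pd.read_csv(StringIO(demo_csv), dtype={'zipcode': str})['zipcode'].tolist()  # ['00501', '35004']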

In [19]:
AGI.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 166904 entries, 0 to 166903
Data columns (total 77 columns):
STATEFIPS    166904 non-null int64
STATE        166904 non-null object
zipcode      166904 non-null object
AGI_STUB     166904 non-null int64
N1           166904 non-null float64
MARS1        166904 non-null float64
MARS2        166904 non-null float64
MARS4        166904 non-null float64
PREP         166904 non-null float64
N2           166904 non-null float64
NUMDEP       166904 non-null float64
A00100       166904 non-null float64
N00200       166904 non-null float64
A00200       166904 non-null float64
N00300       166904 non-null float64
A00300       166904 non-null float64
N00600       166904 non-null float64
A00600       166904 non-null float64
N00650       166904 non-null float64
A00650       166904 non-null float64
N00900       166904 non-null float64
A00900       166904 non-null float64
SCHF         166904 non-null float64
N01000       166904 non-null float64
A01000       166904 non-null float64
N01400       166904 non-null float64
A01400       166904 non-null float64
N01700       166904 non-null float64
A01700       166904 non-null float64
N02300       166904 non-null float64
A02300       166904 non-null float64
N02500       166904 non-null float64
A02500       166904 non-null float64
N03300       166904 non-null float64
A03300       166904 non-null float64
N00101       166904 non-null float64
A00101       166904 non-null float64
N04470       166904 non-null float64
A04470       166904 non-null float64
N18425       166904 non-null float64
A18425       166904 non-null float64
N18450       166904 non-null float64
A18450       166904 non-null float64
N18500       166904 non-null float64
A18500       166904 non-null float64
N18300       166904 non-null float64
A18300       166904 non-null float64
N19300       166904 non-null float64
A19300       166904 non-null float64
N19700       166904 non-null float64
A19700       166904 non-null float64
N04800       166904 non-null float64
A04800       166904 non-null float64
N07100       166904 non-null float64
A07100       166904 non-null float64
N07220       166904 non-null float64
A07220       166904 non-null float64
N07180       166904 non-null float64
A07180       166904 non-null float64
N07260       166904 non-null float64
A07260       166904 non-null float64
N59660       166904 non-null float64
A59660       166904 non-null float64
N59720       166904 non-null float64
A59720       166904 non-null float64
N11070       166904 non-null float64
A11070       166904 non-null float64
N09600       166904 non-null float64
A09600       166904 non-null float64
N06500       166904 non-null float64
A06500       166904 non-null float64
N10300       166904 non-null float64
A10300       166904 non-null float64
N11901       166904 non-null float64
A11901       166904 non-null float64
N11902       166904 non-null float64
A11902       166904 non-null float64
dtypes: float64(73), int64(2), object(2)
memory usage: 99.3+ MB

Transformation


In [20]:
# you can select columns by label or position!
AGI_column_subset = AGI[['STATE','AGI_STUB','zipcode','N1','A00100']]
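
The same subset by integer position instead of label would use .iloc; the positions below are read off the info() listing above, so treat them as illustrative:

# label-based vs. position-based selection of the same five columns
AGI.loc[:, ['STATE', 'AGI_STUB', 'zipcode', 'N1', 'A00100']]
AGI.iloc[:, [1, 3, 2, 4, 11]]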

In [21]:
AGI_column_subset.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 166904 entries, 0 to 166903
Data columns (total 5 columns):
STATE       166904 non-null object
AGI_STUB    166904 non-null int64
zipcode     166904 non-null object
N1          166904 non-null float64
A00100      166904 non-null float64
dtypes: float64(2), int64(1), object(2)
memory usage: 7.6+ MB

In [31]:
# note this is inclusive!
AGI_row_subset = AGI.ix[6:11]

In [32]:
AGI_row_subset


Out[32]:
STATEFIPS STATE zipcode AGI_STUB N1 MARS1 MARS2 MARS4 PREP N2 ... N09600 A09600 N06500 A06500 N10300 A10300 N11901 A11901 N11902 A11902
6 1 AL 35004 1 1600 990 270 300 840 2390 ... 0 0 570 397 760 667 130 107 1370 2855
7 1 AL 35004 2 1310 570 400 290 680 2620 ... 0 0 1030 2595 1060 2752 150 186 1140 2965
8 1 AL 35004 3 900 280 500 100 460 2090 ... 0 0 860 4382 870 4604 170 278 710 1982
9 1 AL 35004 4 590 70 490 30 270 1590 ... 0 0 580 4605 580 4774 120 300 460 1460
10 1 AL 35004 5 480 30 440 0 250 1340 ... 0 0 480 7397 480 7623 150 582 340 1330
11 1 AL 35004 6 50 0 50 0 50 140 ... 30 70 50 4956 50 5125 30 585 0 0

6 rows × 77 columns
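
.ix slices by label here, which is why the endpoint 11 is included. Roughly equivalent forms with the more explicit indexers (in later pandas versions .ix was deprecated and then removed):

AGI.loc[6:11]    # label-based: endpoint included
AGI.iloc[6:12]   # position-based: half-open, endpoint excluded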


In [34]:
AGI_column_subset.rename(columns={'N1':'population','A00100':'amount'},inplace=True)


/usr/local/lib/python2.7/site-packages/pandas/core/frame.py:2417: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  **kwargs)
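
The warning appears because AGI_column_subset was built by indexing into AGI, so pandas can't tell whether it is a view or an independent copy. One common way to sidestep it (a sketch, not part of the original demo) is to take an explicit copy when subsetting:

# an explicit copy makes later in-place edits unambiguous
AGI_column_subset = AGI[['STATE', 'AGI_STUB', 'zipcode', 'N1', 'A00100']].copy()
AGI_column_subset.rename(columns={'N1': 'population', 'A00100': 'amount'}, inplace=True)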

In [35]:
AGI_column_subset


Out[35]:
STATE AGI_STUB zipcode population amount
0 AL 1 00000 889920 11517112
1 AL 2 00000 491150 17617800
2 AL 3 00000 254280 15644666
3 AL 4 00000 160160 13885434
4 AL 5 00000 183320 24641055
5 AL 6 00000 44840 21926706
6 AL 1 35004 1600 20639
7 AL 2 35004 1310 48501
8 AL 3 35004 900 55790
9 AL 4 35004 590 50978
10 AL 5 35004 480 59932
11 AL 6 35004 50 21723
12 AL 1 35005 1400 19059
13 AL 2 35005 1060 38184
14 AL 3 35005 490 29808
15 AL 4 35005 250 21171
16 AL 5 35005 170 22122
17 AL 6 35005 0 0
18 AL 1 35006 500 6356
19 AL 2 35006 330 12039
20 AL 3 35006 180 11227
21 AL 4 35006 150 12855
22 AL 5 35006 140 17443
23 AL 6 35006 0 0
24 AL 1 35007 4240 50439
25 AL 2 35007 2640 96955
26 AL 3 35007 1860 115225
27 AL 4 35007 1360 117808
28 AL 5 35007 1510 195070
29 AL 6 35007 190 55060
... ... ... ... ... ...
166874 WY 1 83126 50 633
166875 WY 2 83126 20 809
166876 WY 3 83126 30 2048
166877 WY 4 83126 40 6122
166878 WY 5 83126 0 0
166879 WY 6 83126 0 0
166880 WY 1 83127 520 6466
166881 WY 2 83127 300 11343
166882 WY 3 83127 230 14287
166883 WY 4 83127 170 14615
166884 WY 5 83127 190 24920
166885 WY 6 83127 30 16535
166886 WY 1 83128 340 4881
166887 WY 2 83128 240 9295
166888 WY 3 83128 160 10042
166889 WY 4 83128 80 6885
166890 WY 5 83128 90 11905
166891 WY 6 83128 30 11589
166892 WY 1 83414 60 531
166893 WY 2 83414 50 1700
166894 WY 3 83414 40 2491
166895 WY 4 83414 0 0
166896 WY 5 83414 30 4645
166897 WY 6 83414 20 14668
166898 WY 1 99999 8870 113864
166899 WY 2 99999 6070 220135
166900 WY 3 99999 3720 229700
166901 WY 4 99999 2590 224438
166902 WY 5 99999 2960 391340
166903 WY 6 99999 1070 1317401

166904 rows × 5 columns


In [63]:
# group by zipcode and sum other values, resetting index
AGI_grouped = AGI_column_subset.groupby('zipcode').sum().reset_index()

In [64]:
AGI_grouped.head()


Out[64]:
zipcode AGI_STUB population amount
0 00000 1071 142098490 9123982917
1 01001 21 8780 458716
2 01002 21 9460 732849
3 01005 21 2230 122744
4 01007 21 7320 467891
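
Note that .sum() also adds up AGI_STUB, which isn't meaningful after grouping (each ZIP code has brackets 1 through 6, so the sum is always 21). A sketch that aggregates only the columns worth summing:

# sum just the population and amount columns per ZIP code
AGI_grouped_alt = (AGI_column_subset
                   .groupby('zipcode')[['population', 'amount']]
                   .sum()
                   .reset_index())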

A WILD ZIP CODE APPEARS!

Delete or render null? You decide!


In [65]:
AGI_grouped['population'].mean()


Out[65]:
10254.987012987012

In [66]:
# this can also be done with the na_values parameter when the CSV is read in
null_zips = AGI_grouped['zipcode'] == '00000'
AGI_grouped.loc[null_zips, 'zipcode'] = np.nan
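
Doing it at read time instead would look roughly like this; na_values accepts a dict mapping a column to the values that should become NaN:

# treat '00000' in the zipcode column as missing while reading the CSV
AGI = pd.read_csv('12zpallagi.csv', dtype={'zipcode': str},
                  na_values={'zipcode': ['00000']})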

In [67]:
AGI_grouped.head()


Out[67]:
zipcode AGI_STUB population amount
0 NaN 1071 142098490 9123982917
1 01001 21 8780 458716
2 01002 21 9460 732849
3 01005 21 2230 122744
4 01007 21 7320 467891

In [68]:
AGI_notnull = AGI_grouped.dropna()

In [69]:
AGI_notnull['population'].mean()


Out[69]:
5128.9638875861319

In [70]:
AGI_grouped.dropna(inplace=True)

In [73]:
# make a new column with the real amount, not in thousands
AGI_grouped['actual_amount'] = AGI_grouped['amount'] * 1000

In [74]:
# compute the mean AGI per return for each ZIP code
AGI_grouped['weighted_mean_AGI'] = AGI_grouped['actual_amount']/AGI_grouped['population']
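
This is just the weighted mean from the LaTeX example earlier: dividing the summed amounts by the summed return counts is the same as weighting each AGI bracket's mean AGI $x_i$ by its number of returns $w_i$:

$\text{weighted\_mean\_AGI} = \frac{\sum_{i=1}^{6}w_i\cdot x_i}{\sum_{i=1}^{6}w_i} = \frac{\text{actual\_amount}}{\text{population}}$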

In [75]:
#use anonymous functions to change every value in a column!
AGI_grouped['weighted_mean_AGI']= AGI_grouped['weighted_mean_AGI'].apply(lambda x: round(x, 0))
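
The same rounding can also be done without apply; NumPy's round works element-wise on a Series:

# equivalent, vectorized
AGI_grouped['weighted_mean_AGI'] = np.round(AGI_grouped['weighted_mean_AGI'], 0)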

In [77]:
AGI_grouped.head()


Out[77]:
zipcode AGI_STUB population amount actual_amount weighted_mean_AGI
1 01001 21 8780 458716 458716000 52246
2 01002 21 9460 732849 732849000 77468
3 01005 21 2230 122744 122744000 55042
4 01007 21 7320 467891 467891000 63920
5 01008 21 640 38487 38487000 60136

In [79]:
# drop columns you won't need
AGI_grouped.drop(['AGI_STUB','amount','actual_amount'],axis=1,inplace=True)

In [81]:
AGI_grouped.head()


Out[81]:
zipcode population weighted_mean_AGI
1 01001 8780 52246
2 01002 9460 77468
3 01005 2230 55042
4 01007 7320 63920
5 01008 640 60136

Merging! Better than in traffic!


In [37]:
# also look into pandas.Series.unique
AGI_subset_geo = AGI[['zipcode','STATE']].drop_duplicates()
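
For reference, the Series.unique mentioned in the comment returns a NumPy array of distinct values (in order of first appearance) rather than a DataFrame:

AGI['STATE'].unique()   # array of distinct state codes (50 states plus DC in this file)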

In [82]:
AGI_subset_geo


Out[82]:
zipcode STATE
0 00000 AL
6 35004 AL
12 35005 AL
18 35006 AL
24 35007 AL
30 35010 AL
36 35014 AL
42 35016 AL
48 35019 AL
54 35020 AL
60 35022 AL
66 35023 AL
72 35031 AL
78 35033 AL
84 35034 AL
90 35035 AL
96 35040 AL
102 35042 AL
108 35043 AL
114 35044 AL
120 35045 AL
126 35046 AL
132 35049 AL
138 35051 AL
144 35053 AL
150 35054 AL
156 35055 AL
162 35057 AL
168 35058 AL
174 35061 AL
... ... ...
166724 82922 WY
166730 82923 WY
166736 82925 WY
166742 82930 WY
166748 82932 WY
166754 82933 WY
166760 82935 WY
166766 82937 WY
166772 82941 WY
166778 83001 WY
166784 83011 WY
166790 83012 WY
166796 83013 WY
166802 83014 WY
166808 83101 WY
166814 83110 WY
166820 83111 WY
166826 83112 WY
166832 83113 WY
166838 83114 WY
166844 83115 WY
166850 83118 WY
166856 83120 WY
166862 83122 WY
166868 83123 WY
166874 83126 WY
166880 83127 WY
166886 83128 WY
166892 83414 WY
166898 99999 WY

27820 rows × 2 columns


In [83]:
#merge rather than join if you want to use a common column other than the index
AGI_final = pd.merge(AGI_grouped, AGI_subset_geo, how='left', on='zipcode')
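
For comparison, DataFrame.join aligns on the index by default, so getting the same result with join means indexing the geo frame by zipcode first; a rough equivalent:

# join the caller's zipcode column against the other frame's index
AGI_final_via_join = AGI_grouped.join(AGI_subset_geo.set_index('zipcode'),
                                      on='zipcode', how='left')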

In [84]:
AGI_final.head()


Out[84]:
zipcode population weighted_mean_AGI STATE
0 01001 8780 52246 MA
1 01002 9460 77468 MA
2 01005 2230 55042 MA
3 01007 7320 63920 MA
4 01008 640 60136 MA

In [56]:
# this gives you the greatest weighted_mean_AGI first
AGI_final.sort('weighted_mean_AGI',ascending=False).head()


Out[56]:
zipcode population weighted_mean_AGI STATE
9031 33109 250 2694776 FL
26104 94027 3220 1464534 CA
4926 19035 2040 1052019 PA
2194 10005 5580 983554 NY
9168 33480 5580 966673 FL
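
A note for anyone following along with a newer pandas: DataFrame.sort was later deprecated in favor of sort_values, so the equivalent call would be roughly:

AGI_final.sort_values('weighted_mean_AGI', ascending=False).head()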

In [89]:
# chain methods!
AGI_final.groupby('STATE').mean().sort('weighted_mean_AGI',ascending=False)


Out[89]:
population weighted_mean_AGI
STATE
DC 139477.826087 101818.217391
CT 17525.992366 95944.801527
NJ 13011.868132 89198.278388
MA 12580.020747 86974.653527
CA 12839.494609 75530.843666
MD 14052.049383 73656.022222
NV 31566.615385 71164.100000
NY 7778.382924 68857.815653
ND 11232.543554 68087.386760
FL 12769.008715 65399.029412
RI 48424.285714 65349.700000
NH 15360.649351 64192.991342
WY 29240.462963 64139.157407
DE 59146.428571 63232.964286
CO 13168.686869 62932.449495
WA 12110.442656 61812.977867
TX 8598.743068 60810.509550
IL 7203.452769 60420.749186
VA 8309.132075 59568.041509
AK 58637.962963 58816.740741
MN 6894.545455 58649.895202
UT 21698.804348 57757.282609
AZ 18891.527778 56760.319444
NE 7706.639175 56243.342268
PA 6472.874909 55454.958303
HI 59265.254237 55327.932203
IA 5183.317191 55095.415254
SD 11328.965517 53825.644828
KS 6921.953642 53340.508278
WI 7870.336606 52714.803647
OK 8129.944954 52095.623853
LA 10651.504425 51541.876106
OR 13021.898017 51247.753541
OH 8307.505010 50755.814629
MI 8311.313131 50467.204265
VT 13467.805907 50113.248945
MT 14748.628319 49128.296460
IN 8683.679525 49115.709199
GA 10617.046477 48504.626687
NC 9685.696552 48415.451034
ID 16758.625592 47691.592417
ME 9470.948509 47603.149051
TN 9664.397284 46767.217317
SC 13080.294118 45617.171123
MO 6242.913386 45408.147357
AL 8463.923611 45086.321181
NM 17494.028436 44060.383886
WV 7027.126214 43301.866019
KY 7155.471125 42994.249240
AR 8252.953157 42244.458248
MS 11071.300813 40697.222222

In [88]:
AGI_final.sort('weighted_mean_AGI').head().plot(kind='bar')


Out[88]:
<matplotlib.axes._subplots.AxesSubplot at 0x1173f2c90>

References

Python for Data Analysis written by Wes McKinney, creator of pandas

The inimitable Julia Evans' pandas cookbook

Cyrille Rossant's books on IPython

Interesting notebooks