2. Refine the Data

"Data is messy"

  • Missing e.g. check for missing or incomplete data
  • Quality e.g. check for duplicates, accuracy, unusual data
  • Parse e.g. extract year and month from date
  • Convert e.g. free text to coded value
  • Derive e.g. gender from title
  • Calculate e.g. percentages, proportions
  • Remove e.g. remove redundant data
  • Merge e.g. first name and surname into full name
  • Aggregate e.g. roll up by year, cluster by area
  • Filter e.g. exclude based on location
  • Sample e.g. extract a representative sample
  • Summary e.g. show summary stats like mean

In [1]:
# Load the libraries
import numpy as np
import pandas as pd

In [2]:
# Load the data again!
df = pd.read_csv("data/Weed_Price.csv", parse_dates=[-1])

2.1 Missing Data

By “missing” data we simply mean null, or “not present for whatever reason”. Let's see if we can find the missing data in our data set, whether because it exists but was not collected or because it never existed.


In [3]:
# Let's start with a count to see how much data is missing
df.count()


Out[3]:
State     22899
HighQ     22899
HighQN    22899
MedQ      22899
MedQN     22899
LowQ      12342
LowQN     22899
date      22899
dtype: int64

In [4]:
df['LowQ'].isnull()


Out[4]:
0        False
1        False
2        False
3        False
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24       False
25       False
26       False
27       False
28       False
29       False
         ...  
22869     True
22870     True
22871     True
22872     True
22873     True
22874     True
22875     True
22876     True
22877     True
22878     True
22879     True
22880     True
22881     True
22882     True
22883     True
22884     True
22885     True
22886     True
22887     True
22888     True
22889     True
22890     True
22891     True
22892     True
22893     True
22894     True
22895     True
22896     True
22897     True
22898     True
Name: LowQ, dtype: bool

In [5]:
# We can see the bottom rows which have NaN values
df.tail()


Out[5]:
State HighQ HighQN MedQ MedQN LowQ LowQN date
22894 Virginia 364.98 3513 293.12 3079 NaN 284 2014-12-31
22895 Washington 233.05 3337 189.92 3562 NaN 160 2014-12-31
22896 West Virginia 359.35 551 224.03 545 NaN 60 2014-12-31
22897 Wisconsin 350.52 2244 272.71 2221 NaN 167 2014-12-31
22898 Wyoming 322.27 131 351.86 197 NaN 12 2014-12-31

Pandas represents missing values as NaN.

What can we do with missing values?

  • Drop these rows / columns? Use .dropna(how='any')
  • Fill with a dummy value? Use .fillna(value=dummy)
  • Impute the cell with the most recent value? Use .fillna(method='ffill')
  • Interpolate the amount in a linear fashion? Use .interpolate()

We use the inplace=True argument to modify the dataframe itself, rather than making and returning a changed copy.
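Each of these options can be sketched on a toy Series (synthetic data, not the price dataset; `.ffill()` is the modern spelling of `fillna(method='ffill')`):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()        # remove rows with NaN
dummy = s.fillna(value=0)   # replace NaN with a dummy value
ffilled = s.ffill()         # carry the most recent value forward
interp = s.interpolate()    # linear interpolation between known values
```

Note that dropping loses rows, while the fill strategies keep the index intact; which to use depends on why the data is missing.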


In [6]:
# Let's sort this data frame by State and date
df.sort_values(by=['State','date'], inplace=True)

In [7]:
# Let's fill the missing values with the last available value
df.fillna(method="ffill", inplace=True)

In [8]:
df.count()


Out[8]:
State     22899
HighQ     22899
HighQN    22899
MedQ      22899
MedQN     22899
LowQ      22899
LowQN     22899
date      22899
dtype: int64

Exercise

Fill the missing values with a backward fill.


In [ ]:

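A possible approach, sketched on a toy Series rather than the price data (`.bfill()` is equivalent to `fillna(method='bfill')`):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan, 4.0])
bfilled = s.bfill()   # each NaN takes the next available value
```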
Fill the missing values with the mean for the column.


In [ ]:
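One way to do a mean fill, again on a toy Series: pass the column mean (computed over the non-missing values) to `fillna`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
mean_filled = s.fillna(s.mean())   # mean of the non-missing values is 2.0
```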

2.2 Quality of the Data

Let's check for completeness.

Say, do we have data on each date for all 51 states?


In [9]:
df["State"].value_counts()


Out[9]:
Utah                    449
Washington              449
Massachusetts           449
Missouri                449
Connecticut             449
Vermont                 449
Pennsylvania            449
Georgia                 449
California              449
North Carolina          449
Florida                 449
Oregon                  449
Alabama                 449
Indiana                 449
Kentucky                449
Hawaii                  449
South Carolina          449
Virginia                449
Oklahoma                449
Texas                   449
Illinois                449
Nebraska                449
Ohio                    449
Arizona                 449
Rhode Island            449
Louisiana               449
Wisconsin               449
New York                449
Arkansas                449
Montana                 449
New Hampshire           449
North Dakota            449
New Mexico              449
Delaware                449
Iowa                    449
New Jersey              449
Kansas                  449
Idaho                   449
Colorado                449
Maryland                449
District of Columbia    449
Michigan                449
South Dakota            449
West Virginia           449
Minnesota               449
Wyoming                 449
Mississippi             449
Tennessee               449
Nevada                  449
Maine                   449
Alaska                  449
dtype: int64

Let's check the dates and see whether they are continuous.


In [10]:
df1 = df[df.State=='California'].copy()
df2 = df[df.State=='California'].copy()

In [11]:
df1.shape


Out[11]:
(449, 8)

In [12]:
print "Earliest Date:",df1.date.min(), "\n", "Latest Date:",df1.date.max(), "\n",  \
"Number of days in between them:", df1.date.max() - df1.date.min()


Earliest Date: 2013-12-27 00:00:00 
Latest Date: 2015-06-11 00:00:00 
Number of days in between them: 531 days 00:00:00

In [13]:
df1.groupby(['date']).size()


Out[13]:
date
2013-12-27    1
2013-12-28    1
2013-12-29    1
2013-12-30    1
2013-12-31    1
2014-01-01    1
2014-01-02    1
2014-01-03    1
2014-01-04    1
2014-01-05    1
2014-01-06    1
2014-01-07    1
2014-01-08    1
2014-01-09    1
2014-01-10    1
2014-01-11    1
2014-01-12    1
2014-01-13    1
2014-01-14    1
2014-01-15    1
2014-01-16    1
2014-01-17    1
2014-01-18    1
2014-01-19    1
2014-01-20    1
2014-01-21    1
2014-01-22    1
2014-01-23    1
2014-01-24    1
2014-01-25    1
             ..
2015-05-13    1
2015-05-14    1
2015-05-15    1
2015-05-16    1
2015-05-17    1
2015-05-18    1
2015-05-19    1
2015-05-20    1
2015-05-21    1
2015-05-22    1
2015-05-23    1
2015-05-24    1
2015-05-25    1
2015-05-26    1
2015-05-27    1
2015-05-28    1
2015-05-29    1
2015-05-30    1
2015-05-31    1
2015-06-01    1
2015-06-02    1
2015-06-03    1
2015-06-04    1
2015-06-05    1
2015-06-06    1
2015-06-07    1
2015-06-08    1
2015-06-09    1
2015-06-10    1
2015-06-11    1
dtype: int64

In [14]:
df1.set_index("date", inplace=True)
df2.set_index("date", inplace=True)

In [15]:
idx = pd.date_range(df1.index.min(), df1.index.max())

In [16]:
idx


Out[16]:
DatetimeIndex(['2013-12-27', '2013-12-28', '2013-12-29', '2013-12-30',
               '2013-12-31', '2014-01-01', '2014-01-02', '2014-01-03',
               '2014-01-04', '2014-01-05', 
               ...
               '2015-06-02', '2015-06-03', '2015-06-04', '2015-06-05',
               '2015-06-06', '2015-06-07', '2015-06-08', '2015-06-09',
               '2015-06-10', '2015-06-11'],
              dtype='datetime64[ns]', length=532, freq='D', tz=None)

In [17]:
df1 = df1.reindex(idx, fill_value=0)

In [18]:
df1.shape


Out[18]:
(532, 7)

Exercise Show the list of dates that were missing. Hint: leverage df2; both df1 and df2 have date as their index.


In [ ]:
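One way to find the gaps is `DatetimeIndex.difference`: build the full daily range and subtract the dates actually observed. A minimal sketch on a synthetic index (standing in for df2's date index) with one day missing:

```python
import pandas as pd

# Synthetic stand-in for df2's date index, with 2014-01-03 missing
observed = pd.DatetimeIndex(["2014-01-01", "2014-01-02",
                             "2014-01-04", "2014-01-05"])
full = pd.date_range(observed.min(), observed.max())

missing = full.difference(observed)
```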

2.3 Parse the Data

Let's see if we can get the year, month, week and weekday from the date. Pandas has good built-in functionality for time-series data via the DatetimeIndex method.


In [19]:
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
df['week'] = pd.DatetimeIndex(df['date']).week
df['weekday'] = pd.DatetimeIndex(df['date']).weekday

In [20]:
df.dtypes


Out[20]:
State              object
HighQ             float64
HighQN              int64
MedQ              float64
MedQN               int64
LowQ              float64
LowQN               int64
date       datetime64[ns]
year                int32
month               int32
week                int32
weekday             int32
dtype: object

In [21]:
df.tail()


Out[21]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
4997 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-07 2015 6 23 6
5762 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-08 2015 6 24 0
6527 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-09 2015 6 24 1
7343 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-10 2015 6 24 2
8159 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-11 2015 6 24 3

In [22]:
df['year'].value_counts()


Out[22]:
2014    18564
2015     4080
2013      255
dtype: int64

In [23]:
df['month'].value_counts()


Out[23]:
1     3162
5     2703
2     2244
6     2091
12    1836
10    1581
7     1581
3     1581
11    1530
9     1530
8     1530
4     1530
dtype: int64

In [24]:
df["weekday"].value_counts()


Out[24]:
6    3315
0    3315
4    3264
3    3264
2    3264
1    3264
5    3213
dtype: int64

2.4 Aggregate the Data

To aggregate, we typically use the “group by” operation, which involves the following steps:

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure

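The three steps can be seen on a toy frame (synthetic values, not the price data): groupby splits the rows by State, mean is applied to each group, and the results are combined back into one frame.

```python
import pandas as pd

toy = pd.DataFrame({"State": ["CA", "CA", "NY"],
                    "HighQ": [240.0, 250.0, 345.0]})

# split by State, apply mean to each group, combine into one frame
means = toy.groupby("State", as_index=False).mean()
```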
In [25]:
df_mean = df.groupby("State", as_index=False).mean()

In [26]:
df_mean.shape


Out[26]:
(51, 11)

In [27]:
df_mean.head()


Out[27]:
State HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday
0 Alabama 339.561849 1379.414254 204.606169 1270.351893 145.978508 161.149220 2014.167038 5.953229 23.812918 2.995546
1 Alaska 291.482004 321.244989 262.046392 407.917595 394.653964 32.334076 2014.167038 5.953229 23.812918 2.995546
2 Arizona 300.667483 2392.465479 209.365345 2137.414254 188.500134 279.006682 2014.167038 5.953229 23.812918 2.995546
3 Arkansas 348.056147 751.988864 190.414655 724.683742 126.771269 135.902004 2014.167038 5.953229 23.812918 2.995546
4 California 245.376125 14947.073497 191.268909 16769.821826 189.783586 976.298441 2014.167038 5.953229 23.812918 2.995546

Pivot Table


In [28]:
df.head()


Out[28]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20094 Alabama 339.65 1033 198.04 926 147.15 122 2013-12-27 2013 12 52 4
20859 Alabama 339.65 1033 198.04 926 147.15 122 2013-12-28 2013 12 52 5
21573 Alabama 339.75 1036 198.26 929 149.49 123 2013-12-29 2013 12 52 6
22287 Alabama 339.75 1036 198.81 930 149.49 123 2013-12-30 2013 12 1 0
22797 Alabama 339.42 1040 198.68 932 149.49 123 2013-12-31 2013 12 1 1

In [29]:
pd.pivot_table(df, values='HighQ', index=['State'], columns=["weekday"] )


Out[29]:
weekday 0 1 2 3 4 5 6
State
Alabama 339.556308 339.577656 339.559375 339.552500 339.577969 339.568254 339.541385
Alaska 291.463231 291.384687 291.390000 291.597500 291.506406 291.596032 291.438923
Arizona 300.612000 300.653906 300.643750 300.655000 300.751562 300.715873 300.642308
Arkansas 347.930462 348.011250 348.017656 347.934844 348.278281 348.235238 347.991077
California 245.348923 245.364375 245.359219 245.342813 245.434219 245.425556 245.359231
Colorado 238.948462 238.885938 238.889844 238.977187 238.885937 238.897619 238.944769
Connecticut 341.624615 341.682812 341.633750 341.616406 341.832656 341.819683 341.652308
Delaware 366.712769 366.748750 366.705937 366.659375 366.962344 366.937143 366.750615
District of Columbia 348.205385 348.055000 348.132656 348.145625 348.213594 348.277778 348.212462
Florida 302.548923 302.577031 302.529688 302.510781 302.638437 302.648095 302.541231
Georgia 326.588308 326.635937 326.594219 326.579687 326.805000 326.784762 326.633077
Hawaii 309.940615 309.895312 309.855156 309.818750 309.934062 310.033016 309.956462
Idaho 275.642615 275.627656 275.609844 275.653437 275.671250 275.654444 275.616923
Illinois 355.406769 355.460313 355.442031 355.427188 355.580625 355.502540 355.426000
Indiana 333.147077 333.184531 333.187344 333.145781 333.257031 333.253333 333.164769
Iowa 367.059692 367.086094 367.071563 367.055625 367.175781 367.173175 367.051077
Kansas 350.760615 350.785469 350.765938 350.778438 350.928906 350.903492 350.813846
Kentucky 338.906615 338.857812 338.846719 338.874062 338.796719 338.799048 338.902769
Louisiana 366.743385 366.816562 366.761094 366.733437 367.016406 366.960159 366.800615
Maine 315.272615 315.392812 315.384844 315.330156 315.570469 315.523333 315.262308
Maryland 370.884308 370.970469 370.912187 370.899219 371.160937 371.162063 370.912462
Massachusetts 348.942462 348.995625 348.968750 348.945625 349.156562 349.100952 348.974615
Michigan 292.022462 292.080312 292.044531 292.004844 292.229844 292.199841 292.055077
Minnesota 358.104154 358.228437 358.188594 358.052969 358.507969 358.471587 358.156308
Mississippi 337.535385 337.479062 337.495469 337.532656 337.615937 337.660635 337.597231
Missouri 359.098923 359.278594 359.273594 359.264844 359.418438 359.434921 359.174000
Montana 267.832615 267.860625 267.713125 267.700937 267.930781 267.934444 267.850923
Nebraska 341.441692 341.522187 341.490000 341.479375 341.768438 341.713968 341.504923
Nevada 267.794769 267.827969 267.823594 267.811875 267.866875 267.824286 267.792308
New Hampshire 359.469846 359.543906 359.519844 359.362656 359.595469 359.512222 359.331231
New Jersey 349.260769 349.313594 349.299219 349.272969 349.435469 349.396667 349.276000
New Mexico 289.536769 289.511719 289.499531 289.470156 289.670937 289.685714 289.565231
New York 346.877692 346.907031 346.866406 346.831875 347.027344 346.995079 346.886154
North Carolina 349.413846 349.476719 349.435625 349.406563 349.585156 349.565873 349.444308
North Dakota 398.349692 398.576562 398.677656 398.550000 399.090469 398.785079 398.658769
Ohio 334.526615 334.546562 334.536406 334.532969 334.615938 334.592063 334.533385
Oklahoma 361.417077 361.645625 361.615313 361.597656 361.753437 361.598730 361.389846
Oregon 208.461385 208.474844 208.454063 208.422344 208.604688 208.579048 208.479385
Pennsylvania 366.067231 366.102969 366.094375 366.060469 366.253125 366.222857 366.081846
Rhode Island 323.733077 323.828281 323.801094 323.756094 323.998594 323.951746 323.803538
South Carolina 341.612154 341.686094 341.656250 341.641094 341.837500 341.803651 341.617846
South Dakota 375.626615 375.828125 375.830937 375.837500 375.946875 376.010159 375.657692
Tennessee 352.678000 352.730469 352.711094 352.707344 352.871250 352.868730 352.707846
Texas 337.034154 337.069687 337.037969 337.023750 337.267656 337.235714 337.071231
Utah 289.552308 289.597500 289.586094 289.567344 289.634687 289.671905 289.564923
Vermont 374.172462 374.250312 374.204844 374.181250 374.395156 374.335556 374.216615
Virginia 368.084308 368.120781 368.112813 368.087969 368.264375 368.245238 368.116462
Washington 233.381231 233.377969 233.361875 233.363594 233.422500 233.426032 233.382308
West Virginia 357.408000 357.428906 357.368125 357.330156 357.395938 357.424444 357.406462
Wisconsin 352.981077 353.050313 353.040469 353.000938 353.160000 353.141905 352.996308
Wyoming 338.837538 339.090625 339.045781 338.932969 339.260469 339.072698 338.910769

Exercise Get a cross tabulation: for each state and each month, get the prices for each weekday as shown in the output.


In [ ]:
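One approach is to pass more than one column to `index` in `pd.pivot_table`, giving a row per (State, month) pair. A minimal sketch on synthetic prices (the real call would use df and values='HighQ' as above):

```python
import pandas as pd

toy = pd.DataFrame({"State":   ["California"] * 4,
                    "month":   [1, 1, 2, 2],
                    "weekday": [0, 1, 0, 1],
                    "HighQ":   [240.0, 242.0, 244.0, 246.0]})

# rows: (State, month) pairs; columns: weekday; cells: mean HighQ
pt = pd.pivot_table(toy, values="HighQ",
                    index=["State", "month"], columns=["weekday"])
```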

2.5 Derive the Data

Let's load the demographic dataset and create a new column for the remaining ("other") share of the population.


In [30]:
df_demo = pd.read_csv("data/Demographics_State.csv")

In [31]:
df_demo.head()


Out[31]:
region total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age
0 alabama 4799277 67 26 1 4 23680 501 38.1
1 alaska 720316 63 3 5 6 32651 978 33.6
2 arizona 6479703 57 4 3 30 25358 747 36.3
3 arkansas 2933369 74 15 1 7 22170 480 37.5
4 california 37659181 40 6 13 38 29527 1119 35.4

In [32]:
df_demo["percent_other"] = 100 - df_demo["percent_white"] - df_demo["percent_black"] - df_demo["percent_asian"] - df_demo["percent_hispanic"]

In [ ]:
df_demo.head()

Exercise Express median rent as a proportion of California's median rent (compute it as a new column).


In [ ]:
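A possible approach, sketched on a two-row synthetic frame (only the columns the exercise needs): look up California's median rent, then divide the whole column by it.

```python
import pandas as pd

demo = pd.DataFrame({"region": ["alabama", "california"],
                     "median_rent": [501, 1119]})

# scalar lookup of California's rent, then a vectorised division
ca_rent = demo.loc[demo["region"] == "california", "median_rent"].iloc[0]
demo["rent_vs_ca"] = demo["median_rent"] / ca_rent
```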

2.6 Merge the Data

Let's merge the demographic dataset with the price dataset.


In [ ]:
# Let us change the column name region to State
df_demo = df_demo.rename(columns={'region':'State'})

In [ ]:
df_demo.head()

In [ ]:
# We can now merge Demographic and Price mean data into one single data frame
df_merge = pd.merge(df_mean, df_demo, how='inner', on='State')
df_merge.head()

What happened? Why is there no data in the dataframe? The State values are capitalized in df_mean but lower case in df_demo, so the inner join finds no matching keys.


In [ ]:
# Change the State in df_mean to lower case
df_mean['State'] = df_mean['State'].str.lower()

In [ ]:
df_mean.head()

In [ ]:
# We can now merge Demographic and Price mean data into one single data frame
df_merge = pd.merge(df_mean, df_demo, how='inner', on='State')

In [ ]:
df_merge.head()

2.7 Filter the Data

Let's start by filtering the data

  • by location
  • by year
  • by location & year

In [ ]:
# Filter data for location California
df_cal = df[df["State"] == "California"]

In [ ]:
df_cal.shape

In [ ]:
# Filter data for year
df_2014 = df[df["year"] == 2014]

In [ ]:
df_2014.shape

In [ ]:
df_cal_2014 = df[(df["year"] == 2014) & (df["State"] == "California")]

In [ ]:
df_cal_2014.shape

In [ ]:
df_cal_2014.head()

Exercise Find the % of Hispanic population for the state with the largest white population (use df_demo).


In [ ]:
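One way is `idxmax`, which returns the row label of the largest value in a column; that label can then be fed to `.loc`. A minimal sketch on a synthetic three-state frame:

```python
import pandas as pd

demo = pd.DataFrame({"region": ["alabama", "california", "vermont"],
                     "percent_white": [67, 40, 94],
                     "percent_hispanic": [4, 38, 2]})

# idxmax gives the row label of the largest white share
whitest = demo.loc[demo["percent_white"].idxmax()]
answer = whitest["percent_hispanic"]
```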

2.8 Summarise the Data

We can use the describe function to get the summary stats for each column in the data frame


In [ ]:
df.describe()

We can also use convenience functions like sum(), count(), and mean() to calculate these.


In [ ]:
df.HighQ.mean()

In [ ]:
# Lets do this the hard way
df.HighQ.sum()

In [ ]:
df.HighQ.count()

In [ ]:
df.HighQ.sum()/df.HighQ.count()

In [ ]:
df.HighQ.median()

2.9 Sample the Data


In [ ]:
?df.sample

In [ ]:
df_ca_sample = df[df.State=='California'].sample(n = 50, replace = True, random_state=123)

In [ ]:
df_ca_sample.duplicated()

In [ ]:
df_ca_sample.loc[8572]

2.10 Quirks in Pandas


In [ ]:
df_ca_sample.iat[0, 0] = "Cal"

In [ ]:
df_ca_sample.head()

But beware: plain assignment does not copy the data. Both names refer to the same dataframe, so a change made through one is visible through the other.
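The aliasing behaviour is easiest to see on a tiny standalone frame: assignment shares the object, while `.copy()` produces an independent one.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]})
b = a          # plain assignment: b is the SAME object as a
c = a.copy()   # explicit copy: c is independent

b.iat[0, 0] = 99   # mutating b also changes a, but not c
```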


In [ ]:
df_ca_sample2 = df_ca_sample

In [ ]:
df_ca_sample2.iat[0, 0] = "CA"
df_ca_sample2.head()

In [ ]:
df_ca_sample.head()

Fix the issue by taking an explicit copy


In [ ]:
df_ca_sample3 = df_ca_sample2.copy()

In [ ]:
df_ca_sample3.head()

In [ ]:
df_ca_sample3.iat[0, 0] = "CALIFORNIA"
df_ca_sample3.head()

In [ ]:
df_ca_sample2.head()

In [ ]: