2. Refine the Data

"Data is messy"

  • Missing: e.g. check for missing or incomplete data
  • Quality: e.g. check for duplicates, accuracy, unusual values
  • Parse: e.g. extract year and month from a date
  • Convert: e.g. free text to a coded value
  • Derive: e.g. gender from title
  • Calculate: e.g. percentages, proportions
  • Remove: e.g. remove redundant data
  • Merge: e.g. first name and surname into a full name
  • Aggregate: e.g. roll up by year, cluster by area
  • Filter: e.g. exclude based on location
  • Sample: e.g. extract a representative subset of the data
  • Summarise: e.g. show summary statistics like the mean

In [1]:
# Load the libraries
import numpy as np
import pandas as pd

In [2]:
# Load the data again!
df = pd.read_csv("data/Weed_Price.csv", parse_dates=[-1])

2.1 Missing Data

By “missing” data we simply mean null, or “not present for whatever reason”. Let's see if we can find the missing data in our data set, whether it was missed during collection or never existed in the first place.
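Besides counting non-null values with count(), a common complementary check is isnull().sum(), which counts the missing values per column. A minimal sketch on a toy frame (not the price dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing value
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Non-null counts per column (what df.count() reports)
print(toy.count())         # a: 2, b: 3

# Complementary view: number of missing values per column
print(toy.isnull().sum())  # a: 1, b: 0
```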


In [3]:
# Let's count the non-null values in each column to spot missing data
df.count()


Out[3]:
State     22899
HighQ     22899
HighQN    22899
MedQ      22899
MedQN     22899
LowQ      12342
LowQN     22899
date      22899
dtype: int64

In [4]:
df['LowQ'].isnull()


Out[4]:
0        False
1        False
2        False
3        False
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24       False
25       False
26       False
27       False
28       False
29       False
         ...  
22869     True
22870     True
22871     True
22872     True
22873     True
22874     True
22875     True
22876     True
22877     True
22878     True
22879     True
22880     True
22881     True
22882     True
22883     True
22884     True
22885     True
22886     True
22887     True
22888     True
22889     True
22890     True
22891     True
22892     True
22893     True
22894     True
22895     True
22896     True
22897     True
22898     True
Name: LowQ, dtype: bool

In [5]:
# Look at the bottom rows, which have NaN values in LowQ
df.tail()


Out[5]:
State HighQ HighQN MedQ MedQN LowQ LowQN date
22894 Virginia 364.98 3513 293.12 3079 NaN 284 2014-12-31
22895 Washington 233.05 3337 189.92 3562 NaN 160 2014-12-31
22896 West Virginia 359.35 551 224.03 545 NaN 60 2014-12-31
22897 Wisconsin 350.52 2244 272.71 2221 NaN 167 2014-12-31
22898 Wyoming 322.27 131 351.86 197 NaN 12 2014-12-31

Pandas represents missing values as NaN

What can we do with missing values?

  • Drop these rows / columns? Use .dropna(how='any')
  • Fill with a dummy value? Use .fillna(value=dummy)
  • Impute the cell with the most recent value? Use .fillna(method='ffill')
  • Interpolate the amount in a linear fashion? Use .interpolate()

We pass inplace=True so the dataframe itself is modified, rather than a modified copy being returned
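The four options above can be sketched on a tiny Series (toy data, not the price dataset):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

dropped = s.dropna()         # drop the missing rows        -> [1.0, 4.0]
filled = s.fillna(value=0)   # fill with a dummy value      -> [1.0, 0.0, 0.0, 4.0]
ffilled = s.ffill()          # forward fill (equivalent to fillna(method='ffill'))
                             #                              -> [1.0, 1.0, 1.0, 4.0]
interp = s.interpolate()     # linear interpolation         -> [1.0, 2.0, 3.0, 4.0]
```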


In [6]:
# Let's sort this data frame by State and date
# (df.sort was removed in newer pandas in favour of sort_values)
df.sort_values(by=['State', 'date'], inplace=True)

In [7]:
# Let's fill each missing value with the last available value (forward fill)
df.fillna(method = "ffill", inplace=True)

In [8]:
df.count()


Out[8]:
State     22899
HighQ     22899
HighQN    22899
MedQ      22899
MedQN     22899
LowQ      22899
LowQN     22899
date      22899
dtype: int64

Exercise

Fill the missing values with a backward fill.


In [ ]:

Fill the missing values with the mean for the column.


In [ ]:

2.2 Quality of the Data

Let's check for completeness.

Do we have data on every date for all 51 states?


In [10]:
df["State"].value_counts()


Out[10]:
Utah                    449
Washington              449
Massachusetts           449
Missouri                449
Connecticut             449
Vermont                 449
Pennsylvania            449
Georgia                 449
California              449
North Carolina          449
Florida                 449
Oregon                  449
Alabama                 449
Indiana                 449
Kentucky                449
Hawaii                  449
South Carolina          449
Virginia                449
Oklahoma                449
Texas                   449
Illinois                449
Nebraska                449
Ohio                    449
Arizona                 449
Rhode Island            449
Louisiana               449
Wisconsin               449
New York                449
Arkansas                449
Montana                 449
New Hampshire           449
North Dakota            449
New Mexico              449
Delaware                449
Iowa                    449
New Jersey              449
Kansas                  449
Idaho                   449
Colorado                449
Maryland                449
District of Columbia    449
Michigan                449
South Dakota            449
West Virginia           449
Minnesota               449
Wyoming                 449
Mississippi             449
Tennessee               449
Nevada                  449
Maine                   449
Alaska                  449
dtype: int64

Let's check whether the dates are all continuous


In [58]:
df1 = df[df.State=='California'].copy()
df2 = df[df.State=='California'].copy()

In [59]:
df1.shape


Out[59]:
(449, 8)

In [60]:
print("Earliest Date:", df1.date.min())
print("Latest Date:", df1.date.max())
print("Number of days in between them:", df1.date.max() - df1.date.min())


Earliest Date: 2013-12-27 00:00:00 
Latest Date: 2015-06-11 00:00:00 
Number of days in between them: 531 days 00:00:00

In [61]:
df1.groupby(['date']).size()


Out[61]:
date
2013-12-27    1
2013-12-28    1
2013-12-29    1
2013-12-30    1
2013-12-31    1
2014-01-01    1
2014-01-02    1
2014-01-03    1
2014-01-04    1
2014-01-05    1
2014-01-06    1
2014-01-07    1
2014-01-08    1
2014-01-09    1
2014-01-10    1
2014-01-11    1
2014-01-12    1
2014-01-13    1
2014-01-14    1
2014-01-15    1
2014-01-16    1
2014-01-17    1
2014-01-18    1
2014-01-19    1
2014-01-20    1
2014-01-21    1
2014-01-22    1
2014-01-23    1
2014-01-24    1
2014-01-25    1
             ..
2015-05-13    1
2015-05-14    1
2015-05-15    1
2015-05-16    1
2015-05-17    1
2015-05-18    1
2015-05-19    1
2015-05-20    1
2015-05-21    1
2015-05-22    1
2015-05-23    1
2015-05-24    1
2015-05-25    1
2015-05-26    1
2015-05-27    1
2015-05-28    1
2015-05-29    1
2015-05-30    1
2015-05-31    1
2015-06-01    1
2015-06-02    1
2015-06-03    1
2015-06-04    1
2015-06-05    1
2015-06-06    1
2015-06-07    1
2015-06-08    1
2015-06-09    1
2015-06-10    1
2015-06-11    1
dtype: int64

In [62]:
df1.set_index("date", inplace=True)
df2.set_index("date", inplace=True)

In [63]:
idx = pd.date_range(df1.index.min(), df1.index.max())

In [64]:
idx


Out[64]:
DatetimeIndex(['2013-12-27', '2013-12-28', '2013-12-29', '2013-12-30',
               '2013-12-31', '2014-01-01', '2014-01-02', '2014-01-03',
               '2014-01-04', '2014-01-05', 
               ...
               '2015-06-02', '2015-06-03', '2015-06-04', '2015-06-05',
               '2015-06-06', '2015-06-07', '2015-06-08', '2015-06-09',
               '2015-06-10', '2015-06-11'],
              dtype='datetime64[ns]', length=532, freq='D', tz=None)

In [65]:
df1 = df1.reindex(idx, fill_value=0)

In [66]:
df1.shape


Out[66]:
(532, 7)

Exercise Show the list of dates that were missing. Hint: leverage df2; both df1 and df2 have date as their index.


In [ ]:

2.3 Parse the Data

Let's see if we can get the year, month, week and weekday from the date. Pandas has good built-in functionality for time series data via the DatetimeIndex constructor


In [68]:
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
df['week'] = pd.DatetimeIndex(df['date']).week
df['weekday'] = pd.DatetimeIndex(df['date']).weekday
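Since the column is already datetime64 (thanks to parse_dates), the .dt accessor is an equivalent way to pull out the same components. A minimal sketch on toy dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2014-01-01", "2015-06-11"]))

print(dates.dt.year)     # 2014, 2015
print(dates.dt.month)    # 1, 6
print(dates.dt.weekday)  # 2 (Wednesday), 3 (Thursday)
# Note: in newer pandas, week numbers come from dates.dt.isocalendar().week
```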

In [69]:
df.dtypes


Out[69]:
State              object
HighQ             float64
HighQN              int64
MedQ              float64
MedQN               int64
LowQ              float64
LowQN               int64
date       datetime64[ns]
year                int32
month               int32
week                int32
weekday             int32
dtype: object

In [70]:
df.tail()


Out[70]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
4997 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-07 2015 6 23 6
5762 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-08 2015 6 24 0
6527 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-09 2015 6 24 1
7343 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-10 2015 6 24 2
8159 Wyoming 313.72 148 317.38 226 161.3 13 2015-06-11 2015 6 24 3

In [71]:
df['year'].value_counts()


Out[71]:
2014    18564
2015     4080
2013      255
dtype: int64

In [72]:
df['month'].value_counts()


Out[72]:
1     3162
5     2703
2     2244
6     2091
12    1836
10    1581
7     1581
3     1581
11    1530
9     1530
8     1530
4     1530
dtype: int64

In [73]:
df["weekday"].value_counts()


Out[73]:
6    3315
0    3315
4    3264
3    3264
2    3264
1    3264
5    3213
dtype: int64

2.4 Aggregate the Data

To aggregate, we typically use “group by”, which involves the following steps:

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure
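The three steps can be sketched on a toy frame (hypothetical states and prices, not the real data):

```python
import pandas as pd

toy = pd.DataFrame({
    "state": ["CA", "CA", "WY", "WY"],
    "price": [240.0, 250.0, 320.0, 340.0],
})

# Split by state, apply mean to each group, combine into one frame
means = toy.groupby("state", as_index=False).mean()
print(means)
#   state  price
# 0    CA  245.0
# 1    WY  330.0
```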

In [74]:
df_mean = df.groupby("State", as_index=False).mean()

In [75]:
df_mean.shape


Out[75]:
(51, 11)

In [76]:
df_mean.head()


Out[76]:
State HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday
0 Alabama 339.561849 1379.414254 204.606169 1270.351893 145.978508 161.149220 2014.167038 5.953229 23.812918 2.995546
1 Alaska 291.482004 321.244989 262.046392 407.917595 394.653964 32.334076 2014.167038 5.953229 23.812918 2.995546
2 Arizona 300.667483 2392.465479 209.365345 2137.414254 188.500134 279.006682 2014.167038 5.953229 23.812918 2.995546
3 Arkansas 348.056147 751.988864 190.414655 724.683742 126.771269 135.902004 2014.167038 5.953229 23.812918 2.995546
4 California 245.376125 14947.073497 191.268909 16769.821826 189.783586 976.298441 2014.167038 5.953229 23.812918 2.995546

Pivot Table


In [77]:
df.head()


Out[77]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20094 Alabama 339.65 1033 198.04 926 147.15 122 2013-12-27 2013 12 52 4
20859 Alabama 339.65 1033 198.04 926 147.15 122 2013-12-28 2013 12 52 5
21573 Alabama 339.75 1036 198.26 929 149.49 123 2013-12-29 2013 12 52 6
22287 Alabama 339.75 1036 198.81 930 149.49 123 2013-12-30 2013 12 1 0
22797 Alabama 339.42 1040 198.68 932 149.49 123 2013-12-31 2013 12 1 1

In [78]:
pd.pivot_table(df, values='HighQ', index=['State'], columns=["weekday"] )


Out[78]:
weekday 0 1 2 3 4 5 6
State
Alabama 339.556308 339.577656 339.559375 339.552500 339.577969 339.568254 339.541385
Alaska 291.463231 291.384687 291.390000 291.597500 291.506406 291.596032 291.438923
Arizona 300.612000 300.653906 300.643750 300.655000 300.751562 300.715873 300.642308
Arkansas 347.930462 348.011250 348.017656 347.934844 348.278281 348.235238 347.991077
California 245.348923 245.364375 245.359219 245.342813 245.434219 245.425556 245.359231
Colorado 238.948462 238.885938 238.889844 238.977187 238.885937 238.897619 238.944769
Connecticut 341.624615 341.682812 341.633750 341.616406 341.832656 341.819683 341.652308
Delaware 366.712769 366.748750 366.705937 366.659375 366.962344 366.937143 366.750615
District of Columbia 348.205385 348.055000 348.132656 348.145625 348.213594 348.277778 348.212462
Florida 302.548923 302.577031 302.529688 302.510781 302.638437 302.648095 302.541231
Georgia 326.588308 326.635937 326.594219 326.579687 326.805000 326.784762 326.633077
Hawaii 309.940615 309.895312 309.855156 309.818750 309.934062 310.033016 309.956462
Idaho 275.642615 275.627656 275.609844 275.653437 275.671250 275.654444 275.616923
Illinois 355.406769 355.460313 355.442031 355.427188 355.580625 355.502540 355.426000
Indiana 333.147077 333.184531 333.187344 333.145781 333.257031 333.253333 333.164769
Iowa 367.059692 367.086094 367.071563 367.055625 367.175781 367.173175 367.051077
Kansas 350.760615 350.785469 350.765938 350.778438 350.928906 350.903492 350.813846
Kentucky 338.906615 338.857812 338.846719 338.874062 338.796719 338.799048 338.902769
Louisiana 366.743385 366.816562 366.761094 366.733437 367.016406 366.960159 366.800615
Maine 315.272615 315.392812 315.384844 315.330156 315.570469 315.523333 315.262308
Maryland 370.884308 370.970469 370.912187 370.899219 371.160937 371.162063 370.912462
Massachusetts 348.942462 348.995625 348.968750 348.945625 349.156562 349.100952 348.974615
Michigan 292.022462 292.080312 292.044531 292.004844 292.229844 292.199841 292.055077
Minnesota 358.104154 358.228437 358.188594 358.052969 358.507969 358.471587 358.156308
Mississippi 337.535385 337.479062 337.495469 337.532656 337.615937 337.660635 337.597231
Missouri 359.098923 359.278594 359.273594 359.264844 359.418438 359.434921 359.174000
Montana 267.832615 267.860625 267.713125 267.700937 267.930781 267.934444 267.850923
Nebraska 341.441692 341.522187 341.490000 341.479375 341.768438 341.713968 341.504923
Nevada 267.794769 267.827969 267.823594 267.811875 267.866875 267.824286 267.792308
New Hampshire 359.469846 359.543906 359.519844 359.362656 359.595469 359.512222 359.331231
New Jersey 349.260769 349.313594 349.299219 349.272969 349.435469 349.396667 349.276000
New Mexico 289.536769 289.511719 289.499531 289.470156 289.670937 289.685714 289.565231
New York 346.877692 346.907031 346.866406 346.831875 347.027344 346.995079 346.886154
North Carolina 349.413846 349.476719 349.435625 349.406563 349.585156 349.565873 349.444308
North Dakota 398.349692 398.576562 398.677656 398.550000 399.090469 398.785079 398.658769
Ohio 334.526615 334.546562 334.536406 334.532969 334.615938 334.592063 334.533385
Oklahoma 361.417077 361.645625 361.615313 361.597656 361.753437 361.598730 361.389846
Oregon 208.461385 208.474844 208.454063 208.422344 208.604688 208.579048 208.479385
Pennsylvania 366.067231 366.102969 366.094375 366.060469 366.253125 366.222857 366.081846
Rhode Island 323.733077 323.828281 323.801094 323.756094 323.998594 323.951746 323.803538
South Carolina 341.612154 341.686094 341.656250 341.641094 341.837500 341.803651 341.617846
South Dakota 375.626615 375.828125 375.830937 375.837500 375.946875 376.010159 375.657692
Tennessee 352.678000 352.730469 352.711094 352.707344 352.871250 352.868730 352.707846
Texas 337.034154 337.069687 337.037969 337.023750 337.267656 337.235714 337.071231
Utah 289.552308 289.597500 289.586094 289.567344 289.634687 289.671905 289.564923
Vermont 374.172462 374.250312 374.204844 374.181250 374.395156 374.335556 374.216615
Virginia 368.084308 368.120781 368.112813 368.087969 368.264375 368.245238 368.116462
Washington 233.381231 233.377969 233.361875 233.363594 233.422500 233.426032 233.382308
West Virginia 357.408000 357.428906 357.368125 357.330156 357.395938 357.424444 357.406462
Wisconsin 352.981077 353.050313 353.040469 353.000938 353.160000 353.141905 352.996308
Wyoming 338.837538 339.090625 339.045781 338.932969 339.260469 339.072698 338.910769

Exercise Get a cross-tabulation showing, for each state and each month, the prices for each weekday, as in the output below.


In [79]:



Out[79]:
weekday 0 1 2 3 4 5 6
State month
Alabama 1 338.018750 337.968750 338.152222 338.042000 338.020000 337.876667 338.090000
2 338.144286 338.576667 338.561667 338.630000 338.690000 338.535000 338.027143
3 340.758000 340.737500 340.857500 340.747500 340.665000 340.722000 340.750000
4 341.000000 340.930000 340.962000 341.167500 341.105000 341.127500 340.970000
5 339.645714 339.595714 339.535714 339.713750 339.685000 339.628750 339.360000
6 339.844286 339.711667 339.755000 339.733333 340.284000 340.210000 340.235000
7 340.630000 340.650000 340.586000 340.498000 340.537500 340.580000 340.545000
8 340.552500 340.507500 340.470000 340.440000 340.526000 340.525000 340.592000
9 340.434000 340.434000 340.430000 340.517500 340.455000 340.447500 340.382500
10 339.027500 338.820000 339.032000 339.048000 339.026000 339.005000 339.067500
11 339.422500 339.370000 339.360000 339.280000 339.197500 339.262000 339.350000
12 339.180000 339.085000 338.882000 339.177500 339.230000 339.192000 339.134000
Alaska 1 296.395000 296.257500 295.400000 296.391000 296.365000 297.210000 296.402500
2 292.368571 291.665000 291.706667 291.706667 291.431667 292.160000 292.424286
3 286.508000 286.417500 286.417500 286.417500 286.160000 286.284000 286.518000
4 288.577500 288.380000 288.186000 288.260000 288.260000 288.310000 288.092500
5 289.794286 289.687143 289.611429 289.386250 289.487500 289.311250 290.205000
6 289.178571 289.381667 289.380000 289.386667 288.578000 288.704000 288.268333
7 287.992500 288.152000 288.492000 288.316000 288.310000 288.280000 288.545000
8 289.442500 289.442500 289.470000 289.470000 289.244000 289.782500 289.618000
9 291.356000 291.356000 291.342500 291.365000 291.365000 291.365000 291.365000
10 291.535000 291.535000 291.510000 291.364000 291.530000 291.710000 291.747500
11 290.967500 290.967500 290.967500 290.967500 291.052500 290.914000 290.700000
12 298.530000 298.596667 300.566000 303.090000 300.184000 300.182000 300.182000
Arizona 1 300.628750 300.573750 300.885556 300.643000 300.635000 300.286667 300.636250
2 300.507143 301.148333 301.195000 301.123333 301.108333 301.058333 300.518571
3 302.794000 302.857500 302.877500 302.860000 302.825000 302.862000 302.806000
4 301.522500 301.706000 301.632000 301.637500 301.680000 301.537500 301.522500
5 299.955714 299.952857 299.924286 300.307500 300.270000 300.236250 299.550000
6 300.270000 300.210000 300.216667 300.213333 300.928000 300.974000 301.183333
... ... ... ... ... ... ... ... ...
West Virginia 7 354.302500 354.268000 354.078000 353.986000 353.955000 353.980000 354.125000
8 354.385000 354.452500 354.452500 354.480000 354.428000 354.337500 354.366000
9 355.302000 355.352000 355.325000 355.425000 355.530000 355.530000 355.467500
10 354.615000 354.655000 354.816000 354.816000 354.716000 354.580000 354.512500
11 357.317500 357.315000 357.272500 357.360000 357.477500 357.046000 357.536000
12 359.270000 359.333333 358.916000 358.862500 359.370000 359.430000 359.398000
Wisconsin 1 354.043750 354.036250 354.462222 354.029000 353.958000 353.483333 354.077500
2 353.748571 354.648333 354.675000 354.711667 354.625000 354.680000 353.748571
3 356.736000 356.827500 356.727500 356.697500 356.697500 356.766000 356.732000
4 355.545000 355.646000 355.600000 355.630000 355.612500 355.565000 355.590000
5 350.808571 350.774286 350.755714 351.315000 351.298750 351.288750 350.093750
6 351.417143 351.041667 351.015000 351.018333 352.176000 352.116000 352.513333
7 353.337500 353.386000 353.360000 353.322000 353.305000 353.392500 353.422500
8 353.027500 353.045000 353.047500 352.940000 352.990000 352.935000 352.966000
9 352.508000 352.502000 352.545000 352.380000 352.380000 352.427500 352.430000
10 352.237500 352.220000 352.216000 352.204000 352.180000 352.212500 352.200000
11 351.332500 351.362500 351.405000 351.325000 351.277500 351.374000 351.242000
12 351.910000 351.980000 350.844000 350.837500 352.140000 352.158000 352.160000
Wyoming 1 337.210000 337.210000 339.286667 337.397000 337.210000 335.551111 337.210000
2 340.128571 343.103333 343.103333 342.720000 342.820000 343.090000 340.117143
3 354.232000 354.242500 354.242500 354.242500 354.242500 354.430000 354.640000
4 353.935000 353.968000 353.968000 354.972500 354.972500 353.935000 353.935000
5 335.088571 335.198571 335.071429 336.931250 336.931250 336.931250 332.307500
6 340.477143 338.593333 338.593333 338.740000 343.744000 343.850000 344.995000
7 350.922500 351.094000 351.046000 350.786000 351.060000 351.060000 351.060000
8 344.870000 344.870000 344.870000 344.870000 341.978000 340.050000 341.978000
9 328.902000 328.902000 328.992500 328.992500 328.992500 328.525000 328.525000
10 329.297500 329.297500 329.140000 328.616000 328.616000 328.510000 329.297500
11 326.605000 326.605000 326.757500 326.757500 326.757500 327.214000 327.214000
12 329.785000 329.611667 323.890000 324.295000 330.482000 330.368000 330.368000

612 rows × 7 columns

2.5 Derive the Data

Let's load the demographic dataset and create a new column for the share of others in the population


In [81]:
df_demo = pd.read_csv("data/Demographics_State.csv")

In [82]:
df_demo.head()


Out[82]:
region total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age
0 alabama 4799277 67 26 1 4 23680 501 38.1
1 alaska 720316 63 3 5 6 32651 978 33.6
2 arizona 6479703 57 4 3 30 25358 747 36.3
3 arkansas 2933369 74 15 1 7 22170 480 37.5
4 california 37659181 40 6 13 38 29527 1119 35.4

In [83]:
df_demo["percent_other"] = 100 - df_demo["percent_white"] - df_demo["percent_black"] - df_demo["percent_asian"] - df_demo["percent_hispanic"]

In [84]:
df_demo.head()


Out[84]:
region total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age percent_other
0 alabama 4799277 67 26 1 4 23680 501 38.1 2
1 alaska 720316 63 3 5 6 32651 978 33.6 23
2 arizona 6479703 57 4 3 30 25358 747 36.3 6
3 arkansas 2933369 74 15 1 7 22170 480 37.5 3
4 california 37659181 40 6 13 38 29527 1119 35.4 3

Exercise Express median rent as a proportion of California's median rent (compute it as a new column).


In [ ]:

2.6 Merge the Data

Let's merge the demographic dataset with the price dataset


In [23]:
# Let us change the column name region to State
df_demo = df_demo.rename(columns={'region':'State'})

In [24]:
df_demo.head()


Out[24]:
State total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age percent_other
0 alabama 4799277 67 26 1 4 23680 501 38.1 2
1 alaska 720316 63 3 5 6 32651 978 33.6 23
2 arizona 6479703 57 4 3 30 25358 747 36.3 6
3 arkansas 2933369 74 15 1 7 22170 480 37.5 3
4 california 37659181 40 6 13 38 29527 1119 35.4 3

In [26]:
# We can now merge Demographic and Price mean data into one single data frame
df_merge = pd.merge(df_mean, df_demo, how='inner', on='State')
df_merge.head()


Out[26]:
State HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age percent_other

What happened? Why is there no data in the dataframe?
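An inner merge keeps only rows whose keys match exactly, and "Alabama" and "alabama" are different strings, so nothing joins. A minimal sketch of the failure and the fix (toy data):

```python
import pandas as pd

left = pd.DataFrame({"State": ["Alabama"], "price": [339.6]})
right = pd.DataFrame({"State": ["alabama"], "rent": [501]})

# Case mismatch: the inner join finds no common keys
empty = pd.merge(left, right, how="inner", on="State")
print(empty.shape[0])   # 0

# Normalise the key, then the join succeeds
left["State"] = left["State"].str.lower()
merged = pd.merge(left, right, how="inner", on="State")
print(merged.shape[0])  # 1
```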


In [27]:
# Change the State in df_mean to lower case
df_mean['State'] = df_mean['State'].str.lower()

In [28]:
df_mean.head()


Out[28]:
State HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday
0 alabama 339.561849 1379.414254 204.606169 1270.351893 145.978508 161.149220 2014.167038 5.953229 23.812918 2.995546
1 alaska 291.482004 321.244989 262.046392 407.917595 394.653964 32.334076 2014.167038 5.953229 23.812918 2.995546
2 arizona 300.667483 2392.465479 209.365345 2137.414254 188.500134 279.006682 2014.167038 5.953229 23.812918 2.995546
3 arkansas 348.056147 751.988864 190.414655 724.683742 126.771269 135.902004 2014.167038 5.953229 23.812918 2.995546
4 california 245.376125 14947.073497 191.268909 16769.821826 189.783586 976.298441 2014.167038 5.953229 23.812918 2.995546

In [29]:
# We can now merge Demographic and Price mean data into one single data frame
df_merge = pd.merge(df_mean, df_demo, how='inner', on='State')

In [30]:
df_merge.head()


Out[30]:
State HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday total_population percent_white percent_black percent_asian percent_hispanic per_capita_income median_rent median_age percent_other
0 alabama 339.561849 1379.414254 204.606169 1270.351893 145.978508 161.149220 2014.167038 5.953229 23.812918 2.995546 4799277 67 26 1 4 23680 501 38.1 2
1 alaska 291.482004 321.244989 262.046392 407.917595 394.653964 32.334076 2014.167038 5.953229 23.812918 2.995546 720316 63 3 5 6 32651 978 33.6 23
2 arizona 300.667483 2392.465479 209.365345 2137.414254 188.500134 279.006682 2014.167038 5.953229 23.812918 2.995546 6479703 57 4 3 30 25358 747 36.3 6
3 arkansas 348.056147 751.988864 190.414655 724.683742 126.771269 135.902004 2014.167038 5.953229 23.812918 2.995546 2933369 74 15 1 7 22170 480 37.5 3
4 california 245.376125 14947.073497 191.268909 16769.821826 189.783586 976.298441 2014.167038 5.953229 23.812918 2.995546 37659181 40 6 13 38 29527 1119 35.4 3

2.7 Filter the Data

Let's start by filtering the data

  • by location
  • by year
  • by location and year

In [31]:
# Filter data for location California
df_cal = df[df["State"] == "California"]

In [32]:
df_cal.shape


Out[32]:
(449, 12)

In [33]:
# Filter data for year
df_2014 = df[df["year"] == 2014]

In [34]:
df_2014.shape


Out[34]:
(18564, 12)

In [35]:
df_cal_2014 = df[(df["year"] == 2014) & (df["State"] == "California")]

In [36]:
df_cal_2014.shape


Out[36]:
(364, 12)

In [37]:
df_cal_2014.head()


Out[37]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
4 California 248.78 12096 193.56 12812 192.92 778 2014-01-01 2014 1 1 2
769 California 248.67 12125 193.56 12836 192.80 779 2014-01-02 2014 1 1 3
1483 California 248.67 12141 193.57 12853 192.67 782 2014-01-03 2014 1 1 4
2248 California 248.65 12155 193.59 12884 192.67 782 2014-01-04 2014 1 1 5
3013 California 248.68 12176 193.63 12902 192.67 782 2014-01-05 2014 1 1 6

Exercise Find the percentage of Hispanic population for the state with the largest white population (use df_demo).


In [ ]:

2.8 Summarise the Data

We can use the describe function to get summary statistics for each column in the data frame


In [38]:
df.describe()


Out[38]:
HighQ HighQN MedQ MedQN LowQ LowQN year month week weekday
count 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000 22899.000000
mean 329.759854 2274.743657 247.618306 2183.737805 203.624092 202.804489 2014.167038 5.953229 23.812918 2.995546
std 41.173167 2641.936586 44.276015 2789.902626 101.484265 220.531987 0.401765 3.553055 15.426018 2.005599
min 202.020000 93.000000 144.850000 134.000000 63.700000 11.000000 2013.000000 1.000000 1.000000 0.000000
25% 303.780000 597.000000 215.775000 548.000000 145.810000 51.000000 2014.000000 3.000000 9.000000 1.000000
50% 342.310000 1420.000000 245.800000 1320.000000 185.780000 139.000000 2014.000000 6.000000 22.000000 3.000000
75% 356.550000 2958.000000 274.155000 2673.000000 222.940000 263.000000 2014.000000 9.000000 37.000000 5.000000
max 415.700000 18492.000000 379.000000 22027.000000 734.650000 1287.000000 2015.000000 12.000000 52.000000 6.000000

We can also use convenience functions like sum(), count(), and mean() to calculate these


In [39]:
df.HighQ.mean()


Out[39]:
329.75985414210226

In [40]:
# Let's compute the mean the hard way: sum divided by count
df.HighQ.sum()


Out[40]:
7551170.8999999994

In [41]:
df.HighQ.count()


Out[41]:
22899

In [42]:
df.HighQ.sum()/df.HighQ.count()


Out[42]:
329.75985414210226

In [43]:
df.HighQ.median()


Out[43]:
342.31

2.9 Sample the Data


In [46]:
?df.sample

In [51]:
df_ca_sample = df[df.State=='California'].sample(n = 50, replace = True, random_state=123)

In [57]:
df_ca_sample.duplicated()


Out[57]:
20914    False
10051    False
11428    False
2503     False
11275    False
9235     False
14590    False
8725     False
21118    False
15967    False
21322    False
7399     False
1024     False
13927    False
1075     False
8572     False
7042     False
20200    False
17599    False
6583     False
12499    False
6991     False
1432     False
20047    False
5104     False
10357    False
1024      True
16987    False
3217     False
21475    False
2911     False
21577    False
2197     False
2350     False
11428     True
15304    False
8572      True
5716     False
15508    False
7960     False
17956    False
3268     False
8521     False
8419     False
18364    False
15814    False
12244    False
20608    False
17497    False
9949     False
dtype: bool

In [58]:
df_ca_sample.loc[8572]


Out[58]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
8572 California 248.26 12753 192.89 13618 191 818 2014-02-12 2014 2 7 2
8572 California 248.26 12753 192.89 13618 191 818 2014-02-12 2014 2 7 2

2.10 Quirks in Pandas


In [69]:
df_ca_sample.iat[0, 0] = "Cal"

In [71]:
df_ca_sample.head()


Out[71]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 Cal 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4

But beware when you “copy” by plain assignment: both names refer to the same dataframe, so changes made through one show up in the other


In [74]:
df_ca_sample2 = df_ca_sample

In [77]:
df_ca_sample2.iat[0, 0] = "CA"
df_ca_sample2.head()


Out[77]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 CA 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4

In [78]:
df_ca_sample.head()


Out[78]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 CA 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4

Fix the issue with an explicit copy


In [79]:
df_ca_sample3 = df_ca_sample2.copy()

In [80]:
df_ca_sample3.head()


Out[80]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 CA 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4

In [81]:
df_ca_sample3.iat[0, 0] = "CALIFORNIA"
df_ca_sample3.head()


Out[81]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 CALIFORNIA 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4

In [82]:
df_ca_sample2.head()


Out[82]:
State HighQ HighQN MedQ MedQN LowQ LowQN date year month week weekday
20914 CA 244.05 16474 189.42 19085 188.60 1093 2014-12-28 2014 12 52 6
10051 California 243.60 16664 189.19 19416 188.60 1109 2015-01-14 2015 1 3 2
11428 California 244.75 15893 190.43 18113 188.60 1042 2014-11-15 2014 11 46 5
2503 California 246.53 13435 192.83 14590 191.62 866 2014-04-04 2014 4 14 4
11275 California 245.27 15108 191.60 16881 187.94 989 2014-08-15 2014 8 33 4
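In short: plain assignment binds a second name to the same dataframe, while .copy() creates an independent one. A minimal sketch:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = a          # same object, not a copy
c = a.copy()   # independent copy

b.iat[0, 0] = 99
print(a.iat[0, 0])   # 99 -- the change through b is visible via a
print(c.iat[0, 0])   # 1  -- the copy is unaffected
```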

In [ ]: