Copyright 2019 Google LLC. SPDX-License-Identifier: Apache-2.0

Notebook Version - 1.0.0


In [0]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from google.colab import drive

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import json

Case Study: Prevalence of Obesity in 500 US Cities

Obesity is well known to correlate with health factors such as high blood pressure, but it also correlates with economic factors such as low income and unemployment [1][2]. The Centers for Disease Control and Prevention (CDC) provides prevalence percentages for health conditions such as obesity, high blood pressure, and high cholesterol in approximately 500 major US cities (e.g., San Francisco, New York, and Austin). Meanwhile, the US Bureau of Labor Statistics provides unemployment rates and the US Census provides poverty rates for most cities across the United States.

Even though these statistics come from different datasets maintained by different government agencies with different storage formats, dataCommons surfaces each of them in a single, uniform knowledge graph. In fact, you can see this in the browser by looking at the provenance column. Let's use the data in dataCommons to create a linear regression model that incorporates the following variables:

  • Prevalence of high blood pressure
  • Unemployment rate
  • Percent of population living with income below the poverty line

to predict the prevalence of obesity in the 500 cities that the CDC provides data for. One thing you may note is that the US Census also provides employment statistics (you can see this by navigating to the "employment" and "employmentStatus" sections for San Francisco and observing the different provenances). Our choice of using statistics from the Bureau of Labor Statistics is purely demonstrative, but it would be interesting to see if similar results can be reproduced using US Census employment statistics.


In [5]:
!pip install --upgrade --quiet git+https://github.com/datacommonsorg/api-python.git@stable-1.x

import datacommons as dc


  Building wheel for datacommons (setup.py) ... done

We will also need to provide the API with an API key. See the Data Commons API documentation for how to set this up for a Colab notebook.


In [6]:
# Mount the Drive
drive.mount('/content/drive', force_remount=True)

# REPLACE THIS with the path to your key.
key_path = '/content/drive/My Drive/DataCommons/secret.json'

# Read the key in and provide it to the Data Commons API
with open(key_path, 'r') as f:
  secrets = json.load(f)
  dc.set_api_key(secrets['dc_api_key'])


Mounted at /content/drive
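If you prefer not to mount Drive, a minimal alternative sketch is to paste the key interactively with getpass; this assumes you have the key value at hand and are comfortable entering it into the notebook session.

from getpass import getpass

# Prompt for the key instead of reading it from a Drive-mounted file.
api_key = getpass('Data Commons API key: ')
dc.set_api_key(api_key)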

To get started, we'll need the cities that the CDC provides data for. We've listed the DCIDs of all cities with CDC obesity prevalence statistics and stored them in the variable city_dcids. For now, we will hide this cell!


In [0]:
#@title
city_dcids = [
    "geoId/5370000", "geoId/0524550", "geoId/0820000", 
    "geoId/3502000", "geoId/0665000", "geoId/1239550", 
    "geoId/2148006", "geoId/1232000", "geoId/1757225", 
    "geoId/0622804", "geoId/0807850", "geoId/4848072", 
    "geoId/3921000", "geoId/0680000", "geoId/3650617", 
    "geoId/4813024", "geoId/0644000", "geoId/0918430", 
    "geoId/5147672", "geoId/3345140", "geoId/4856000", 
    "geoId/1714351", "geoId/1982425", "geoId/2674900", 
    "geoId/1258050", "geoId/4943660", "geoId/0541000", 
    "geoId/0659451", "geoId/1269700", "geoId/0616350", 
    "geoId/0653980", "geoId/4459000", "geoId/0650258", 
    "geoId/0622230", "geoId/4845744", "geoId/0662938", 
    "geoId/0150000", "geoId/0673080", "geoId/3725580", 
    "geoId/0603526", "geoId/0846465", "geoId/1656260", 
    "geoId/2523000", "geoId/2241155", "geoId/0427820", 
    "geoId/2970000", "geoId/0622020", "geoId/3728000", 
    "geoId/5101000", "geoId/0681204", "geoId/1851876", 
    "geoId/1768003", "geoId/4224000", "geoId/4815976", 
    "geoId/2688940", "geoId/0606000", "geoId/0454050", 
    "geoId/3006550", "geoId/1751622", "geoId/4748000", 
    "geoId/5164000", "geoId/1077580", "geoId/0681666", 
    "geoId/5548000", "geoId/5584250", "geoId/0681344", 
    "geoId/0623182", "geoId/2915670", "geoId/1368516", 
    "geoId/4260000", "geoId/0615044", "geoId/1765000", 
    "geoId/4819000", "geoId/1921000", "geoId/1212875", 
    "geoId/3413690", "geoId/1253000", "geoId/0827425", 
    "geoId/0603666", "geoId/0616000", "geoId/2255000", 
    "geoId/5539225", "geoId/3710740", "geoId/3663000", 
    "geoId/4843888", "geoId/2567000", "geoId/2646000", 
    "geoId/2954074", "geoId/1150000", "geoId/0668084", 
    "geoId/3665508", "geoId/3436000", "geoId/1270600", 
    "geoId/0980000", "geoId/2706616", "geoId/4548535", 
    "geoId/1840788", "geoId/0677000", "geoId/1779293", 
    "geoId/2836000", "geoId/4983470", "geoId/0636770", 
    "geoId/2680700", "geoId/4803000", "geoId/1239425", 
    "geoId/2509000", "geoId/2036000", "geoId/0446000", 
    "geoId/2582000", "geoId/0816000", "geoId/0477000", 
    "geoId/0616532", "geoId/0658072", "geoId/4856348", 
    "geoId/2511000", "geoId/5303180", "geoId/0877290", 
    "geoId/4847892", "geoId/5305280", "geoId/1257425", 
    "geoId/0653000", "geoId/3912000", "geoId/2649000", 
    "geoId/1912000", "geoId/0639892", "geoId/4957300", 
    "geoId/4962470", "geoId/0680812", "geoId/0640130", 
    "geoId/1822000", "geoId/4147000", "geoId/0660466", 
    "geoId/1823278", "geoId/2758000", "geoId/0681554", 
    "geoId/1207875", "geoId/0135896", "geoId/3702140", 
    "geoId/4261000", "geoId/1207300", "geoId/2669035", 
    "geoId/4810768", "geoId/0680854", "geoId/5135000", 
    "geoId/0535710", "geoId/1712385", "geoId/0107000", 
    "geoId/0655156", "geoId/1772000", "geoId/4419180", 
    "geoId/4751560", "geoId/0630000", "geoId/3916000", 
    "geoId/5566000", "geoId/0614218", "geoId/5367000", 
    "geoId/2038900", "geoId/2545000", "geoId/0659920", 
    "geoId/1938595", "geoId/0648256", "geoId/3825700", 
    "geoId/0845970", "geoId/4752006", "geoId/4841464", 
    "geoId/0678582", "geoId/0613392", "geoId/3260600", 
    "geoId/4041850", "geoId/4819972", "geoId/4827000", 
    "geoId/3240000", "geoId/0608786", "geoId/0636000", 
    "geoId/4714000", "geoId/5357745", "geoId/4105800", 
    "geoId/0455000", "geoId/1703012", "geoId/1367284", 
    "geoId/4848804", "geoId/0665042", "geoId/0684200", 
    "geoId/1827000", "geoId/1714000", "geoId/1342425", 
    "geoId/5010675", "geoId/4513330", "geoId/0611194", 
    "geoId/0621712", "geoId/0650076", "geoId/3712000", 
    "geoId/0660018", "geoId/1831000", "geoId/4982950", 
    "geoId/4715160", "geoId/1243975", "geoId/4822660", 
    "geoId/0843000", "geoId/5531000", "geoId/5167000", 
    "geoId/4105350", "geoId/4123850", "geoId/1652120", 
    "geoId/0602000", "geoId/4810912", "geoId/0628000", 
    "geoId/0804000", "geoId/5176432", "geoId/2621000", 
    "geoId/2754880", "geoId/4561405", "geoId/4829000", 
    "geoId/1973335", "geoId/5380010", "geoId/0646898", 
    "geoId/4740000", "geoId/2717000", "geoId/0668000", 
    "geoId/1230000", "geoId/5157000", "geoId/0624680", 
    "geoId/1216725", "geoId/0651182", "geoId/0636546", 
    "geoId/0523290", "geoId/0669070", "geoId/0137000", 
    "geoId/0659962", "geoId/3128000", "geoId/0613210", 
    "geoId/4801924", "geoId/0812815", "geoId/0485540", 
    "geoId/0613214", "geoId/4009050", "geoId/0600884", 
    "geoId/0662000", "geoId/1245025", "geoId/3961000", 
    "geoId/2629000", "geoId/4804000", "geoId/3570500", 
    "geoId/5182000", "geoId/4841980", "geoId/0611530", 
    "geoId/3722920", "geoId/2146027", "geoId/2052575", 
    "geoId/1759000", "geoId/3474000", "geoId/1224000", 
    "geoId/0648354", "geoId/4861796", "geoId/1254200", 
    "geoId/15003", "geoId/5305210", "geoId/4269000", 
    "geoId/4807000", "geoId/4879000", "geoId/0629000", 
    "geoId/0656000", "geoId/2935000", "geoId/2071000", 
    "geoId/4835000", "geoId/1235000", "geoId/4075000", 
    "geoId/4853388", "geoId/0617918", "geoId/4055000", 
    "geoId/1871000", "geoId/5116000", "geoId/0668196", 
    "geoId/5374060", "geoId/4830464", "geoId/2537490", 
    "geoId/1238250", "geoId/2360545", "geoId/0832155", 
    "geoId/0664000", "geoId/4801000", "geoId/0647766", 
    "geoId/0672016", "geoId/2684000", "geoId/4858016", 
    "geoId/0465000", "geoId/3563460", "geoId/0646842", 
    "geoId/3775000", "geoId/0682996", "geoId/1707133", 
    "geoId/2627440", "geoId/2239475", "geoId/2603000", 
    "geoId/5335415", "geoId/1723074", "geoId/0632548", 
    "geoId/0670000", "geoId/1836003", "geoId/0669000", 
    "geoId/5168000", "geoId/3268400", "geoId/3774440", 
    "geoId/3457000", "geoId/2964550", "geoId/4876000", 
    "geoId/5414600", "geoId/0656700", "geoId/3539380", 
    "geoId/0657792", "geoId/0670098", "geoId/0952000", 
    "geoId/0666000", "geoId/0973000", "geoId/0639496", 
    "geoId/0566080", "geoId/0600562", "geoId/3673000", 
    "geoId/1245060", "geoId/0669196", "geoId/3350260", 
    "geoId/0862000", "geoId/2676460", "geoId/0955990", 
    "geoId/1810342", "geoId/0626000", "geoId/1738570", 
    "geoId/3137000", "geoId/4052500", "geoId/4857200", 
    "geoId/3601000", "geoId/0608954", "geoId/0619766", 
    "geoId/3456550", "geoId/3474630", "geoId/4131250", 
    "geoId/4164900", "geoId/1319000", "geoId/2555745", 
    "geoId/3714100", "geoId/3231900", "geoId/0602364", 
    "geoId/4550875", "geoId/0427400", "geoId/1369000", 
    "geoId/1225175", "geoId/0654652", "geoId/1304204", 
    "geoId/4202000", "geoId/3918000", "geoId/2829700", 
    "geoId/2270000", "geoId/5367167", "geoId/0937000", 
    "geoId/4263624", "geoId/4845384", "geoId/5613900", 
    "geoId/2941348", "geoId/0682590", "geoId/1258715", 
    "geoId/4874144", "geoId/0682954", "geoId/1825000", 
    "geoId/0908000", "geoId/0633434", "geoId/0653896", 
    "geoId/4955980", "geoId/0151000", "geoId/0645484", 
    "geoId/1216475", "geoId/3728080", "geoId/2545560", 
    "geoId/0883835", "geoId/3410000", "geoId/2642160", 
    "geoId/2537000", "geoId/4848768", "geoId/1724582", 
    "geoId/4965330", "geoId/0627000", "geoId/0652526", 
    "geoId/0649670", "geoId/4870808", "geoId/4206088", 
    "geoId/1919000", "geoId/3755000", "geoId/0636448", 
    "geoId/0641992", "geoId/5553000", "geoId/0684550", 
    "geoId/0675000", "geoId/5323515", "geoId/3684000", 
    "geoId/2404000", "geoId/0613588", "geoId/4652980", 
    "geoId/3651000", "geoId/2562535", "geoId/1718823", 
    "geoId/1245000", "geoId/2507000", "geoId/0604982", 
    "geoId/0664224", "geoId/2240735", "geoId/0678120", 
    "geoId/3451000", "geoId/2751730", "geoId/0667000", 
    "geoId/4659020", "geoId/5363000", "geoId/2622000", 
    "geoId/1255775", "geoId/4863500", "geoId/0685292", 
    "geoId/0950370", "geoId/1217200", "geoId/0643000", 
    "geoId/3734200", "geoId/4837000", "geoId/0412000", 
    "geoId/4842508", "geoId/1303440", "geoId/4967000", 
    "geoId/0669088", "geoId/0660620", "geoId/0203000", 
    "geoId/1702154", "geoId/2686000", "geoId/4806128", 
    "geoId/0669084", "geoId/1304000", "geoId/1301052", 
    "geoId/0649270", "geoId/1210275", "geoId/0613014", 
    "geoId/4474300", "geoId/5322640", "geoId/2743000", 
    "geoId/0803455", "geoId/1271000", "geoId/1380508", 
    "geoId/0602252", "geoId/4134100", "geoId/4159000", 
    "geoId/3901000", "geoId/1263000", "geoId/3421000", 
    "geoId/4817000", "geoId/1245975", "geoId/0404720", 
    "geoId/4454640", "geoId/0633000", "geoId/3611000", 
    "geoId/5156000", "geoId/3977000", "geoId/4845000", 
    "geoId/0633182", "geoId/0668252", "geoId/3731400", 
    "geoId/1706613", "geoId/2634000", "geoId/4839148", 
    "geoId/4516000", "geoId/2707966", "geoId/1349000", 
    "geoId/2053775", "geoId/1214400", "geoId/4864472", 
    "geoId/2938000", "geoId/3251800", "geoId/2965000", 
    "geoId/4865000", "geoId/3649121", "geoId/4827684", 
    "geoId/0471510", "geoId/1254000", "geoId/4023200", 
    "geoId/3050200", "geoId/1276600", "geoId/0660102", 
    "geoId/0644574", "geoId/3719000", "geoId/1608830", 
    "geoId/0653322", "geoId/5502375", "geoId/2534550", 
    "geoId/4967440", "geoId/0473000", "geoId/5335275", 
    "geoId/3915000", "geoId/1805860", "geoId/4824000", 
    "geoId/4805000", "geoId/2079000", "geoId/2205000", 
    "geoId/0624638", "geoId/3988000", "geoId/0668378", 
    "geoId/0177256", "geoId/0680238",
]

We then initialize a pandas.DataFrame with the dcids in this list.


In [8]:
# Create the Pandas DataFrame 
data = pd.DataFrame({'City': city_dcids})

# Get the name of each city
data['CityName'] = dc.get_property_values(data['City'], 'name')
data = dc.flatten_frame(data)

# Show the top five rows
data.head(5)


Out[8]:
City CityName
0 geoId/5370000 Tacoma
1 geoId/0524550 Fort Smith
2 geoId/0820000 Denver
3 geoId/3502000 Albuquerque
4 geoId/0665000 San Bernardino
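
Before moving on, a quick sanity check (plain pandas, nothing Data Commons-specific) confirms that every DCID loaded, none are duplicated, and a name was resolved for each city.

# Sanity check on the DataFrame we just built.
assert data['City'].is_unique
print('Number of cities:', len(data))
print('Cities missing a name:', data['CityName'].isna().sum())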

Querying for all Statistics

Let's start by querying for all our statistics. If you have not already, take a look at this explaining how to query for StatisticalPopulations and Observations.


In [9]:
# Prevalence of obesity
data['ObesityPop'] = dc.get_populations(
    data['City'],
    'Person',
    constraining_properties={
        'unhealthyBehavior': 'CDC_Obesity'
    }
)
data['Obesity'] = dc.get_observations(
    data['ObesityPop'],
    'prevalence',
    'measuredValue',
    '2015',
    measurement_method='CDC_CrudePrevalence'
)

# Prevalence of high blood pressure
data['HighBpPop'] = dc.get_populations(
    data['City'],
    'Person',
    constraining_properties={
        'healthOutcome': 'CDC_BPHigh'
    }
)
data['HighBp'] = dc.get_observations(
    data['HighBpPop'],
    'prevalence',
    'measuredValue',
    '2015',
    measurement_method='CDC_CrudePrevalence'
)

# Population of all persons
data['AllPersonsPop'] = dc.get_populations(data['City'], 'Person')

# Total persons count 
data['PersonCount'] = dc.get_observations(
    data['AllPersonsPop'],
    'count',
    'measuredValue',
    '2015',
    measurement_method='CenusACS5yrSurvey'
)

# Unemployment rate
data['Unemployment'] = dc.get_observations(
    data['AllPersonsPop'],
    'unemploymentRate',
    'measuredValue',
    '2015',
    observation_period='P1Y',
    measurement_method='BLSSeasonallyUnadjusted'
)

# Count of persons living below the poverty line
data['PovertyPersonsPop'] = dc.get_populations(
    data['City'],
    'Person',
    constraining_properties={
        'povertyStatus': 'BelowPovertyLevelInThePast12Months'
    }
)
data['PovertyCount'] = dc.get_observations(
    data['PovertyPersonsPop'],
    'count',
    'measuredValue',
    '2015',
    measurement_method='CenusACS5yrSurvey'
)

# Display the first five rows.
data.head(5)


Out[9]:
City CityName ObesityPop Obesity HighBpPop HighBp AllPersonsPop PersonCount Unemployment PovertyPersonsPop PovertyCount
0 geoId/5370000 Tacoma dc/p/ygz97bz3qrl68 32.1 dc/p/061211lndk972 32.1 dc/p/v2z7yr9e9m7t4 203481.0 6.8 dc/p/9r3lcy246fb72 35674.0
1 geoId/0524550 Fort Smith dc/p/0wzp84d2rghp4 31.7 dc/p/jyfmgsp4wlsn4 37.9 dc/p/qep2q2lcc3rcc 87603.0 4.8 dc/p/3vl1zg91yczx7 23080.0
2 geoId/0820000 Denver dc/p/z1dpmq8jnmhdc 20.4 dc/p/71jjtd68z1tl7 23.7 dc/p/gmw3cn8tmsnth 649654.0 3.7 dc/p/r2svxn0qs9nk8 110299.0
3 geoId/3502000 Albuquerque dc/p/2tbc81tpz0s26 25.7 dc/p/0wp70nnwevmh2 26.5 dc/p/xjngmjh589vn5 556092.0 5.8 dc/p/rzvgzt78nzer7 105617.0
4 geoId/0665000 San Bernardino dc/p/3hn9s9kz9gezg 34.6 dc/p/z8jneb443gyg9 31.5 dc/p/92cxc027krdcd 214112.0 8.2 dc/p/001d2p6867d3g 68990.0
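
As a quick check on the query results, we can count how many observations came back missing in each column; this is a small pandas check, not part of the original query flow.

# Count missing observations per column.
obs_cols = ['Obesity', 'HighBp', 'PersonCount', 'Unemployment', 'PovertyCount']
print(data[obs_cols].isna().sum())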

Cleaning the Data

Great! We have all the statistics we'll need for this case study queried from three different data sources without having to write additional code to perform the joins. However, we'll still need to perform three data cleanup tasks to prepare the data for analysis:

  1. Select the column of city names and all observation columns.
  2. Convert the observation columns to numerical types. Since dataCommons is still in its early release stage, there are a couple of oddities, such as numerical observations being stored as text values!
  3. Remove any rows with missing values.

Let's generate the final dataframe.
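
For reference, here is a minimal pandas-only sketch of what steps 2 and 3 amount to (assuming the observation columns arrive as strings); the dc.clean_frame helper used in the next cell handles this for us.

# Rough pandas equivalent of steps 2 and 3.
obs_cols = ['Obesity', 'HighBp', 'PersonCount', 'Unemployment', 'PovertyCount']
manual = data.copy()
for col in obs_cols:
  manual[col] = pd.to_numeric(manual[col], errors='coerce')
manual = manual.dropna(subset=obs_cols)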


In [10]:
# Get the cleaned table
cleaned = dc.clean_frame(data)

# Remove population columns and compute the poverty rate
cleaned = cleaned[['CityName', 'Obesity', 'HighBp', 'Unemployment', 'PersonCount', 'PovertyCount']]
cleaned['PovertyRate'] = cleaned['PovertyCount'] / cleaned['PersonCount'] * 100

# Display the table
cleaned.head(5)


Out[10]:
CityName Obesity HighBp Unemployment PersonCount PovertyCount PovertyRate
0 Tacoma 32.1 32.1 6.8 203481.0 35674.0 17.531858
1 Fort Smith 31.7 37.9 4.8 87603.0 23080.0 26.346130
2 Denver 20.4 23.7 3.7 649654.0 110299.0 16.978115
3 Albuquerque 25.7 26.5 5.8 556092.0 105617.0 18.992721
4 San Bernardino 34.6 31.5 8.2 214112.0 68990.0 32.221454

Exploring the Data

Now's a good time to explore how each variable (high blood pressure prevalence, unemployment rate, and poverty rate) correlates with obesity prevalence. As stated before, previous research has shown that each of these variables tends to correlate positively with the prevalence of obesity. Does our data show this?


In [11]:
# Plot HighBp vs. Obesity
cleaned.plot.scatter(x='HighBp', y='Obesity', figsize=(6, 4))
plt.title('High Blood Pressure Prevalence v. Obesity Prevalence')

# Show the plot
plt.show()


[Scatter plot: High Blood Pressure Prevalence v. Obesity Prevalence]

In [12]:
# Plot Unemployment vs. Obesity
cleaned.plot.scatter(x='Unemployment', y='Obesity', figsize=(6, 4))
plt.title('Unemployment Rate v. Obesity Prevalence')

# Show the plot
plt.show()


[Scatter plot: Unemployment Rate v. Obesity Prevalence]

In [13]:
# Plot PovertyRate vs. Obesity
cleaned.plot.scatter(x='PovertyRate', y='Obesity', figsize=(6, 4))
plt.title('Poverty Rate v. Obesity Prevalence')

# Show the plot
plt.show()


[Scatter plot: Poverty Rate v. Obesity Prevalence]

Looks like each variable does correlate positively with obesity prevalence.
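
We can also quantify this with pairwise Pearson correlations; this is a quick pandas check rather than part of the original flow.

# Correlation of each explanatory variable with obesity prevalence.
print(cleaned[['Obesity', 'HighBp', 'Unemployment', 'PovertyRate']].corr()['Obesity'])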

Modeling the Data

We'll be predicting the prevalence of obesity with the following linear model.


$$f_\theta(x) = \theta_0 + \theta_1 (\text{high blood pressure}) + \theta_2 (\text{unemployment}) + \theta_3(\text{poverty rate})$$

Let's start by creating our training and test sets. We'll then train a linear regression model using scikit-learn's LinearRegression.


In [14]:
X = cleaned[['HighBp', 'Unemployment', 'PovertyRate']]
Y = cleaned[['Obesity']]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

model = LinearRegression(fit_intercept=True)
model.fit(x_train, y_train)

print('Model Intercept: {}'.format(model.intercept_))
print('Model Coefficients: {}'.format(model.coef_))


Model Intercept: [4.85067402]
Model Coefficients: [[ 0.66873351 -0.14130964  0.29417318]]
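
To make the coefficients easier to read, we can pair each one with its column name; this is a small convenience snippet, assuming the column order of X defined above.

# Map each explanatory variable to its fitted coefficient.
print(dict(zip(X.columns, model.coef_[0])))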

Analyzing the Model

We now have a trained model, but how well does it perform?


In [15]:
def mse(y_pred, y_true):
    """ Compute the mean squared error of 'y_pred' and 'y_true'. """
    return float(np.sum((y_pred - y_true) ** 2)) / len(y_pred)

train_pred = model.predict(x_train)
test_pred = model.predict(x_test)

print('Training Error: {}'.format(mse(train_pred, y_train)))
print('Test Error: {}'.format(mse(test_pred, y_test)))


Training Error: 10.228031794227281
Test Error: 12.858870785996542
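
As a cross-check, scikit-learn's built-in metric should give the same numbers as the hand-rolled mse function above.

from sklearn.metrics import mean_squared_error

# Should match the values printed above.
print('Training Error:', mean_squared_error(y_train, train_pred))
print('Test Error:', mean_squared_error(y_test, test_pred))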

We can also plot the residuals as a proportion of the observed values.


In [16]:
y_res = (test_pred - y_test) / y_test

# Plot the results
plt.title("Residual Proportions Plot")
plt.xlabel("Test Data Row Index")
plt.ylabel("resid. / actual obesity prevalence")
plt.scatter(y_res.index, y_res['Obesity'])
plt.show()


How well does your model perform? In the run above, the test-set MSE was roughly 13, i.e. a root-mean-square error of about 3.6 percentage points of obesity prevalence (your numbers will vary slightly, since the train/test split is random). Our model was also able to fit the data with the maximum residual bounded by roughly +/- 30% of the observed value, which isn't bad for a simple model with only three explanatory variables.
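
To put a number on that bound, we can look at the largest absolute residual proportion directly.

# Largest residual as a proportion of the observed obesity prevalence.
print('Max residual proportion:', np.abs(y_res['Obesity']).max())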

Conclusion

Congratulations! Rather than spending a couple of hours just searching for and joining datasets from the CDC, Bureau of Labor Statistics, and US Census Bureau, you've spent one lab session querying dataCommons for the same data and, on top of that, using it to build a linear model. Hopefully dataCommons has made the time-to-analysis much shorter!

You've also created a basic model to predict the prevalence of obesity in approximately 500 US cities using intuition from a long line of research. The model you've created only uses three explanatory variables, so even though it's not the most accurate at predicting obesity prevalence, it's possible that it can be improved by adding more variables. Obesity is also known to correlate with factors such as high cholesterol and diabetes, but with the data in dataCommons, we can even ask whether obesity correlates with less conventional factors such as

  • The number of universities recognized by the College Scorecard dataset that are located in a given city
  • The incidence rate of property arson
  • The average snowfall in inches

Does adding these variables into your model improve accuracy? Can you think of other variables that correlate with obesity?
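
As a starting point, here is a hedged sketch of how one might pull in an additional variable (high cholesterol prevalence) using the same population/observation pattern as above. The constraining property value 'CDC_CholHigh' is an assumption, so check the Data Commons browser for the exact DCID before relying on it.

# Sketch: add high cholesterol prevalence as another explanatory variable.
data['HighCholPop'] = dc.get_populations(
    data['City'],
    'Person',
    constraining_properties={
        'healthOutcome': 'CDC_CholHigh'  # assumed DCID; confirm in the browser
    }
)
data['HighChol'] = dc.get_observations(
    data['HighCholPop'],
    'prevalence',
    'measuredValue',
    '2015',
    measurement_method='CDC_CrudePrevalence'
)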