Data discovery

Introduction

The Data Observatory is a spatial data repository that enables data scientists to augment their data and broaden their analysis. It offers a wide range of datasets from around the globe.

This guide is intended for those who want to start augmenting their own data using CARTOframes and wish to explore CARTO's public Data Observatory catalog to find datasets that best fit their use cases and analyses.

Note: The catalog is public and you don't need a CARTO account to search for available datasets

Find demographic data for the US

In this guide we walk through the Data Observatory catalog looking for demographics data in the US.

The catalog is comprised of thousands of curated spatial datasets, so when searching for data the easiest way to find what you are looking for is to make use of a faceted search. A faceted (or hierarchical) search allows you to narrow down search results by applying multiple filters based on faceted classification of catalog datasets.

Datasets are organized in three main hierarchies:

Country
Category
Geography (or spatial resolution)

For our analysis we are looking for demographic datasets in the US with a spatial resolution at the block group level.

We can start by discovering which available geographies (or spatial resolutions) we have for demographic data in the US, by filtering the catalog by country and category and listing the available geographies.

Let's start exploring the available categories of data for the US:



In [1]:

    
from cartoframes.data.observatory import Catalog
Catalog().country('usa').categories









    Out[1]:





[<Category.get('covid19')>,
 <Category.get('demographics')>,
 <Category.get('environmental')>,
 <Category.get('financial')>,
 <Category.get('geosocial')>,
 <Category.get('housing')>,
 <Category.get('human_mobility')>,
 <Category.get('points_of_interest')>,
 <Category.get('road_traffic')>]

For the case of the US, the Data Observatory provides six different categories of datasets. Let's discover the available spatial resolutions for the demographics category (which at a first sight will contain the population data we need).



In [2]:

    
from cartoframes.data.observatory import Catalog
geographies = Catalog().country('usa').category('demographics').geographies
geographies









    Out[2]:





[<Geography.get('mbi_blockgroups_535aed6d')>,
 <Geography.get('mbi_counties_46ea8aaa')>,
 <Geography.get('mbi_county_subd_ba170144')>,
 <Geography.get('mbi_pc_5_digit_19e769c1')>,
 <Geography.get('cdb_blockgroup_7753dd51')>,
 <Geography.get('cdb_cbsa_d1b91d3b')>,
 <Geography.get('cdb_censustract_af861cba')>,
 <Geography.get('cdb_congression_478295fd')>,
 <Geography.get('cdb_county_767e79f0')>,
 <Geography.get('cdb_county_8cf054d')>,
 <Geography.get('cdb_place_93d54d1e')>,
 <Geography.get('cdb_puma_56bbc2e')>,
 <Geography.get('cdb_schooldistr_eb48e7bc')>,
 <Geography.get('cdb_schooldistr_18547e3f')>,
 <Geography.get('cdb_schooldistr_d9ca1a26')>,
 <Geography.get('cdb_state_cd83b434')>,
 <Geography.get('cdb_zcta5_f4043497')>]

Let's filter the geographies by those that contain information at the level of blockgroup. For that purpose we are converting the geographies to a pandas DataFrame and search for the string blockgroup in the id of the geographies:



In [3]:

    
df = geographies.to_dataframe()
df[df['id'].str.contains('blockgroup', case=False, na=False)]









    Out[3]:







  
    
      
      slug
      name
      description
      country_id
      provider_id
      geom_type
      geom_coverage
      update_frequency
      is_public_data
      lang
      version
      provider_name
      id
    
  
  
    
      0
      mbi_blockgroups_535aed6d
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      usa
      mbi
      MULTIPOLYGON
      None
      yearly
      False
      eng
      2020
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
    
    
      4
      cdb_blockgroup_7753dd51
      Census Block Group - United States of America
      Shoreline clipped TIGER/Line boundaries. More ...
      usa
      carto
      MULTIPOLYGON
      None
      None
      True
      eng
      2015
      CARTO
      carto-do-public-data.carto.geography_usa_block...

We have three available datasets, from three different providers: Michael Bauer International, Open Data and AGS. For this example, we are going to look for demographic datasets for the MBI blockgroups geography mbi_blockgroups_535aed6d:



In [4]:

    
datasets = Catalog().country('usa').category('demographics').geography('mbi_blockgroups_535aed6d').datasets
datasets









    Out[4]:





[<Dataset.get('mbi_consumer_sp_fdc16f97')>,
 <Dataset.get('mbi_households__ec03bf40')>,
 <Dataset.get('mbi_sociodemogr_1c54ac66')>,
 <Dataset.get('mbi_purchasing__faaee3c9')>,
 <Dataset.get('mbi_population_9d1b276f')>,
 <Dataset.get('mbi_retail_spen_6a1acff4')>,
 <Dataset.get('mbi_consumer_pr_c1d4e20e')>,
 <Dataset.get('mbi_households__60466314')>,
 <Dataset.get('mbi_education_8903fc2c')>]

Let's continue with the data discovery. We have 6 datasets in the US with demographics information at the level of MBI blockgroups:



In [5]:

    
datasets.to_dataframe()









    Out[5]:







  
    
      
      slug
      name
      description
      category_id
      country_id
      data_source_id
      provider_id
      geography_name
      geography_description
      temporal_aggregation
      time_coverage
      update_frequency
      is_public_data
      lang
      version
      category_name
      provider_name
      geography_id
      id
    
  
  
    
      0
      mbi_consumer_sp_fdc16f97
      Consumer Spending - United States of America (...
      MBI Consumer Spending by product groups quanti...
      demographics
      usa
      consumer_spending
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_consumerspending_usa...
    
    
      1
      mbi_households__ec03bf40
      Households By Type - United States of America ...
      Distribution of the households in an area by t...
      demographics
      usa
      households_by_type
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_householdsbytype_usa...
    
    
      2
      mbi_sociodemogr_1c54ac66
      Sociodemographics - United States of America (...
      MBI Sociodemographics includes:\n- Population\...
      demographics
      usa
      sociodemographics
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_sociodemographics_us...
    
    
      3
      mbi_purchasing__faaee3c9
      Purchasing Power - United States of America (B...
      Purchasing Power describes the disposable inco...
      demographics
      usa
      purchasing_power
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_purchasingpower_usa_...
    
    
      4
      mbi_population_9d1b276f
      Population - United States of America (Blockgr...
      Population figures are shown as projected aver...
      demographics
      usa
      population
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_population_usa_block...
    
    
      5
      mbi_retail_spen_6a1acff4
      Retail Spending - United States of America (Bl...
      Retail Spending relates to the proportion of P...
      demographics
      usa
      retail_spending
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_retailspending_usa_b...
    
    
      6
      mbi_consumer_pr_c1d4e20e
      Consumer Profiles - United States of America (...
      The MB International Consumer Styles describe ...
      demographics
      usa
      consumer_profiles
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_consumerprofiles_usa...
    
    
      7
      mbi_households__60466314
      Households By Income Quintiles - United States...
      On the national level the number of households...
      demographics
      usa
      households_by_income_quintiles
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_householdsbyincomequ...
    
    
      8
      mbi_education_8903fc2c
      Education - United States of America (Blockgro...
      Distribution of the population in an area by t...
      demographics
      usa
      education
      mbi
      Blockgroups - United States of America
      MBI Digital Boundaries for USA at Blockgroups ...
      yearly
      [2019-01-01, 2020-01-01)
      yearly
      False
      eng
      2020
      Demographics
      Michael Bauer International
      carto-do.mbi.geography_usa_blockgroups_2020
      carto-do.mbi.demographics_education_usa_blockg...

They comprise different information: consumer spending, retail potential, consumer profiles, etc.

At a first sight, it looks the dataset with data_source_id: sociodemographic might contain the population information we are looking for. Let's try to understand a little bit better what data this dataset contains by looking at its variables:



In [6]:

    
from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_a7e14220')
variables = dataset.variables
variables









    Out[6]:





[<Variable.get('BLOCKGROUP_30e525a6')> #'Geographic Identifier',
 <Variable.get('POPCY_4534fac4')> #'Population (2019A)',
 <Variable.get('POPCYGRP_3033ef2e')> #'Population in Group Quarters (2019A)',
 <Variable.get('POPCYGRPI_1e42899')> #'Institutional Group Quarters Population (2019A)',
 <Variable.get('AGECY0004_aaae373a')> #'Population age 0-4 (2019A)',
 <Variable.get('AGECY0509_d2d4896c')> #'Population age 5-9 (2019A)',
 <Variable.get('AGECY1014_b09611e')> #'Population age 10-14 (2019A)',
 <Variable.get('AGECY1519_7373df48')> #'Population age 15-19 (2019A)',
 <Variable.get('AGECY2024_32919d33')> #'Population age 20-24 (2019A)',
 <Variable.get('AGECY2529_4aeb2365')> #'Population age 25-29 (2019A)',
 <Variable.get('AGECY3034_9336cb17')> #'Population age 30-34 (2019A)',
 <Variable.get('AGECY3539_eb4c7541')> #'Population age 35-39 (2019A)',
 <Variable.get('AGECY4044_41a06569')> #'Population age 40-44 (2019A)',
 <Variable.get('AGECY4549_39dadb3f')> #'Population age 45-49 (2019A)',
 <Variable.get('AGECY5054_e007334d')> #'Population age 50-54 (2019A)',
 <Variable.get('AGECY5559_987d8d1b')> #'Population age 55-59 (2019A)',
 <Variable.get('AGECY6064_d99fcf60')> #'Population age 60-64 (2019A)',
 <Variable.get('AGECY6569_a1e57136')> #'Population age 65-69 (2019A)',
 <Variable.get('AGECY7074_78389944')> #'Population age 70-74 (2019A)',
 <Variable.get('AGECY7579_422712')> #'Population age 75-79 (2019A)',
 <Variable.get('AGECY8084_a7c395dd')> #'Population age 80-84 (2019A)',
 <Variable.get('AGECYGT85_ac46767d')> #'Population age 85+ (2019A)',
 <Variable.get('AGECYMED_f218d6e9')> #'Median Age (2019A)',
 <Variable.get('SEXCYMAL_8ee6ade5')> #'Population male (2019A)',
 <Variable.get('SEXCYFEM_91d8b796')> #'Population female (2019A)',
 <Variable.get('RCHCYWHNHS_b4cab6fe')> #'Non Hispanic White (2019A)',
 <Variable.get('RCHCYBLNHS_93a8395b')> #'Non Hispanic Black (2019A)',
 <Variable.get('RCHCYAMNHS_6cb424ee')> #'Non Hispanic American Indian (2019A)',
 <Variable.get('RCHCYASNHS_dc720442')> #'Non Hispanic Asian (2019A)',
 <Variable.get('RCHCYHANHS_2b72f927')> #'Non Hispanic Hawaiian/Pacific Islander (2019A)',
 <Variable.get('RCHCYOTNHS_fe95829a')> #'Non Hispanic Other Race (2019A)',
 <Variable.get('RCHCYMUNHS_3ce9b69f')> #'Non Hispanic Multiple Race (2019A)',
 <Variable.get('HISCYHISP_e62d7c2e')> #'Population Hispanic (2019A)',
 <Variable.get('MARCYNEVER_eee4f8c3')> #'Never Married (2019A)',
 <Variable.get('MARCYMARR_17f0d887')> #'Now Married (2019A)',
 <Variable.get('MARCYSEP_d4d69eb8')> #'Separated (2019A)',
 <Variable.get('MARCYWIDOW_5ce5d993')> #'Widowed (2019A)',
 <Variable.get('MARCYDIVOR_146db750')> #'Divorced (2019A)',
 <Variable.get('AGECYGT15_7d84cd34')> #'Population Age 15+ (2019A)',
 <Variable.get('EDUCYLTGR9_ed0362fa')> #'Pop 25+ less than 9th grade (2019A)',
 <Variable.get('EDUCYSHSCH_7a88e398')> #'Pop 25+ 9th-12th grade no diploma (2019A)',
 <Variable.get('EDUCYHSCH_a7a81733')> #'Pop 25+ HS graduate (2019A)',
 <Variable.get('EDUCYSCOLL_3840e65b')> #'Pop 25+ college no diploma (2019A)',
 <Variable.get('EDUCYASSOC_dcd76160')> #'Pop 25+ Associate degree (2019A)',
 <Variable.get('EDUCYBACH_d7b78049')> #'Pop 25+ Bachelor's degree (2019A)',
 <Variable.get('EDUCYGRAD_c58943fb')> #'Pop 25+ graduate or prof school degree (2019A)',
 <Variable.get('AGECYGT25_56a99ef7')> #'Population Age 25+ (2019A)',
 <Variable.get('HHDCY_935c1592')> #'Households (2019A)',
 <Variable.get('HHDCYFAM_c1a6fccf')> #'Family Households (2019A)',
 <Variable.get('HHSCYMCFCH_bd115dc2')> #'Families married couple w children (2019A)',
 <Variable.get('HHSCYLPMCH_ce8863e2')> #'Families male no wife w children (2019A)',
 <Variable.get('HHSCYLPFCH_c2dd8c03')> #'Families female no husband children (2019A)',
 <Variable.get('HHDCYAVESZ_d265f21c')> #'Average Household Size (2019A)',
 <Variable.get('HHDCYMEDAG_4f099151')> #'Median Age of Householder (2019A)',
 <Variable.get('VPHCYNONE_3755ac60')> #'Households: No Vehicle Available (2019A)',
 <Variable.get('VPHCY1_bed441da')> #'Households: One Vehicle Available (2019A)',
 <Variable.get('VPHCYGT1_e4a07c30')> #'Households: Two or More Vehicles Available (2019A)',
 <Variable.get('INCCYPCAP_7c8377cf')> #'Per capita income (2019A)',
 <Variable.get('INCCYAVEHH_1ef75363')> #'Average household Income (2019A)',
 <Variable.get('INCCYMEDHH_98692c24')> #'Median household income (2019A)',
 <Variable.get('INCCYMEDFA_7f36b90e')> #'Median family income (2019A)',
 <Variable.get('HINCYLT10_61c14e29')> #'Household Income < $10000 (2019A)',
 <Variable.get('HINCY1015_c720a11b')> #'Household Income $10000-$14999 (2019A)',
 <Variable.get('HINCY1520_9aacc4bc')> #'Household Income $15000-$19999 (2019A)',
 <Variable.get('HINCY2025_feb85d36')> #'Household Income $20000-$24999 (2019A)',
 <Variable.get('HINCY2530_91025a13')> #'Household Income $25000-$29999 (2019A)',
 <Variable.get('HINCY3035_5f1f0b12')> #'Household Income $30000-$34999 (2019A)',
 <Variable.get('HINCY3540_66ffabb1')> #'Household Income $35000-$39999 (2019A)',
 <Variable.get('HINCY4045_8d89a56c')> #'Household Income $40000-$44999 (2019A)',
 <Variable.get('HINCY4550_e233a249')> #'Household Income $45000-$49999 (2019A)',
 <Variable.get('HINCY5060_77695404')> #'Household Income $50000-$59999 (2019A)',
 <Variable.get('HINCY6075_cad3e24')> #'Household Income $60000-$74999 (2019A)',
 <Variable.get('HINCY75100_bb90c7bb')> #'Household Income $75000-$99999 (2019A)',
 <Variable.get('HINCY10025_40903e13')> #'Household Income $100000-$124999 (2019A)',
 <Variable.get('HINCY12550_d379563b')> #'Household Income $125000-$149999 (2019A)',
 <Variable.get('HINCY15020_7243aae')> #'Household Income $150000-$199999 (2019A)',
 <Variable.get('HINCYGT200_c39e094b')> #'Household Income > $200000 (2019A)',
 <Variable.get('HINCYMED24_4ac9369')> #'Median Household Income: Age < 25 (2019A)',
 <Variable.get('HINCYMED25_73aba3ff')> #'Median Household Income: Age 25-34 (2019A)',
 <Variable.get('HINCYMED35_6ab092be')> #'Median Household Income: Age 35-44 (2019A)',
 <Variable.get('HINCYMED45_25f10479')> #'Median Household Income: Age 45-54 (2019A)',
 <Variable.get('HINCYMED55_3cea3538')> #'Median Household Income: Age 55-64 (2019A)',
 <Variable.get('HINCYMED65_17c766fb')> #'Median Household Income: Age 65-74 (2019A)',
 <Variable.get('HINCYMED75_edc57ba')> #'Median Household Income: Age 75+ (2019A)',
 <Variable.get('LBFCYPOP16_75363c6f')> #'Population Age 16+ (2019A)',
 <Variable.get('LBFCYARM_c8f45b67')> #'Pop 16+ in Armed Forces (2019A)',
 <Variable.get('LBFCYEMPL_1902fd90')> #'Pop 16+ civilian employed (2019A)',
 <Variable.get('LBFCYUNEM_befc2d4')> #'Pop 16+ civilian unemployed (2019A)',
 <Variable.get('LBFCYNLF_803bfa0d')> #'Pop 16+ not in labor force (2019A)',
 <Variable.get('UNECYRATE_a642ed8a')> #'Unemployment Rate (2019A)',
 <Variable.get('LBFCYLBF_1d3c03ed')> #'Population In Labor Force (2019A)',
 <Variable.get('LNIEXSPAN_8f8728c7')> #'SPANISH SPEAKING HOUSEHOLDS',
 <Variable.get('LNIEXISOL_c2e86dc7')> #'LINGUISTICALLY ISOLATED HOUSEHOLDS (NON-ENGLISH SP...',
 <Variable.get('HOOEXMED_48df3206')> #'Median Value of Owner Occupied Housing Units',
 <Variable.get('RNTEXMED_6ac2e609')> #'Median Cash Rent',
 <Variable.get('HUSEX1DET_231a9f6c')> #'UNITS IN STRUCTURE: 1 DETACHED',
 <Variable.get('HUSEXAPT_dc7d3c72')> #'UNITS IN STRUCTURE: 20 OR MORE',
 <Variable.get('DWLCY_50c5eee2')> #'Housing units (2019A)',
 <Variable.get('DWLCYVACNT_6b929d9a')> #'Housing units vacant (2019A)',
 <Variable.get('DWLCYRENT_3601a69e')> #'Occupied units renter (2019A)',
 <Variable.get('DWLCYOWNED_858b3ad6')> #'Occupied units owner (2019A)',
 <Variable.get('POPPY_24dbbb56')> #'Population (2024A)',
 <Variable.get('HHDPY_f2b35400')> #'Households (2024A)',
 <Variable.get('DWLPY_312aaf70')> #'Housing units (2024A)',
 <Variable.get('AGEPYMED_d5583bbb')> #'Median Age (2024A)',
 <Variable.get('INCPYPCAP_f9c107fa')> #'Per capita income (2024A)',
 <Variable.get('INCPYAVEHH_48c1d530')> #'Average household Income (2024A)',
 <Variable.get('INCPYMEDHH_ce5faa77')> #'Median household income (2024A)']



In [7]:

    
from cartoframes.data.observatory import Dataset
vdf = variables.to_dataframe()
vdf









    Out[7]:







  
    
      
      slug
      name
      description
      db_type
      agg_method
      column_name
      variable_group_id
      dataset_id
      id
    
  
  
    
      0
      BLOCKGROUP_30e525a6
      BLOCKGROUP
      Geographic Identifier
      STRING
      None
      BLOCKGROUP
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      1
      POPCY_4534fac4
      Total Population
      Population (2019A)
      INTEGER
      SUM
      POPCY
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      2
      POPCYGRP_3033ef2e
      POPCYGRP
      Population in Group Quarters (2019A)
      INTEGER
      SUM
      POPCYGRP
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      3
      POPCYGRPI_1e42899
      POPCYGRPI
      Institutional Group Quarters Population (2019A)
      INTEGER
      SUM
      POPCYGRPI
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      4
      AGECY0004_aaae373a
      AGECY0004
      Population age 0-4 (2019A)
      INTEGER
      SUM
      AGECY0004
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      103
      DWLPY_312aaf70
      Number of household units
      Housing units (2024A)
      INTEGER
      SUM
      DWLPY
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      104
      AGEPYMED_d5583bbb
      AGEPYMED
      Median Age (2024A)
      FLOAT
      AVG
      AGEPYMED
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      105
      INCPYPCAP_f9c107fa
      INCPYPCAP
      Per capita income (2024A)
      FLOAT
      AVG
      INCPYPCAP
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      106
      INCPYAVEHH_48c1d530
      INCPYAVEHH
      Average household Income (2024A)
      FLOAT
      AVG
      INCPYAVEHH
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      107
      INCPYMEDHH_ce5faa77
      INCPYMEDHH
      Median household income (2024A)
      FLOAT
      AVG
      INCPYMEDHH
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
  

108 rows × 9 columns

We can see there are several variables related to population, so this is the Dataset we are looking for.



In [8]:

    
vdf[vdf['description'].str.contains('pop', case=False, na=False)]









    Out[8]:







  
    
      
      slug
      name
      description
      db_type
      agg_method
      column_name
      variable_group_id
      dataset_id
      id
    
  
  
    
      1
      POPCY_4534fac4
      Total Population
      Population (2019A)
      INTEGER
      SUM
      POPCY
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      2
      POPCYGRP_3033ef2e
      POPCYGRP
      Population in Group Quarters (2019A)
      INTEGER
      SUM
      POPCYGRP
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      3
      POPCYGRPI_1e42899
      POPCYGRPI
      Institutional Group Quarters Population (2019A)
      INTEGER
      SUM
      POPCYGRPI
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      4
      AGECY0004_aaae373a
      AGECY0004
      Population age 0-4 (2019A)
      INTEGER
      SUM
      AGECY0004
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      5
      AGECY0509_d2d4896c
      AGECY0509
      Population age 5-9 (2019A)
      INTEGER
      SUM
      AGECY0509
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      6
      AGECY1014_b09611e
      AGECY1014
      Population age 10-14 (2019A)
      INTEGER
      SUM
      AGECY1014
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      7
      AGECY1519_7373df48
      AGECY1519
      Population age 15-19 (2019A)
      INTEGER
      SUM
      AGECY1519
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      8
      AGECY2024_32919d33
      AGECY2024
      Population age 20-24 (2019A)
      INTEGER
      SUM
      AGECY2024
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      9
      AGECY2529_4aeb2365
      AGECY2529
      Population age 25-29 (2019A)
      INTEGER
      SUM
      AGECY2529
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      10
      AGECY3034_9336cb17
      AGECY3034
      Population age 30-34 (2019A)
      INTEGER
      SUM
      AGECY3034
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      11
      AGECY3539_eb4c7541
      AGECY3539
      Population age 35-39 (2019A)
      INTEGER
      SUM
      AGECY3539
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      12
      AGECY4044_41a06569
      AGECY4044
      Population age 40-44 (2019A)
      INTEGER
      SUM
      AGECY4044
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      13
      AGECY4549_39dadb3f
      AGECY4549
      Population age 45-49 (2019A)
      INTEGER
      SUM
      AGECY4549
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      14
      AGECY5054_e007334d
      AGECY5054
      Population age 50-54 (2019A)
      INTEGER
      SUM
      AGECY5054
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      15
      AGECY5559_987d8d1b
      AGECY5559
      Population age 55-59 (2019A)
      INTEGER
      SUM
      AGECY5559
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      16
      AGECY6064_d99fcf60
      AGECY6064
      Population age 60-64 (2019A)
      INTEGER
      SUM
      AGECY6064
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      17
      AGECY6569_a1e57136
      AGECY6569
      Population age 65-69 (2019A)
      INTEGER
      SUM
      AGECY6569
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      18
      AGECY7074_78389944
      AGECY7074
      Population age 70-74 (2019A)
      INTEGER
      SUM
      AGECY7074
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      19
      AGECY7579_422712
      AGECY7579
      Population age 75-79 (2019A)
      INTEGER
      SUM
      AGECY7579
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      20
      AGECY8084_a7c395dd
      AGECY8084
      Population age 80-84 (2019A)
      INTEGER
      SUM
      AGECY8084
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      21
      AGECYGT85_ac46767d
      AGECYGT85
      Population age 85+ (2019A)
      INTEGER
      SUM
      AGECYGT85
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      23
      SEXCYMAL_8ee6ade5
      SEXCYMAL
      Population male (2019A)
      INTEGER
      SUM
      SEXCYMAL
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      24
      SEXCYFEM_91d8b796
      SEXCYFEM
      Population female (2019A)
      INTEGER
      SUM
      SEXCYFEM
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      32
      HISCYHISP_e62d7c2e
      HISCYHISP
      Population Hispanic (2019A)
      INTEGER
      SUM
      HISCYHISP
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      38
      AGECYGT15_7d84cd34
      AGECYGT15
      Population Age 15+ (2019A)
      INTEGER
      SUM
      AGECYGT15
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      39
      EDUCYLTGR9_ed0362fa
      EDUCYLTGR9
      Pop 25+ less than 9th grade (2019A)
      INTEGER
      SUM
      EDUCYLTGR9
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      40
      EDUCYSHSCH_7a88e398
      EDUCYSHSCH
      Pop 25+ 9th-12th grade no diploma (2019A)
      INTEGER
      SUM
      EDUCYSHSCH
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      41
      EDUCYHSCH_a7a81733
      EDUCYHSCH
      Pop 25+ HS graduate (2019A)
      INTEGER
      SUM
      EDUCYHSCH
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      42
      EDUCYSCOLL_3840e65b
      EDUCYSCOLL
      Pop 25+ college no diploma (2019A)
      INTEGER
      SUM
      EDUCYSCOLL
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      43
      EDUCYASSOC_dcd76160
      EDUCYASSOC
      Pop 25+ Associate degree (2019A)
      INTEGER
      SUM
      EDUCYASSOC
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      44
      EDUCYBACH_d7b78049
      EDUCYBACH
      Pop 25+ Bachelor's degree (2019A)
      INTEGER
      SUM
      EDUCYBACH
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      45
      EDUCYGRAD_c58943fb
      EDUCYGRAD
      Pop 25+ graduate or prof school degree (2019A)
      INTEGER
      SUM
      EDUCYGRAD
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      46
      AGECYGT25_56a99ef7
      AGECYGT25
      Population Age 25+ (2019A)
      INTEGER
      SUM
      AGECYGT25
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      84
      LBFCYPOP16_75363c6f
      LBFCYPOP16
      Population Age 16+ (2019A)
      INTEGER
      SUM
      LBFCYPOP16
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      85
      LBFCYARM_c8f45b67
      LBFCYARM
      Pop 16+ in Armed Forces (2019A)
      INTEGER
      SUM
      LBFCYARM
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      86
      LBFCYEMPL_1902fd90
      LBFCYEMPL
      Pop 16+ civilian employed (2019A)
      INTEGER
      SUM
      LBFCYEMPL
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      87
      LBFCYUNEM_befc2d4
      LBFCYUNEM
      Pop 16+ civilian unemployed (2019A)
      INTEGER
      SUM
      LBFCYUNEM
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      88
      LBFCYNLF_803bfa0d
      LBFCYNLF
      Pop 16+ not in labor force (2019A)
      INTEGER
      SUM
      LBFCYNLF
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      90
      LBFCYLBF_1d3c03ed
      LBFCYLBF
      Population In Labor Force (2019A)
      INTEGER
      SUM
      LBFCYLBF
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...
    
    
      101
      POPPY_24dbbb56
      Total population
      Population (2024A)
      INTEGER
      SUM
      POPPY
      None
      carto-do.ags.demographics_sociodemographics_us...
      carto-do.ags.demographics_sociodemographics_us...

Dataset and variables metadata

The Data Observatory catalog is not only a repository of curated spatial datasets, it also contains valuable information that helps on understanding better the underlying data for every dataset, so you can take an informed decision on what data best fits your problem.

Some of the augmented metadata you can find for each dataset in the catalog is:

head and tail methods to get a glimpse of the actual data. This helps you to understand the available columns, data types, etc. To start modelling your problem right away.
geom_coverage to visualize on a map the geographical coverage of the data in the Dataset.
counts, fields_by_type and a full describe method with stats of the actual values in the dataset, such as: average, stdev, quantiles, min, max, median for each of the variables of the dataset.

You don't need a subscription to a dataset to be able to query the augmented metadata, it's just publicly available for anyone exploring the Data Observatory catalog.

Let's overview some of that information, starting by getting a glimpse of the ten first or last rows of the actual data of the dataset:



In [9]:

    
from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_a7e14220')



In [10]:

    
dataset.head()









    Out[10]:







  
    
      
      DWLCY
      DWLPY
      HHDCY
      HHDPY
      POPCY
      POPPY
      geoid
      VPHCY1
      do_date
      AGECYMED
      ...
      MARCYDIVOR
      MARCYNEVER
      MARCYWIDOW
      RCHCYAMNHS
      RCHCYASNHS
      RCHCYBLNHS
      RCHCYHANHS
      RCHCYMUNHS
      RCHCYOTNHS
      RCHCYWHNHS
    
  
  
    
      0
      1057
      1112
      932
      986
      1500
      1648
      040130405071
      442
      2020-01-01 00:00:00+00:00
      77.40
      ...
      149
      4
      228
      0
      11
      20
      0
      25
      0
      1317
    
    
      1
      1964
      2069
      1774
      1877
      2595
      2868
      040130405072
      1049
      2020-01-01 00:00:00+00:00
      76.88
      ...
      414
      160
      699
      0
      74
      68
      7
      55
      0
      2167
    
    
      2
      1049
      1101
      897
      933
      1585
      1716
      040130610182
      460
      2020-01-01 00:00:00+00:00
      69.88
      ...
      31
      217
      246
      2
      55
      43
      9
      26
      0
      1313
    
    
      3
      1084
      1137
      910
      940
      1503
      1616
      040138175002
      392
      2020-01-01 00:00:00+00:00
      71.44
      ...
      191
      79
      268
      8
      24
      38
      0
      8
      0
      1290
    
    
      4
      682
      706
      574
      591
      980
      1039
      040190043241
      244
      2020-01-01 00:00:00+00:00
      72.38
      ...
      30
      44
      195
      3
      9
      0
      0
      0
      0
      902
    
    
      5
      880
      910
      840
      869
      1249
      1284
      060133511032
      539
      2020-01-01 00:00:00+00:00
      76.75
      ...
      160
      40
      319
      0
      136
      19
      2
      12
      5
      1024
    
    
      6
      1467
      1534
      1314
      1467
      1658
      1800
      060590995101
      831
      2020-01-01 00:00:00+00:00
      74.58
      ...
      423
      136
      496
      3
      226
      10
      1
      16
      0
      1269
    
    
      7
      704
      753
      693
      730
      1078
      1176
      060610210391
      338
      2020-01-01 00:00:00+00:00
      73.86
      ...
      117
      63
      215
      5
      33
      7
      0
      9
      0
      986
    
    
      8
      1582
      1691
      1553
      1650
      2540
      2795
      060610236001
      818
      2020-01-01 00:00:00+00:00
      68.80
      ...
      406
      45
      301
      5
      168
      26
      3
      19
      0
      2183
    
    
      9
      1186
      1268
      1163
      1234
      1980
      2176
      060610236002
      415
      2020-01-01 00:00:00+00:00
      68.59
      ...
      253
      60
      223
      5
      97
      22
      1
      10
      0
      1750
    
  

10 rows × 110 columns

Alternatively, you can get the last ten ones with dataset.tail()

An overview of the coverage of the dataset



In [11]:

    
dataset.geom_coverage()

Some stats about the dataset:



In [12]:

    
dataset.counts()









    Out[12]:





rows                    217182.0
cells                 23890020.0
null_cells                   0.0
null_cells_percent           0.0
dtype: float64



In [13]:

    
dataset.fields_by_type()









    Out[13]:





float          5
string         2
integer      102
timestamp      1
dtype: int64



In [14]:

    
dataset.describe()









    Out[14]:







  
    
      
      POPCY
      POPCYGRP
      POPCYGRPI
      AGECY0004
      AGECY0509
      AGECY1014
      AGECY1519
      AGECY2024
      AGECY2529
      AGECY3034
      ...
      DWLCYVACNT
      DWLCYRENT
      DWLCYOWNED
      POPPY
      HHDPY
      DWLPY
      AGEPYMED
      INCPYPCAP
      INCPYAVEHH
      INCPYMEDHH
    
  
  
    
      avg
      1.520470e+03
      3.725678e+01
      1.798816e+01
      9.047040e+01
      9.341024e+01
      9.581574e+01
      9.726450e+01
      1.000845e+02
      1.083624e+02
      1.044326e+02
      ...
      4.957671e+01
      2.116296e+02
      3.835577e+02
      1.568307e+03
      6.075594e+02
      6.718555e+02
      3.989449e+01
      4.292896e+04
      1.073098e+05
      7.933328e+04
    
    
      max
      6.710000e+04
      1.975200e+04
      1.205300e+04
      5.393000e+03
      5.294000e+03
      5.195000e+03
      7.606000e+03
      1.480400e+04
      5.767000e+03
      5.616000e+03
      ...
      6.547000e+03
      1.005700e+04
      2.367600e+04
      7.584500e+04
      2.811500e+04
      3.264000e+04
      8.750000e+01
      3.824975e+06
      1.112720e+07
      3.500000e+05
    
    
      min
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      ...
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
      0.000000e+00
    
    
      sum
      3.302187e+08
      8.091501e+06
      3.906704e+06
      1.964854e+07
      2.028702e+07
      2.080945e+07
      2.112410e+07
      2.173655e+07
      2.353436e+07
      2.268087e+07
      ...
      1.076717e+07
      4.596213e+07
      8.330182e+07
      3.406080e+08
      1.319510e+08
      1.459149e+08
      8.664364e+06
      9.323398e+09
      2.330576e+10
      1.722976e+10
    
    
      range
      6.710000e+04
      1.975200e+04
      1.205300e+04
      5.393000e+03
      5.294000e+03
      5.195000e+03
      7.606000e+03
      1.480400e+04
      5.767000e+03
      5.616000e+03
      ...
      6.547000e+03
      1.005700e+04
      2.367600e+04
      7.584500e+04
      2.811500e+04
      3.264000e+04
      8.750000e+01
      3.824975e+06
      1.112720e+07
      3.500000e+05
    
    
      stdev
      1.063417e+03
      2.428693e+02
      1.582057e+02
      8.044792e+01
      8.338915e+01
      8.351758e+01
      1.117828e+02
      1.244125e+02
      9.642758e+01
      9.358961e+01
      ...
      9.849786e+01
      2.353261e+02
      3.163312e+02
      1.141981e+03
      4.138545e+02
      4.467692e+02
      7.567177e+00
      3.178870e+04
      7.835131e+04
      4.262018e+04
    
    
      q1
      8.500000e+02
      0.000000e+00
      0.000000e+00
      4.400000e+01
      4.400000e+01
      4.600000e+01
      4.500000e+01
      4.300000e+01
      5.000000e+01
      4.900000e+01
      ...
      1.100000e+01
      6.000000e+01
      1.820000e+02
      8.670000e+02
      3.440000e+02
      3.840000e+02
      3.407000e+01
      2.168000e+04
      5.536100e+04
      4.701800e+04
    
    
      q3
      1.454000e+03
      0.000000e+00
      0.000000e+00
      8.300000e+01
      8.600000e+01
      8.900000e+01
      8.700000e+01
      8.600000e+01
      9.800000e+01
      9.500000e+01
      ...
      3.400000e+01
      1.780000e+02
      3.750000e+02
      1.485000e+03
      5.810000e+02
      6.480000e+02
      4.101000e+01
      4.058200e+04
      1.012720e+05
      7.908300e+04
    
    
      median
      1.125000e+03
      0.000000e+00
      0.000000e+00
      6.200000e+01
      6.300000e+01
      6.500000e+01
      6.400000e+01
      6.200000e+01
      7.100000e+01
      6.900000e+01
      ...
      2.000000e+01
      1.080000e+02
      2.740000e+02
      1.143000e+03
      4.520000e+02
      5.040000e+02
      3.771000e+01
      3.056300e+04
      7.671300e+04
      6.212200e+04
    
    
      interquartile_range
      6.040000e+02
      0.000000e+00
      0.000000e+00
      3.900000e+01
      4.200000e+01
      4.300000e+01
      4.200000e+01
      4.300000e+01
      4.800000e+01
      4.600000e+01
      ...
      2.300000e+01
      1.180000e+02
      1.930000e+02
      6.180000e+02
      2.370000e+02
      2.640000e+02
      6.940000e+00
      1.890200e+04
      4.591100e+04
      3.206500e+04
    
  

10 rows × 107 columns

Every Dataset instance in the catalog contains other useful metadata:

slug: A short ID
name and description: Free text attributes
country
geography: Every dataset is related to a Geography instance
category
provider
data source
lang
temporal aggregation
time coverage
update frequency
version
is_public_data: whether you need a license to use the dataset for enrichment purposes or not



In [15]:

    
dataset.to_dict()









    Out[15]:





{'slug': 'ags_sociodemogr_a7e14220',
 'name': 'Sociodemographics - United States of America (Census Block Group)',
 'description': 'Census and ACS sociodemographic data estimated for the current year and data projected to five years. Projected fields are general aggregates (total population, total households, median age, avg income etc.)',
 'category_id': 'demographics',
 'country_id': 'usa',
 'data_source_id': 'sociodemographics',
 'provider_id': 'ags',
 'geography_name': 'Census Block Group - United States of America',
 'geography_description': None,
 'temporal_aggregation': 'yearly',
 'time_coverage': None,
 'update_frequency': 'yearly',
 'is_public_data': False,
 'lang': 'eng',
 'version': '2020',
 'category_name': 'Demographics',
 'provider_name': 'Applied Geographic Solutions',
 'geography_id': 'carto-do.ags.geography_usa_blockgroup_2015',
 'id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020'}

There's also some intersting metadata, for each variable in the dataset:

id
slug: A short ID
name and description
column_name: Actual column name in the table that contains the data
db_type: SQL type in the database
dataset_id
agg_method: Aggregation method used
temporal aggregation and time coverage

Variables are the most important asset in the catalog and when exploring datasets in the Data Observatory catalog it's very important that you understand clearly what variables are available to enrich your own data.

For each Variable in each dataset, the Data Observatory provides (as it does with datasets) a set of methods and attributes to understand their underlaying data.

Some of them are:

head and tail methods to get a glimpse of the actual data and start modelling your problem right away.
counts, quantiles and a full describe method with stats of the actual values in the dataset, such as: average, stdev, quantiles, min, max, median for each of the variables of the dataset.
an histogram plot with the distribution of the values on each variable.

Let's overview some of that augmented metadata for the variables in the AGS population dataset.



In [16]:

    
from cartoframes.data.observatory import Variable
variable = Variable.get('POPCY_4534fac4')
variable









    Out[16]:





<Variable.get('POPCY_4534fac4')> #'Population (2019A)'



In [17]:

    
variable.to_dict()









    Out[17]:





{'slug': 'POPCY_4534fac4',
 'name': 'Total Population',
 'description': 'Population (2019A)',
 'db_type': 'INTEGER',
 'agg_method': 'SUM',
 'column_name': 'POPCY',
 'variable_group_id': None,
 'dataset_id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020',
 'id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020.POPCY'}

There's also some utility methods ot understand the underlying data for each variable:



In [18]:

    
variable.head()









    Out[18]:





0    1500
1    2595
2    1585
3    1503
4     980
5    1249
6    1658
7    1078
8    2540
9    1980
dtype: int64



In [19]:

    
variable.counts()









    Out[19]:





all                 217182.000000
null                     0.000000
zero                   299.000000
extreme               9073.000000
distinct              6756.000000
outliers             26998.000000
null_percent             0.000000
zero_percent             0.137673
extreme_percent          0.041776
distinct_percent         3.110755
outliers_percent         0.124310
dtype: float64



In [20]:

    
variable.quantiles()









    Out[20]:





q1                      850
q3                     1454
median                 1125
interquartile_range     604
dtype: int64



In [21]:

    
variable.histogram()









    





<Figure size 1200x700 with 1 Axes>



In [22]:

    
variable.describe()









    Out[22]:





avg                    1.520470e+03
max                    6.710000e+04
min                    0.000000e+00
sum                    3.302187e+08
range                  6.710000e+04
stdev                  1.063417e+03
q1                     8.500000e+02
q3                     1.454000e+03
median                 1.125000e+03
interquartile_range    6.040000e+02
dtype: float64

Once you have explored the catalog and have detected a dataset with the variables you need for your analysis and the right spatial resolution, you have to look at the is_public_data to know if you can just use it from CARTOframes or you first need to subscribe for a license.

Subscriptions to datasets allow you to use them from CARTOframes to enrich your own data or to download them. See the enrichment guides for more information about this.

Let's see the dataset and geography in our previous example:



In [23]:

    
dataset = Dataset.get('ags_sociodemogr_a7e14220')



In [24]:

    
dataset.is_public_data









    Out[24]:





False



In [25]:

    
from cartoframes.data.observatory import Geography
geography = Geography.get(dataset.geography)



In [26]:

    
geography.is_public_data









    Out[26]:





False

Both dataset and geography are not public data, that means you need a subscription to be able to use them to enrich your own data.

To subscribe to data in the Data Observatory catalog you need a CARTO account with access to Data Observatory



In [27]:

    
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')



In [28]:

    
dataset.subscribe()



In [29]:

    
geography.subscribe()

Licenses to data in the Data Observatory grant you the right to use the data subscribed for the period of one year. Every dataset or geography you want to use to enrich your own data, as long as they are not public data, require a valid license.

You can check the actual status of your subscriptions directly from the catalog.



In [30]:

    
Catalog().subscriptions()









    Out[30]:





Datasets: [<Dataset.get('ags_businesscou_df363a87')>, <Dataset.get('ags_crimerisk_e9cfa4d4')>, <Dataset.get('ags_retailpoten_aaf25a8c')>, <Dataset.get('ags_sociodemogr_f510a947')>, <Dataset.get('ags_sociodemogr_a7e14220')>]
Geographies: [<Geography.get('cdb_blockgroup_7753dd51')>, <Geography.get('ags_blockgroup_1c63771c')>]

Conclusion

In this guide you've seen how to explore the Data Observatory catalog to identify variables of datasets that you can use to enrich your own data.

You've learned how to:

Explore the catalog using nested hierarchical filters.
Describe the three main entities in the catalog: Geography, Dataset and their Variables.
Taken a look at the data and stats taken from the actual repository, to make a more informed decision on which variables to choose.
How to subscribe to the chosen dataset to get a license that grants the right to enrich your own data.

We also recommend checking out the resources below to learn more about the Data Observatory catalog:

The CARTOframes enrichment guide
Our public website
Your user dashboard: Under the data section
The CARTOframes catalog API reference

	slug	name	description	country_id	provider_id	geom_type	geom_coverage	update_frequency	is_public_data	lang	version	provider_name	id
0	mbi_blockgroups_535aed6d	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	usa	mbi	MULTIPOLYGON	None	yearly	False	eng	2020	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020
4	cdb_blockgroup_7753dd51	Census Block Group - United States of America	Shoreline clipped TIGER/Line boundaries. More ...	usa	carto	MULTIPOLYGON	None	None	True	eng	2015	CARTO	carto-do-public-data.carto.geography_usa_block...

	slug	name	description	category_id	country_id	data_source_id	provider_id	geography_name	geography_description	temporal_aggregation	time_coverage	update_frequency	is_public_data	lang	version	category_name	provider_name	geography_id	id
0	mbi_consumer_sp_fdc16f97	Consumer Spending - United States of America (...	MBI Consumer Spending by product groups quanti...	demographics	usa	consumer_spending	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_consumerspending_usa...
1	mbi_households__ec03bf40	Households By Type - United States of America ...	Distribution of the households in an area by t...	demographics	usa	households_by_type	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_householdsbytype_usa...
2	mbi_sociodemogr_1c54ac66	Sociodemographics - United States of America (...	MBI Sociodemographics includes:\n- Population\...	demographics	usa	sociodemographics	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_sociodemographics_us...
3	mbi_purchasing__faaee3c9	Purchasing Power - United States of America (B...	Purchasing Power describes the disposable inco...	demographics	usa	purchasing_power	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_purchasingpower_usa_...
4	mbi_population_9d1b276f	Population - United States of America (Blockgr...	Population figures are shown as projected aver...	demographics	usa	population	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_population_usa_block...
5	mbi_retail_spen_6a1acff4	Retail Spending - United States of America (Bl...	Retail Spending relates to the proportion of P...	demographics	usa	retail_spending	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_retailspending_usa_b...
6	mbi_consumer_pr_c1d4e20e	Consumer Profiles - United States of America (...	The MB International Consumer Styles describe ...	demographics	usa	consumer_profiles	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_consumerprofiles_usa...
7	mbi_households__60466314	Households By Income Quintiles - United States...	On the national level the number of households...	demographics	usa	households_by_income_quintiles	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_householdsbyincomequ...
8	mbi_education_8903fc2c	Education - United States of America (Blockgro...	Distribution of the population in an area by t...	demographics	usa	education	mbi	Blockgroups - United States of America	MBI Digital Boundaries for USA at Blockgroups ...	yearly	[2019-01-01, 2020-01-01)	yearly	False	eng	2020	Demographics	Michael Bauer International	carto-do.mbi.geography_usa_blockgroups_2020	carto-do.mbi.demographics_education_usa_blockg...

	slug	name	description	db_type	agg_method	column_name	variable_group_id	dataset_id	id
0	BLOCKGROUP_30e525a6	BLOCKGROUP	Geographic Identifier	STRING	None	BLOCKGROUP	None	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
1	POPCY_4534fac4	Total Population	Population (2019A)	INTEGER	SUM	POPCY	None	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
2	POPCYGRP_3033ef2e	POPCYGRP	Population in Group Quarters (2019A)	INTEGER	SUM	POPCYGRP	None	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
3	POPCYGRPI_1e42899	POPCYGRPI	Institutional Group Quarters Population (2019A)	INTEGER	SUM	POPCYGRPI	None	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
4	AGECY0004_aaae373a	AGECY0004	Population age 0-4 (2019A)	INTEGER	SUM	AGECY0004	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
...	...	...	...	...	...	...	...	...	...
103	DWLPY_312aaf70	Number of household units	Housing units (2024A)	INTEGER	SUM	DWLPY	None	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
104	AGEPYMED_d5583bbb	AGEPYMED	Median Age (2024A)	FLOAT	AVG	AGEPYMED	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
105	INCPYPCAP_f9c107fa	INCPYPCAP	Per capita income (2024A)	FLOAT	AVG	INCPYPCAP	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
106	INCPYAVEHH_48c1d530	INCPYAVEHH	Average household Income (2024A)	FLOAT	AVG	INCPYAVEHH	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...
107	INCPYMEDHH_ce5faa77	INCPYMEDHH	Median household income (2024A)	FLOAT	AVG	INCPYMEDHH	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...	carto-do.ags.demographics_sociodemographics_us...

	DWLCY	DWLPY	HHDCY	HHDPY	POPCY	POPPY	geoid	VPHCY1	do_date	AGECYMED	...	MARCYDIVOR	MARCYNEVER	MARCYWIDOW	RCHCYAMNHS	RCHCYASNHS	RCHCYBLNHS	RCHCYHANHS	RCHCYMUNHS	RCHCYOTNHS	RCHCYWHNHS
0	1057	1112	932	986	1500	1648	040130405071	442	2020-01-01 00:00:00+00:00	77.40	...	149	4	228	0	11	20	0	25	0	1317
1	1964	2069	1774	1877	2595	2868	040130405072	1049	2020-01-01 00:00:00+00:00	76.88	...	414	160	699	0	74	68	7	55	0	2167
2	1049	1101	897	933	1585	1716	040130610182	460	2020-01-01 00:00:00+00:00	69.88	...	31	217	246	2	55	43	9	26	0	1313
3	1084	1137	910	940	1503	1616	040138175002	392	2020-01-01 00:00:00+00:00	71.44	...	191	79	268	8	24	38	0	8	0	1290
4	682	706	574	591	980	1039	040190043241	244	2020-01-01 00:00:00+00:00	72.38	...	30	44	195	3	9	0	0	0	0	902
5	880	910	840	869	1249	1284	060133511032	539	2020-01-01 00:00:00+00:00	76.75	...	160	40	319	0	136	19	2	12	5	1024
6	1467	1534	1314	1467	1658	1800	060590995101	831	2020-01-01 00:00:00+00:00	74.58	...	423	136	496	3	226	10	1	16	0	1269
7	704	753	693	730	1078	1176	060610210391	338	2020-01-01 00:00:00+00:00	73.86	...	117	63	215	5	33	7	0	9	0	986
8	1582	1691	1553	1650	2540	2795	060610236001	818	2020-01-01 00:00:00+00:00	68.80	...	406	45	301	5	168	26	3	19	0	2183
9	1186	1268	1163	1234	1980	2176	060610236002	415	2020-01-01 00:00:00+00:00	68.59	...	253	60	223	5	97	22	1	10	0	1750

	POPCY	POPCYGRP	POPCYGRPI	AGECY0004	AGECY0509	AGECY1014	AGECY1519	AGECY2024	AGECY2529	AGECY3034	...	DWLCYVACNT	DWLCYRENT	DWLCYOWNED	POPPY	HHDPY	DWLPY	AGEPYMED	INCPYPCAP	INCPYAVEHH	INCPYMEDHH
avg	1.520470e+03	3.725678e+01	1.798816e+01	9.047040e+01	9.341024e+01	9.581574e+01	9.726450e+01	1.000845e+02	1.083624e+02	1.044326e+02	...	4.957671e+01	2.116296e+02	3.835577e+02	1.568307e+03	6.075594e+02	6.718555e+02	3.989449e+01	4.292896e+04	1.073098e+05	7.933328e+04
max	6.710000e+04	1.975200e+04	1.205300e+04	5.393000e+03	5.294000e+03	5.195000e+03	7.606000e+03	1.480400e+04	5.767000e+03	5.616000e+03	...	6.547000e+03	1.005700e+04	2.367600e+04	7.584500e+04	2.811500e+04	3.264000e+04	8.750000e+01	3.824975e+06	1.112720e+07	3.500000e+05
min	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	...	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
sum	3.302187e+08	8.091501e+06	3.906704e+06	1.964854e+07	2.028702e+07	2.080945e+07	2.112410e+07	2.173655e+07	2.353436e+07	2.268087e+07	...	1.076717e+07	4.596213e+07	8.330182e+07	3.406080e+08	1.319510e+08	1.459149e+08	8.664364e+06	9.323398e+09	2.330576e+10	1.722976e+10
range	6.710000e+04	1.975200e+04	1.205300e+04	5.393000e+03	5.294000e+03	5.195000e+03	7.606000e+03	1.480400e+04	5.767000e+03	5.616000e+03	...	6.547000e+03	1.005700e+04	2.367600e+04	7.584500e+04	2.811500e+04	3.264000e+04	8.750000e+01	3.824975e+06	1.112720e+07	3.500000e+05
stdev	1.063417e+03	2.428693e+02	1.582057e+02	8.044792e+01	8.338915e+01	8.351758e+01	1.117828e+02	1.244125e+02	9.642758e+01	9.358961e+01	...	9.849786e+01	2.353261e+02	3.163312e+02	1.141981e+03	4.138545e+02	4.467692e+02	7.567177e+00	3.178870e+04	7.835131e+04	4.262018e+04
q1	8.500000e+02	0.000000e+00	0.000000e+00	4.400000e+01	4.400000e+01	4.600000e+01	4.500000e+01	4.300000e+01	5.000000e+01	4.900000e+01	...	1.100000e+01	6.000000e+01	1.820000e+02	8.670000e+02	3.440000e+02	3.840000e+02	3.407000e+01	2.168000e+04	5.536100e+04	4.701800e+04
q3	1.454000e+03	0.000000e+00	0.000000e+00	8.300000e+01	8.600000e+01	8.900000e+01	8.700000e+01	8.600000e+01	9.800000e+01	9.500000e+01	...	3.400000e+01	1.780000e+02	3.750000e+02	1.485000e+03	5.810000e+02	6.480000e+02	4.101000e+01	4.058200e+04	1.012720e+05	7.908300e+04
median	1.125000e+03	0.000000e+00	0.000000e+00	6.200000e+01	6.300000e+01	6.500000e+01	6.400000e+01	6.200000e+01	7.100000e+01	6.900000e+01	...	2.000000e+01	1.080000e+02	2.740000e+02	1.143000e+03	4.520000e+02	5.040000e+02	3.771000e+01	3.056300e+04	7.671300e+04	6.212200e+04
interquartile_range	6.040000e+02	0.000000e+00	0.000000e+00	3.900000e+01	4.200000e+01	4.300000e+01	4.200000e+01	4.300000e+01	4.800000e+01	4.600000e+01	...	2.300000e+01	1.180000e+02	1.930000e+02	6.180000e+02	2.370000e+02	2.640000e+02	6.940000e+00	1.890200e+04	4.591100e+04	3.206500e+04

Data discovery

Introduction

Find demographic data for the US

Dataset and variables metadata

Subscribe to a Dataset in the catalog

Conclusion