Data discovery

Introduction

The Data Observatory is a spatial data repository that enables data scientists to augment their data and broaden their analysis. It offers a wide range of datasets from around the globe.

This guide is intended for those who want to start augmenting their own data using CARTOframes and wish to explore CARTO's public Data Observatory catalog to find datasets that best fit their use cases and analyses.

Note: The catalog is public and you don't need a CARTO account to search for available datasets.

Find demographic data for the US

In this guide we walk through the Data Observatory catalog looking for demographic data in the US.

The catalog comprises thousands of curated spatial datasets, so the easiest way to find what you are looking for is a faceted search. A faceted (or hierarchical) search lets you narrow down search results by applying multiple filters based on a faceted classification of the catalog datasets.

Datasets are organized in three main hierarchies:

  • Country
  • Category
  • Geography (or spatial resolution)
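The narrowing idea behind a faceted search can be illustrated with a minimal pure-Python sketch. The catalog entries and the `faceted_search` and `facet_values` helpers are hypothetical and only mimic the concept; they are not part of CARTOframes, which exposes the same narrowing through chained calls such as `Catalog().country('usa').category('demographics')`.

```python
from typing import Dict, List

# Hypothetical, in-memory catalog: each entry carries one value per facet.
CATALOG: List[Dict[str, str]] = [
    {"slug": "demo_usa_blockgroup", "country": "usa", "category": "demographics", "geography": "blockgroup"},
    {"slug": "demo_usa_county", "country": "usa", "category": "demographics", "geography": "county"},
    {"slug": "fin_usa_blockgroup", "country": "usa", "category": "financial", "geography": "blockgroup"},
    {"slug": "demo_esp_municipality", "country": "esp", "category": "demographics", "geography": "municipality"},
]


def faceted_search(entries: List[Dict[str, str]], **facets: str) -> List[Dict[str, str]]:
    """Keep only the entries that match every given facet value (logical AND)."""
    return [e for e in entries if all(e.get(k) == v for k, v in facets.items())]


def facet_values(entries: List[Dict[str, str]], facet: str) -> List[str]:
    """List the distinct values a facet takes among the given entries."""
    return sorted({e[facet] for e in entries})


# Narrow down step by step: country and category first, then inspect the geographies left.
usa_demographics = faceted_search(CATALOG, country="usa", category="demographics")
print(facet_values(usa_demographics, "geography"))  # ['blockgroup', 'county']
print([e["slug"] for e in faceted_search(usa_demographics, geography="blockgroup")])  # ['demo_usa_blockgroup']
```

Each extra facet only ever removes entries, which is why applying country, then category, then geography converges quickly on a handful of candidates.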

For our analysis we are looking for demographic datasets in the US with a spatial resolution at the block group level.

We can start by discovering which available geographies (or spatial resolutions) we have for demographic data in the US, by filtering the catalog by country and category and listing the available geographies.

Let's start exploring the available categories of data for the US:


In [1]:
from cartoframes.data.observatory import Catalog
Catalog().country('usa').categories


Out[1]:
[<Category.get('covid19')>,
 <Category.get('demographics')>,
 <Category.get('environmental')>,
 <Category.get('financial')>,
 <Category.get('geosocial')>,
 <Category.get('housing')>,
 <Category.get('human_mobility')>,
 <Category.get('points_of_interest')>,
 <Category.get('road_traffic')>]

For the US, the Data Observatory provides nine different categories of datasets. Let's discover the available spatial resolutions for the demographics category, which at first sight should contain the population data we need.


In [2]:
from cartoframes.data.observatory import Catalog
geographies = Catalog().country('usa').category('demographics').geographies
geographies


Out[2]:
[<Geography.get('mbi_blockgroups_535aed6d')>,
 <Geography.get('mbi_counties_46ea8aaa')>,
 <Geography.get('mbi_county_subd_ba170144')>,
 <Geography.get('mbi_pc_5_digit_19e769c1')>,
 <Geography.get('cdb_blockgroup_7753dd51')>,
 <Geography.get('cdb_cbsa_d1b91d3b')>,
 <Geography.get('cdb_censustract_af861cba')>,
 <Geography.get('cdb_congression_478295fd')>,
 <Geography.get('cdb_county_767e79f0')>,
 <Geography.get('cdb_county_8cf054d')>,
 <Geography.get('cdb_place_93d54d1e')>,
 <Geography.get('cdb_puma_56bbc2e')>,
 <Geography.get('cdb_schooldistr_eb48e7bc')>,
 <Geography.get('cdb_schooldistr_18547e3f')>,
 <Geography.get('cdb_schooldistr_d9ca1a26')>,
 <Geography.get('cdb_state_cd83b434')>,
 <Geography.get('cdb_zcta5_f4043497')>]

Let's filter the geographies down to those that contain information at the block group level. To do so, we convert the geographies to a pandas DataFrame and search for the string blockgroup in the id of each geography:


In [3]:
df = geographies.to_dataframe()
df[df['id'].str.contains('blockgroup', case=False, na=False)]


Out[3]:
slug name description country_id provider_id geom_type geom_coverage update_frequency is_public_data lang version provider_name id
0 mbi_blockgroups_535aed6d Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... usa mbi MULTIPOLYGON None yearly False eng 2020 Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020
4 cdb_blockgroup_7753dd51 Census Block Group - United States of America Shoreline clipped TIGER/Line boundaries. More ... usa carto MULTIPOLYGON None None True eng 2015 CARTO carto-do-public-data.carto.geography_usa_block...
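The filter in In [3] passes `case=False`, which makes pandas `Series.str.contains` match case-insensitively, and `na=False`, which makes missing ids count as non-matches instead of putting NaN into the boolean mask. The same predicate can be sketched in plain Python; `contains_ci` is a hypothetical helper, and apart from the first id (taken from Out[3]) the ids are made up. Note that `str.contains` interprets the pattern as a regular expression by default, which makes no difference for a plain word like `blockgroup`.

```python
from typing import List, Optional


def contains_ci(value: Optional[str], pattern: str) -> bool:
    """Case-insensitive substring test; a missing value counts as no match (like na=False)."""
    if value is None:
        return False
    return pattern.lower() in value.lower()


ids: List[Optional[str]] = [
    "carto-do.mbi.geography_usa_blockgroups_2020",  # from Out[3]
    "example.geography_usa_BlockGroup",             # made-up id, mixed case
    "example.geography_usa_county",                 # made-up id, no match
    None,                                           # missing id
]

matches = [i for i in ids if contains_ci(i, "blockgroup")]
print(matches)  # ['carto-do.mbi.geography_usa_blockgroups_2020', 'example.geography_usa_BlockGroup']
```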

We have two available geographies at the block group level, from two different providers: Michael Bauer International (MBI) and CARTO. For this example, we are going to look for demographic datasets for the MBI blockgroups geography mbi_blockgroups_535aed6d:


In [4]:
datasets = Catalog().country('usa').category('demographics').geography('mbi_blockgroups_535aed6d').datasets
datasets


Out[4]:
[<Dataset.get('mbi_consumer_sp_fdc16f97')>,
 <Dataset.get('mbi_households__ec03bf40')>,
 <Dataset.get('mbi_sociodemogr_1c54ac66')>,
 <Dataset.get('mbi_purchasing__faaee3c9')>,
 <Dataset.get('mbi_population_9d1b276f')>,
 <Dataset.get('mbi_retail_spen_6a1acff4')>,
 <Dataset.get('mbi_consumer_pr_c1d4e20e')>,
 <Dataset.get('mbi_households__60466314')>,
 <Dataset.get('mbi_education_8903fc2c')>]

Let's continue with the data discovery. There are nine datasets in the US with demographic information at the level of MBI blockgroups. Converting them to a DataFrame shows their metadata:


In [5]:
datasets.to_dataframe()


Out[5]:
slug name description category_id country_id data_source_id provider_id geography_name geography_description temporal_aggregation time_coverage update_frequency is_public_data lang version category_name provider_name geography_id id
0 mbi_consumer_sp_fdc16f97 Consumer Spending - United States of America (... MBI Consumer Spending by product groups quanti... demographics usa consumer_spending mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_consumerspending_usa...
1 mbi_households__ec03bf40 Households By Type - United States of America ... Distribution of the households in an area by t... demographics usa households_by_type mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_householdsbytype_usa...
2 mbi_sociodemogr_1c54ac66 Sociodemographics - United States of America (... MBI Sociodemographics includes:\n- Population\... demographics usa sociodemographics mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_sociodemographics_us...
3 mbi_purchasing__faaee3c9 Purchasing Power - United States of America (B... Purchasing Power describes the disposable inco... demographics usa purchasing_power mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_purchasingpower_usa_...
4 mbi_population_9d1b276f Population - United States of America (Blockgr... Population figures are shown as projected aver... demographics usa population mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_population_usa_block...
5 mbi_retail_spen_6a1acff4 Retail Spending - United States of America (Bl... Retail Spending relates to the proportion of P... demographics usa retail_spending mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_retailspending_usa_b...
6 mbi_consumer_pr_c1d4e20e Consumer Profiles - United States of America (... The MB International Consumer Styles describe ... demographics usa consumer_profiles mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_consumerprofiles_usa...
7 mbi_households__60466314 Households By Income Quintiles - United States... On the national level the number of households... demographics usa households_by_income_quintiles mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_householdsbyincomequ...
8 mbi_education_8903fc2c Education - United States of America (Blockgro... Distribution of the population in an area by t... demographics usa education mbi Blockgroups - United States of America MBI Digital Boundaries for USA at Blockgroups ... yearly [2019-01-01, 2020-01-01) yearly False eng 2020 Demographics Michael Bauer International carto-do.mbi.geography_usa_blockgroups_2020 carto-do.mbi.demographics_education_usa_blockg...

They cover different topics: consumer spending, retail spending, consumer profiles, education, etc.
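Once the datasets are in a DataFrame, a dataset can be picked by its `data_source_id` rather than by scanning slugs. The sketch below reproduces that lookup in plain Python over the slug/`data_source_id` pairs copied from Out[5]; with pandas you would typically filter the DataFrame on the `data_source_id` column instead.

```python
# (slug, data_source_id) pairs copied from the Out[5] table.
DATASETS = [
    ("mbi_consumer_sp_fdc16f97", "consumer_spending"),
    ("mbi_households__ec03bf40", "households_by_type"),
    ("mbi_sociodemogr_1c54ac66", "sociodemographics"),
    ("mbi_purchasing__faaee3c9", "purchasing_power"),
    ("mbi_population_9d1b276f", "population"),
    ("mbi_retail_spen_6a1acff4", "retail_spending"),
    ("mbi_consumer_pr_c1d4e20e", "consumer_profiles"),
    ("mbi_households__60466314", "households_by_income_quintiles"),
    ("mbi_education_8903fc2c", "education"),
]

# Index the slugs by data_source_id for direct lookup.
by_source = {source: slug for slug, source in DATASETS}
print(by_source["sociodemographics"])  # mbi_sociodemogr_1c54ac66

# Candidates for population figures: the dedicated population dataset and the broader sociodemographics one.
candidates = [slug for slug, source in DATASETS if source in ("population", "sociodemographics")]
print(candidates)  # ['mbi_sociodemogr_1c54ac66', 'mbi_population_9d1b276f']
```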

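At first sight, the dataset with data_source_id sociodemographics seems likely to contain the population information we are looking for. Let's get a better understanding of what data such a dataset contains by looking at its variables. The cells below inspect the AGS sociodemographics dataset ags_sociodemogr_a7e14220, whose variables are shown in the outputs; the same calls work for the MBI dataset by passing its slug, mbi_sociodemogr_1c54ac66, instead: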


In [6]:
from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_a7e14220')
variables = dataset.variables
variables


Out[6]:
[<Variable.get('BLOCKGROUP_30e525a6')> #'Geographic Identifier',
 <Variable.get('POPCY_4534fac4')> #'Population (2019A)',
 <Variable.get('POPCYGRP_3033ef2e')> #'Population in Group Quarters (2019A)',
 <Variable.get('POPCYGRPI_1e42899')> #'Institutional Group Quarters Population (2019A)',
 <Variable.get('AGECY0004_aaae373a')> #'Population age 0-4 (2019A)',
 <Variable.get('AGECY0509_d2d4896c')> #'Population age 5-9 (2019A)',
 <Variable.get('AGECY1014_b09611e')> #'Population age 10-14 (2019A)',
 <Variable.get('AGECY1519_7373df48')> #'Population age 15-19 (2019A)',
 <Variable.get('AGECY2024_32919d33')> #'Population age 20-24 (2019A)',
 <Variable.get('AGECY2529_4aeb2365')> #'Population age 25-29 (2019A)',
 <Variable.get('AGECY3034_9336cb17')> #'Population age 30-34 (2019A)',
 <Variable.get('AGECY3539_eb4c7541')> #'Population age 35-39 (2019A)',
 <Variable.get('AGECY4044_41a06569')> #'Population age 40-44 (2019A)',
 <Variable.get('AGECY4549_39dadb3f')> #'Population age 45-49 (2019A)',
 <Variable.get('AGECY5054_e007334d')> #'Population age 50-54 (2019A)',
 <Variable.get('AGECY5559_987d8d1b')> #'Population age 55-59 (2019A)',
 <Variable.get('AGECY6064_d99fcf60')> #'Population age 60-64 (2019A)',
 <Variable.get('AGECY6569_a1e57136')> #'Population age 65-69 (2019A)',
 <Variable.get('AGECY7074_78389944')> #'Population age 70-74 (2019A)',
 <Variable.get('AGECY7579_422712')> #'Population age 75-79 (2019A)',
 <Variable.get('AGECY8084_a7c395dd')> #'Population age 80-84 (2019A)',
 <Variable.get('AGECYGT85_ac46767d')> #'Population age 85+ (2019A)',
 <Variable.get('AGECYMED_f218d6e9')> #'Median Age (2019A)',
 <Variable.get('SEXCYMAL_8ee6ade5')> #'Population male (2019A)',
 <Variable.get('SEXCYFEM_91d8b796')> #'Population female (2019A)',
 <Variable.get('RCHCYWHNHS_b4cab6fe')> #'Non Hispanic White (2019A)',
 <Variable.get('RCHCYBLNHS_93a8395b')> #'Non Hispanic Black (2019A)',
 <Variable.get('RCHCYAMNHS_6cb424ee')> #'Non Hispanic American Indian (2019A)',
 <Variable.get('RCHCYASNHS_dc720442')> #'Non Hispanic Asian (2019A)',
 <Variable.get('RCHCYHANHS_2b72f927')> #'Non Hispanic Hawaiian/Pacific Islander (2019A)',
 <Variable.get('RCHCYOTNHS_fe95829a')> #'Non Hispanic Other Race (2019A)',
 <Variable.get('RCHCYMUNHS_3ce9b69f')> #'Non Hispanic Multiple Race (2019A)',
 <Variable.get('HISCYHISP_e62d7c2e')> #'Population Hispanic (2019A)',
 <Variable.get('MARCYNEVER_eee4f8c3')> #'Never Married (2019A)',
 <Variable.get('MARCYMARR_17f0d887')> #'Now Married (2019A)',
 <Variable.get('MARCYSEP_d4d69eb8')> #'Separated (2019A)',
 <Variable.get('MARCYWIDOW_5ce5d993')> #'Widowed (2019A)',
 <Variable.get('MARCYDIVOR_146db750')> #'Divorced (2019A)',
 <Variable.get('AGECYGT15_7d84cd34')> #'Population Age 15+ (2019A)',
 <Variable.get('EDUCYLTGR9_ed0362fa')> #'Pop 25+ less than 9th grade (2019A)',
 <Variable.get('EDUCYSHSCH_7a88e398')> #'Pop 25+ 9th-12th grade no diploma (2019A)',
 <Variable.get('EDUCYHSCH_a7a81733')> #'Pop 25+ HS graduate (2019A)',
 <Variable.get('EDUCYSCOLL_3840e65b')> #'Pop 25+ college no diploma (2019A)',
 <Variable.get('EDUCYASSOC_dcd76160')> #'Pop 25+ Associate degree (2019A)',
 <Variable.get('EDUCYBACH_d7b78049')> #'Pop 25+ Bachelor's degree (2019A)',
 <Variable.get('EDUCYGRAD_c58943fb')> #'Pop 25+ graduate or prof school degree (2019A)',
 <Variable.get('AGECYGT25_56a99ef7')> #'Population Age 25+ (2019A)',
 <Variable.get('HHDCY_935c1592')> #'Households (2019A)',
 <Variable.get('HHDCYFAM_c1a6fccf')> #'Family Households (2019A)',
 <Variable.get('HHSCYMCFCH_bd115dc2')> #'Families married couple w children (2019A)',
 <Variable.get('HHSCYLPMCH_ce8863e2')> #'Families male no wife w children (2019A)',
 <Variable.get('HHSCYLPFCH_c2dd8c03')> #'Families female no husband children (2019A)',
 <Variable.get('HHDCYAVESZ_d265f21c')> #'Average Household Size (2019A)',
 <Variable.get('HHDCYMEDAG_4f099151')> #'Median Age of Householder (2019A)',
 <Variable.get('VPHCYNONE_3755ac60')> #'Households: No Vehicle Available (2019A)',
 <Variable.get('VPHCY1_bed441da')> #'Households: One Vehicle Available (2019A)',
 <Variable.get('VPHCYGT1_e4a07c30')> #'Households: Two or More Vehicles Available (2019A)',
 <Variable.get('INCCYPCAP_7c8377cf')> #'Per capita income (2019A)',
 <Variable.get('INCCYAVEHH_1ef75363')> #'Average household Income (2019A)',
 <Variable.get('INCCYMEDHH_98692c24')> #'Median household income (2019A)',
 <Variable.get('INCCYMEDFA_7f36b90e')> #'Median family income (2019A)',
 <Variable.get('HINCYLT10_61c14e29')> #'Household Income < $10000 (2019A)',
 <Variable.get('HINCY1015_c720a11b')> #'Household Income $10000-$14999 (2019A)',
 <Variable.get('HINCY1520_9aacc4bc')> #'Household Income $15000-$19999 (2019A)',
 <Variable.get('HINCY2025_feb85d36')> #'Household Income $20000-$24999 (2019A)',
 <Variable.get('HINCY2530_91025a13')> #'Household Income $25000-$29999 (2019A)',
 <Variable.get('HINCY3035_5f1f0b12')> #'Household Income $30000-$34999 (2019A)',
 <Variable.get('HINCY3540_66ffabb1')> #'Household Income $35000-$39999 (2019A)',
 <Variable.get('HINCY4045_8d89a56c')> #'Household Income $40000-$44999 (2019A)',
 <Variable.get('HINCY4550_e233a249')> #'Household Income $45000-$49999 (2019A)',
 <Variable.get('HINCY5060_77695404')> #'Household Income $50000-$59999 (2019A)',
 <Variable.get('HINCY6075_cad3e24')> #'Household Income $60000-$74999 (2019A)',
 <Variable.get('HINCY75100_bb90c7bb')> #'Household Income $75000-$99999 (2019A)',
 <Variable.get('HINCY10025_40903e13')> #'Household Income $100000-$124999 (2019A)',
 <Variable.get('HINCY12550_d379563b')> #'Household Income $125000-$149999 (2019A)',
 <Variable.get('HINCY15020_7243aae')> #'Household Income $150000-$199999 (2019A)',
 <Variable.get('HINCYGT200_c39e094b')> #'Household Income > $200000 (2019A)',
 <Variable.get('HINCYMED24_4ac9369')> #'Median Household Income: Age < 25 (2019A)',
 <Variable.get('HINCYMED25_73aba3ff')> #'Median Household Income: Age 25-34 (2019A)',
 <Variable.get('HINCYMED35_6ab092be')> #'Median Household Income: Age 35-44 (2019A)',
 <Variable.get('HINCYMED45_25f10479')> #'Median Household Income: Age 45-54 (2019A)',
 <Variable.get('HINCYMED55_3cea3538')> #'Median Household Income: Age 55-64 (2019A)',
 <Variable.get('HINCYMED65_17c766fb')> #'Median Household Income: Age 65-74 (2019A)',
 <Variable.get('HINCYMED75_edc57ba')> #'Median Household Income: Age 75+ (2019A)',
 <Variable.get('LBFCYPOP16_75363c6f')> #'Population Age 16+ (2019A)',
 <Variable.get('LBFCYARM_c8f45b67')> #'Pop 16+ in Armed Forces (2019A)',
 <Variable.get('LBFCYEMPL_1902fd90')> #'Pop 16+ civilian employed (2019A)',
 <Variable.get('LBFCYUNEM_befc2d4')> #'Pop 16+ civilian unemployed (2019A)',
 <Variable.get('LBFCYNLF_803bfa0d')> #'Pop 16+ not in labor force (2019A)',
 <Variable.get('UNECYRATE_a642ed8a')> #'Unemployment Rate (2019A)',
 <Variable.get('LBFCYLBF_1d3c03ed')> #'Population In Labor Force (2019A)',
 <Variable.get('LNIEXSPAN_8f8728c7')> #'SPANISH SPEAKING HOUSEHOLDS',
 <Variable.get('LNIEXISOL_c2e86dc7')> #'LINGUISTICALLY ISOLATED HOUSEHOLDS (NON-ENGLISH SP...',
 <Variable.get('HOOEXMED_48df3206')> #'Median Value of Owner Occupied Housing Units',
 <Variable.get('RNTEXMED_6ac2e609')> #'Median Cash Rent',
 <Variable.get('HUSEX1DET_231a9f6c')> #'UNITS IN STRUCTURE: 1 DETACHED',
 <Variable.get('HUSEXAPT_dc7d3c72')> #'UNITS IN STRUCTURE: 20 OR MORE',
 <Variable.get('DWLCY_50c5eee2')> #'Housing units (2019A)',
 <Variable.get('DWLCYVACNT_6b929d9a')> #'Housing units vacant (2019A)',
 <Variable.get('DWLCYRENT_3601a69e')> #'Occupied units renter (2019A)',
 <Variable.get('DWLCYOWNED_858b3ad6')> #'Occupied units owner (2019A)',
 <Variable.get('POPPY_24dbbb56')> #'Population (2024A)',
 <Variable.get('HHDPY_f2b35400')> #'Households (2024A)',
 <Variable.get('DWLPY_312aaf70')> #'Housing units (2024A)',
 <Variable.get('AGEPYMED_d5583bbb')> #'Median Age (2024A)',
 <Variable.get('INCPYPCAP_f9c107fa')> #'Per capita income (2024A)',
 <Variable.get('INCPYAVEHH_48c1d530')> #'Average household Income (2024A)',
 <Variable.get('INCPYMEDHH_ce5faa77')> #'Median household income (2024A)']

In [7]:
from cartoframes.data.observatory import Dataset
vdf = variables.to_dataframe()
vdf


Out[7]:
slug name description db_type agg_method column_name variable_group_id dataset_id id
0 BLOCKGROUP_30e525a6 BLOCKGROUP Geographic Identifier STRING None BLOCKGROUP None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
1 POPCY_4534fac4 Total Population Population (2019A) INTEGER SUM POPCY None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
2 POPCYGRP_3033ef2e POPCYGRP Population in Group Quarters (2019A) INTEGER SUM POPCYGRP None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
3 POPCYGRPI_1e42899 POPCYGRPI Institutional Group Quarters Population (2019A) INTEGER SUM POPCYGRPI None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
4 AGECY0004_aaae373a AGECY0004 Population age 0-4 (2019A) INTEGER SUM AGECY0004 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
... ... ... ... ... ... ... ... ... ...
103 DWLPY_312aaf70 Number of household units Housing units (2024A) INTEGER SUM DWLPY None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
104 AGEPYMED_d5583bbb AGEPYMED Median Age (2024A) FLOAT AVG AGEPYMED carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
105 INCPYPCAP_f9c107fa INCPYPCAP Per capita income (2024A) FLOAT AVG INCPYPCAP carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
106 INCPYAVEHH_48c1d530 INCPYAVEHH Average household Income (2024A) FLOAT AVG INCPYAVEHH carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
107 INCPYMEDHH_ce5faa77 INCPYMEDHH Median household income (2024A) FLOAT AVG INCPYMEDHH carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...

108 rows × 9 columns
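The `agg_method` column separates count-like variables (`SUM`, such as `POPCY` or `DWLPY`) from average-like ones (`AVG`, such as `AGEPYMED` or `INCPYPCAP`). A reasonable reading, assumed here rather than stated in this guide, is that it tells you how to combine values when rolling block groups up to a larger area. The sketch uses made-up values for two hypothetical block groups and a hypothetical `roll_up` helper. An unweighted mean of medians is only an approximation of the combined area's median; a population-weighted mean may be a better choice in practice.

```python
from statistics import mean
from typing import Dict, List

# agg_method per column, as listed in the Out[7] table.
AGG_METHOD: Dict[str, str] = {
    "POPCY": "SUM",      # Population (2019A)
    "DWLPY": "SUM",      # Housing units (2024A)
    "AGEPYMED": "AVG",   # Median Age (2024A)
    "INCPYPCAP": "AVG",  # Per capita income (2024A)
}

# Hypothetical values for two block groups inside the same larger area.
BLOCK_GROUPS: List[Dict[str, float]] = [
    {"POPCY": 1000, "DWLPY": 400, "AGEPYMED": 40.0, "INCPYPCAP": 30000.0},
    {"POPCY": 3000, "DWLPY": 1200, "AGEPYMED": 30.0, "INCPYPCAP": 50000.0},
]


def roll_up(rows: List[Dict[str, float]], agg_method: Dict[str, str]) -> Dict[str, float]:
    """Combine rows column by column: SUM adds values, AVG takes an unweighted mean."""
    result: Dict[str, float] = {}
    for column, method in agg_method.items():
        values = [row[column] for row in rows]
        if method == "SUM":
            result[column] = sum(values)
        elif method == "AVG":
            result[column] = mean(values)
        else:
            raise ValueError(f"unsupported agg_method {method!r} for {column}")
    return result


print(roll_up(BLOCK_GROUPS, AGG_METHOD))
# {'POPCY': 4000, 'DWLPY': 1600, 'AGEPYMED': 35.0, 'INCPYPCAP': 40000.0}
```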

Filtering the variables by description shows several that are related to population, so this is the dataset we are looking for:


In [8]:
vdf[vdf['description'].str.contains('pop', case=False, na=False)]


Out[8]:
slug name description db_type agg_method column_name variable_group_id dataset_id id
1 POPCY_4534fac4 Total Population Population (2019A) INTEGER SUM POPCY None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
2 POPCYGRP_3033ef2e POPCYGRP Population in Group Quarters (2019A) INTEGER SUM POPCYGRP None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
3 POPCYGRPI_1e42899 POPCYGRPI Institutional Group Quarters Population (2019A) INTEGER SUM POPCYGRPI None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
4 AGECY0004_aaae373a AGECY0004 Population age 0-4 (2019A) INTEGER SUM AGECY0004 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
5 AGECY0509_d2d4896c AGECY0509 Population age 5-9 (2019A) INTEGER SUM AGECY0509 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
6 AGECY1014_b09611e AGECY1014 Population age 10-14 (2019A) INTEGER SUM AGECY1014 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
7 AGECY1519_7373df48 AGECY1519 Population age 15-19 (2019A) INTEGER SUM AGECY1519 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
8 AGECY2024_32919d33 AGECY2024 Population age 20-24 (2019A) INTEGER SUM AGECY2024 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
9 AGECY2529_4aeb2365 AGECY2529 Population age 25-29 (2019A) INTEGER SUM AGECY2529 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
10 AGECY3034_9336cb17 AGECY3034 Population age 30-34 (2019A) INTEGER SUM AGECY3034 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
11 AGECY3539_eb4c7541 AGECY3539 Population age 35-39 (2019A) INTEGER SUM AGECY3539 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
12 AGECY4044_41a06569 AGECY4044 Population age 40-44 (2019A) INTEGER SUM AGECY4044 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
13 AGECY4549_39dadb3f AGECY4549 Population age 45-49 (2019A) INTEGER SUM AGECY4549 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
14 AGECY5054_e007334d AGECY5054 Population age 50-54 (2019A) INTEGER SUM AGECY5054 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
15 AGECY5559_987d8d1b AGECY5559 Population age 55-59 (2019A) INTEGER SUM AGECY5559 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
16 AGECY6064_d99fcf60 AGECY6064 Population age 60-64 (2019A) INTEGER SUM AGECY6064 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
17 AGECY6569_a1e57136 AGECY6569 Population age 65-69 (2019A) INTEGER SUM AGECY6569 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
18 AGECY7074_78389944 AGECY7074 Population age 70-74 (2019A) INTEGER SUM AGECY7074 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
19 AGECY7579_422712 AGECY7579 Population age 75-79 (2019A) INTEGER SUM AGECY7579 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
20 AGECY8084_a7c395dd AGECY8084 Population age 80-84 (2019A) INTEGER SUM AGECY8084 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
21 AGECYGT85_ac46767d AGECYGT85 Population age 85+ (2019A) INTEGER SUM AGECYGT85 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
23 SEXCYMAL_8ee6ade5 SEXCYMAL Population male (2019A) INTEGER SUM SEXCYMAL carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
24 SEXCYFEM_91d8b796 SEXCYFEM Population female (2019A) INTEGER SUM SEXCYFEM carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
32 HISCYHISP_e62d7c2e HISCYHISP Population Hispanic (2019A) INTEGER SUM HISCYHISP carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
38 AGECYGT15_7d84cd34 AGECYGT15 Population Age 15+ (2019A) INTEGER SUM AGECYGT15 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
39 EDUCYLTGR9_ed0362fa EDUCYLTGR9 Pop 25+ less than 9th grade (2019A) INTEGER SUM EDUCYLTGR9 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
40 EDUCYSHSCH_7a88e398 EDUCYSHSCH Pop 25+ 9th-12th grade no diploma (2019A) INTEGER SUM EDUCYSHSCH carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
41 EDUCYHSCH_a7a81733 EDUCYHSCH Pop 25+ HS graduate (2019A) INTEGER SUM EDUCYHSCH carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
42 EDUCYSCOLL_3840e65b EDUCYSCOLL Pop 25+ college no diploma (2019A) INTEGER SUM EDUCYSCOLL carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
43 EDUCYASSOC_dcd76160 EDUCYASSOC Pop 25+ Associate degree (2019A) INTEGER SUM EDUCYASSOC carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
44 EDUCYBACH_d7b78049 EDUCYBACH Pop 25+ Bachelor's degree (2019A) INTEGER SUM EDUCYBACH carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
45 EDUCYGRAD_c58943fb EDUCYGRAD Pop 25+ graduate or prof school degree (2019A) INTEGER SUM EDUCYGRAD carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
46 AGECYGT25_56a99ef7 AGECYGT25 Population Age 25+ (2019A) INTEGER SUM AGECYGT25 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
84 LBFCYPOP16_75363c6f LBFCYPOP16 Population Age 16+ (2019A) INTEGER SUM LBFCYPOP16 carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
85 LBFCYARM_c8f45b67 LBFCYARM Pop 16+ in Armed Forces (2019A) INTEGER SUM LBFCYARM carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
86 LBFCYEMPL_1902fd90 LBFCYEMPL Pop 16+ civilian employed (2019A) INTEGER SUM LBFCYEMPL carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
87 LBFCYUNEM_befc2d4 LBFCYUNEM Pop 16+ civilian unemployed (2019A) INTEGER SUM LBFCYUNEM carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
88 LBFCYNLF_803bfa0d LBFCYNLF Pop 16+ not in labor force (2019A) INTEGER SUM LBFCYNLF carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
90 LBFCYLBF_1d3c03ed LBFCYLBF Population In Labor Force (2019A) INTEGER SUM LBFCYLBF carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
101 POPPY_24dbbb56 Total population Population (2024A) INTEGER SUM POPPY None carto-do.ags.demographics_sociodemographics_us... carto-do.ags.demographics_sociodemographics_us...
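The descriptions end in a year tag such as `(2019A)` or `(2024A)`, so the population variables come in two vintages (for example `POPCY` and `POPPY`). When choosing columns for enrichment it can help to group them by that tag. The sketch below does this with a regular expression over a few `(column_name, description)` pairs taken from Out[8], plus one untagged variable from Out[6]. The meaning of the `A` suffix is not explained in the output shown here, so the sketch treats the tag as an opaque label.

```python
import re
from collections import defaultdict

# (column_name, description) pairs taken from Out[8], plus one untagged variable from Out[6].
VARIABLES = [
    ("POPCY", "Population (2019A)"),
    ("SEXCYMAL", "Population male (2019A)"),
    ("SEXCYFEM", "Population female (2019A)"),
    ("LBFCYPOP16", "Population Age 16+ (2019A)"),
    ("POPPY", "Population (2024A)"),
    ("HOOEXMED", "Median Value of Owner Occupied Housing Units"),
]

# Matches a trailing "(YYYYA)" tag and captures the text inside the parentheses.
YEAR_TAG = re.compile(r"\((\d{4}A)\)\s*$")

by_year = defaultdict(list)
for column, description in VARIABLES:
    match = YEAR_TAG.search(description)
    tag = match.group(1) if match else "untagged"
    by_year[tag].append(column)

print(dict(by_year))
# {'2019A': ['POPCY', 'SEXCYMAL', 'SEXCYFEM', 'LBFCYPOP16'], '2024A': ['POPPY'], 'untagged': ['HOOEXMED']}
```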

Dataset and variables metadata

The Data Observatory catalog is not only a repository of curated spatial datasets; it also contains valuable information that helps you better understand the underlying data of every dataset, so you can make an informed decision about which data best fits your problem.

Some of the augmented metadata available for each dataset in the catalog:

  • head and tail methods to get a glimpse of the actual data. They help you understand the available columns, data types, etc., so you can start modelling your problem right away.
  • geom_coverage to visualize on a map the geographical coverage of the data in the Dataset.
  • counts, fields_by_type and a full describe method with stats of the actual values in the dataset, such as: average, stdev, quantiles, min, max, median for each of the variables of the dataset.

You don't need a subscription to a dataset to query its augmented metadata; it is publicly available to anyone exploring the Data Observatory catalog.

Let's review some of that information, starting with a glimpse of the first or last ten rows of the dataset's actual data:


In [9]:
from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_a7e14220')

In [10]:
dataset.head()


Out[10]:
DWLCY DWLPY HHDCY HHDPY POPCY POPPY geoid VPHCY1 do_date AGECYMED ... MARCYDIVOR MARCYNEVER MARCYWIDOW RCHCYAMNHS RCHCYASNHS RCHCYBLNHS RCHCYHANHS RCHCYMUNHS RCHCYOTNHS RCHCYWHNHS
0 1057 1112 932 986 1500 1648 040130405071 442 2020-01-01 00:00:00+00:00 77.40 ... 149 4 228 0 11 20 0 25 0 1317
1 1964 2069 1774 1877 2595 2868 040130405072 1049 2020-01-01 00:00:00+00:00 76.88 ... 414 160 699 0 74 68 7 55 0 2167
2 1049 1101 897 933 1585 1716 040130610182 460 2020-01-01 00:00:00+00:00 69.88 ... 31 217 246 2 55 43 9 26 0 1313
3 1084 1137 910 940 1503 1616 040138175002 392 2020-01-01 00:00:00+00:00 71.44 ... 191 79 268 8 24 38 0 8 0 1290
4 682 706 574 591 980 1039 040190043241 244 2020-01-01 00:00:00+00:00 72.38 ... 30 44 195 3 9 0 0 0 0 902
5 880 910 840 869 1249 1284 060133511032 539 2020-01-01 00:00:00+00:00 76.75 ... 160 40 319 0 136 19 2 12 5 1024
6 1467 1534 1314 1467 1658 1800 060590995101 831 2020-01-01 00:00:00+00:00 74.58 ... 423 136 496 3 226 10 1 16 0 1269
7 704 753 693 730 1078 1176 060610210391 338 2020-01-01 00:00:00+00:00 73.86 ... 117 63 215 5 33 7 0 9 0 986
8 1582 1691 1553 1650 2540 2795 060610236001 818 2020-01-01 00:00:00+00:00 68.80 ... 406 45 301 5 168 26 3 19 0 2183
9 1186 1268 1163 1234 1980 2176 060610236002 415 2020-01-01 00:00:00+00:00 68.59 ... 253 60 223 5 97 22 1 10 0 1750

10 rows × 110 columns

Alternatively, you can get the last ten rows with dataset.tail().

An overview of the dataset's geographical coverage:


In [11]:
dataset.geom_coverage()


Out[11]:
[Map: geographical coverage of the dataset]

Some stats about the dataset:

In [12]:
dataset.counts()

Out[12]:
rows                    217182.0
cells                 23890020.0
null_cells                   0.0
null_cells_percent           0.0
dtype: float64

In [13]:
dataset.fields_by_type()

Out[13]:
float          5
string         2
integer      102
timestamp      1
dtype: int64
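The two outputs above can be cross-checked with a little arithmetic: the field types add up to the 110 columns reported by head(), rows times columns gives the cells reported by counts(), and the float plus integer fields (107) appear to be the numeric columns summarised by describe(). A minimal sketch using only the numbers printed above:

```python
# Values copied from the fields_by_type() and counts() outputs above.
fields_by_type = {"float": 5, "string": 2, "integer": 102, "timestamp": 1}
rows = 217182

# Every column in the table, whatever its type.
total_columns = sum(fields_by_type.values())
# Numeric columns only (the ones a describe() summary can compute stats for).
numeric_columns = fields_by_type["float"] + fields_by_type["integer"]
# One cell per row and column.
cells = rows * total_columns

print(total_columns, numeric_columns, cells)
```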
    
    
In [14]:
dataset.describe()

Out[14]:
POPCY POPCYGRP POPCYGRPI AGECY0004 AGECY0509 AGECY1014 AGECY1519 AGECY2024 AGECY2529 AGECY3034 ... DWLCYVACNT DWLCYRENT DWLCYOWNED POPPY HHDPY DWLPY AGEPYMED INCPYPCAP INCPYAVEHH INCPYMEDHH
avg 1.520470e+03 3.725678e+01 1.798816e+01 9.047040e+01 9.341024e+01 9.581574e+01 9.726450e+01 1.000845e+02 1.083624e+02 1.044326e+02 ... 4.957671e+01 2.116296e+02 3.835577e+02 1.568307e+03 6.075594e+02 6.718555e+02 3.989449e+01 4.292896e+04 1.073098e+05 7.933328e+04
max 6.710000e+04 1.975200e+04 1.205300e+04 5.393000e+03 5.294000e+03 5.195000e+03 7.606000e+03 1.480400e+04 5.767000e+03 5.616000e+03 ... 6.547000e+03 1.005700e+04 2.367600e+04 7.584500e+04 2.811500e+04 3.264000e+04 8.750000e+01 3.824975e+06 1.112720e+07 3.500000e+05
min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
sum 3.302187e+08 8.091501e+06 3.906704e+06 1.964854e+07 2.028702e+07 2.080945e+07 2.112410e+07 2.173655e+07 2.353436e+07 2.268087e+07 ... 1.076717e+07 4.596213e+07 8.330182e+07 3.406080e+08 1.319510e+08 1.459149e+08 8.664364e+06 9.323398e+09 2.330576e+10 1.722976e+10
range 6.710000e+04 1.975200e+04 1.205300e+04 5.393000e+03 5.294000e+03 5.195000e+03 7.606000e+03 1.480400e+04 5.767000e+03 5.616000e+03 ... 6.547000e+03 1.005700e+04 2.367600e+04 7.584500e+04 2.811500e+04 3.264000e+04 8.750000e+01 3.824975e+06 1.112720e+07 3.500000e+05
stdev 1.063417e+03 2.428693e+02 1.582057e+02 8.044792e+01 8.338915e+01 8.351758e+01 1.117828e+02 1.244125e+02 9.642758e+01 9.358961e+01 ... 9.849786e+01 2.353261e+02 3.163312e+02 1.141981e+03 4.138545e+02 4.467692e+02 7.567177e+00 3.178870e+04 7.835131e+04 4.262018e+04
q1 8.500000e+02 0.000000e+00 0.000000e+00 4.400000e+01 4.400000e+01 4.600000e+01 4.500000e+01 4.300000e+01 5.000000e+01 4.900000e+01 ... 1.100000e+01 6.000000e+01 1.820000e+02 8.670000e+02 3.440000e+02 3.840000e+02 3.407000e+01 2.168000e+04 5.536100e+04 4.701800e+04
q3 1.454000e+03 0.000000e+00 0.000000e+00 8.300000e+01 8.600000e+01 8.900000e+01 8.700000e+01 8.600000e+01 9.800000e+01 9.500000e+01 ... 3.400000e+01 1.780000e+02 3.750000e+02 1.485000e+03 5.810000e+02 6.480000e+02 4.101000e+01 4.058200e+04 1.012720e+05 7.908300e+04
median 1.125000e+03 0.000000e+00 0.000000e+00 6.200000e+01 6.300000e+01 6.500000e+01 6.400000e+01 6.200000e+01 7.100000e+01 6.900000e+01 ... 2.000000e+01 1.080000e+02 2.740000e+02 1.143000e+03 4.520000e+02 5.040000e+02 3.771000e+01 3.056300e+04 7.671300e+04 6.212200e+04
interquartile_range 6.040000e+02 0.000000e+00 0.000000e+00 3.900000e+01 4.200000e+01 4.300000e+01 4.200000e+01 4.300000e+01 4.800000e+01 4.600000e+01 ... 2.300000e+01 1.180000e+02 1.930000e+02 6.180000e+02 2.370000e+02 2.640000e+02 6.940000e+00 1.890200e+04 4.591100e+04 3.206500e+04

10 rows × 107 columns
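Two of the rows in the describe() output can be derived from the others: range is max minus min, and interquartile_range is q3 minus q1. A small sketch that checks this for two of the columns above (POPCY and AGEPYMED), with the values copied by hand from the table:

```python
# Summary statistics copied from the describe() output above.
stats = {
    "POPCY": {"min": 0.0, "max": 67100.0, "q1": 850.0, "q3": 1454.0},
    "AGEPYMED": {"min": 0.0, "max": 87.5, "q1": 34.07, "q3": 41.01},
}


def derived_stats(s):
    """Return (range, interquartile_range) computed from min/max and the quartiles."""
    return s["max"] - s["min"], s["q3"] - s["q1"]


for column, s in stats.items():
    value_range, iqr = derived_stats(s)
    print(f"{column}: range={value_range:g}, interquartile_range={iqr:.2f}")
```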

Every Dataset instance in the catalog contains other useful metadata:

  • slug: A short ID
  • name and description: Free text attributes
  • country
  • geography: Every dataset is related to a Geography instance
  • category
  • provider
  • data source
  • lang
  • temporal aggregation
  • time coverage
  • update frequency
  • version
  • is_public_data: whether you need a license to use the dataset for enrichment purposes or not
    
    
In [15]:
dataset.to_dict()

Out[15]:
{'slug': 'ags_sociodemogr_a7e14220',
 'name': 'Sociodemographics - United States of America (Census Block Group)',
 'description': 'Census and ACS sociodemographic data estimated for the current year and data projected to five years. Projected fields are general aggregates (total population, total households, median age, avg income etc.)',
 'category_id': 'demographics',
 'country_id': 'usa',
 'data_source_id': 'sociodemographics',
 'provider_id': 'ags',
 'geography_name': 'Census Block Group - United States of America',
 'geography_description': None,
 'temporal_aggregation': 'yearly',
 'time_coverage': None,
 'update_frequency': 'yearly',
 'is_public_data': False,
 'lang': 'eng',
 'version': '2020',
 'category_name': 'Demographics',
 'provider_name': 'Applied Geographic Solutions',
 'geography_id': 'carto-do.ags.geography_usa_blockgroup_2015',
 'id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020'}

Each variable in the dataset also carries some interesting metadata:

  • id
  • slug: A short ID
  • name and description
  • column_name: Actual column name in the table that contains the data
  • db_type: SQL type in the database
  • dataset_id
  • agg_method: Aggregation method used
  • temporal aggregation and time coverage

Variables are the most important asset in the catalog: when exploring datasets in the Data Observatory catalog, it's very important that you clearly understand which variables are available to enrich your own data.

For each Variable in each dataset, the Data Observatory provides (as it does with datasets) a set of methods and attributes to understand its underlying data.

Some of them are:

  • head and tail methods to get a glimpse of the actual data and start modelling your problem right away.
  • counts, quantiles and a full describe method with statistics on the actual values of the variable, such as average, stdev, quantiles, min, max and median.
  • a histogram plot with the distribution of the variable's values.

Let's look at some of that augmented metadata for the total population variable (POPCY) of the AGS sociodemographics dataset.

    
    
In [16]:
from cartoframes.data.observatory import Variable
variable = Variable.get('POPCY_4534fac4')
variable

Out[16]:
<Variable.get('POPCY_4534fac4')> #'Population (2019A)'

In [17]:
variable.to_dict()

Out[17]:
{'slug': 'POPCY_4534fac4',
 'name': 'Total Population',
 'description': 'Population (2019A)',
 'db_type': 'INTEGER',
 'agg_method': 'SUM',
 'column_name': 'POPCY',
 'variable_group_id': None,
 'dataset_id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020',
 'id': 'carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020.POPCY'}
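The id, dataset_id and column_name fields above are related: in this example, the variable id is the dataset id followed by a dot and the column name. That pattern is inferred from this single output rather than from documented behaviour, so treat the short sketch below as an assumption:

```python
# Fields copied from variable.to_dict() above.
variable_dict = {
    "column_name": "POPCY",
    "dataset_id": "carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020",
    "id": "carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020.POPCY",
}

# Assumption inferred from this one example: id == dataset_id + "." + column_name
expected_id = f"{variable_dict['dataset_id']}.{variable_dict['column_name']}"
print(expected_id == variable_dict["id"])

# The other way round: split the id on its last dot to recover both parts.
dataset_part, column_part = variable_dict["id"].rsplit(".", 1)
print(dataset_part, column_part)
```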

There are also some utility methods to understand the underlying data of each variable:

    
    
In [18]:
variable.head()

Out[18]:
0    1500
1    2595
2    1585
3    1503
4     980
5    1249
6    1658
7    1078
8    2540
9    1980
dtype: int64

In [19]:
variable.counts()

Out[19]:
all                 217182.000000
null                     0.000000
zero                   299.000000
extreme               9073.000000
distinct              6756.000000
outliers             26998.000000
null_percent             0.000000
zero_percent             0.137673
extreme_percent          0.041776
distinct_percent         3.110755
outliers_percent         0.124310
dtype: float64
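The *_percent rows in counts() can be recomputed from the raw counts and the all total. Doing so with the values above shows that zero_percent and distinct_percent match count / all × 100, while extreme_percent and outliers_percent match the plain ratio count / all (roughly 4.2% and 12.4% of the rows), so it's worth checking the scale of each field before interpreting it. A minimal sketch:

```python
# Raw counts copied from variable.counts() above.
counts = {"all": 217182, "null": 0, "zero": 299, "extreme": 9073,
          "distinct": 6756, "outliers": 26998}

# Share of all rows for each kind of count.
ratios = {name: counts[name] / counts["all"]
          for name in ("null", "zero", "extreme", "distinct", "outliers")}

for name, ratio in ratios.items():
    # Print both scales so each can be compared with the *_percent rows above.
    print(f"{name:9s} ratio={ratio:.6f} percent={ratio * 100:.6f}")
```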
    
    
In [20]:
variable.quantiles()

Out[20]:
q1                      850
q3                     1454
median                 1125
interquartile_range     604
dtype: int64
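Quartiles are often used to flag unusually small or large values. One common convention, Tukey's fences, treats values below q1 - 1.5 × IQR or above q3 + 1.5 × IQR as outliers. This guide doesn't say how counts() defines its outliers figure, so the sketch below only illustrates the convention with the quartiles above; it is not necessarily how the Data Observatory computes it:

```python
# Quartiles copied from variable.quantiles() above.
q1, q3 = 850, 1454


def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of Tukey's outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(q1, q3)
print(f"IQR={q3 - q1}, lower fence={lower}, upper fence={upper}")

# POPCY values from the head() output above that fall outside the fences.
sample = [1500, 2595, 1585, 1503, 980, 1249, 1658, 1078, 2540, 1980]
print([v for v in sample if v < lower or v > upper])
```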
    
    
In [21]:
variable.histogram()

[Figure: histogram of the distribution of POPCY values]
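A histogram groups values into bins and counts how many fall into each one. The sketch below shows equal-width binning on the ten POPCY values returned by variable.head() above; the number of bins is an arbitrary choice here, and histogram() may bin the full column differently:

```python
def equal_width_histogram(values, bins):
    """Count values per equal-width bin between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: put them in the first bin
        else:
            # A value equal to the maximum would map to bin `bins`; clamp it to the last bin.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


sample = [1500, 2595, 1585, 1503, 980, 1249, 1658, 1078, 2540, 1980]
edges, counts = equal_width_histogram(sample, bins=4)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:7.2f} to {right:7.2f}: {'#' * n}")
```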
    
    
In [22]:
variable.describe()

Out[22]:
avg                    1.520470e+03
max                    6.710000e+04
min                    0.000000e+00
sum                    3.302187e+08
range                  6.710000e+04
stdev                  1.063417e+03
q1                     8.500000e+02
q3                     1.454000e+03
median                 1.125000e+03
interquartile_range    6.040000e+02
dtype: float64

Subscribe to a Dataset in the catalog

Once you have explored the catalog and found a dataset with the variables you need for your analysis and the right spatial resolution, check its is_public_data attribute to know whether you can use it directly from CARTOframes or you first need to subscribe to a license.

Subscriptions to datasets allow you to use them from CARTOframes to enrich your own data or to download them. See the enrichment guides for more information.

Let's check the dataset and geography from our previous example:

    
    
In [23]:
dataset = Dataset.get('ags_sociodemogr_a7e14220')

In [24]:
dataset.is_public_data

Out[24]:
False

In [25]:
from cartoframes.data.observatory import Geography
geography = Geography.get(dataset.geography)

In [26]:
geography.is_public_data

Out[26]:
False

Neither the dataset nor the geography is public data, which means you need a subscription to use them to enrich your own data.
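The rule can be written as a small helper: anything whose is_public_data is False needs a subscription before you can use it for enrichment. This is a hypothetical, pure-Python sketch that works on plain dictionaries (such as to_dict() output); the geography entry reuses the geography_id from the dataset's to_dict() output and the False value returned by geography.is_public_data:

```python
def needs_subscription(entities):
    """Return the ids of catalog entities that are not public data (hypothetical helper)."""
    return [entity["id"] for entity in entities if not entity["is_public_data"]]


# Values taken from the outputs above.
entities = [
    {"id": "carto-do.ags.demographics_sociodemographics_usa_blockgroup_2015_yearly_2020",
     "is_public_data": False},  # dataset.is_public_data
    {"id": "carto-do.ags.geography_usa_blockgroup_2015",
     "is_public_data": False},  # geography.is_public_data
]

print(needs_subscription(entities))
```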

To subscribe to data in the Data Observatory catalog, you need a CARTO account with access to the Data Observatory.

    
    
In [27]:
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')

In [28]:
dataset.subscribe()

In [29]:
geography.subscribe()
    
    
    
    

A Data Observatory license grants you the right to use the subscribed data for one year. Every dataset or geography that is not public data requires a valid license before you can use it to enrich your own data.
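Since a license covers one year, its expiry falls one calendar year after the subscription date. Below is a hypothetical helper that computes that date with Python's datetime module; treat it as an illustration only, since the authoritative expiry date is the one CARTO reports for your subscription. Subscriptions started on 29 February are assumed here to expire on 28 February:

```python
from datetime import date


def license_expiry(start: date) -> date:
    """Return the date one calendar year after `start` (hypothetical helper)."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: fall back to 28 February.
        return start.replace(year=start.year + 1, day=28)


print(license_expiry(date(2020, 6, 15)))
print(license_expiry(date(2020, 2, 29)))
```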

You can check the current status of your subscriptions directly from the catalog.

    
    
In [30]:
Catalog().subscriptions()

Out[30]:
Datasets: [<Dataset.get('ags_businesscou_df363a87')>, <Dataset.get('ags_crimerisk_e9cfa4d4')>, <Dataset.get('ags_retailpoten_aaf25a8c')>, <Dataset.get('ags_sociodemogr_f510a947')>, <Dataset.get('ags_sociodemogr_a7e14220')>]
Geographies: [<Geography.get('cdb_blockgroup_7753dd51')>, <Geography.get('ags_blockgroup_1c63771c')>]

Conclusion

In this guide you've seen how to explore the Data Observatory catalog to identify dataset variables that you can use to enrich your own data.

You've learned how to:

  • Explore the catalog using nested hierarchical filters.
  • Describe the three main entities in the catalog: Geography, Dataset and their Variables.
  • Look at data and stats taken from the actual repository, to make a more informed decision on which variables to choose.
  • Subscribe to the chosen dataset to get a license that grants the right to enrich your own data.

We also recommend checking out the resources below to learn more about the Data Observatory catalog: