Quickstart

Introduction

Hi! Glad to see you made it to the Quickstart guide! This guide introduces how data scientists can use CARTOframes in spatial analysis workflows. Using simulated Starbucks revenue data, it walks through some common steps a data scientist takes to answer the following question: which stores are performing better than others?

Before you get started, we encourage you to have CARTOframes and Python 3 installed so you can get a feel for the library by using it:


In [1]:
pip install cartoframes

For additional ways to install CARTOframes, check out the Installation Guide.

Spatial analysis scenario

Let's say you are a data scientist working for Starbucks and you want to better understand why some stores in Brooklyn, New York, perform better than others.

To begin, let's outline a workflow:

  • Get and explore your company's data
  • Create areas of influence for your stores
  • Enrich your data with demographic data
  • And finally, share the results of your analysis with your team

Let's get started!

Get and explore your company's data

We are going to use a dataset that contains the location of each Starbucks store and its annual revenue.

As a first exploratory step, read it into a Jupyter Notebook using pandas.


In [2]:
import pandas as pd

stores_df = pd.read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn.csv')
stores_df.head()


Out[2]:
name address revenue
0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772
1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418
2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699
3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676
4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411

To display your stores as points on a map, you first have to convert the address column into geometries. This process is called geocoding and CARTO provides a straightforward way to do it (you can learn more about it in the Location Data Services guide).

To geocode, you have to set your CARTO credentials. If you aren't sure about your API key, check the Authentication guide to learn how to get it. If you want to see the geocoded result without logging in, you can get it here.

Note: If you don't have an account yet, you can get a trial, or a free account if you are a student, by signing up here.


In [3]:
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
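
The file passed to set_default_credentials is a small JSON document. As a minimal sketch, assuming the file holds a JSON object with "username" and "api_key" keys (check the Authentication guide for the exact layout), it could be created like this. The helper write_credentials_file and the placeholder values are illustrative and not part of CARTOframes:

```python
import json
import tempfile
from pathlib import Path


def write_credentials_file(path, username, api_key):
    """Write a CARTO credentials file as JSON and return its path."""
    # Assumed layout: a JSON object with "username" and "api_key" keys.
    credentials = {"username": username, "api_key": api_key}
    path = Path(path)
    path.write_text(json.dumps(credentials, indent=2))
    return path


# Placeholder values; replace them with your own username and API key.
# A temporary directory is used so an existing creds.json is not overwritten.
demo_dir = Path(tempfile.mkdtemp())
creds_path = write_credentials_file(demo_dir / "creds.json", "your_username", "your_api_key")
print(creds_path.read_text())
```

Keeping the key in a file rather than in the notebook makes it easier to share the notebook without sharing the key.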

Now that your credentials are set, we are ready to geocode the dataframe. The resulting data will be a GeoDataFrame.


In [4]:
from cartoframes.data.services import Geocoding

stores_gdf, _ = Geocoding().geocode(stores_df, street='address')
stores_gdf.head()


Success! Data geocoded correctly
Out[4]:
the_geom name address revenue gc_status_rel carto_geocode_hash
0 POINT (-73.95746 40.67102) Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 0.91 9212e0e908d8c64d07c6a94827322397
1 POINT (-73.96122 40.57796) 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 0.97 b1bbfe2893914a350193969a682dc1f5
2 POINT (-73.98978 40.61944) 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 0.94 e47cf7b16d6c9b53c63e86a0418add1d
3 POINT (-74.02750 40.63202) Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 0.94 2f21749c02f73116892eb3b6fd5d5738
4 POINT (-74.00098 40.59321) Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 0.94 134c23973313802448365db6235783f9

Done! Now that the stores are geocoded, you will notice a new column named the_geom has been added. This column stores the geographic location of each store and is used to plot each location on the map.
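
The geocoded output in Out[4] also carries a gc_status_rel column, which appears to be a match-relevance score for each address (assumed here to range from 0 to 1, with higher meaning a more reliable match). A minimal sketch of flagging rows for manual review, using plain Python on values copied from Out[4]; the 0.93 threshold is a hypothetical choice, not a CARTO recommendation:

```python
# Rows copied from Out[4]; only the columns used here are kept.
geocoded_rows = [
    {"name": "Franklin Ave & Eastern Pkwy", "gc_status_rel": 0.91},
    {"name": "607 Brighton Beach Ave", "gc_status_rel": 0.97},
    {"name": "65th St & 18th Ave", "gc_status_rel": 0.94},
    {"name": "Bay Ridge Pkwy & 3rd Ave", "gc_status_rel": 0.94},
    {"name": "Caesar's Bay Shopping Center", "gc_status_rel": 0.94},
]

MIN_RELEVANCE = 0.93  # hypothetical threshold for "good enough" matches

# Collect the stores whose geocoding result falls below the threshold.
needs_review = [
    row["name"] for row in geocoded_rows if row["gc_status_rel"] < MIN_RELEVANCE
]
print(needs_review)
```

On the real GeoDataFrame the same check would be a boolean filter on the gc_status_rel column.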

You can quickly visualize your geocoded dataframe using the Map and Layer classes. Check out the Visualization guide to learn more about the visualization capabilities inside of CARTOframes.


In [5]:
from cartoframes.viz import Map, Layer

Map(Layer(stores_gdf))


Out[5]:

    Great! You have a map!

With the stores plotted on the map, you now have a better sense of where each one is. To continue your exploration, you want to know which stores earn the most yearly revenue. To do this, you can use the size_continuous_style visualization style:

    
    
    In [6]:
    from cartoframes.viz import Map, Layer, size_continuous_style
    
    Map(Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)'))
    
    
    
    
    Out[6]:

      Good job! By using the size continuous visualization style you can see right away where the stores with higher revenue are. By default, visualization styles also provide a popup with the mapped value and an appropriate legend.

      Create your areas of influence

Similar to geocoding, there is a straightforward method for creating isochrones to define areas of influence around each store. An isochrone is a polygon that encloses the area reachable from a point within a given travel time.

      For this analysis, let's create isochrones for each store that cover the area within a 15 minute walk.

      To do this you will use the Isolines data service:

      
      
      In [7]:
      from cartoframes.data.services import Isolines
      
      isochrones_gdf, _ = Isolines().isochrones(stores_gdf, [15*60], mode='walk')
      isochrones_gdf.head()
      
      
      
      
      Success! Isolines created correctly
      
      Out[7]:
      source_id data_range the_geom
      0 0 900 MULTIPOLYGON (((-73.96743 40.67345, -73.96683 ...
      1 1 900 MULTIPOLYGON (((-73.97017 40.57732, -73.96957 ...
      2 2 900 MULTIPOLYGON (((-73.99781 40.62418, -73.99755 ...
      3 3 900 MULTIPOLYGON (((-74.03678 40.63431, -74.03618 ...
      4 4 900 MULTIPOLYGON (((-74.00451 40.59723, -74.00391 ...
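
The range in the call above is expressed in seconds, which is why a 15 minute walk is written as 15*60 and shows up as 900 in the data_range column. A minimal sketch of building several ranges at once; minutes_to_seconds is a hypothetical helper and not part of CARTOframes:

```python
def minutes_to_seconds(minutes):
    """Convert a list of travel times in minutes to seconds."""
    return [m * 60 for m in minutes]


# Travel times for 5, 10 and 15 minute walks.
walk_ranges = minutes_to_seconds([5, 10, 15])
print(walk_ranges)
```

A list like this could presumably be passed in place of [15*60] to request one isochrone per range for each store.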
      
      
      In [8]:
      stores_map = Map([
          Layer(isochrones_gdf),
          Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)')
      ])
      
      stores_map
      
      
      
      
      Out[8]:

        There they are! To learn more about creating isochrones and isodistances check out the Location Data Services guide.

        Note: You will see how to publish a map in the last section. If you already want to publish this map, you can do it by calling stores_map.publish('starbucks_isochrones', password=None).

        Enrich your data with demographic data

        Now that you have the area of influence calculated for each store, let's take a look at how to augment the result with population information to help better understand a store's average revenue per person.

        Note: To be able to use the Enrichment functions you need an enterprise CARTO account with Data Observatory 2.0 enabled. Contact your CSM or contact us at sales@carto.com for more information.

First, let's find the demographic variable we need. We will use the Catalog class, which can be filtered by country and category. In our case, we have to look for USA demographics datasets. Let's see which public ones are available.

        
        
        In [9]:
        from cartoframes.data.observatory import Catalog
        
        datasets_df = Catalog().country('usa').category('demographics').datasets.to_dataframe()
        datasets_df[datasets_df['is_public_data'] == True]
        
        
        
        
        Out[9]:
        slug name description category_id country_id data_source_id provider_id geography_name geography_description temporal_aggregation time_coverage update_frequency is_public_data lang version category_name provider_name geography_id id
        2 acs_sociodemogr_ecbce31e Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs County - United States of America Shoreline clipped TIGER/Line boundaries. More ... 3yrs [2007-01-01, 2010-01-01) yearly True eng 20072009 Demographics American Community Survey carto-do-public-data.carto.geography_usa_count... carto-do-public-data.usa_acs.demographics_soci...
        3 acs_sociodemogr_516e1d44 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs County - United States of America Shoreline clipped TIGER/Line boundaries. More ... 3yrs [2011-01-01, 2014-01-01) yearly True eng 20112013 Demographics American Community Survey carto-do-public-data.carto.geography_usa_count... carto-do-public-data.usa_acs.demographics_soci...
        4 acs_sociodemogr_477ca600 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs County - United States of America Shoreline clipped TIGER/Line boundaries. More ... yearly [2009-01-01, 2010-01-01) yearly True eng 2009 Demographics American Community Survey carto-do-public-data.carto.geography_usa_count... carto-do-public-data.usa_acs.demographics_soci...
        6 acs_sociodemogr_5f00d4dc Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs Core-based Statistical Area - United States of... Shoreline clipped TIGER/Line boundaries. More ... 5yrs [2007-01-01, 2012-01-01) yearly True eng 20072011 Demographics American Community Survey carto-do-public-data.carto.geography_usa_cbsa_... carto-do-public-data.usa_acs.demographics_soci...
        11 acs_sociodemogr_18e867ac Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs County - United States of America Shoreline clipped TIGER/Line boundaries. More ... 5yrs [2011-01-01, 2016-01-01) yearly True eng 20112015 Demographics American Community Survey carto-do-public-data.carto.geography_usa_count... carto-do-public-data.usa_acs.demographics_soci...
        ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
        365 acs_sociodemogr_1b4fe990 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs State - United States of America Shoreline clipped TIGER/Line boundaries. More ... 5yrs [2006-01-01, 2011-01-01) yearly True eng 20062010 Demographics American Community Survey carto-do-public-data.carto.geography_usa_state... carto-do-public-data.usa_acs.demographics_soci...
        366 acs_sociodemogr_5bdf6842 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs State - United States of America Shoreline clipped TIGER/Line boundaries. More ... 3yrs [2011-01-01, 2014-01-01) yearly True eng 20112013 Demographics American Community Survey carto-do-public-data.carto.geography_usa_state... carto-do-public-data.usa_acs.demographics_soci...
        367 acs_sociodemogr_5128f0b6 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs State - United States of America Shoreline clipped TIGER/Line boundaries. More ... 5yrs [2007-01-01, 2012-01-01) yearly True eng 20072011 Demographics American Community Survey carto-do-public-data.carto.geography_usa_state... carto-do-public-data.usa_acs.demographics_soci...
        368 acs_sociodemogr_880f964f Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs School District (elementary) - United States o... Shoreline clipped TIGER/Line boundaries. More ... yearly [2015-01-01, 2016-01-01) yearly True eng 2015 Demographics American Community Survey carto-do-public-data.carto.geography_usa_schoo... carto-do-public-data.usa_acs.demographics_soci...
        369 acs_sociodemogr_ff08a6d9 Sociodemographics - United States of America (... The American Community Survey (ACS) is an ongo... demographics usa sociodemographics usa_acs School District (elementary) - United States o... Shoreline clipped TIGER/Line boundaries. More ... yearly [2014-01-01, 2015-01-01) yearly True eng 2014 Demographics American Community Survey carto-do-public-data.carto.geography_usa_schoo... carto-do-public-data.usa_acs.demographics_soci...

        321 rows × 19 columns

Nice! Let's take the dataset with the slug acs_sociodemogr_b758e778, which has aggregated data from 2013 to 2018, and check which of its variables have data about the total population.

        
        
        In [10]:
        from cartoframes.data.observatory import Dataset
        
        dataset = Dataset.get('acs_sociodemogr_b758e778')
        variables_df = dataset.variables.to_dataframe()
        variables_df[variables_df['description'].str.contains('total population', case=False, na=False)]
        
        
        
        
        Out[10]:
        slug name description db_type agg_method column_name variable_group_id dataset_id id
        2 total_pop_3cf008b3 Total Population Total Population. The total number of all peop... FLOAT SUM total_pop None carto-do-public-data.usa_acs.demographics_soci... carto-do-public-data.usa_acs.demographics_soci...
        68 income_per_capi_8a9352e0 Income per capita Per Capita Income in the past 12 Months. Per c... FLOAT AVG income_per_capita None carto-do-public-data.usa_acs.demographics_soci... carto-do-public-data.usa_acs.demographics_soci...

        We can see the variable that contains the total population is the one with the slug total_pop_3cf008b3. Now we are ready to enrich our areas of influence with that variable.

        
        
        In [11]:
        from cartoframes.data.observatory import Variable
        from cartoframes.data.observatory import Enrichment
        
        variable = Variable.get('total_pop_3cf008b3')
        
        isochrones_gdf = Enrichment().enrich_polygons(isochrones_gdf, [variable])
        isochrones_gdf.head()
        
        
        
        
        Out[11]:
        source_id data_range the_geom total_pop
        0 0 900 MULTIPOLYGON (((-73.96743 40.67345, -73.96683 ... 39029.654005
        1 1 900 MULTIPOLYGON (((-73.97017 40.57732, -73.96957 ... 32309.490606
        2 2 900 MULTIPOLYGON (((-73.99781 40.62418, -73.99755 ... 29685.123333
        3 3 900 MULTIPOLYGON (((-74.03678 40.63431, -74.03618 ... 28988.433349
        4 4 900 MULTIPOLYGON (((-74.00451 40.59723, -74.00391 ... 6155.559780

        Great! Let's see the result on a map:

        
        
        In [12]:
        from cartoframes.viz import color_continuous_style
        
        Map([
          Layer(isochrones_gdf, color_continuous_style('total_pop'), title='Total Population'),
          Layer(stores_gdf, size_continuous_style('revenue', size_range=[10,40]), title='Annual Revenue ($)') 
        ])
        
        
        
        
        Out[12]:

At this stage, we could say that the store on the right of the map performs better than the others: its area of influence has the lowest population, yet its revenue is not the lowest. This insight helps us decide which stores to focus on in further analyses.
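
The comparison behind this observation can be made explicit by dividing each store's revenue by the population of its area of influence. A minimal sketch in plain Python, using only the five rows shown in Out[4] and Out[11], and assuming that source_id in the isochrones output matches the store's row index:

```python
# Revenue values copied from Out[4], keyed by store row index.
revenue_by_store = {
    0: 1321040.772,
    1: 1268080.418,
    2: 1248133.699,
    3: 1185702.676,
    4: 1148427.411,
}

# total_pop values copied from Out[11], keyed by source_id.
# Assumption: source_id corresponds to the store's row index.
total_pop_by_source_id = {
    0: 39029.654005,
    1: 32309.490606,
    2: 29685.123333,
    3: 28988.433349,
    4: 6155.559780,
}

# Revenue per person living within a 15 minute walk of each store.
revenue_per_person = {
    source_id: revenue_by_store[source_id] / population
    for source_id, population in total_pop_by_source_id.items()
}

best_source_id = max(revenue_per_person, key=revenue_per_person.get)
for source_id, value in sorted(revenue_per_person.items()):
    print(f"store {source_id}: {value:.2f} revenue per person")
print(f"highest ratio among these rows: store {best_source_id}")
```

This only covers the rows returned by head(); running the same division on the full dataframes would rank every store.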

          To learn more about discovering the data you want, check out the data discovery guide. To learn more about enriching your data check out the data enrichment guide.

          Publish and share your results

          The final step in the workflow is to share this interactive map with your colleagues so they can explore the information on their own. Let's do it!

Let's visualize both layers again and add widgets to them, so people can see graphs of the information and filter it. To do this, we only have to add default_widget=True to the layers.

          
          
          In [13]:
          result_map = Map([
              Layer(
                  isochrones_gdf,
                  color_continuous_style('total_pop', stroke_width=0, opacity=0.7),
                  title='Total Population',
                  default_widget=True
              ),
              Layer(
                  stores_gdf,
                  size_continuous_style('revenue', size_range=[10,40], stroke_color='white'),
                  title='Annual Revenue ($)',
                  default_widget=True
              ) 
          ])
          result_map
          
          
          
          
          Out[13]:

            Cool! Now that you have a small dashboard to play with, let's publish it on CARTO so you are able to share it with anyone. To do this, you just need to call the publish method from the Map class:

            
            
            In [14]:
            result_map.publish('starbucks_analysis', password=None, if_exists='replace')
            
            
            
            
            Out[14]:
            {'id': '607ba79e-5728-4f25-b44a-f01f64ab7bb0',
             'url': 'https://team.carto.com/u/arroyo-carto/kuviz/607ba79e-5728-4f25-b44a-f01f64ab7bb0',
             'name': 'starbucks_analysis',
             'privacy': 'public'}
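
The value shown in Out[14] is a plain dictionary, so the link can be pulled out and passed along to colleagues. A minimal sketch using the dictionary copied from Out[14]; share_message is a hypothetical helper and not part of CARTOframes:

```python
# Dictionary copied from Out[14]; in a notebook this would be the value
# returned by result_map.publish(...).
published = {
    "id": "607ba79e-5728-4f25-b44a-f01f64ab7bb0",
    "url": "https://team.carto.com/u/arroyo-carto/kuviz/607ba79e-5728-4f25-b44a-f01f64ab7bb0",
    "name": "starbucks_analysis",
    "privacy": "public",
}


def share_message(publication):
    """Build a one-line message with the map name, privacy and URL."""
    return f"Map '{publication['name']}' ({publication['privacy']}): {publication['url']}"


print(share_message(published))
```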

To improve performance and reduce the size of your map, we recommend uploading the data to CARTO and using the table names in the layers instead. To upload your data, you just need to call to_carto with your GeoDataFrame:

            
            
            In [15]:
            from cartoframes import to_carto
            
            to_carto(stores_gdf, 'starbucks_stores', if_exists='replace')
            to_carto(isochrones_gdf, 'starbucks_isochrones', if_exists='replace')
            
            
            
            
            Success! Data uploaded to table "starbucks_stores" correctly
            Success! Data uploaded to table "starbucks_isochrones" correctly
            
            Out[15]:
            'starbucks_isochrones'

            Conclusion

Congratulations! You have finished this guide and have a sense of how CARTOframes can speed up your workflow. To continue learning, you can check out a specific Guide, consult the Reference to learn everything about a class or a method, or browse the Examples to see CARTOframes in action.