Location Data Services

Introduction

CARTOframes provides the functionality to use the CARTO Data Services API. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook.

For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform a trade areas analysis by computing isodistances or isochrones programmatically.

Given a set of ten simulated Starbucks store addresses, this guide walks through the use case of finding good location candidates to open an additional store.

Depending on your account plan, some of these location data services are subject to different quota limitations.

Data

This guide uses the same dataset of simulated Starbucks locations used in the other guides. It is read directly from its URL in the Geocoding section below.

Authentication

You must be authenticated to use Location Data Services. For more information about how to authenticate, please read the Login to CARTO Platform guide.


In [1]:
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')

Geocoding

To get started, let's read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There's also a third column that reflects the annual revenue of the store.


In [2]:
import pandas as pd

df = pd.read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn.csv')
df


Out[2]:
name address revenue
0 Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1.321041e+06
1 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1.268080e+06
2 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1.248134e+06
3 Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1.185703e+06
4 Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1.148427e+06
5 Court St & Dean St 167 Court Street,Brooklyn, NY 11201 1.144067e+06
6 Target Gateway T-1401 519 Gateway Dr,Brooklyn, NY 11239 1.021083e+06
7 3rd Ave & 92nd St 9202 Third Avenue,Brooklyn, NY 11209 9.257073e+05
8 Lam Group @ Sheraton Brooklyn 228 Duffield st,Brooklyn, NY 11201 7.657935e+05
9 33-42 Hillel Place 33-42 Hillel Place,Brooklyn, NY 11210 7.492163e+05

Quota consumption

Each time you run Location Data Services, you consume quota. For this reason, we provide the ability to check in advance the amount of credits an operation will consume by using the dry_run parameter when running the service function.

It is also possible to check the available quota by running the available_quota function.


In [3]:
from cartoframes.data.services import Geocoding

geo_service = Geocoding()

_, geo_dry_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)

In [4]:
geo_dry_metadata


Out[4]:
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0}

In [5]:
geo_service.available_quota()


Out[5]:
4977588
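
The two checks above can be combined into a small guard that only runs a service when the available quota covers the dry-run estimate. This is a minimal sketch: has_enough_quota is a hypothetical helper, not part of CARTOframes, and it assumes the dry-run metadata is a dict with a required_quota key, as shown in Out[4].

```python
def has_enough_quota(dry_run_metadata, available_quota):
    """Return True when the available quota covers the dry-run estimate."""
    required = dry_run_metadata.get('required_quota', 0)
    return required <= available_quota


# Values copied from Out[4] and Out[5] above
geo_dry_metadata = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
}
available = 4977588

if has_enough_quota(geo_dry_metadata, available):
    print('Safe to run the geocoding')
else:
    print('Not enough quota for this operation')
```

In the notebook, the second argument would be the value returned by geo_service.available_quota().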

In [6]:
geo_gdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'}
)


Success! Data geocoded correctly

If the input data file should ever change, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records.

In order to use cached results, we have to save the results to a CARTO table using the table_name and cached=True parameters.


In [7]:
geo_gdf_cached, geo_metadata_cached = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    table_name='starbucks_cache',
    cached=True
)


Success! Data geocoded correctly

Let's compare geo_dry_metadata and geo_metadata to see the differences between the information returned with and without the dry_run option. As we can see, this information reflects that all the locations have been geocoded successfully and that it has consumed 10 credits of quota.


In [8]:
geo_metadata


Out[8]:
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0,
 'final_records_with_geometry': 10,
 'geocoded_increment': 10,
 'successfully_geocoded': 10,
 'failed_geocodings': 0}
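
One way to make the comparison explicit is to list the keys that only appear after a real run and check whether any shared key changed. The sketch below uses plain dicts with the values copied from Out[4] and Out[8]; the variable names dry and real are chosen for illustration.

```python
# Metadata copied from Out[4] (dry run) and Out[8] (real run)
dry = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
}
real = dict(dry, **{
    'final_records_with_geometry': 10,
    'geocoded_increment': 10,
    'successfully_geocoded': 10,
    'failed_geocodings': 0,
})

# Keys reported only when the service actually runs
only_in_real_run = sorted(set(real) - set(dry))

# Shared keys whose values differ between the two runs
changed = {k: (dry[k], real[k]) for k in dry if dry[k] != real[k]}

print(only_in_real_run)
print(changed)
```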

The resulting data is a GeoDataFrame that contains three new columns:

  • the_geom: The resulting geometry
  • gc_status_rel: The estimated accuracy of each location, expressed as a value between 0 and 1
  • carto_geocode_hash: Geocode information

In [9]:
geo_gdf.head()


Out[9]:
the_geom name address revenue gc_status_rel carto_geocode_hash
0 POINT (-73.95746 40.67102) Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 0.97 c834a8e289e5bce280775a9bf1f833f1
1 POINT (-73.96122 40.57796) 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 0.99 7d39a3fff93efd9034da88aa9ad2da79
2 POINT (-73.98978 40.61944) 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 0.98 1a2312049ddea753ba42bf77f5ccf718
3 POINT (-74.02750 40.63202) Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 0.98 827ab4dcc2d49d5fd830749597976d4a
4 POINT (-74.00098 40.59321) Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 0.98 119a38c7b51195cd4153fc81605a8495

In addition, to avoid geocoding records that have already been geocoded, and thus spending quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.

This will happen automatically in these cases:

  1. Your input is a table from CARTO processed in place (without a table_name parameter).
  2. You save your results to a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.

If you now try to geocode this GeoDataFrame, which contains both the_geom and carto_geocode_hash, you will see that the required quota is 0 because it has already been geocoded.


In [10]:
_, repeat_geo_metadata = geo_service.geocode(
    geo_gdf,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)

In [11]:
repeat_geo_metadata.get('required_quota')


Out[11]:
0
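
The idea behind skipping unchanged records can be sketched as follows. This is an illustration only: this guide does not describe how carto_geocode_hash is actually computed, so the sketch assumes an MD5 digest of the address fields, and address_hash and rows_needing_geocoding are hypothetical helpers. The edited address is made up for the example.

```python
import hashlib


def address_hash(street, city, country):
    """Digest of the geocoding inputs (assumed scheme, not CARTO's own)."""
    key = '|'.join([street, city, country])
    return hashlib.md5(key.encode('utf-8')).hexdigest()


def rows_needing_geocoding(rows):
    """Return rows with no stored hash or whose address changed since."""
    pending = []
    for row in rows:
        current = address_hash(row['address'], 'New York', 'USA')
        if row.get('carto_geocode_hash') != current:
            pending.append(row)
    return pending


rows = [
    {'address': '167 Court Street,Brooklyn, NY 11201'},
    {'address': '519 Gateway Dr,Brooklyn, NY 11239'},
]

# Simulate a first geocoding pass that stores the hash on every row
for row in rows:
    row['carto_geocode_hash'] = address_hash(row['address'], 'New York', 'USA')

# Editing one address (made-up value) invalidates only that row
rows[1]['address'] = '520 Gateway Dr,Brooklyn, NY 11239'

print(len(rows_needing_geocoding(rows)))
```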

Precision

The address column is more complete than the name column, and therefore, the resulting coordinates calculated by the service will be more accurate. If we check this, the accuracy values using the name column (0.93, 0.96, 0.85, 0.83, 0.74, 0.87) are lower than the ones we get by using the address column for geocoding (0.97, 0.99, 0.98).


In [12]:
geo_name_gdf, geo_name_metadata = geo_service.geocode(
    df,
    street='name',
    city={'value': 'New York'},
    country={'value': 'USA'}
)


Success! Data geocoded correctly

In [13]:
geo_name_gdf.head()


Out[13]:
the_geom name address revenue gc_status_rel carto_geocode_hash
0 POINT (-73.95795 40.67071) Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 0.93 0be7693fc688eca36e1077656dcb00a5
1 POINT (-73.96122 40.57796) 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 0.96 084a5c4d42ccf3c3c8e69426619f270e
2 POINT (-73.99018 40.61914) 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 0.93 1d9a17c20c11d0454aff10548a328c47
3 POINT (-74.02778 40.63146) Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 0.96 d531df27fc02336dc722cb4e7028b244
4 POINT (-75.29322 43.07849) Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 0.85 9d8c13b5b4a93591f427d3ce0b5b4ead

In [14]:
geo_name_gdf.gc_status_rel.unique()


Out[14]:
array([0.93, 0.96, 0.85, 0.83, 0.74, 0.87])

In [15]:
geo_gdf.head()


Out[15]:
the_geom name address revenue gc_status_rel carto_geocode_hash
0 POINT (-73.95746 40.67102) Franklin Ave & Eastern Pkwy 341 Eastern Pkwy,Brooklyn, NY 11238 1321040.772 0.97 c834a8e289e5bce280775a9bf1f833f1
1 POINT (-73.96122 40.57796) 607 Brighton Beach Ave 607 Brighton Beach Avenue,Brooklyn, NY 11235 1268080.418 0.99 7d39a3fff93efd9034da88aa9ad2da79
2 POINT (-73.98978 40.61944) 65th St & 18th Ave 6423 18th Avenue,Brooklyn, NY 11204 1248133.699 0.98 1a2312049ddea753ba42bf77f5ccf718
3 POINT (-74.02750 40.63202) Bay Ridge Pkwy & 3rd Ave 7419 3rd Avenue,Brooklyn, NY 11209 1185702.676 0.98 827ab4dcc2d49d5fd830749597976d4a
4 POINT (-74.00098 40.59321) Caesar's Bay Shopping Center 8973 Bay Parkway,Brooklyn, NY 11214 1148427.411 0.98 119a38c7b51195cd4153fc81605a8495
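
Beyond the relevance score, the coordinates themselves can reveal a bad match. In Out[13], the name-based result for Caesar's Bay Shopping Center is POINT (-75.29322 43.07849), which is far from the other Brooklyn stores. A minimal sketch of a sanity check follows. It assumes a rough bounding box for Brooklyn (the box values are approximations chosen for illustration) and uses plain tuples instead of the GeoDataFrame.

```python
# Approximate Brooklyn bounding box (assumed values, for illustration only)
MIN_LON, MAX_LON = -74.06, -73.83
MIN_LAT, MAX_LAT = 40.55, 40.75

# (name, longitude, latitude, gc_status_rel) copied from Out[13]
name_results = [
    ('Franklin Ave & Eastern Pkwy', -73.95795, 40.67071, 0.93),
    ('607 Brighton Beach Ave', -73.96122, 40.57796, 0.96),
    ('65th St & 18th Ave', -73.99018, 40.61914, 0.93),
    ('Bay Ridge Pkwy & 3rd Ave', -74.02778, 40.63146, 0.96),
    ("Caesar's Bay Shopping Center", -75.29322, 43.07849, 0.85),
]


def outside_box(lon, lat):
    """Return True when the point falls outside the assumed bounding box."""
    return not (MIN_LON <= lon <= MAX_LON and MIN_LAT <= lat <= MAX_LAT)


suspicious = [name for name, lon, lat, _ in name_results if outside_box(lon, lat)]
print(suspicious)
```

With a GeoDataFrame, the same test could read the coordinates from each point's x and y attributes.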

Visualize the results

Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.


In [16]:
from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    color_bins_style(
        'gc_status_rel',
        method='equal',
        bins=geo_gdf.gc_status_rel.unique().size,
    ),
    popup_hover=[
        popup_element('address', 'Address'),
        popup_element('gc_status_rel', 'Precision')
    ],
    title='Geocoding Precision'
)


Out[16]:
[Interactive map: Geocoding Precision]

Isolines

There are two Isoline functions: isochrones and isodistances. In this guide we will use the isochrones function to calculate walking areas by time for each Starbucks store and the isodistances function to calculate the walking area by distance.

By definition, isolines are concentric polygons that display equally calculated levels over a given surface area. Each polygon delimits the area that can be reached from the origin point, measured by:

  • Time in the case of isochrones
  • Distance in the case of isodistances

Isochrones

For isochrones, let's calculate the time ranges of 5, 15, and 30 minutes. These ranges are input in seconds, so they will be 300, 900, and 1800 respectively.
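
Since the ranges are easier to reason about in minutes, a small conversion step avoids hand-computed values. The helper name minutes_to_seconds below is hypothetical and only used for illustration.

```python
def minutes_to_seconds(minutes):
    """Convert a list of durations in minutes to whole seconds."""
    return [int(m * 60) for m in minutes]


# Walking-time ranges used in this guide: 5, 15 and 30 minutes
ranges = minutes_to_seconds([5, 15, 30])
print(ranges)
```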

    
    
In [17]:
from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!


In [18]:
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)


available 112699, required 30
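
The required quota of 30 is consistent with one credit per store and range: 10 stores times 3 ranges. The sketch below estimates the cost up front under that assumption. The rule is inferred from the dry run above rather than taken from the service documentation, and estimate_isoline_quota is a hypothetical helper.

```python
def estimate_isoline_quota(n_sources, ranges):
    """Estimate credits assuming one credit per source point and range."""
    return n_sources * len(ranges)


estimate = estimate_isoline_quota(10, [300, 900, 1800])
available = 112699  # value printed by the cell above

print('estimated {0}, remaining after run {1}'.format(
    estimate, available - estimate))
```

The dry_run result remains the authoritative figure; an estimate like this is only a quick cross-check.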
    
    
    
In [19]:
isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')


Success! Isolines created correctly

In [20]:
isochrones_gdf.head()


Out[20]:
source_id data_range the_geom
0 9 1800 MULTIPOLYGON (((-73.96485 40.63379, -73.96460 ...
1 2 1800 MULTIPOLYGON (((-74.00605 40.62899, -74.00579 ...
2 6 1800 MULTIPOLYGON (((-73.88520 40.65371, -73.88494 ...
3 5 1800 MULTIPOLYGON (((-74.00674 40.68598, -74.00648 ...
4 4 1800 MULTIPOLYGON (((-74.01412 40.60341, -74.01369 ...

In [21]:
from cartoframes.viz import basic_style

Layer(isochrones_gdf, basic_style(opacity=0.5))


Out[21]:
[Interactive map: isochrones for the current stores]

Isodistances

For isodistances, let's calculate the distance ranges of 100, 500, and 1000 meters. These ranges are input in meters, so no conversion is needed.


In [22]:
_, isodistances_dry_metadata = iso_service.isodistances(
    geo_gdf,
    [100, 500, 1000],
    mode='walk',
    dry_run=True
)

In [23]:
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)


available 112669, required 30

In [24]:
isodistances_gdf, isodistances_metadata = iso_service.isodistances(
    geo_gdf,
    [100, 500, 1000],
    mode='walk'
)


Success! Isolines created correctly

In [25]:
isodistances_gdf.head()


Out[25]:
source_id data_range the_geom
0 9 1000 MULTIPOLYGON (((-73.95867 40.63311, -73.95842 ...
1 2 1000 MULTIPOLYGON (((-73.99850 40.62281, -73.99841 ...
2 6 1000 MULTIPOLYGON (((-73.87696 40.65371, -73.87671 ...
3 5 1000 MULTIPOLYGON (((-74.00245 40.69061, -74.00185 ...
4 4 1000 MULTIPOLYGON (((-74.00451 40.59860, -74.00391 ...

In [26]:
Layer(isodistances_gdf, basic_style(opacity=0.5))


Out[26]:
[Interactive map: isodistances for the current stores]

All together

Let's visualize the data in one map to see what insights we can find.


In [27]:
from cartoframes.viz import Map, Layer, size_continuous_style

Map([
    Layer(
        isochrones_gdf,
        basic_style(opacity=0.5),
        title='Walking Time'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity=0.2,
            stroke_color='blue',
            size_range=[20, 80],
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    )
])


Out[27]:
[Interactive map: walking-time isochrones and store revenue]

Looking at the map above, we can see the store at 228 Duffield St, Brooklyn, NY 11201 is really close to another store with higher revenue, which means we could even think about closing that one in favor of another one with a better location.
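
To put a number on how close two stores are, the great-circle distance between their geocoded points can be computed with the haversine formula. The sketch below uses only the standard library. The coordinates are those of the first two rows of Out[9], used purely as an example because the Duffield St coordinates are not shown in this guide. The Earth radius of 6371 km is the commonly used mean value, and haversine_km is a hypothetical helper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (longitude, latitude) of rows 0 and 1 in Out[9]
franklin = (-73.95746, 40.67102)
brighton = (-73.96122, 40.57796)

distance = haversine_km(*franklin, *brighton)
print('{0:.1f} km'.format(distance))
```

With the GeoDataFrame, the inputs would come from the x and y attributes of each point in the the_geom column.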

We could try to find a location for a new store between existing stores that have lower revenue and are far apart from each other.

Now, let's calculate the centroid of three different stores that we've identified previously and use it as a possible location for a new spot:


In [28]:
from shapely import geometry

new_store_location = [
    geo_gdf.iloc[6].the_geom,
    geo_gdf.iloc[9].the_geom,
    geo_gdf.iloc[1].the_geom
]

# Create a polygon using three points from the geo_gdf
polygon = geometry.Polygon([[p.x, p.y] for p in new_store_location])
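
For a triangle like this one, the area centroid is simply the average of the three vertices. The sketch below shows that calculation without shapely, using made-up coordinates rather than the store locations; triangle_centroid is a hypothetical helper.

```python
def triangle_centroid(points):
    """Average the x and y coordinates of three (x, y) vertices."""
    if len(points) != 3:
        raise ValueError('expected exactly three points')
    xs, ys = zip(*points)
    return sum(xs) / 3, sum(ys) / 3


# Made-up vertices, for illustration only
vertices = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
centroid = triangle_centroid(vertices)
print(centroid)
```

This shortcut only holds for triangles. For polygons with more vertices, the area centroid generally differs from the vertex average.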
          
          
          
In [29]:
from geopandas import GeoDataFrame, points_from_xy

new_store_gdf = GeoDataFrame({
    'name': ['New Store'],
    'geometry': points_from_xy([polygon.centroid.x], [polygon.centroid.y])
})

isochrones_new_gdf, isochrones_new_metadata = iso_service.isochrones(new_store_gdf, [300, 900, 1800], mode='walk')


Success! Isolines created correctly

In [30]:
from cartoframes.viz import Map, Layer, size_continuous_style

Map([
    Layer(
        isochrones_gdf,
        basic_style(opacity=0.2),
        title='Walking Time - Current'
    ),
    Layer(
        isochrones_new_gdf,
        basic_style(opacity=0.7),
        title='Walking Time - New'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity=0.2,
            stroke_color='blue',
            size_range=[20, 80]
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    ),
    Layer(new_store_gdf)
])


Out[30]:
[Interactive map: current and new walking-time isochrones, store revenue, and the new store location]

Conclusion

In this example you've seen how to use Location Data Services to perform a trade area analysis using CARTOframes built-in functionality without leaving the notebook.

Using the results, we've calculated a possible new location for a store, and used the isoline areas to help in the decision-making process.

Take into account that finding optimal spots for new stores is not an easy task and requires more analysis, but this is a great first step!