CARTOframes provides the functionality to use the CARTO Data Services API. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook.
For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform a trade area analysis by computing isodistances or isochrones programmatically.
Given a set of ten simulated Starbucks store addresses, this guide walks through the use case of finding good location candidates to open an additional store.
Depending on your account plan, some of these location data services are subject to different quota limitations.
This guide uses the same dataset of simulated Starbucks locations that has been used in the other guides and can be downloaded here.
Using Location Data Services requires authentication. For more information about how to authenticate, please read the Login to CARTO Platform guide.
In [1]:
from cartoframes.auth import Credentials, set_default_credentials
set_default_credentials('creds.json')
To get started, let's read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There's also a third column that reflects the annual revenue of the store.
In [2]:
import pandas as pd
df = pd.read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn.csv')
df
Out[2]:
Each time you run Location Data Services, you consume quota. For this reason, we provide the ability to check in advance the amount of credits an operation will consume by using the dry_run parameter when running the service function.
It is also possible to check the available quota by running the available_quota function.
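The two checks can be combined into a small guard that runs before any quota is spent. This is a minimal sketch and not part of CARTOframes: the helper name has_enough_quota is hypothetical, and it assumes only that the dry-run metadata is a dict with a required_quota entry (the key queried later in this guide) and that the available quota is a number.

```python
def has_enough_quota(dry_run_metadata, available_quota):
    """Return True if the operation described by a dry run fits in the quota.

    Hypothetical helper: `dry_run_metadata` is assumed to be the dict
    returned by a service call made with dry_run=True, and
    `available_quota` the number returned by available_quota().
    """
    required = dry_run_metadata.get('required_quota', 0)
    return required <= available_quota


# Stand-in values; in the notebook they would come from
# geo_service.geocode(..., dry_run=True) and geo_service.available_quota().
example_metadata = {'required_quota': 10}
print(has_enough_quota(example_metadata, available_quota=4990))
print(has_enough_quota(example_metadata, available_quota=5))
```

A guard like this is cheap to call before each service function, because the dry run itself does not consume credits.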
In [3]:
from cartoframes.data.services import Geocoding
geo_service = Geocoding()
_, geo_dry_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
In [4]:
geo_dry_metadata
Out[4]:
In [5]:
geo_service.available_quota()
Out[5]:
In [6]:
geo_gdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
If the input data file should ever change, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records.
In order to use cached results, we have to save the results to a CARTO table using the table_name and cached=True parameters.
In [7]:
geo_gdf_cached, geo_metadata_cached = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    table_name='starbucks_cache',
    cached=True
)
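To make the idea of reusing results for unmodified records more concrete, here is a minimal sketch of hash-based change detection. It only illustrates the general technique and is not CARTO's actual implementation: the function names, the choice of SHA-256, the hashed fields, and the sample records are all assumptions made for this example.

```python
import hashlib


def address_hash(record, fields=('address', 'city', 'country')):
    """Hash the fields sent to the geocoder (illustrative scheme only)."""
    joined = '|'.join(str(record.get(field, '')) for field in fields)
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


def records_to_geocode(records, cached_hashes):
    """Return the records that are new or whose address data changed.

    `cached_hashes` maps a record id to the hash stored on the last run.
    """
    return [
        record for record in records
        if cached_hashes.get(record['id']) != address_hash(record)
    ]


def make_record(record_id, address):
    """Build a made-up record; city and country are fixed for brevity."""
    return {'id': record_id, 'address': address,
            'city': 'New York', 'country': 'USA'}


# Hypothetical sample data, not taken from the Starbucks file.
previous = [make_record(1, '1 Example St'), make_record(2, '2 Example Ave')]
cache = {record['id']: address_hash(record) for record in previous}

current = [
    make_record(1, '1 Example St'),    # unchanged
    make_record(2, '20 Example Ave'),  # edited address
    make_record(3, '3 Example Rd'),    # new record
]
pending = records_to_geocode(current, cache)
print([record['id'] for record in pending])  # prints [2, 3]
```

Only the edited and the new record would need to be geocoded again, which is the behavior the cached option is described as providing.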
Let's compare geo_dry_metadata and geo_metadata to see the differences between the information returned with and without the dry_run option. As we can see, geo_metadata shows that all the locations were geocoded successfully and that the operation consumed 10 credits of quota.
In [8]:
geo_metadata
Out[8]:
The resulting data is a GeoDataFrame that contains three new columns:
- geometry: The resulting geometry
- gc_status_rel: The percentage of accuracy of each location
- carto_geocode_hash: Geocode information
In [9]:
geo_gdf.head()
Out[9]:
In addition, to avoid geocoding records that have already been geocoded, and thus spending quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.
This will happen automatically in these cases:
- Your input is a table from CARTO processed in place (without a table_name parameter).
- You save your results to a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.
If you try to geocode this DataFrame now, which contains both the the_geom and carto_geocode_hash columns, you will see that the required quota is 0 because it has already been geocoded.
In [10]:
_, repeat_geo_metadata = geo_service.geocode(
    geo_gdf,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
In [11]:
repeat_geo_metadata.get('required_quota')
Out[11]:
The address column is more complete than the name column, and therefore the resulting coordinates calculated by the service will be more accurate. If we check this, the accuracy values using the name column (0.95, 0.93, 0.96, 0.83, 0.78, 0.9) are lower than the ones we get by using the address column for geocoding (0.97, 0.99, 0.98).
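The comparison can be checked with a few lines of plain Python. This is a small sketch that uses only the accuracy values quoted above; note that these are the distinct gc_status_rel values, not one value per store, so the means are indicative rather than per-store averages.

```python
from statistics import mean

# Distinct gc_status_rel values quoted in the text above.
name_accuracy = [0.95, 0.93, 0.96, 0.83, 0.78, 0.9]
address_accuracy = [0.97, 0.99, 0.98]

print('name:    min={0}, max={1}, mean={2:.3f}'.format(
    min(name_accuracy), max(name_accuracy), mean(name_accuracy)))
print('address: min={0}, max={1}, mean={2:.3f}'.format(
    min(address_accuracy), max(address_accuracy), mean(address_accuracy)))

# Even the lowest address-based value is above the best name-based value.
print(min(address_accuracy) > max(name_accuracy))  # prints True
```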
In [12]:
geo_name_gdf, geo_name_metadata = geo_service.geocode(
    df,
    street='name',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
In [13]:
geo_name_gdf.head()
Out[13]:
In [14]:
geo_name_gdf.gc_status_rel.unique()
Out[14]:
In [15]:
geo_gdf.head()
Out[15]:
Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.
In [16]:
from cartoframes.viz import Layer, color_bins_style, popup_element
Layer(
    geo_gdf,
    color_bins_style(
        'gc_status_rel',
        method='equal',
        bins=geo_gdf.gc_status_rel.unique().size,
    ),
    popup_hover=[
        popup_element('address', 'Address'),
        popup_element('gc_status_rel', 'Precision')
    ],
    title='Geocoding Precision'
)
Out[16]:
There are two Isoline functions: isochrones and isodistances. In this guide we will use the isochrones function to calculate the walking area by time for each Starbucks store, and the isodistances function to calculate the walking area by distance.
By definition, isolines are concentric polygons that display equally calculated levels over a given surface area. They are calculated as the areas reachable from the origin point, measured by time in the case of isochrones and by distance in the case of isodistances.
In [17]:
from cartoframes.data.services import Isolines
iso_service = Isolines()
_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)
Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!
In [18]:
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
In [19]:
isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')
In [20]:
isochrones_gdf.head()
Out[20]:
In [21]:
from cartoframes.viz import basic_style
Layer(isochrones_gdf, basic_style(opacity=0.5))
Out[21]:
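The isochrone ranges used above are time values, while the isodistance ranges used next are distance values. The following sketch relates the two under stated assumptions: it takes the isochrone ranges to be seconds and the isodistance ranges to be meters (please confirm the units in the Isolines reference), and it assumes an average walking speed of 1.4 m/s, which is an illustrative figure and not something the service exposes.

```python
# Ranges used in this guide.
isochrone_ranges_s = [300, 900, 1800]    # assumed to be seconds of walking
isodistance_ranges_m = [100, 500, 1000]  # assumed to be meters of walking

# Assumed average walking speed; adjust it to your own scenario.
WALKING_SPEED_M_PER_S = 1.4

for seconds in isochrone_ranges_s:
    minutes = seconds / 60
    reach_m = seconds * WALKING_SPEED_M_PER_S
    print('{0:.0f} min walk -> up to ~{1:.0f} m'.format(minutes, reach_m))

for meters in isodistance_ranges_m:
    minutes = meters / WALKING_SPEED_M_PER_S / 60
    print('{0} m walk -> ~{1:.1f} min'.format(meters, minutes))
```

Under these assumptions, the largest isodistance (1000 m) corresponds to a shorter walk than the middle isochrone (900 s), so the two analyses cover noticeably different areas around each store.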
In [22]:
_, isodistances_dry_metadata = iso_service.isodistances(
    geo_gdf,
    [100, 500, 1000],
    mode='walk',
    dry_run=True
)
In [23]:
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
In [24]:
isodistances_gdf, isodistances_metadata = iso_service.isodistances(
    geo_gdf,
    [100, 500, 1000],
    mode='walk'
)
In [25]:
isodistances_gdf.head()
Out[25]:
In [26]:
Layer(isodistances_gdf, basic_style(opacity=0.5))
Out[26]:
In [27]:
from cartoframes.viz import Map, Layer, size_continuous_style
Map([
    Layer(
        isochrones_gdf,
        basic_style(opacity=0.5),
        title='Walking Time'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity=0.2,
            stroke_color='blue',
            size_range=[20, 80],
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    )
])
Out[27]:
Looking at the map above, we can see that the store at 228 Duffield St, Brooklyn, NY 11201 is very close to another store with higher revenue, so we could even consider closing it in favor of a store in a better location.
We could also look for a place for a new store among stores that are spread farther apart and have lower revenue than the rest.
Now, let's calculate the centroid of three such stores that we identified previously and use it as a candidate location for a new store:
In [28]:
from shapely import geometry
new_store_location = [
    geo_gdf.iloc[6].the_geom,
    geo_gdf.iloc[9].the_geom,
    geo_gdf.iloc[1].the_geom
]
# Create a polygon using three points from the geo_gdf
polygon = geometry.Polygon([[p.x, p.y] for p in new_store_location])
In [29]:
from geopandas import GeoDataFrame, points_from_xy
new_store_gdf = GeoDataFrame({
    'name': ['New Store'],
    'geometry': points_from_xy([polygon.centroid.x], [polygon.centroid.y])
})
isochrones_new_gdf, isochrones_new_metadata = iso_service.isochrones(new_store_gdf, [300, 900, 1800], mode='walk')
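For a polygon built from exactly three points, the centroid is simply the arithmetic mean of the three vertices, so the candidate location can be sanity-checked without shapely. The sketch below shows this with made-up longitude/latitude pairs; the helper name triangle_centroid and the coordinates are hypothetical, and treating degrees as planar coordinates is an approximation that is only reasonable over a small area such as a single borough.

```python
def triangle_centroid(points):
    """Return the centroid of a triangle given as three (x, y) tuples."""
    if len(points) != 3:
        raise ValueError('expected exactly three points')
    xs, ys = zip(*points)
    return sum(xs) / 3, sum(ys) / 3


# Made-up lon/lat pairs standing in for three store locations.
stores = [(-73.99, 40.69), (-73.96, 40.68), (-73.98, 40.66)]
cx, cy = triangle_centroid(stores)
print('candidate location: ({0:.4f}, {1:.4f})'.format(cx, cy))
```

With more than three stores the vertex mean and the polygon centroid generally differ, so the shapely-based approach used in the notebook is the safer choice in that case.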
In [30]:
from cartoframes.viz import Map, Layer, size_continuous_style
Map([
    Layer(
        isochrones_gdf,
        basic_style(opacity=0.2),
        title='Walking Time - Current'
    ),
    Layer(
        isochrones_new_gdf,
        basic_style(opacity=0.7),
        title='Walking Time - New'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity=0.2,
            stroke_color='blue',
            size_range=[20, 80]
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    ),
    Layer(new_store_gdf)
])
Out[30]:
In this example you've seen how to use Location Data Services to perform a trade area analysis using CARTOframes built-in functionality without leaving the notebook.
Using the results, we've calculated a possible new location for a store, and used the isoline areas to support the decision-making process.
Take into account that finding optimal spots for new stores is not an easy task and requires more analysis, but this is a great first step!