As a data scientist, you likely work through a data exploration process on nearly every project. Exploratory data analysis can entail many things, from finding relevant data and cleaning it to running analyses and building models. The ability to visually analyze and interact with data is key during both the exploratory process and the final presentation of insights.
With that in mind, this guide introduces the basic building blocks for creating web-based, dynamic, and interactive map visualizations inside of a Jupyter Notebook with CARTOframes.
It covers the Map and Layer classes, how to explore data with Widgets and Popups, how to use visualization styles to quickly symbolize thematic attributes, and options for creating maps to share your findings.
This guide uses two datasets: a point dataset of simulated Starbucks locations in Brooklyn, New York, and 15-minute walk-time polygons (isochrones) around each store, augmented with demographic variables from CARTO's Data Observatory. To follow along, both datasets can be read directly from the URLs used in the first code cell.
As a first step, read both datasets into the notebook as pandas.DataFrame objects and convert each one to a GeoDataFrame:
In [1]:
from pandas import read_csv
from geopandas import GeoDataFrame
from cartoframes.utils import decode_geometry
# store point locations
stores_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_geocoded.csv')
stores_gdf = GeoDataFrame(stores_df, geometry=decode_geometry(stores_df['the_geom']))
# 15 minute walk time polygons
iso_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_iso_enriched.csv')
iso_gdf = GeoDataFrame(iso_df, geometry=decode_geometry(iso_df['the_geom']))
Next, import the Map and Layer classes from the cartoframes.viz namespace to visualize the two datasets. The resulting map draws each dataset with default symbology on top of CARTO's Positron basemap, with the zoom and center set to the extent of both datasets:
In [2]:
from cartoframes.viz import Map, Layer
Map([
    Layer(iso_gdf),
    Layer(stores_gdf)
])
Out[2]:
To learn more about basemap options, visit the Map Configuration Examples section of the CARTOframes Developer Center.
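The default framing described above, where the view is set to the extent of both datasets, can be pictured as taking the union of each dataset's bounding box and centering on it. The following is only a sketch of that idea in plain Python, not CARTOframes' own code; the helper names and the coordinates are hypothetical.

```python
def bounds(points):
    """Return (min_lng, min_lat, max_lng, max_lat) for (lng, lat) pairs."""
    lngs, lats = zip(*points)
    return min(lngs), min(lats), max(lngs), max(lats)


def union_bounds(a, b):
    """Smallest box that contains both input boxes."""
    return min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])


def center(box):
    """Midpoint of a bounding box as (lng, lat)."""
    min_lng, min_lat, max_lng, max_lat = box
    return (min_lng + max_lng) / 2, (min_lat + max_lat) / 2


# Hypothetical coordinates, not taken from the guide's datasets
store_points = [(-73.95, 40.65), (-73.99, 40.69)]
iso_vertices = [(-74.00, 40.64), (-73.94, 40.70), (-73.96, 40.66)]

combined = union_bounds(bounds(store_points), bounds(iso_vertices))
map_center = center(combined)
```

Because the isochrones extend beyond the store points, the combined box is driven by the polygon vertices here, which is why both layers stay fully visible.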
Before going further, take a look at the attributes in each dataset to get a sense of the information available to visualize and interact with.
First, explore the store location attributes. In this dataset you will use the fields:
id_store: a unique identifier for each location
revenue: how much a particular store earned in 2018
In [3]:
stores_gdf.head(3)
Out[3]:
From the isochrone Layer, you will use the demographic attributes:
popcy: the total population in each area
inccymedhh: the median household income in each area
lbfcyempl: the employed population in each area
educybach: the number of people with a bachelor's degree
id_store: the identifier that matches the unique ID in the store points
In [4]:
iso_gdf.head(3)
Out[4]:
Now that you've taken a first look at the fields in the data and done a basic visualization, let's look at how you can use the map as a tool for visual and interactive exploration to better understand the relationship between a store's annual revenue and the surrounding area's demographic characteristics.
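Since id_store appears in both tables, it is the key that ties a store's revenue to the demographics of its isochrone. A minimal sketch of that pairing in plain Python is shown here; the field names follow the guide, while the rows, values, and the join_on_store helper are hypothetical.

```python
# Hypothetical rows: field names follow the guide, values do not
stores = [
    {'id_store': 'A', 'revenue': 1_200_000},
    {'id_store': 'B', 'revenue': 800_000},
]
isochrones = [
    {'id_store': 'A', 'popcy': 30_000, 'inccymedhh': 60_000},
    {'id_store': 'B', 'popcy': 45_000, 'inccymedhh': 52_000},
]


def join_on_store(stores, isochrones):
    """Attach each isochrone's demographics to the store with the same id_store."""
    demographics = {row['id_store']: row for row in isochrones}
    joined = []
    for store in stores:
        match = demographics.get(store['id_store'])
        if match is not None:  # skip stores without a matching isochrone
            joined.append({**match, **store})
    return joined


joined = join_on_store(stores, isochrones)
```

Each joined row then carries both revenue and the demographic fields for one store, which is the relationship the maps in this guide explore visually.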
As seen in the table summaries above, there are a variety of demographic attributes in the isochrone Layer that would be helpful to better understand the characteristics around each store.
To make this information available while exploring each location on the map, you can add each attribute as a Widget. For this case specifically, you will use Formula Widgets to summarize the demographic variables and a Category Widget on the categorical attribute id_store.
To add Widgets, you first need to import the types that you want to use and then, inside the iso_gdf Layer, add one Widget for each attribute of interest. The Formula Widget accepts different types of aggregations. For this map, you will aggregate each demographic variable using sum so the totals update as you zoom, pan, and interact with the map. You will also label each Widget appropriately using the title parameter.
In [5]:
from cartoframes.viz import formula_widget, category_widget
Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ]
    ),
    Layer(
        stores_gdf
    )
])
Out[5]:
At this point, take a few minutes to explore the map to see how the Widget values update. For example, select a Store ID from the Category Widget to summarize the demographics for a particular store. Alternatively, zoom and pan the map to get summary statistics for the features in the current map view.
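The way the totals react to zooming, panning, and selecting a Store ID can be thought of as a sum over only those features that are currently in view and match the selection. The sketch below illustrates that idea under simplifying assumptions: each isochrone is reduced to a single reference point, and the helper names, coordinates, and values are hypothetical rather than part of CARTOframes.

```python
def in_view(feature, view):
    """True when the feature's reference point lies inside the view box."""
    min_lng, min_lat, max_lng, max_lat = view
    return (min_lng <= feature['lng'] <= max_lng
            and min_lat <= feature['lat'] <= max_lat)


def viewport_sum(features, column, view, store_id=None):
    """Sum `column` over features in view, optionally limited to one store."""
    return sum(
        f[column]
        for f in features
        if in_view(f, view) and (store_id is None or f['id_store'] == store_id)
    )


# Hypothetical features, one reference point per isochrone
features = [
    {'id_store': 'A', 'lng': -73.95, 'lat': 40.65, 'popcy': 30000},
    {'id_store': 'B', 'lng': -73.99, 'lat': 40.69, 'popcy': 45000},
    {'id_store': 'C', 'lng': -73.90, 'lat': 40.60, 'popcy': 20000},
]
whole_area = (-74.05, 40.55, -73.85, 40.75)
zoomed_in = (-74.00, 40.64, -73.94, 40.70)

total_all = viewport_sum(features, 'popcy', whole_area)
total_zoomed = viewport_sum(features, 'popcy', zoomed_in)
total_store_b = viewport_sum(features, 'popcy', whole_area, store_id='B')
```

Zooming in drops store C from the total, and choosing store B narrows the total to that single isochrone, which mirrors what you see when interacting with the Widgets.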
To aid this map-based exploration, import popup_element and use the popup_hover option on the iso_gdf Layer so you can hover over an isochrone and quickly see its store ID:
In [6]:
from cartoframes.viz import popup_element
Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', 'Store ID')
        ]
    ),
    Layer(
        stores_gdf
    )
])
Out[6]:
Now, as you explore the map and summarize demographics, it is much easier to relate the summarized values to a unique store ID.
At this point, you have some really useful information available on the map, but it comes only from the isochrone Layer. Sizing the store points by the revenue attribute provides a way to visually locate which stores are performing better than others. A quick way to visualize numeric or categorical attributes during the data exploration process is to take advantage of visualization styles.
To size the store points in proportion to their revenue, use size_continuous_style:
In [7]:
from cartoframes.viz import size_continuous_style
Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style('revenue')
    )
])
Out[7]:
Now you have a proportional symbol map where points are sized by revenue. You will also notice that an appropriate legend has been added to the map and when you hover over the points, you will see each store's revenue value.
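A proportional symbol map rests on mapping each value onto a range of symbol sizes. As a rough illustration, the sketch below assumes a plain linear ramp with clamping; the library's actual scaling may differ, and the scale_size helper, its parameter names, and the revenue figures are hypothetical.

```python
def scale_size(value, range_min, range_max, size_range=(2, 40)):
    """Linearly map value from [range_min, range_max] onto size_range, clamping at both ends."""
    if range_max == range_min:
        return float(size_range[0])  # avoid dividing by zero when all values are equal
    t = (value - range_min) / (range_max - range_min)
    t = max(0.0, min(1.0, t))  # keep out-of-range values within the size range
    return size_range[0] + t * (size_range[1] - size_range[0])


# Hypothetical annual revenues
revenues = [200_000, 600_000, 1_000_000]
low, high = min(revenues), max(revenues)
sizes = [scale_size(r, low, high, size_range=(10, 50)) for r in revenues]
```

Widening the size range increases the contrast between the smallest and largest symbols, which makes the best and worst performers easier to spot.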
Next, let's take a look at how to modify some of the defaults.
Every visualization style has a set of parameters available to customize the defaults to better suit a given map. A quick way to see which parameters are available for size_continuous_style is to run help(size_continuous_style) in a notebook cell.
Let's make a few adjustments to make it easier to distinguish and locate the highest and lowest performing stores. In the cell below, size_range is passed to size_continuous_style, while title and default_widget are set on the Layer:
size_range=[10,50]
title='Annual Revenue ($)'
default_widget=True
In [8]:
from cartoframes.viz import size_continuous_style
Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', 'Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style(
            'revenue',
            size_range=[10,50]
        ),
        title='Annual Revenue ($)',
        default_widget=True
    )
])
Out[8]:
And now you have a map to visually and interactively explore the relationship between revenue and demographic variables for each store.
The map above provides several ways to explore the data, both visually and interactively.
Use the map to see if you can find the highest and lowest performing stores and summarize the demographic characteristics of each one!
Now that you have gained insight into the relationship between revenue and demographics, let's say that the most influential factor in how well a store performed was median income and you want to create a map to show that particular relationship.
To show this, the map below uses another visualization style, this time color_bins_style, to color each isochrone according to the range of median household income it falls within. Additionally, the size_continuous_style used in the previous map has been further customized to account for the new thematic median income style, and the store points have been added again as a third Layer to show their location and ID on hover. The map also has a custom viewport set to center it on the highest performing (A) and lowest performing (J) stores, which have similar median income values.
In [9]:
from cartoframes.viz import (
    Map, Layer, color_bins_style, size_continuous_style,
    default_legend, popup_element
)
Map([
    # Isochrones colored by median household income
    Layer(
        iso_gdf,
        style=color_bins_style(
            'inccymedhh',
            bins=7,
            palette='pinkyl',
            opacity=0.8,
            stroke_width=0
        ),
        legends=default_legend(title='Median Household Income ($)', footer='Source: US Census Bureau')
    ),
    # Store points drawn as hollow rings sized by revenue
    Layer(
        stores_gdf,
        style=size_continuous_style(
            'revenue',
            size_range=[10,50],
            range_max=1000000,
            opacity=0,
            stroke_color='turquoise',
            stroke_width=2
        ),
        legends=default_legend(title='Annual Revenue ($)')
    ),
    # Store points again, to show location and ID on hover
    Layer(
        stores_gdf,
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    )
], viewport={'zoom': 12, 'lat': 40.644417, 'lng': -73.934710})
Out[9]:
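Coloring each isochrone by the income range it falls within comes down to classifying values into a fixed number of bins. The sketch below uses equal-interval breaks purely as an assumption to keep the example short; color_bins_style may classify values with a different method, and the helper names and income values are hypothetical.

```python
import bisect


def equal_interval_breaks(values, bins):
    """Return the bins - 1 interior break points that split [min, max] evenly."""
    low, high = min(values), max(values)
    step = (high - low) / bins
    return [low + step * i for i in range(1, bins)]


def bin_index(value, breaks):
    """Index of the bin a value falls in; a value equal to a break goes to the upper bin."""
    return bisect.bisect_right(breaks, value)


# Hypothetical median household incomes, one per isochrone
incomes = [30000, 44000, 58000, 72000, 86000, 100000]
breaks = equal_interval_breaks(incomes, bins=7)
classes = [bin_index(v, breaks) for v in incomes]
```

Each class index would then select one color from a seven-step palette, so isochrones in the same income range share a color.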
If you want to compare store revenue with multiple demographic variables, you can create a Layout with multiple maps.
In the example below, one map symbolizes annual revenue and the other three maps symbolize three demographic variables that use the same color palette where yellow is low and red is high. Each map has a title to label which attribute is being mapped.
In [10]:
from cartoframes.viz import Layout
Layout([
    Map([
        Layer(
            stores_gdf,
            style=size_continuous_style(
                'revenue',
                size_range=[10,50],
                range_max=1000000,
                opacity=0,
                stroke_color='turquoise',
                stroke_width=2
            ),
            legends=default_legend(title='Annual Revenue'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'inccymedhh',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Median Income'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'popcy',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Total Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'lbfcyempl',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Employed Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
], 2, 2, viewport={'zoom': 10, 'lat': 40.64, 'lng': -73.92}, map_height=400, is_static=True)
Out[10]: