Visualization

Introduction

As a data scientist, you likely work through a data exploration process on nearly every project. Exploratory data analysis can entail many things, from finding relevant data and cleaning it to running analyses and building models. The ability to visually analyze and interact with data is key during the exploratory process and the final presentation of insights.

With that in mind, this guide introduces the basic building blocks for creating web-based, dynamic, and interactive map visualizations inside of a Jupyter Notebook with CARTOframes.

In this guide you are introduced to the Map and Layer classes, how to explore data with Widgets and Popups, how to use visualization styles to quickly symbolize thematic attributes, and options for creating maps to share your findings.

Data

This guide uses two datasets: a point dataset of simulated Starbucks locations in Brooklyn, New York, and 15-minute walk-time polygons (isochrones) around each store, augmented with demographic variables from CARTO's Data Observatory. To follow along, you can read both datasets directly from the URLs used in the first code cell.

As a first step, load both datasets into the notebook as pandas.DataFrame objects and convert each one to a GeoDataFrame:



In [1]:

    
from pandas import read_csv
from geopandas import GeoDataFrame
from cartoframes.utils import decode_geometry

# store point locations
stores_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_geocoded.csv')
stores_gdf = GeoDataFrame(stores_df, geometry=decode_geometry(stores_df['the_geom']))

# 15 minute walk time polygons
iso_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_iso_enriched.csv')
iso_gdf = GeoDataFrame(iso_df, geometry=decode_geometry(iso_df['the_geom']))
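The the_geom column holds each geometry as a hex string, and decode_geometry turns those strings into geometry objects. As a rough illustration of what that encoding contains, the sketch below reads the header and the x coordinate from the visible prefix of the first store's the_geom value, assuming it follows the common EWKB point layout. It is not how decode_geometry is implemented, and the truncated y coordinate is left alone.

```python
import struct

# Visible prefix of the first store's the_geom value; the rest of the
# string is truncated in the table output, so only this part is parsed.
prefix = bytes.fromhex("0101000020E61000005EA27A6B607D52C0")

byte_order = prefix[0]                               # 1 means little-endian
geom_type, srid = struct.unpack_from("<II", prefix, 1)
has_srid = bool(geom_type & 0x20000000)              # EWKB flag: an SRID follows
base_type = geom_type & 0xFF                         # 1 means Point
(x,) = struct.unpack_from("<d", prefix, 9)           # longitude as a double

print(byte_order, base_type, has_srid, srid, round(x, 5))
```

The SRID read this way is 4326, and the decoded longitude agrees with the POINT value that appears in the geometry column for the same row.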

Add Layers to a Map

Next, import the Map and Layer classes from the cartoframes.viz namespace to visualize the two datasets. The resulting map draws each dataset with default symbology on top of CARTO's Positron basemap, with the zoom and center set to the extent of both datasets:



In [2]:

    
from cartoframes.viz import Map, Layer

Map([
    Layer(iso_gdf),
    Layer(stores_gdf)
])

To learn more about basemap options, visit the Map Configuration Examples section of the CARTOframes Developer Center.

Explore Attributes

Before going further, take a look at the attributes in each dataset to get a sense of the information available to visualize and interact with.

First, explore the store location attributes. In this dataset you will use the fields:

id_store which is a unique identifier for each location
revenue which records how much each store earned in 2018



In [3]:

    
stores_gdf.head(3)









    Out[3]:

   the_geom                                           cartodb_id  field_1  name                         address                                       revenue      id_store  geometry
0  0101000020E61000005EA27A6B607D52C01956F146E655...  1           0        Franklin Ave & Eastern Pkwy  341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  A         POINT (-73.95901 40.67109)
1  0101000020E6100000B610E4A0847D52C0B532E197FA49...  2           1        607 Brighton Beach Ave       607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  B         POINT (-73.96122 40.57796)
2  0101000020E6100000E5B8533A587F52C05726FC523F4F...  3           2        65th St & 18th Ave           6423 18th Avenue,Brooklyn, NY 11204           1248133.699  C         POINT (-73.98976 40.61912)

From the isochrone Layer, you will use the demographic attributes:

popcy which counts the total population in each area
inccymedhh that is the median household income in each area
lbfcyempl counts the employed population
educybach counts the number of people with a bachelor's degree
id_store which matches the unique id in the store points



In [4]:

    
iso_gdf.head(3)









    Out[4]:

   the_geom                                           cartodb_id  popcy        data_range  range_label  lbfcyempl    educybach   inccymedhh    id_store  geometry
0  0106000020E61000000100000001030000000100000033...  3           1311.667005  900         15 min.      568.006658   151.682217  48475.834346  C         MULTIPOLYGON (((-73.99082 40.62694, -73.99170 ...
1  0106000020E61000000100000001030000000100000033...  7           2215.539290  900         15 min.      1181.265882  313.739810  35125.870621  G         MULTIPOLYGON (((-73.87101 40.66114, -73.87166 ...
2  0106000020E61000000100000001030000000100000033...  9           1683.229186  900         15 min.      1012.737753  449.871005  87079.135091  I         MULTIPOLYGON (((-73.98467 40.70054, -73.98658 ...

Visual and Interactive Data Exploration

Now that you've taken a first look at the fields in the data and done a basic visualization, let's look at how you can use the map as a tool for visual and interactive exploration to better understand the relationship between a store's annual revenue and the surrounding area's demographic characteristics.
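Because id_store appears in both tables, a store's revenue can also be lined up with its isochrone's demographics outside of the map. The following is a minimal sketch in plain Python, using only the first three rows of each table shown earlier and a subset of their fields, so it runs without downloading the datasets. The variable names are illustrative and not part of CARTOframes.

```python
# First rows of each table, reduced to the fields of interest.
stores = [
    {"id_store": "A", "revenue": 1321040.772},
    {"id_store": "B", "revenue": 1268080.418},
    {"id_store": "C", "revenue": 1248133.699},
]
isochrones = [
    {"id_store": "C", "popcy": 1311.667005, "inccymedhh": 48475.834346},
    {"id_store": "G", "popcy": 2215.539290, "inccymedhh": 35125.870621},
    {"id_store": "I", "popcy": 1683.229186, "inccymedhh": 87079.135091},
]

# Index the isochrones by store id, then attach demographics to every
# store that has a matching isochrone in this small sample.
iso_by_id = {row["id_store"]: row for row in isochrones}
joined = [
    {**store, **iso_by_id[store["id_store"]]}
    for store in stores
    if store["id_store"] in iso_by_id
]

for row in joined:
    print(row["id_store"], row["revenue"], row["popcy"], row["inccymedhh"])
```

In this sample only store C appears in both sets of rows, so it is the only one joined. On the full DataFrames, a pandas merge on id_store would likely be the more natural way to do the same thing.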

Add Widgets

As seen in the table summaries above, there are a variety of demographic attributes in the isochrone Layer that would be helpful to better understand the characteristics around each store.

To make this information available while exploring each location on the map, you can add each attribute as a Widget. For this case specifically, you will use Formula Widgets to summarize the demographic variables and a Category Widget on the categorical attribute of id_store.

To add Widgets, you first need to import the types that you want to use and then, inside of the iso_gdf Layer, add one widget for each attribute of interest. The Formula Widget accepts different types of aggregations. For this map, you will aggregate each demographic variable using sum so the totals update as you zoom, pan, and interact with the map. You will also label each Widget appropriately using the title parameter.



In [5]:

    
from cartoframes.viz import formula_widget, category_widget

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ]
    ),
    Layer(
        stores_gdf
    )
])

At this point, take a few minutes to explore the map to see how the Widget values update. For example, select a Store ID from the Category Widget to summarize the demographics for a particular store. Alternatively, zoom and pan the map to get summary statistics for the features in the current map view.
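To make the widget behavior concrete, the sketch below mimics in plain Python what a sum aggregation does as the set of visible or selected features changes. It uses only the three isochrone rows shown earlier. The summarize helper is hypothetical and is not part of CARTOframes, and the 'avg' branch is included only for comparison.

```python
# Isochrone rows from the table output (first three only).
features = [
    {"id_store": "C", "popcy": 1311.667005, "inccymedhh": 48475.834346},
    {"id_store": "G", "popcy": 2215.539290, "inccymedhh": 35125.870621},
    {"id_store": "I", "popcy": 1683.229186, "inccymedhh": 87079.135091},
]


def summarize(rows, column, operation="sum"):
    """Aggregate one column over the rows that are currently in view."""
    values = [row[column] for row in rows]
    if operation == "sum":
        return sum(values)
    if operation == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unsupported operation: {operation}")


# All three features in view.
total_pop = summarize(features, "popcy")

# Selecting one Store ID narrows the rows being aggregated.
only_g = [row for row in features if row["id_store"] == "G"]
pop_g = summarize(only_g, "popcy")

# Summing medians across areas does not give a median; an average is
# one alternative way to read the income column when several areas are in view.
avg_income = summarize(features, "inccymedhh", "avg")

print(total_pop, pop_g, avg_income)
```

With a single store selected, sum simply returns that store's value, which is how the widgets are used in this guide. Running help(formula_widget) in a notebook cell lists the operations the widget actually accepts.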

Add Popups

To aid this map-based exploration, import popup_element and use the popup_hover parameter on the iso_gdf Layer so that you can hover over a store's isochrone and quickly get its ID:



In [6]:

    
from cartoframes.viz import popup_element

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', 'Store ID')
        ]
    ),
    Layer(
        stores_gdf
    )
])

Now, as you explore the map and summarize demographics, it is much easier to relate the summarized values to a unique store ID.

Symbolize Store Points

At this point, you have some really useful information available on the map but only coming from the isochrone Layer. Sizing the store points by the attribute revenue will provide a way to visually locate which stores are performing better than others. A quick way to visualize numeric or categorical attributes during the data exploration process is to take advantage of visualization styles.

To size the store points proportionate to their revenue, you'll use the size_continuous_style:



In [7]:

    
from cartoframes.viz import size_continuous_style

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style('revenue')
    )
])

Now you have a proportional symbol map where points are sized by revenue. You will also notice that an appropriate legend has been added to the map and when you hover over the points, you will see each store's revenue value.

Next, let's take a look at how to modify some of the defaults.

Every visualization style has a set of parameters available to customize the defaults to better suit a given map. A quick way to see which parameters size_continuous_style accepts is to run help(size_continuous_style) in a notebook cell.

Let's make a few adjustments to make it easier to distinguish and locate the highest and lowest performing stores:

The continuous point size reads between a minimum and maximum range of symbol sizes. Since the smallest revenue value on this map is hard to see, set size_range=[10,50]
By default, both the Legend and Popup titles are set to the attribute being visualized. To give them more descriptive titles, set title='Annual Revenue ($)'
In order to see and interact with the distribution of revenue values, you can also add a Histogram Widget (turned off by default) by setting default_widget=True
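As a way to picture what size_range controls, the sketch below linearly maps revenue values onto a range of symbol sizes. It assumes a simple linear interpolation with clamping, which may not match how CARTOframes scales symbols internally. The scale_size helper is hypothetical, and the revenues are the three values from the store table shown earlier.

```python
def scale_size(value, value_min, value_max, size_range=(10, 50)):
    """Linearly map value from [value_min, value_max] onto size_range.

    Values outside the input range are clamped to the nearest end.
    """
    size_min, size_max = size_range
    if value_max == value_min:
        return float(size_min)
    t = (value - value_min) / (value_max - value_min)
    t = max(0.0, min(1.0, t))
    return size_min + t * (size_max - size_min)


# Revenue values from the first three store rows.
revenues = [1321040.772, 1268080.418, 1248133.699]
lo, hi = min(revenues), max(revenues)
sizes = [scale_size(r, lo, hi) for r in revenues]

for revenue, size in zip(revenues, sizes):
    print(revenue, round(size, 1))
```

Raising the lower end of the range, as size_range=[10,50] does, is what keeps the lowest-revenue store visible on the map.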



In [8]:

    
from cartoframes.viz import size_continuous_style

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population',
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree',
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', 'Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style(
            'revenue',
            size_range=[10,50]
        ),
        title='Annual Revenue ($)',
        default_widget=True
    )
])

And now you have a map to visually and interactively explore the relationship between revenue and demographic variables for each store.

Insights

The map above provides a way to explore the data both visually and interactively in different ways:

you can almost instantaneously locate higher and lower performing stores based on the symbol sizes
you can zoom in on any store to summarize demographic characteristics
you can quickly find out the store ID by hovering on it
you can select a range of revenues from the Histogram Widget and have the map update to only display those stores
you can use the Store ID Category Widget to isolate a particular store and summarize values

Use the map to see if you can find the highest and lowest performing stores and summarize the demographic characteristics of each one!

Present Insights

Now that you have gained insight into the relationship between revenue and demographics, let's say that the most influential factor of how well a store performed was median income and you want to create a map to show that particular relationship.

To show this, the map below uses another visualization style, color_bins_style, to color each isochrone according to the range of median household income it falls within. Additionally, the size_continuous_style used in the previous map has been further customized to account for the new thematic median income style, and the store points have been added again as a third Layer to show their location and ID on hover. The map also has a custom viewport that centers it on the highest performing (A) and lowest performing (J) stores, which have similar median income values.



In [9]:

    
from cartoframes.viz import Map, Layer, color_bins_style, size_continuous_style, default_legend, popup_element

Map([
    Layer(
        iso_gdf,
        style=color_bins_style(
            'inccymedhh',
            bins=7,
            palette='pinkyl',
            opacity=0.8,
            stroke_width=0
        ),
        legends=default_legend(title='Median Household Income ($)', footer='Source: US Census Bureau')
    ),
    Layer(
        stores_gdf,
        style=size_continuous_style(
            'revenue',
            size_range=[10,50],
            range_max=1000000,
            opacity=0,
            stroke_color='turquoise',
            stroke_width=2
        ),
        legends=default_legend(title='Annual Revenue ($)')
    ),
    Layer(
        stores_gdf,
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    )
], viewport={'zoom': 12, 'lat': 40.644417, 'lng': -73.934710})
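The bins=7 setting groups the income values into seven classes, each drawn with one color from the palette. The sketch below shows one way such a classification can work, using equal-interval breaks for simplicity. This guide does not state which classification method color_bins_style applies, so treat the method here as an assumption. Both helper functions are hypothetical, and the incomes are the three values from the isochrone table shown earlier.

```python
from bisect import bisect_right


def equal_interval_breaks(values, bins):
    """Return the bins - 1 interior break points between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def classify(value, breaks):
    """Return the zero-based class index that the value falls into."""
    return bisect_right(breaks, value)


# Median household income from the first three isochrone rows.
incomes = [48475.834346, 35125.870621, 87079.135091]
breaks = equal_interval_breaks(incomes, bins=7)
classes = [classify(v, breaks) for v in incomes]

for income, cls in zip(incomes, classes):
    print(income, cls)
```

Each class index would then select one of the seven palette colors, with the lowest class at one end of the palette and the highest at the other.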










Compare Variables with a Layout

If you want to compare store revenue with multiple demographic variables, you can create a Layout with multiple maps.

In the example below, one map symbolizes annual revenue and the other three maps each symbolize one demographic variable, all using the same color palette, where yellow is low and red is high. Each map's legend has a title to label which attribute is being mapped.



In [10]:

    
from cartoframes.viz import Layout

Layout([
    Map([
        Layer(
            stores_gdf,
            style=size_continuous_style(
                'revenue',
                size_range=[10,50],
                range_max=1000000,
                opacity=0,
                stroke_color='turquoise',
                stroke_width=2
            ),
            legends=default_legend(title='Annual Revenue'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'inccymedhh',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Median Income'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(iso_gdf,
            style=color_bins_style(
                'popcy',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Total Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'lbfcyempl',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Employed Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
], 2, 2, viewport={'zoom': 10, 'lat': 40.64, 'lng': -73.92}, map_height=400, is_static=True)

Conclusion

In this guide you were introduced to the Map and Layer classes, saw how to explore data with Widgets and Popups, and how to use visualization styles to quickly symbolize thematic attributes. You also saw some options for creating different maps of your findings.
