Visualization

Introduction

As a data scientist, you likely work through a data exploration process on nearly every project. Exploratory data analysis can entail many things, from finding and cleaning relevant data to running analyses and building models. The ability to visually analyze and interact with data is key both during exploration and when presenting your final insights.

With that in mind, this guide introduces the basic building blocks for creating web-based, dynamic, and interactive map visualizations inside a Jupyter notebook with CARTOframes.

You will learn about the Map and Layer classes, how to explore data with Widgets and Popups, how to use visualization styles to quickly symbolize thematic attributes, and which options are available for creating maps to share your findings.

Data

This guide uses two datasets: a point dataset of simulated Starbucks locations in Brooklyn, New York, and 15-minute walk-time polygons (isochrones) around each store, augmented with demographic variables from CARTO's Data Observatory. To follow along, you can download both datasets as CSV files from the URLs used in the cell below.

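As a first step, read both datasets into the notebook with pandas and convert each one to a geopandas GeoDataFrame, using decode_geometry to turn the encoded the_geom column into geometries: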


In [1]:
from pandas import read_csv
from geopandas import GeoDataFrame
from cartoframes.utils import decode_geometry

# store point locations
stores_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_geocoded.csv')
stores_gdf = GeoDataFrame(stores_df, geometry=decode_geometry(stores_df['the_geom']))

# 15 minute walk time polygons
iso_df = read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn_iso_enriched.csv')
iso_gdf = GeoDataFrame(iso_df, geometry=decode_geometry(iso_df['the_geom']))
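
The the_geom column stores each geometry as a hex string. Its prefix 0101000020E6100000 reads as extended WKB (EWKB): a little-endian Point carrying SRID 4326. decode_geometry handles this for you. The sketch below is a hypothetical pure-Python helper, decode_ewkb_point, that shows what the decoding involves for points only. It assumes the strings are hex-encoded EWKB, as that prefix suggests.

```python
import struct


def decode_ewkb_point(hex_str):
    """Decode a hex-encoded (E)WKB point into (srid, x, y).

    Hypothetical helper: handles points only and returns x and y
    (any Z or M values are ignored). Other geometry types raise ValueError.
    """
    data = bytes.fromhex(hex_str)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian
    order = '<' if data[0] == 1 else '>'
    (geom_type,) = struct.unpack_from(order + 'I', data, 1)
    offset = 5
    srid = None
    # In EWKB, the 0x20000000 flag means a 4-byte SRID follows the type
    if geom_type & 0x20000000:
        (srid,) = struct.unpack_from(order + 'I', data, offset)
        offset += 4
    # Strip the EWKB Z/M/SRID flag bits to get the base geometry type
    base_type = geom_type & 0x0FFFFFFF
    if base_type != 1:
        raise ValueError(f'expected a Point (type 1), got type {base_type}')
    x, y = struct.unpack_from(order + 'dd', data, offset)
    return srid, x, y
```

For example, decode_ewkb_point(stores_df['the_geom'][0]) should return SRID 4326 and the longitude and latitude of the first store, assuming the column holds the full (untruncated) hex strings.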

Add Layers to a Map

Next, import the Map and Layer classes from the cartoframes.viz module to visualize the two datasets. The resulting map draws each dataset with default symbology on top of CARTO's Positron basemap, with the zoom and center set to the combined extent of both datasets:


In [2]:
from cartoframes.viz import Map, Layer

Map([
    Layer(iso_gdf),
    Layer(stores_gdf)
])


To learn more about basemap options, visit the Map Configuration Examples section of the CARTOframes Developer Center.
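
The map above sets its zoom and center from the extent of both datasets. As a rough illustration of the idea, and not CARTOframes' actual code, the hypothetical helpers below compute a combined bounding box and its midpoint from (longitude, latitude) pairs. The sample points are the three store locations and the first isochrone vertices that appear in the previews in the next section.

```python
def combined_extent(*coordinate_lists):
    """Return (min_lng, min_lat, max_lng, max_lat) over every (lng, lat) pair."""
    points = [point for coords in coordinate_lists for point in coords]
    if not points:
        raise ValueError('at least one coordinate is required')
    lngs = [lng for lng, _ in points]
    lats = [lat for _, lat in points]
    return min(lngs), min(lats), max(lngs), max(lats)


def extent_center(extent):
    """Return the (lng, lat) midpoint of a bounding box."""
    min_lng, min_lat, max_lng, max_lat = extent
    return (min_lng + max_lng) / 2, (min_lat + max_lat) / 2


# Store points and a few isochrone vertices taken from the data previews
stores = [(-73.95901, 40.67109), (-73.96122, 40.57796), (-73.98976, 40.61912)]
iso_vertices = [(-73.99082, 40.62694), (-73.87101, 40.66114), (-73.98467, 40.70054)]
extent = combined_extent(stores, iso_vertices)
center = extent_center(extent)
```

With geopandas, each GeoDataFrame's total_bounds attribute returns the same [minx, miny, maxx, maxy] values for its own geometries.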

Explore Attributes

Before going further, take a look at the attributes in each dataset to get a sense of the information available to visualize and interact with.

First, explore the store location attributes. From this dataset you will use the fields:

• id_store: a unique identifier for each location
• revenue: how much the store earned in 2018
    
    
In [3]:
stores_gdf.head(3)

Out[3]:
|   | the_geom | cartodb_id | field_1 | name | address | revenue | id_store | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | 0101000020E61000005EA27A6B607D52C01956F146E655... | 1 | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | A | POINT (-73.95901 40.67109) |
| 1 | 0101000020E6100000B610E4A0847D52C0B532E197FA49... | 2 | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | B | POINT (-73.96122 40.57796) |
| 2 | 0101000020E6100000E5B8533A587F52C05726FC523F4F... | 3 | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | C | POINT (-73.98976 40.61912) |

From the isochrone dataset, you will use the demographic attributes:

• popcy: the total population in each area
• inccymedhh: the median household income in each area
• lbfcyempl: the employed population
• educybach: the number of people with a bachelor's degree
• id_store: the store identifier, matching id_store in the store points
    
    
In [4]:
iso_gdf.head(3)

Out[4]:
|   | the_geom | cartodb_id | popcy | data_range | range_label | lbfcyempl | educybach | inccymedhh | id_store | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0106000020E61000000100000001030000000100000033... | 3 | 1311.667005 | 900 | 15 min. | 568.006658 | 151.682217 | 48475.834346 | C | MULTIPOLYGON (((-73.99082 40.62694, -73.99170 ... |
| 1 | 0106000020E61000000100000001030000000100000033... | 7 | 2215.539290 | 900 | 15 min. | 1181.265882 | 313.739810 | 35125.870621 | G | MULTIPOLYGON (((-73.87101 40.66114, -73.87166 ... |
| 2 | 0106000020E61000000100000001030000000100000033... | 9 | 1683.229186 | 900 | 15 min. | 1012.737753 | 449.871005 | 87079.135091 | I | MULTIPOLYGON (((-73.98467 40.70054, -73.98658 ... |
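
Because both datasets carry id_store, each store's revenue can be paired with the demographics of its isochrone. The hypothetical helper below does this inner join in plain Python on the rows visible in the two previews. Only store C appears in both. It assumes one isochrone row per store.

```python
def join_on_store(stores, isochrones):
    """Inner-join store rows with isochrone rows on id_store."""
    # Index isochrones by store ID for constant-time lookups
    # (assumes one isochrone per store; duplicates would overwrite each other)
    by_store = {row['id_store']: row for row in isochrones}
    joined = []
    for store in stores:
        match = by_store.get(store['id_store'])
        if match is not None:  # stores without an isochrone row are dropped
            joined.append({**store, **match})
    return joined


# Rows copied from the head(3) previews above
stores = [
    {'id_store': 'A', 'revenue': 1321040.772},
    {'id_store': 'B', 'revenue': 1268080.418},
    {'id_store': 'C', 'revenue': 1248133.699},
]
isochrones = [
    {'id_store': 'C', 'popcy': 1311.667005, 'inccymedhh': 48475.834346},
    {'id_store': 'G', 'popcy': 2215.539290, 'inccymedhh': 35125.870621},
    {'id_store': 'I', 'popcy': 1683.229186, 'inccymedhh': 87079.135091},
]
joined = join_on_store(stores, isochrones)
```

With pandas, stores_df[['id_store', 'revenue']].merge(iso_df[['id_store', 'popcy', 'inccymedhh']], on='id_store') performs the same inner join on the full datasets.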

Visual and Interactive Data Exploration

Now that you've taken a first look at the fields in the data and created a basic visualization, let's look at how you can use the map as a tool for visual and interactive exploration to better understand the relationship between a store's annual revenue and the demographic characteristics of the surrounding area.

Add Widgets

As the table previews above show, the isochrone dataset contains a variety of demographic attributes that help characterize the area around each store.

To make this information available while exploring each location on the map, you can add each attribute as a Widget. Here, you will use Formula Widgets to summarize the demographic variables and a Category Widget for the categorical attribute id_store.

To add Widgets, first import the widget functions you want to use, then pass one widget per attribute of interest to the widgets parameter of the iso_gdf Layer. The Formula Widget supports different types of aggregation. For this map, you will aggregate each demographic variable using sum so the totals update as you zoom, pan, and interact with the map. You will also label each Widget using the title parameter.

In [5]:
from cartoframes.viz import formula_widget, category_widget

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population'
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree'
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ]
    ),
    Layer(
        stores_gdf
    )
])

At this point, take a few minutes to explore the map and watch how the Widget values update. For example, select a Store ID in the Category Widget to summarize the demographics for that store. Alternatively, zoom and pan the map to get summary statistics for the features in the current view.
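
Conceptually, a Formula Widget re-aggregates one column over the features that are currently visible and that match any active category selection. CARTOframes computes these values for you. The hypothetical formula helper below only models the idea. It represents each feature by a single lng/lat point instead of a polygon, and uses made-up toy data. The operation names other than 'sum' ('count', 'avg', 'min', 'max') are an assumption about what formula_widget accepts.

```python
def formula(features, column, operation, bbox=None, category=None):
    """Aggregate `column` over features inside `bbox` that match `category`.

    Hypothetical model of a Formula Widget: each feature is a dict with a
    representative 'lng'/'lat' point, an 'id_store', and data columns.
    """
    min_lng, min_lat, max_lng, max_lat = bbox or (-180.0, -90.0, 180.0, 90.0)
    # Keep values of features in the current view that pass the category filter
    values = [
        feature[column]
        for feature in features
        if min_lng <= feature['lng'] <= max_lng
        and min_lat <= feature['lat'] <= max_lat
        and (category is None or feature['id_store'] == category)
    ]
    operations = {
        'count': len,
        'sum': sum,
        'avg': lambda v: sum(v) / len(v) if v else None,
        'min': lambda v: min(v) if v else None,
        'max': lambda v: max(v) if v else None,
    }
    return operations[operation](values)


# Toy features (made-up values, not from the guide's datasets)
features = [
    {'id_store': 'X', 'lng': 0.0, 'lat': 0.0, 'popcy': 100.0},
    {'id_store': 'Y', 'lng': 1.0, 'lat': 1.0, 'popcy': 200.0},
    {'id_store': 'Z', 'lng': 5.0, 'lat': 5.0, 'popcy': 400.0},
]
```

Panning and zooming corresponds to changing bbox, and selecting a Store ID in the Category Widget corresponds to setting category. The same model makes one caveat visible: summing a median such as inccymedhh yields a sum of medians, not the median income of the combined area.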

Add Popups

To aid this map-based exploration, import popup_element and use the popup_hover parameter of the iso_gdf Layer, so that hovering over an isochrone shows its store ID:

In [6]:
from cartoframes.viz import popup_element

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population'
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree'
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    ),
    Layer(
        stores_gdf
    )
])

Now, as you explore the map and summarize demographics, it is much easier to relate the summarized values to a specific store ID.
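
To show a hover popup, the map has to work out which isochrone lies under the cursor. CARTOframes handles this for you. Conceptually it is a point-in-polygon test, and the sketch below uses the even-odd ray-casting rule as an illustration, not as the library's implementation. point_in_ring and hovered_store_ids are hypothetical names. The sketch checks a single outer ring and ignores holes, and the squares are made-up toy geometries.

```python
def point_in_ring(lng, lat, ring):
    """Even-odd ray casting: True if (lng, lat) lies inside the closed ring.

    `ring` is a list of (lng, lat) vertices; the closing vertex may be omitted.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % count]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point flips inside/outside
            if lng < x_cross:
                inside = not inside
    return inside


def hovered_store_ids(lng, lat, isochrones):
    """IDs of every isochrone under the cursor (walk areas can overlap)."""
    return [iso['id_store'] for iso in isochrones if point_in_ring(lng, lat, iso['ring'])]


# Toy isochrones: two overlapping squares (made-up coordinates)
isochrones = [
    {'id_store': 'A', 'ring': [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {'id_store': 'B', 'ring': [(1, 1), (3, 1), (3, 3), (1, 3)]},
]
```

Walk-time areas around nearby stores overlap, so a single cursor position can fall inside several isochrones. The helper therefore returns a list rather than a single ID.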

Symbolize Store Points

At this point, the map shows useful information, but only from the isochrone Layer. Sizing the store points by their revenue provides a way to visually identify which stores perform better than others. A quick way to visualize numeric or categorical attributes during data exploration is to use visualization styles.

To size the store points in proportion to their revenue, use size_continuous_style:

In [7]:
from cartoframes.viz import size_continuous_style

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population'
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree'
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style('revenue')
    )
])

Now you have a proportional symbol map where points are sized by revenue. Notice also that a matching legend has been added to the map, and that hovering over a point shows that store's revenue.

Next, let's look at how to modify some of the defaults.

Every visualization style has a set of parameters for customizing its defaults to better suit a given map. A quick way to see which parameters size_continuous_style accepts is to run help(size_continuous_style) in a notebook cell.

Let's make a few adjustments that make it easier to distinguish and locate the highest and lowest performing stores:

• Point sizes are scaled between a minimum and a maximum symbol size. Since the store with the smallest revenue is hard to see, set size_range=[10, 50] in the style.
• By default, both the Legend and Popup titles are set to the name of the attribute being visualized. To give them a more descriptive title, set title='Annual Revenue ($)' on the Layer.
• To see and interact with the distribution of revenue values, you can also add a Histogram Widget (turned off by default) by setting default_widget=True on the Layer.
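
The size_range setting maps each revenue value onto a symbol size between the two bounds. The hypothetical symbol_size helper below sketches that mapping under two assumptions: a linear scale, and clamping of values that fall outside [range_min, range_max]. CARTOframes may instead scale by symbol area or treat out-of-range values differently; help(size_continuous_style) describes the actual behavior. The revenues are the three values from the stores_gdf preview.

```python
def symbol_size(value, range_min, range_max, size_range=(10, 50)):
    """Map value linearly from [range_min, range_max] onto size_range.

    Assumption: values outside the range are clamped to the smallest or
    largest size.
    """
    smallest, largest = size_range
    if range_max == range_min:  # avoid dividing by zero for constant data
        return smallest
    t = (value - range_min) / (range_max - range_min)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return smallest + t * (largest - smallest)


revenues = [1321040.772, 1268080.418, 1248133.699]  # from the stores_gdf preview
sizes = [symbol_size(r, min(revenues), max(revenues)) for r in revenues]
```

Raising the lower bound from a small default to 10 guarantees that even the lowest-revenue store gets a visible symbol, while the ordering of sizes still follows revenue.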
          
          
In [8]:
from cartoframes.viz import size_continuous_style

Map([
    Layer(
        iso_gdf,
        widgets=[
            formula_widget(
                'popcy',
                'sum',
                title='Total Population Served'
            ),
            formula_widget(
                'inccymedhh',
                'sum',
                title='Median Income ($)'
            ),
            formula_widget(
                'lbfcyempl',
                'sum',
                title='Employed Population'
            ),
            formula_widget(
                'educybach',
                'sum',
                title='Number of People with Bachelor Degree'
            ),
            category_widget(
                'id_store',
                title='Store ID'
            )
        ],
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    ),
    Layer(
        stores_gdf,
        size_continuous_style(
            'revenue',
            size_range=[10, 50]
        ),
        title='Annual Revenue ($)',
        default_widget=True
    )
])

You now have a map for visually and interactively exploring the relationship between revenue and the demographic variables around each store.

Insights

The map above lets you explore the data visually and interactively in several ways:

• you can almost instantly locate higher and lower performing stores based on their symbol sizes
• you can zoom in on any store to summarize the demographic characteristics around it
• you can quickly find a store's ID by hovering over its isochrone
• you can select a range of revenues in the Histogram Widget, and the map updates to display only those stores
• you can use the Store ID Category Widget to isolate a particular store and summarize its values

Use the map to see if you can find the highest and lowest performing stores and summarize the demographic characteristics of each one!
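
The same questions can be answered directly from the data. The hypothetical helpers below find the highest and lowest revenue stores and mimic a Histogram Widget selection by keeping only stores whose revenue falls in an inclusive range. The sample rows are the three stores from the stores_gdf preview, so the results describe only those rows, not the full dataset.

```python
def revenue_extremes(stores):
    """Return the IDs of the highest and lowest revenue stores."""
    best = max(stores, key=lambda store: store['revenue'])
    worst = min(stores, key=lambda store: store['revenue'])
    return best['id_store'], worst['id_store']


def stores_in_revenue_range(stores, low, high):
    """IDs of stores whose revenue lies in [low, high], like a histogram selection."""
    return [store['id_store'] for store in stores if low <= store['revenue'] <= high]


# Rows copied from the stores_gdf preview
stores = [
    {'id_store': 'A', 'revenue': 1321040.772},
    {'id_store': 'B', 'revenue': 1268080.418},
    {'id_store': 'C', 'revenue': 1248133.699},
]
```

On the full GeoDataFrame, stores_gdf.loc[stores_gdf['revenue'].idxmax(), 'id_store'] and stores_gdf[stores_gdf['revenue'].between(low, high)] express the same operations with pandas.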

Present Insights

Now that you have gained insight into the relationship between revenue and demographics, suppose you concluded that median income was the most influential factor in how well a store performed, and you want to create a map that shows this relationship.

To show this, the map below uses another visualization style, color_bins_style, to color each isochrone by the range of median household income it falls within. The size_continuous_style from the previous map has been further customized to work alongside the new median income style: the store symbols have a transparent fill and a turquoise outline, so they do not hide the income colors. The store points have also been added again as a third Layer to show their locations and store IDs on hover. Finally, the map sets a custom viewport centered on the highest performing (A) and lowest performing (J) stores, which have similar median income values.
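
With bins=7, color_bins_style groups the inccymedhh values into seven classes, one palette color each. The sketch below assumes quantile classification, meaning roughly equal numbers of features per class, which we believe is the style's default method. It computes the six break values with statistics.quantiles and assigns each value to a class with bisect. quantile_breaks and bin_index are hypothetical names, the data is a made-up toy sequence, and CARTOframes computes its own breaks, which may differ slightly at class edges.

```python
import bisect
import statistics


def quantile_breaks(values, bins=7):
    """Return bins - 1 cut points that split values into roughly equal-count classes."""
    return statistics.quantiles(values, n=bins)


def bin_index(value, breaks):
    """Return the class (0 = lowest) that value falls into, given sorted breaks."""
    return bisect.bisect_right(breaks, value)


# Toy data: 14 evenly spaced values, so each of the 7 classes gets 2 of them
values = list(range(1, 15))
breaks = quantile_breaks(values, bins=7)
classes = [bin_index(v, breaks) for v in values]
```

Quantile breaks give every color roughly the same number of isochrones, which makes the map easy to read. A consequence is that classes can span very different income widths when the distribution is skewed.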

            
            
In [9]:
from cartoframes.viz import Map, Layer, color_bins_style, size_continuous_style, default_legend, popup_element

Map([
    Layer(
        iso_gdf,
        style=color_bins_style(
            'inccymedhh',
            bins=7,
            palette='pinkyl',
            opacity=0.8,
            stroke_width=0
        ),
        legends=default_legend(title='Median Household Income ($)', footer='Source: US Census Bureau')
    ),
    Layer(
        stores_gdf,
        style=size_continuous_style(
            'revenue',
            size_range=[10, 50],
            range_max=1000000,
            opacity=0,
            stroke_color='turquoise',
            stroke_width=2
        ),
        legends=default_legend(title='Annual Revenue ($)')
    ),
    Layer(
        stores_gdf,
        popup_hover=[
            popup_element('id_store', title='Store ID')
        ]
    )
], viewport={'zoom': 12, 'lat': 40.644417, 'lng': -73.934710})

Compare Variables with a Layout

If you want to compare store revenue with multiple demographic variables, you can create a Layout with multiple maps.

In the example below, one map symbolizes annual revenue and the other three symbolize demographic variables using the same color palette, where yellow is low and red is high. Each map's legend title labels the attribute being mapped.
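
The Layout compares the variables visually. To complement it with a number, you could compute the Pearson correlation between revenue and each demographic variable, after joining the two datasets on id_store. The helpers below (pearson and rank_by_correlation, both hypothetical) do this in plain Python, and the rows are made-up toy values rather than results from the guide's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length, at least 2 items each')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(rows, target, candidates):
    """Sort candidate columns by the absolute strength of their correlation with target."""
    target_values = [row[target] for row in rows]
    scores = {name: pearson(target_values, [row[name] for row in rows]) for name in candidates}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up joined rows (one per store); real rows would come from joining on id_store
rows = [
    {'revenue': 1.0, 'inccymedhh': 2.0, 'popcy': 5.0},
    {'revenue': 2.0, 'inccymedhh': 4.0, 'popcy': 1.0},
    {'revenue': 3.0, 'inccymedhh': 6.0, 'popcy': 4.0},
]
ranking = rank_by_correlation(rows, 'revenue', ['popcy', 'inccymedhh'])
```

With pandas, DataFrame.corr() computes the same pairwise Pearson coefficients for all numeric columns of the joined table at once.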

              
              
In [10]:
from cartoframes.viz import Layout

Layout([
    Map([
        Layer(
            stores_gdf,
            style=size_continuous_style(
                'revenue',
                size_range=[10, 50],
                range_max=1000000,
                opacity=0,
                stroke_color='turquoise',
                stroke_width=2
            ),
            legends=default_legend(title='Annual Revenue'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'inccymedhh',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Median Income'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'popcy',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Total Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
    Map([
        Layer(
            iso_gdf,
            style=color_bins_style(
                'lbfcyempl',
                bins=7,
                palette='pinkyl',
                stroke_width=0
            ),
            legends=default_legend(title='Employed Pop'),
            default_popup_hover=False,
            default_popup_click=False
        ),
        Layer(stores_gdf)
    ]),
], 2, 2, viewport={'zoom': 10, 'lat': 40.64, 'lng': -73.92}, map_height=400, is_static=True)

Conclusion

In this guide you were introduced to the Map and Layer classes, saw how to explore data with Widgets and Popups, and learned how to use visualization styles to quickly symbolize thematic attributes. You also saw some options for creating maps to present your findings, including a Layout for comparing several maps side by side.