Data management

Introduction

CARTOframes is built on top of Pandas and GeoPandas. Therefore, it's compatible with all the data formats supported in those projects like CSV, GeoJSON, Shapefile, etc. This guide will show how to load different data files into DataFrames and how to interact with the CARTO platform to upload DataFrames into tables and download tables or SQL queries into DataFrames.

There are two main concepts we should know before continuing with the guide:

A DataFrame is a two-dimensional data structure for generic data. It can be thought of as a table with rows and columns. It's composed of Series objects which are one-dimensional data structures.
A GeoDataFrame is a DataFrame with an extra geometry column. This geometry column is a GeoSeries object.

Every time we manage Geographic data, a GeoDataFrame should be used. In case a DataFrame with an encoded geometry column is used (WKB, WKT, etc) every method contains a geom_col param to provide the name of that column and decode the geometry internally.

For more information, you can see all the examples here.

Steps

To show how to manage your data with CARTOframes, we will follow the next steps:

Load San Francisco neighborhoods from a GeoJSON file
Load San Francisco incidents from a CSV file
Upload data to CARTO
Calculate their intersection using a SQL query
Download the result

Load San Francisco neighborhoods from a GeoJSON file

Let's start by loading and visualizing San Francisco neighborhoods reading an external GeoJSON file.



In [1]:

    
from geopandas import read_file

neighborhoods_gdf = read_file('https://data.sfgov.org/api/geospatial/pty2-tcw4?method=export&format=GeoJSON')
neighborhoods_gdf.head()









    Out[1]:







  
    
      
      link
      name
      geometry
    
  
  
    
      0
      http://en.wikipedia.org/wiki/Sea_Cliff,_San_Fr...
      Seacliff
      MULTIPOLYGON (((-122.49346 37.78352, -122.4937...
    
    
      1
      None
      Lake Street
      MULTIPOLYGON (((-122.48715 37.78379, -122.4872...
    
    
      2
      http://www.nps.gov/prsf/index.htm
      Presidio National Park
      MULTIPOLYGON (((-122.47758 37.81099, -122.4771...
    
    
      3
      None
      Presidio Terrace
      MULTIPOLYGON (((-122.47241 37.78735, -122.4710...
    
    
      4
      http://www.sfgate.com/neighborhoods/sf/innerri...
      Inner Richmond
      MULTIPOLYGON (((-122.47263 37.78631, -122.4668...



In [2]:

    
from cartoframes.viz import Layer

Layer(neighborhoods_gdf)

Load San Francisco incidents from a CSV file

Let's do the same with San Francisco incidents. In this case, we will work with an external CSV file.



In [3]:

    
from pandas import read_csv
from geopandas import GeoDataFrame, points_from_xy

df = read_csv('http://data.sfgov.org/resource/wg3w-h783.csv')

# Clean NaN values
df = df[df['longitude'].notna()]

incidents_gdf = GeoDataFrame(df, geometry=points_from_xy(df['longitude'], df['latitude']))
incidents_gdf.head()









    Out[3]:







  
    
      
      incident_datetime
      incident_date
      incident_time
      incident_year
      incident_day_of_week
      report_datetime
      row_id
      incident_id
      incident_number
      cad_number
      ...
      :@computed_region_qgnn_b9vv
      :@computed_region_26cr_cadq
      :@computed_region_ajp5_b2md
      :@computed_region_nqbw_i6c3
      :@computed_region_2dwj_jsy4
      :@computed_region_h4ep_8xdi
      :@computed_region_y6ts_4iup
      :@computed_region_jg9y_a9du
      :@computed_region_6pnf_4xz7
      geometry
    
  
  
    
      4
      2020-02-03T14:45:00.000
      2020-02-03T00:00:00.000
      14:45
      2020
      Monday
      2020-02-03T17:50:00.000
      89881675000
      898816
      200085557
      200342870.0
      ...
      10.0
      8.0
      16.0
      NaN
      NaN
      NaN
      NaN
      NaN
      2.0
      POINT (-122.47604 37.72695)
    
    
      6
      2020-02-03T03:45:00.000
      2020-02-03T00:00:00.000
      03:45
      2020
      Monday
      2020-02-03T03:45:00.000
      89860711012
      898607
      200083749
      200340316.0
      ...
      3.0
      2.0
      20.0
      3.0
      NaN
      NaN
      NaN
      NaN
      2.0
      POINT (-122.41517 37.75244)
    
    
      7
      2020-02-03T10:00:00.000
      2020-02-03T00:00:00.000
      10:00
      2020
      Monday
      2020-02-03T10:06:00.000
      89867264015
      898672
      200084060
      200340808.0
      ...
      5.0
      3.0
      8.0
      NaN
      35.0
      NaN
      NaN
      NaN
      2.0
      POINT (-122.40734 37.78456)
    
    
      9
      2020-01-05T00:00:00.000
      2020-01-05T00:00:00.000
      00:00
      2020
      Sunday
      2020-02-03T16:09:00.000
      89877368020
      898773
      200085193
      200342341.0
      ...
      4.0
      6.0
      30.0
      NaN
      NaN
      NaN
      NaN
      NaN
      1.0
      POINT (-122.44025 37.78711)
    
    
      10
      2020-02-03T08:36:00.000
      2020-02-03T00:00:00.000
      08:36
      2020
      Monday
      2020-02-03T08:36:00.000
      89876268020
      898762
      200083909
      200340826.0
      ...
      6.0
      3.0
      8.0
      NaN
      NaN
      NaN
      NaN
      NaN
      1.0
      POINT (-122.39951 37.79693)
    
  

5 rows × 37 columns



In [4]:

    
from cartoframes.viz import Layer

Layer(incidents_gdf)

Upload data to CARTO

Let's upload both GeoDataFrames to CARTO so we can see how to interact with the platform. In order to continue, you have to set your CARTO credentials. If you aren't sure about your API key, check the Authentication guide to learn how to get it.



In [5]:

    
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')



In [6]:

    
from cartoframes import to_carto


neighborhoods_table = 'sf_neighborhoods'
incidents_table = 'sf_incidents'

to_carto(neighborhoods_gdf, neighborhoods_table, if_exists='replace')
to_carto(incidents_gdf, incidents_table, if_exists='replace')









    



Success! Data uploaded to table "sf_neighborhoods" correctly
Success! Data uploaded to table "sf_incidents" correctly






    Out[6]:





'sf_incidents'

Now that we have uploaded the data, we can directly visualize the tables using:

Layer(neighborhoods_table)
Layer(incidents_table)

Calculate their intersection using a SQL query

Let's see how we can apply a SQL query to inserct both tables and download the result of the query tp visualize it.



In [7]:

    
from cartoframes import read_carto

incidents_neighborhoods_gdf = read_carto('''
    SELECT n.cartodb_id, n.the_geom, n.the_geom_webmercator, n.name, COUNT(*) AS incidents
    FROM sf_incidents i INNER JOIN sf_neighborhoods n ON ST_Intersects(i.the_geom, n.the_geom)
    GROUP BY n.cartodb_id
''')
incidents_neighborhoods_gdf.head()









    Out[7]:







  
    
      
      cartodb_id
      the_geom
      name
      incidents
    
  
  
    
      0
      828
      MULTIPOLYGON (((-122.51314 37.77133, -122.5110...
      Golden Gate Park
      7
    
    
      1
      934
      MULTIPOLYGON (((-122.43734 37.76235, -122.4370...
      Eureka Valley
      4
    
    
      2
      908
      MULTIPOLYGON (((-122.39899 37.71534, -122.3988...
      Little Hollywood
      2
    
    
      3
      850
      MULTIPOLYGON (((-122.38452 37.78739, -122.3843...
      South Beach
      7
    
    
      4
      894
      MULTIPOLYGON (((-122.41622 37.70833, -122.4159...
      Visitacion Valley
      4



In [8]:

    
from cartoframes.viz import color_continuous_style

Layer(incidents_neighborhoods_gdf, style=color_continuous_style('incidents'))

Conclusion

Congratulations! You have seen how to load data locally, upload it to CARTO, apply a SQL query and download the results. We recommend to upload your data to CARTO when it is too big (> 30MB) to be visualized from a GeoDataFrame or when you want to apply PostGIS queries.

	link	name	geometry
0	http://en.wikipedia.org/wiki/Sea_Cliff,_San_Fr...	Seacliff	MULTIPOLYGON (((-122.49346 37.78352, -122.4937...
1	None	Lake Street	MULTIPOLYGON (((-122.48715 37.78379, -122.4872...
2	http://www.nps.gov/prsf/index.htm	Presidio National Park	MULTIPOLYGON (((-122.47758 37.81099, -122.4771...
3	None	Presidio Terrace	MULTIPOLYGON (((-122.47241 37.78735, -122.4710...
4	http://www.sfgate.com/neighborhoods/sf/innerri...	Inner Richmond	MULTIPOLYGON (((-122.47263 37.78631, -122.4668...

	incident_datetime	incident_date	incident_time	incident_year	incident_day_of_week	report_datetime	row_id	incident_id	incident_number	cad_number	...	:@computed_region_qgnn_b9vv	:@computed_region_26cr_cadq	:@computed_region_ajp5_b2md	:@computed_region_nqbw_i6c3	:@computed_region_2dwj_jsy4	:@computed_region_h4ep_8xdi	:@computed_region_y6ts_4iup	:@computed_region_jg9y_a9du	:@computed_region_6pnf_4xz7	geometry
4	2020-02-03T14:45:00.000	2020-02-03T00:00:00.000	14:45	2020	Monday	2020-02-03T17:50:00.000	89881675000	898816	200085557	200342870.0	...	10.0	8.0	16.0	NaN	NaN	NaN	NaN	NaN	2.0	POINT (-122.47604 37.72695)
6	2020-02-03T03:45:00.000	2020-02-03T00:00:00.000	03:45	2020	Monday	2020-02-03T03:45:00.000	89860711012	898607	200083749	200340316.0	...	3.0	2.0	20.0	3.0	NaN	NaN	NaN	NaN	2.0	POINT (-122.41517 37.75244)
7	2020-02-03T10:00:00.000	2020-02-03T00:00:00.000	10:00	2020	Monday	2020-02-03T10:06:00.000	89867264015	898672	200084060	200340808.0	...	5.0	3.0	8.0	NaN	35.0	NaN	NaN	NaN	2.0	POINT (-122.40734 37.78456)
9	2020-01-05T00:00:00.000	2020-01-05T00:00:00.000	00:00	2020	Sunday	2020-02-03T16:09:00.000	89877368020	898773	200085193	200342341.0	...	4.0	6.0	30.0	NaN	NaN	NaN	NaN	NaN	1.0	POINT (-122.44025 37.78711)
10	2020-02-03T08:36:00.000	2020-02-03T00:00:00.000	08:36	2020	Monday	2020-02-03T08:36:00.000	89876268020	898762	200083909	200340826.0	...	6.0	3.0	8.0	NaN	NaN	NaN	NaN	NaN	1.0	POINT (-122.39951 37.79693)

	cartodb_id	the_geom	name	incidents
0	828	MULTIPOLYGON (((-122.51314 37.77133, -122.5110...	Golden Gate Park	7
1	934	MULTIPOLYGON (((-122.43734 37.76235, -122.4370...	Eureka Valley	4
2	908	MULTIPOLYGON (((-122.39899 37.71534, -122.3988...	Little Hollywood	2
3	850	MULTIPOLYGON (((-122.38452 37.78739, -122.3843...	South Beach	7
4	894	MULTIPOLYGON (((-122.41622 37.70833, -122.4159...	Visitacion Valley	4