Automating GIS-processes - Lecture 7: Python GIS

Working with spatial data using Pandas/Geopandas

Geopandas is a Python module that extends the functionalities of Pandas data analysis library (that takes advantage of numpy module). Geopandas makes possible to work with spatial data such as Shapefiles. It is also possible to work with rasters using geopandas with rasterio module (see example).

Reading a Shapefile

Spatial data can be read easily with geopandas using gpd.from_file() function:


In [1]:
# Import necessary modules
import geopandas as gpd   # geopandas is usually shortened as gpd

# Set filepath
fp = r"C:\HY-Data\HENTENKA\Data\DAMSELFISH_distributions.shp"

# Read file using gpd.read_file()
data = gpd.read_file(fp)

# Let's see what we got --> print the 'head' of the file (first 5 rows)
data.head()


Out[1]:
ORIG_FID binomial category citation class_name compiler dist_comm family_nam genus_name geometry ... rl_update seasonal shape_Area shape_Leng source species_na subpop subspecies tax_comm year
0 0 Stegastes leucorus VU International Union for Conservation of Nature... ACTINOPTERYGII IUCN None POMACENTRIDAE Stegastes POLYGON ((-115.6437454219999 29.71392059300007... ... 2012.1 1 28.239363 82.368856 None leucorus None None None 2010
1 0 Stegastes leucorus VU International Union for Conservation of Nature... ACTINOPTERYGII IUCN None POMACENTRIDAE Stegastes POLYGON ((-105.589950704 21.89339825500002, -1... ... 2012.1 1 28.239363 82.368856 None leucorus None None None 2010
2 0 Stegastes leucorus VU International Union for Conservation of Nature... ACTINOPTERYGII IUCN None POMACENTRIDAE Stegastes POLYGON ((-111.159618439 19.01535626700007, -1... ... 2012.1 1 28.239363 82.368856 None leucorus None None None 2010
3 1 Chromis intercrusma LC International Union for Conservation of Nature... ACTINOPTERYGII IUCN None POMACENTRIDAE Chromis POLYGON ((-80.86500229899997 -0.77894492099994... ... 2012.1 1 87.461539 729.012180 None intercrusma None None None 2010
4 1 Chromis intercrusma LC International Union for Conservation of Nature... ACTINOPTERYGII IUCN None POMACENTRIDAE Chromis POLYGON ((-67.33922225599997 -55.6761029239999... ... 2012.1 1 87.461539 729.012180 None intercrusma None None None 2010

5 rows × 27 columns

Writing a Shapefile

We can select for example 50 rows of the input data and write those into a new Shapefile by first selecting the data using pandas functionalities and then writing the selection with gpd.to_file() function:


In [2]:
# Create a output path for the data
out = r"C:\HY-Data\HENTENKA\Data\DAMSELFISH_distributions_SELECTION.shp"

# Select first 50 rows
selection = data[0:50]

# Print the head of our selection
print(selection.head())

# Write those rows into a new Shapefile
selection.to_file(out, driver="ESRI Shapefile")  # drivers makes it possible to write to different file formats


   ORIG_FID             binomial category  \
0         0   Stegastes leucorus       VU   
1         0   Stegastes leucorus       VU   
2         0   Stegastes leucorus       VU   
3         1  Chromis intercrusma       LC   
4         1  Chromis intercrusma       LC   

                                            citation      class_name compiler  \
0  International Union for Conservation of Nature...  ACTINOPTERYGII     IUCN   
1  International Union for Conservation of Nature...  ACTINOPTERYGII     IUCN   
2  International Union for Conservation of Nature...  ACTINOPTERYGII     IUCN   
3  International Union for Conservation of Nature...  ACTINOPTERYGII     IUCN   
4  International Union for Conservation of Nature...  ACTINOPTERYGII     IUCN   

  dist_comm     family_nam genus_name  \
0      None  POMACENTRIDAE  Stegastes   
1      None  POMACENTRIDAE  Stegastes   
2      None  POMACENTRIDAE  Stegastes   
3      None  POMACENTRIDAE    Chromis   
4      None  POMACENTRIDAE    Chromis   

                                            geometry  ...   rl_update  \
0  POLYGON ((-115.6437454219999 29.71392059300007...  ...      2012.1   
1  POLYGON ((-105.589950704 21.89339825500002, -1...  ...      2012.1   
2  POLYGON ((-111.159618439 19.01535626700007, -1...  ...      2012.1   
3  POLYGON ((-80.86500229899997 -0.77894492099994...  ...      2012.1   
4  POLYGON ((-67.33922225599997 -55.6761029239999...  ...      2012.1   

  seasonal shape_Area  shape_Leng source   species_na subpop  subspecies  \
0        1  28.239363   82.368856   None     leucorus   None        None   
1        1  28.239363   82.368856   None     leucorus   None        None   
2        1  28.239363   82.368856   None     leucorus   None        None   
3        1  87.461539  729.012180   None  intercrusma   None        None   
4        1  87.461539  729.012180   None  intercrusma   None        None   

   tax_comm  year  
0      None  2010  
1      None  2010  
2      None  2010  
3      None  2010  
4      None  2010  

[5 rows x 27 columns]

Geometries in Geopandas

Geopandas takes advantage of Shapely's geometric objects. Geometries are stored in a column called geometry that is a default column name for storing geometric information in geopandas.


In [3]:
# Print first 5 rows of the column 'geometry'
# -------------------------------------------

# It is possible to use only specific columns by specifying the column name within square brackets []
data['geometry'].head()


Out[3]:
0    POLYGON ((-115.6437454219999 29.71392059300007...
1    POLYGON ((-105.589950704 21.89339825500002, -1...
2    POLYGON ((-111.159618439 19.01535626700007, -1...
3    POLYGON ((-80.86500229899997 -0.77894492099994...
4    POLYGON ((-67.33922225599997 -55.6761029239999...
Name: geometry, dtype: object

Since spatial data is stored as Shapely objects, it is possible to use all of the functionalities of Shapely module that we practiced earlier:


In [4]:
# Print the areas of the first 5 polygons
# ---------------------------------------

# Make a selection that contains only the first five rows
selection = data[0:5]

# Iterate over the selected rows by using a for loop in a pandas function called '.iterrows()'
for row_num, row in selection.iterrows():
    # Calculate the area of the polygon
    poly_area = row['geometry'].area
    # Print information for the user
    print("Polygon area at index %s is: %s" % (row_num, poly_area))
    
# Outputs:


Polygon area at index 0 is: 19.39625403004423
Polygon area at index 1 is: 6.145902112999523
Polygon area at index 2 is: 2.6972072716440176
Polygon area at index 3 is: 87.46062072709621
Polygon area at index 4 is: 0.0009183696153124292

Creating GeoDataFrame and exporting it to Shapefile

Since geopandas takes advantage of Shapely geometric objects it is possible to create a Shapefile from a scratch or using e.g. a text file that contains coordinates.

Let's create an empty GeoDataFrame that is a Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes --> i.e. it is a similar thing as attribute table that also contains geometries. GeoDataFrame extends the functionalities of pandas.DataFrame() in a way that it is possible to use spatial data within pandas (hence the name geopandas).


In [2]:
# Import necessary modules first
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
import fiona

# Create an empty geopandas GeoDataFrame
data = gpd.GeoDataFrame()

# Let's see what's inside
print(data)

# Outputs:


Empty GeoDataFrame
Columns: []
Index: []

The GeoDataFrame is empty since we haven't places any data inside. Let's create a new column called geometry that will contain our Shapely Geometric Object:


In [3]:
# Create a new column called 'geometry' to the GeoDataFrame
data['geometry'] = None

# Let's see what's inside
data

# Outputs:


Out[3]:
geometry

Now we have a geometry column in the GeoDataFrame but we don't have any data yet. Let's create a Shapely Polygon that we can insert to our GeoDataFrame:


In [4]:
# Create a Shapely Polygon that has coordinates in WGS84 projection (i.e. lat, lon)

# Coordinates are corner coordinates of Senaatintori in Helsinki
coordinates = [(24.950899, 60.169158), (24.953492, 60.169158), (24.953510, 60.170104), (24.950958, 60.169990)]

# Create a Shapely polygon from the coordinate-tuple list
poly = Polygon(coordinates)

# Let's see that we really have a polygon
print(poly)

# Outputs:


POLYGON ((24.950899 60.169158, 24.953492 60.169158, 24.95351 60.170104, 24.950958 60.16999, 24.950899 60.169158))

Now we need to insert that polygon into the column 'geometry' of our GeoDataFrame:


In [5]:
# Insert the polygon to the GeoDataFrame
data['geometry'] = [poly]

# Print out the data
data

# Outputs:


Out[5]:
geometry
0 POLYGON ((24.950899 60.169158, 24.953492 60.16...

Now we have a GeoDataFrame with Polygon that we can export to a Shapefile. Let's add another column to our GeoDataFrame called Location with text Senaatintori.


In [6]:
# Add a new column and insert data
data['Location'] = ['Senaatintori']

# Print out the data
data

# Outputs


Out[6]:
geometry Location
0 POLYGON ((24.950899 60.169158, 24.953492 60.16... Senaatintori

Before exporting the data it is useful to determine the projection for the GeoDataFrame.

GeoDataFrame has a property called .crs that shows the coordinatesystem of the data which is empty in our case since we are creating the data from the scratch. If you import a Shapefile into geopandas that has a defined projection, the crs of the GeoDataFrame will automatically be the one of that Shapefile.

A Python module called fiona has a nice function called from_epsg for passing coordinate system for the GeoDataFrame. Next we will use that and determine the projection to WGS84 (epsg code: 4326):


In [7]:
# Check the current coordinate system
print(data.crs)

# Outputs:


None

In [8]:
# Import specific function 'from_epsg' from fiona module
from fiona.crs import from_epsg

# Set the GeoDataFrame's coordinate system to WGS84
data.crs = from_epsg(4326)

# Let's see how the crs definition looks like
data.crs


Out[8]:
{'init': 'epsg:4326', 'no_defs': True}

Finally, we can export the data using GeoDataFrames .to_file function.


In [9]:
# Determine the output path for the Shapefile
out = r"C:\HY-Data\HENTENKA\Data\Senaatintori.shp"

# Write the data into that location
data.to_file(out)

Now we have successfully created a Shapefile from the scratch using only Python programming. Similar approach can be used to for example to read coordinates from a text file (e.g. points) and create Shapefiles from those automatically.

Extra - Create an interactive visualization using bokeh

Following lines of code demonstrate how we can create an interactive map of our Polygon geometry on top of Google Maps using bokeh module. See the result at the end of this page, you can zoom in/out and also move around the map.

In the GIS-lab computers bokeh module is not installed, so unfortunately at the moment it is not possible for you to test this yourself.


In [10]:
# Import bokeh stuff to visualize our map
from bokeh.models.glyphs import Patch
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import GMapPlot, Range1d, ColumnDataSource, PanTool, WheelZoomTool, BoxSelectTool, GMapOptions, ResizeTool

# Plot the data using bokeh - Initialize
output_notebook()

# Parse coordinates from the GeoDataFrames first row [i.e. index --> 0]
x_coords, y_coords = x, y = data['geometry'].values[0].exterior.coords.xy

# Set the ranges for the map ("how far the user can go")
x_range = Range1d(24.9, 25.0)
y_range = Range1d(60.15, 60.2)

# Set the Google Maps to Helsinki
map_options = GMapOptions(lat=60.17, lng=24.952, map_type="roadmap", zoom=15)

# Create an interactive Google map to Helsinki
plot = GMapPlot(
    x_range=x_range, y_range=y_range,
    map_options=map_options,
    title = "Helsinki - Senaatintori",
    plot_width=500,
    plot_height=500
)

# Create a Data source
source = ColumnDataSource(
    data=dict(
        lat=y_coords,
        lon=x_coords
    )
)

# Create the Patch i.e. the polygon
poly_patch = Patch(x="lon",y="lat", fill_color='#DE2D26', fill_alpha=0.8)

# Add Polygon Patch on top of the map
plot.add_glyph(source, poly_patch)

# Add tools
plot.add_tools(PanTool(), WheelZoomTool(), ResizeTool(), BoxSelectTool())

# Show the map
show(plot)


Loading BokehJS ...
C:\HYapp\Anaconda3\lib\site-packages\bokeh\core\properties.py:1941: UserWarning: Setting Plot property 'title' using a string was deprecated in 0.12.0,
            and will be removed. The title is now an object on Plot (which holds all of it's
            styling properties). Please use Plot.title.text instead.

            SERVER USERS: If you were using plot.title to have the server update the plot title
            in a callback, you MUST update to plot.title.text as the title object cannot currently
            be replaced after intialization.
            
  """)

In [11]:
x_coords


Out[11]:
array('d', [24.950899, 24.953492, 24.95351, 24.950958, 24.950899])