A notebook developed by Nick Swanson-Hysell (https://github.com/Swanson-Hysell) for a presentation/tutorial at the Berkeley The Hacker Within Group (Berkeley Institute for Data Science, March 16, 2016; http://www.thehackerwithin.org/berkeley/posts/matplotlib-spring-2016).
This notebook was developed using a Python 3 kernel. The first portion of this Jupyter notebook illustrates some functionality using the base matplotlib module within the Jupyter notebook environment. Subsequent portions of the notebook utilize Basemap, mpld3, folium and bokeh which can be pip installed (pip install Basemap
) or if you are using the Anaconda distribution of Python (conda install Basemap
).
In [1]:
import matplotlib.pyplot as plt
import numpy as np
In [2]:
%matplotlib notebook
In [3]:
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlabel('x values')
plt.ylabel('y values')
plt.show()
In [4]:
%matplotlib inline
In [5]:
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlabel('x values')
plt.ylabel('y values')
plt.show()
In [6]:
plt.figure()
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlim(x_values.min()*1.1, x_values.max()*1.1)
plt.ylim(y_values.min()*1.1, y_values.max()*1.1)
plt.xlabel('x values')
plt.ylabel('y values')
plt.show()
In [7]:
from matplotlib import style
style.available
Out[7]:
In [8]:
style.use('fivethirtyeight')
plt.figure()
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlim(x_values.min()*1.1, x_values.max()*1.1)
plt.ylim(y_values.min()*1.1, y_values.max()*1.1)
plt.xlabel('x values')
plt.ylabel('y values')
plt.savefig('example_plot.svg')
plt.show()
In [9]:
%config InlineBackend.figure_format='retina'
In [10]:
plt.figure()
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlim(x_values.min()*1.1, x_values.max()*1.1)
plt.ylim(y_values.min()*1.1, y_values.max()*1.1)
plt.xlabel('x values')
plt.ylabel('y values')
plt.show()
In [11]:
%config InlineBackend.figure_formats = {'svg',}
In [12]:
plt.figure()
x_values = np.linspace(-np.pi, np.pi, 256, endpoint=True)
y_values = np.sin(x_values)
plt.plot(x_values, y_values)
plt.xlim(x_values.min()*1.1, x_values.max()*1.1)
plt.ylim(y_values.min()*1.1, y_values.max()*1.1)
plt.xlabel('x values')
plt.ylabel('y values')
plt.savefig('plot.svg')
plt.show()
As of matplotlib 1.5, pandas dataframes can be plotted by column in plots such as plt.scatter(). To illustrate this capability, let's create a dataframe from a .csv of Pennsylvania fracking data. This dataset contains the production of unconventional gas wells for the month of December, 2015. It was downloaded from the Pennsylvania Department of Environmental Protection Oil and Gas Electronic Reporting website: https://www.paoilandgasreporting.state.pa.us/publicreports/
Importing the pandas module gives us a way to nicely import these data as a Dataframe.
In [15]:
import pandas as pd
In [16]:
PA_Frack_Wells_Dec = pd.read_csv('gas_data/Oil_Gas_Well_Historical_Production_Report.csv')
PA_Frack_Wells_Dec.head()
Out[16]:
In [17]:
PA_Frack_Wells_Dec.describe()
Out[17]:
In matplotlib, it is now possible to pass the name of Dataframe columns to be x and y within plt.scatter() as long as we set data equal to the name of the DataFrame which is PA_Frack_Wells_Dec.
In [18]:
style.use('seaborn-notebook')
plt.figure()
plt.scatter(x='Gas_Production_Days',y='Gas_Quantity',data=PA_Frack_Wells_Dec)
plt.xlabel('number of days of gas production (in Dec 2015)')
plt.ylabel('gas production (thousand cubic feet)')
plt.ylim(-20000,PA_Frack_Wells_Dec.Gas_Quantity.max()*1.1)
plt.xlim(0,32)
plt.show()
The plotting toolkit Basemap can be used to plot this data set of Pennsylvania unconventional gas wells (i.e. fracking) that were producing in December, 2015. To plot these data we need to import Basemap. Basemap is a matplotlib tool kit that you can conda install. The example Basemap gallery (http://matplotlib.org/basemap/users/examples.html) and the list of all availible projections (http://matplotlib.org/basemap/users/mapsetup.html) are quite helpful.
In [19]:
from mpl_toolkits.basemap import Basemap
style.use('seaborn-notebook')
In [20]:
m = Basemap(projection='mill',
llcrnrlat = 39,
llcrnrlon = -83,
urcrnrlat = 43,
urcrnrlon = -72,
resolution='l')
core_x,core_y = m(PA_Frack_Wells_Dec.Well_Longitude.tolist(),PA_Frack_Wells_Dec.Well_Latitude.tolist())
m.drawcoastlines()
m.drawcountries()
m.drawstates()
m.fillcontinents(color='tan',lake_color='lightblue',zorder=0)
m.scatter(x=core_x,y=core_y,c=PA_Frack_Wells_Dec.Gas_Quantity.tolist(),cmap='viridis',vmin=0)
m.colorbar(label='gas quantity (Mcf)')
plt.title('Unconventional gas wells in PA (as of Dec 2015)',)
plt.show()
From plotting these gas wells on a map, we can see that there is a geographical clustering of the wells. The majority of gas in PA is produced from the Marcellus Shale. Let's explore the correspondance between the location of the gas wells and the the extent of the Marecellus Shale by plotting a shapefile of the extent of the Marcellus Formation.
In [21]:
m = Basemap(projection='mill',
llcrnrlat = 39,
llcrnrlon = -83,
urcrnrlat = 43,
urcrnrlon = -72,
resolution='l')
core_x,core_y = m(PA_Frack_Wells_Dec.Well_Longitude.tolist(),PA_Frack_Wells_Dec.Well_Latitude.tolist())
m.drawcoastlines()
m.drawcountries()
m.drawstates()
m.fillcontinents(color='tan',lake_color='lightblue',zorder=0)
m.scatter(x=core_x,y=core_y,c=PA_Frack_Wells_Dec.Gas_Quantity.tolist(),cmap='viridis',vmin=0)
m.readshapefile('./gas_data/marcellus_extent', 'marcellus_extent',color='red',linewidth=1)
m.colorbar(label='gas quantity (Mcf)')
plt.title('Unconventional gas wells in PA (with Marcellus Shale extent)')
plt.savefig('Marcellus_Shale_Map.pdf')
plt.show()
In [22]:
high_producing_wells = PA_Frack_Wells_Dec[PA_Frack_Wells_Dec.Gas_Quantity>300000]
m = Basemap(projection='mill',
llcrnrlat = 39,
llcrnrlon = -83,
urcrnrlat = 43,
urcrnrlon = -72,
resolution='l')
core_x,core_y = m(high_producing_wells.Well_Longitude.tolist(),high_producing_wells.Well_Latitude.tolist())
m.drawcoastlines()
m.drawcountries()
m.drawstates()
m.fillcontinents(color='tan',lake_color='lightblue',zorder=0)
m.scatter(x=core_x,y=core_y,c=high_producing_wells.Gas_Quantity.tolist(),cmap='viridis',vmin=0)
m.readshapefile('./gas_data/marcellus_extent', 'marcellus_extent',color='red',linewidth=1)
m.colorbar(label='gas quantity (Mcf)')
plt.title('high producing unconventional gas wells in PA (> 300,000 Mcf)')
plt.show()
Let's import a dataset of all the earthquakes greater than magnitude 7 between 1900 and 2016 that are in NOAA's National Centers for Environmental Information (NCEI; https://www.ngdc.noaa.gov/) Significant Earthquake Database.
In [23]:
earthquakes_mag7 = pd.read_csv('./earthquake_data/results.tsv',sep='\t')
earthquakes_mag7.drop([358,391,411],inplace=True) #drop lines with empty latitude which are there as empty strings ' '
earthquakes_mag7.columns
Out[23]:
A nice global projection is the Mollweide elliptical equal-area projection. The earthquake locations can be plotted on this map and let's set up a color bar that corresponds to earthquake magnitude.
In [24]:
# lon_0 is central longitude of projection.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawcoastlines()
m.drawmapboundary(fill_color='white')
earthquake_lon = earthquakes_mag7.LONGITUDE.astype(float).tolist()
earthquake_lat = earthquakes_mag7.LATITUDE.astype(float).tolist()
earthquake_x,earthquake_y = m(earthquake_lon,earthquake_lat)
m.scatter(earthquake_x,earthquake_y,
c=earthquakes_mag7.EQ_MAG_MW.astype(float).tolist(),
cmap='seismic')
m.colorbar(label='Earthquake magnitude')
plt.title("Earthquake locations (M>7) since 1900")
plt.show()
Instead of having the coastlines on the plot for reference, let's plate the boundaries of tectonic plates using the boundaries as compiled in:
Bird, P. (2003), An updated digital model of plate boundaries, Geochem. Geophys. Geosyst., 4, 1027, doi:10.1029/2001GC000252, 3.
In [25]:
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawmapboundary(fill_color='white')
m.readshapefile('./earthquake_data/tectonicplates/PB2002_boundaries',
'PB2002_boundaries',color='black',linewidth=1)
earthquake_lon = earthquakes_mag7.LONGITUDE.astype(float).tolist()
earthquake_lat = earthquakes_mag7.LATITUDE.astype(float).tolist()
earthquake_x,earthquake_y = m(earthquake_lon,earthquake_lat)
m.scatter(earthquake_x,earthquake_y,
c=earthquakes_mag7.EQ_MAG_MW.astype(float).tolist(),
cmap='seismic')
m.colorbar(label='Earthquake magnitude')
plt.title("Earthquake locations (M>7) since 1900")
plt.show()
Most of the earthquakes are right at plate boundaries, but some are not. Let's plot polygons of intraplate deformation (mountain belts) to see if those regions might correspond with more of the earthquake locations.
In [26]:
# lon_0 is central longitude of projection.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawmapboundary(fill_color='white')
m.readshapefile('./earthquake_data/tectonicplates/PB2002_boundaries',
'PB2002_boundaries',color='black',linewidth=1)
m.readshapefile('./earthquake_data/tectonicplates/PB2002_orogens',
'PB2002_orogens',color='green',linewidth=1)
earthquake_lon = earthquakes_mag7.LONGITUDE.astype(float).tolist()
earthquake_lat = earthquakes_mag7.LATITUDE.astype(float).tolist()
earthquake_x,earthquake_y = m(earthquake_lon,earthquake_lat)
m.scatter(earthquake_x,earthquake_y,
c=earthquakes_mag7.EQ_MAG_MW.astype(float).tolist(),
cmap='seismic')
m.colorbar(label='Earthquake magnitude')
plt.title("Earthquake locations (M>7) since 1900")
plt.show()
Earthquakes that have magnitudes this large can be large disasters with loss of human life. This aspect of the dataset is explored in the figures below.
In [27]:
plt.figure()
plt.scatter(x='EQ_MAG_MW',y='TOTAL_DEATHS',data=earthquakes_mag7)
plt.ylabel('total deaths from Earthquake')
plt.xlabel('magnitude of Earthquake')
plt.show()
In [28]:
earthquakes_mag7_highd = earthquakes_mag7[earthquakes_mag7.TOTAL_DEATHS>500]
earthquakes_mag7_lowd = earthquakes_mag7[earthquakes_mag7.TOTAL_DEATHS<500]
In [29]:
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawmapboundary(fill_color='white',zorder=0)
m.drawcoastlines(zorder=1)
badquake_lon = earthquakes_mag7_highd.LONGITUDE.astype(float).tolist()
badquake_lat = earthquakes_mag7_highd.LATITUDE.astype(float).tolist()
badquake_year = earthquakes_mag7_highd.YEAR.tolist()
badquake_casualties = earthquakes_mag7_highd.DEATHS.tolist()
badquake_x,badquake_y = m(badquake_lon,badquake_lat)
m.scatter(badquake_x,badquake_y,c='red',s=30,label='deaths > 5000',zorder=10)
notasbadquake_lon = earthquakes_mag7_lowd.LONGITUDE.astype(float).tolist()
notasbadquake_lat = earthquakes_mag7_lowd.LATITUDE.astype(float).tolist()
notasbadquake_x,notasbadquake_y = m(notasbadquake_lon,notasbadquake_lat)
m.scatter(notasbadquake_x,notasbadquake_y,c='green',s=10,label='deaths < 5000')
plt.legend(loc=4)
plt.title("Earthquake locations (M>7) since 1900")
plt.show()
The mpld3 project brings together Matplotlib, the popular Python-based graphing library, and D3js, the popular Javascript library for creating interactive data visualizations for the web. The result is a simple API for exporting your matplotlib graphics to HTML code which can be used within the browser, within standard web pages, blogs, or tools such as the IPython notebook. http://mpld3.github.io/index.html
Using mpld3 the earthquake map can be made interactive with labels associated with point that are activated by a mouse click. A histogram example that plots earthquake depths is also shown.
In [33]:
import mpld3
fig, ax = plt.subplots()
m = Basemap(projection='moll',lon_0=0,resolution='l')
m.drawmapboundary(fill_color='white',zorder=0)
m.drawcoastlines(zorder=1)
scatter = m.scatter(badquake_x,badquake_y,s=30,zorder=2)
ax.set_title("Earthquake locations (M>7) since 1900 with >500 casualties")
labels = earthquakes_mag7_highd.YEAR.astype(float).tolist()
tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels=labels)
mpld3.plugins.connect(fig, tooltip)
mpld3.display()
Out[33]:
In [34]:
bad_quake_w_focal_depth = earthquakes_mag7_highd.dropna(subset = ['FOCAL_DEPTH', 'TOTAL_INJURIES', 'EQ_MAG_MW'])
notasbad_quake_w_focal_depth = earthquakes_mag7_lowd.dropna(subset = ['FOCAL_DEPTH', 'TOTAL_INJURIES', 'EQ_MAG_MW'])
bad_quake_w_focal_depth.head()
Out[34]:
In [35]:
fig = plt.figure()
ax = fig.add_subplot(111, axisbg='#EEEEEE')
ax.grid(color='white', linestyle='solid')
x = np.random.normal(size=1000)
ax.hist(notasbad_quake_w_focal_depth.FOCAL_DEPTH.tolist(),bins=30,
histtype='stepfilled', fc='lightblue', alpha=0.5, label='earthquakes with deaths < 500')
ax.hist(bad_quake_w_focal_depth.FOCAL_DEPTH.tolist(),bins=30,
histtype='stepfilled', fc='red', alpha=0.5, label='earthquakes with deaths > 500')
plt.legend()
plt.xlabel('depth of earthquake (km)')
plt.ylabel('number of earthquakes')
mpld3.display()
Out[35]:
Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via Folium. https://github.com/python-visualization/folium
OSGeo has put together a number of great example geospatial notebooks in this repository (https://github.com/OSGeo/OSGeoLive-Notebooks) including one with a bunch of Folium examples (https://github.com/OSGeo/OSGeoLive-Notebooks/tree/master/projects/FOLIUM).
One example is shown below where the earthquake data are plotted using folium with a custom popup that brings together year and casuality information.
In [38]:
import folium
map_1 = folium.Map(location=[20, 100], zoom_start=4,
tiles='Mapbox Bright')
for n in range(0,len(badquake_lon)):
map_1.simple_marker([badquake_lat[n],badquake_lon[n]],
popup='Year: '+ str(badquake_year[n]) + ' casualties: ' + str(badquake_casualties[n]))
map_1.save(outfile='code_output/earthquakes.html')
map_1
Out[38]:
In [37]:
from bokeh.io import output_file, show
from bokeh.models import (GMapPlot, GMapOptions, ColumnDataSource, Circle, DataRange1d, PanTool, WheelZoomTool, BoxSelectTool)
map_options = GMapOptions(lat=20, lng=100, map_type="roadmap", zoom=4)
plot = GMapPlot(x_range=DataRange1d(),
y_range=DataRange1d(),
map_options=map_options, title="Earthquakes")
source = ColumnDataSource(
data=dict(
lat=badquake_lat,
lon=badquake_lon,
)
)
circle = Circle(x="lon", y="lat", size=15, fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
output_file("code_output/gmap_plot.html")
show(plot)
In [ ]: