Working with Surface Observations in Siphon and MetPy

Unidata Python Workshop


Overview:

  • Teaching: 20 minutes
  • Exercises: 20 minutes

Questions

  1. What's the best way to get surface station data from a THREDDS data server?
  2. What's the best way to make a station plot of data?
  3. How can I request a time series of data for a single station?

Objectives

  1. Use the netCDF Subset Service (NCSS) to request a portion of the data
  2. Download data for a single time across stations and create a station plot
  3. Request a time series of data and plot

1. Using NCSS to get point data


In [ ]:
from siphon.catalog import TDSCatalog

# copied from the browser url box
metar_cat_url = ('http://thredds.ucar.edu/thredds/catalog/'
                 'irma/metar/catalog.xml?dataset=irma/metar/Metar_Station_Data_-_Irma_fc.cdmr')

# Parse the xml
catalog = TDSCatalog(metar_cat_url)

# what datasets are here?
print(list(catalog.datasets))

In [ ]:
metar_dataset = catalog.datasets['Feature Collection']

Once we've grabbed the "Feature Collection" dataset, we can request a subset of the data:


In [ ]:
# Can safely ignore the warnings
ncss = metar_dataset.subset()

What variables do we have available?


In [ ]:
ncss.variables

Top


2. Making a station plot

  • Make new NCSS query
  • Request data closest to a time

In [ ]:
from datetime import datetime

query = ncss.query()
query.lonlat_box(north=34, south=24, east=-80, west=-90)
query.time(datetime(2017, 9, 10, 12))
query.variables('temperature', 'dewpoint', 'altimeter_setting',
                'wind_speed', 'wind_direction', 'sky_coverage')
query.accept('csv')

In [ ]:
# Get the data
data = ncss.get_data(query)
data

Now we need to pull apart the data and perform some modifications, like converting winds to components and convert sky coverage percent to codes (octets) suitable for plotting.


In [ ]:
import numpy as np

import metpy.calc as mpcalc
from metpy.units import units

# Since we used the CSV data, this is just a dictionary of arrays
lats = data['latitude']
lons = data['longitude']
tair = data['temperature']
dewp = data['dewpoint']
alt = data['altimeter_setting']

# Convert wind to components
u, v = mpcalc.wind_components(data['wind_speed'], data['wind_direction'] * units.degree)

# Need to handle missing (NaN) and convert to proper code
cloud_cover = 8 * data['sky_coverage'] / 100.
cloud_cover[np.isnan(cloud_cover)] = 10
cloud_cover = cloud_cover.astype(np.int)

# For some reason these come back as bytes instead of strings
stid = np.array([s.tostring().decode() for s in data['station']])

Create the map using cartopy and MetPy!

One way to create station plots with MetPy is to create an instance of StationPlot and call various plot methods, like plot_parameter, to plot arrays of data at locations relative to the center point.

In addition to plotting values, StationPlot has support for plotting text strings, symbols, and plotting values using custom formatting.

Plotting symbols involves mapping integer values to various custom font glyphs in our custom weather symbols font. MetPy provides mappings for converting WMO codes to their appropriate symbol. The sky_cover function below is one such mapping.


In [ ]:
%matplotlib inline

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

from metpy.plots import StationPlot, sky_cover

# Set up a plot with map features
fig = plt.figure(figsize=(12, 12))
proj = ccrs.Stereographic(central_longitude=-95, central_latitude=35)
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.add_feature(cfeature.STATES, edgecolor='black')
ax.coastlines(resolution='50m')
ax.gridlines()

# Create a station plot pointing to an Axes to draw on as well as the location of points
stationplot = StationPlot(ax, lons, lats, transform=ccrs.PlateCarree(),
                          fontsize=12)
stationplot.plot_parameter('NW', tair, color='red')

# Add wind barbs
stationplot.plot_barb(u, v)

# Plot the sky cover symbols in the center. We give it the integer code values that
# should be plotted, as well as a mapping class that can convert the integer values
# to the appropriate font glyph.
stationplot.plot_symbol('C', cloud_cover, sky_cover)

Notice how there are so many overlapping stations? There's a utility in MetPy to help with that: reduce_point_density. This returns a mask we can apply to data to filter the points.


In [ ]:
# Project points so that we're filtering based on the way the stations are laid out on the map
proj = ccrs.Stereographic(central_longitude=-95, central_latitude=35)
xy = proj.transform_points(ccrs.PlateCarree(), lons, lats)

# Reduce point density so that there's only one point within a 200km circle
mask = mpcalc.reduce_point_density(xy, 200000)

Now we just plot with arr[mask] for every arr of data we use in plotting.


In [ ]:
# Set up a plot with map features
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.add_feature(cfeature.STATES, edgecolor='black')
ax.coastlines(resolution='50m')
ax.gridlines()

# Create a station plot pointing to an Axes to draw on as well as the location of points
stationplot = StationPlot(ax, lons[mask], lats[mask], transform=ccrs.PlateCarree(),
                          fontsize=12)
stationplot.plot_parameter('NW', tair[mask], color='red')
stationplot.plot_barb(u[mask], v[mask])
stationplot.plot_symbol('C', cloud_cover[mask], sky_cover)

More examples for MetPy Station Plots:

EXERCISE:
  • Modify the station plot (reproduced below) to include dewpoint, altimeter setting, as well as the station id. The station id can be added using the `plot_text` method on `StationPlot`.
  • Re-mask the data to be a bit more finely spaced, say: 75km
  • Bonus Points: Use the `formatter` argument to `plot_parameter` to only plot the 3 significant digits of altimeter setting. (Tens, ones, tenths)

In [ ]:
# Use reduce_point_density

# Set up a plot with map features
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.add_feature(cfeature.STATES, edgecolor='black')
ax.coastlines(resolution='50m')
ax.gridlines()

# Create a station plot pointing to an Axes to draw on as well as the location of points

# Plot dewpoint

# Plot altimeter setting--formatter can take a function that formats values

# Plot station id

# Use reduce_point_density
mask = mpcalc.reduce_point_density(xy, 75000)

\# Set up a plot with map features
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.add_feature(cfeature.STATES, edgecolor='black')
ax.coastlines(resolution='50m')
ax.gridlines()

\# Create a station plot pointing to an Axes to draw on as well as the location of points
stationplot = StationPlot(ax, lons[mask], lats[mask], transform=ccrs.PlateCarree(),
                          fontsize=12)
stationplot.plot_parameter('NW', tair[mask], color='tab:red')
stationplot.plot_barb(u[mask], v[mask])
stationplot.plot_symbol('C', cloud_cover[mask], sky_cover)

\# Plot dewpoint
stationplot.plot_parameter('SW', dewp[mask], color='tab:green')

\# Plot altimeter setting
stationplot.plot_parameter('NE', alt[mask], color='tab:blue',
                           formatter=lambda v: str(int(v * 10))[-3:])

\# Plot station id
stationplot.plot_text((2, 0), stid[mask])

Top


3. Time Series request and plot

  • Let's say we want the past days worth of data...
  • ...for Boulder (i.e. the lat/lon)
  • ...for the variables mean sea level pressure, air temperature, wind direction, and wind_speed

In [ ]:
from datetime import timedelta

# define the time range we are interested in
end_time = datetime(2017, 9, 12, 0)
start_time = end_time - timedelta(days=2)

# build the query
query = ncss.query()
query.lonlat_point(-80.25, 25.8)
query.time_range(start_time, end_time)
query.variables('altimeter_setting', 'temperature', 'dewpoint',
                'wind_direction', 'wind_speed')
query.accept('csv')

Let's get the data!


In [ ]:
data = ncss.get_data(query)

In [ ]:
print(list(data.keys()))

What station did we get?


In [ ]:
station_id = data['station'][0].tostring()
print(station_id)

That indicates that we have a Python bytes object, containing the 0-255 values corresponding to 'K', 'M', 'I', 'A'. We can decode those bytes into a string:


In [ ]:
station_id = station_id.decode('ascii')
print(station_id)

Let's get the time into datetime objects. We see we have an array with byte strings in it, like station id above.


In [ ]:
data['time']

So we can use a list comprehension to turn this into a list of date time objects:


In [ ]:
time = [datetime.strptime(s.decode('ascii'), '%Y-%m-%dT%H:%M:%SZ') for s in data['time']]

Now for the obligatory time series plot...


In [ ]:
from matplotlib.dates import DateFormatter, AutoDateLocator

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(time, data['wind_speed'], color='tab:blue')

ax.set_title(f'Site: {station_id}      Date: {time[0]:%Y/%m/%d}')
ax.set_xlabel('Hour of day')
ax.set_ylabel('Wind Speed')
ax.grid(True)

# Improve on the default ticking
locator = AutoDateLocator()
hoursFmt = DateFormatter('%H')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(hoursFmt)
EXERCISE:
  • Pick a different location
  • Plot temperature and dewpoint together on the same plot

In [ ]:
# Your code goes here

# define the time range we are interested in
end_time = datetime(2017, 9, 12, 0)
start_time = end_time - timedelta(days=2)

\# build the query
query = ncss.query()
query.lonlat_point(-155.1, 19.7)
query.time_range(start_time, end_time)
query.variables('altimeter_setting', 'temperature', 'dewpoint',
                'wind_direction', 'wind_speed')
query.accept('csv')

data = ncss.get_data(query)

station_id = data['station'][0].tostring()
station_id = station_id.decode('ascii')

time = [datetime.strptime(s.decode('ascii'), '%Y-%m-%dT%H:%M:%SZ') for s in data['time']]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(time, data['temperature'], color='tab:red')
ax.plot(time, data['dewpoint'], color='tab:green')

ax.set_title(f'Site: {station_id}      Date: {time[0]:%Y/%m/%d}')
ax.set_xlabel('Hour of day')
ax.set_ylabel('Temperature/Dewpoint')
ax.grid(True)

\# Improve on the default ticking
locator = AutoDateLocator()
hoursFmt = DateFormatter('%H')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(hoursFmt)

Top