Planet OS API demo for GEFS

Note: this notebook requires Python 3.

This notebook is an introduction to the Planet OS API data format, using the NOAA GEFS Global Ensemble Forecast System dataset.

API documentation is available at http://docs.planetos.com.

If you have questions or comments, join the Planet OS Slack community to chat with our development team.

For general information on the usage of IPython/Jupyter and Matplotlib, please refer to their documentation: https://ipython.org/ and http://matplotlib.org/

GEFS global probabilistic weather forecast

GEFS is a probabilistic weather forecast system composed of 20 ensemble members, which differ by small perturbations in the model's initial conditions. Probabilistic forecasts try to mimic the chaotic nature of the atmosphere, and beyond the third day or so they usually have higher forecast skill than a deterministic forecast. However, their interpretation is not trivial, and with this demo we encourage users to take a deeper look into this kind of data.

In this tutorial we analyse precipitation and surface pressure data.


In [15]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import dateutil.parser
import datetime
from urllib.request import urlopen, Request
import simplejson as json
import pandas as pd

In [16]:
def extract_reference_time(API_data_loc):
    """Find the reference time that corresponds to the most complete forecast.
    This should be the earliest value among those present in the response."""
    reftimes = {i['axes']['reftime'] for i in API_data_loc['entries']}
    return min(reftimes, key=dateutil.parser.parse)
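A quick sanity check of the helper on a hand-built mock of the API response (the structure of `entries`/`axes`/`reftime` matches the real responses used below; the timestamps are made up):

```python
import dateutil.parser

def extract_reference_time(API_data_loc):
    """Find the reference time that corresponds to the most complete forecast.
    This should be the earliest value among those present in the response."""
    reftimes = {i['axes']['reftime'] for i in API_data_loc['entries']}
    return min(reftimes, key=dateutil.parser.parse)

# Mock response with two reference times; the earlier one should win
mock = {'entries': [
    {'axes': {'reftime': '2018-03-01T06:00:00'}},
    {'axes': {'reftime': '2018-03-01T00:00:00'}},
    {'axes': {'reftime': '2018-03-01T06:00:00'}},
]}
print(extract_reference_time(mock))  # 2018-03-01T00:00:00
```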

In [17]:
#latitude = 21.205
#longitude = -158.35
latitude = 58
longitude = 26
apikey = open('APIKEY').read().strip()
num_ens = 10
prec_var = "Total_precipitation_surface_6_Hour_Accumulation_ens"
pres_var = "Pressure_surface_ens"

GEFS is a model with many output variables, which may also change depending on which particular output file you are checking. Analyse the metadata first, filter for the variables of interest, and limit the API request accordingly.

Warning: if you request too many variables, you may get a gateway timeout error. If this happens, try to specify only one context or variable.
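One way to keep requests small is to build the point-query URL from its parts and request one variable at a time. A minimal sketch (the helper name `build_point_url` is ours, not part of any Planet OS client library; the query parameters match those used in the cells below):

```python
from urllib.parse import urlencode

def build_point_url(dataset, lon, lat, apikey, var, count=2000):
    """Build a Planet OS point-query URL for a single variable."""
    base = "http://api.planetos.com/v1/datasets/{}/point".format(dataset)
    query = urlencode({"lon": lon, "lat": lat, "count": count,
                       "verbose": "false", "apikey": apikey, "var": var})
    return "{}?{}".format(base, query)

url = build_point_url("noaa-ncep_gefs", 26, 58, "YOUR_APIKEY",
                      "Pressure_surface_ens")
print(url)
```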


In [18]:
API_meta_url = "http://api.planetos.com/v1/datasets/noaa-ncep_gefs?apikey={}".format(apikey)
request = Request(API_meta_url)
response = urlopen(request)
API_meta = json.loads(response.read())
print(API_meta_url)


http://api.planetos.com/v1/datasets/noaa-ncep_gefs?apikey=8428878e4b944abeb84790e832c633fc

Filter by variable name; in this example we want to find the pressure at the surface.


In [19]:
[i['name'] for i in API_meta['Variables'] if 'pressure' in i['name'].lower() and 'surface' in i['name'].lower()]


Out[19]:
['Pressure_surface_ens']
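The same list comprehension works for any keyword. For example, to find precipitation variables instead (shown here against a small stand-in for the `API_meta` dictionary, since the full metadata response is much larger):

```python
# Stand-in for the API_meta dictionary returned by the metadata endpoint
API_meta = {"Variables": [
    {"name": "Pressure_surface_ens"},
    {"name": "Total_precipitation_surface_6_Hour_Accumulation_ens"},
    {"name": "Temperature_height_above_ground_ens"},
]}

matches = [i['name'] for i in API_meta['Variables']
           if 'precipitation' in i['name'].lower()]
print(matches)  # ['Total_precipitation_surface_6_Hour_Accumulation_ens']
```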

API request for precipitation


In [20]:
API_url = "http://api.planetos.com/v1/datasets/noaa-ncep_gefs/point?lon={0}&lat={1}&count=2000&verbose=false&apikey={2}&var={3}".format(longitude,latitude,apikey,prec_var)
request = Request(API_url)
response = urlopen(request)
API_data_prec = json.loads(response.read())
print(API_url)


http://api.planetos.com/v1/datasets/noaa-ncep_gefs/point?lon=26&lat=58&count=2000&verbose=false&apikey=8428878e4b944abeb84790e832c633fc&var=Total_precipitation_surface_6_Hour_Accumulation_ens

API request for surface pressure


In [21]:
API_url = "http://api.planetos.com/v1/datasets/noaa-ncep_gefs/point?lon={0}&lat={1}&count=2000&verbose=false&apikey={2}&var={3}".format(longitude,latitude,apikey,pres_var)
request = Request(API_url)
response = urlopen(request)
API_data_pres = json.loads(response.read())
print(API_url)


http://api.planetos.com/v1/datasets/noaa-ncep_gefs/point?lon=26&lat=58&count=2000&verbose=false&apikey=8428878e4b944abeb84790e832c633fc&var=Pressure_surface_ens

Read data from the JSON response, collect it into dictionaries, and convert to pandas DataFrames for easier plotting.


In [22]:
## first collect data to dictionaries, then convert to Pandas DataFrame
pres_data_dict = {}
pres_time_dict = {}
prec_data_dict = {}
prec_time_dict = {}

for i in range(0, num_ens):
    pres_data_dict[i] = []
    pres_time_dict[i] = []
    prec_data_dict[i] = []
    prec_time_dict[i] = []
    
reftime_pres = extract_reference_time(API_data_pres)
for i in API_data_pres['entries']:
    if reftime_pres == i['axes']['reftime']:
        ens = int(i['axes']['ens'])
        pres_data_dict[ens].append(i['data'][pres_var])
        pres_time_dict[ens].append(dateutil.parser.parse(i['axes']['time']))

reftime_prec = extract_reference_time(API_data_prec)
for i in API_data_prec['entries']:
    if reftime_prec == i['axes']['reftime']:
        ens = int(i['axes']['ens'])
        prec_data_dict[ens].append(i['data'][prec_var])
        prec_time_dict[ens].append(dateutil.parser.parse(i['axes']['time']))

## check that all ensemble members share the same time axis
for i in range(1, num_ens):
    if np.amax(np.array(pres_time_dict[0]) - np.array(pres_time_dict[i])) != datetime.timedelta(0):
        print('time axis mismatch for member', i)

In [23]:
pres_pd = pd.DataFrame(pres_data_dict)
prec_pd = pd.DataFrame(prec_data_dict)

In [24]:
prec_pd


Out[24]:
0 1 2 3 4 5 6 7 8 9
0 0.00 0.00 0.00 0.04 0.01 0.00 0.00 0.00 0.00 0.00
1 0.13 0.43 0.16 0.10 0.16 0.28 0.03 0.06 0.00 0.20
2 0.16 0.23 0.05 0.29 0.00 0.03 0.60 0.00 0.11 0.27
3 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00 0.04
4 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
6 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10 0.00 0.13 0.00 0.01 0.00 0.45 0.00 0.00 0.03 0.00
11 0.00 0.00 0.00 0.00 0.00 0.05 0.03 0.00 0.00 0.00
12 0.00 0.16 0.00 0.00 0.00 0.00 0.30 0.00 0.00 0.10
13 0.74 0.53 0.20 2.80 0.00 0.50 1.40 0.70 0.10 0.30
14 1.60 0.20 1.70 2.70 0.00 0.30 1.20 1.18 1.30 0.70
15 0.40 0.00 0.30 0.30 0.00 0.10 0.00 0.00 0.10 0.40

Precipitation plots

Let's first plot boxplots of the ensemble members, showing 6h precipitation and accumulated precipitation.


In [25]:
fig, (ax0, ax2) = plt.subplots(nrows=2,figsize=(20,12))
ax0.boxplot(prec_pd)
ax0.grid()
ax0.set_title("Simple ensemble distribution")
ax0.set_ylabel('Precipitation mm/6h')

ax2.boxplot(np.cumsum(prec_pd,axis=0))
ax2.grid()
ax2.set_title("Cumulative precipitation distribution")
ax2.set_ylabel('Precipitation mm/6h')
ax2.set_xlabel('Forecast steps (each is 6h)')


Out[25]:
Text(0.5,0,'Forecast steps (each is 6h)')

From the simple distribution it is immediately visible that ensemble members may have very different values at a particular time. The interpretation depends strongly on the physical quantity: for precipitation, the spread may reflect a change in the actual weather pattern or just a small shift in the timing of a precipitation event. To remove the latter effect, we use accumulated precipitation, and that plot makes it more evident (depending on the particular forecast, of course) that the variability is smaller. For longer forecasts it may be more reasonable to check only 24h accumulated precipitation.
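The 24h aggregation can be sketched by summing each block of four consecutive 6h steps; here a small synthetic DataFrame stands in for `prec_pd` (rows are forecast steps, columns are ensemble members):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for prec_pd: 8 six-hour steps, 2 ensemble members
prec_pd = pd.DataFrame({0: [0.0, 0.1, 0.2, 0.0, 0.5, 0.0, 0.3, 0.2],
                        1: [0.1, 0.0, 0.4, 0.1, 0.0, 0.2, 0.0, 0.1]})

# Sum each block of four consecutive 6h steps into one 24h accumulation
prec_24h = prec_pd.groupby(np.arange(len(prec_pd)) // 4).sum()
print(prec_24h)
```

Each row of `prec_24h` is one forecast day, which suppresses the within-day timing differences between members.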

Surface pressure plots

Surface pressure variation is a better descriptor of the actual forecast uncertainty than precipitation.


In [26]:
fig=plt.figure(figsize=(20,10))
plt.boxplot(pres_pd)
plt.grid()
plt.title('Ensemble distribution')
plt.ylabel('Pressure (Pa)')
plt.xlabel('Forecast steps (each is 6h)')


Out[26]:
Text(0.5,0,'Forecast steps (each is 6h)')

In [27]:
fig=plt.figure(figsize=(20,10))
plt.plot(pres_pd)
plt.grid()
plt.ylabel('Pressure (Pa)')
plt.xlabel('Forecast steps (each is 6h)')


Out[27]:
Text(0.5,0,'Forecast steps (each is 6h)')
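The spread across members can also be quantified directly rather than read off the boxplots. A minimal sketch computing the ensemble mean and standard deviation per forecast step (with a small synthetic DataFrame standing in for `pres_pd`):

```python
import pandas as pd

# Synthetic stand-in for pres_pd: rows are forecast steps, columns are members
pres_pd = pd.DataFrame({0: [100000.0, 100100.0, 100300.0],
                        1: [100020.0, 100150.0, 100250.0],
                        2: [99980.0, 100050.0, 100350.0]})

ens_mean = pres_pd.mean(axis=1)   # ensemble mean per forecast step
ens_std = pres_pd.std(axis=1)     # ensemble spread (sample std) per step
print(ens_mean.round(1))
print(ens_std.round(1))
```

A growing `ens_std` over the forecast steps is the usual signature of increasing forecast uncertainty.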
