OpenNEX DCP30 Analysis Using Pandas

This notebook illustrates how to analyze a subset of OpenNEX DCP30 data using Python and pandas. Specifically, we will be analyzing temperature data in the Chicago area to understand how the CESM1-CAM5 climate model behaves under different RCP scenarios during the course of this century.

A dataset for this example is available at http://opennex.planetos.com/dcp30/k6Lef. On that page you will find a bash script that can be used to deploy a Docker container which will serve the selected data. Deployment of the container is beyond the scope of this example.


Import Required Modules

Let's begin by importing the required modules. We'll need pandas for analysis and urllib2 to request data from our access server. We'll use matplotlib to create a chart of our analysis.


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# set default figure size
plt.rcParams['figure.figsize'] = 16, 8

import pandas as pd
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3: same urlopen, kept under the old name

Loading Data into a Dataframe

The load_data function reads data directly from your access server's endpoint. It accepts the ip_addr parameter, which must correspond to the IP address of your data access server.

For local deployments, this may be localhost or a local IP address. If you've deployed to an EC2 instance, you'll need to ensure that port 7645 is accessible and replace localhost with your instance's public IP address.
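
Before requesting data from a remote host, it can help to confirm that the port is reachable. The following is a minimal sketch written for Python 3, not part of the original notebook: port_is_open is a hypothetical helper, and 7645 is the port that load_data requests.

```python
import socket

def port_is_open(host, port=7645, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # covers refused connections, unreachable hosts, and timeouts
        return False
    conn.close()
    return True
```

A False result for an EC2 host usually points to a security group or firewall rule rather than to the container itself.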

It's easier to work with the resulting data if we tell pandas about the date and categorical columns. The function declares these column types and also converts the values from kelvins to degrees Celsius, storing the result in a new Temperature column.


In [3]:
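def load_data(ip_addr):
    # request the CSV export from the access server on port 7645
    data = pd.read_csv(urllib2.urlopen("http://%s:7645/data.csv" % (ip_addr)))
    # store the repeated labels as categories
    for col in ['Model', 'Scenario', 'Variable']:
        data[col] = data[col].astype('category')
    # parse the date strings; a unit-less astype('datetime64') fails on recent pandas
    data['Date'] = pd.to_datetime(data['Date'])
    # convert kelvins to degrees Celsius
    data['Temperature'] = data['Value'] - 273.15
    return data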
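
To see what these conversions do without a running access server, the same steps can be applied to a small in-memory CSV. This is a sketch under assumptions, written for Python 3: the three rows are made-up sample values in the column layout shown later in this notebook, prepare is a hypothetical helper that mirrors the body of load_data, and pd.to_datetime is used for date parsing.

```python
import io
import pandas as pd

# Made-up sample rows in the same column layout as data.csv (illustrative values only)
SAMPLE_CSV = """Date,Longitude,Latitude,Model,Scenario,Variable,Value
2000-01-01,-87.6458,41.8708,CESM1-CAM5,historical,tasmax,273.15
2000-07-01,-87.6458,41.8708,CESM1-CAM5,historical,tasmax,300.15
2050-07-01,-87.6458,41.8708,CESM1-CAM5,rcp85,tasmax,303.15
"""

def prepare(data):
    """Apply the same column conversions as load_data to an existing dataframe."""
    for col in ['Model', 'Scenario', 'Variable']:
        data[col] = data[col].astype('category')
    data['Date'] = pd.to_datetime(data['Date'])
    data['Temperature'] = data['Value'] - 273.15  # kelvins to degrees Celsius
    return data

sample = prepare(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(sample.dtypes)
print(sample[['Date', 'Scenario', 'Temperature']])
```

Printing dtypes is a quick way to confirm that the label columns became categories and that Date is a datetime column before any grouping is attempted.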

Plotting the Scenarios

After loading the data, we can use matplotlib to visualize what the model projects over the course of this century. The function reduces the data to the highest value in each year, which corresponds to the warmest month, and plots those values for each RCP scenario.


In [4]:
def do_graph(df):
    model = df['Model'].iloc[0]
    # add the Year column to a copy so the caller's dataframe is not modified
    df = df.assign(Year=pd.to_datetime(df['Date'].dt.year.astype(str) + '-01-01'))
    # select Temperature before max(); unordered categorical columns cannot be reduced with max
    by_year = df.groupby(['Year', 'Scenario'])[['Temperature']].max()
    groups = by_year.reset_index().set_index('Year').groupby('Scenario')
    # draw one line per scenario
    for key, grp in groups:
        plt.plot(grp.index, grp['Temperature'], label=key)
    plt.legend(loc='best')
    plt.title("Maximum mean temperature for warmest month using model %s" % (model))
    plt.xlabel("Year")
    plt.ylabel("Temperature [Celsius]")
    plt.show()
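
The reduction inside do_graph can be checked on its own, without plotting. The sketch below is an illustration under assumptions: the temperatures are made up, and warmest_per_year is a hypothetical helper. It selects the Temperature column before taking the maximum, then pivots the result so that each scenario becomes a column.

```python
import pandas as pd

# Made-up monthly temperatures in degrees Celsius for two scenarios (illustrative only)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2050-01-01', '2050-07-01', '2051-01-01', '2051-07-01'] * 2),
    'Scenario': ['rcp26'] * 4 + ['rcp85'] * 4,
    'Temperature': [-1.0, 29.0, 0.5, 29.5, 0.0, 31.0, 1.0, 32.5],
})

def warmest_per_year(df):
    """Return a Year x Scenario table holding the highest temperature seen in each year."""
    year = df['Date'].dt.year.rename('Year')
    return df.groupby([year, 'Scenario'])['Temperature'].max().unstack('Scenario')

table = warmest_per_year(df)
print(table)
```

A table in this shape can also be drawn directly with table.plot(), which produces one line per scenario.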

Putting it all Together

Let's load the data, quickly inspect it using the head method, then use do_graph to visualize it.


In [5]:
# Note: make sure you pass load_data the correct IP address. This is only an example.
data = load_data("localhost")

In [6]:
data.head()


Out[6]:
         Date  Longitude  Latitude       Model    Scenario Variable       Value  Temperature
0  2000-01-01   -87.6458   41.8708  CESM1-CAM5  historical   tasmax  273.052246    -0.097754
1  2000-01-01   -87.6375   41.8708  CESM1-CAM5  historical   tasmax  273.061737    -0.088263
2  2000-01-01   -87.6292   41.8708  CESM1-CAM5  historical   tasmax  273.059784    -0.090216
3  2000-01-01   -87.6208   41.8708  CESM1-CAM5  historical   tasmax  273.168396     0.018396
4  2000-01-01   -87.6125   41.8708  CESM1-CAM5  historical   tasmax  273.170197     0.020197

In [7]:
do_graph(data)


Results

The plot above begins with a brief historical period at the start of the century, then presents data from the four RCP scenarios. We can see annual fluctuations as well as a clear divergence between the scenarios toward the end of the century. As expected, the most aggressive warming scenario, rcp85, produces the warmest temperatures.
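
One way to put a number on that divergence is to average each scenario's annual maxima over the final decade and compare the means. The sketch below is an illustration only: the values are made up and are not model output, and late_century_gap is a hypothetical helper.

```python
import pandas as pd

# Made-up annual maxima in degrees Celsius (illustrative only, not model output)
annual = pd.DataFrame({
    'Year': [2090, 2095, 2099] * 2,
    'Scenario': ['rcp26'] * 3 + ['rcp85'] * 3,
    'Temperature': [30.0, 30.5, 31.0, 35.0, 36.0, 37.0],
})

def late_century_gap(annual, start_year=2090):
    """Difference between the warmest and coolest scenario means from start_year onward."""
    late = annual[annual['Year'] >= start_year]
    means = late.groupby('Scenario')['Temperature'].mean()
    return means.max() - means.min()

gap = late_century_gap(annual)
print("Spread between scenarios: %.1f degrees Celsius" % gap)
```

Applied to the real annual maxima, the same calculation would summarize the end-of-century spread between scenarios in a single figure.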