In [1]:
from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update('livereveal', {
              'theme': 'simple',
              'transition': 'convex',
              'start_slideshow_at': 'selected'
})


Out[1]:
{'start_slideshow_at': 'selected', 'theme': 'simple', 'transition': 'convex'}

Python Analysis for GEOS-Chem

Author: Barron H. Henderson

Attendees will learn how to leverage Python to interact with air pollution-related model and observational data. Air research and application relies on big data. In addition to the challenge presented by data size, researchers must understand a multitude of formats and meta-data standards. For example, CMAQ, CAMx, and GEOS-Chem all use different formats and different meta-data conventions. This tutorial provides format-independent and convention-independent tools.


In [2]:
# Prepare my slides
%pylab inline
%cd working


Populating the interactive namespace from numpy and matplotlib
/Users/barronh/Downloads/GCandPython/working
WARNING: pylab import has clobbered these variables: ['cm']
`%matplotlib` prevents importing * from pylab and numpy

Objectives

There will be hands-on exercises for a range of GEOS-Chem analysis tools, including plotting maps with overlays, converting bpch to netcdf, editing bpch/netcdf files, and evaluating GEOS-Chem against AQSD or ICARTT campaign files.

  1. Produce publication quality graphics
  2. Perform standard model performance evaluations
  3. Create emission perturbations
  4. Add custom modifications to each exercise

Requirements

Download Workshop Materials

  1. https://github.com/barronh/GCandPython
  2. "Clone or Download"
  3. "Download Zip"
  4. Move it somewhere good
  5. Unzip it
  • Fork it.

With Anaconda

  1. Open Anaconda Navigator
  2. Open ipython
  3. Run the code below

In [3]:
# ipython
!curl -kLO http://github.com/barronh/GCandPython/archive/master.zip
import zipfile
zf = zipfile.ZipFile('master.zip')
zf.extractall()
%mv GCandPython-master GCandPython


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0   124    0     0    229      0 --:--:-- --:--:-- --:--:--   229
100  306k  100  306k    0     0   169k      0  0:00:01  0:00:01 --:--:--  395k
mv: cannot move ‘GCandPython-master’ to ‘GCandPython/GCandPython-master’: Directory not empty

Open Notebook

Intro to python/ipython/jupyter

  1. Options
    • python - least burden on system
    • ipython - least burden on user
    • jupyter - least burden on next user
  2. Running interactively python/ipython
    1. Terminal
    2. Integrated Development Environments
    3. Notebook
  3. Running saved scripts
    • from terminal $ python /path/to/file.py
    • from python `>>> execfile('/path/to/file.py')
    • from ipython In[1] %run /path/to/file.py
    • from spyder open in editor

*Windows replace $ with C:\>

Basic Functions

def foo(bar1, bar2, foo = 1, a = 2):
    return 'foo' + bar1
  • created with def name and call signature followed by colon
  • defined by commands following signature until the first unindented command

Most Useful Function... ever.

help:

  • ? in ipython or jupyter
  • it will return a docstring for that element or type

Really Useful Functions

type:

  • All things in python are objects (even classes).
  • type is a function object that returns the type of another object.

dir:

  • List of methods and properties

print:

  • display representation of object

In [4]:
def foo(bar):
    return 'foo.' + bar

In [5]:
print(foo('bar'))


foo.bar

In [6]:
try:
    print(foo(1))
except Exception as e:
    print(e)


Can't convert 'int' object to str implicitly

In [ ]:

Basic Types

int, float:

  • Literals 1 is int, 1. is float
  • 64-bit precision values (i.e., float is real*8)

str:

  • literal '...' or "..."
  • multi-line uses triple quotes
  • indexed (aka subset) with square braces and "slices"
  • version 3 uses system dependent string types (use encode or decode methods0

*A method is a bound function...

Integers and Floats


In [7]:
1 + 1


Out[7]:
2

In [8]:
1 + 1.


Out[8]:
2.0

In [9]:
1 / 2.


Out[9]:
0.5

In [10]:
1 // 2


Out[10]:
0

strings


In [11]:
"""

""".encode()


Out[11]:
b'\n\n'

In [12]:
"a b c".split()


Out[12]:
['a', 'b', 'c']

In [13]:
''.join(['h', 'ow', ' ', 'c', 'an', ' ', 'I ' 'su', 'bse', 't?'])


Out[13]:
'how can I subset?'

slices

slice([start, ]stop[, step]) or [start]:stop[:step]

  • slices define the subset in terms of 0-based indices
  • stop is the only required element.
  • Missing starts are implied (start: 0, end: len(utterable))

CHECK POINT

How can you rearange the string below to say "this is so fun"?

The final answer will fill in the ? with numbers (positive or negative).

print(teststr[:?]+teststr[?:?:?]+teststr[?:])

In [14]:
teststr = 'this is not fun.'
print(teststr[:8])


this is 

ANSWERS Hidden

``` print(teststr[:7], end = ' ') print(teststr[3:-1:6], end = '') print(teststr[-5:]) ```

Iterable Types

lists : [item1, item2, item3, ...]

  • created with square braces
  • indexed with square braces (indexing is always square braces, not just lists) and "slices"
  • tuples are like lists, but defined with parentheses and the items are immutable.

dict : {key1: value1, key2: value2, ...}

  • indexed with keys
  • iterate keys

iterables

  • strings, lists, dict, and tuples are iterables.
  • For any iterable, you can loop over the elements with a for loop
    • Each character in a string...
    • Each item in a list or tuple
    • Each key in a dictionary
  • enumerate is a special function that returns the element number (0-based) and the element as a tuple

CHECK POINT

The code below has three parts:

  • first cell define a function
  • subsequent cells call function
  • What are the types and values of a, b, and c?

In [15]:
def foo(pieces):
    out = type(pieces[0])()
    for i, piece in enumerate(pieces):
        out += piece
    return out

In [16]:
a = foo('12345')

In [17]:
b = foo([1, 2, 3, 4, 5])

In [18]:
c = foo({0: 1, 1: 2, 2: 3, 3: 4, 4: 5})

In [19]:
print(a, b, c)


12345 15 10

In [ ]:

Intro to numpy/scipy/matplotlib

  1. numpy - numerical python (aka, do math fast)
  2. scipy - a collection of scientific tools (aka, do stats or solve equations)
  3. matplotlib - a library for plotting (aka, make figures for my article)

numpy

numpy - Numerical Python (numpy.org)

  • Fast library for numerical mathematics
  • Large array processing (element by element)
  • Large matrix processing (using standard matrix)
  • Support for masked arrays
  • LAPACK for linear algebra

Make a random "ozone" array


In [20]:
%pylab tk
np.random.seed(50)
ozone = (np.random.normal(size = 2*3*4*5) + 40).reshape(2,3,4,5)


Populating the interactive namespace from numpy and matplotlib

In [21]:
print(ozone.ndim)


4

In [22]:
print(ozone.shape, ozone[0, :, 2, 3].shape, ozone[0, :, 2, 3])


(2, 3, 4, 5) (3,) [ 39.66543482  40.48642763  38.50861149]

In [23]:
print(ozone.mean())


39.9511584716

In [24]:
print(np.percentile(ozone, [5, 95]))


[ 38.40819179  41.52552455]
  • Which dimension is 2 long? 3 long? 4 long? 5 long?

Make an "averaging" kernel matrix


In [25]:
averagekernel = np.array([[0.5, 0.35, 0.15],
                          [0.25, 0.5, 0.25],
                          [0.1, 0.4, 0.5]])
averagekernelm = np.matrix(averagekernel)

In [26]:
print(averagekernel.shape)
print(averagekernelm.shape)
print(averagekernelm.T.shape)


(3, 3)
(3, 3)
(3, 3)

In [27]:
averagekernel = np.array([0.25, 0.5, 0.25])
averagekernelm = np.matrix(averagekernel)
  • Which dimension is the input layer and which is the output layer?

Product sums and matrices?


In [28]:
(ozone[0, :, 2, 3] * averagekernel).sum()


Out[28]:
39.786725390722133
ozone[0, :, 2, 3] * averagekernelm

In [29]:
ozone[0, :, 2, 3] * averagekernelm.T


Out[29]:
matrix([[ 39.78672539]])

CHECK POINT

Change the code to operate on all times, latitude, and longitudes without a loop.


In [30]:
out = np.zeros_like(ozone[:, 0])
for t in range(ozone.shape[0]):
    for j in range(ozone.shape[2]):
        for i in range(ozone.shape[3]):
            out[t, j, i] = ozone[t, :, j, i] * averagekernelm.T
out


Out[30]:
array([[[ 39.36282997,  40.25207302,  39.85299116,  39.17695003,
          40.00616423],
        [ 39.96880295,  39.57453709,  41.21765252,  39.77559737,
          39.72131962],
        [ 40.38088501,  40.87504393,  40.16600716,  39.78672539,
          40.15137381],
        [ 40.71263105,  40.87507997,  40.94645965,  39.67761914,
          40.4431232 ]],

       [[ 40.5726842 ,  38.83362944,  40.06432046,  37.82883717,
          39.79688735],
        [ 40.40327088,  39.9351408 ,  39.70370031,  39.40573444,
          38.03569959],
        [ 39.99168397,  39.91389171,  40.50004645,  40.61334477,
          41.17862441],
        [ 39.41308569,  39.64410936,  39.87867507,  40.35938726,
          39.04594661]]])

ANSWERS Hidden

newout = (ozone * averagekernel[None, :, None, None]) (out == out).all()
  • Hint: Add singleton dimensions to force broadcasting.

scipy

scipy - Scientific Python (scipy.org)

  • Good for physical constants
  • Large statistical library
  • Large interpolation library
  • Including masked array statistics
  • Ordinary Differential Equation Solving

In [31]:
from scipy import constants
?constants

In [32]:
from scipy import stats
?stats

In [33]:
from scipy.stats import mstats
?mstats

XKCD, Jelly beans, and Type I Error?


In [34]:
%%timeit -n 20

from scipy.stats import ttest_ind, mannwhitneyu
mdifferent = []
tdifferent = []
n = 100
for i in range(n):
    a = np.exp(np.random.normal(size = 20))
    b = np.exp(np.random.normal(size = 20))
    tresult = ttest_ind(a, b)
    mresult = mannwhitneyu(a, b)
    if tresult.pvalue < 0.05:
        tdifferent.append(i);
    if mresult.pvalue < 0.05:
        mdifferent.append(i);

        
print(len(mdifferent)/n, end = '/')
print(len(tdifferent)/n, end = ', ')


0.07/0.01, 0.1/0.04, 0.12/0.05, 0.11/0.02, 0.13/0.06, 0.15/0.11, 0.08/0.04, 0.05/0.02, 0.11/0.06, 0.08/0.04, 0.06/0.02, 0.1/0.03, 0.09/0.05, 0.09/0.07, 0.08/0.06, 0.13/0.04, 0.12/0.04, 0.06/0.01, 0.09/0.03, 0.14/0.05, 0.06/0.05, 0.09/0.04, 0.04/0.03, 0.1/0.05, 0.12/0.04, 0.14/0.04, 0.1/0.04, 0.13/0.06, 0.07/0.03, 0.07/0.01, 0.12/0.07, 0.09/0.02, 0.09/0.06, 0.14/0.05, 0.09/0.03, 0.08/0.02, 0.13/0.05, 0.06/0.04, 0.06/0.06, 0.11/0.02, 0.16/0.03, 0.14/0.02, 0.09/0.05, 0.06/0.06, 0.08/0.01, 0.1/0.02, 0.11/0.05, 0.06/0.04, 0.15/0.06, 0.07/0.04, 0.1/0.05, 0.12/0.06, 0.1/0.02, 0.1/0.07, 0.07/0.05, 0.15/0.05, 0.11/0.04, 0.07/0.01, 0.09/0.03, 0.09/0.07, 20 loops, best of 3: 42.7 ms per loop

In [35]:
np.array([0.06, 0.06, 0.01, 0.04, 0.05, 0.02, 0.06, 0.07, 0.02, 0.07, 0.04, 0.05, 0.03, 0.08, 0.04, 0.03, 0.03, 0.02, 0.04, 0.01, 0.06, 0.04, 0.05, 0.01, 0.03, 0.04, 0.02, 0.04, 0.03, 0.02, 0.04, 0.03, 0.06, 0.03, 0.02, 0.07, 0.05, 0.02, 0.01, 0.05, 0.03, 0.04, 0.04, 0.03, 0.03, 0.04, 0.04, 0.06, 0.03, 0.05, 0.05, 0.03, 0.02, 0.0,\
          0.04, 0.05, 0.02, 0.07, 0.05, 0.05]).mean()


Out[35]:
0.038666666666666655

matplotlib

matplotlib (matplotlib.org; optionally with basemap)

  • Plotting library with matlab functionality and naming
  • 2-D plotting
    • line plots
    • date plots
    • tile plots
    • contour plots
  • 3-D plotting
    • see functionality above.
    • Sometimes harder to work with than 2-D.

In [ ]: