In [1]:

    
from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update('livereveal', {
              'theme': 'simple',
              'transition': 'convex',
              'start_slideshow_at': 'selected'
})









    Out[1]:





{'start_slideshow_at': 'selected', 'theme': 'simple', 'transition': 'convex'}

Python Analysis for GEOS-Chem

Author: Barron H. Henderson

Attendees will learn how to leverage Python to interact with air pollution-related model and observational data. Air research and application relies on big data. In addition to the challenge presented by data size, researchers must understand a multitude of formats and meta-data standards. For example, CMAQ, CAMx, and GEOS-Chem all use different formats and different meta-data conventions. This tutorial provides format-independent and convention-independent tools.



In [2]:

    
# Prepare my slides
%pylab inline
%cd working









    



Populating the interactive namespace from numpy and matplotlib
/Users/barronh/Downloads/GCandPython/working






    



WARNING: pylab import has clobbered these variables: ['cm']
`%matplotlib` prevents importing * from pylab and numpy

Objectives

There will be hands-on exercises for a range of GEOS-Chem analysis tools, including plotting maps with overlays, converting bpch to netcdf, editing bpch/netcdf files, and evaluating GEOS-Chem against AQSD or ICARTT campaign files.

Produce publication quality graphics
Perform standard model performance evaluations
Create emission perturbations
Add custom modifications to each exercise

Requirements

Some scripting experience (R, Python, Perl, bash, or csh). Attendees who do not have experience can follow my on-line Python-primer (http://www.barronh.com/software/tutorials/python-tutorial) to satisfy the requiremnt.
A computer with either
- Windows, Linux, or Mac; a text editor; and Anaconda 3.5 installed.
- or a computer and an account on wakari.io
See detailed install instructions at https://github.com/barronh/GCandPython/blob/master/PythonInstallation.ipynb

Download Workshop Materials

https://github.com/barronh/GCandPython
"Clone or Download"
"Download Zip"
Move it somewhere good
Unzip it

Fork it.

With Anaconda

Open Anaconda Navigator
Open ipython
Run the code below



In [3]:

    
# ipython
!curl -kLO http://github.com/barronh/GCandPython/archive/master.zip
import zipfile
zf = zipfile.ZipFile('master.zip')
zf.extractall()
%mv GCandPython-master GCandPython









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0   124    0     0    229      0 --:--:-- --:--:-- --:--:--   229
100  306k  100  306k    0     0   169k      0  0:00:01  0:00:01 --:--:--  395k
mv: cannot move ‘GCandPython-master’ to ‘GCandPython/GCandPython-master’: Directory not empty

Open Notebook

Intro to python/ipython/jupyter

Options
- python - least burden on system
- ipython - least burden on user
- jupyter - least burden on next user
Running interactively python/ipython
1. Terminal
2. Integrated Development Environments
3. Notebook
Running saved scripts
- from terminal $ python /path/to/file.py
- from python `>>> execfile('/path/to/file.py')
- from ipython In[1] %run /path/to/file.py
- from spyder open in editor

*Windows replace $ with C:\>

Basic Functions

def foo(bar1, bar2, foo = 1, a = 2):
    return 'foo' + bar1

created with def name and call signature followed by colon
defined by commands following signature until the first unindented command

Most Useful Function... ever.

help:

? in ipython or jupyter
it will return a docstring for that element or type

Really Useful Functions

type:

All things in python are objects (even classes).
type is a function object that returns the type of another object.

dir:

List of methods and properties

print:

display representation of object



In [4]:

    
def foo(bar):
    return 'foo.' + bar



In [5]:

    
print(foo('bar'))









    



foo.bar



In [6]:

    
try:
    print(foo(1))
except Exception as e:
    print(e)









    



Can't convert 'int' object to str implicitly



In [ ]:

Basic Types

int, float:

Literals 1 is int, 1. is float
64-bit precision values (i.e., float is real*8)

str:

literal '...' or "..."
multi-line uses triple quotes
indexed (aka subset) with square braces and "slices"
version 3 uses system dependent string types (use encode or decode methods0

*A method is a bound function...

Integers and Floats



In [7]:

    
1 + 1









    Out[7]:





2



In [8]:

    
1 + 1.









    Out[8]:





2.0



In [9]:

    
1 / 2.









    Out[9]:





0.5



In [10]:

    
1 // 2









    Out[10]:





0

strings



In [11]:

    
"""

""".encode()









    Out[11]:





b'\n\n'



In [12]:

    
"a b c".split()









    Out[12]:





['a', 'b', 'c']



In [13]:

    
''.join(['h', 'ow', ' ', 'c', 'an', ' ', 'I ' 'su', 'bse', 't?'])









    Out[13]:





'how can I subset?'

slices

slice([start, ]stop[, step]) or [start]:stop[:step]

slices define the subset in terms of 0-based indices
stop is the only required element.
Missing starts are implied (start: 0, end: len(utterable))

CHECK POINT

How can you rearange the string below to say "this is so fun"?

The final answer will fill in the ? with numbers (positive or negative).

print(teststr[:?]+teststr[?:?:?]+teststr[?:])



In [14]:

    
teststr = 'this is not fun.'
print(teststr[:8])









    



this is

ANSWERS Hidden

``` print(teststr[:7], end = ' ') print(teststr[3:-1:6], end = '') print(teststr[-5:]) ```

Iterable Types

lists : [item1, item2, item3, ...]

created with square braces
indexed with square braces (indexing is always square braces, not just lists) and "slices"
tuples are like lists, but defined with parentheses and the items are immutable.

dict : {key1: value1, key2: value2, ...}

indexed with keys
iterate keys

iterables

strings, lists, dict, and tuples are iterables.
For any iterable, you can loop over the elements with a for loop
- Each character in a string...
- Each item in a list or tuple
- Each key in a dictionary
enumerate is a special function that returns the element number (0-based) and the element as a tuple

CHECK POINT

The code below has three parts:

first cell define a function
subsequent cells call function
What are the types and values of a, b, and c?



In [15]:

    
def foo(pieces):
    out = type(pieces[0])()
    for i, piece in enumerate(pieces):
        out += piece
    return out



In [16]:

    
a = foo('12345')



In [17]:

    
b = foo([1, 2, 3, 4, 5])



In [18]:

    
c = foo({0: 1, 1: 2, 2: 3, 3: 4, 4: 5})



In [19]:

    
print(a, b, c)









    



12345 15 10



In [ ]:

Intro to numpy/scipy/matplotlib

numpy - numerical python (aka, do math fast)
scipy - a collection of scientific tools (aka, do stats or solve equations)
matplotlib - a library for plotting (aka, make figures for my article)

numpy

numpy - Numerical Python (numpy.org)

Fast library for numerical mathematics
Large array processing (element by element)
Large matrix processing (using standard matrix)
Support for masked arrays
LAPACK for linear algebra

Make a random "ozone" array



In [20]:

    
%pylab tk
np.random.seed(50)
ozone = (np.random.normal(size = 2*3*4*5) + 40).reshape(2,3,4,5)









    



Populating the interactive namespace from numpy and matplotlib



In [21]:

    
print(ozone.ndim)



In [22]:

    
print(ozone.shape, ozone[0, :, 2, 3].shape, ozone[0, :, 2, 3])









    



(2, 3, 4, 5) (3,) [ 39.66543482  40.48642763  38.50861149]



In [23]:

    
print(ozone.mean())









    



39.9511584716



In [24]:

    
print(np.percentile(ozone, [5, 95]))









    



[ 38.40819179  41.52552455]

Which dimension is 2 long? 3 long? 4 long? 5 long?

Make an "averaging" kernel matrix



In [25]:

    
averagekernel = np.array([[0.5, 0.35, 0.15],
                          [0.25, 0.5, 0.25],
                          [0.1, 0.4, 0.5]])
averagekernelm = np.matrix(averagekernel)



In [26]:

    
print(averagekernel.shape)
print(averagekernelm.shape)
print(averagekernelm.T.shape)









    



(3, 3)
(3, 3)
(3, 3)



In [27]:

    
averagekernel = np.array([0.25, 0.5, 0.25])
averagekernelm = np.matrix(averagekernel)

Which dimension is the input layer and which is the output layer?

Product sums and matrices?



In [28]:

    
(ozone[0, :, 2, 3] * averagekernel).sum()









    Out[28]:





39.786725390722133

ozone[0, :, 2, 3] * averagekernelm



In [29]:

    
ozone[0, :, 2, 3] * averagekernelm.T









    Out[29]:





matrix([[ 39.78672539]])

CHECK POINT

Change the code to operate on all times, latitude, and longitudes without a loop.



In [30]:

    
out = np.zeros_like(ozone[:, 0])
for t in range(ozone.shape[0]):
    for j in range(ozone.shape[2]):
        for i in range(ozone.shape[3]):
            out[t, j, i] = ozone[t, :, j, i] * averagekernelm.T
out









    Out[30]:





array([[[ 39.36282997,  40.25207302,  39.85299116,  39.17695003,
          40.00616423],
        [ 39.96880295,  39.57453709,  41.21765252,  39.77559737,
          39.72131962],
        [ 40.38088501,  40.87504393,  40.16600716,  39.78672539,
          40.15137381],
        [ 40.71263105,  40.87507997,  40.94645965,  39.67761914,
          40.4431232 ]],

       [[ 40.5726842 ,  38.83362944,  40.06432046,  37.82883717,
          39.79688735],
        [ 40.40327088,  39.9351408 ,  39.70370031,  39.40573444,
          38.03569959],
        [ 39.99168397,  39.91389171,  40.50004645,  40.61334477,
          41.17862441],
        [ 39.41308569,  39.64410936,  39.87867507,  40.35938726,
          39.04594661]]])

ANSWERS Hidden

newout = (ozone * averagekernel[None, :, None, None]) (out == out).all()

Hint: Add singleton dimensions to force broadcasting.

scipy

scipy - Scientific Python (scipy.org)

Good for physical constants
Large statistical library
Large interpolation library
Including masked array statistics
Ordinary Differential Equation Solving



In [31]:

    
from scipy import constants
?constants



In [32]:

    
from scipy import stats
?stats



In [33]:

    
from scipy.stats import mstats
?mstats

XKCD, Jelly beans, and Type I Error?



In [34]:

    
%%timeit -n 20

from scipy.stats import ttest_ind, mannwhitneyu
mdifferent = []
tdifferent = []
n = 100
for i in range(n):
    a = np.exp(np.random.normal(size = 20))
    b = np.exp(np.random.normal(size = 20))
    tresult = ttest_ind(a, b)
    mresult = mannwhitneyu(a, b)
    if tresult.pvalue < 0.05:
        tdifferent.append(i);
    if mresult.pvalue < 0.05:
        mdifferent.append(i);

        
print(len(mdifferent)/n, end = '/')
print(len(tdifferent)/n, end = ', ')









    



0.07/0.01, 0.1/0.04, 0.12/0.05, 0.11/0.02, 0.13/0.06, 0.15/0.11, 0.08/0.04, 0.05/0.02, 0.11/0.06, 0.08/0.04, 0.06/0.02, 0.1/0.03, 0.09/0.05, 0.09/0.07, 0.08/0.06, 0.13/0.04, 0.12/0.04, 0.06/0.01, 0.09/0.03, 0.14/0.05, 0.06/0.05, 0.09/0.04, 0.04/0.03, 0.1/0.05, 0.12/0.04, 0.14/0.04, 0.1/0.04, 0.13/0.06, 0.07/0.03, 0.07/0.01, 0.12/0.07, 0.09/0.02, 0.09/0.06, 0.14/0.05, 0.09/0.03, 0.08/0.02, 0.13/0.05, 0.06/0.04, 0.06/0.06, 0.11/0.02, 0.16/0.03, 0.14/0.02, 0.09/0.05, 0.06/0.06, 0.08/0.01, 0.1/0.02, 0.11/0.05, 0.06/0.04, 0.15/0.06, 0.07/0.04, 0.1/0.05, 0.12/0.06, 0.1/0.02, 0.1/0.07, 0.07/0.05, 0.15/0.05, 0.11/0.04, 0.07/0.01, 0.09/0.03, 0.09/0.07, 20 loops, best of 3: 42.7 ms per loop



In [35]:

    
np.array([0.06, 0.06, 0.01, 0.04, 0.05, 0.02, 0.06, 0.07, 0.02, 0.07, 0.04, 0.05, 0.03, 0.08, 0.04, 0.03, 0.03, 0.02, 0.04, 0.01, 0.06, 0.04, 0.05, 0.01, 0.03, 0.04, 0.02, 0.04, 0.03, 0.02, 0.04, 0.03, 0.06, 0.03, 0.02, 0.07, 0.05, 0.02, 0.01, 0.05, 0.03, 0.04, 0.04, 0.03, 0.03, 0.04, 0.04, 0.06, 0.03, 0.05, 0.05, 0.03, 0.02, 0.0,\
          0.04, 0.05, 0.02, 0.07, 0.05, 0.05]).mean()









    Out[35]:





0.038666666666666655

matplotlib

matplotlib (matplotlib.org; optionally with basemap)

Plotting library with matlab functionality and naming
2-D plotting
- line plots
- date plots
- tile plots
- contour plots
3-D plotting
- see functionality above.
- Sometimes harder to work with than 2-D.



In [ ]: