Coding outside of Jupyter notebooks

To run Python on your own computer, I recommend installing Anaconda, which contains the basic packages for you to be up and running.

While you are downloading things, also try the text editor Atom.

We have used Jupyter notebooks in this class as a useful tool for integrating text with interactive, usable code. However, in real-life coding many people would not use Jupyter notebooks, but would instead type code into text files and run them via a terminal window or in IPython. Many of you have done something analogous before, perhaps in Matlab, by writing your code in a .m file and then running it in the Matlab GUI. Writing code in a separate file accommodates heavier computations and allows that code to be reused more easily than when it is written in a Jupyter notebook.

Later, we will demonstrate typing Python code into a .py file and then running it in an IPython window. First, a few of the other options...

Google's Colaboratory

Google recently announced a partnership with Project Jupyter and has made Jupyter notebooks available in its Colaboratory. You can use it just like our in-class notebooks, share notebooks through Google Drive, and even install packages. This may be the way of the future of teaching Python.

Jupyter itself

The Jupyter project is becoming more and more sophisticated. The next project coming out, JupyterLab, aims to more cleanly integrate pieces that are already available in the server setup we have been using: notebooks, terminals, a text editor, the file hierarchy, etc. You can see a pre-alpha release image of it below:

MATLAB-like GUIs

Two options for using Python in a clickable, graphical user interface are Spyder and Canopy. Spyder is open source and Canopy is not, though the company that puts Canopy together (Enthought) does make a version available for free.

Both are shown below. From the perspective of what we've been using so far, they are generally similar: a console that shows output when you run your code (equivalent to running a cell in your notebook), an editor pane for typing in your file, ways to get more information about code, nice code syntax coloring, and sometimes the ability to examine Python objects. Note that many of these features are also available outside formal GUI tools like these; you'll see that even in a terminal window running IPython you have access to many nice features.

Using IPython in a terminal window

Here we have code in our notebook:


In [5]:
import numpy as np
import matplotlib.pyplot as plt
# just for jupyter notebooks
%matplotlib inline

In [6]:
x = np.linspace(0, 10)
y = x**2

In [7]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'k', lw=3)


Out[7]:
[<matplotlib.lines.Line2D at 0x7fed1c38f240>]

Now let's switch to a terminal window and a text file...

Open IPython

If you want to work locally, download Anaconda and get it set up on your machine. Open a terminal window, or use the one that Anaconda opens, and type:

ipython

Or, you can use redfish. On redfish:

Go to the home menu and, on the right-hand side under "New", choose "Terminal" to open a terminal window that is running on redfish. To run Python 3 in this terminal window, you'll need to use the command ipython3 instead of ipython, due to the way the alias to the program is set up:

ipython3

Note that we will use this formatting to indicate something to be typed in your terminal window/command prompt (or that it is part of an exercise). To open IPython with some niceties preloaded, so that you don't have to import them by hand each time you open the program (numpy and matplotlib in particular), open it with

ipython --pylab

Once you have done this, you'll see a series of text lines indicating that you are now in IPython. You can now type code as if you were in a code cell in a Jupyter notebook (but without the ability to integrate text easily).


Exercise

Copy in the code defining x and y from above, then make the figure. If you haven't opened IPython with the option flag --pylab, you will still need to do the import statements, but not %matplotlib inline, since that is only for notebooks.

Notice how the figure appears as a separate window. Play with the figure window — you can change the size and properties of the plot using the GUI buttons, and you can zoom.


Text editor

A typical coding setup is a terminal window running IPython alongside a text window where you type your code. You can then go back and forth, trying things out in IPython and keeping what works in the text window, so that you finish with a working script that can be run independently in the future. This is, of course, what you've been doing when you use Jupyter notebooks, except there everything is combined into one place. If you are familiar with Matlab, it is what you are used to when you have your *.m text window alongside a "terminal window" where you can type things. (There is also a variable viewer included in Matlab.) This is also what you can do in a single program with JupyterLab.

A good text editor should be able to highlight syntax – that is, use color and font style to differentiate between special keywords and types for a given programming language. This is what has been happening in our Jupyter notebooks when strings are colored red, for example. Editors also key off typical behaviors in the language to try to be helpful: in a Jupyter notebook, for example, if you write an if statement ending with a colon and push enter, the next line is automatically indented so that you can just start typing. These behaviors can be adjusted by changing user settings.

Some options are TextMate for Mac, which costs money, and Sublime Text, which works on all operating systems and also costs money (after a free trial). For this class, we recommend using Atom, which is free, works across operating systems, and integrates with GitHub, since GitHub wrote it.

So, go download Atom and start using it, unless you have a preferred alternative you want to use.


Exercise: run a script

If you are running python locally on your machine with Anaconda, copy and paste the code from above into a new text file in your text editor. Save it, then run the file in ipython with

run [filename]

If you are sticking with redfish, you can type out text in a text file from the home window (under New), and you can get most but not all functionality this way. Or you can try one of the GUIs.


Package managing

The advantage of Anaconda is being able to add packages really easily, with

conda install [packagename]

This will look for the Python package you want in the places known to conda. You may also tell it to look in another channel, which other people and groups can maintain. For example, cartopy is available through the scitools channel:

conda install -c scitools cartopy

Sometimes, it is better or necessary to use pip to install packages, which links to the PyPI collections of packages that anyone can place there for other people to use. For example, you can get the cmocean colormaps package from PyPI with

pip install cmocean

Running Jupyter notebooks on your own server

We've been running our notebooks on a TAMU server all semester. You can do this on your own machine pretty easily once you have Anaconda. There should be a place for you to double-click to open it, or you can open a terminal window and type:

jupyter notebook

This opens a window that should look familiar in your browser window. The difference is that instead of connecting to a remote server (on redfish), you are connecting to a local server on your own machine that you started running with the jupyter notebook command.

Run a script

When you use the command run in IPython, parts of the code in that file are executed as if they were typed in the IPython window itself. Code at the 0 indentation level that is outside any function will run (import statements and perhaps some variable definitions are common); function bodies will not run, though the function definitions are read into your local variables so that they can be used. Code inside an if __name__ == '__main__': block will also run. This syntax exists so that default run commands can be built into your script; it is often used to provide example or test code that can be run easily.
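As a concrete sketch of this behavior, consider a small script (the filename and names here are made up for illustration) and what each part does under run myscript:

```python
# myscript.py -- a hypothetical example file

import math       # 0 indentation: runs when you `run` the file

nrepeats = 3      # 0 indentation: a variable definition also runs


def triple(x):
    """Return 3 times x."""
    # the function body does not run until triple() is called,
    # but after `run myscript` the name `triple` is available to you
    return nrepeats * x


if __name__ == '__main__':
    # this block runs with `run myscript` (or `python myscript.py`),
    # but not when the file is imported
    print('sqrt of 2 is', math.sqrt(2))
    print(triple(2))
```

After run myscript, you can call triple(4) directly at the IPython prompt, since the definition was read in.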

Note that anytime you are accessing a saved file from IPython, you need at least one of the following to be true:

  • you are in the same directory in your terminal window as the file;
  • you are referencing the file with either its full path or a relative path;
  • the path to your file has been appended to the environment variable PYTHONPATH.
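For example (the paths and directory names below are hypothetical), you can check or adjust these conditions from within Python itself:

```python
import os
import sys

# option 1: reference the file by a relative (or full) path
fname = os.path.join('data', 'optimal_interpolation.py')  # hypothetical location

# option 2: add a directory to Python's search path at runtime;
# appending it to the PYTHONPATH environment variable before
# starting IPython has the same effect
sys.path.append(os.path.expanduser('~/python-scripts'))   # hypothetical directory

# inspect the current state: what does PYTHONPATH contain, and where are we?
print(os.environ.get('PYTHONPATH', '(not set)'))
print(os.getcwd())
```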

Exercise

There is example code available to try at https://github.com/kthyng/python4geosciences/blob/master/data/optimal_interpolation.py. Copy this code into your text editor and save it in the directory where your IPython session is running.

Within your IPython window, run optimal_interpolation.py with run optimal_interpolation (if you saved it in the same directory as your IPython session). Which part of the code actually runs? Why?

Add some print statements to the script at the 0 indentation level as well as below the line if __name__ == '__main__': and see what comes through when you run the code. Can you access the class oi_1d?


Importing your own code

The point of importing code is to be able to then use the functions and classes that you've written in other files. Importing your own code is just like importing numpy or any other package. You use the same syntax and you have the same ability to query the built-in methods.

import numpy

or:

import [your_code]

When you import a package, any code at the 0 indentation level will run; however, the code within if __name__ == '__main__': will not run.

When you are using a single script for analysis, you may just use run to use your code. However, as you build up complexity in your work, you'll probably want to make separate, independent code bases that can be called from subsequent code by importing them (again, just like us using the capabilities in numpy).
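A minimal sketch of this workflow (the module name mytools.py is made up for illustration; here the file is written from Python just to keep the example self-contained, but ordinarily you would write it in your text editor): put a function in its own file, then import and use it just like numpy.

```python
import os
import sys

sys.path.insert(0, os.getcwd())  # make sure the current directory is searched

# write a tiny module file, as you would in your text editor
with open('mytools.py', 'w') as f:
    f.write('''"""A tiny example module."""


def double(x):
    """Return 2 times x."""
    return 2 * x


if __name__ == '__main__':
    print(double(5))  # runs with `run mytools`, not on import
''')

import mytools             # the 0-indent code runs; the __main__ block does not

print(mytools.double(4))   # use the imported function
print(mytools.__doc__)     # query the module docstring, just as with numpy
```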

When you import a package, a *.pyc file is created which holds compiled bytecode that is read the next time the package is imported, in order to save time. In addition, within a single IPython session, a module that has already been imported is cached, so importing it again will not pick up your changes. If you have changed the code and want it to be updated, you either need to exit IPython and reopen it, or you need to reload the package. There is different syntax for this depending on the version of Python you are using, but we are using Python 3 (>3.4), so we will do the following:

For Python >= 3.4:

import importlib
importlib.reload([code to reload])
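Here is that edit-reload cycle end to end, with a throwaway module (the name mycode.py is hypothetical) written from Python for the sake of a self-contained example; normally you would edit the file in your text editor:

```python
import importlib
import os
import sys

sys.path.insert(0, os.getcwd())  # make sure the current directory is searched

# first version of a hypothetical module
with open('mycode.py', 'w') as f:
    f.write('version = 1\n')

import mycode
print(mycode.version)               # 1

# "edit" the file, as you would in your editor
with open('mycode.py', 'w') as f:
    f.write('version = 2  # updated\n')

print(mycode.version)               # still 1: the cached module is used

mycode = importlib.reload(mycode)   # re-executes the file
print(mycode.version)               # now 2
```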

Exercise

Import optimal_interpolation.py. Add a print statement with 0 indentation level in the code. Import the package again. Does the print statement run? Reload the package. How about now?

What about if you run it instead of importing it?

Exercise

Write your own simple script with a function in it — your function should take at least one input (maybe several numbers) and return something (maybe a number that is the result of some calculation).

Now, use your code in several ways. Run the code in ipython with

run [filename]

Make sure you have an if __name__ == '__main__': block in the file for this. Now import the code and use it:

import [filename]

Add a docstring to the top of the file and reload your package, then query the code. Do you see the docstring? Add a docstring to the function in the file, reload, and query. Do you see the docstring?

You should have been able to run your code both ways: running it directly, and importing it, then using a function that is within the code.


Unit testing

The idea of unit testing is to develop tests for your code as you develop it, so that you automatically know whether it is working properly as you make changes. In fact, some coders prefer to write the unit tests first, to drive the proper development of their code and to know when it is working. Of course, the quality of the testing depends on which tests you include and which aspects of your code they cover.

Here are some unit test guidelines:

  • Generally, you want to write unit tests that test one small aspect of your code functionality, as separately as possible from other parts of it. Then write many of these tests to cover all aspects of functionality.
  • Make sure your unit tests run very quickly, since you may end up with many of them.
  • Always run the full test suite before a coding session, and run it again after. This will give you more confidence that you did not break anything in the rest of the code.
  • You can now run a service like Travis CI through GitHub, which runs your test suite when you push your code to your repository or merge a pull request.
  • Use long and descriptive names for testing functions. The style guide here is slightly different than that of running code, where short names are often preferred. The reason is testing functions are never called explicitly. square() or even sqr() is ok in running code, but in testing code you would have names such as test_square_of_number_2(), test_square_negative_number(). These function names are displayed when a test fails, and should be as descriptive as possible.
  • Include detailed docstrings and comments throughout your testing files since these may be read more than the original code.

How to set up a suite of unit tests:

  1. make a tests directory in your code (or for simple code, just have your test file in the same directory);
  2. make a new file to hold your tests, named test_*.py (the name must start with "test" for it to be noticed by testing programs);
  3. inside tests*.py, write a test function called test_*() — the testing programs look for functions with these names in particular and ignore other functions;
  4. use the assert statement and functions like np.allclose for numeric comparisons of function outputs and for checking output types;
  5. run testing programs on your code. I recommend nosetests or pytest. You use these by running nosetests or py.test from the terminal window in the directory with your test code in it (or pointing to the directory). Next version will be nose2.

You can load files into Jupyter notebooks using the magic command %load. You can then run the code inside the notebook if you want (though the import statement will be an issue here), or just look at it.


In [ ]:
# %load ../data/package.py
def add(x, y):
    """doc """

    return x+y

print(add(1, 2))

if __name__ == '__main__':
    print(add(1, 1))

In [ ]:
# %load ../data/test.py
"""Test package.py"""

import package
import numpy as np


def test_add_12():
    """Test package with inputs 1, 2"""

    assert package.add(1, 2) == np.sum([1, 2])

Now, run the test. We can do this by escaping to the terminal, or we can go to our terminal window and run it there.

Note: starting a line of code with "!" runs it as if from the terminal window. Some commands are so common that you don't need to use the "!" (like ls), but in general you need it.


In [ ]:
!nosetests ../data/

Exercise

Start another file called test_*.py. In it, write a test function, test_*(), that checks the accuracy of your original function in some way. Then run it with nosetests.


PEP 0008

A PEP is a Python Enhancement Proposal, a design document describing a proposed feature or process for the Python community. The list of PEPs is available online.

PEP 0008 is a set of style guidelines for writing good, clear, readable code, written with the assumption that code is read more often than it is written. It addresses questions such as: when a line of code is longer than one line, how should it be indented? And speaking of one line of code, how long should it be? Note that even in this document, they emphasize that these are guidelines and sometimes should be trumped by what has already been happening in a project or by other considerations. But, generally, follow what they say here.

Here is a list of some style guidelines to follow, but check out the full guide for a wealth of good ideas:

  • indent with spaces, not tabs
  • indent with 4 spaces
  • limit all lines to a maximum of 79 characters
  • put a space after a comma
  • avoid trailing whitespace anywhere
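For instance, here is the same small, made-up function written first against and then with these guidelines:

```python
# violates the guidelines: no spaces after commas or around operators,
# and everything crammed onto one line
# def dist(x1,y1,x2,y2):return ((x2-x1)**2+(y2-y1)**2)**0.5

# PEP 8 style: 4-space indents, spaces after commas, line under 79 characters
def distance(x1, y1, x2, y2):
    """Return the Euclidean distance between (x1, y1) and (x2, y2)."""
    return ((x2 - x1)**2 + (y2 - y1)**2)**0.5

print(distance(0, 0, 3, 4))  # 5.0
```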

Note that you can tell your text editor to enforce PEP 8 style guidelines to help you learn; a tool that does this is called a linter. I do this with a plug-in in Sublime Text, and you can get one for Atom.


Exercise

Go back and clean up the code you've been writing so that it follows PEP 8 standards.


Docstrings

Docstrings should be provided at the top of a code file for the whole package, and then for each function/class within the package.

Overall style

Overall style for docstrings is given in PEP 0257, and includes the following guidelines:

  • one liners: for really obvious functions. Keep the docstring completely on one line:

    def simple_function():
        """Simple functionality."""

  • multiliners: Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line.

    def complex_function():
        """One liner describing overall.

        Now more involved description of inputs and outputs.
        Possibly usage example(s) too.
        """

Styles for inputs/outputs

For the more involved description in the multi line docstring, there are several standards used. (These are summarized nicely in a post on Stack Overflow; this list is copied from there.)

  1. reST

    def complex_function(param1, param2):
        """This is a reST style.

        :param param1: this is a first param
        :param param2: this is a second param
        :returns: this is a description of what is returned
        :raises keyError: raises an exception
        """

  2. Google

    def complex_function(param1, param2):
        """This is an example of Google style.

        Args:
            param1: This is the first param.
            param2: This is a second param.

        Returns:
            This is a description of what is returned.

        Raises:
            KeyError: Raises an exception.
        """

  3. Numpydoc

    def complex_function(first, second, third='value'):
        """Numpydoc format docstring.

        Parameters
        ----------
        first : array_like
            the 1st param name `first`
        second :
            the 2nd param
        third : {'value', 'other'}, optional
            the 3rd param, by default 'value'

        Returns
        -------
        string
            a value in a string

        Raises
        ------
        KeyError
            when a key error
        OtherError
            when an other error
        """

Documentation generation

Sphinx is a program that can be run to generate documentation for your project from your docstrings. You basically run the program and if you use the proper formatting in your docstrings, they will all be properly pulled out and presented nicely in a coherent way. There are various additions you can use with Sphinx in order to be able to write your docstrings in different formats (as shown above) and still have Sphinx be able to interpret them. For example, you can use Napoleon with Sphinx to be able to write using the Google style instead of reST, meaning that you can have much more readable docstrings and still get nicely-generated documentation out. Once you have generated this documentation, you can publish it using Read the docs. Here is documentation on readthedocs for a package that converts between colorspaces, Colorspacious.

Another approach is to use Sphinx but link it with GitHub Pages, which is hosted directly from your GitHub repo page. Separately from documentation, I use GitHub Pages for my own website. I also use one for documentation for a package of mine, cmocean, that provides colormaps for oceanography. To get this running, I followed instructions online. Note that GitHub Pages is built using Jekyll, but in this case we tell it not to use Jekyll and instead use Sphinx.

We can see that the docstrings in the code are nicely interpreted into documentation for the functions by comparing the module docs with the code below.


In [ ]:
# %load https://raw.githubusercontent.com/matplotlib/cmocean/master/cmocean/tools.py
'''
Plot up stuff with colormaps.
'''

import numpy as np
import matplotlib as mpl


def print_colormaps(cmaps, N=256, returnrgb=True, savefiles=False):
    '''Print colormaps in 256 RGB colors to text files.

    :param returnrgb=False: Whether or not to return the rgb array. Only makes sense to do if print one colormaps' rgb.

    '''

    rgb = []

    for cmap in cmaps:

        rgbtemp = cmap(np.linspace(0, 1, N))[np.newaxis, :, :3][0]
        if savefiles:
            np.savetxt(cmap.name + '-rgb.txt', rgbtemp)
        rgb.append(rgbtemp)

    if returnrgb:
        return rgb


def get_dict(cmap, N=256):
    '''Change from rgb to dictionary that LinearSegmentedColormap expects.
    Code from https://mycarta.wordpress.com/2014/04/25/convert-color-palettes-to-python-matplotlib-colormaps/
    and http://nbviewer.ipython.org/github/kwinkunks/notebooks/blob/master/Matteo_colourmaps.ipynb
    '''

    x = np.linspace(0, 1, N)  # position of sample n - ranges from 0 to 1

    rgb = cmap(x)

    # flip colormap to follow matplotlib standard
    if rgb[0, :].sum() < rgb[-1, :].sum():
        rgb = np.flipud(rgb)

    b3 = rgb[:, 2]  # value of blue at sample n
    b2 = rgb[:, 2]  # value of blue at sample n

    # Setting up columns for tuples
    g3 = rgb[:, 1]
    g2 = rgb[:, 1]

    r3 = rgb[:, 0]
    r2 = rgb[:, 0]

    # Creating tuples
    R = list(zip(x, r2, r3))
    G = list(zip(x, g2, g3))
    B = list(zip(x, b2, b3))

    # Creating dictionary
    k = ['red', 'green', 'blue']
    LinearL = dict(zip(k, [R, G, B]))

    return LinearL


def cmap(rgbin, N=256):
    '''Input an array of rgb values to generate a colormap.

    :param rgbin: An [mx3] array, where m is the number of input color triplets which
         are interpolated between to make the colormap that is returned. hex values
         can be input instead, as [mx1] in single quotes with a #.
    :param N=10: The number of levels to be interpolated to.

    '''

    # rgb inputs here
    if not mpl.cbook.is_string_like(rgbin[0]):
        # normalize to be out of 1 if out of 256 instead
        if rgbin.max() > 1:
            rgbin = rgbin/256.

    cmap = mpl.colors.LinearSegmentedColormap.from_list('mycmap', rgbin, N=N)

    return cmap

Debugging

You can use the package pdb while running your code to pause it intermittently and poke around to check variable values and understand what is going on.

A few key commands to get you started are:

  • pdb.set_trace() pauses the code run at this location, then allows you to type in the IPython window. You can print statements or check variable shapes, etc. This is how you can dig into code. (You'll need import pdb in your script first.)
  • Once you have stopped your code at a trace, use:
    • n to move to the next line;
    • s to step into a function if that is the next line and you want to move into that function as opposed to just running the function;
    • c to continue until there is another trace, the code ends, or an error occurs;
    • q to quit out of the debugger, which will also quit out of the code being run.
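A minimal sketch of where a trace goes (the function here is made up; flip DEBUG to True to actually drop into the debugger):

```python
import pdb

DEBUG = False  # set to True to pause at the trace below


def compute(x):
    """Square x and add one."""
    y = x**2
    if DEBUG:
        pdb.set_trace()  # pauses here; then use n, s, c, q at the (Pdb) prompt
    return y + 1


print(compute(3))  # 10
```

With DEBUG set to True, the run stops just after y is assigned, so you can print y, step with n, or continue with c.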

Exercise

Use pdb to investigate variables after starting your code running.


Make a package

To make a Python package a bit more official, because you plan to use it long-term and/or you want to share it with other people and make it easy for them to use, you will want to get it on GitHub, provide documentation, and get it on PyPI (this is how you are then able to easily install it with pip install [package_name]). There are also a number of technical steps you'll need to take. More information about this sort of process is available online.
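As a rough sketch of one of those technical steps: a package typically has a setup.py at its top level describing it to the packaging tools. All the names below (mypackage, the version, the URL, the dependency list) are hypothetical placeholders, and this is not a complete packaging recipe:

```python
# setup.py -- a minimal sketch, not a full packaging recipe
from setuptools import setup

setup(
    name='mypackage',              # hypothetical package name
    version='0.1.0',
    description='Short description of what the package does',
    url='https://github.com/yourname/mypackage',  # hypothetical repo
    packages=['mypackage'],        # the directory containing your modules
    install_requires=['numpy'],    # packages yours depends on
)
```

With a file like this in place, pip install . installs the package locally, and uploading the package to PyPI is what makes pip install mypackage work for everyone else.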