We have used Jupyter notebooks in this class as a useful tool for integrating text with interactive, runnable code. However, in real-life coding many people would not use Jupyter notebooks, but would instead type code into text files and run them via a terminal window or in IPython. Many of you have done something analogous before, perhaps in Matlab, by writing your code in a .m file and then running it in the Matlab GUI. Writing code in a separate file accommodates heavier computations and allows that code to be reused more easily than when it is written in a Jupyter notebook.
Later, we will demonstrate typing Python code in a .py file and then running it in an IPython window. First, a few of the other options...
The Jupyter project is becoming more and more sophisticated. The next project coming out, called JupyterLab, aims to more cleanly integrate modules that are already available in the server setup we have been using: notebooks, terminals, a text editor, the file hierarchy, etc. You can see a pre-alpha release image of it below:
Two options for using Python in a clickable, graphical user interface are Spyder and Canopy. Spyder is open source; Canopy is not, though the company that makes it (Enthought) does offer a free version.
Both are shown below. From the perspective of what we've been using so far, they are generally similar: a console that shows output when you run your code (equivalent to running a cell in your notebook), an editor pane for typing in your file, ways to get more information about code, nice syntax coloring, and sometimes the ability to examine Python objects. Note that many of these features also appear in less formal tools; you'll see that even in IPython in the terminal window you have access to many nice features.
In [5]:
import numpy as np
import matplotlib.pyplot as plt
# just for jupyter notebooks
%matplotlib inline
In [6]:
x = np.linspace(0, 10)
y = x**2
In [7]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'k', lw=3)
Out[7]:
Now let's switch to a terminal window and a text file...
Get Anaconda downloaded and installed on your machine if you want. Open a terminal window, or use the one that Anaconda opens, and type:
ipython
Or, you can use redfish. Go to the home menu on redfish and, on the right-hand side under "New", choose "Terminal" to open a terminal window that is running on redfish. To run Python 3 in this terminal window, you'll need to use the command `ipython3` instead of `ipython`, due to the way the alias to the program is set up:
ipython3
Note that we will use this syntax to mean something to be typed in your terminal window/command prompt (or that it is part of an exercise). To open IPython with some niceties added, so that you don't have to import them by hand each time you open the program (numpy and matplotlib in particular), open it with
ipython --pylab
Once you have done this, you'll see a series of text lines indicating that you are now in IPython. You can now type code as if you were in a code cell in a Jupyter notebook (but without the ability to integrate text easily).
Copy in the code to define `x` and `y` from above, then make the figure. If you haven't opened `ipython` with the option flag `--pylab`, you will still need to do the import statements, but not `%matplotlib inline`, since that is only for notebooks. Notice how the figure appears as a separate window. Play with the figure window — you can change the size and properties of the plot using the GUI buttons, and you can zoom.
A typical coding setup is a terminal window with `ipython` running alongside a text editor window where you type your code. You can then go back and forth, trying things out in IPython and keeping what works in the text window, so that you finish with a working script that can be run independently in the future. This is, of course, what you've been doing when you use Jupyter notebooks, except there everything is combined in one place. If you are familiar with Matlab, this is what you are used to when you have your `*.m` file window alongside a "terminal window" where you can type things. (Matlab also includes a variable viewer.) This is also what you can do in a single program with JupyterLab.
A good text editor should be able to highlight syntax – that is, use color and font style to differentiate between special keywords and types for a given programming language. This is what has been happening in our Jupyter notebooks when, for example, strings are colored red. Editors will also key off typical behaviors in the language to try to be helpful: in a Jupyter notebook, if you write an `if` statement with a colon at the end and push enter, the next line is automatically indented so that you can just start typing. These behaviors can be adjusted by changing user settings.
Some options are TextMate for Mac, which costs money, and Sublime Text, which works on all operating systems and also costs money (after a free trial). For this class, we recommend Atom, which is free, works across operating systems, and integrates well with GitHub, since GitHub wrote it.
So, go download Atom and start using it, unless you have a preferred alternative you want to use.
If you are running Python locally on your machine with Anaconda, copy and paste the code from above into a new text file in your text editor. Save it, then run the file in IPython with
run [filename]
If you are sticking with redfish, you can type out text in a text file from the home window (under "New"); you can get most but not all functionality this way. Or you can try one of the GUIs.
The advantage of Anaconda is being able to add packages really easily, with
conda install [packagename]
This will look for the Python package you want in the places known to conda. You may also tell it to look in another channel, which other people and groups can maintain. For example, `cartopy` is available through the `scitools` channel:
conda install -c scitools cartopy
Sometimes it is better or necessary to use `pip` to install packages; pip pulls from PyPI, a collection of packages that anyone can upload to for other people to use. For example, you can get the `cmocean` colormaps package from PyPI with
pip install cmocean
We've been running our notebooks on a TAMU server all semester. You can do this on your own machine pretty easily once you have Anaconda: there should be an application you can double-click to open, or you can open a terminal window and type:
jupyter notebook
This opens a window in your browser that should look familiar. The difference is that instead of connecting to a remote server (redfish), you are connecting to a local server on your own machine, which you started with the `jupyter notebook` command.
When you use the command `run` in IPython, parts of the code in that file are executed as if they were written in the IPython window itself. Code at the 0 indentation level, outside of any function definition, will run (import statements and some variable definitions are common examples). Functions themselves will not run, though their definitions are read into your local variables so that they can be used afterward. Code inside the block `if __name__ == '__main__':` will also be run. This syntax is available so that default run behavior can be built into your script; it is often used to provide example or test code that can be run easily.
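As a sketch, a hypothetical script (the file and function names here are made up) illustrates which parts `run` executes:

```python
# analysis.py -- a hypothetical script to illustrate what `run` executes
import numpy as np  # top level (0 indentation): runs under `run analysis`

GRAVITY = 9.81  # top-level variable definition: also runs


def speed_after_fall(t):
    """Return speed (m/s) after falling for t seconds, ignoring drag."""
    # a function body does not run until the function is called,
    # but the definition is read in so you can call it afterward
    return GRAVITY * t


if __name__ == '__main__':
    # this block runs under `run analysis` (and `python analysis.py`),
    # but not when the file is imported
    print(speed_after_fall(np.arange(1, 4)))
```

After `run analysis`, both `GRAVITY` and `speed_after_fall` are available at the IPython prompt.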
Note that anytime you are accessing a saved file from IPython, the file needs to be findable from your session: for example, saved in the same directory in which your IPython window is open, or referred to by its full path.
There is example code available to try at https://github.com/kthyng/python4geosciences/blob/master/data/optimal_interpolation.py. Copy this code into your text editor and save it in the same location on your computer as where your IPython window is open.
Within your iPython window, run optimal_interpolation.py with
run optimal_interpolation
(if you saved it in the same directory as your IPython window). Which part of the code actually runs? Why? Add some code under `if __name__ == '__main__':` and see what comes through when you run the code. Can you access the class `oi_1d`?
The point of importing code is to be able to use the functions and classes that you've written in other files. Importing your own code is just like importing `numpy` or any other package: you use the same syntax, and you have the same ability to query the built-in methods.
import numpy
or:
import [your_code]
When you import a package, any code at the 0 indentation level will run; however, the code within `if __name__ == '__main__':` will not run.
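A self-contained demonstration of this (the module name `mini` is made up for the demo; here we write the module to disk first so the example runs on its own):

```python
import pathlib
import sys

# write a tiny module to disk (normally this file already exists)
pathlib.Path('mini.py').write_text(
    'print("top-level code runs on import")\n'
    '\n'
    'def double(x):\n'
    '    return 2 * x\n'
    '\n'
    'if __name__ == "__main__":\n'
    '    print("this line only appears when run as a script")\n'
)

sys.path.insert(0, '')  # make sure the current directory is importable

import mini  # prints the top-level line, but NOT the __main__ block

print(mini.double(21))  # prints 42: the imported function is available
```

Running `python mini.py` directly would instead trigger the `__main__` block as well.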
When you are using a single script for analysis, you may just use `run` to use your code. However, as you build up complexity in your work, you'll probably want to make separate, independent code bases that can be called from subsequent code by importing them (again, just like using the capabilities in `numpy`).
When you import a package, a `*.pyc` file is created (in Python 3, inside a `__pycache__` directory) which holds compiled code that is read the next time the package is imported, to save time. Within a single IPython session, the module you already imported will continue to be used and not updated. If you have changed the code and want the changes picked up, you either need to exit IPython and reopen it, or you need to reload the package. There is different syntax for this depending on the version of Python you are using, but we are using Python 3 (>3.4), so we will do the following:
For >= Python3.4:
import importlib
importlib.reload([code to reload])
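Sketching this workflow end to end (the module name `helper` is made up; the example writes and edits the file itself so it is self-contained):

```python
import importlib
import pathlib
import sys

sys.path.insert(0, '')  # make sure the current directory is importable

# create a made-up module for the demo
pathlib.Path('helper.py').write_text('ANSWER = 1\n')
import helper
print(helper.ANSWER)  # prints 1

# simulate editing the file on disk
pathlib.Path('helper.py').write_text('ANSWER = 2  # edited\n')
print(helper.ANSWER)  # still prints 1: the session keeps the old module

helper = importlib.reload(helper)
print(helper.ANSWER)  # prints 2: reload re-executes the updated file
```

In an IPython session you would do the editing in your text editor rather than with `write_text`, but the reload step is the same.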
Write your own simple script with a function in it — your function should take at least one input (maybe several numbers) and return something (maybe a number that is the result of some calculation).
Now, use your code in several ways. Run the code in ipython with
run [filename]
Make sure you have a name definition for this (`if __name__ == '__main__':`, etc.). Now import the code and use it:
import [filename]
Add a docstring to the top of the file and reload your package, then query the code. Do you see the docstring? Add a docstring to the function in the file, reload, and query. Do you see the docstring?
You should have been able to run your code both ways: running it directly, and importing it, then using a function that is within the code.
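One possible version of such a script (the file contents below are just an illustration; any function with inputs and a return value works):

```python
"""Tiny example script: a geometric calculation."""


def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return (a**2 + b**2)**0.5


if __name__ == '__main__':
    # example code: runs with `run [filename]`, but not on import
    print(hypotenuse(3, 4))  # prints 5.0
```

With the file saved as, say, geom.py, `run geom` prints 5.0, while `import geom` makes `geom.hypotenuse` available without printing anything.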
The idea of unit testing is to develop tests for your code as you develop it, so that you automatically know whether it is working properly as you make changes. In fact, some coders prefer to write the unit tests first, to drive the proper development of their code and to know when it is working. Of course, the quality of the testing depends on which tests you include and which aspects of your code they exercise.
Here are some guidelines for setting up a suite of unit tests:
- Have a `tests` directory in your code (or, for simple code, just have your test file in the same directory);
- name your test files `tests*.py` — the name must start with "test" for it to be noticed by testing programs;
- in `tests*.py`, write test functions called `test_*()` — the testing programs look for functions with these names in particular and ignore other functions;
- use `assert`, and `np.allclose` for numeric comparisons of function outputs and for checking output types;
- run `nosetests` or `py.test` from the terminal window in the directory with your test code in it (or pointing to the directory). The next version will be `nose2`.

You can load files into Jupyter notebooks using the magic command `%load`. You can then run the code inside the notebook if you want (though the `import` statement will be an issue here), or just look at it.
In [ ]:
# %load ../data/package.py
def add(x, y):
    """doc"""
    return x + y

print(add(1, 2))

if __name__ == '__main__':
    print(add(1, 1))
In [ ]:
# %load ../data/test.py
"""Test package.py"""
import package
import numpy as np
def test_add_12():
    """Test package with inputs 1, 2"""
    assert package.add(1, 2) == np.sum([1, 2])
Now, run the test. We can do this by escaping to the terminal, or we can go to our terminal window and run it there.
Note: starting a line of code with "!" runs it as a terminal (shell) command. Some commands are so common that you don't need to use the "!" (like `ls`), but in general you need it.
In [ ]:
!nosetests ../data/
A PEP is a Python Enhancement Proposal, describing ideas for design or processes to the Python community. The list of the PEPs is available online.
PEP 0008 is a set of style guidelines for writing good, clear, readable code, written with the assumption that code is read more often than it is written. The guidelines address questions such as: when a statement is longer than one line, how should it be indented? And speaking of lines of code, how long should one be? Note that even in this document, they emphasize that these are guidelines, and sometimes they should be trumped by what has already been happening in a project or by other considerations. But, generally, follow what they say.
A few of the guidelines: use 4 spaces per indentation level, limit lines to 79 characters, use lowercase_with_underscores for function and variable names, and put imports at the top of the file. Check out the full guide for a wealth of good ideas.
Note that you can tell your text editor to enforce pep8 style guidelines to help you learn; a tool that checks this is called a linter. I do this with a plug-in in Sublime Text, and you can get one for Atom.
Docstrings should be provided at the top of a code file for the whole package, and then for each function/class within the package.
Overall style for docstrings is given in PEP 0257, and includes the following guidelines:
one liners: for really obvious functions. Keep the docstring completely on one line:
def simple_function():
    """Simple functionality."""
multiliners: Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line.
def complex_function():
    """One liner describing overall.

    Now more involved description of inputs and outputs.
    Possibly usage example(s) too.
    """
For the more involved description in the multi line docstring, there are several standards used. (These are summarized nicely in a post on Stack Overflow; this list is copied from there.)
def complex_function(param1, param2):
    """This is a reST style.

    :param param1: this is a first param
    :param param2: this is a second param
    :returns: this is a description of what is returned
    :raises keyError: raises an exception
    """
def complex_function(param1, param2):
    """This is an example of Google style.

    Args:
        param1: This is the first param.
        param2: This is a second param.

    Returns:
        This is a description of what is returned.

    Raises:
        KeyError: Raises an exception.
    """
def complex_function(first, second, third='value'):
    """Numpydoc format docstring.

    Parameters
    ----------
    first : array_like
        the 1st param name `first`
    second :
        the 2nd param
    third : {'value', 'other'}, optional
        the 3rd param, by default 'value'

    Returns
    -------
    string
        a value in a string

    Raises
    ------
    KeyError
        when a key error
    OtherError
        when an other error
    """
Sphinx is a program that can be run to generate documentation for your project from your docstrings. You basically run the program and if you use the proper formatting in your docstrings, they will all be properly pulled out and presented nicely in a coherent way. There are various additions you can use with Sphinx in order to be able to write your docstrings in different formats (as shown above) and still have Sphinx be able to interpret them. For example, you can use Napoleon with Sphinx to be able to write using the Google style instead of reST, meaning that you can have much more readable docstrings and still get nicely-generated documentation out. Once you have generated this documentation, you can publish it using Read the docs. Here is documentation on readthedocs for a package that converts between colorspaces, Colorspacious.
Another approach is to use Sphinx but link it with GitHub Pages, which is hosted directly from your GitHub repo page. Separately from documentation, I use GitHub Pages for my own website. I also use one for the documentation of a package of mine, cmocean, which provides colormaps for oceanography. To get this running, I followed instructions online. Note that GitHub Pages is built using Jekyll by default, but in this case we tell it not to use Jekyll and instead use Sphinx.
We can see that the docstrings in the code are nicely interpreted into documentation for the functions by comparing the module docs with the code below.
In [ ]:
# %load https://raw.githubusercontent.com/matplotlib/cmocean/master/cmocean/tools.py
'''
Plot up stuff with colormaps.
'''

import numpy as np
import matplotlib as mpl


def print_colormaps(cmaps, N=256, returnrgb=True, savefiles=False):
    '''Print colormaps in 256 RGB colors to text files.

    :param returnrgb=False: Whether or not to return the rgb array. Only makes sense to do if print one colormaps' rgb.
    '''

    rgb = []

    for cmap in cmaps:
        rgbtemp = cmap(np.linspace(0, 1, N))[np.newaxis, :, :3][0]
        if savefiles:
            np.savetxt(cmap.name + '-rgb.txt', rgbtemp)
        rgb.append(rgbtemp)

    if returnrgb:
        return rgb


def get_dict(cmap, N=256):
    '''Change from rgb to dictionary that LinearSegmentedColormap expects.
    Code from https://mycarta.wordpress.com/2014/04/25/convert-color-palettes-to-python-matplotlib-colormaps/
    and http://nbviewer.ipython.org/github/kwinkunks/notebooks/blob/master/Matteo_colourmaps.ipynb
    '''

    x = np.linspace(0, 1, N)  # position of sample n - ranges from 0 to 1
    rgb = cmap(x)

    # flip colormap to follow matplotlib standard
    if rgb[0, :].sum() < rgb[-1, :].sum():
        rgb = np.flipud(rgb)

    b3 = rgb[:, 2]  # value of blue at sample n
    b2 = rgb[:, 2]  # value of blue at sample n

    # Setting up columns for tuples
    g3 = rgb[:, 1]
    g2 = rgb[:, 1]
    r3 = rgb[:, 0]
    r2 = rgb[:, 0]

    # Creating tuples
    R = list(zip(x, r2, r3))
    G = list(zip(x, g2, g3))
    B = list(zip(x, b2, b3))

    # Creating dictionary
    k = ['red', 'green', 'blue']
    LinearL = dict(zip(k, [R, G, B]))

    return LinearL


def cmap(rgbin, N=256):
    '''Input an array of rgb values to generate a colormap.

    :param rgbin: An [mx3] array, where m is the number of input color triplets which
        are interpolated between to make the colormap that is returned. hex values
        can be input instead, as [mx1] in single quotes with a #.
    :param N=10: The number of levels to be interpolated to.
    '''

    # rgb inputs here
    if not mpl.cbook.is_string_like(rgbin[0]):
        # normalize to be out of 1 if out of 256 instead
        if rgbin.max() > 1:
            rgbin = rgbin/256.

    cmap = mpl.colors.LinearSegmentedColormap.from_list('mycmap', rgbin, N=N)

    return cmap
You can use the package `pdb` while running your code to pause it intermittently and poke around to check variable values and understand what is going on.
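As a sketch (the function here is made up), the commented line shows where a trace typically goes; with that line enabled, execution pauses there and you get a `(Pdb)` prompt in your terminal:

```python
import pdb  # needed once you enable the trace line below


def running_total(values):
    """Sum a sequence one element at a time (a made-up example)."""
    total = 0
    for v in values:
        # pdb.set_trace()  # uncomment to pause here on each pass of the loop
        total += v
    return total


print(running_total([1, 2, 3]))  # prints 6
```

At the `(Pdb)` prompt you could print `total` and `v` on each pass to watch the sum build up.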
A few key commands to get you started are:
- `pdb.set_trace()` pauses the code run at this location, then allows you to type in the IPython window. You can print statements or check variable shapes, etc. This is how you can dig into code.
- `n` to move to the next line;
- `s` to step into a function, if that is the next line and you want to move into the function as opposed to just running it;
- `c` to continue until there is another trace, the code ends, it reaches the end of a function, or an error occurs;
- `q` to quit out of the debugger, which will also quit out of the code being run.

To make a Python package that you want to be a bit more official, because you plan to use it long-term and/or you want to share it with other people and make it easy for them to use, you will want to get it on GitHub, provide documentation, and get it on PyPI (this is how you are then able to easily install it with `pip install [package_name]`). There are also a number of technical steps you'll need to do; more information about this sort of process is available online.