In [ ]:
print "some words"
# print "Some more words"
We have seen that the hash (or pound) symbol '#' introduces a comment in Python. This is intended to remind a human reader (who may be the person who wrote the programme) what a particular part of the programme does or how it works. But comments aren't the only place we see documentation, and before we wrap up for the day I want to share some useful ideas about how we can best document our programmes and their development. This becomes particularly important if we are developing or sharing programmes with others, and it touches on aspects of version control as well as programming.
An interesting first question is how much of a programme should be documentation: 10%? 90%?
As well as comments, many things in Python come with built-in help. We've seen the file command, used to open a file. We can use help to find out about all its options:
In [2]:
help(file)
There is another way to get hold of this information, which is the one used by the help function itself:
In [1]:
print file.__doc__
We can also look at the documentation for individual attributes listed there, such as softspace:
In [ ]:
print file.softspace.__doc__
These __doc__ things are just strings, and help just prints them out! One of the nice things about Python is that we can add __doc__ strings to our own functions, and Python's help system can use these. Let's see how we can do this by documenting the code below.
In [ ]:
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5.0/9.0)) + 273.15

def kelvin_to_celsius(temp):
    return temp - 273.15

def fahr_to_celsius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celsius(temp_k)
    return result

print 'The boiling point of water is', fahr_to_celsius(212), 'C'
Add documentation by inserting a properly indented string as the first statement of each function, e.g.:
In [ ]:
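def fahr_to_kelvin(temp):
    """Convert a temperature from Fahrenheit to Kelvin."""
    return ((temp - 32) * (5.0/9.0)) + 273.15

def kelvin_to_celsius(temp):
    """Convert a temperature from Kelvin to Celsius."""
    return temp - 273.15

def fahr_to_celsius(temp):
    """Convert a temperature from Fahrenheit to Celsius (via Kelvin)."""
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celsius(temp_k)
    return result

print 'The boiling point of water is', fahr_to_celsius(212), 'C'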
In [ ]:
help(fahr_to_celsius)
Comments in this form are known as docstrings, and there are many tools that can work with them (for example, to create webpages describing how your functions work). There are even guidelines describing how to best format docstrings so that these tools give the best possible results. See https://www.python.org/dev/peps/pep-0257/ for the details.
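As a minimal sketch of the layout PEP 257 recommends for multi-line docstrings (a one-line summary, a blank line, then a more detailed description), here is fahr_to_kelvin documented that way. The "Parameters:" and "Returns:" labels are just one common convention, not something PEP 257 requires. The standard library function inspect.getdoc returns the docstring with its indentation cleaned up, much as help displays it.

```python
import inspect


def fahr_to_kelvin(temp):
    """Convert a temperature from degrees Fahrenheit to kelvin.

    Parameters: temp, a temperature in degrees Fahrenheit (int or float).
    Returns: the same temperature in kelvin, as a float.

    Subtracts 32 to shift the zero point, scales by 5/9 to convert
    Fahrenheit degrees to Celsius degrees, then adds 273.15 to shift
    from the Celsius to the Kelvin scale.
    """
    return ((temp - 32) * (5.0 / 9.0)) + 273.15


# The docstring is stored as an ordinary string in the __doc__ attribute.
summary = fahr_to_kelvin.__doc__.splitlines()[0]
# inspect.getdoc strips the common leading indentation from the docstring.
cleaned = inspect.getdoc(fahr_to_kelvin)
print(summary)
```

Keeping the summary on the first line pays off because many tools (and help itself, in listings) show only that line.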
It is good practice to include docstrings for your functions and to make sure that these describe what the function does, what the input parameters are, what the results are, and how it works at a high level. It is often a good idea to include references to the literature where the approach is described. One of my better efforts is below.
In [19]:
import numpy as np

def rotT(T, g):
    """Rotate a rank 4 tensor, T, using a rotation matrix, g

    Tensor rotation involves a summation over all combinations
    of products of elements of the unrotated tensor and the
    rotation matrix. Like this for a rank 3 tensor:

        T'(ijk) -> Sum g(p,i)*g(q,j)*g(r,k)*T(pqr)

    with the summation over p, q and r. The obvious implementation
    involves (2*rank) length 3 loops building up the summation in the
    inner set of loops. This optimized implementation is >100 times
    faster than that obvious implementation using 8 nested loops.
    Returns a 3*3*3*3 numpy array representing the rotated tensor, Tprime.
    """
    gg = np.outer(g, g)  # Flattens input and returns 9*9 array
                         # of all possible products
    gggg = np.outer(gg, gg).reshape(4 * g.shape)
                         # 81*81 array of double products reshaped
                         # to 3*3*3*3*3*3*3*3 array...
    axes = ((0, 2, 4, 6), (0, 1, 2, 3))  # We only need a subset
                                         # of gggg in tensordot...
    return np.tensordot(gggg, T, axes)
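The docstring claims that rotT gives the same answer as the obvious eight-nested-loop implementation. A quick way to check a claim like this, assuming NumPy is installed, is to write the slow version and compare the two on a random tensor. The name rotT_loops is our own, and the fast version is repeated so that the cell runs on its own. Like the tensordot version, the loops sum over the first index of g.

```python
import itertools

import numpy as np


def rotT(T, g):
    """Rotate a rank 4 tensor, T, using a rotation matrix, g (fast version)."""
    gg = np.outer(g, g)
    gggg = np.outer(gg, gg).reshape(4 * g.shape)
    axes = ((0, 2, 4, 6), (0, 1, 2, 3))
    return np.tensordot(gggg, T, axes)


def rotT_loops(T, g):
    """Rotate a rank 4 tensor with explicit loops (slow reference version).

    Computes Tprime[i,j,k,l] = Sum g[p,i]*g[q,j]*g[r,k]*g[s,l]*T[p,q,r,s],
    the same summation that rotT performs with tensordot.
    """
    Tprime = np.zeros((3, 3, 3, 3))
    # itertools.product stands in for the 8 nested 'for' loops over range(3).
    for i, j, k, l in itertools.product(range(3), repeat=4):
        for p, q, r, s in itertools.product(range(3), repeat=4):
            Tprime[i, j, k, l] += (g[p, i] * g[q, j] * g[r, k] * g[s, l]
                                   * T[p, q, r, s])
    return Tprime


# A rotation of 30 degrees about the z axis and a random rank 4 tensor.
theta = np.radians(30.0)
g = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.random.RandomState(42).rand(3, 3, 3, 3)

print(np.allclose(rotT(T, g), rotT_loops(T, g)))  # expect True
```

A slow but obviously correct version like this is also worth keeping as a test, so that later changes to the fast version can be checked against it.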
The commit messages we write when we commit to git repositories are also useful documentation. I often end up copying text out of docstrings to explain what new code does. It is important to include details of what you are changing and why. There are also tools that process commit messages automatically and, for these to work well, it helps to format the message in a standard way. One description is here: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html.
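As a rough sketch, assuming the conventions described in that note (a capitalised summary line of about 50 characters or fewer, a blank second line, and a body wrapped at about 72 characters), a small Python function can flag messages that break them. check_commit_message is a hypothetical helper written for this example; it is not part of git or of any existing tool.

```python
def check_commit_message(message, summary_limit=50, body_limit=72):
    """Return a list of formatting problems with a git commit message.

    Checks for a short, capitalised summary line, a blank line after
    it, and a wrapped body. An empty list means no problems were found.
    """
    lines = message.rstrip("\n").splitlines()
    if not lines or not lines[0].strip():
        return ["message has no summary line"]
    problems = []
    summary = lines[0]
    if len(summary) > summary_limit:
        problems.append("summary is longer than %d characters" % summary_limit)
    if not summary[0].isupper():
        problems.append("summary does not start with a capital letter")
    # The second line should separate the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line is not blank")
    # Body lines are numbered from 3 so messages match what an editor shows.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_limit:
            problems.append("line %d is longer than %d characters"
                            % (number, body_limit))
    return problems


good = ("Add docstrings to temperature conversions\n"
        "\n"
        "Describe the inputs, outputs and units of each function so\n"
        "that help() gives useful output.\n")
print(check_commit_message(good))  # -> []
print(check_commit_message("fixed stuff\nmore details"))
```

A check like this could be run by hand before committing; it only looks at layout, so whether the message explains what changed and why is still up to you.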
Create a new git repository containing a Python file with the temperature conversion functions above.
Things get interesting when we send our new repository to GitHub. Now everybody can see it, but do they know what it does? Can they use our code? We need to add documentation about this too, and GitHub makes this easy.
Create a new GitHub repository, push your code to it, and reload the page in your web browser.
We should add a file called "README" or "README.md" explaining what the repository contains. You can do this online, using the + button to create a new file.
Create a README file for the repository
We also need to tell people how they can use the code. All GitHub insists on is that it can show the software to other people, and that you are responsible for making sure that you don't upload other people's work if you are not permitted to (https://help.github.com/articles/github-terms-of-service/).
If we want other people to use our code we need to give them permission, and tell them what the rules are. Broadly speaking, there are two kinds of open license for software, and half a dozen for data and publications. For software, people can choose between the GNU General Public License (GPL) on the one hand, and licenses like the MIT and BSD licenses on the other. All of these licenses allow unrestricted sharing and modification of programs, but the GPL is infective: anyone who distributes a modified version of the code (or anything that includes GPL'd code) must make their code freely available as well.
Proponents of the GPL argue that this requirement is needed to ensure that people who are benefiting from freely-available code are also contributing back to the community. Opponents counter that many open source projects have had long and successful lives without this condition, and that the GPL makes it more difficult to combine code from different sources. At the end of the day, what matters most is that:
1. every project has a file in its root directory called something like LICENSE or LICENSE.txt that clearly states what the license is, and
2. people use existing licenses rather than writing new ones.
The second point is as important as the first: most scientists are not lawyers, so wording that may seem sensible to a layperson may have unintended gaps or consequences. The Open Source Initiative maintains a list of open source licenses, and tl;drLegal explains many of them in plain English. GitHub has also put together a tool to help people choose a license, and has built this process into its website.
Create a "LICENSE" file for the repository on the GitHub website, using its tool to choose a license.
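It is also worth thinking about licenses for text and data. Creative Commons, sometimes known as CC, have put effort into creating licenses that are better suited to text and data than the licenses for programmes described above. CC licenses can include a combination of the following limitations on reuse:
1. Attribution (BY): derived works must give the original author credit for their work.
2. No Derivatives (ND): people may copy the work, but must pass it on unchanged.
3. Share-alike (SA): derivative works must license their own work under the same terms as the original.
4. Noncommercial (NC): free use is allowed, but commercial use is not.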
These four restrictions are abbreviated "BY", "ND", "SA", and "NC" respectively, so "CC-BY-ND" means, "People can re-use the work both for free and commercially, but cannot make changes and must cite the original." These short descriptions summarize the six CC licenses in plain language, and include links to their full legal formulations.
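To make the naming scheme concrete, here is a minimal sketch that expands a Creative Commons abbreviation into the restrictions it combines. The describe_cc function and its one-line wordings are our own illustration; the authoritative descriptions are the ones published by Creative Commons.

```python
# Plain-language meaning of each Creative Commons restriction.
CC_RESTRICTIONS = {
    "BY": "must give the original author credit (attribution)",
    "ND": "may copy the work, but must pass it on unchanged (no derivatives)",
    "SA": "derivative works must use the same license (share-alike)",
    "NC": "commercial use is not allowed (noncommercial)",
}


def describe_cc(code):
    """Expand a license code such as 'CC-BY-ND' into its restrictions."""
    parts = code.upper().split("-")
    if parts[0] != "CC":
        raise ValueError("not a Creative Commons code: %r" % code)
    # Each part after "CC" names one restriction; unknown parts raise KeyError.
    return [CC_RESTRICTIONS[part] for part in parts[1:]]


for restriction in describe_cc("CC-BY-ND"):
    print(restriction)
```

Because "NC" does not appear in CC-BY-ND, nothing in the expansion forbids commercial use, which matches the plain-language reading above.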
Software Carpentry uses CC-BY for its lessons and the MIT License for its code in order to encourage the widest possible re-use.
As scientists, we probably want people to cite our work, and this should include our software. We can indicate how we want people to cite our work by including a CITATION file in the root of our repository.
Create a CITATION file outlining what paper should be cited. Finally, pull the changes back to your local repository.
Put docstrings in functions to provide help for that function.
Open scientific work is more useful and more highly cited than closed.
Add docstrings to some of the functions you created earlier today.
Find out whether you are allowed to apply an open license to your software. Can you do this unilaterally, or do you need permission from someone in your institution? If so, who?