In [ ]:from __future__ import print_function, division, absolute_import
This notebook contains problems oriented around building a basic Python code repository and making it public via Github. Of course there are other places to put code repositories, with complexity ranging from services comparable to github to simply hosting a git server on your local machine. But this focuses on git and github as a ready-to-use example with plenty of additional resources to be found online.
Note that these problems assume you are using the Anaconda Python distribution. This is particularly useful for these problems because it makes it very easy to install testing packages in virtual environments quickly and with little wasted disk space. If you are not using anaconda, you can either use an alternative virtual environment scheme (e.g. in Py 3, the built-in venv), or just install packages directly into your default python (and hope for the best...).
For git interaction, this notebook also uses the git command line tools directly. There are a variety of GUI tools that make working with git more visually intuitive (e.g. SourceTree, gitkraken, or the github desktop client), but this notebook uses the command line tools as the lowest common denominator. You are welcome to try to reproduce the steps with your client, however - feel free to ask your neighbors or instructors if you run into trouble there.
As a final note, this notebook's examples assume you are using a system with a unix-like shell (e.g. macOS, Linux, or Windows with git-bash or the Linux subsystem shell).
E Tollerud, B Sipocz
As an initial step before diving into code repositories, it's important to understand how you can use Jupyter as a shell. Most of the steps in this notebook require interaction with the system that's easier done with a shell or editor rather than using Python code in a notebook. While this could be done by opening up a terminal beside this notebook, to keep most of your work in the notebook itself, you can use the capabilities Jupyter + IPython offer for shell interaction.
The critical trick here is the
! magic in IPython. Anything after a leading
! in IPython gets run by the shell instead of as python code. Run the shell command
ls to see where IPython thinks you are on your system, and the contents of the directory.
hint: Be sure to remove the "#complete"s below when you've done so. IPython will interpret that as part of the shell command if you don't
In [ ]:! #complete
! #complete
IPython magics often support "cell" magics by having %%<command> at the top of a cell. Use that to cd into the directory above this one ("..") and then ls inside that directory.
Hint: if you need syntax tips, run the
magic() function and look for the
In [ ]:%%sh #complete
In [ ]:! #complete
One thing about shell commands is that they always start wherever you started your IPython instance. So doing cd as a shell command only changes things temporarily (i.e. within that shell command). IPython provides a %cd magic that makes this change last, though. Use this to %cd into the directory you just created, and then use the pwd shell command to ensure this cd "stuck". (You can also try doing cd as a shell command to prove to yourself that it's different from the %cd magic.)
In [ ]:%cd #complete
%cd -0 is a convenient shorthand to switch back to the initial directory.
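The reason a shell cd does not "stick" is that each shell command runs in its own child process, and a child changing its working directory never affects the parent. Here is a minimal sketch of that idea in plain Python (outside of IPython's magics), using a child Python process as a stand-in for the shell. The variable names are just for illustration.

```python
import os
import subprocess
import sys

start_dir = os.getcwd()

# Child process: change directory and report where it ended up.
child_code = "import os; os.chdir('..'); print(os.getcwd())"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True, check=True,
)
child_dir = result.stdout.strip()

# The parent (this process) has not moved at all.
parent_dir_after = os.getcwd()
print("child ended in: ", child_dir)
print("parent is still:", parent_dir_after)
```

By contrast, %cd changes the working directory of the IPython process itself, much like calling os.chdir() directly in the notebook would.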
Here we'll create a simple (public) code repository with a minimal set of content, and publish it in github.
Start by creating the simplest possible code repository, composed of a single code file. Create a directory (or use the one from 0c), and place a
code.py file in it, with a bit of Python code of your choosing. (Bonus points for witty or sarcastic code...) You could even use non-Python code if you desired, although Problems 3 & 4 feature Python-specific bits so I wouldn't recommend it.
To make the file from the notebook, the
%%file <filename> magic is a convenient way to write the contents of a notebook cell to a file.
In [ ]:!mkdir #complete only if you didn't do 0c, or want a different name for your code directory
In [ ]:%%file <yourdirectory>/code.py
def do_something():
    # complete
    print(something)  # this will make it much easier in future problems to see that something is actually happening
If you want to test-run your code:
In [ ]:%run <yourdirectory>/code.py # complete
do_something()
Make that code into a git repository by doing git init in the directory you created, then git add and git commit.
In [ ]:%cd # complete
In [ ]:!git init
In [ ]:!git add code.py
!git commit -m #complete
Once you've got an account, you'll need to make sure your git client can authenticate with github. If you're using a GUI, you'll have to figure it out (usually it's pretty easy). On the command line you have two options:
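git will prompt you for your github username and password every so often. (Note that github no longer accepts your account password for this; you need to create a personal access token and enter it in place of the password.)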
Once you've got github set up to talk to your computer, you'll need to create a new repository for the code you created. Hit the "+" in the upper-right, create a "new repository" and fill out the appropriate details (don't create a README just yet).
To stay sane, I recommend using the same name for your repository as the local directory name you used... But that is not a requirement, just a recommendation.
Once you've created the repository, connect your local repository to github and push your changes up to github.
In [ ]:!git remote add <yourgithubusername> <the url github shows you on the repo web page> #complete
In [ ]:!git push <yourgithubusername> master -u
-u is a convenience that means from then on you can use just
git push and
git pull to send your code to and from github.
We'll discuss proper documentation later. But for now make sure to add a README to your code repository. Always add a README with basic documentation. Always. Even if only you are going to use this code, trust me, future you will be very happy you did it.
You can just call it README, but to get it rendered nicely on the github repository, you can call it README.md and write it using markdown syntax, README.rst in ReST (if you know what that is) or various other similar markup languages github understands. If you don't know/care, just use README.md, as that's pretty standard at this point.
In [ ]:%%file README.md # complete
Don't forget to add and commit via
git and push up to github...
In [ ]:!git #complete
I bet you didn't expect to be reading legalese today... but it turns out this is important. If you do not explicitly license your code, in most countries (including the US and EU) it is technically illegal for anyone to use your code for any purpose other than just looking at it.
(Un?)Fortunately, there are a lot of possible open source licenses out there. Assuming you want an open license, the best resource is the "Choose a License" website. Have a look over the options there and decide which you think is appropriate for your code.
Once you've chosen a license, grab a copy of the license text, and place it in your repository as a file called LICENSE.md or the like. Some licenses might also suggest you place the license text or just a copyright notice in the source code as well, but that's up to you.
Once you've done that, do as we've done before: push all your additions up to github. If you've done it right, github will automatically figure out your license and show it in the upper-right corner of your repo's github page.
In [ ]:!git #complete
There's not much point in having open source code if no one else can look at it or use it. So now we'll have you try modifying your neighbor's project using github's Pull Request feature.
Find someone sitting near you who has gotten through Problem 1. Ask them their github user name and the name of their repository.
Once you've got the name of their repo, navigate to it on github. The URL pattern is always "https://www.github.com/theirusername/reponame". Use the github interface to "fork" that repo, yielding a "yourusername/reponame" repository. Go to that one, take note of the URL needed to clone it (you'll need to grab it from the repo web page, either in "HTTPS" or "SSH" form, depending on your choice in 1a). Then clone that onto your local machine.
In [ ]:# Don't forget to do this cd or something like it... otherwise you'll clone *inside* your repo
%cd -0
!git clone <url from github>#complete
%cd <reponame>#complete
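You're going to make some changes to their code, but who knows... maybe they'll spend so long reviewing it that you want to do another. So it's always best to make changes in a specific "branch" for that change. To do this we need to make a git branch and switch to it. (Note that git branch on its own creates the branch without switching to it, so your commits would still land on master; git checkout -b does both steps at once.)
In [ ]:!git checkout -b <name-of-branch>#complete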
Make some change to their code repo. Usually this would be a new feature or a bug fix or documentation clarification or the like... But it's up to you.
Once you've done that, be sure to commit the change locally.
In [ ]:!git add <files modified>#complete
!git commit -m ""#complete
and push it up (to a branch on your github fork).
In [ ]:!git push origin <name-of-branch>#complete
Now use the github interface to create a new "pull request". If you time it right, once you've pushed your new branch up, you'll see a prompt to do this automatically appear on your fork's web page. But if you don't, use the "branches" drop-down to navigate to the new branch, and then hit the "pull request" button. That should show you an interface that you can use to leave a title and description (in github markdown), and then submit the PR. Go ahead and do this.
Tell your neighbor that you've issued the PR. They should be able to go to their repo, and see that a new pull request has been created. There they'll review the PR, possibly leaving comments for you to change. If so, go to 2f, but if not, they should hit the "Merge" button, and you can jump to 2g.
If they left you some comments that require changing prior to merging, you'll need to make those changes in your local copy, commit those changes, and then push them up to your branch on your fork.
In [ ]:!git #complete
Hopefully they are now satisfied and are willing to hit the merge button.
Now you should get the up-to-date version from the original owner of the repo, because that way you'll have both your changes and any other changes they might have made in the meantime. To do this you'll need to connect your local copy to your neighbor's github repo (not your fork).
In [ ]:!git remote add <neighbors-username> <url-from-neighbors-github-repo> #complete
!git fetch <neighbors-username> #complete
!git branch --set-upstream-to=<neighbors-username>/master master
!git checkout master
!git pull
Now if you look at the local repo, it should include your changes.
Suggestion: To stay sane, you might rename the "origin" remote to your username, e.g. git remote rename origin <yourusername>. To go further, you might even delete your fork's master branch, so that only your neighbor's master exists. That might save you headaches in the long run if you were to ever access this repo again in the future.
Science (Data or otherwise) and open source code are social enterprises built on shared effort, mutual respect, and trust. So ask them to issue a PR against your code, too. The more we can stand on each other's shoulders, the farther we will all see.
Hint: Ask them nicely. Maybe offer a cookie or something?
Up to this point we've been working on the simplest possible shared code: a single file with all the content. But for most substantial use cases this isn't going to cut it. After all, Python was designed around the idea of namespaces that let you hide away or show code to make writing, maintaining, and versioning code much easier. But to make use of these, we need to deploy the installation tools that Python provides. This is typically called "packaging". In this problem we will take the code you just made and build it into a proper python package that can be installed and then used anywhere.
For more background and detail (and the most up-to-date recommendations) see the Python Packaging User Guide.
First we adjust the structure of your code from Problem 1 to allow it to live in a package structure rather than as a stand-alone
.py file. All you need to do is create a directory, move the
code.py file into that directory, and add a file (can be empty) called
__init__.py into the directory.
You'll have to pick a name for the package, which is usually the same as the repo name (although that's not strictly required).
Hint: don't forget to switch back to your code repo directory, if you are doing this immediately after Problem 2.
In [ ]:!mkdir <yourpkgname>#complete
!git mv code.py <yourpkgname>#complete
In [ ]:#The "touch" unix command simply creates an empty file if there isn't one already.
#You could also use an editor to create an empty file if you prefer.
!touch <yourpkgname>/__init__.py#complete
You should now be able to import your package and the code inside it as though it were some installed package like
In [ ]:from <yourpkgname> import code#complete
#if your code.py has a function called `do_something` as in the example above, you can now run it like:
code.do_something()
One of the nice things about packages is that they let you hide the implementation of some part of your code in one place while exposing a "cleaner" namespace to the users of your package. To see a (trivial) example of this, let's pull a function from your code.py into the base namespace of the package. In the below make the __init__.py have one line: from .code import do_something. That places the do_something() function into the package's root namespace.
In [ ]:%%file <yourpkgname>/__init__.py #complete
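If you'd like to see the whole package layout in one self-contained piece, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The package name demo_pkg_example and the return value are hypothetical, chosen only for this illustration; your own package lives in your repo instead.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical name, used only for this illustration.
pkg_name = "demo_pkg_example"

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, pkg_name)
os.mkdir(pkg_dir)

# The implementation lives in code.py inside the package directory...
with open(os.path.join(pkg_dir, "code.py"), "w") as f:
    f.write("def do_something():\n    return 'something happened'\n")

# ...and __init__.py re-exports it into the package's root namespace.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .code import do_something\n")

# Make the parent directory importable, much like running Python from the repo root.
sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module(pkg_name)

print(demo_pkg.do_something())       # called via the package root
print(demo_pkg.code.do_something())  # the submodule is still reachable too
```

Both calls reach the same function object; the __init__.py line only adds a shorter name for it.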
Now the following should work.
In [ ]:import <yourpkgname>#complete
<yourpkgname>.do_something()#complete
BUT you will probably get an error here. That's because Python is smart about imports: once it's imported a package once it won't re-import it later. Usually that saves time, but here it's a hassle. Fortunately, we can use the
reload function to get around this:
In [ ]:from importlib import reload #not necessary on Py 2.x, where reload() is built-in
reload(<yourpkgname>)#complete
<yourpkgname>.do_something()#complete
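To convince yourself of what reload is doing, here is a small self-contained sketch (Python 3) that imports a throwaway module, edits its source on disk, and compares a plain re-import with a reload. The module name reload_demo_example is hypothetical, and bytecode writing is switched off only to keep this quick demo free of cached files.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale cached bytecode in this quick demo

module_name = "reload_demo_example"  # hypothetical module name
root = tempfile.mkdtemp()
path = os.path.join(root, module_name + ".py")

with open(path, "w") as f:
    f.write("def do_something():\n    return 'old'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
mod = importlib.import_module(module_name)
first = mod.do_something()

# Edit the source on disk, as you would in an editor.
with open(path, "w") as f:
    f.write("def do_something():\n    return 'new version'\n")

# A plain import hands back the cached module object, so the edit is not seen.
same_mod = importlib.import_module(module_name)
after_import = same_mod.do_something()

# reload() re-executes the module source in place.
importlib.reload(mod)
after_reload = mod.do_something()

print(first, after_import, after_reload)
```

One thing to keep in mind: reloading a package re-runs its __init__.py, but it does not automatically reload submodules that were already imported, so after editing code.py you may need to reload that submodule as well.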
Ok, that's great in a pinch, but what if you want your package to be available from other directories? If you open a new terminal somewhere else and try to import <yourpkgname> you'll see that it will fail, because Python doesn't know where to find your package. Fortunately, Python (both the language and the larger ecosystem) provides built-in tools to install packages. These are built around creating a setup.py script that controls installation of Python packages into a shared location on your machine. Essentially all Python packages are installed this way, even if it happens silently behind-the-scenes.
Below is a template bare-bones setup.py file. Fill it in with the relevant details for your package.
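In [ ]:%%file setup.py
#!/usr/bin/env python
# Note: distutils was removed from the standard library in Python 3.12.
# On newer Pythons, `from setuptools import setup` accepts the same arguments.
from distutils.core import setup

setup(name='<yourpkgname>',
      version='0.1dev',
      description='<a description>',
      author='<your name>',
      author_email='<youremail>',
      packages=['<yourpkgname>'],
) #complete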
Now you should be able to "build" the package. In complex packages this will involve more involved steps like linking against C or FORTRAN code, but for pure-python packages like yours, it simply involves filtering out some extraneous files and copying the essential pieces into a build directory.
In [ ]:!python setup.py build
To test that it built successfully, the easiest thing to do is cd into the build/lib.X-Y-Z directory ("X-Y-Z" here is OS and machine-specific). Then you should be able to import <yourpkgname>. It's usually best to do this as a completely independent process in python. That way you can be sure you aren't accidentally using an old import as we saw above.
In [ ]:%%sh
cd build/lib.X-Y-Z #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete
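The same "fresh process" check can be scripted from Python if you prefer. The sketch below uses a hypothetical helper, import_location, that imports a module in a brand-new interpreter and reports which file it was loaded from, which is a handy way to confirm you are picking up the built copy rather than the source directory. It is demonstrated on the standard-library json module so that it runs anywhere; you would substitute your package name and pass the build directory as cwd.

```python
import subprocess
import sys


def import_location(module_name, cwd=None):
    """Import `module_name` in a fresh interpreter and return the file it came from."""
    code = "import {0}; print({0}.__file__)".format(module_name)
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Stand-in example using a module every Python installation has.
location = import_location("json")
print(location)
```

Because the child interpreter starts from scratch, nothing imported earlier in the notebook can leak into the check.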
Alright, now that it looks like it's all working as expected, we can install the package. Note that if we do this willy-nilly, we'll end up with lots of packages, perhaps with the wrong versions, and it's easy to get confused about what's installed (there's no reliable
uninstall command...) So before installing we first create a virtual environment using Anaconda, and install into that. If you don't have anaconda or a similar virtual environment scheme, you can just do
python setup.py install. But just remember that this will be difficult to back out (hence the reason for Python environments in the first place!)
In [ ]:%%sh
conda create -n test_<yourpkgname> anaconda #complete
source activate test_<yourpkgname> #complete
python setup.py install
Now we can try running the package from anywhere (not just the source code directory), as long as we're in the same environment that we installed the package in.
In [ ]:%%sh
cd $HOME
source activate test_<yourpkgname> #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete
OK, it's now installable. You'll now want to make sure to update the github version to reflect these improvements. You'll need to add and commit all the files. You'll also want to update the README to instruct users that they should use
python setup.py install to install the package.
In [ ]:!git #complete
Now your package can be installed by anyone who comes across it on github. But it tends to scare some people that they need to download the source code and know git to use your code. The Python Package Index (PyPI), combined with the pip tool (now standard in Python), provides a much simpler way to distribute code. Here we will publish your code to a testing version of PyPI.
First you'll need an account on PyPI to register new packages. Go to the testing PyPI, and register. You'll also need to supply your login details in the .pypirc file in your home directory as shown below. (If it were the real PyPI you'd want to be more secure and not have your password in plain text. But for the testing server that's not really an issue.)
Note that if you've ever done something like this before and hence already have a .pypirc file, you might get unexpected results if you run this without moving/renaming the old version temporarily.
In [ ]:%%file -a ~/.pypirc
[distutils]
index-servers = pypi
[pypi]
repository = https://test.pypi.org/legacy/
username = <your user name goes here>
password = <your password goes here>
Use distutils to create the source distribution of your package.
Hint: You'll want to make sure your package version is something you want to release before executing the upload command. Released versions can't be duplicates of existing versions, and shouldn't end in "dev" or "b" or the like.
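If you want a quick sanity check on the version string before uploading, something like the sketch below can help. It rests on a deliberately simple assumption that is mine rather than any official rule: a "release" version is nothing but dot-separated numbers. The function name looks_like_release is hypothetical.

```python
import re

# Assumption for this sketch: a release version is only dot-separated numbers,
# so anything with a suffix such as "dev" or "b1" is treated as a non-release.
_RELEASE_PATTERN = re.compile(r"\d+(\.\d+)*")


def looks_like_release(version):
    """Return True if `version` is made up solely of dot-separated numbers."""
    return _RELEASE_PATTERN.fullmatch(version) is not None


for candidate in ["0.1dev", "0.1", "1.0b1", "1.2.3"]:
    print(candidate, looks_like_release(candidate))
```

The template setup.py in this notebook uses '0.1dev', which this check would flag, so remember to change it before building the distribution you intend to upload.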
In [ ]:!python setup.py sdist
Verify that there is a
<yourpkg>-<version>.tar.gz file in the
dist directory. It should have all of the source code necessary for your package.
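One way to check what actually ended up in the archive is to list its contents with the standard-library tarfile module. The sketch below defines a hypothetical helper, list_sdist_files, and runs it on a tiny stand-in archive that it builds itself so the example works on its own. The file names inside are made up for illustration; point the helper at your real file in dist to inspect your package.

```python
import io
import os
import tarfile
import tempfile


def list_sdist_files(path):
    """Return the sorted names of the regular files inside a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Build a tiny stand-in archive so the sketch runs on its own.
# The names below are hypothetical, not what your package will contain.
tmp = tempfile.mkdtemp()
archive_path = os.path.join(tmp, "demo_pkg-0.1.tar.gz")
contents = {
    "demo_pkg-0.1/setup.py": b"# setup script\n",
    "demo_pkg-0.1/demo_pkg/__init__.py": b"from .code import do_something\n",
    "demo_pkg-0.1/demo_pkg/code.py": b"def do_something():\n    print('hi')\n",
}
with tarfile.open(archive_path, "w:gz") as archive:
    for name, data in contents.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

files = list_sdist_files(archive_path)
for name in files:
    print(name)
```

If a file you expect (say, your README or LICENSE) is missing from the listing, that is a sign the build is not picking it up.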
Once you have an account on PyPI (or testPyPI in our case) you can upload your distributions to PyPI using
twine. If this is your first time uploading a distribution for a new project, twine will handle registering the project automatically filling out the details you provided in your
In [ ]:!twine upload dist/<yourpackage>-<version>
If for some reason this fails (which does happen for unclear reasons on occasion), you can usually just directly upload the
.tar.gz file from the web interface without too much trouble.
The pip tool is a convenient way to install packages on PyPI. Again, we use Anaconda to create a testing environment to make sure everything worked correctly.
-i wouldn't be necessary - we're using it here only because we're using the "testing" PyPI)
In [ ]:%%sh
conda create -n test_pypi_<yourpkgname> anaconda #complete
source activate test_pypi_<yourpkgname> #complete
pip install -i https://test.pypi.org/simple/ <yourpkgname>
In [ ]:%%sh
cd $HOME
source activate test_pypi_<yourpkgname> #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete
Ask your neighbor to try to install your package just like you did above. Hopefully they'll get it to work right out of the box.
Hint: Don't forget to be nice to them! Always be nice to your users - it makes them want to be nice to you by contributing improvements or citations... Also, it's just good to be nice, period, dontcha think?
The above is all based on the assumption of a bare-bones package. In practice there's a lot of stuff you can add to a package that's convenient, but requires a variety of boilerplate setup or knowledge about tricks to get it all to work together. Astropy has created a "package template" that's meant to reduce the burden of this by providing a package without actual code but lots of "batteries included". Then you simply need to fill in your code and use the tools, instead of needing to set them up.
Try setting up a package using the astropy package template. Go to the Astropy affiliated package site, and follow the instructions at the bottom leading you to the package template and how to use it. Populate it with your code from above.
For an extra challenge (and to see one of the reasons why it's useful), see if you can make the Cython example code work. Cython is a tool that lets you compile Python-like code into C, which can be orders-of-magnitude faster depending on how you design the code. It can be tricky to package correctly, though, and the affiliated package template gets rid of a lot of that pain.
The template comes with a simple example of Cython code. Try to get it compiled and running.
The sky's the limit here. Can you make some Cython code go faster than the
numpy equivalent? Experiment as you wish, but revel in not needing to understand how to invoke Cython.