In [ ]:
from __future__ import print_function, division, absolute_import

Code Repositories

Astro Hack Week 2017

This notebook contains problems oriented around building a basic Python code repository and making it public via Github. Of course there are other places to put code repositories, with complexity ranging from services comparable to github to simply hosting a git server on your local machine. But this notebook focuses on git and github as a ready-to-use example with plenty of additional resources to be found online.

Note that these problems assume you are using the Anaconda Python distribution. This is particularly useful for these problems because it makes it very easy to install testing packages in virtual environments quickly and with little wasted disk space. If you are not using anaconda, you can either use an alternative virtual environment scheme (e.g. in Py 3, the built-in venv), or just install packages directly into your default python (and hope for the best...).

For git interaction, this notebook also uses the git command line tools directly. There are a variety of GUI tools that make working with git more visually intuitive (e.g. SourceTree, gitkraken, or the github desktop client), but this notebook uses the command line tools as the lowest common denominator. You are welcome to try to reproduce the steps with your client, however - feel free to ask your neighbors or instructors if you run into trouble there.

As a final note, this notebook's examples assume you are using a system with a unix-like shell (e.g. macOS, Linux, or Windows with git-bash or the Linux subsystem shell).


E Tollerud, B Sipocz

Problem 0: Using Jupyter as a shell

As an initial step before diving into code repositories, it's important to understand how you can use Jupyter as a shell. Most of the steps in this notebook require interaction with the system that's easier done with a shell or editor rather than using Python code in a notebook. While this could be done by opening up a terminal beside this notebook, to keep most of your work in the notebook itself, you can use the capabilities Jupyter + IPython offer for shell interaction.

0a: Figure out your base shell path and what's in it

The critical trick here is the ! magic in IPython. Anything after a leading ! in IPython gets run by the shell instead of as python code. Run the shell commands pwd and ls to see where IPython thinks you are on your system, and the contents of that directory.

hint: Be sure to remove the "#complete"s below when you've done so. IPython will interpret that as part of the shell command if you don't.


In [ ]:
! #complete
! #complete

0b: Try a multi-line shell command

IPython magics often support "cell" magics by having %%<command> at the top of a cell. Use that to cd into the directory above this one ("..") and then ls inside that directory.

Hint: if you need syntax tips, run the magic() function and look for the ! or !! commands


In [ ]:
%%sh

#complete

0c: Create a new directory from Jupyter

While you can do this almost as easily with os.mkdir in Python, for this case try to do it using shell magics instead. Make a new directory in the directory you are currently in. Use your system file browser to ensure you were successful.


In [ ]:
! #complete
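
For comparison, here is a minimal sketch of the pure-Python route mentioned in 0c. The directory name my_new_dir is only a placeholder, and a temporary directory stands in for your working directory so that the example cleans up after itself.

```python
import tempfile
from pathlib import Path


def make_dir(parent, name):
    """Create `name` inside `parent` and return the new path."""
    new_dir = Path(parent) / name
    # exist_ok=True means re-running the cell does not raise an error
    new_dir.mkdir(exist_ok=True)
    return new_dir


with tempfile.TemporaryDirectory() as scratch:
    created = make_dir(scratch, "my_new_dir")  # placeholder name
    created_ok = created.is_dir()
    # the Python counterpart of `ls`
    listing = sorted(p.name for p in Path(scratch).iterdir())

print(created_ok, listing)
```

In your own notebook you would pass "." (or any real path) as the parent instead of the scratch directory.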

0d: Change directory to your new directory

One thing about shell commands is that each one runs in its own temporary shell, which starts in IPython's current working directory. So doing cd as a shell command only changes things temporarily (i.e. within that shell command). IPython provides a %cd magic that makes this change last, though. Use this to %cd into the directory you just created, and then use the pwd shell command to ensure this cd "stuck". (You can also try doing cd as a shell command to prove to yourself that it's different from the %cd magic.)


In [ ]:
%cd #complete

Final note: %cd -0 is a convenient shorthand to switch back to the initial directory.
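
The difference between a shell cd and the %cd magic in 0d can be mimicked in plain Python. This is only a sketch of the idea: subprocess stands in for the ! magic, and os.chdir stands in for %cd (which, roughly speaking, changes the directory of the IPython process itself). The scratch directory is a throwaway created just for the demonstration.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()

# A `cd` run in a child shell changes only that child's directory,
# which is why `!cd somewhere` has no lasting effect.
subprocess.run("cd ..", shell=True, check=True)
after_shell_cd = os.getcwd()

with tempfile.TemporaryDirectory() as scratch:
    # os.chdir changes the directory of this Python process itself
    os.chdir(scratch)
    after_chdir = os.path.realpath(os.getcwd())
    scratch_real = os.path.realpath(scratch)
    # go back before the scratch directory is removed
    os.chdir(start_dir)

print("shell cd stuck:", after_shell_cd != start_dir)
print("os.chdir stuck:", after_chdir == scratch_real)
```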

Problem 1: Creating a bare-bones repo and getting it on Github

Here we'll create a simple (public) code repository with a minimal set of content, and publish it in github.

1a: Create a basic repository locally

Start by creating the simplest possible code repository, composed of a single code file. Create a directory (or use the one from 0c), and place a code.py file in it, with a bit of Python code of your choosing. (Bonus points for witty or sarcastic code...) You could even use non-Python code if you desired, although Problems 3 & 4 feature Python-specific bits so I wouldn't recommend it.

To make the file from the notebook, the %%file <filename> magic is a convenient way to write the contents of a notebook cell to a file.


In [ ]:
!mkdir #complete only if you didn't do 0c, or want a different name for your code directory

In [ ]:
%%file <yourdirectory>/code.py

def do_something():
    # complete
    print(something)# this will make it much easier in future problems to see that something is actually happening

If you want to test-run your code:


In [ ]:
%run <yourdirectory>/code.py # complete
do_something()

1b: Convert the directory into a git repo

Make that code into a git repository by doing git init in the directory you created, then git add and git commit.


In [ ]:
%cd # complete

In [ ]:
!git init

In [ ]:
!git add code.py
!git commit -m #complete

1c: Create a repository for your code in Github

Go to github's web site in your web browser. If you do not have a github account, you'll need to create one (follow the prompts on the github site).

Once you've got an account, you'll need to make sure your git client can authenticate with github. If you're using a GUI, you'll have to figure it out (usually it's pretty easy). On the command line you have two options:

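  • The simplest way is to connect to github using HTTPS. This requires no initial setup, but git will prompt you for your github username and password every so often. (Github no longer accepts your account password for this: when prompted for a password, supply a personal access token generated in your github settings.)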
  • If you find that annoying (I do...), you can set up your system to use SSH to talk to github. Look for the "SSH and GPG keys" section of your settings on github's site, or if you're not sure how to work with SSH keys, check out github's help on the subject.

Once you've got github set up to talk to your computer, you'll need to create a new repository for the code you created. Hit the "+" in the upper-right, create a "new repository" and fill out the appropriate details (don't create a README just yet).

To stay sane, I recommend using the same name for your repository as the local directory name you used... But that is not a requirement, just a recommendation.

Once you've created the repository, connect your local repository to github and push your changes up to github.


In [ ]:
!git remote add <yourgithubusername> <the url github shows you on the repo web page> #complete

In [ ]:
!git push <yourgithubusername> master -u

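The -u is a convenience that means from then on you can use just git push and git pull to send your code to and from github. (If your git names its initial branch main rather than master, use that name in the push command instead.)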

1d: Modify the code and send it back up to github

We'll discuss proper documentation later. But for now make sure to add a README to your code repository. Always add a README with basic documentation. Always. Even if only you are going to use this code, trust me, future you will be very happy you did it.

You can just call it README, but to have it rendered nicely on the github repository page, you can call it README.md and write it using markdown syntax, README.rst in ReST (if you know what that is), or use one of the various other similar markup languages github understands. If you don't know/care, just use README.md, as that's pretty standard at this point.


In [ ]:
%%file README.md

# complete

Don't forget to add and commit via git and push up to github...


In [ ]:
!git #complete

1e: Choose a License

I bet you didn't expect to be reading legalese today... but it turns out this is important. If you do not explicitly license your code, in most countries (including the US and EU) it is technically illegal for anyone to use your code for any purpose other than just looking at it.

(Un?)Fortunately, there are a lot of possible open source licenses out there. Assuming you want an open license, the best resource is the "Choose a License" website. Have a look over the options there and decide which you think is appropriate for your code.

Once you've chosen a License, grab a copy of the license text, and place it in your repository as a file called LICENSE (or LICENSE.md or the like). Some licenses might also suggest you place the license text or just a copyright notice in the source code as well, but that's up to you.

Once you've done that, do as we've done before: push all your additions up to github. If you've done it right, github will automatically figure out your license and show it in the upper-right corner of your repo's github page.


In [ ]:
!git #complete

Problem 2: Collaborating with others' repos

There's not much point in having open source code if no one else can look at it or use it. So now we'll have you try modifying your neighbor's project using github's Pull Request feature.

2a: Get (git?) your neighbor's code repo

Find someone sitting near you who has gotten through Problem 1. Ask them their github user name and the name of their repository.

Once you've got the name of their repo, navigate to it on github. The URL pattern is always "https://www.github.com/theirusername/reponame". Use the github interface to "fork" that repo, yielding a "yourusername/reponame" repository. Go to that one, take note of the URL needed to clone it (you'll need to grab it from the repo web page, either in "HTTPS" or "SSH" form, depending on your choice in 1c). Then clone that onto your local machine.


In [ ]:
# Don't forget to do this cd or something like it... otherwise you'll clone *inside* your repo
%cd -0

!git clone <url from github>#complete 
%cd <reponame>#complete
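
As a rough guide to what the "HTTPS" and "SSH" forms mentioned in 2a look like, the sketch below builds both from a username and repository name. The helper clone_urls is made up for illustration and the names are placeholders. Always copy the real URL from the repo's web page rather than relying on this.

```python
def clone_urls(username, reponame):
    """Return the usual HTTPS and SSH clone URLs for a github repo."""
    return {
        "https": "https://github.com/{0}/{1}.git".format(username, reponame),
        "ssh": "git@github.com:{0}/{1}.git".format(username, reponame),
    }


# "yourusername" and "reponame" are placeholders, as in the text above
urls = clone_urls("yourusername", "reponame")
for kind in sorted(urls):
    print(kind, urls[kind])
```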

2b: Create a branch for your change

You're going to make some changes to their code, but who knows... maybe they'll spend so long reviewing it that you'll want to start on another change in the meantime. So it's always best to make each change in a specific "branch" for that change. To do this we need to make a new git branch.


In [ ]:
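# "checkout -b" creates the branch *and* switches to it, so the commits below land on it
# (a plain "git branch" would create the branch but leave you on master)
!git checkout -b <name-of-branch>#complete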

2c: modify the code

Make some change to their code repo. Usually this would be a new feature or a bug fix or documentation clarification or the like... But it's up to you.

Once you've done that, be sure to commit the change locally.


In [ ]:
!git add <files modified>#complete
!git commit -m ""#complete

and push it up (to a branch on your github fork).


In [ ]:
!git push origin <name-of-branch>#complete

2d: Issue a pull request

Now use the github interface to create a new "pull request". If you time it right, once you've pushed your new branch up, you'll see a prompt to do this automatically appear on your fork's web page. But if you don't, use the "branches" drop-down to navigate to the new branch, and then hit the "pull request" button. That should show you an interface that you can use to leave a title and description (in github markdown), and then submit the PR. Go ahead and do this.

2e: Have them review the PR

Tell your neighbor that you've issued the PR. They should be able to go to their repo, and see that a new pull request has been created. There they'll review the PR, possibly leaving comments for you to change. If so, go to 2f, but if not, they should hit the "Merge" button, and you can jump to 2g.

2f: (If necessary) make changes and update the code

If they left you some comments that require changing prior to merging, you'll need to make those changes in your local copy, commit those changes, and then push them up to your branch on your fork.


In [ ]:
!git #complete

Hopefully they are now satisfied and are willing to hit the merge button.

2g: Get the updated version

Now you should get the up-to-date version from the original owner of the repo, because that way you'll have both your changes and any other changes they might have made in the meantime. To do this you'll need to connect your local copy to your neighbor's github repo (not your fork).


In [ ]:
!git remote add <neighbors-username> <url-from-neighbors-github-repo> #complete
!git fetch <neighbors-username> #complete
!git branch --set-upstream-to=<neighbors-username>/master master
!git checkout master
!git pull

Now if you look at the local repo, it should include your changes.

Suggestion: To stay sane, you might change the "origin" remote to your username. E.g. git remote rename origin <yourusername>. To go further, you might even delete your fork's master branch, so that only your neighbor's master exists. That might save you headaches in the long run if you were to ever access this repo again in the future.

2h: Have them reciprocate

Science (Data or otherwise) and open source code are social enterprises built on shared effort, mutual respect, and trust. So ask them to issue a PR against your code, too. The more we can stand on each others' shoulders, the farther we will all see.

Hint: Ask them nicely. Maybe offer a cookie or something?

Problem 3: Setting up a bare-bones Python Package

Up to this point we've been working on the simplest possible shared code: a single file with all the content. But for most substantial use cases this isn't going to cut it. After all, Python was designed around the idea of namespaces that let you hide away or show code to make writing, maintaining, and versioning code much easier. But to make use of these, we need to deploy the installation tools that Python provides. This is typically called "packaging". In this problem we will take the code you just made and build it into a proper python package that can be installed and then used anywhere.

For more background and detail (and the most up-to-date recommendations) see the Python Packaging User Guide.

3a: Set up a Python package structure for your code

First we adjust the structure of your code from Problem 1 to allow it to live in a package structure rather than as a stand-alone .py file. All you need to do is create a directory, move the code.py file into that directory, and add a file (can be empty) called __init__.py into the directory.

You'll have to pick a name for the package, which is usually the same as the repo name (although that's not strictly required).

Hint: don't forget to switch back to your code repo directory, if you are doing this immediately after Problem 2.


In [ ]:
!mkdir <yourpkgname>#complete
!git mv code.py <yourpkgname>#complete

In [ ]:
#The "touch" unix command simply creates an empty file if there isn't one already.  
#You could also use an editor to create an empty file if you prefer.

!touch <yourpkgname>/__init__.py#complete

3b: Test your package

You should now be able to import your package and the code inside it as though it were some installed package like numpy, astropy, pandas, etc.


In [ ]:
from <yourpkgname> import code#complete

#if your code.py has a function called `do_something` as in the example above, you can now run it like:
code.do_something()

3c: Apply packaging tricks

One of the nice things about packages is that they let you hide the implementation of some part of your code in one place while exposing a "cleaner" namespace to the users of your package. To see a (trivial) example of this, let's pull a function from your code.py into the base namespace of the package. In the cell below, make the __init__.py have one line: from .code import do_something. That places the do_something() function into the package's root namespace.


In [ ]:
%%file <yourpkgname>/__init__.py

#complete

Now the following should work.


In [ ]:
import <yourpkgname>#complete
<yourpkgname>.do_something()#complete

BUT you will probably get an error here. That's because Python is smart about imports: once it has imported a package, it won't re-import it later, so it is still using the old, empty __init__.py. Usually that saves time, but here it's a hassle. Fortunately, we can use the reload function to get around this:


In [ ]:
from importlib import reload  #not necessary on Py 2.x, where reload() is built-in

reload(<yourpkgname>)#complete
<yourpkgname>.do_something()#complete
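
The whole 3a-3c sequence, including the import-caching surprise, can be replayed in a scratch directory. This is a sketch under a few assumptions: the package name demo_pkg_sketch is hypothetical, the package is made importable by adding its parent directory to sys.path (the notebook gets this for free from the current directory), and importlib.reload is the Python 3 spelling.

```python
import importlib
import os
import sys
import tempfile

# avoid stale bytecode files while we rewrite __init__.py below
sys.dont_write_bytecode = True

PKG = "demo_pkg_sketch"  # hypothetical package name

with tempfile.TemporaryDirectory() as scratch:
    pkg_dir = os.path.join(scratch, PKG)
    os.mkdir(pkg_dir)

    # 3a: a code.py plus an (initially empty) __init__.py
    with open(os.path.join(pkg_dir, "code.py"), "w") as f:
        f.write("def do_something():\n    return 'something'\n")
    init_path = os.path.join(pkg_dir, "__init__.py")
    open(init_path, "w").close()

    # 3b: make the parent directory importable and import the package
    sys.path.insert(0, scratch)
    importlib.invalidate_caches()
    pkg = importlib.import_module(PKG)
    before = hasattr(pkg, "do_something")

    # 3c: expose the function in the package's root namespace
    with open(init_path, "w") as f:
        f.write("from .code import do_something\n")

    # importing again just returns the cached module object...
    still_cached = hasattr(importlib.import_module(PKG), "do_something")

    # ...until the package is explicitly reloaded
    importlib.reload(pkg)
    after = hasattr(pkg, "do_something")
    result = pkg.do_something()

    sys.path.remove(scratch)

print(before, still_cached, after, result)
```

The three booleans show the function missing after the first import, still missing after a repeated import, and present once the package has been reloaded.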

3d: Create a setup.py file

Ok, that's great in a pinch, but what if you want your package to be available from other directories? If you open a new terminal somewhere else and try to import <yourpkgname> you'll see that it will fail, because Python doesn't know where to find your package. Fortunately, Python (both the language and the larger ecosystem) provides built-in tools to install packages. These are built around creating a setup.py script that controls installation of a Python package into a shared location on your machine. Essentially all Python packages are installed this way, even if it happens silently behind-the-scenes.

Below is a template bare-bones setup.py file. Fill it in with the relevant details for your package.


In [ ]:
%%file setup.py
#!/usr/bin/env python

from setuptools import setup  # distutils was removed from the standard library in Python 3.12

setup(name='<yourpkgname>',
      version='0.1dev',
      description='<a description>',
      author='<your name>',
      author_email='<youremail>',
      packages=['<yourpkgname>'],
     ) #complete

3e: Build the package

Now you should be able to "build" the package. In complex packages this will include more complicated steps like linking against C or FORTRAN code, but for pure-python packages like yours, it simply involves filtering out some extraneous files and copying the essential pieces into a build directory.


In [ ]:
!python setup.py build

To test that it built successfully, the easiest thing to do is cd into the build/lib.X-Y-Z directory ("X-Y-Z" here is OS and machine-specific; for a pure-Python package the directory may simply be build/lib). Then you should be able to import <yourpkgname>. It's usually best to do this as a completely independent process in python. That way you can be sure you aren't accidentally using an old import as we saw above.


In [ ]:
%%sh

cd build/lib.X-Y-Z #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete
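
The "completely independent process" check from 3e can also be scripted from Python. The helper below, imports_cleanly, is a made-up name for this sketch. It starts a brand-new interpreter with subprocess, so nothing already imported in the notebook can interfere. The standard-library json module is used purely as a stand-in for your package.

```python
import subprocess
import sys


def imports_cleanly(module_name, cwd=None):
    """Return True if `module_name` imports in a brand-new interpreter."""
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=cwd,  # directory to run in, e.g. your build directory
    )
    return proc.returncode == 0


# `json` is only a stand-in; swap in your own package name
ok = imports_cleanly("json")
# a name that should not be installed anywhere, to show the failure case
missing = imports_cleanly("surely_not_an_installed_package_xyz")
print(ok, missing)
```

Passing cwd=<your build directory> plays the role of the cd in the cell above.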

3f: Install the package

Alright, now that it looks like it's all working as expected, we can install the package. Note that if we do this willy-nilly, we'll end up with lots of packages, perhaps with the wrong versions, and it's easy to get confused about what's installed (there's no reliable uninstall command...) So before installing we first create a virtual environment using Anaconda, and install into that. If you don't have anaconda or a similar virtual environment scheme, you can just do python setup.py install. But just remember that this will be difficult to back out (hence the reason for Python environments in the first place!)


In [ ]:
%%sh

conda create -n test_<yourpkgname> anaconda #complete
source activate test_<yourpkgname> #complete
python setup.py install

Now we can try running the package from anywhere (not just the source code directory), as long as we're in the same environment that we installed the package in.


In [ ]:
%%sh

cd $HOME
source activate test_<yourpkgname> #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete
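
If you are not using Anaconda, the built-in venv module mentioned at the top of this notebook can fill the same role. The sketch below only creates a throwaway environment and checks that it looks sane. It is not a drop-in replacement for the conda cells above, and the environment name test_env is a placeholder.

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as scratch:
    env_dir = os.path.join(scratch, "test_env")  # placeholder name
    # with_pip=False keeps the sketch fast; use with_pip=True for real work
    venv.create(env_dir, with_pip=False)
    # every venv has a pyvenv.cfg file at its top level
    has_cfg = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    # executables live in bin/ on unix-like systems, Scripts/ on Windows
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    has_bin = os.path.isdir(os.path.join(env_dir, bin_dir))

print(has_cfg, has_bin)
```

For real use you would create the environment somewhere permanent and run the python found in its bin (or Scripts) directory to install your package.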

3g: Update the package on github

OK, it's now installable. You'll now want to make sure to update the github version to reflect these improvements. You'll need to add and commit all the files. You'll also want to update the README to instruct users that they should use python setup.py install to install the package.


In [ ]:
!git #complete

Problem 4: Publishing your package on (fake) PyPI

Now your package can be installed by anyone who comes across it on github. But it tends to scare some people that they need to download the source code and know git to use your code. The Python Package Index (PyPI), combined with the pip tool (now standard in Python), provides a much simpler way to distribute code. Here we will publish your code to a testing version of PyPI.

4a: Create a PyPI account

First you'll need an account on PyPI to register new packages. Go to the testing PyPI, and register. You'll also need to supply your login details in the .pypirc file in your home directory as shown below. (If it were the real PyPI you'd want to be more secure and not have your password in plain text. But for the testing server that's not really an issue.)

Note that if you've ever done something like this before and hence already have a .pypirc file, you might get unexpected results if you run this without moving/renaming the old version temporarily.


In [ ]:
%%file -a ~/.pypirc

[distutils]
index-servers = pypi

[pypi]
repository = https://test.pypi.org/legacy/
username = <your user name goes here>
password = <your password goes here>
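
Because .pypirc uses INI syntax, its layout can be sanity-checked with the standard configparser module before any upload is attempted. This is a sketch on an in-memory copy with placeholder credentials, and check_pypirc is a name invented for this example. To inspect the real file you would read ~/.pypirc instead.

```python
import configparser

# Placeholder credentials; the layout mirrors the cell above
PYPIRC_TEXT = """
[distutils]
index-servers = pypi

[pypi]
repository = https://test.pypi.org/legacy/
username = example-user
password = example-password
"""


def check_pypirc(text):
    """Map each server listed under [distutils] to its repository URL."""
    # interpolation=None so a '%' in a password is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository") for name in servers}


repos = check_pypirc(PYPIRC_TEXT)
print(repos)
```

A server named in index-servers without a matching section raises configparser.NoSectionError, which is a quick way to catch a typo.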

4b: Build a "source" version of your package

Use your setup.py to create the source distribution of your package.

Hint: You'll want to make sure your package version is something you want to release before executing the upload command. Released versions can't be duplicates of existing versions, and shouldn't end in "dev" or "b" or the like.


In [ ]:
!python setup.py sdist

Verify that there is a <yourpkg>-<version>.tar.gz file in the dist directory. It should have all of the source code necessary for your package.
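
One way to check what actually went into the archive is to list its members with the standard tarfile module. The sketch below builds a tiny stand-in archive so that it runs anywhere. All file names in it are hypothetical, and list_sdist is a helper invented for this example. Point it at your real file in dist to inspect your own package.

```python
import io
import os
import tarfile
import tempfile


def list_sdist(path):
    """Return the sorted member names of a .tar.gz source distribution."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(archive.getnames())


# hypothetical contents of a minimal source distribution
FAKE_FILES = (
    "demo_pkg-0.1/setup.py",
    "demo_pkg-0.1/demo_pkg/__init__.py",
    "demo_pkg-0.1/demo_pkg/code.py",
)

with tempfile.TemporaryDirectory() as scratch:
    sdist_path = os.path.join(scratch, "demo_pkg-0.1.tar.gz")
    with tarfile.open(sdist_path, "w:gz") as archive:
        for name in FAKE_FILES:
            payload = b"# placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    members = list_sdist(sdist_path)

for name in members:
    print(name)
```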

4c: Upload your package to PyPI

Once you have an account on PyPI (or testPyPI in our case) you can upload your distributions to PyPI using twine. If this is your first time uploading a distribution for a new project, twine will handle registering the project automatically, filling out the details you provided in your setup.py.


In [ ]:
!twine upload dist/<yourpkg>-<version>.tar.gz

If for some reason this fails (which does happen for unclear reasons on occasion), you can usually just directly upload the .tar.gz file from the web interface without too much trouble.

4d: Install your package with pip

The pip tool is a convenient way to install packages on PyPI. Again, we use Anaconda to create a testing environment to make sure everything worked correctly.

(Normally the -i wouldn't be necessary - we're using it here only because we're using the "testing" PyPI)


In [ ]:
%%sh

conda create -n test_pypi_<yourpkgname> anaconda #complete
source activate test_pypi_<yourpkgname> #complete
pip install -i https://test.pypi.org/simple/ <yourpkgname>

In [ ]:
%%sh

cd $HOME
source activate test_pypi_<yourpkgname> #complete
python -c "import <yourpkgname>;<yourpkgname>.do_something()" #complete

4e: have your neighbor try to install your package

Ask your neighbor to try to install your package just like you did above. Hopefully they'll get it to work right out of the box.

Hint: Don't forget to be nice to them! Always be nice to your users - it makes them want to be nice to you by contributing improvements or citations... Also, it's just good to be nice, period, dontcha think?

Challenge Problem: Use the Astropy package template

The above is all based on the assumption of a bare-bones package. In practice there's a lot of stuff you can add to a package that's convenient, but requires a variety of boilerplate setup or knowledge about tricks to get it all to work together. Astropy has created a "package template" that's meant to reduce the burden of this by providing a package without actual code but lots of "batteries included". Then you simply need to fill in your code and use the tools, instead of needing to set them up.

C1: Use the package template to package your already-made code

Try setting up a package using the astropy package template. Go to the Astropy affiliated package site, and follow the instructions at the bottom leading you to the package template and how to use it. Populate it with your code from above.

C2: Use the package template to compile the built-in Cython examples

For an extra challenge (and to see one of the reasons why it's useful), see if you can make the Cython example code work. Cython is a tool that lets you compile Python-like code into C, which can be orders-of-magnitude faster depending on how you design the code. It can be tricky to package correctly, though, and the affiliated package template gets rid of a lot of that pain.

The template comes with a simple example of a Cython code. Try to get it compiled and running.

C3: Use the package template to write your own Cython code

The sky's the limit here. Can you make some Cython code go faster than the numpy equivalent? Experiment as you wish, but revel in not needing to understand how to invoke Cython.