Using Python for your work: Best practices

Since we didn't do any larger projects, I want to go over some tips for managing them. These apply to anything that you want to keep around forever, not to the times when you just want to mess around with Python. They are aimed at one-off projects like analyzing some data or running an experiment, not at making a library (i.e., code that you will rely on for many other projects). I have different recommendations for making a library and will be happy to share some of what I've learned if you ever go down that route. But for now, one-off projects.

There are really two goals here. You want to be confident that if you want to come back to your project in 10 years, for whatever reason, you will still have access to it. And more difficult, you want to be confident that you will still be able to get it to work, and to have it work exactly like it did originally, even if you're on a completely different computer.

Version control: Use it

It doesn't need to be more than a better Dropbox to be useful to you. If you're working by yourself, all you'll be doing is hg commit --message 'MESSAGE' and, at the end of the day, hg push. To work on a different machine, run hg clone URL once, then hg pull --update whenever you want the latest changes. It's probably even easier with TortoiseHg.

Upload your project to Bitbucket. Create a repo on the website and it will walk you through linking it to the copy on your computer.

(Bonus goal: In 10 years, you might be able to use the commit history to help remember why you made a certain change.)

Conda environments: Make them, throw them away, make new ones

In this workshop we only used one conda environment (I had you call it python3) so the utility of this might not be so clear. But making new environments is really important. My advice is to have one, or a few, environments for messing around in (i.e., python3). Then, create a new environment for every single new project that you want to be able to reliably reproduce results from. This might seem like a lot, but conda environments are cheap and use very little extra space (libraries are linked, not copied).

One thing to note is that conda is technically Python-agnostic: it can make any kind of environment, so you have to tell it explicitly if you want Python in yours. To create a new environment, do something like

conda create -n project_name python=3 pip

This will get you an environment with Python and pip, so you'll be able to install packages that aren't in conda.

Also remember that environments are just folders, in the envs subfolder of your Anaconda or Miniconda directory. So if something is going wrong with your environment and you can't figure it out, you might as well try making a new one and re-installing its packages. It won't take more than a few seconds (except for pip libraries) and it might fix your problem. Then you can delete the folder with your broken environment (or run conda env remove -n NAME). If you want the old name back, re-create the environment under that name rather than renaming the folder, because scripts installed in an environment can contain its full path.
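
To see the "environments are just folders" point for yourself, a small sketch like the one below reports which folder the running interpreter lives in. It assumes conda's usual behavior of setting the CONDA_DEFAULT_ENV variable when an environment is activated; the function name describe_environment is made up for illustration.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter that is running this code."""
    return {
        # Folder holding this Python installation; inside an activated conda
        # environment this is normally the environment's own folder.
        "prefix": sys.prefix,
        # Name conda exports on activation (assumption: None when conda is
        # not in use or no environment is activated).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it once from your messing-around environment and once from a project environment; the prefix line should point at two different folders.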

Keep a list of the packages you use in your project

This way, not only will you be able to make things work in 10 years, but others will be able to reproduce your results. This is a distinct advantage of open source: even after all these scientific Python packages are no longer being maintained, you should still be able to find their old versions on the internet. Whereas if MathWorks ever goes out of business and takes down the File Exchange, good luck finding all those packages you downloaded on your old computer, or for that matter the old versions of MATLAB that you might need to make things work.

I keep two lists: packages I can install with conda, and those I have to use pip for. The files list one package per line, and can specify versions as well as package names, like this:

pandas=0.14.1

(The pip format uses two equals signs: pandas==0.14.1.) Version-control these files!
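
For a simple pin, the two formats differ only in the number of equals signs, which a few lines of Python can illustrate. This is a minimal sketch, not a full parser: the helper name conda_pin_to_pip is hypothetical, it leaves range specifiers such as >= alone, and it assumes the package has the same name in conda and on PyPI, which is not always true.

```python
def conda_pin_to_pip(line):
    """Convert 'name=version' (conda style) to 'name==version' (pip style).

    Blank lines, '#' comments, bare package names, and lines that already
    use '==' or a range operator are returned unchanged (stripped).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped
    if any(op in stripped for op in ("==", "<", ">", "!", "~")):
        return stripped
    parts = stripped.split("=")
    if len(parts) == 1:
        return stripped  # bare package name, no version pinned
    # parts[2], if present, is a conda build string, which pip has no use for.
    name, version = parts[0], parts[1]
    return f"{name}=={version}"
```

So conda_pin_to_pip("pandas=0.14.1") gives "pandas==0.14.1".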

If you're catching up with this, you can use conda list and pip list to see what you've already installed. pip freeze prints the same information already in the name==version format.
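
If you would rather collect the list from inside Python, the standard library can do something similar. The sketch below assumes Python 3.8 or later (for importlib.metadata); installed_pins is a name invented here. It reports every distribution the interpreter can see, including ones conda installed, so you would still sort the output into your two files by hand.

```python
from importlib import metadata


def installed_pins():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip the rare distribution with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(installed_pins()))
```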

Later, if you're reinstalling your environment you can easily get back to your previous state:

conda install --file conda-packages.txt
pip install -r pip-packages.txt
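
After reinstalling, it can be reassuring to confirm that what you got matches what you pinned. The following is a rough sketch under a few assumptions: it only understands name==version lines (the pip format), it needs Python 3.8 or later, and check_pins is a hypothetical helper, not part of pip or conda.

```python
from importlib import metadata


def check_pins(lines, get_version=metadata.version):
    """Compare 'name==version' lines against what is installed.

    Returns a list of (name, wanted, found) tuples, one per mismatch;
    found is None when the package is not installed at all.
    """
    problems = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned names
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems.append((name, wanted, found))
    return problems
```

For example, check_pins(open("pip-packages.txt")) returns an empty list when every pinned package is installed at the pinned version. The get_version argument exists only so the function can be tried out without touching a real environment.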