A large majority of published results today are not reproducible. Two of my colleagues are preparing a longer course on the subject.
https://bitbucket.org/scilifelab-lts/reproducible_research_example
From a narrower data science point of view, DevOps is less about automating the production of software. For a researcher, the main interests are research reproducibility, data management, and backup.
Editors for Python range from any simple raw text editor to the most complex IDE (integrated development environment).
In the first category I recommend Notepad and Notepad++ for Windows, Emacs for macOS and Linux, and nano, vim, or geany for Linux.
Among IDEs, Spyder is a simpler option with an interface similar to Matlab and native integration of the IPython interpreter; we will use it for this class. A much more complex favorite of mine is PyCharm from JetBrains, which has a community edition. The one I use most often is Atom, built by GitHub.
What matters:
Task:
Create a 'src' folder inside your working directory. Use a plain text editor to write a hello world program inside it and run it from the command line. Now open the same file in your favorite editor and run it in the interpreter embedded in it.
Now write a function called hello_world() and load it here by importing the module that contains it.
In [1]:
import sys
sys.path
Out[1]:
In [2]:
sys.path.append("/custom/path")
In [3]:
sys.path
Out[3]:
In [4]:
import myfancymodule
myfancymodule.hello_world()
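As a minimal sketch of what the cells above do, the following script writes a small module, adds its folder to sys.path, and imports it. The temporary folder and the choice to return the greeting rather than print it are assumptions made for illustration; only the names myfancymodule and hello_world() come from the cells above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption: a throwaway 'src' folder stands in for the one in your working directory.
src_dir = Path(tempfile.mkdtemp()) / "src"
src_dir.mkdir()

# Write the module file, as you would in a text editor.
(src_dir / "myfancymodule.py").write_text(
    "def hello_world():\n"
    '    """Return a greeting so that callers can print or test it."""\n'
    '    return "Hello, world!"\n'
)

# Make the folder visible to the import system, then import by module name.
if str(src_dir) not in sys.path:
    sys.path.append(str(src_dir))
importlib.invalidate_caches()

import myfancymodule

greeting = myfancymodule.hello_world()
print(greeting)
```

Appending to sys.path only affects the running interpreter, so the path has to be added again in every new session.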
Discussion:
Let us now add the sourced code to our own git repositories!
git init
git status
stage: Now make a change to your source code and run git status again. To tell Git to start tracking changes made to your file, we first need to add it to the staging area by using git add.
git add your_file
# git add .
# git log
# git reset your_file
git status
commit, checkout: Notice how Git says "Changes to be committed"? The files listed there are in the staging area, and they are not in the repository yet. We can add or remove files from the stage before we store them in the repository. To store our staged changes we run the commit command with a message describing what we've changed. A file can be changed back to how it was at the last commit with git checkout -- your_file (commented out below).
git commit -m "I modified the hello function"
# git checkout -- your_file
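The stage and commit cycle can also be traced from Python. This is only a sketch under stated assumptions: it drives the git command line through subprocess in a throwaway repository, the file name hello.py and the Example identity are hypothetical, and the git() helper is defined here, not part of any library. It returns None if git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside cwd and return its standard output."""
    cmd = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True).stdout


def stage_and_commit_demo():
    """Record the short status of a file before add, after add, and after commit."""
    if shutil.which("git") is None:
        return None  # git is not available on this machine
    repo = Path(tempfile.mkdtemp())  # throwaway repository
    git("init", cwd=repo)
    (repo / "hello.py").write_text('print("Hello, world!")\n')

    untracked = git("status", "--porcelain", cwd=repo)  # file not tracked yet
    git("add", "hello.py", cwd=repo)
    staged = git("status", "--porcelain", cwd=repo)     # file in the staging area
    git("commit", "-m", "Add hello world script", cwd=repo)
    clean = git("status", "--porcelain", cwd=repo)      # nothing left to commit
    return {"untracked": untracked, "staged": staged, "clean": clean}


states = stage_and_commit_demo()
print(states)
```

The --porcelain option gives a compact status: a line starting with ?? marks an untracked file, a line starting with A marks a newly staged file, and empty output means the working tree matches the last commit.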
push, origin, master: To push a local repo to the GitHub server we first need to add a remote repository. The git remote add command takes a remote name and a repository URL. The push command tells Git where to put our commits when we're ready, and now we're ready. So let's push our local changes to our origin repo (on GitHub).
The name of our remote is "origin" and the default local branch name is "master". The -u tells Git to remember the parameters, so that next time we can simply run git push and Git will know what to do. Go ahead and push it!
git remote add origin https://github.com/urreponame/urreponame.git
git push -u origin master
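To try the remote and push steps without a GitHub account, a local bare repository can play the role of origin. The sketch below assumes exactly that substitution; the folder names, the file hello.py, and the git() helper are hypothetical, and the function returns None if git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside cwd and return its standard output."""
    cmd = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True).stdout


def push_demo():
    """Push one commit to a local bare repository that stands in for GitHub."""
    if shutil.which("git") is None:
        return None  # git is not available on this machine
    base = Path(tempfile.mkdtemp())
    remote = base / "origin.git"  # assumption: local stand-in for the GitHub repo
    local = base / "work"
    remote.mkdir()
    local.mkdir()

    git("init", "--bare", cwd=remote)
    git("init", cwd=local)
    # Name the initial branch master to match the text, whatever the local default is.
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=local)
    (local / "hello.py").write_text('print("Hello, world!")\n')
    git("add", "hello.py", cwd=local)
    git("commit", "-m", "Add hello world script", cwd=local)

    git("remote", "add", "origin", str(remote), cwd=local)
    git("push", "-u", "origin", "master", cwd=local)

    return {
        # Branch that a plain 'git push' will now target.
        "upstream": git("rev-parse", "--abbrev-ref", "master@{upstream}", cwd=local).strip(),
        "local_head": git("rev-parse", "master", cwd=local).strip(),
        "remote_head": git("rev-parse", "master", cwd=remote).strip(),
    }


pushed = push_demo()
print(pushed)
```

After the push, the upstream entry shows what -u recorded, and the two commit hashes should be identical because the remote now holds the same commit as the local branch.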
pull: Let's pretend some time has passed. We've invited other people to our GitHub project, and they have pulled our changes, made their own commits, and pushed them. We can check for changes on our GitHub repository and pull down any new ones by running pull. Let's then look at what differs from our last commit by using the git diff command. In this case we want the diff against our most recent commit, which we can refer to using the HEAD pointer. diff can also show the changes in newly staged files, using the --staged option.
git pull origin master
git diff HEAD
# git diff --staged
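The difference between git diff HEAD and git diff --staged is easiest to see on a small example. The sketch below skips the pull and simply edits a committed file in a throwaway repository; the file name hello.py and the git() helper are hypothetical, and the function returns None if git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside cwd and return its standard output."""
    cmd = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True).stdout


def diff_demo():
    """Compare 'git diff HEAD' with 'git diff --staged' before and after 'git add'."""
    if shutil.which("git") is None:
        return None  # git is not available on this machine
    repo = Path(tempfile.mkdtemp())  # throwaway repository
    git("init", cwd=repo)
    script = repo / "hello.py"
    script.write_text('print("Hello, world!")\n')
    git("add", "hello.py", cwd=repo)
    git("commit", "-m", "Add hello world script", cwd=repo)

    # Edit the file without staging it.
    script.write_text('print("Hello again, world!")\n')
    vs_head = git("diff", "HEAD", cwd=repo)                 # working tree against last commit
    staged_before_add = git("diff", "--staged", cwd=repo)   # nothing staged yet

    git("add", "hello.py", cwd=repo)
    staged_after_add = git("diff", "--staged", cwd=repo)    # staged change against last commit
    return {
        "vs_head": vs_head,
        "staged_before_add": staged_before_add,
        "staged_after_add": staged_after_add,
    }


diffs = diff_demo()
print(diffs)
```

Here git diff HEAD reports the edit as soon as the file changes, while git diff --staged stays empty until the file has been added to the staging area.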