Organization -- Exercise 02

Creating a Project directory template (30 min)

In the lecture, we discussed how a good directory structure can greatly help with reproducibility. In this exercise we'll create a template structure that you can easily reuse in your future projects.

As already mentioned, there isn't one structure that would fit all workloads, so it is important to choose one that fits you.

Please read the entire document before you start working on the tasks.

Tasks

To complete this exercise, you need to do the following:

  • Decide on a directory structure that you'll be using for your projects
    • You can use one similar to those shown below
  • Create a new GitHub repository on your account to store this structure.
    • note: Give the repo a "descriptive" name, such as project-template
  • Add your chosen directory structure to your GitHub repo
  • Create a second repository by importing your project-template repository.
    • note: We will need this repository for the next parts of the workshop, so don't omit this step.
  • Clone the newly created repository to your Virtual Machine
Note: Your directory structure should contain at least a top level README
  • You will also need to include a README file in each subdirectory to be able to commit it, since Git does not track empty directories
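
As an illustration of the note above, here is a minimal Python sketch that creates a directory tree and places a README.md in every directory so that Git can track all of them. The function name `create_template`, the `SUBDIRS` list, and the placeholder README text are assumptions made for this example, not part of the exercise; swap in the structure you chose.

```python
from pathlib import Path

# Hypothetical layout, loosely based on the example structures in this
# document. Replace it with the structure you decided on.
SUBDIRS = ["code", "data/raw", "data/clean", "doc", "results/figures", "scratch"]


def create_template(root, subdirs=SUBDIRS):
    """Create the directories under root and add a README.md to each one."""
    root = Path(root)
    for sub in subdirs:
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Collect root plus every directory below it, skipping hidden
    # directories such as .git.
    directories = [root] + sorted(
        p for p in root.rglob("*")
        if p.is_dir()
        and not any(part.startswith(".") for part in p.relative_to(root).parts)
    )

    readmes = []
    for directory in directories:
        readme = directory / "README.md"
        if not readme.exists():
            # Placeholder text; existing READMEs are left untouched.
            readme.write_text(
                f"# {directory.name}\n\nDescribe what this directory contains.\n",
                encoding="utf-8",
            )
        readmes.append(readme)
    return readmes
```

Calling `create_template("project-template")` from the directory where you want the template to live would build the tree. The placeholder READMEs are only there to make the directories committable, so you would still fill them in by hand.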

End Results

After completing the exercise you should have:

  • A GitHub repository with a directory structure that you can import into new projects that you create
  • A project repository, based on the created template
  • A clone of the repository on the VM

Notes

Why READMEs

Every project should describe to users what the purpose of the project is. This is commonly done in a README file. As the starting point for a project, the README file is formatted as plain text (or Markdown) to make it easily readable. A README file should include the following information:

  • The project name
  • The date the README was created
  • Contact information for the person(s) who maintains the project
  • Three or four sentences about the goal of the project
  • If the project uses data from an external source, where the data is from

Think about the beginning of this lesson, when we had nothing but a file with a name. These are the things that would have made it easy to make sense of that data.

So, before we make any modifications to the raw data, we need a practice for how to record the initial state of the data, as well as our modifications.

Adding a Top Level README

  • Your top level README should contain the following:
    • Project name
    • Today's date
    • Maintainer's contact info
    • Data origin
    • 3-4 sentences about the goal of the project
  • Save as README in the project directory.

This file serves as the starting point for future you, or anyone who receives this data.
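
If you prefer to generate this file rather than type it, the fields listed above can be filled in with a short script. This is only a sketch: the function name `write_top_level_readme`, its parameters, and the exact field labels are assumptions chosen for this example.

```python
from datetime import date
from pathlib import Path


def write_top_level_readme(project_dir, name, maintainer, data_origin, goal):
    """Write a plain-text README holding the fields listed above."""
    text = (
        f"Project name: {name}\n"
        f"Date: {date.today().isoformat()}\n"  # today's date, e.g. 2024-01-15
        f"Maintainer: {maintainer}\n"
        f"Data origin: {data_origin}\n"
        "\n"
        f"{goal}\n"  # three or four sentences about the project goal
    )
    readme = Path(project_dir) / "README"
    readme.write_text(text, encoding="utf-8")
    return readme
```

Note that the sketch overwrites an existing README in `project_dir`, so it is meant for the moment a project is created, not for later updates.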

Adding a README in a Subdirectory

README files in subdirectories are a good idea too. Often there are many files, and it's distracting to fill the top-level README with details about smaller pieces of the project.

  • For raw data directories, you should include the location (e.g. URL) where the file was retrieved or generated.
  • For modified data directories, you should include the exact tools and steps used to modify the data, along with dates.
  • For other directories like code or documentation, the README should communicate what the directories contain.

Keeping the READMEs up-to-date

  • Including dates in file names is helpful here
  • When changing something in a directory, you should add a line to the README
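
One way to make that habit cheap is a small helper that appends a dated line to a directory's README. The sketch below assumes the README is named README.md and uses a hypothetical function name, `log_change`; neither is prescribed by the exercise.

```python
from datetime import date
from pathlib import Path


def log_change(directory, message, today=None):
    """Append a dated one-line note to the README.md in directory."""
    today = today or date.today()
    readme = Path(directory) / "README.md"

    # Make sure the new entry starts on its own line if the file
    # exists and does not already end with a newline.
    prefix = ""
    if readme.exists():
        existing = readme.read_text(encoding="utf-8")
        if existing and not existing.endswith("\n"):
            prefix = "\n"

    with readme.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}{today.isoformat()}: {message}\n")
    return readme
```

For example, `log_change("data/clean", "Removed duplicate rows with the cleaning script")` would add one line starting with today's date to data/clean/README.md.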

Self-documenting Projects

READMEs are commentary on what we consider the "real work", and realistically can be an afterthought. We've all had projects under a deadline or someone asking for a result, and the documentation step is easy to defer until later.

Later never comes, or we forget the details by the time it does. So another good practice is to use good, descriptive names for files, directories, and in code. These are for our benefit, not the computer's.

Possible Directory Structures

  1.  .
     ├── code
     │   └── README.md
     ├── data
     │   ├── clean
     │   │   └── README.md
     │   ├── raw
     │   │   └── README.md
     │   └── README.md
     ├── doc
     │   ├── paper
     │   │   └── README.md
     │   └── README.md
     ├── README.md
     ├── results
     │   ├── figures
     │   │   └── README.md
     │   ├── pictures
     │   │   └── README.md
     │   └── README.md
     └── scratch
         └── README.md
  2.  .
     ├── code
     │   ├── notebooks
     │   │   └── README.md
     │   ├── README.md
     │   └── scripts
     │       └── README.md
     ├── data
     │   ├── clean
     │   │   └── README.md
     │   ├── raw
     │   │   └── README.md
     │   └── README.md
     ├── output
     │   ├── figures
     │   │   └── README.md
     │   ├── manuscript
     │   │   └── README.md
     │   ├── pictures
     │   │   └── README.md
     │   ├── README.md
     │   └── tables
     │       └── README.md
     ├── README.md
     └── scratch
         └── README.md
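
Before committing a structure like the ones above, it can be worth checking that every directory really contains a README, since a directory without any files will not appear in the repository. The following Python sketch does that; the function name `dirs_missing_readme` is an assumption made for this example, and it treats any file whose name starts with "README" as sufficient.

```python
from pathlib import Path


def dirs_missing_readme(root):
    """Return the directories under root (root included) with no README file."""
    root = Path(root)
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]

    missing = []
    for directory in candidates:
        # Skip hidden directories such as .git and anything inside them.
        if any(part.startswith(".") for part in directory.relative_to(root).parts):
            continue
        has_readme = any(
            child.is_file() and child.name.startswith("README")
            for child in directory.iterdir()
        )
        if not has_readme:
            missing.append(directory)
    return sorted(missing)
```

Running `dirs_missing_readme(".")` from the top of your cloned template should return an empty list once every directory is documented.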