Pre-Class: Reproducible Computational Research

Part 1: Introduction

The idea behind reproducible computational research is that any computational result you generate (numbers, figures, tables, etc.) can be regenerated with minimal effort, both by other people and by yourself. The purpose of this lecture is to show you why reproducibility is necessary in computational research and how to make your own research reproducible.
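Here is a concrete illustration. It is a minimal sketch that assumes your analysis uses random numbers, as many simulations and statistical methods do. Fixing the random seed lets anyone rerun the code and regenerate exactly the same number. The function `simulate_mean` and the seed value `42` are hypothetical choices made only for this example.

```python
import random
import statistics


def simulate_mean(seed: int, n: int = 1000) -> float:
    """Hypothetical analysis: average n uniform random draws, using a fixed seed."""
    # A dedicated generator seeded explicitly, so the result does not depend
    # on any hidden global random state.
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n)]
    return statistics.mean(draws)


# Running the same analysis twice with the same seed regenerates the same result.
first_run = simulate_mean(seed=42)
second_run = simulate_mean(seed=42)
print(first_run == second_run)  # True
```

Without the fixed seed, each run would give a slightly different mean, so nobody could regenerate the exact number reported in a paper. That includes you, months later.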

Two papers discuss this idea. You can download both from CoCalc.

  1. Reproducible Research in Computational Science by Roger D. Peng
  2. Ten Simple Rules for Reproducible Computational Research by Sandve et al.

Question 1: Why is it important to make your research reproducible?


Question 2: Imagine that you have just finished a computational project and submitted the paper to a journal. What can you do to make sure that you and other people can reproduce your results? (Give at least two examples.)


What is a program?

A program is basically a set of instructions that tells the computer how to perform a task. You can think of a program a bit like a recipe: it gives step-by-step instructions on how to combine data and functions (ingredients) to achieve the desired end result (a tasty cake).
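The short Python sketch below makes the analogy concrete. The temperature readings are made-up numbers used only for illustration. The data are the ingredients, and each step transforms them until we reach the end result.

```python
# Ingredients: some hypothetical temperature readings in degrees Fahrenheit.
temperatures_f = [68.0, 72.5, 75.2]


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert one temperature from Fahrenheit to Celsius."""
    return (temp_f - 32) * 5 / 9


# Step 1: convert every reading to Celsius.
temperatures_c = [fahrenheit_to_celsius(t) for t in temperatures_f]

# Step 2: combine the converted readings into a single average.
mean_c = sum(temperatures_c) / len(temperatures_c)

# Step 3: report the end result (the "cake").
print(f"Mean temperature: {mean_c:.1f} C")  # Mean temperature: 22.2 C
```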

The words "program" and "script" can often be used interchangeably, though "script" is generally used to describe quicker, simpler programs.

What is a programming language?

A program must be written in a language the computer can understand -- but computers only understand binary (0's and 1's)! Writing code in binary is hard for humans and takes a very long time.
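To see what "0's and 1's" look like, the sketch below prints the 8-bit binary code that stores each character of a piece of text. The helper `to_binary` is written just for this illustration. Keep in mind that it shows how *data* is stored. The instructions a processor runs are also binary, but they are encoded differently.

```python
def to_binary(text: str) -> str:
    """Return the 8-bit binary code of each byte of `text`, separated by spaces."""
    # encode() turns the text into bytes; format(..., "08b") writes each byte
    # as exactly 8 binary digits.
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(to_binary("Hi"))  # 01001000 01101001
```

Even a two-letter word takes 16 binary digits. This is why writing whole programs directly in binary would be so slow.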

Programming languages were created to translate from a language that humans are comfortable with into something a computer can actually understand. They usually consist of common English words, such as "print", "and", "or", and "if", which humans can understand very easily. Under the hood, that code is then translated into the corresponding strings of 0's and 1's by an "interpreter" or a "compiler", depending on the language. Roughly speaking, a compiler translates the whole program ahead of time, while an interpreter translates and executes it as it runs. This translated code is what the computer actually runs.
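You can peek at this translation step from within Python itself. In CPython, the standard Python implementation, source code is first compiled into *bytecode*, a lower-level set of instructions that the interpreter then executes. Bytecode is not the processor's native machine code, but it is a real intermediate translation. The standard `dis` module can display it. The exact listing varies between Python versions, so no expected output is shown.

```python
import dis

source = 'print("hello")'

# compile() translates the human-readable source into a code object.
code_object = compile(source, "<example>", "exec")

# dis.dis() lists the bytecode instructions in a human-readable form.
dis.dis(code_object)

# The raw bytecode is just a sequence of bytes; show the first few as 0's and 1's.
raw_bytes = code_object.co_code[:4]
print(" ".join(format(byte, "08b") for byte in raw_bytes))
```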

There is a huge variety of programming languages out there. They all ultimately do the same thing -- give humans an easier way of communicating with the computer -- but the exact "words" and "grammar" each language allows can differ quite widely. In this course we will focus on R and Python.
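Here is a small illustration of how the "words" and "grammar" differ. It shows the same tiny calculation in Python, with the R equivalent of each step in the comments. The numbers are arbitrary.

```python
import statistics

# R:  x <- c(4, 8, 15, 16)
# Python writes a list with square brackets and assigns with "=".
x = [4, 8, 15, 16]

# R:  m <- mean(x)
# In Python, mean() lives in the standard-library statistics module.
m = statistics.mean(x)

# R:  if (m > 10) { print("large") } else { print("small") }
# Python marks blocks with a colon and indentation instead of braces.
if m > 10:
    print("large")  # runs here, since m is 10.75
else:
    print("small")
```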

Other resources

There are several great courses on Coursera that teach reproducible research. If you are interested in this topic, you can go to this course.