Procedural Python for Reproducibility

The goal today is to take some of the ideas we developed last week and do a couple of things to make our lives easier:

  • Define standalone functions which accomplish tasks that we would like to do repeatedly
  • Put these functions in a place where we can easily use them without copy-pasting repeatedly
  • Think about how to make our analysis reproducible – both for the sake of our future selves, and for the sake of anyone who wants to replicate and/or build on our work.
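As a small illustration of the first point, here is a minimal sketch of a standalone function for a task we are likely to repeat: fetching a data file only when we do not already have a local copy. The function name `download_if_needed` and its `force` parameter are our own hypothetical choices, not something fixed by the course material.

```python
import os
from urllib.request import urlretrieve


def download_if_needed(url, filename, force=False):
    """Download `url` to `filename` unless the file already exists.

    Returns True if a download happened, False if the local copy was reused.
    (Hypothetical helper: the name and signature are assumptions.)
    """
    if force or not os.path.exists(filename):
        # urlretrieve copies the contents at `url` into the local file
        urlretrieve(url, filename)
        return True
    return False
```

Called with the actual data URL and a local file name, this would fetch the file once and reuse it on later runs, which keeps repeated runs of an analysis fast and avoids re-downloading unchanged data.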

To that end, we are going to work together to do the following tasks:

  1. Write a function which will download the Pronto data and the weather data

  2. Write two functions which, given the downloaded data, will load it, parse dates properly, and return a pandas DataFrame.

  3. Write a function which will group and join the trip and weather data into a single DataFrame, making use of the above functions.

  4. Develop some plots showing relationships in the data, and write a function which will create and save plots related to your analysis.

    • Number of rides per day over the course of the year (day-pass and annual members)
    • Number of rides per hour over the course of the day (day-pass and annual members)
    • Number of rides per day as a function of temperature (day-pass and annual members)

  5. Write a master script that you – or anyone – can run, which will produce your analysis from scratch.
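To make tasks 2 and 3 more concrete, here is a rough sketch of what the loading and joining functions might look like. It runs on two tiny in-memory tables rather than the real downloads, and everything specific in it is an assumption made for illustration: the function names, the column names (`starttime`, `usertype`, `Date`, `Mean_Temperature_F`), and the sample values are made up and may not match the actual Pronto and weather files.

```python
import io

import pandas as pd

# Made-up stand-ins for the downloaded files; column names are assumptions.
TRIPS_CSV = """starttime,usertype
2015-01-01 08:05,Annual Member
2015-01-01 17:30,Short-Term Pass Holder
2015-01-02 09:10,Annual Member
"""
WEATHER_CSV = """Date,Mean_Temperature_F
2015-01-01,40
2015-01-02,45
"""


def load_trips(source):
    """Load trip data, parsing the start time column as datetimes."""
    return pd.read_csv(source, parse_dates=["starttime"])


def load_weather(source):
    """Load weather data, indexed by its parsed date column."""
    return pd.read_csv(source, parse_dates=["Date"], index_col="Date")


def join_trips_and_weather(trips, weather):
    """Count rides per day and user type, then attach that day's weather."""
    # normalize() drops the time of day, leaving one timestamp per date
    trips = trips.assign(Date=trips["starttime"].dt.normalize())
    daily = trips.groupby(["Date", "usertype"]).size().unstack(fill_value=0)
    # both tables are indexed by date, so join() aligns them row by row
    return daily.join(weather)


trips = load_trips(io.StringIO(TRIPS_CSV))
weather = load_weather(io.StringIO(WEATHER_CSV))
daily = join_trips_and_weather(trips, weather)
print(daily)
```

The result has one row per day, one ride-count column per user type, and the weather columns alongside, which is a convenient shape for the plots listed in task 4. Because `pd.read_csv` accepts either a file path or a file-like object, the same functions would work unchanged on files saved to disk.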

During class today, we will walk through these tasks together.