Final Project Workflow

There is a lot to do, and these are simply suggestions: nobody is expected to complete every step, and you may choose alternative strategies. I suggest doing most of your development in a Jupyter notebook, supplemented by a code editor where necessary (especially if you are writing C/C++), and using Markdown cells to document your work as you go. That way, you only need to clean up and refine the notebook to have a final project ready for submission.

Week 0

  • Choose paper
  • Identify algorithm to implement
  • Write abstract and outline of approach

Week 1

  • Code algorithm in Python
    • Write modular code
    • Functional core - use pure functions where possible
    • Imperative shell - minimize stateful code to interactions and I/O
  • Write tests to check correctness
    • Check boundary conditions
    • Are there known analytic/asymptotic solutions to compare against?
    • Are there other packages implementing the algorithm to compare against?
    • Are there alternative algorithms that should give the same answer?
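
As a minimal sketch of the Week 1 items above, the example below separates a pure function (the functional core) from the code that does I/O (the imperative shell), and tests the core against boundary conditions and a known analytic result. The names running_mean, test_running_mean and main are hypothetical placeholders, not part of any assigned paper; substitute your own algorithm.

```python
import math


def running_mean(values):
    """Functional core: return the cumulative means of a sequence.

    Pure function: no I/O, no global state, same output for same input.
    """
    means = []
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means


def test_running_mean():
    # Boundary condition: empty input gives empty output.
    assert running_mean([]) == []
    # Boundary condition: a single element is its own mean.
    assert running_mean([3.0]) == [3.0]
    # Known analytic answer: the mean of 1..n is (n + 1) / 2.
    n = 100
    result = running_mean(range(1, n + 1))
    assert math.isclose(result[-1], (n + 1) / 2)
    # Alternative algorithm: recompute each prefix mean directly.
    data = [2.0, 4.0, 9.0]
    direct = [sum(data[:k]) / k for k in range(1, len(data) + 1)]
    assert all(math.isclose(a, b) for a, b in zip(running_mean(data), direct))


def main(path):
    """Imperative shell: all file reading and printing lives here."""
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    for m in running_mean(values):
        print(m)


test_running_mean()
```

Because the core takes and returns plain Python values, the tests need no files or fixtures, and they can be re-run unchanged after each later optimization.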

Deadline for 1st progress report: 11th April 2017

Week 2

  • Profile for speed
    • Use cProfile and the prun magic
    • Identify performance bottlenecks
  • Optimize slow functions
    • Consider using line_profiler if necessary
    • Consider the following strategies:
      • More idiomatic Python
      • Cache results (e.g. lru_cache decorator)?
      • Better data structure?
      • Better algorithm?
      • Vectorize with numpy or pandas
      • Use a JIT compiler (e.g. numba)
      • Use Cython to recode the function
      • Write C/C++ function and wrap for use in Python
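
The profiling and caching steps above can be sketched with the standard library alone. The example below, offered as an illustration rather than a recommended benchmark, profiles a deliberately slow recursive function with cProfile and compares it with a version wrapped in lru_cache. The functions fib_slow, fib_cached and the helper profile are hypothetical names chosen for this sketch.

```python
import cProfile
import io
import pstats
from functools import lru_cache


def fib_slow(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but each distinct n is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def profile(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


slow_result, slow_report = profile(fib_slow, 20)
fast_result, fast_report = profile(fib_cached, 20)

# The optimized version must give the same answer as the original.
assert slow_result == fast_result

print(slow_report)
print(fast_report)
```

In a notebook, the line magic %prun fib_slow(20) prints a similar report without the helper function. The call counts in the two reports show where the time goes: the naive version makes thousands of recursive calls, while the cached version makes one per distinct argument.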

Week 3

  • Write parallel code
    • Using Cython prange and OpenMP
    • Using threads
    • Using processes
  • Scaling for massive data sets
    • Using appropriate data storage (e.g. HDF5, databases)
    • Using pyspark for distributed computing
  • Re-run tests after optimization to check that output has not changed
  • Compare the run time of each new version using the time and timeit magics
  • Applications
    • Apply to simulated data sets
    • Apply to real data sets
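
One possible way to combine the parallel, re-test and timing steps above is sketched below using concurrent.futures and timeit from the standard library. The function count_primes and the wrappers serial and parallel are hypothetical stand-ins for your own algorithm. Threads are used here so the sketch runs anywhere; for CPU-bound pure Python code on standard CPython, threads are unlikely to give a speed-up because of the global interpreter lock, and ProcessPoolExecutor (which offers the same map interface) is usually the better choice.

```python
import timeit
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately simple)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def serial(limits):
    """Baseline: process each work item in turn."""
    return [count_primes(x) for x in limits]


def parallel(limits, executor_cls=ThreadPoolExecutor, workers=4):
    """Distribute work items over a pool; executor_cls is swappable."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


limits = [2000, 3000, 4000, 5000]

# Re-run the correctness check: output must not change.
assert serial(limits) == parallel(limits)

# Compare timings of the two versions on the same input.
t_serial = timeit.timeit(lambda: serial(limits), number=3)
t_parallel = timeit.timeit(lambda: parallel(limits), number=3)
print(f"serial:   {t_serial:.4f} s")
print(f"parallel: {t_parallel:.4f} s")
```

In a notebook, %timeit serial(limits) and %timeit parallel(limits) give the same comparison with less code. If you switch to ProcessPoolExecutor in a script, place the calls under an if __name__ == "__main__": guard so that worker processes can import the module safely.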

Deadline for 2nd progress report: 22nd April 2017

Week 4

  • Packaging
    • Bundle code into a package for distribution
    • Provide instructions for installation on GitHub
    • Upload to Python Package Index if appropriate
  • Clean up work and documentation

Submission

  • Submit final project
    • As a Jupyter notebook or series of notebooks using literate programming
      • Generate PDF if possible
      • Use nbsphinx to convert to HTML if appropriate
    • As a LaTeX file, using make to automate document generation

Deadline for final report: 1st May 2017

