Final Project Workflow
There is a lot to do, and these are only suggestions: nobody is expected to complete every step, and you may choose alternative strategies. I suggest doing most of your development in a Jupyter notebook, supplemented by a code editor where necessary (especially if you are writing C/C++), and using Markdown cells to document what you are doing. That way, you only need to clean up and refine the notebook to have a final project ready for submission.
Week 0
- Choose paper
- Identify algorithm to implement
- Write abstract and outline of approach
Week 1
- Code algorithm in Python
- Write modular code
- Functional core - use pure functions where possible
- Imperative shell - minimize stateful code to interactions and I/O
- Write tests to check correctness
- Check boundary conditions
- Are there known analytic/asymptotic solutions to compare against?
- Are there other packages implementing the algorithm to compare against?
- Are there alternative algorithms that should give the same answer?
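The Week 1 items above can be sketched as follows. This is a minimal illustration, not a required structure: the trapezoid rule merely stands in for whatever algorithm your paper describes, and the names `trapezoid`, `test_boundary_conditions`, `test_against_analytic_solution` and `main` are hypothetical. The numerical work lives in a pure function, the tests check a boundary case and a known analytic answer, and all I/O is confined to a small shell.

```python
import math


def trapezoid(f, a, b, n):
    """Functional core: composite trapezoid rule for the integral of f on [a, b].

    Pure function: the result depends only on the arguments, and nothing
    outside the function is modified.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def test_boundary_conditions():
    # A zero-width interval must integrate to exactly zero.
    assert trapezoid(math.sin, 1.0, 1.0, 10) == 0.0


def test_against_analytic_solution():
    # The integral of x**2 on [0, 1] is exactly 1/3.
    approx = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
    assert math.isclose(approx, 1.0 / 3.0, abs_tol=1e-6)


def main():
    """Imperative shell: the only place that performs I/O."""
    print(trapezoid(math.sin, 0.0, math.pi, 1000))


if __name__ == "__main__":
    test_boundary_conditions()
    test_against_analytic_solution()
    main()
```

Because the core is pure, the same tests can be re-run unchanged against every later optimized version.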
Deadline for 1st progress report: 11th April 2017
Week 2
- Profile for speed
- Use cProfile and the %prun magic
- Identify performance bottlenecks
- Optimize slow functions
- Consider using line_profiler if necessary
- Consider the following strategies:
- More idiomatic Python
- Cache results (e.g. lru_cache decorator)?
- Better data structure?
- Better algorithm?
- Vectorize with numpy or pandas
- Use a JIT compiler (e.g. numba)
- Use Cython to recode function
- Write C/C++ function and wrap for use in Python
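As one possible illustration of the profile-then-optimize loop above, the sketch below profiles a deliberately slow function with cProfile and then applies one of the listed strategies, caching with `lru_cache`. The functions `fib_slow`, `fib_cached` and `profile_call` are hypothetical examples and not part of any project; only the standard library is used. In a notebook, `%prun fib_slow(20)` gives a similar report without the helper.

```python
import cProfile
import io
import pstats
from functools import lru_cache


def fib_slow(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but each distinct n is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(fib_slow, 20)
    print(report)
    # The optimized version must return the same answer as the original.
    assert fib_cached(20) == result
```

The report lists call counts per function, which is usually enough to see where the time goes before reaching for line_profiler or a compiled rewrite.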
Week 3
- Write parallel code
- Using Cython prange and OpenMP
- Using threads
- Using processes
- Scaling for massive data sets
- Using appropriate data storage (e.g. HDF5, databases)
- Using pyspark for distributed computing
- Re-run tests after optimization to check that output has not changed
- Comparative analysis for each new version with the %time and %timeit magics
- Applications
- Apply to simulated data sets
- Apply to real data sets
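The parallelization, re-testing and timing steps above might look like the sketch below. It uses `concurrent.futures` from the standard library instead of Cython `prange`. The workload `simulate` and the helpers `run_serial` and `run_parallel` are hypothetical stand-ins for your own code. The `timeit` module plays the role of the `%timeit` magic outside a notebook.

```python
import timeit
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical stand-in for one independent unit of work."""
    total = 0
    for i in range(10_000):
        total = (total * 31 + seed + i) % 1_000_003
    return total


def run_serial(seeds):
    return [simulate(s) for s in seeds]


def run_parallel(seeds, executor_cls=ThreadPoolExecutor, workers=4):
    # executor.map preserves input order, so the output can be compared
    # directly with the serial version.
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(simulate, seeds))


if __name__ == "__main__":
    seeds = list(range(32))
    # Re-run the correctness check after changing the implementation.
    assert run_parallel(seeds) == run_serial(seeds)
    # Compare versions using the best of three single runs.
    for fn in (run_serial, run_parallel):
        best = min(timeit.repeat(lambda: fn(seeds), number=1, repeat=3))
        print(f"{fn.__name__}: {best:.4f} s")
```

Threads are used here only because they run anywhere without extra setup. For CPU-bound pure-Python work like this, the global interpreter lock generally prevents a speed-up, so passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) is the more realistic choice when using processes.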
Deadline for 2nd progress report: 22nd April 2017
Week 4
- Packaging
- Bundle code into a package for distribution
- Provide instructions for installation on GitHub
- Upload to Python Package Index if appropriate
- Clean up work and documentation
Submission
- Submit final project
- As a Jupyter notebook or series of notebooks using literate programming
- Generate PDF if possible
- Use nbsphinx to convert to HTML if appropriate
- As a LaTeX file, using make to automate document generation
Deadline for final report: 1st May 2017