For the final project, you will need to implement a "new" statistical algorithm in Python from the research literature and write a "paper" describing the algorithm.
The paper should have the following:
Should be concise and informative.
250 words or less. Identify 4-6 key phrases.
State the research paper you are using. Describe the concept of the algorithm and why it is interesting and/or useful. If appropriate, describe the mathematical basis of the algorithm. Some potential topics for the background include:
First, explain in plain English what the algorithm does. Then describe the details of the algorithm, using mathematical equations or pseudocode as appropriate.
First, implement the algorithm in plain Python in a straightforward way from its description. Then profile and optimize it using one or more appropriate methods, such as:
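As a minimal sketch of this workflow, the example below profiles a plain-Python function with the standard-library cProfile and pstats modules and then rewrites it with NumPy vectorization, which is only one of many possible optimizations. The function names and the sum-of-squared-deviations kernel are hypothetical stand-ins for whatever algorithm you choose.

```python
import cProfile
import io
import pstats

import numpy as np


def sum_sq_dev_plain(xs):
    """Hypothetical plain-Python kernel: sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    total = 0.0
    for x in xs:
        total += (x - mean) ** 2
    return total


def sum_sq_dev_numpy(xs):
    """The same computation, vectorized with NumPy."""
    x = np.asarray(xs, dtype=float)
    return float(np.sum((x - x.mean()) ** 2))


data = [float(i % 97) for i in range(100_000)]

# Profile the plain version and print the five most expensive entries by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
plain_result = sum_sq_dev_plain(data)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())

# The optimized version should give the same answer up to floating-point rounding.
fast_result = sum_sq_dev_numpy(data)
print(plain_result, fast_result)
```

The profiler output points to where time is spent; only then is it worth choosing an optimization such as vectorization or compilation.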
Document the performance improvement achieved by the optimizations.
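One way to document a speed-up, sketched under the assumption that a plain and an optimized version of the same (hypothetical) function already exist, is to check that both return the same answer and then time them with timeit.repeat, reporting the best of several runs:

```python
import timeit

import numpy as np


def sum_sq_dev_plain(xs):
    """Hypothetical baseline: pure-Python loop."""
    mean = sum(xs) / len(xs)
    total = 0.0
    for x in xs:
        total += (x - mean) ** 2
    return total


def sum_sq_dev_numpy(xs):
    """Hypothetical optimized version: NumPy vectorization."""
    x = np.asarray(xs, dtype=float)
    return float(np.sum((x - x.mean()) ** 2))


data_list = [float(i % 97) for i in range(100_000)]
data_array = np.array(data_list)

# First confirm that the optimization did not change the answer.
assert np.isclose(sum_sq_dev_plain(data_list), sum_sq_dev_numpy(data_array))

# Best of several repeats, divided by the number of calls per repeat.
timings = {}
for name, func, arg in [
    ("plain Python", sum_sq_dev_plain, data_list),
    ("NumPy", sum_sq_dev_numpy, data_array),
]:
    timings[name] = min(timeit.repeat(lambda: func(arg), number=5, repeat=3)) / 5

for name, seconds in timings.items():
    print(f"{name:>12}: {seconds * 1e3:8.3f} ms per call")
print(f"speed-up: {timings['plain Python'] / timings['NumPy']:.1f}x")
```

Taking the minimum over repeats is a common convention because it is least affected by background load; reporting the hardware and library versions alongside the numbers makes them easier to interpret.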
Are there specific inputs that give known outputs (e.g., there might be closed-form solutions for special input cases)? How does the algorithm perform on these?
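For example, if the algorithm were an iterative least-squares solver (here a hypothetical gradient-descent function, ols_gradient_descent, used only as a stand-in), noiseless data generated from a known line gives an input whose exact answer is known, and the closed-form solution from np.linalg.lstsq provides a second reference:

```python
import numpy as np


def ols_gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Hypothetical iterative estimator: batch gradient descent on the mean squared error."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)  # gradient of (1/2n) * ||X beta - y||^2
        beta -= lr * grad
    return beta


# Special input case: noiseless data from y = 1 + 2x, so the exact answer is (1, 2).
x = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
beta_gd = ols_gradient_descent(X, y)

# Closed-form least-squares solution for the same input, as a second reference.
beta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", beta_gd)
print("closed form:     ", beta_closed)
```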
If no such input cases are available (or in addition to such input cases), how does the algorithm perform on simulated data sets for which you know the "truth"?
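A simulation study can follow the same pattern: generate many data sets from parameters you choose, run the algorithm on each, and summarize bias and root-mean-square error against the known truth. This sketch again uses a hypothetical gradient-descent stand-in, ols_gradient_descent; the parameter values, sample size, and number of replications are arbitrary choices.

```python
import numpy as np


def ols_gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Hypothetical iterative estimator: batch gradient descent on the mean squared error."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)
        beta -= lr * grad
    return beta


true_beta = np.array([1.5, -2.0])  # the "truth" chosen for the simulation
rng = np.random.default_rng(42)

estimates = []
for _ in range(100):  # number of simulated data sets
    x = rng.normal(size=200)
    X = np.column_stack([np.ones_like(x), x])
    y = X @ true_beta + rng.normal(scale=0.5, size=x.size)
    estimates.append(ols_gradient_descent(X, y))
estimates = np.array(estimates)

bias = estimates.mean(axis=0) - true_beta
rmse = np.sqrt(((estimates - true_beta) ** 2).mean(axis=0))
print("bias:", bias)
print("RMSE:", rmse)
```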
Test the algorithm on the real-world examples in the original paper if possible. Try to find at least one other real-world data set not in the original paper and test the algorithm on it. Describe and interpret the results.
Find two other algorithms that address a similar problem. Perform a comparison, for example of accuracy or speed. You can use native library implementations of the other algorithms; you do not need to code them yourself. Comment on your observations.
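A comparison might look like the sketch below, which pits a hypothetical gradient-descent stand-in, ols_gradient_descent, against two existing least-squares routines from NumPy, np.linalg.lstsq and np.polyfit, on one simulated data set and records both accuracy and time per call. The data set and timing settings are illustrative assumptions.

```python
import timeit

import numpy as np


def ols_gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Hypothetical 'new' algorithm under study (stand-in: batch gradient descent)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)
        beta -= lr * grad
    return beta


rng = np.random.default_rng(1)
true_beta = np.array([0.5, 3.0])
x = rng.normal(size=5_000)
X = np.column_stack([np.ones_like(x), x])
y = X @ true_beta + rng.normal(scale=1.0, size=x.size)

methods = {
    "gradient descent (ours)": lambda: ols_gradient_descent(X, y),
    "np.linalg.lstsq": lambda: np.linalg.lstsq(X, y, rcond=None)[0],
    # np.polyfit returns the highest-degree coefficient first, so reverse it.
    "np.polyfit": lambda: np.polyfit(x, y, 1)[::-1],
}

results = {}
for name, fit in methods.items():
    beta = fit()
    seconds = min(timeit.repeat(fit, number=3, repeat=3)) / 3
    error = float(np.max(np.abs(beta - true_beta)))
    results[name] = (beta, error, seconds)
    print(f"{name:>24}: max |error| = {error:.4f}, time = {seconds * 1e3:.3f} ms")
```

Timings on a single small data set are noisy; repeating the comparison over several data sizes gives a more informative picture.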
Give your thoughts on the algorithm. Does it fulfill a particular need? How could it be generalized to other problem domains? What are its limitations, and how could it be improved further?
Make sure you cite your sources.
The code should be in a public GitHub repository with:
The package should be downloadable and installable with python setup.py install, or even posted to PyPI and installable with pip install package.
Each item is worth 10 points, but some sections will give up to 10 bonus points if done really well. Note that the "difficulty factor" of the chosen algorithm will be factored into the grading.