Special project for the Certificate of Data Science at Georgetown University School of Continuing Studies, Cohort 11
Purpose: To predict whether someone is likely to misuse drugs based on personality factors and their use of legal substances such as caffeine and alcohol
Project Author: Melissa Burn, GU Data Science Certificate Cohort 11
Release Date: 8 July 2018
Methodology: This Jupyter Notebook ingests data from the UCI ML dataset repository, evaluates the data in hand, wrangles it to suit the project and to fit Scikit-Learn classification models, tests three different algorithms with some visualization of the results, and chooses the model that seems to give the best results.
Required Input:
Predictive Output:
Intended Users: The curious and casual
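The methodology above (fit several Scikit-Learn classifiers on the wrangled data, compare them, and keep the one that scores best) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data here are synthetic, generated with `make_classification` as a stand-in for the wrangled drug-use features, and the three estimators, the F1 metric, and the 5-fold cross-validation are hypothetical choices, not necessarily the ones used in this Notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# ASSUMPTION: synthetic binary-classification data standing in for the
# wrangled feature matrix (X) and the "misuses drugs" target (y).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)

# ASSUMPTION: three illustrative candidate algorithms; the Notebook's
# actual choices may differ.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with 5-fold cross-validation and average the F1 scores.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

# Keep the name of the candidate with the highest mean F1.
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
print("best:", best_name)
```

Cross-validation is used in this sketch so that each model is compared on held-out folds rather than on the data it was trained on; a single train/test split would also work but gives a noisier comparison on a dataset of this size.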
The training dataset comes from a 2015 UK study examining the relationship between demographic and personality factors (such as age when dropping out of school, extraversion, and impulsivity) and the use of recreational drugs, both legal and illegal. The study dataset includes 1885 instances and 32 features.
The dataset was downloaded from the UCI Machine Learning Repository in June 2018. The dataset is as follows:
In creating this Notebook, I made heavy use of the XBUS-505 Wheat Classification and Census Notebooks provided by Benjamin Bengfort, several Notebooks for XBUS-506 provided by Dr. Rebecca Bilbro, contributions from my Capstone teammate Mike Iapalucci, and code borrowed from jhboyle's '1984_Congressional_Voting_Classification' (submitted as an example in a previous cohort).
E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban, "The Five Factor Model of personality and evaluation of drug consumption risk," arXiv [https://arxiv.org/abs/1506.06297], 2015.