============
Introduction
============
Use this code to predict the percentage tip expected after a trip in an NYC green taxi.
The code is a predictive model built and trained on top of the Gradient Boosting Classifier and a Random Forest model, both provided in scikit-learn.
The input:
A pandas.DataFrame whose columns are in the same format as the data downloaded from the website linked below.
The data frame goes through the following pipeline:
1. Cleaning
2. Creation of derived variables
3. Making predictions
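
The first two stages can be pictured with a small pandas sketch. This is an illustration only, not the module's actual code: the column names Total_amount and Tip_amount, the filtering rule, the derived Tip_percentage column, and the helper names clean and add_tip_percentage are all assumptions made for the example.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop rows with missing or non-positive totals."""
    df = df.dropna(subset=["Total_amount", "Tip_amount"])
    return df[df["Total_amount"] > 0].copy()


def add_tip_percentage(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical derived variable: tip as a percentage of the total amount."""
    df = df.copy()
    df["Tip_percentage"] = 100 * df["Tip_amount"] / df["Total_amount"]
    return df


# Tiny made-up sample standing in for a raw trip-record data frame.
raw = pd.DataFrame({
    "Total_amount": [10.0, 0.0, 20.0, None],
    "Tip_amount": [2.0, 0.0, 3.0, 1.0],
})

prepared = add_tip_percentage(clean(raw))
print(prepared)
```

Rows with a zero or missing total are removed before the percentage is computed, which avoids division by zero in the derived column.
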
The output:
A pandas.Series of predictions. Two files are also saved to disk: submission.csv and cleaned_data.csv.
To make predictions, run 'tip_predictor.make_predictions(data)', where data is any raw 2015 data frame loaded directly from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Run tip_predictor.read_me() for further instructions.
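
A call to make_predictions might be wrapped as in the sketch below. The helper predict_from_csv and the file name in the comment are hypothetical. The predictor is passed in as an argument so that the snippet runs without the module installed. Only tip_predictor.make_predictions(data) itself comes from the description above.

```python
import pandas as pd


def predict_from_csv(csv_path, make_predictions):
    """Load a raw trip-record CSV and pass it, unmodified, to the predictor."""
    data = pd.read_csv(csv_path)
    return make_predictions(data)


# Intended use (not run here: it needs the module and a downloaded file):
#   import tip_predictor
#   tips = predict_from_csv("green_tripdata_2015-09.csv",  # hypothetical file name
#                           tip_predictor.make_predictions)
```

The data frame is handed over exactly as read, since the module is described as doing its own cleaning.
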