Iris is a small standard multi-class classification data set from the UCI Machine Learning Repository.
In [ ]:
from __future__ import print_function
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
In [ ]:
local_filename = 'data/train.csv'
# Open file and print the first 3 lines
with open(local_filename) as fid:
    for line in fid.readlines()[:3]:
        print(line)
In [ ]:
data = pd.read_csv(local_filename)
In [ ]:
data.head()
In [ ]:
data.shape
In [ ]:
data.describe()
In [ ]:
data.hist(figsize=(10, 10), bins=50, layout=(3, 2));
In [ ]:
sns.pairplot(data);
It is important that you test your submission files before submitting them. For this we provide a unit test. Note that the test runs on your files in submissions/starting_kit.
First pip install ramp-workflow or install it from the GitHub repo. Make sure that the Python file classifier.py is in the submissions/starting_kit folder, and that the data files train.csv and test.csv are in data. Then run
ramp_test_submission
If it runs and prints training and test errors on each fold, then you can submit the code.
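For orientation, here is a minimal sketch of what a classifier.py could look like. It assumes that the workflow expects a class named Classifier with fit(X, y) and predict_proba(X) methods, and that the columns of the returned probabilities follow the sorted label order; check both against problem.py and the file shipped in submissions/starting_kit. The nearest-centroid model is only a placeholder chosen to keep the example dependency-free, not the starting kit's own model.

```python
import numpy as np


class Classifier:
    """Placeholder nearest-centroid model (an illustration, not the kit's model)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Remember the sorted class labels and one mean vector per class
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every centroid
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        # Softmax over negative distances: closer centroid, higher probability
        logits = -d2
        logits -= logits.max(axis=1, keepdims=True)
        proba = np.exp(logits)
        return proba / proba.sum(axis=1, keepdims=True)
```

The method returns one row per sample and one column per class, with each row summing to one.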
In [ ]:
!ramp_test_submission
Alternatively, load and execute rampwf/utils/testing.py, and call assert_submission. This may be useful if you would like to understand how we instantiate the workflow, the scores, the data connectors, and the cross-validation scheme defined in problem.py, and how we insert and train/test your submission.
In [ ]:
# %load https://raw.githubusercontent.com/paris-saclay-cds/ramp-workflow/master/rampwf/utils/testing.py
In [ ]:
# assert_submission()
Once you have found a good classifier, you can submit it to ramp.studio. If this is your first time using RAMP, sign up first; otherwise log in. Then find an open event on the particular problem, for example, the event iris_test for this RAMP, and sign up for the event. Both signups are controlled by RAMP administrators, so there can be a delay between asking for signup and being able to submit.
Once your signup request is accepted, you can go to your sandbox and copy-paste (or upload) classifier.py from submissions/starting_kit. Save it, rename it, then submit it. The submission is trained and tested on our backend in the same way as ramp_test_submission does locally. While your submission is waiting in the queue and being trained, you can find it in the "New submissions (pending training)" table in my submissions. Once it is trained, you get an email, and your submission shows up on the public leaderboard.
If there is an error (despite having tested your submission locally with ramp_test_submission), it will show up in the "Failed submissions" table in my submissions. You can click on the error to see part of the trace.
After submission, do not forget to give credit to the previous submissions you reused or integrated into your submission.
The data set we use at the backend is usually different from what you find in the starting kit, so the score may be different.
The usual way to work with RAMP is to explore solutions, add feature transformations, select models, perhaps do some AutoML/hyperopt, etc., locally, and check them with ramp_test_submission. The script prints mean cross-validation scores:
----------------------------
train acc = 0.62 ± 0.033
train err = 0.38 ± 0.033
train nll = 1.01 ± 0.378
train f1_70 = 0.5 ± 0.167
valid acc = 0.63 ± 0.06
valid err = 0.38 ± 0.06
valid nll = 1.41 ± 1.115
valid f1_70 = 0.5 ± 0.167
test acc = 0.55 ± 0.084
test err = 0.45 ± 0.084
test nll = 1.31 ± 0.858
test f1_70 = 0.4 ± 0.133
The official score in this RAMP (the first score column after "historical contributivity" on the leaderboard) is accuracy ("acc"), so the line that is relevant in the output of ramp_test_submission is valid acc = 0.63 ± 0.06. When the score is good enough, you can submit your classifier to RAMP.
You can find more information in the README of the ramp-workflow library.
Don't hesitate to contact us.