Paris Saclay Center for Data Science

Test RAMP on iris

Balázs Kégl (LAL/CNRS)

Introduction

Iris is a small standard multi-class classification data set from the UCI Machine Learning Repository.


In [ ]:
from __future__ import print_function

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

Fetch the data and load it in pandas


In [ ]:
local_filename = 'data/train.csv'

# Open the file and print its first 3 lines
with open(local_filename) as fid:
    for line in fid.readlines()[:3]:
        # Each line already ends with a newline, so suppress print's own
        print(line, end='')

In [ ]:
data = pd.read_csv(local_filename)

In [ ]:
data.head()

In [ ]:
data.shape

In [ ]:
data.describe()

In [ ]:
data.hist(figsize=(10, 10), bins=50, layout=(3, 2));

In [ ]:
sns.pairplot(data);

Local testing (before submission)

It is important that you test your submission files before submitting them. For this we provide a unit test. Note that the test runs on your files in submissions/starting_kit.

First pip install ramp-workflow or install it from the GitHub repo. Make sure that the Python file classifier.py is in the submissions/starting_kit folder, and that the data files train.csv and test.csv are in data. Then run:

ramp_test_submission

If it runs and prints training and test errors on each fold, then you can submit the code.
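Before running the command, the file layout described above can be checked with a small script. This is only a sketch: the helper name `missing_files` and the constant `EXPECTED_FILES` are hypothetical and not part of ramp-workflow, and the script assumes it is run from the root of the starting kit.

```python
from pathlib import Path

# Paths named in the text above, relative to the starting-kit root.
EXPECTED_FILES = (
    "submissions/starting_kit/classifier.py",
    "data/train.csv",
    "data/test.csv",
)


def missing_files(root="."):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected files are in place.")
```

An empty list means that the three files the test relies on are present. It says nothing about whether their content is valid.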


In [ ]:
!ramp_test_submission

Alternatively, load and execute rampwf/utils/testing.py, and call assert_submission. This is useful if you would like to understand how we instantiate the workflow, the scores, the data connectors, and the cross-validation scheme defined in problem.py, and how we insert, train, and test your submission.


In [ ]:
# %load https://raw.githubusercontent.com/paris-saclay-cds/ramp-workflow/master/rampwf/utils/testing.py

In [ ]:
# assert_submission()
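To make concrete what the test trains and evaluates, here is a minimal sketch of what a classifier.py could contain. It assumes the usual RAMP classifier convention of a `Classifier` class with `fit(X, y)` and `predict_proba(X)` methods; check the file shipped in submissions/starting_kit for the actual interface. The model below is a hypothetical baseline that ignores the features and predicts the training class frequencies. It is not the starting kit's classifier.

```python
import numpy as np


class Classifier:
    """Hypothetical baseline: predicts the training class frequencies for every sample."""

    def fit(self, X, y):
        # Remember the sorted class labels and their relative frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.priors_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # One row per sample, one column per class (in sorted label order).
        return np.tile(self.priors_, (len(X), 1))


# Toy data for illustration only (not the iris data).
X_toy = np.zeros((4, 2))
y_toy = np.array(["setosa", "setosa", "versicolor", "virginica"])
proba = Classifier().fit(X_toy, y_toy).predict_proba(X_toy)
print(proba.shape)
```

A real submission would replace the frequency baseline with a model that uses the features, while keeping the same two methods.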

Submitting to ramp.studio

Once you have found a good classifier, you can submit it to ramp.studio. If this is your first time using RAMP, sign up; otherwise log in. Then find an open event on the particular problem, for example, the event iris_test for this RAMP, and sign up for the event. Both signups are controlled by RAMP administrators, so there can be a delay between asking for signup and being able to submit.

Once your signup request is accepted, you can go to your sandbox and copy-paste (or upload) classifier.py from submissions/starting_kit. Save it, rename it, then submit it. The submission is trained and tested on our backend in the same way as ramp_test_submission does it locally. While your submission is waiting in the queue and being trained, you can find it in the "New submissions (pending training)" table in my submissions. Once it is trained, you receive an email, and your submission shows up on the public leaderboard. If there is an error (despite having tested your submission locally with ramp_test_submission), it will show up in the "Failed submissions" table in my submissions. You can click on the error to see part of the trace.

After submission, do not forget to give credit to the previous submissions you reused or integrated into your submission.

The data set we use at the backend is usually different from what you find in the starting kit, so the score may be different.

The usual way to work with RAMP is to explore solutions, add feature transformations, select models, perhaps do some AutoML/hyperopt, etc., locally, and check them with ramp_test_submission. The script prints mean cross-validation scores:

----------------------------
train acc = 0.62 ± 0.033
train err = 0.38 ± 0.033
train nll = 1.01 ± 0.378
train f1_70 = 0.5 ± 0.167
valid acc = 0.63 ± 0.06
valid err = 0.38 ± 0.06
valid nll = 1.41 ± 1.115
valid f1_70 = 0.5 ± 0.167
test acc = 0.55 ± 0.084
test err = 0.45 ± 0.084
test nll = 1.31 ± 0.858
test f1_70 = 0.4 ± 0.133

The official score in this RAMP (the first score column after "historical contributivity" on the leaderboard) is accuracy ("acc"), so the line that is relevant in the output of ramp_test_submission is valid acc = 0.63 ± 0.06. When the score is good enough, you can submit your classifier to the RAMP.
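To read lines such as valid acc and valid err, it helps to recall that each one summarizes per-fold scores as a mean and a spread, and that the classification error is one minus the accuracy. The sketch below uses hypothetical per-fold accuracies, not the ones behind the output above. It also assumes the spread is a population standard deviation, which may differ from what ramp_test_submission computes.

```python
from statistics import mean, pstdev

# Hypothetical per-fold validation accuracies, for illustration only.
fold_acc = [0.70, 0.60, 0.55, 0.65]


def summarize(scores):
    """Return (mean, population standard deviation) of per-fold scores."""
    return mean(scores), pstdev(scores)


acc_mean, acc_std = summarize(fold_acc)
# Classification error is 1 - accuracy, fold by fold.
err_mean, err_std = summarize([1 - a for a in fold_acc])

print("valid acc = {:.2f} +/- {:.2f}".format(acc_mean, acc_std))
print("valid err = {:.2f} +/- {:.2f}".format(err_mean, err_std))
```

Because the error is an affine function of the accuracy, the two means sum to one and the two spreads are equal, which matches the pattern of the acc and err lines in the output above.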

More information

You can find more information in the README of the ramp-workflow library.

Contact

Don't hesitate to contact us.