This document explains how the judging will be done for the "Best Classification" system.
Read the Step 1: Get Data tutorial for information on where to get the training and test data sets.
Your job is to create a CSV file that holds the scores for each of the signals found in the test data set.
Each line of your CSV "scorecard" file will contain the signal's UUID, followed by a score for each signal class. The order of the scores is absolutely critical.
The signal class with the highest score is your model's class estimate. Typically, these scores are your model's probability estimates for each class.
For each data file in the test set, generate the appropriate spectrogram, and then pass that to your signal classifier (machine-learning model) to calculate the scores for each class.
For example, each line of your CSV scorecard file should look something like:
abdefgadbc1223234123123cvaf, 0.1, 0.023, 0.451, 0.232, 0.001, 0.07, 0.0083
THE ORDER OF THE SCORES IN EACH ROW OF YOUR CSV FILE MUST BE:
brightpixel, narrowband, narrowbanddrd, noise, squarepulsednarrowband, squiggle, squigglesquarepulsednarrowband
This is in alphabetical order.
In [ ]:
import csv
import zipfile

import ibmseti
# import whatever libraries your classifier needs (tensorflow, Watson, etc.)

# my_model is your trained signal classification model
# mydatafolder is the folder holding your data (see the Step 1: Get Data tutorial)

my_output_results = mydatafolder + '/signal_class_results.csv'

zz = zipfile.ZipFile(mydatafolder + '/' + 'primary_testset_preview_v3.zip')

for fn in zz.namelist():
    data = zz.open(fn).read()
    aca = ibmseti.compamp.SimCompamp(data)
    uuid = aca.header()['uuid']

    # whatever signal processing code you need goes in your `draw_spectrogram` function
    spectrogram = draw_spectrogram(aca)

    # cr = class results. In this example it's a dictionary, but in your case it
    # could be something else, such as a simple list.
    cr = my_model.classify(spectrogram)

    with open(my_output_results, 'a') as csvfile:
        fwriter = csv.writer(csvfile, delimiter=',')
        fwriter.writerow([uuid, cr['brightpixel'], cr['narrowband'], cr['narrowbanddrd'],
                          cr['noise'], cr['squarepulsednarrowband'], cr['squiggle'],
                          cr['squigglesquarepulsednarrowband']])
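If your model returns a plain list or array of probabilities (for example, the output of a softmax layer) rather than a dictionary keyed by class name, you will need to reorder the values yourself. Below is a minimal sketch of one way to do that; the names row_from_probabilities, class_probs, and model_class_order are illustrative assumptions, not part of any required API.
In [ ]:
# Required column order for the scorecard (alphabetical).
SCORECARD_CLASSES = ['brightpixel', 'narrowband', 'narrowbanddrd', 'noise',
                     'squarepulsednarrowband', 'squiggle',
                     'squigglesquarepulsednarrowband']

def row_from_probabilities(uuid, class_probs, model_class_order):
    # class_probs: scores in the order your model emits them.
    # model_class_order: the class names in that same order.
    scores = dict(zip(model_class_order, class_probs))
    return [uuid] + [scores[c] for c in SCORECARD_CLASSES]

# Example usage: fwriter.writerow(row_from_probabilities(uuid, probs, model_class_order))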
The Scoreboard for the Preview Test is here. You can submit up to 10 scorecards to this scoreboard. Also, the Preview test set UUID and class labels are now found in the results folder of this repository. The results folder also contains some code that you can use to score your own scorecard. (This means you could submit a perfect score to the Preview scoreboard -- but please don't do this. Your score will be deleted.)
The Scoreboard for the Final Test is here. You can submit only one scorecard to this scoreboard. The UUID and class labels for the final test set will not be published, so you can use this as a final test of your model.
Please read this walkthrough to sign up for the Scoreboard system, form your team, and submit an example result. An example scorecard is found below.
The scores in this example file are random values between 0 and 1. Typically, your scores will be your classification model's probability estimates for each class. (As such, they should sum to 1.0. To be safe, however, the Log-loss calculator in our Scoreboard will normalize your scores to ensure the values sum to 1.0.)
Example Preview Test Set Scorecard
With this scorecard, you should get exactly the same values as "TeamRandom", which is currently on the Preview Scoreboard.
In this contest we are using the Log-Loss function as a measure of your model's performance.
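For reference, here is a minimal sketch of a multi-class log-loss calculation, assuming true_labels is a list of class-name strings and score_rows holds the seven scores per signal in the scorecard column order. The exact clipping and normalization details used by the Scoreboard's calculator may differ.
In [ ]:
import numpy as np

CLASS_ORDER = ['brightpixel', 'narrowband', 'narrowbanddrd', 'noise',
               'squarepulsednarrowband', 'squiggle', 'squigglesquarepulsednarrowband']

def multiclass_log_loss(true_labels, score_rows, class_order=CLASS_ORDER):
    eps = 1e-15
    total = 0.0
    for label, scores in zip(true_labels, score_rows):
        probs = np.clip(np.asarray(scores, dtype=float), eps, None)
        probs = probs / probs.sum()  # normalize each row so it sums to 1.0
        total += np.log(probs[class_order.index(label)])
    return -total / len(true_labels)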
If you are running your analysis on IBM DSX (IBM Apache Spark), you'll need to get your .csv file to a local machine in order to submit your results to the Scoreboard.
One way is to move your .csv file to the Object Storage account that is provided in DSX. This tutorial shows you the basic steps to move data to and from your Object Storage instance. Then, from DSX (or Bluemix), navigate to your Object Storage container and you can download the file to your local machine with a click.
Another good option is to use Pixiedust. Among its many features, Pixiedust can load a .csv file into a Pandas or Spark DataFrame and display that data in your Jupyter notebook. The display includes an icon that lets you download the data directly. This is probably the easier option, though you will need to "pip install --user pixiedust" and restart your kernel.
In [ ]:
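import pixiedust
import pandas as pd

# Load the scorecard into a Pandas DataFrame (the file has no header row).
df = pd.read_csv(my_output_results, header=None)

# Pixiedust renders the DataFrame with an interactive widget; use the
# download icon in the widget to save the data to your local machine.
display(df)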