The Leading Edge

Machine learning contest 2016

Welcome to an experiment!

Your mission, should you choose to accept it, is to make the best lithology prediction you can. We want you to try to beat the accuracy score Brendon Hall achieved in his Geophysical Tutorial (TLE, October 2016).

First, read the open access tutorial by Brendon in the October issue of The Leading Edge.

Here's the text of that box again:

I hope you enjoyed this month's tutorial. It picks up on a recent wave of interest in artificial intelligence approaches to prediction. I love that Brendon shows how approachable the techniques are: the core part of the process only amounts to a few dozen lines of fairly readable Python code. All the tools are free and open source; it's just a matter of playing with them and learning a bit about data science.

In the blind test, Brendon's model achieves an accuracy of 43% with exact facies. We think the readers of this column can beat this — and we invite you to have a go. The repository at github.com/seg/2016-ml-contest contains everything you need to get started, including the data and Brendon's code. We invite you to find a friend or two or more, and have a go!

To participate, fork that repo, and add a directory for your own solution, naming it after your team. You can make pull requests with your contributions, which must be written in Python, R, or Julia. We'll run them against the blind well — the same one Brendon used in the article — and update the leaderboard. You can submit solutions as often as you like. We'll close the contest at 23:59 UT on 31 January 2017. There will be a goody bag of completely awesome and highly desirable prizes for whoever is at the top of the leaderboard when the dust settles. The full rules are in the repo.

Have fun with it, and good luck!

Now for the code

All the code and data to reproduce everything in that article is right here in this repository. You can read the code in a Jupyter Notebook here...

[**Facies_classification.ipynb**](Facies_classification.ipynb)

See the February issue of The Leading Edge for Matt Hall's user guide to the tutorials; it explains how to run a Jupyter Notebook.

See Running the notebook live (below) for information on running that notebook live right now this minute in your web browser.
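As a rough sketch of what the accuracy score in the contest measures: we assume here that "accuracy with exact facies" means the fraction of blind-well samples whose predicted facies label exactly matches the true label. The notebook remains the authority on how the score is actually computed. The function name `exact_accuracy` and the facies codes below are made up for illustration and are not taken from the contest data.

```python
def exact_accuracy(true_facies, predicted_facies):
    """Fraction of samples whose predicted facies equals the true facies.

    Assumption: this is what "accuracy with exact facies" refers to.
    """
    if len(true_facies) != len(predicted_facies):
        raise ValueError("inputs must have the same length")
    if not true_facies:
        raise ValueError("inputs must not be empty")
    # Count the positions where prediction and truth agree exactly.
    matches = sum(t == p for t, p in zip(true_facies, predicted_facies))
    return matches / len(true_facies)


# Made-up integer facies codes, for illustration only (not contest data).
y_true = [1, 2, 2, 3, 5, 8, 8, 9]
y_pred = [1, 2, 3, 3, 5, 6, 8, 8]

score = exact_accuracy(y_true, y_pred)
print(f"accuracy = {score:.3f}")
```

A score like this rewards only exact matches, so a prediction of a geologically similar neighbouring facies counts as a miss just like any other error.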

Entering the contest

Find a friend or two or ten (optional) and form a team.
Fork the repo. This gives you a copy that you can make pull requests from (a pull request is how you notify us that you want to send us an entry to the contest).
Use Jupyter Notebook (to make our life easy!) with Python, R, or Julia kernels, or write scripts and put them in the repo in a directory named after your team.
When you have a good result, send it to us by making a pull request.
Everyone can see your entry. If you're not familiar with open source software, this might feel like a bug. It's not, it's a feature. If it's good, your contribution will improve others' results. Welcome to reproducible science!

Running the notebook live

To make it even easier to try machine learning for yourself, you can launch this notebook on mybinder.org and run it there. You can load the data, change the code, and do everything... except enter the contest. Everything on your mybinder.org machine is temporary. If you make something awesome, be sure to use File > Download as... > Notebook (.ipynb) to save it locally. Then you can fork the repo in GitHub, add your new notebook, and make your pull request.

Rules

We've never done anything like this before, so there's a good chance these rules will become clearer as we go. We aim to be fair at all times, and reserve the right to make judgment calls for dealing with unforeseen circumstances.

You must submit your result as code and we must be able to run your code.
The result we get with your code is the one that counts as your result.
To make it more likely that we can run it, your code must be written in Python or R or Julia.
The contest is over at 23:59:59 UT (i.e. midnight in London, UK) on 31 January 2017. Pull requests made after that time won't be eligible for the contest.
If you can do even better with code you don't wish to share fully, that's really cool, nice work! But you can't enter it for the contest. We invite you to share your result through your blog or other channels... maybe a paper in The Leading Edge.
This document and documents it links to will be the channel for communication of the leading solution and everything else about the contest.
This document contains the rules. Our decision is final. No purchase necessary. Please exploit artificial intelligence responsibly.