Machine learning data sets, benchmarks and competitions

Datasets

Most famous datasets:

List of datasets:

Sound data:

Benchmarks and competitions

Most famous competitions:

  • Kaggle: The most popular competition platform. It also hosts a large collection of data sets. More information on Wikipedia.

List of competitions:

TODO (from http://www.chalearn.org/challenges.html):

  • Tunedit: A similar platform, more academically oriented (phased out?).
  • DrivenData: For non-profit challenges.
  • Codalab: For academic challenges of greater complexity.
  • Beat: An EU-sponsored platform.
  • Epidemium: Challenges in epidemiology.
  • Pascal challenges: The Pascal network sponsors several challenges in machine learning.
  • Challenges.gov: Challenges sponsored by the US Government.
  • Ecole Normale Superieure: Datasets and challenges.
  • Beaker notebook: Converts back and forth between R, Python and JavaScript.
  • Cortana Intelligence: Azure ML platform.
  • RAMP studio: The Paris-Saclay CDS Rapid Analytics Model Prototyping platform.
  • Synapse: The platform on which DREAM challenges are organized.

Collaborative platforms:

  • OpenML: Share reusable ML frameworks.
  • MLcomp: Compare machine learning programs.
  • E-lico: Data mining portal.
  • H2O: Open-source predictive analytics platform.
  • KNIME: Data mining platform.
  • Quantopian: Financial data simulator + ML tutorials.

Crowdsourcing:

  • Amazon Mechanical Turk: Lets you hire people from all around the world to solve your tasks. Often used to label computer vision data.
  • Crowdflower: Hire people to collect, filter and enhance data.

International conferences hosting challenges:

  • WCCI: World Congress on Computational Intelligence.
  • ICDAR: International Conference on Document Analysis and Recognition, a biennial conference that hosts a contest in printed text recognition. Feature extraction/selection is a key component of winning such a contest.
  • ICPR: International Conference on Pattern Recognition. A face recognition contest was organized in conjunction with ICPR 2004.
  • ICMI: Competitions on multimodal interaction.

Popular challenges:

  • NNGC: Neural Network Grand Challenge in time series forecasting.
  • Netflix: The 1-million-dollar Netflix Prize, which attracted a lot of attention and broke new ground for recommender systems.
  • Robocup: Robots that play soccer; a contest held yearly.
  • DELVE: A platform developed at the University of Toronto to benchmark machine learning algorithms.
  • CAMDA: Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.
  • TREC: Text Retrieval Conference, organized every year by NIST. The conference is organized around the results of a competition. Past winners have had to address feature extraction/selection effectively.
  • CASP: An important competition in protein structure prediction called Critical Assessment of Techniques for Protein Structure Prediction.
  • ICAPS competitions: Competitions in planning and knowledge engineering.
  • MEDIAEVAL benchmarks: Benchmarking Initiative for Multimedia Evaluation. Data sharing in multimediacommons (with incremental annotations). Uses Amazon web services to allow experimentation in the cloud.
  • DREAM: Dialogue for Reverse Engineering Assessments and Methods. Challenges in gene network reconstruction.
  • AVEC: Audio visual Emotion Recognition Challenge and Workshop.
  • CAFA: Predicting function of biological macromolecules (as well as gene-disease associations).

Data resources:

  • KEEL: Knowledge Extraction based on Evolutionary Learning.
  • IO Data Science: Datasets of Paris-Saclay University.

Conferences

  • IJCAI (International Joint Conference on Artificial Intelligence) A*
  • ICML (International Conference on Machine Learning) A*
  • ECML (European Conference on Machine Learning) A
  • ACML (Asian Conference on Machine Learning)
  • NIPS (Advances in Neural Information Processing Systems) A*
  • COLT (Annual Conference on Computational Learning Theory) A*

  • ECAI (European Conference on Artificial Intelligence) A

  • ICONIP (International Conference on Neural Information Processing) A
  • IJCNN (IEEE International Joint Conference on Neural Networks) A