Machine Learning in Astronomy

A wide range of powerful "machine learning" algorithms have been developed by computer scientists and statisticians, in an attempt to automate the analysis of data, especially large datasets.

Machine learning has emerged from the world of engineering, where costly decisions have to be made: here, accuracy of prediction is at a premium, and academic understanding of the system is of secondary importance.

When will machine learning be most useful?
- When you suspect the data can tell you more about the uncertainties than your physical assumptions can;
- When the size and/or complexity of the data means that simple physical models for it are unlikely to yield meaningful results;
- When you want to go one step further in your initial data exploration and visualization (we didn't get into unsupervised machine learning here, but exploration is a good way of thinking about this class of methods);
- When you need to make reliable predictions (think: photo-z's) or decisions (think: quasar target selection) based on large and/or complex data. Much of industrial data science is occupied with this activity.

When might more thought be required, in an astrophysical context?
- When you bring significant prior physical knowledge to the problem. Can you adapt a machine learning algorithm to accommodate your prior knowledge?
- When the uncertainty on an inference (or prediction) is as important as the parameter value itself. Can you unpack the hidden assumptions in a machine learning algorithm, and then make sure the uncertainties correctly propagate the information you have?
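
One simple way to start probing the uncertainty on a prediction is to refit the model on bootstrap resamples of the training set and look at the spread of the resulting predictions. The sketch below is only an illustration under stated assumptions: the data are synthetic (a made-up linear colour-redshift relation with Gaussian scatter, not real photometry), the model is a plain polynomial fit standing in for whatever machine learning algorithm you are using, and the helper name `bootstrap_predictions` is hypothetical.

```python
import numpy as np


def bootstrap_predictions(x_train, y_train, x_new, n_boot=200, degree=1, seed=0):
    """Refit a polynomial on bootstrap resamples and predict at x_new each time.

    Hypothetical helper for illustration; returns an array of shape
    (n_boot, len(x_new)).
    """
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_boot, len(x_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample the training set with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        preds[b] = np.polyval(coeffs, x_new)
    return preds


# Synthetic stand-in for a colour -> redshift relation (assumed, not real data).
rng = np.random.default_rng(42)
colour = rng.uniform(0.0, 2.0, size=300)
redshift = 0.5 * colour + 0.1 + rng.normal(0.0, 0.05, size=300)

colour_new = np.array([0.5, 1.0, 1.5])
preds = bootstrap_predictions(colour, redshift, colour_new)

z_mean = preds.mean(axis=0)  # point prediction at each new colour
z_std = preds.std(axis=0)    # spread across refits

for c, m, s in zip(colour_new, z_mean, z_std):
    print(f"colour={c:.1f}: z = {m:.3f} +/- {s:.3f}")
```

The spread reported here reflects only how much the fitted model changes when the training set is resampled. It says nothing about the intrinsic scatter in the relation or about measurement errors on the inputs, which is exactly the kind of hidden assumption that needs unpacking before such a number is quoted as an uncertainty.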

Machine learning in astronomy is a growing field: there are some very interesting opportunities here!