Summarizing the excitement for machine learning with one figure:
So... what is it?
Bishop's textbook provides a useful four-word summary:
Briefly, machine learning algorithms build highly complex, non-linear models (i.e., these things are black boxes) that map input information to predicted outcomes.
Machine learning is fundamentally concerned with classification (i.e., predicting outcomes).
As astronomers and astrophysicists, this is great news!
The history of astronomy is a long story of classification. Essentially, point a telescope at some new location in the sky, and there is a decent chance you might find something that has never been seen before. Now, figure out how this new thing relates to all the things that you already know (classification).
Astronomy is (and has been) an observationally/experimentally led field. The typical pattern is that observers find some weird thing, and then theorists try to explain what is going on (there are obviously exceptions; the predictions of kilonovae prior to the LIGO detection of a neutron star-neutron star merger are a recent example).
This makes us very different from physics, where theory generates predictions that then lead the observations (e.g., Higgs boson, general relativity, etc.).
Thus, if machine learning is fundamentally about classification, and astronomers spend all their time classifying objects, that must mean that machine learning and astronomy are a match made in heaven.
Right?
This is where I say – not so fast.$^\dagger$
Even though astronomy is an observationally-led, classification-concerned field, ultimately, like physicists, we care about the development of a physical understanding of how the Universe works. And this is not what machine learning is fundamentally built to do.
$^\dagger$ A long list of people would dispute this assertion.
In other words,
machine learning $\longleftrightarrow$ prediction
astronomy $\longleftrightarrow$ inference
And thus, astronomy and machine learning may not be a match made in heaven.
Adam's 1 slide summary of supervised machine learning:
True positive (TP) = + classified as +
False positive (FP) = - classified as + (type I error)
True negative (TN) = - classified as -
False negative (FN) = + classified as - (type II error)
Take a few minutes to discuss with your partner
Ultimately, this depends on the problem at hand. If you are building a model to detect cancer, false negatives are really, really bad. If you are building a model to find extremely metal-poor stars, and you then obtain a 10 hr spectrum on a 10 m telescope to confirm your candidates, false positives are really, really bad.
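As a toy illustration (the label and prediction arrays below are made up; positives are encoded as 1 and negatives as 0), the four counts can be tallied directly:

```python
import numpy as np

# hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

TP = np.sum((y_true == 1) & (y_pred == 1))  # + classified as +
FP = np.sum((y_true == 0) & (y_pred == 1))  # - classified as + (type I error)
TN = np.sum((y_true == 0) & (y_pred == 0))  # - classified as -
FN = np.sum((y_true == 1) & (y_pred == 0))  # + classified as - (type II error)
```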
From TP, FP, TN, and FN, it is possible to calculate several useful metrics for evaluating your model:
Accuracy = (TP + TN)/(TP + FP + TN + FN)
True Positive Rate (TPR) = TP/(TP + FN)
False Positive Rate (FPR) = FP/(TN + FP)
By varying the classification threshold, it is possible to determine the TPR as a function of FPR, also known as the Receiver Operating Characteristic (ROC) curve
Precision = TP/(TP + FP)
This is an incredibly useful metric for astronomical applications, because it quantifies how much follow-up effort is wasted on false positives (follow-up is expensive; e.g., we can only obtain spectra for a small fraction of LSST objects, so how do we choose which ones to observe?)
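Continuing the toy example above, these metrics follow directly from the four counts; scikit-learn's `roc_curve` (one possible tool, assuming it is installed) sweeps the classification threshold when given the model's scores (the scores below are again made up):

```python
from sklearn.metrics import roc_curve

accuracy  = (TP + TN) / (TP + FP + TN + FN)
TPR = TP / (TP + FN)        # true positive rate (a.k.a. recall or completeness)
FPR = FP / (TN + FP)        # false positive rate
precision = TP / (TP + FP)  # a.k.a. purity

# hypothetical classification scores from the model for the same 8 sources
y_score = np.array([0.9, 0.4, 0.2, 0.7, 0.8, 0.1, 0.95, 0.3])
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # ROC curve: TPR vs FPR at every threshold
```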
Cross-validation (CV) – a method to estimate any of the above metrics using the training set alone
Basic idea for $k$-fold CV: remove 1/$k$ of the training set, train the model on the remaining data, and predict labels for the withheld 1/$k$. Repeat $k$ times.
This yields an out-of-sample prediction for every source in the training set.
CV enables the construction of the confusion matrix:
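A minimal sketch of this workflow, assuming scikit-learn is available (the random forest and the synthetic data are purely illustrative stand-ins for your favorite model and training set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# stand-in for a real training set with known labels
X, y = make_classification(n_samples=500, n_features=10, random_state=23)

clf = RandomForestClassifier(n_estimators=100, random_state=23)
y_cv = cross_val_predict(clf, X, y, cv=10)  # 10-fold CV: an out-of-sample prediction for every source
cm = confusion_matrix(y, y_cv)              # rows = true class, columns = predicted class
```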
Furthermore, nearly every survey is guaranteed to have a biased training set. Any new survey is likely to be observing the sky in some unique fashion, and thus probing parameter space in some new way. This new survey mode will therefore find sources that were not present in the previous survey.
For example, a new survey that goes 1.5 mag deeper than the previous survey will find a lot of galaxies at higher redshifts. Furthermore, at fixed redshift, the new survey will find more intrinsically faint galaxies. These high-redshift and intrinsically faint galaxies will not have counterparts in a training set based on the previous survey, and therefore they will be classified incorrectly.
This is known as sample selection bias and it is nasty.
Based on very sound theoretical reasoning, you expect to find the following in a sample of 100 stars: 60 orange stars, 30 purple stars, and 10 grey stars.
Data from another survey includes 1000 orange stars, 200 purple stars, and 14 grey stars.
Furthermore, 14 of the orange stars, 7 of the purple stars, and 5 of the grey stars have features that are missing.
You need to build a model to classify stars as either orange, purple, or grey. What do you do?
Take a few minutes to discuss with your partner
There is no correct answer here. Given what we know, I'd do the following:
Machine learning algorithms are extremely powerful (the next generation may never need to learn to drive)
Machine learning = prediction; astronomy = inference (be careful about equating the two)
All astronomical training sets are biased – very difficult to properly interpret (some) predictions as a result