When most people imagine LSST, they envision this:
However, for most people LSST will really look like this:
Given where our field is headed, a straw person might argue that only 2 skills are necessary for success in the LSST era:
This straw person may not, in fact, exist in nature, but if you looked hard enough I bet you could find someone who would argue for the validity of this statement.
Indeed, these are precisely the skills that we are hoping to teach you at the DSFP.
Once you master them, you will all be fully practicing data scientists.
But! This conclusion is missing a key ingredient:
Domain Knowledge is an essential ingredient for the data science practitioner.
To "prove" this is the case, let's consider some conclusions that would be derived from the LSST database without a working knowledge of astronomy (and the LSST detectors):
This apparent conclusion is due to the inverse-square law for flux, $f \propto r^{-2}$, combined with the sensitivity limit of the LSST detector. We know fainter galaxies do exist, but they are either too distant or intrinsically dim to be detected by LSST.
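To make the selection effect concrete, here is a minimal sketch. It assumes an isotropic source with no extinction and no cosmological corrections, so that $f = L / (4\pi r^2)$. The function names and the numbers are illustrative choices, not LSST specifications.

```python
import math


def observed_flux(luminosity, distance):
    """Flux from an isotropic source: f = L / (4 pi r^2)."""
    return luminosity / (4.0 * math.pi * distance ** 2)


def max_detectable_distance(luminosity, flux_limit):
    """Distance at which a source of this luminosity drops to the flux limit."""
    return math.sqrt(luminosity / (4.0 * math.pi * flux_limit))


# Arbitrary but self-consistent units (assumption: any unit system works here).
flux_limit = 1.0
for luminosity in (1.0e2, 1.0e4, 1.0e6):
    r_max = max_detectable_distance(luminosity, flux_limit)
    print(f"L = {luminosity:.0e}: detectable out to r = {r_max:.1f}")
```

Because $r_\mathrm{max} \propto \sqrt{L}$, a source that is 100 times less luminous is only detectable out to one tenth of the distance, which (in Euclidean space) is one thousandth of the volume. The faint galaxies are there; the catalog just cannot see most of them.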
This apparent conclusion reflects the typical seeing at Cerro Pachón ($\sim$0.7 arcsec). Stars separated by very small angles ($\theta < 0.3$ arcsec) cannot be resolved by LSST.
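A rough way to see why two stars inside the seeing disk blend into one: model each star as a 1-D Gaussian whose FWHM equals the seeing, add the two profiles, and ask whether there is any dip between them. The Gaussian PSF, equal brightnesses, and noiseless profile are all simplifying assumptions, and `shows_two_peaks` is a hypothetical helper, not part of any LSST software.

```python
import math

# Conversion between the FWHM and the standard deviation of a Gaussian.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def blended_profile(x, separation, fwhm):
    """Sum of two equal-brightness Gaussian PSFs centred at +/- separation / 2."""
    sigma = fwhm * FWHM_TO_SIGMA
    half = 0.5 * separation
    return (math.exp(-0.5 * ((x - half) / sigma) ** 2)
            + math.exp(-0.5 * ((x + half) / sigma) ** 2))


def shows_two_peaks(separation, fwhm, n_grid=2001):
    """True if the summed profile dips at the midpoint between the two stars."""
    midpoint = blended_profile(0.0, separation, fwhm)
    span = separation + 2.0 * fwhm
    xs = [-span + 2.0 * span * i / (n_grid - 1) for i in range(n_grid)]
    peak = max(blended_profile(x, separation, fwhm) for x in xs)
    # Small tolerance so floating-point noise is not mistaken for a dip.
    return peak > midpoint * (1.0 + 1e-6)


seeing = 0.7  # arcsec, the typical value quoted above
for separation in (0.3, 1.5):
    print(f"separation = {separation} arcsec: two peaks? "
          f"{shows_two_peaks(separation, seeing)}")
```

In this idealized model the dip disappears entirely once the separation is below $2\sigma \approx 0.85 \times \mathrm{FWHM}$; with real noise, blends become indistinguishable from single sources well before that.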
The Universe emits more light in the $r$-band than the $y$-band
(i.e., $\sum r_\mathrm{flux} > \sum y_\mathrm{flux}$).
[Red is the color of the Chicago Bulls, who had the greatest basketball player ever, Michael Jordan, so perhaps the Universe is trying to confirm something we already know...]
This apparent conclusion is a bit more subtle than the previous two, and there are multiple factors contributing to this incorrect assertion.* LSST will be far more sensitive in the $r$-band than the $y$-band (lower sky backgrounds and higher detector efficiency are the primary reasons). Thus, many red sources ($m_r - m_y > 0$) will only be detected in the $r$-band.
* Note - If you have a convincing argument that there is more $r$-band flux than $y$-band flux in the Universe let me know.
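Here is a toy catalog that reproduces the effect. Every source is red (brighter in $y$ than in $r$), yet the summed flux of the *detected* sources is larger in $r$. The limiting magnitudes and the source list are made-up values chosen for illustration, not LSST depths.

```python
def mag_to_flux(mag):
    """Relative flux for a magnitude: f is proportional to 10**(-0.4 * m)."""
    return 10.0 ** (-0.4 * mag)


# Hypothetical limiting magnitudes (assumption: r is much deeper than y).
R_LIMIT = 24.5
Y_LIMIT = 22.0

# Toy catalog of (m_r, m_y): one bright source plus many faint ones,
# all with m_r - m_y = 0.5 > 0, i.e. all red.
sources = [(20.0, 19.5)] + [(23.0, 22.5)] * 100


def summed_flux(mags, limit=None):
    """Sum the flux of all sources, or only those brighter than `limit`."""
    return sum(mag_to_flux(m) for m in mags if limit is None or m <= limit)


r_mags = [r for r, _ in sources]
y_mags = [y for _, y in sources]

true_r, true_y = summed_flux(r_mags), summed_flux(y_mags)
cat_r, cat_y = summed_flux(r_mags, R_LIMIT), summed_flux(y_mags, Y_LIMIT)

print(f"true    sum(r) / sum(y) = {true_r / true_y:.2f}")
print(f"catalog sum(r) / sum(y) = {cat_r / cat_y:.2f}")
```

The faint sources fall below the $y$-band limit but not the $r$-band limit, so the ratio flips from below 1 to above 1 purely because of the detection thresholds.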
Domain knowledge (of both astrophysics and the full telescope system) will be an essential ingredient for success once LSST comes online. LSST will push the boundaries for the 3 Vs (volume, variety, and velocity) of data science for astronomy. Success in this era will require substantial working knowledge of both "hacking" and "stats/mathematical analysis", but progress will be impeded without a corresponding expertise in how the data were acquired and why the Universe produced those data in the first place.
Here's a true story...
The Imaginery Telescope has a diameter of 1 AU and it detects all wavelengths of the EM spectrum with 100% efficiency. It is revolutionary in its design, and, as you might imagine, it will serve as a complete game changer for the field of astronomy.
Fundamentally, the thing we care about is measuring fluxes (and positions, though these two are related).
In principle, flux measurements are straightforward: count the number of photons per unit energy per unit time and you're done.
If you want to be more sensitive to faint fluxes, increase the size of your telescope (this is why the Imaginery Telescope is so powerful...)
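The idealized counting picture can be sketched in a few lines. This assumes pure Poisson (photon-counting) noise from the source alone, with no sky background or detector noise, and the photon rate and diameters are illustrative values.

```python
import math


def expected_counts(photon_rate_per_m2, diameter_m, exposure_s, efficiency=1.0):
    """Expected number of photons collected by a circular aperture."""
    area = math.pi * (0.5 * diameter_m) ** 2
    return photon_rate_per_m2 * area * exposure_s * efficiency


def poisson_snr(counts):
    """Signal-to-noise of a pure counting measurement: N / sqrt(N) = sqrt(N)."""
    return math.sqrt(counts)


rate = 0.5       # photons / s / m^2 (hypothetical faint source)
exposure = 30.0  # seconds (hypothetical exposure time)
for diameter in (1.0, 2.0, 4.0, 8.0):
    n = expected_counts(rate, diameter, exposure)
    print(f"D = {diameter:3.0f} m: N = {n:8.1f}, S/N = {poisson_snr(n):5.1f}")
```

In this limit the collected counts scale as $D^2$ and the signal-to-noise as $D$, so doubling the diameter doubles the signal-to-noise for a fixed exposure time.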
In practice, things are not this simple:
In practice, things are not this simple (cont'd):
In summary, we have taken something straightforward — counting — and, out of necessity, have made it far more complicated. We control all the elements of the system, however, and a variety of different measurements can correct for these issues (though this results in more challenging uncertainty estimates).
There is an important element that we cannot control:
The atmosphere really really complicates everything, making calibration a nightmare.
Turbulence distorts the signal, but clouds are the real pain. It's very difficult to measure the absolute attenuation of incident photons due to clouds.
Briefly, we all agree that there is a small handful of stars that are not variable, with precisely known flux. Then, on nights that are "photometric", we observe these "standard stars" along with the sources we care about, make some assumptions about atmospheric attenuation, and compare the relative counts in the detector for the standards and the sources to determine the absolute flux of the sources we care about.
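The comparison against a standard star boils down to a ratio of counts. The sketch below assumes the source and the standard are observed with the same exposure time through the same atmospheric attenuation (which is exactly the assumption that clouds break); the function names and numbers are hypothetical.

```python
import math


def calibrated_flux(source_counts, standard_counts, standard_flux):
    """Scale the known standard-star flux by the ratio of detector counts."""
    return standard_flux * source_counts / standard_counts


def calibrated_magnitude(source_counts, standard_counts, standard_mag):
    """Same comparison in magnitudes: m_src = m_std - 2.5 log10(C_src / C_std)."""
    return standard_mag - 2.5 * math.log10(source_counts / standard_counts)


# Hypothetical observation: a 15th-magnitude standard star yields 100,000 counts,
# while the source of interest yields 1,000 counts in the same exposure time.
m_source = calibrated_magnitude(1.0e3, 1.0e5, 15.0)
print(f"calibrated source magnitude: {m_source:.2f}")
```

Any difference in attenuation between the two observations goes directly into the calibrated flux as a multiplicative error, which is why non-photometric nights are such a headache.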
Break Out Problem 1
Given all these complications, how can one actually make any (informed) inferences about the universe?
Hint - think back to the previous session.
Solution to Break Out 1
Bayes!
We can write down a likelihood function that parameterizes everything (detector noise, atmospheric conditions, optical efficiency, time-dependent emission from the sources we care about), run a giant MCMC to integrate the posterior and marginalize over everything but the astronomical sources to answer any question we might ask of the observations.
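As a cartoon of this idea (and nothing more), here is a tiny random-walk Metropolis sampler for a two-parameter model: a source flux we care about and a sky background we do not. The Gaussian noise model, flat priors, simulated data, and step size are all assumptions made for illustration; a real analysis would have vastly more nuisance parameters.

```python
import math
import random

random.seed(42)

# Simulated data (all values are illustrative assumptions).
TRUE_FLUX, TRUE_SKY, SIGMA = 5.0, 10.0, 1.0
on_source = [random.gauss(TRUE_FLUX + TRUE_SKY, SIGMA) for _ in range(50)]
sky_only = [random.gauss(TRUE_SKY, SIGMA) for _ in range(50)]


def log_posterior(flux, sky):
    """Gaussian log-likelihood with flat priors on both parameters."""
    chi2 = sum((d - flux - sky) ** 2 for d in on_source)
    chi2 += sum((d - sky) ** 2 for d in sky_only)
    return -0.5 * chi2 / SIGMA ** 2


def metropolis(n_steps, step=0.2, burn_in=2000):
    """Random-walk Metropolis; returns the post-burn-in (flux, sky) samples."""
    mean_sky = sum(sky_only) / len(sky_only)
    flux = sum(on_source) / len(on_source) - mean_sky  # crude starting guess
    sky = mean_sky
    logp = log_posterior(flux, sky)
    chain = []
    for i in range(n_steps):
        new_flux = flux + random.gauss(0.0, step)
        new_sky = sky + random.gauss(0.0, step)
        new_logp = log_posterior(new_flux, new_sky)
        # Accept the proposal with probability min(1, p_new / p_old).
        if random.random() < math.exp(min(0.0, new_logp - logp)):
            flux, sky, logp = new_flux, new_sky, new_logp
        if i >= burn_in:
            chain.append((flux, sky))
    return chain


chain = metropolis(20000)

# Marginalizing over the sky simply means ignoring that column of the chain.
flux_samples = [f for f, _ in chain]
flux_mean = sum(flux_samples) / len(flux_samples)
flux_std = math.sqrt(
    sum((f - flux_mean) ** 2 for f in flux_samples) / len(flux_samples)
)
print(f"flux = {flux_mean:.2f} +/- {flux_std:.2f}")
```

The marginal uncertainty on the flux is larger than it would be if the sky were known exactly; that inflation is the price of marginalizing over a nuisance parameter, and it is automatically included in the samples.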
But, but, but...
As previously noted, the velocity and volume of LSST observations are going to be enormous. There isn't enough computing power in the world to sample a posterior that accounts for every photon detected by LSST.
Break Out Problem 2
How long would it take to perform basic processing of all of LSST on your laptop?
The bare minimum for image processing includes bias (subtraction) and flat-field (division) corrections. Assume your laptop has a single 3 GHz processor that requires 1 tick to perform a single subtraction operation and 4 ticks to perform a single division operation.
Solution to Break Out 2
$$\frac{3.2 \times 10^9 \,\mathrm{pix}}{\mathrm{FOV}} \times \frac{\mathrm{FOV}}{10 \, \mathrm{deg}^2} \times 20{,}000\, \mathrm{deg}^2 \\ \times \frac{5\,\mathrm{ticks}}{\mathrm{pix}} \times \frac{\mathrm{s}}{3 \times 10^9 \,\mathrm{ticks}} \times 1000\,\mathrm{obs} \approx 4 \,\mathrm{months}$$
A more realistic solution to Break Out 2
Based on PTF, it takes $\sim$30 s to fully process (bias, flat-field, astrometry, photometry, image subtraction...) 1M pixels (much of this is tied up in I/O). Using the same numbers from the previous example ($6.4 \times 10^{15}$ pixels in total), LSST will take $\sim$6,000 yr to process.
P.S. Fortunately we can parallelize these calculations.
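The tick-counting estimate from the solution to Break Out 2 is easy to reproduce, and wrapping it in a function makes it simple to swap in other per-pixel costs or core counts. The constants are the ones quoted in the problem; the `n_cores` argument assumes perfect (embarrassingly parallel) scaling, which is an idealization.

```python
PIXELS_PER_FOV = 3.2e9       # pixels in one field of view
FOV_AREA_DEG2 = 10.0         # area of one field of view
SURVEY_AREA_DEG2 = 20_000.0  # total survey footprint
N_OBS = 1000                 # observations of each field
TICKS_PER_PIXEL = 5          # bias subtraction + flat-field division
CLOCK_HZ = 3.0e9             # ticks per second on the laptop
SECONDS_PER_DAY = 86_400.0


def total_pixels():
    """Total number of pixels read out over the whole survey."""
    return PIXELS_PER_FOV / FOV_AREA_DEG2 * SURVEY_AREA_DEG2 * N_OBS


def processing_seconds(seconds_per_pixel, n_cores=1):
    """Wall-clock processing time, assuming perfect scaling across cores."""
    return total_pixels() * seconds_per_pixel / n_cores


naive_s = processing_seconds(TICKS_PER_PIXEL / CLOCK_HZ)
print(f"single core: {naive_s / SECONDS_PER_DAY:.0f} days")
print(f"100 cores:   "
      f"{processing_seconds(TICKS_PER_PIXEL / CLOCK_HZ, 100) / SECONDS_PER_DAY:.1f} days")
```

The same function accepts a per-pixel cost measured from a real pipeline in place of the tick count. Keep in mind that I/O-bound processing will not scale perfectly with the number of cores, so the `n_cores` division is an optimistic bound.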