Photons != Science,
and the Challenges of Turning the Former into the Latter


Adam A Miller
(CIERA/Northwestern/Adler)

LSSTC DSFP Session 5, 22 Jan 2018

Introduction


Session 5 is focused on image processing and, informally, attempts to answer the question: what happens between the glass and the database?

When most people imagine LSST, they envision this:

(credit: Kavli foundation)

However, for most people LSST will really look like this:

(credit: PS1 casjobs)

Given where our field is headed, a straw person might argue that only 2 skills are necessary for success in the LSST era:

  • Programming skills (Python, SQL, etc)
  • Statistical knowledge (machine learning, Bayes, etc)

This straw person may, in fact, not exist in nature, but if you looked hard enough I bet you could find someone who would argue the validity of this statement.

Indeed, these are precisely the skills that we are hoping to teach you at the DSFP.

Once you master them, you will all be fully practicing data scientists.

But! This conclusion is missing a key ingredient:

(credit: Drew Conway)

Domain Knowledge is an essential ingredient for the data science practitioner.

To "prove" this is the case, let's consider some conclusions that would be derived from the LSST database without a working knowledge of astronomy (and the LSST detectors):

Incorrect Conclusion #1

There are no galaxies fainter than $i \approx 27.5 \, \mathrm{mag}$.

[Perhaps this signals the edge of the universe...]

This apparent conclusion is due to the inverse-square law for flux, $f \propto r^{-2}$, combined with the sensitivity limit of the LSST detector. We know fainter galaxies do exist, but they are either too distant or intrinsically dim to be detected by LSST.
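A quick back-of-the-envelope sketch of this effect, using the distance modulus form of the inverse-square law (the absolute magnitude and the source are illustrative assumptions, not real LSST numbers):

```python
import math

def apparent_mag(abs_mag, d_pc):
    """Apparent magnitude via the distance modulus, m - M = 5 log10(d / 10 pc),
    which encodes the inverse-square law f ~ r^-2 in magnitude units."""
    return abs_mag + 5 * math.log10(d_pc / 10.0)

I_LIMIT = 27.5  # approximate coadded i-band depth quoted above

# A hypothetical dwarf galaxy with absolute magnitude M_i = -14 (assumed)
for d_mpc in (10, 1000, 10000):
    m = apparent_mag(-14.0, d_mpc * 1e6)
    status = "detected" if m < I_LIMIT else "below the detection limit"
    print(f"d = {d_mpc:>5d} Mpc -> i = {m:4.1f} mag ({status})")
```

The point: the apparent-magnitude floor is a property of the detector plus the inverse-square law, not of the galaxy population.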

Incorrect Conclusion #2

Two stars cannot be closer than $\sim$0.35 arcsec in the sky.

[Perhaps there is some repulsive force between stars that keeps them separated...]

This apparent conclusion reflects the typical seeing at Cerro Pachon ($\sim$0.7 arcsec). Very nearby stars ($\theta \lesssim 0.35$ arcsec) cannot be resolved by LSST.
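As a toy illustration (the half-FWHM blending criterion used here is a rough rule of thumb assumed for this sketch, not the actual LSST deblending algorithm):

```python
SEEING_FWHM = 0.7  # arcsec, typical seeing at Cerro Pachon (from the text)

def resolved(separation_arcsec, fwhm=SEEING_FWHM):
    """Return True if two point sources can be separated, under the
    assumed rule of thumb that pairs closer than ~half the PSF FWHM
    blend into a single detected source."""
    return separation_arcsec >= 0.5 * fwhm

print(resolved(1.0))   # well-separated pair -> True
print(resolved(0.2))   # blended pair -> False
```

Any pair of stars closer than the blending limit enters the database as one source, so the catalog alone suggests a minimum stellar separation on the sky.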

Incorrect Conclusion #3

The Universe emits more light in the $r$-band than the $y$-band
(i.e., $\sum r_\mathrm{flux} > \sum y_\mathrm{flux}$).

[Red is the color of the Chicago Bulls, who had the greatest basketball player ever, Michael Jordan, so perhaps the Universe is trying to confirm something we already know...]

This apparent conclusion is a bit more subtle than the previous two, and there are multiple factors contributing to this incorrect assertion.* LSST will be far more sensitive in the $r$-band than the $y$-band (lower sky backgrounds and higher detector efficiency are the primary reasons). Thus, many red sources ($m_r - m_y > 0$) will only be detected in the $r$-band.

* Note - If you have a convincing argument that there is more $r$-band flux than $y$-band flux in the Universe let me know.
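To see the selection effect at work, here is a minimal sketch; the 5$\sigma$ depths below are rough single-visit values adopted purely for illustration, not official LSST specifications:

```python
# Assumed, illustrative single-visit 5-sigma depths (mag)
R_DEPTH = 24.7
Y_DEPTH = 22.1

def bands_detected(m_r, r_minus_y):
    """Which bands detect a source of r-band magnitude m_r and
    color r - y? Red sources near the limit appear in r only."""
    m_y = m_r - r_minus_y
    return {"r": m_r < R_DEPTH, "y": m_y < Y_DEPTH}

print(bands_detected(24.0, 0.5))  # faint red source: r only
print(bands_detected(20.0, 0.5))  # bright source: both bands
```

Summing fluxes over only the detected sources then inflates the $r$-band total relative to $y$, even if the underlying populations were identical.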

Upshot

Domain knowledge (of both astrophysics and the full telescope system) will be an essential ingredient for success once LSST comes online. LSST will push the boundaries for the 3 Vs (volume, variety, and velocity) of data science for astronomy. Success in this era will require substantial working knowledge of both "hacking" and "stats/mathematical analysis", but progress will be impeded without a corresponding expertise in how the data were acquired and why the Universe produced those data in the first place.

Telescopes






Here's a true story...

The Imaginary Telescope has a diameter of 1 AU and it detects all wavelengths of the EM spectrum with 100% efficiency. It is revolutionary in its design, and, as you might imagine, it will serve as a complete game changer for the field of astronomy.

Fundamentally, the thing we care about is measuring fluxes (and positions, though these two are related).

In principle, flux measurements are straightforward: count the number of photons arriving per unit time (per unit area) and you're done.

If you want to be more sensitive to faint sources, increase the size of your telescope (this is why the Imaginary Telescope is so powerful...)

In practice, things are not this simple:

  • a telescope's optical elements are not 100% efficient
    (we can measure these inefficiencies and correct for them $\rightarrow$ complicates the uncertainties beyond Poisson)
  • our detectors introduce noise to our measurements
  • detectors eventually stop counting photons
    (saturation)

In practice, things are not this simple (cont'd):

  • cannot measure absolute position of photons
    (Heisenberg)
  • further complicated by pixelated detectors
    (cannot measure continuous distribution)
  • shutter opening and closing produces a variable exposure time across focal plane
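A toy noise model sketching a few of the detector effects listed above (the throughput, read-noise, and full-well values are illustrative assumptions, not LSST specifications):

```python
import numpy as np

rng = np.random.default_rng(42)

THROUGHPUT = 0.5     # assumed fraction of photons the optics deliver
READ_NOISE = 5.0     # assumed electrons RMS added by the readout
FULL_WELL = 100_000  # assumed counts at which a pixel saturates

def observe(true_photon_rate, exptime):
    """Simulate detected counts for a source with a given photon rate:
    imperfect optics, shot noise, read noise, and saturation."""
    expected = true_photon_rate * exptime * THROUGHPUT
    counts = rng.poisson(expected)               # Poisson (shot) noise
    counts = counts + rng.normal(0, READ_NOISE)  # detector read noise
    return min(counts, FULL_WELL)                # pixel saturates

print(observe(1e3, 30))  # faint source: noisy but unsaturated
print(observe(1e6, 30))  # bright source: clipped at the full well
```

Even this cartoon shows why the uncertainty budget is no longer purely Poisson, and why the brightest sources stop being useful photon counters.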

In summary, we have taken something straightforward — counting — and, out of necessity, have made it far more complicated. We control all the elements of the system, however, and a variety of different measurements can correct for these issues (though this results in more challenging uncertainty estimates).

There is an important element that we cannot control:

The atmosphere really really complicates everything, making calibration a nightmare.

Turbulence distorts the signal, but clouds are the real pain. It's very difficult to measure the absolute attenuation of incident photons due to clouds.

Briefly: we all agree that there is a small set of stars that are not variable and whose fluxes are precisely known. Then, on nights that are "photometric", we observe these "standard stars" along with the sources we care about, make some assumptions about atmospheric attenuation, and compare the relative counts in the detector for the standards and our science targets to determine the absolute flux of the sources we care about.
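A minimal sketch of this differential-photometry bookkeeping (the counts and magnitudes below are made up for illustration):

```python
import math

def zero_point(std_counts, std_mag):
    """Photometric zero point: the magnitude that would produce
    1 count in the detector, derived from a standard star."""
    return std_mag + 2.5 * math.log10(std_counts)

def calibrated_mag(src_counts, zp):
    """Convert detected counts for a science target into a magnitude."""
    return zp - 2.5 * math.log10(src_counts)

# A standard star of known magnitude 12.0 yields 1e6 counts tonight...
zp = zero_point(1e6, 12.0)
# ...so a target yielding 1e4 counts (100x fainter, i.e., 5 mag) is:
print(calibrated_mag(1e4, zp))  # -> 17.0
```

Note that the zero point absorbs the (assumed stable) atmospheric attenuation for that night, which is exactly why non-photometric nights are such a headache.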

Break Out Problem 1

Given all these complications, how can one actually make any (informed) inferences about the universe?

Hint - think back to the previous session.

Solution to Break Out 1

write your solution here

But, but, but...

Speed Matters






As previously noted, the velocity and volume of LSST observations are going to be enormous. There isn't enough computing power in the world to sample a posterior that accounts for every photon detected by LSST.

Break Out Problem 2

How long would it take to perform basic processing of all of LSST on your laptop?

The bare minimum for image processing includes bias (subtraction) and flat-field (division) corrections. Assume your laptop has a single 3 GHz processor that requires 1 tick to perform a single subtraction operation and 4 ticks to perform a single division operation.

Solution to Break Out 2

write your solution here

Conclusions

Most astronomers will only "know" LSST via the database.

Domain knowledge will nevertheless be vitally important.

A lot of complicated analysis happens between the glass and the database.