NOTE: This assumes that RISE is installed. If so, just click the bar chart in the toolbar above, and the presentation should begin. In edit mode, you can easily change the slide types of the cells below with:

shift-i : toggle slide
shift-b : toggle subslide
shift-g : toggle fragment

Gravitational-wave astronomy with real data

Matched filtering

Introduction

Gravitational waves (GWs) are like sound waves
- GW medium is spacetime itself (no matter needed)
- Transverse rather than longitudinal

LIGO is an enormous microphone
- We could actually hear (very loud) GWs with our own ears
- We can hear LIGO data

But there’s a lot of noise
- Earthquakes, storms, logging, traffic, shotguns

We need a good (and quantitative) way of digging signal out of the noise

Outline

Overtly:

Sounds of gravitational waves
Sounds of LIGO
FFTs
Matched filtering

Covertly:

Data analysis
Python
Jupyter notebook (JupyterLab, in the future)
GitHub

The ostensible purpose of this talk is to introduce you to matched filtering, which is the basic method that GW detectors use in searching for and measuring GW signals. But that's a pretty narrow purpose, and most of you will not get involved in GWs. So I also want to give you some exposure to a few other ideas that hopefully will have broader application for all of you when you go into other fields. And we'll use matched filtering as a way into those other ideas.

So ostensibly, the outline of this activity starts off with introducing you to the sounds of GWs. I'll make this analogy that LIGO is just an extraordinary microphone, and we'll listen to the sounds a GW makes, and the sounds of the LIGO instrument itself. Then, we'll see that FFTs are a really powerful way of analyzing these sounds, and matched filtering is a really sensitive way of measuring those FFTs.

But of course, while we're doing that, I also want to give you a little flavor of data analysis. Pretty much all of you either are working on or will work on data analysis at some point, and there are some very general rules and ideas that can be applied to basically any type of data analysis. So I'll want to use this stuff as a sort of analogy for other types of data analysis, so hopefully you can apply these principles to your own work.

Jupyter notebooks

Like Mathematica, Maple, Matlab, ... but better
Runs a live session of python (or basically any other language)
Manipulate files, write code, interact with data, make plots, take notes, give presentations, etc.

So first, I just want to introduce how we're working here. Who here has used python before?

Python is really dynamic, and powerful, but also a lot simpler than most other languages. It's not always the fastest at any computation, but since most of your time is spent writing programs (rather than running them), that's not usually a big problem. And new developments are making python just as fast as even C/C++ in a lot of cases.

Now, we throw in the Jupyter notebook. Who here has used Mathematica before?

Well the Jupyter notebook looks and acts like a nice version of Mathematica. The notebook is connected to a live session of python. It has these code cells that you run, and you can see the results. So click on the first cell, and hit Shift+Enter.

Mathematica is better at symbolic math (for now). But otherwise, python is more useful and general. And the Jupyter notebook makes it better at interactive stuff. So here's my unsolicited advice: if you're deciding what programming language to learn, go with python. There are nerdier options out there, but not many more broadly useful options. And if you're using python interactively, you'll want to use Jupyter (which is just a different interface) or -- better yet -- the Jupyter notebook.

You don't need to know python
Put cursor in grey boxes and hit Shift-Enter
username: workshopguest
password: workshopguest

(Discrete) Fourier transforms

$$ s(t) = \sum_{f_i} \left[ \tilde{s}_{f_i}\, \sin (2\,\pi\,f_i\,t + \phi_{f_i}) \right] $$

Discrete frequencies: $f_i$

FT amplitude: $\tilde{s}_{f_i}$

FT phase: $\phi_{f_i}$
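
If you want to see those three ingredients come out of an actual FFT, here is a minimal sketch using NumPy. The test signal (two sinusoids with made-up amplitudes, frequencies, and phases) and the sample rate are assumptions chosen just for illustration; they are not taken from the LIGO data.

```python
import numpy as np

# Hypothetical test signal: two sinusoids with made-up amplitudes, frequencies, phases
sample_rate = 1024  # samples per second (assumed)
duration = 1.0      # seconds, so the frequency bins are spaced 1 Hz apart
t = np.arange(int(sample_rate * duration)) / sample_rate
s = 2.0 * np.sin(2 * np.pi * 50 * t + 0.3) + 0.5 * np.sin(2 * np.pi * 120 * t + 1.0)

# Real-input FFT: one complex coefficient per non-negative frequency
s_tilde = np.fft.rfft(s)
freqs = np.fft.rfftfreq(len(s), d=1 / sample_rate)

# Scale so a unit-amplitude sinusoid gives amplitude 1
# (this factor of 2 is not right for the zero-frequency and Nyquist bins)
amplitudes = 2 * np.abs(s_tilde) / len(s)
# The FFT measures phase relative to a cosine; add pi/2 to match the sine convention
phases = np.angle(s_tilde) + np.pi / 2

for f in (50, 120):
    i = int(np.argmin(np.abs(freqs - f)))
    print(f"{freqs[i]:.0f} Hz: amplitude {amplitudes[i]:.3f}, phase {phases[i]:.3f}")
```

The frequencies here sit exactly on FFT bins, so the amplitudes and phases come back cleanly. With real data they generally will not, and the power of one tone spreads over neighboring bins.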

2. Listening to gravitational waves

Can you hear any of the features that you see in the plots?
Why does the amplitude of the Fourier transform decrease as the frequency increases?
Why does it sound so quiet at the beginning and loud at the highest frequency?

3. Listening to detector data

What happened? Why is this signal nothing at all like the pure GW we saw above?
Can you see the signal in the plots?
Can you hear the signal?
What are the sources of noise here?
The raw data sounds generally pretty high-pitched. But looking in the frequency domain, we see that most of the power is at very low frequencies. Why doesn't the raw data sound low-pitched?

Matched filtering

$$ {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}} $$

$$ {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} \right] $$

$$ \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} \right] $$

$$ \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right] $$

$$ \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right]\, \Delta f_i $$

Parseval's / Plancherel's Theorem says that this is precisely the same as the integral in the time domain:

$$ \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right]\, \Delta f_i = \sum_{t_{i}} d_{t_{i}}\, s_{t_{i}-\delta t}\, \Delta t_i $$
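
You don't have to take the theorem on faith. Here is a quick numerical check at zero time offset, sketched with NumPy. The random arrays standing in for $d$ and $s$, the sample rate, and the normalization (FFT multiplied by $\Delta t$, summed over both positive and negative frequencies) are all assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
sample_rate = 4096.0   # Hz (assumed)
dt = 1 / sample_rate   # time step
df = sample_rate / n   # frequency step

# Random stand-ins for the data and the signal model
d = rng.normal(size=n)
s = rng.normal(size=n)

# Time-domain side: sum over t of d(t) s(t) dt
time_side = np.sum(d * s) * dt

# Frequency-domain side, with the FFT scaled by dt to mimic a continuous transform
d_tilde = np.fft.fft(d) * dt
s_tilde = np.fft.fft(s) * dt
# |d_tilde| |s_tilde| cos(phase difference) is the real part of d_tilde * conj(s_tilde)
freq_side = np.sum((d_tilde * np.conj(s_tilde)).real) * df

print(time_side, freq_side)
```

The two numbers should agree to rounding error, whatever you put in for `d` and `s`.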

So the question is: why do we use the frequency domain, rather than the time domain — especially since the data is actually measured in the time domain? There are two good reasons here. First, take a look at this expression on the left, and you'll realize that it's actually a Fourier transform as a function of $\delta t$ — usually called the inverse Fourier transform. Since we can't really predict $\delta t$ (it's just some time offset between our model and when nature decided the black holes would actually merge), we need to evaluate this quantity for all possible time offsets.

$$ \big\langle d \big| s \big\rangle_{\delta t} = \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right] $$

And it turns out that this can actually be evaluated extremely efficiently, which is not so true of the right-hand side. There's this algorithm called the fast Fourier transform (FFT) that lets us calculate $\tilde{d}$, $\tilde{s}$, and this entire sum really efficiently.
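
To make the efficiency point concrete, here is a small sketch comparing the two routes. It assumes circular (wrap-around) time shifts and uses random arrays as stand-ins for the data and the template. The direct route does one full sum per offset, so it costs about $N^2$ operations, while the FFT route costs about $N \log N$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
d = rng.normal(size=n)   # stand-in for the data
s = rng.normal(size=n)   # stand-in for the template

# Direct route: for each offset k, sum over t of d[t] * s[t - k] (s wraps around)
direct = np.array([np.sum(d * np.roll(s, k)) for k in range(n)])

# FFT route: one forward FFT each, one product, one inverse FFT
via_fft = np.fft.ifft(np.fft.fft(d) * np.conj(np.fft.fft(s))).real

print(np.max(np.abs(direct - via_fft)))
```

Both arrays hold the same overlap for every possible offset. The difference only shows up in how long they take once `n` gets large.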

$$ \sum_{f_{i}} {\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right]\, \Delta f_i = \sum_{t_{i}} d_{t_{i}}\, s_{t_{i}-\delta t}\, \Delta t_i $$

But the second reason is perhaps more important. If I just look at any particular value on the right-hand side, there's just some data value at some instant of time, and I can't tell you whether that value is due to noise, or physics, or what. But by looking very carefully at correlations between different instants of time, we can start to pick out which parts of the data are due to noise — at least statistically. For example, one thing that's very common in nature is the harmonic oscillator. Basically, any time a (possibly generalized) restoring force is proportional to some (possibly generalized) displacement, you get harmonic motion. And since analytic functions always look linear for small displacements, you find this sort of thing a lot.

Stationary processes

Harmonic oscillators
Statistics of large numbers

$$ \big\langle d \big| s \big\rangle_{\delta t} = \sum_{f_{i}} \frac{\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}} {\tilde{n}_{f_{i}}^{2}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right] $$
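
That weighting needs an estimate of the noise amplitude $\tilde{n}_{f_i}$ at each frequency. For stationary noise, one common approach is to chop a long stretch of data into segments and average the power in each frequency bin, which is the idea behind Welch's method. Here is a minimal sketch with NumPy; the toy noise, its spectral shape, and the segment length are assumptions for illustration, and the overall normalization is left arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
sample_rate = 1024            # Hz (assumed)
n_segments = 64
seg_len = 1024                # samples per segment
n = n_segments * seg_len

# Toy stationary noise: white noise reshaped to have a big low-frequency excess
freqs_full = np.fft.rfftfreq(n, d=1 / sample_rate)
shape = 1 + 20 * np.exp(-freqs_full / 30)   # hypothetical spectral shape
noise = np.fft.irfft(np.fft.rfft(rng.normal(size=n)) * shape, n=n)

# Window each segment, then average the power over segments in each frequency bin
segments = noise.reshape(n_segments, seg_len) * np.hanning(seg_len)
power = np.mean(np.abs(np.fft.rfft(segments, axis=1)) ** 2, axis=0)
n_tilde = np.sqrt(power)      # noise amplitude per bin, up to an overall constant
freqs = np.fft.rfftfreq(seg_len, d=1 / sample_rate)

low = n_tilde[(freqs > 0) & (freqs < 10)].mean()
high = n_tilde[freqs > 300].mean()
print("low-frequency to high-frequency noise amplitude ratio:", low / high)
```

Averaging only makes sense if the noise really is stationary, so that every segment is a fair sample of the same process.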

Matched filtering

$$ \big\langle d \big| s \big\rangle_{\delta t} = \sum_{f_{i}} \frac{\tilde{d}_{f_{i}}\, \tilde{s}_{f_{i}}} {\tilde{n}_{f_{i}}^{2}}\, \cos \left[ \delta\phi_{f_i} + f_{i}\, \delta t \right] $$

Think of this as either

Filtering the data
Taking the projection of the data vector onto the signal vector

Fourier series are vectors; Fourier space is a vector space.

We've provided the vector space with a "dot product" (which makes it into a Hilbert space).

The dot product accounts for different amounts of noise in the different vector components.

Matched filtering is taking a signal vector and measuring its projection along a template vector.
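
Here is a toy version of that noise-weighted projection, evaluated for every time offset at once. Everything in it is an assumption for illustration: the chirp template, the noise spectrum `noise_amp` (taken as known rather than estimated from data), the injected offset, and the circular time shifts. It is a sketch of the idea, not the pipeline LIGO actually runs.

```python
import numpy as np

rng = np.random.default_rng(2)
sample_rate = 4096          # Hz (assumed)
n = 4 * sample_rate         # four seconds of toy data
freqs = np.fft.fftfreq(n, d=1 / sample_rate)

# Hypothetical noise amplitude spectrum: flat, plus a large low-frequency excess
noise_amp = 1 + 50 * np.exp(-np.abs(freqs) / 20)
white = rng.normal(size=n)
noise = np.fft.ifft(np.fft.fft(white) * noise_amp).real

# Toy template: a windowed linear chirp from 50 Hz to 400 Hz lasting 0.25 s
m = sample_rate // 4
t = np.arange(m) / sample_rate
chirp = np.hanning(m) * np.sin(2 * np.pi * (50 * t + 0.5 * (350 / 0.25) * t**2))
template = np.zeros(n)
template[:m] = chirp

# Data: noise plus the template shifted by an offset the search does not know
true_offset = 6000
data = noise + np.roll(template, true_offset)

# Noise-weighted inner product for every circular time offset at once
d_tilde = np.fft.fft(data)
s_tilde = np.fft.fft(template)
rho = np.fft.ifft(d_tilde * np.conj(s_tilde) / noise_amp**2).real

# Normalize by the standard deviation expected from noise alone
sigma = np.sqrt(np.sum(np.abs(s_tilde) ** 2 / noise_amp**2) / n)
snr = rho / sigma

best_offset = int(np.argmax(snr))
print("recovered offset:", best_offset, "true offset:", true_offset)
print("peak signal-to-noise ratio:", snr[best_offset])
```

Dividing by `noise_amp**2` is the "dot product" doing its job: frequency components where the noise is loud count for very little, and components where the detector is quiet count for a lot.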

4. Digging signal out of noise manually

Can you adjust the filters so that you can see the peak amplitude in the time domain? What does it look like when you zoom in? Can you hear it?
If you leave the $45\,\mathrm{Hz}$ band in the data, you see (and hear) big slow oscillations in the time-domain amplitude that are much slower than $45\,\mathrm{Hz}$. Can you explain where these come from, and why they oscillate like that?
How does the largest data point of your filtered data (in the time domain) compare to the largest point in the raw data? Why is this?
With good settings, you should be able to see a large spike right in the middle of the data. But there are other spikes, too. How is the middle spike different from the others?

If you zoom in on the plot of the filtered data, there's just a small amount that looks like it might contain the actual signal. About how much time would you say this lasts for? Roughly how many cycles of data would you believe come from the GW150914 event, as opposed to noise?
How could we improve the crude filtering we've done here?
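
If you want to experiment with this kind of crude filtering outside the notebook, here is a minimal sketch that filters by zeroing FFT bins. The toy data (a big slow oscillation, a narrow line, white noise, and a short burst), the helper `crude_bandpass`, and the band edges are all hypothetical choices for illustration; the notebook's own filters may work differently.

```python
import numpy as np

rng = np.random.default_rng(3)
sample_rate = 4096          # Hz (assumed)
n = 4 * sample_rate
t = np.arange(n) / sample_rate

# Toy "raw data": a huge slow oscillation, a narrow line, white noise, a short burst
slow = 200 * np.sin(2 * np.pi * 10 * t)
line = 20 * np.sin(2 * np.pi * 60 * t)
burst = 5 * np.exp(-((t - 2.0) / 0.01) ** 2) * np.sin(2 * np.pi * 150 * t)
raw = slow + line + rng.normal(size=n) + burst

def crude_bandpass(x, sample_rate, f_low, f_high, notches=(), notch_width=1.0):
    """Zero every FFT bin outside [f_low, f_high] or within notch_width of a notch."""
    x_tilde = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / sample_rate)
    keep = (freqs >= f_low) & (freqs <= f_high)
    for f0 in notches:
        keep &= np.abs(freqs - f0) > notch_width
    return np.fft.irfft(x_tilde * keep, n=len(x))

filtered = crude_bandpass(raw, sample_rate, f_low=40, f_high=300, notches=(60,))

peak_time = t[np.argmax(np.abs(filtered))]
print("largest raw value:     ", np.max(np.abs(raw)))
print("largest filtered value:", np.max(np.abs(filtered)))
print("time of filtered peak: ", peak_time)
```

Try moving `f_low` below 10 Hz or dropping the notch, and compare what you see with what happened in the notebook.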

Conclusions

Matched filtering:

Current GW detectors are like giant microphones
There's lots of noise
So we filter the data and test for signals

Conclusions

Data analysis:

FFTs are great for time series (any periodic signal)
Python and Jupyter notebooks are really useful
Look at your data in as many ways as possible
Don't blindly trust hand-me-down algorithms
Don't blindly trust your results
- Think about whether they make sense
- Understand all the features
- Things you don't understand may lead to discovery

github.com/moble/MatchedFiltering

hosting.astro.cornell.edu/~rjennings/reu-pulsar


