Import standard modules:


import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import HTML 
HTML('../style/course.css') #apply general CSS

7.1. Jones Notation

Jones notation, or Jones calculus, forms the foundation of the radio interferometer measurement equation (RIME). Jones calculus is a mathematical description of the propagation of electromagnetic plane waves. The signal measured by any telescope is not a perfect representation of the original astrophysical signal, but is rather contaminated by successive layers of propagation effects as the signal makes its way from the astrophysical source to an actual measurement on our instrument. Jones notation gives us a mathematical way of describing these corruptions. The business of estimating and removing the corruptions is called calibration and is dealt with in Chapter 8 ➞. In this section we simply introduce Jones notation and show how it can be used to describe propagation effects.

7.1.1. Electromagnetic plane waves & polarization

For starters, we need to understand that the electromagnetic (EM) radiation we are measuring comes in the form of plane waves (since the sources of radiation are sufficiently far away from the observer). Mathematically, this means the following: pick a coordinate frame $xyz$, where $z$ points along the direction of propagation. In general, any electric field at point $(x,y,z)$ and time $t$ can be described by a complex 3-vector:

$$\mathbf{e}(x,y,z,t) = \left[ \begin{array}{c}e_x\\e_y\\e_z\end{array} \right].$$

This is the general case. If we have a plane wave, then the EM field vector has two specific properties:

  • the field vector is the same across the entire $xy$ plane, $\mathbf{e}(x,y,z,t)\equiv\mathbf{e}(0,0,z,t)$
  • its component along the direction of propagation is null, $e_z=0.$

In this case we can describe the entire plane wave (as a function of time) by a single complex 2-vector $$\mathbf{e}(z,t) = \left[ \begin{array}{c}e_x\\e_y\end{array} \right].$$

The images below (courtesy of https://en.wikipedia.org/wiki/Plane_wave) schematically show two very special kinds of coherent plane waves, both monochromatic: a perfectly linearly polarized wave and a perfectly circularly polarized wave. Note that the images show only the complex amplitudes of the $e_x$ and $e_y$ components.

Figures: a linearly polarized plane wave and a circularly polarized plane wave.

The $\mathbf{e}$-vector of these two plane waves follows some very specific equations.

For a linearly polarized [along the $x$ axis] plane wave: $$e_x = A_0 \cos( 2\pi(z - ct)/\lambda+\phi),~~~e_y=0.$$

For a circularly polarized plane wave: $$e_x = A_0 \cos( 2\pi(z - ct)/\lambda+\phi), ~~~e_y = A_0 \sin( 2\pi(z - ct)/\lambda+\phi),$$

where $A_0$ is the wave amplitude, $\lambda$ is the wavelength, $\phi$ is the phase shift of the wave, and $c$ is the speed of light.
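As a rough illustration, the two waves can be evaluated numerically. The sketch below assumes the phase $2\pi(z - ct)/\lambda+\phi$ for both components; the function names `linear_wave` and `circular_wave`, the constant `C_LIGHT`, and the default parameter values are chosen here purely for illustration.

```python
import numpy as np

C_LIGHT = 299792458.0  # speed of light in m/s

def linear_wave(z, t, A0=1.0, lam=1.0, phi=0.0):
    """e-vector (e_x, e_y) of a plane wave linearly polarized along x."""
    phase = 2 * np.pi * (z - C_LIGHT * t) / lam + phi
    return np.array([A0 * np.cos(phase), 0.0 * phase])

def circular_wave(z, t, A0=1.0, lam=1.0, phi=0.0):
    """e-vector (e_x, e_y) of a circularly polarized plane wave."""
    phase = 2 * np.pi * (z - C_LIGHT * t) / lam + phi
    return np.array([A0 * np.cos(phase), A0 * np.sin(phase)])

# Sample one full period at z = 0 for a wavelength of 1 m
t = np.linspace(0.0, 1.0 / C_LIGHT, 200)
e_lin = linear_wave(0.0, t)
e_circ = circular_wave(0.0, t)

# The tip of the circular e-vector traces a circle of radius A0,
# while the linear one oscillates along the x axis only.
radius = np.hypot(e_circ[0], e_circ[1])
```

Plotting `e_circ[0]` against `e_circ[1]` gives a circle, whereas `e_lin` stays on the $x$ axis.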

7.1.2. Incoherent radiation and Stokes parameters

The above shows coherent radiation; the radiation from astrophysical sources is, by its nature, incoherent and broad-spectrum because it is the result of natural processes. In other words, it is essentially noise-like: you can think of the $\mathbf{e}$-vector as "waving around" more or less randomly, so the neat figures above do not really apply.

We can avoid the broad spectrum issue for now by considering radiation within a narrow frequency bin $[\nu-\Delta\nu/2,\nu+\Delta\nu/2]$, that is a bin centred at $\nu$, with a bandwidth of $\Delta \nu$. As we'll see elsewhere in this chapter, most instruments measure signals within such narrow frequency bins; we can think of the full broad-spectrum signal as being a superposition of near-monochromatic signals.

As for polarization, it is still possible to describe the polarization state of the signal in a statistical sense. This is done in terms of the Stokes parameters, which are defined in terms of the coherency of the $\mathbf{e}$-vector components (the $\langle\cdot\rangle$ operator indicates averaging over time):

$$\begin{eqnarray} I&=&\langle e_x e_x^*\rangle + \langle e_y e_y^*\rangle\\ Q&=&\langle e_x e_x^*\rangle - \langle e_y e_y^*\rangle\\ U&=&\langle e_x e_y^*\rangle+\langle e_y e_x^*\rangle = 2 \Re \langle e_x e_y^*\rangle \\ V&=&-\imath(\langle e_x e_y^*\rangle-\langle e_y e_x^*\rangle) = 2 \Im \langle e_x e_y^*\rangle \end{eqnarray} $$

$I$ is the total power (flux) in the signal. $Q$ and $U$ correspond to linearly polarized flux, while $V$ corresponds to circularly polarized flux. For a signal with $I=1, Q=0.01, U=0, V=0$ the $\mathbf{e}$-vector tends to "wave around" in the $x$ direction slightly more than in the $y$ direction (the excess corresponds to 1% of the total power); a signal with $Q=-0.01$ "waves around" the $y$ direction more than the $x$ direction; a signal with $Q=0,U=0.01$ tends to "wave around" the $45^\circ$ axis. Since the Stokes parameters are quadratic in the field, the two perfectly polarized plane waves in the figure above have $I\propto A_0^2$, and correspond to $Q=I, U=0, V=0$ and $Q=0, U=0, |V|=I$ respectively.
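To make the statistical definition more concrete, here is a minimal sketch that estimates the Stokes parameters from simulated noise-like samples of $e_x$ and $e_y$. It assumes that the two components are independent complex Gaussian noise (so that $U=V=0$ on average); the helper names `stokes` and `cnoise`, the sample count, and the random seed are illustrative choices.

```python
import numpy as np

def stokes(ex, ey):
    """Estimate (I, Q, U, V) from complex samples of e_x and e_y by time-averaging."""
    xx = np.mean(ex * np.conj(ex)).real   # <e_x e_x^*>
    yy = np.mean(ey * np.conj(ey)).real   # <e_y e_y^*>
    xy = np.mean(ex * np.conj(ey))        # <e_x e_y^*>
    return np.array([xx + yy, xx - yy, 2 * xy.real, 2 * xy.imag])

rng = np.random.default_rng(0)
n = 200_000

def cnoise(power):
    """Complex Gaussian noise with mean power <|z|^2> equal to `power`."""
    return np.sqrt(power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Target signal: I = 1, Q = 0.01, U = V = 0,
# i.e. <|e_x|^2> = 0.505 and <|e_y|^2> = 0.495
ex = cnoise(0.505)
ey = cnoise(0.495)
I, Q, U, V = stokes(ex, ey)
print(I, Q, U, V)
```

Because the averages are taken over a finite number of samples, the estimates only approach the target values; the scatter shrinks as `n` grows.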

7.1.2.1. Alternative representations of polarization

The usual approach to interferometry is to describe polarization in terms of the four Stokes parameters $IQUV$, since this is the representation that most readily lends itself to Jones calculus (see below). There are alternative ways to represent polarization which you may come across in the scientific literature. For completeness, we describe them here.

Fractional polarization describes the polarization state in terms of the polarization fraction $p$, position angle $\psi$, and the ellipticity angle $\chi$: $$\begin{eqnarray} p &=& \frac{\sqrt{Q^2+U^2+V^2}}{I}\\ \psi &=& \frac{1}{2} \tan^{-1}\frac{U}{Q}\\ \chi &=& \frac{1}{2} \tan^{-1}\frac{V}{\sqrt{Q^2+U^2}}\\ \end{eqnarray} $$
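These formulas are straightforward to evaluate numerically. The sketch below uses the example Stokes values from this section; the function name `fractional_polarization` is hypothetical, and `np.arctan2` is used in place of a plain $\tan^{-1}$ so that the quadrant of $\psi$ is resolved (an implementation choice made here, not something the formulas above prescribe).

```python
import numpy as np

def fractional_polarization(I, Q, U, V):
    """Return (p, psi, chi), with both angles in degrees."""
    p = np.sqrt(Q**2 + U**2 + V**2) / I
    psi = 0.5 * np.arctan2(U, Q)                # position angle
    chi = 0.5 * np.arctan2(V, np.hypot(Q, U))   # second angle, from V
    return p, np.degrees(psi), np.degrees(chi)

examples = {
    "Q = +0.01": (1.0, 0.01, 0.0, 0.0),
    "Q = -0.01": (1.0, -0.01, 0.0, 0.0),
    "U = +0.01": (1.0, 0.0, 0.01, 0.0),
}
for label, iquv in examples.items():
    p, psi, chi = fractional_polarization(*iquv)
    print(f"{label}: p = {p:.3f}, psi = {psi:.1f} deg, chi = {chi:.1f} deg")
```

With a plain `np.arctan(U / Q)` the case $Q=-0.01$ would come out as $0^\circ$ rather than $90^\circ$, and $Q=0$ would divide by zero, which is why the two-argument form is used.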

These parameters can be represented in the form of a so-called polarization ellipse (image courtesy of https://en.wikipedia.org/wiki/Stokes_parameters).

Astrophysical circular polarization ($V$) is exceedingly rare, so scientific literature often describes polarization using only $p$ and $\psi$, which, physically, give the percentage of linearly polarized signal and its orientation on the sky. In the examples above, for $I=1, Q=0.01, U=0, V=0$, the signal is said to be 1% linearly polarized ($p=0.01$) with polarization angle $0^\circ$; a signal with $Q=-0.01$ has polarization angle $90^\circ$; a signal with $Q=0,U=0.01$ has polarization angle $45^\circ$. This can be illustrated as (courtesy of https://en.wikipedia.org/wiki/Stokes_parameters):

Poincaré sphere. Another way to represent polarization state is via the Poincaré vector $(S_1,S_2,S_3)=(Q,U,V)$. This vector lies on the so-called Poincaré sphere, whose radius is given by the total polarized flux $P=\sqrt{Q^2+U^2+V^2}$ (image courtesy of https://en.wikipedia.org/wiki/Stokes_parameters):

Propagation effects that change the polarization state (such as Faraday rotation) can often be described by a rotation of the Poincaré vector. However, in radio interferometry you will not come across this representation very often.
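As a small numerical check of this statement, the sketch below rotates the $\mathbf{e}$-vector by an angle $\gamma$ and shows that $(Q,U)$ rotate by $2\gamma$ while $I$ and $V$ are unchanged, i.e. the Poincaré vector rotates about its $V$ axis. It relies on the matrix of averaged products $\langle\mathbf{e}\mathbf{e}^H\rangle$ as a bookkeeping device; that matrix is not introduced in this section, and the helper names and Stokes values are illustrative assumptions.

```python
import numpy as np

def coherency_from_stokes(I, Q, U, V):
    """Matrix of <e_i e_j^*> implied by the Stokes definitions above."""
    return 0.5 * np.array([[I + Q, U + 1j * V],
                           [U - 1j * V, I - Q]])

def stokes_from_coherency(B):
    """Invert the mapping: recover (I, Q, U, V) from <e_i e_j^*>."""
    return np.array([(B[0, 0] + B[1, 1]).real,
                     (B[0, 0] - B[1, 1]).real,
                     2 * B[0, 1].real,
                     2 * B[0, 1].imag])

def rot(gamma):
    """2x2 rotation matrix acting on the e-vector."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s], [s, c]])

gamma = np.radians(10.0)
B = coherency_from_stokes(1.0, 0.3, 0.1, 0.05)

# If e' = R e, then <e' e'^H> = R <e e^H> R^H
R = rot(gamma)
B_rot = R @ B @ R.conj().T
I2, Q2, U2, V2 = stokes_from_coherency(B_rot)

# Expected: (Q, U) rotated by 2*gamma, with I and V untouched
Q_expected = 0.3 * np.cos(2 * gamma) - 0.1 * np.sin(2 * gamma)
U_expected = 0.3 * np.sin(2 * gamma) + 0.1 * np.cos(2 * gamma)
```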

7.1.2.2. Polarization over 100%?

By definition of the Stokes parameters, a physical signal has the property $Q^2+U^2+V^2 \le I^2$, i.e. the polarized fraction cannot exceed 100%. This seems obvious; however, a curious fact of radio interferometry is that it is possible to observe signals that formally appear to have polarization in excess of 100%. Recall from Chapter 5 that an interferometer samples a limited set of spatial frequencies, as given by its shortest and longest baselines. Therefore, if the $I$ flux varies smoothly over the sky (i.e. has very low spatial frequencies, or large spatial scales) while the polarized $QU$ flux varies on smaller scales, it is quite possible for the interferometer to "see" less (or even no) $I$ flux, and significant $QU$ flux. This is in fact the case when observing near the plane of the Milky Way: unpolarized synchrotron radiation from around the galactic plane is very smooth (on the order of degrees), and gets mostly suppressed, while the polarized foregrounds have smaller-scale structure (on the order of arcminutes) that is quite apparent to most interferometers.

7.1.3. Propagation and Jones matrices

Jones calculus answers the following question: how do we mathematically describe what happens to the signal as it interacts with some medium on its way to us, for example:

  • propagation through free space
  • propagation through a cloud or layer of electrons (e.g. the ionosphere)
  • reflection off the dish surface
  • propagation through the receiver electronics

The fundamental (and only) assumption of Jones calculus is that all propagation effects are linear. That is, if the original signal vector is $\mathbf{e}$, and the propagated vector is given by $\mathbf{e}'=\mathbf{f}(\mathbf{e})$, then

$$ \mathbf{f}(a\mathbf{e}_1+b\mathbf{e}_2) = a\mathbf{f}(\mathbf{e}_1)+b\mathbf{f}(\mathbf{e}_2). $$

Fortunately for us, almost all physically reasonable propagation effects (with a few very exotic exceptions) are linear. Now, from linear algebra we know that any linear function on 2-vectors can be described by multiplication with a $2\times2$ matrix:

$$ \mathbf{e}' = \left[ \begin{array}{c}e'_x\\e'_y\end{array} \right] = \left[ \begin{array}{cc}j_{11} & j_{12} \\ j_{21} & j_{22} \end{array} \right] \left[ \begin{array}{c}e_x\\e_y\end{array} \right] = \mathbf{J} \mathbf{e}. $$

$\mathbf{J}$ is called a Jones matrix. Any (linear) propagation effect can thus be described by its own particular kind of Jones matrix.
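The link between linearity and the matrix form can be sketched numerically: the columns of $\mathbf{J}$ are simply the images of the two basis vectors under $\mathbf{f}$. The function `f` below is a made-up linear effect with arbitrary coefficients, used only to illustrate the construction.

```python
import numpy as np

def f(e):
    """Hypothetical linear propagation effect (coefficients are arbitrary)."""
    ex, ey = e
    return np.array([ex + 0.1j * ey, 0.8 * ey])

# Build the Jones matrix column by column from the images of the basis vectors
x_hat = np.array([1, 0], dtype=complex)
y_hat = np.array([0, 1], dtype=complex)
J = np.column_stack([f(x_hat), f(y_hat)])

# Linearity check: f(a e1 + b e2) == a f(e1) + b f(e2)
e1 = np.array([1.0, 0.5j])
e2 = np.array([-0.3, 2.0 + 1.0j])
a, b = 2.0, -0.5j
lhs = f(a * e1 + b * e2)
rhs = a * f(e1) + b * f(e2)

# Applying J gives the same result as applying f directly
e_prime = J @ e1
```

Here `J` comes out as `[[1, 0.1j], [0, 0.8]]`, and `J @ e` reproduces `f(e)` for any input vector.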

As a generalization, we may also consider the two complex voltages measured by our (dual-element) receiver as a voltage 2-vector, and also treat it as a linear function of the input EM vector. The receiver can then also be described by a Jones matrix:

$$ \mathbf{v} = \left[ \begin{array}{c}v_1\\v_2\end{array} \right] = \left[ \begin{array}{cc}j_{11} & j_{12} \\ j_{21} & j_{22} \end{array} \right] \left[ \begin{array}{c}e_x\\e_y\end{array} \right] = \mathbf{J} \mathbf{e}. $$

We write $\mathbf{e}$ being transformed into $\mathbf{v}$ because we can think of the EM radiation being converted into a voltage measurement within our telescope receiver system.

7.1.4. Examples of specific Jones matrices

In radio interferometry, Jones matrices are conventionally designated by a capital letter, with different letters corresponding to different propagation effects. Some examples are:

  • K-Jones or geometric delay, describing propagation through a path in free space of length $\tau$. As we'll see later, in some sense the K-Jones matrix is at the heart of all interferometry. K-Jones is an example of a scalar matrix (i.e. a diagonal matrix with the same element on the diagonal). We'll use normal-weight Roman font (e.g. $K$) to emphasize scalar matrices, as opposed to boldface ($\mathbf{K}$) for general matrices:
$$ \mathbf{K} = K = \left[ \begin{array}{cc}\mathrm{e}^{-2\pi \imath \tau / \lambda} & 0 \\ 0 & \mathrm{e}^{-2\pi \imath \tau / \lambda} \end{array} \right] = \mathrm{e}^{-2\pi \imath \tau / \lambda}, $$
  • P-Jones, describing parallactic angle rotation (i.e. the rotation of the reference frame of the telescope w.r.t. the frame of the signal/sky). This is an example of a rotation matrix:
$$ \mathbf{P} = \left[ \begin{array}{cc}\cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma\end{array} \right] = \mathrm{Rot}\,\gamma. $$
  • Z-Jones is another scalar matrix describing phase delay due to propagation through the ionosphere, which is proportional to the ionospheric total electron content (TEC), and inversely proportional to frequency:
$$ Z = \mathrm{e}^{-\imath \kappa \mathrm{TEC} / \nu}, $$

where $\kappa$ is a proportionality constant.

  • F-Jones describes Faraday rotation in the ionosphere. The angle of rotation is proportional to a quantity called the rotation measure (RM), and inversely proportional to frequency squared:
$$ \mathbf{F} = \left[ \begin{array}{cc}\cos (\mathrm{RM}/\nu^2) & -\sin (\mathrm{RM}/\nu^2) \\ \sin (\mathrm{RM}/\nu^2) & \cos (\mathrm{RM}/\nu^2) \end{array} \right] = \mathrm{Rot}(\mathrm{RM}/\nu^2). $$
  • G-Jones describes the complex gain of the dual-element receiver:
$$ \mathbf{G} = \left[ \begin{array}{cc} g_x & 0 \\ 0 & g_y \end{array} \right ]. $$
  • In real life situations (see Chapter 8 ➞ on calibration) it may be useful to take advantage of the fact that the receiver gain ➞ has a frequency-dependent component that varies only slowly with time (the bandpass), and a time-variable component that is only weakly dependent on frequency (the gain per se). To emphasize this, the complex gain can then be split into two separate diagonal matrices:
$$ \mathbf{G} = \left[ \begin{array}{cc} g_x(t) & 0 \\ 0 & g_y(t) \end{array} \right],~~~ \mathbf{B} = \left[ \begin{array}{cc} b_x(\nu) & 0 \\ 0 & b_y(\nu) \end{array} \right]. $$
  • E-Jones describes the primary beam gain. This is further discussed in 7.5. Primary Beam ➞.

  • D-Jones describes the polarization leakage due to cross-talk between the two receiver elements. It typically looks like

$$ \mathbf{D} = \left[ \begin{array}{cc} 1 & d \\ -d & 1 \end{array} \right],~~d\ll 1 $$

Both the E- and D-Jones terms are examples of direction-dependent effects ➞.

Note that the above are only examples. In particular, depending on our instrumental model, the G-, B-, E-, and D-Jones terms may take a different form, or may be rolled into a single Jones matrix.
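The example matrices above translate directly into small NumPy constructors. This is only a sketch: the function names are hypothetical, the default `kappa=1.0` is a placeholder rather than a physical constant, and all numerical inputs are arbitrary.

```python
import numpy as np

def rot(angle):
    """2x2 rotation matrix Rot(angle)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

def k_jones(tau, lam):
    """Scalar geometric delay term for a path of length tau."""
    return np.exp(-2j * np.pi * tau / lam) * np.eye(2)

def p_jones(gamma):
    """Parallactic angle rotation."""
    return rot(gamma)

def z_jones(tec, nu, kappa=1.0):
    """Scalar ionospheric phase delay; kappa is a placeholder constant."""
    return np.exp(-1j * kappa * tec / nu) * np.eye(2)

def f_jones(rm, nu):
    """Faraday rotation by the angle RM / nu^2."""
    return rot(rm / nu**2)

def g_jones(gx, gy):
    """Diagonal complex gain of the two receiver elements."""
    return np.diag([gx, gy]).astype(complex)

def d_jones(d):
    """Polarization leakage with a small leakage term d."""
    return np.array([[1, d], [-d, 1]], dtype=complex)

K = k_jones(tau=0.25, lam=1.0)      # a quarter-wavelength path gives a phase of -i
P = p_jones(np.radians(30.0))
G = g_jones(1.1 * np.exp(0.2j), 0.9)
D = d_jones(0.01)
```

The scalar terms are returned as full $2\times2$ matrices here so that every constructor has the same shape; in practice a scalar term could equally be kept as a single complex number.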

7.1.5. Jones chains

A sequence of propagation effects is represented by a chain of Jones matrix multiplications. We can describe the voltage vector measured by our instrument as $$ \mathbf{v} = \mathbf{J}_n \mathbf{J}_{n-1} ... \mathbf{J}_1 \mathbf{e} = \mathbf{Je}, $$

where the individual Jones terms describe the different effects in sequence. The system then has an overall Jones matrix of $\mathbf{J}_{\textrm{sys}} = \mathbf{J}_n \mathbf{J}_{n-1} ... \mathbf{J}_1 $.

For example, a system Jones chain might be $\mathbf{J}_{\textrm{sys}} = \mathbf{G} \, \mathbf{B} \, \mathbf{D} \, \mathbf{E} \, \mathbf{K} \, \mathbf{P} \, \mathbf{Z} \, \mathbf{F}$. Note that the order of the chain is important: the Jones matrices act on the original signal in the order in which the propagation effects occur, so the first effect is the rightmost matrix. For example, the signal is Faraday rotated on its way through the ionosphere, then modulated by the primary beam, before gain is applied by the amplifiers; thus $\mathbf{F}$ must be applied before $\mathbf{E}$, which must be applied before $\mathbf{G}$. In general, matrix multiplication does not commute.
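The ordering can be illustrated with a shortened three-term chain. In this sketch the numerical values are arbitrary, and $\mathbf{E}$ is taken to be diagonal purely for illustration (the text above does not specify its form).

```python
import numpy as np
from functools import reduce

def rot(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

F = rot(0.4)                                             # Faraday rotation
E = np.diag([0.9, 0.7]).astype(complex)                  # beam gain (assumed diagonal)
G = np.diag([1.2 * np.exp(0.3j), 0.8 * np.exp(-0.2j)])   # receiver gain

# Effects listed in the order in which the signal encounters them
physical_order = [F, E, G]

# Later effects multiply from the left, giving J_sys = G E F
J_sys = reduce(lambda acc, J: J @ acc, physical_order, np.eye(2, dtype=complex))

# Propagating the signal one effect at a time gives the same voltage vector
e = np.array([1.0, 0.5j])
v = e
for J in physical_order:
    v = J @ v

# Reversing the order gives a different overall matrix
J_wrong = F @ E @ G
order_matters = not np.allclose(J_sys, J_wrong)
```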

In a real-life observation, some of the effects in the chain are known perfectly in advance (e.g. K-Jones, P-Jones), some may have a reasonable prior model (E-Jones), and some must be estimated through calibration (e.g. G-Jones, B-Jones, see Chapter 8 ➞).

From linear algebra we know that matrix multiplication in general does not commute, so the chaining of Jones matrices must correspond to the physical order of the effects experienced by the signal. However, some kinds of matrices do commute, for example:

  • a scalar matrix commutes with any matrix (thus, for example, the K- and Z-Jones matrices can be moved to anywhere in the chain without changing the result)

  • diagonal matrices commute among themselves (thus the B- and G-Jones terms may be swapped around without changing the result)

  • rotation matrices commute among themselves (thus the P- and F-Jones terms may be swapped around)
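These three commutation rules are easy to verify numerically. The sketch below uses arbitrary example values, and the helper `commute` is introduced here only for the check.

```python
import numpy as np

def rot(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

def commute(A, B):
    """True if A B equals B A to within floating-point tolerance."""
    return np.allclose(A @ B, B @ A)

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))  # generic matrix

K = np.exp(-2j * np.pi * 0.37) * np.eye(2)                # scalar matrix
G = np.diag([1.1 * np.exp(0.2j), 0.9 * np.exp(-0.4j)])    # diagonal matrix
B = np.diag([0.95, 1.05 * np.exp(0.1j)])                  # diagonal matrix
P = rot(0.3)                                              # rotation matrix
F = rot(1.1)                                              # rotation matrix

print("scalar with generic:    ", commute(K, M))
print("diagonal with diagonal: ", commute(G, B))
print("rotation with rotation: ", commute(P, F))
print("diagonal with rotation: ", commute(G, P))
```

The first three checks hold, while the last one fails: a diagonal matrix with unequal elements does not commute with a rotation, so G-Jones and P-Jones cannot be swapped freely.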

The combination of Jones chains and linear algebra constitutes the real power and utility of Jones calculus. In a nutshell:

  • To determine the true underlying astrophysical signal, we must model all the instrumentation and propagation effects.

  • Jones calculus allows us to construct a notionally perfect description of signal propagation, with individual effects described by elements of the Jones chain.

  • We can then use our prior knowledge to fix some of the Jones terms in the chain.

  • We can use calibration (Chapter 8 ➞) to estimate others.

  • We can use the rules of linear algebra to reorder the Jones terms where allowed, in order to simplify our calculations.

We will see all this come together in the next section on the RIME, and in the next chapter on calibration.