In [13]:
from IPython.display import Image, display

Analogy

Consider the following completely made-up scenario:

There's a country that has many different companies working at many different cities within it.

We also have information of each employee working at each of these companies. All kinds of things like height, names, hair length, tenure, and so on. Because these companies are located in different cities, they each adopt a separate (unknown) cost-of-living adjustment that they apply uniformly to all of their employees. And, for each company, each employee gets the same salary.

We have a method of getting the salary from the employees but, for contrived reasons, we get a measured value drawn from a Normal distribution centered on their true salary and an estimate of the error (which will be different for each salary). Again, we get 2 numbers when we get the salary information: the salary and an estimate error. The salary data is heteroskedastic (but we have a good idea of the errors for each number).

And this data was taken over the past 4 decades.

Hypothesis

There is a linear relationship between an employee's height and the relative salary at each of these companies.

If this turns out to be the case, then, this is a huge deal (Nobel prize winning deal). Let's just add that at this point, the relationship has to be a straight line in salary vs height space, in that, if this hypothesis were correct, it would appear as a straight line (of any slope/intercept) in this plot (assuming no other systematic errors).

So, a group of researchers takes the salary data for all companies in the Northern part of the country, and the employee's height data for each company, and fits a straight line to the data. They report the slope of each line and an estimated error on the fit, and they find significant non-zero slopes across the ensemble of these measurements.

Here's an example of the kind of fit being done:


In [166]:
plot_example_company(company=19, daa=2.5)


The above plot was generated for company number 19, with a slope of 2.5, and a best fit line is clearly a decent fit to the data. They made 140 measurements across the Northern half of the country. Reporting the slopes and errors for each company. The data looks like:


In [195]:
keck[['delta_alpha', 'error_delta_alpha']].rename(columns={'delta_alpha':'slope', 'error_delta_alpha':'error'}).head()


Out[195]:
slope error
0 0.34 7.27
1 36.05 39.54
2 -12.53 11.67
3 -78.43 35.48
4 -31.05 24.33

A bunch of other information was recorded at the same time (when the measurement was made, and so on). So they summarized the results across 140 companies by plotting the slope value vs when measurement was made (because if there were some height-bias in the different salaries, it might have changed over time). Remember the expected relationship for this plot should be a bunch of (statistical) zeros.


In [196]:
plot_example_telescope_results()



In [217]:
# np.average(keck.delta_alpha, weights=1.0 / keck.extra_error_delta_alpha**2.0)
# np.sqrt(1.0 / np.sum(1.0 / keck.extra_error_delta_alpha**2.0))

Taking a weighted mean of all slope measurements, they found a non-zero result of -5.7 $\pm$ 1.1 ppm. To look for a relationship with time, they took a weighted mean of measurements in time, binned so that each bin would contain roughly an equal number of companies. The next 2-panel plot shows the individual measurements in the upper panel colored to correspond with the color their weighted bin on the lower panel. (The horizontal lines on the upper panel denote the limits on the y-axis for the lower panel).


In [216]:
plot_example_telescope_bins(nbins=12, dataframe=keck)


This looks very interesting! But we only looked at the Northern half of the country. Another team of researchers went off to measure the Southern half. And they found the following:


In [224]:
plot_example_telescope_bins(nbins=13, dataframe=vlt)


A surprising result! The weighted bins are no longer uniformly negative, but appear to switch from negative to positive around 1.5 decades ago. Because the data include the lat/longitude of each of these companies, they went to see if they could fit a model that took into account spatial differences. Basically, if you want to think of this semi-accurately, the country is around an entire planet, with companies appearing at specific locations. And there is a gradient across the whole planet that can be thought of in terms of Latitude alone. The fitting model is technically called a dipole, so I will call it a dipole fit.


In [241]:
plot_a_v_theta(vlt, color=0, nbins=13, label='VLT')
plot_a_v_theta(keck, color=2, nbins=13, label='Keck')



In [242]:
plot_a_v_theta(all_systems, color=1, nbins=13, label='Combined')


A small wrinkle

Another group of scientists discovers a systematic error in the way the salary measurements are being made. Not only do we have height, salary, and location data, we also have each employee's hair length. It turns out that there is no relationship between height and hair length (just like in the real world).

However, for even further contrived reasons, the systematic error between an employee's hair length and their measured salary! Just to provide a mental model of what happens, try to stomach the following example. For the Southern companies, they use 2 rulers to measure the length of an employee's hair, one for if their hair is less than say... 5 cm, and another one for if their hair is longer than 5 cm. The Northern companies use 1 ruler to measure their employee's hair.

The thing is, for all of these rulers, on a per-company basis, they might introduce a systematic error in the recorded . The good news is that if there's an error, it's reasonably well-behaved. So, for the 2-ruler system, you'd have something that might look like:


In [278]:
plot_example_systematic(run17)


Where the two rulers show up clearly as two lines w/ roughly the same slope. The y-intercept of this is meaningless, as we only can ever know the relative salary differences per company.


In [180]:
def generate_sigmas(vshifts):
    return np.random.rand(len(vshifts)) * 30.0 + 50.0

def create_blank_transitions_dataset(source=all_systems):
    df = pd.DataFrame(columns=['system',
                               'J2000',
                               'source',
                               'wavelength',
                               'vshift', 
                               'sigma', 
                               'qval',
                               'rest_wave',
                              ])
    count = 0
    abs_count = 0
    for index, row in source.iterrows():
        waves = []
        rest_waves = []
        vshifts = []
        qvals_list = []
        for tran in row['transitions'].split():
            vshift = 0.0
            rest_wave = qvals.loc[codes.loc[tran].trans].wave
            measured_wave = rest_wave * (1 + row.z_absorption)
            qval = qvals.loc[codes.loc[tran].trans].qval
            waves.append(measured_wave)
            rest_waves.append(rest_wave)
            vshifts.append(vshift)
            qvals_list.append(qval)
        vshifts = np.array(vshifts)
        errors = generate_sigmas(vshifts)
#         vshifts += errors * np.random.randn(len(vshifts))
#         vshifts = vshifts - vshifts[0]
        for single in range(len(vshifts)):
            abs_count += 1
            df.loc[abs_count] = [int(index), #system
                                 row['J2000'],
                                 row.source,
                                 waves[single], #wavelength 
                                 vshifts[single],
                                 errors[single],
                                 qvals_list[single], # qvalues
                                 rest_waves[single],
                                ]
    df['system'] = df.system.astype(int)
    # For numerical stability during fitting, dividing the x-values by 1.0e14
    df['x'] = -2.0 * c * df['qval'] * df['rest_wave'] / 1.0e14
    return df

In [193]:
def fit_alpha(dataframe):
    """Takes a dataframe of transitions and returns a systems dataframe w/ best fit sim_fit_alpha."""
    new_df = all_systems.copy()
    new_df['sim_fit_alpha'] = -1.0
    for index, dfg in dataframe.groupby('system'):
        design_matrix = sm.add_constant(dfg.x)
        results = sm.WLS(dfg.vshift, design_matrix, weights=1.0/dfg.sigma).fit()
        chisq = np.sum((dfg.vshift - results.fittedvalues)**2.0 / (dfg.sigma) ** 2.0)
        const, slope = results.params
        new_df['sim_fit_alpha'].iloc[index] = slope
    return new_df

In [400]:
for trans in vlt[vlt.J2000.str.startswith('J043037')]['transitions']:
    trans = trans.split()
    for tran in trans:
        print(tran, codes.loc[tran].trans)


a2 MgI2852
j1 FeII1608
j2 FeII1611
j3 FeII2260
j4 FeII2344
j5 FeII2374
j6 FeII2382
j7 FeII2586
j8 FeII2600
c1 AlII1670
e1 SiII1526
e2 SiII1808
h1 CrII2056
h3 CrII2066
k1 NiII1709
k2 NiII1741
k3 NiII1751
i1 MnII2576
i2 MnII2594
i3 MnII2606

In [ ]:
temp2.plot()

In [527]:
def plot_distortion(distortions, xlims=(3000, 10000)):
    fig, ax = plt.subplots(figsize=(12, 8))
    wave = np.arange(xlims[0], xlims[1], 2)
    vshift = np.ones_like(wave)
    temp3 = pd.DataFrame(columns=['wavelength', 'vshift'], 
                 data=np.array([wave, vshift]).T)
    temp3['vshift'] = temp3.apply(lambda x:offset(x, distortions), axis=1)
    temp3.plot(x='wavelength', y='vshift', ax=ax)

plot_distortion(temp)



In [531]:
plot_distortion(temp4)



In [485]:
vlt_ccds.columns


Out[485]:
Index(['wav_min', 'midpoint', 'wav_max'], dtype='object')

In [14]:
def list_of_distortion(chip='B390', slope=0.0, offset=0.0, chip_info=vlt_ccds):
    
    wav_min = chip_info.loc[chip].wav_min
    midpoint = chip_info.loc[chip].midpoint
    wav_max = chip_info.loc[chip].wav_max
    intercept = -slope * (midpoint + offset)
    return list([chip, wav_min, midpoint, wav_max, slope, intercept])

In [15]:
vlt_ccds


Out[15]:
wav_min midpoint wav_max
chip
B346 3024.5 3454.25 3884.0
B390 3259.1 3888.95 4518.8
B437 3732.4 4365.90 4999.4
R564 4583.5 5634.90 6686.3
R580 4726.9 5780.90 6834.9
R760 5655.8 7559.70 9463.6
R860 6650.6 8628.15 10605.7

In [16]:
columns = ['chip', 'wav_min', 'midpoint', 'wav_max', 'slope', 'intercept']
chips=['B346', 'B437', 'R580', 'R860']
distortion_1 = [list_of_distortion(chip=chip, slope=0.2) for chip in chips]
temp = pd.DataFrame(columns=columns, data=distortion_1)

In [17]:
def create_distortion_from_ccd(chips=['B390', 'R580'],
                      slopes=[0.0, 20.5],
                      offsets=[0.0, 15.5],
                      columns=['chip', 'wav_min', 'midpoint', 'wav_max', 'slope', 'intercept'],
                      chip_info=vlt_ccds,):
    chips=['B390', 'R580', ]
    temp_list = []
    for (chip, slope, offset) in zip(chips, slopes, offsets):
        temp_list.append(list_of_distortion(chip=chip, slope=slope, offset=offset))
    return pd.DataFrame(columns=columns, data=temp_list)
create_distortion_from_ccd()


Out[17]:
chip wav_min midpoint wav_max slope intercept
0 B390 3259.1 3888.95 4518.8 0.0 -0.0
1 R580 4726.9 5780.90 6834.9 20.5 -118826.2

In [521]:
temp2['vshift'] += temp2.apply(lambda x:offset(x, temp), axis=1)

In [519]:
temp2


Out[519]:
system J2000 source wavelength vshift sigma qval rest_wave x
1 0 J000149-015940 Keck 5171.257827 73.562834 66.379070 270.0 1670.788610 -2.704805
2 0 J000149-015940 Keck 4725.312789 94.894536 52.211440 47.0 1526.707631 -0.430234
3 0 J000149-015940 Keck 5595.981560 23.870158 77.660533 526.0 1808.013169 -5.702142
4 0 J000149-015940 Keck 6364.330443 -66.026662 63.683419 -1110.0 2056.260038 13.685218
5 0 J000149-015940 Keck 6382.826953 -68.190754 51.262235 -1168.0 2062.236100 14.442152

In [105]:
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(vlt_ccds.wav_min, np.zeros_like(vlt_ccds.wav_min), c=sns.color_palette()[0])
ax.scatter(vlt_ccds.wav_max, np.zeros_like(vlt_ccds.wav_min), c=sns.color_palette()[0], label='CCD wavelength edges')

# ax.scatter()
# temp_transitions.wavelength.hist(bins=50, ax=ax, alpha=0.5, color=sns.color_palette()[0])
sns.distplot(temp_transitions.wavelength,  ax=ax, hist=True)
knot_positions = [3024.5,
                  3259.1,
                  np.average([3732.4, 3884.0, ]),
                  4518.8,
                  4999.4,
                  5655.8,
                  np.average([6650.6, 6834.9]),
                  9463.6,
                  10605.7,
                 ]
ax.scatter(knot_positions, 
           np.zeros_like(knot_positions)-0.00002, 
           c='k',
           s=100,
           marker='x',
           label='Knot positions'
          )

ax.set_ylim(-0.00005, 0.0005)
ax.set_yticklabels([])
ax.set_ylabel("Fraction of transitions measured")
ax.set_xlabel("Wavelength [$\AA$]")
ax.legend(loc='best')
fig.tight_layout()



In [136]:
from scipy.interpolate import LSQUnivariateSpline, splrep, splev, Akima1DInterpolator, interp1d

In [77]:
knot_positions


Out[77]:
[3024.5,
 3259.1,
 3808.1999999999998,
 4518.8,
 4999.4,
 5655.8,
 6742.75,
 9463.6,
 10605.7]

In [78]:
# start simple
knot_positions = np.array([3024.5, 4518.8, 4999.4, 6742.75, 10605.7])
y = np.random.rand(len(knot_positions)-1)
y = np.array([0] + list(y))

In [79]:
tck = splrep(knot_positions, y, )

In [80]:
x = np.linspace(np.min(knot_positions), np.max(knot_positions), num=2000)

In [81]:
plt.scatter(knot_positions, y)
plt.plot(x, splev(x, tck))


Out[81]:
[<matplotlib.lines.Line2D at 0x11e3146a0>]

In [82]:
interp = Akima1DInterpolator(knot_positions, y)

In [83]:
def spline_offset(row, interpolator):
    return float(interpolator(row.wavelength))

In [87]:
for ydist in


Out[87]:
array([ 0.        ,  0.42529576,  0.38595446,  0.59502697,  0.07231275])

In [144]:
from itertools import combinations_with_replacement, permutations, combinations, product

In [172]:
yvals = [-20, -10, 0, 10, 20]
knot_positions = np.array([3024.5, 4518.8, 4999.4, 6742.75, 10605.7])
fig, ax = plt.subplots(figsize=(12, 8))
# for ycombo in permutations(yvals, len(knot_positions) - 1):
for ycombo in product(*[list(yvals) for _ in range(len(knot_positions)-1)]):
    y = np.array([0] + list(ycombo))
    interp = Akima1DInterpolator(knot_positions, y, )
    ax.scatter(knot_positions, y, c='k')
    ax.plot(x, interp(x), alpha=0.2, c='k')

ax.set_xlabel(r"Wavelength $\left[\AA\right]$")
ax.set_ylabel("Distortion [m/s]")
fig.tight_layout()



In [173]:
yvals = [-10, 0, 10]
knot_positions = np.array([3024.5, 4518.8, 4999.4, 6742.75, 10605.7])
fig, ax = plt.subplots(figsize=(12, 8))
count = 0
for ycombo in product(*[list(yvals) for _ in range(len(knot_positions)-1)]):
    count += 1
    y = np.array([0] + list(ycombo))
    interp = interp1d(knot_positions, y)
    ax.scatter(knot_positions, y)
    ax.plot(x, interp(x))
ax.set_xlabel(r"Wavelength $\left[\AA\right]$")
ax.set_ylabel("Distortion [m/s]")
fig.tight_layout()
print(count)


81

In [186]:
tuple(y)


Out[186]:
(0, -10, -10, -10, -10)

In [9]:
def velocity_shift(wavelength, distortion):
    for segment in distortion:
        if (wavelength > distortion[segment]['waves_start']) and (wavelength <= distortion[segment]['waves_end']):
            slope = distortion[segment]['slope']
            offset = distortion[segment]['offset']
    return -(wavelength * slope + offset)

In [174]:
vlt.head()


Out[174]:
J2000 z_emission z_absorption delta_alpha error_delta_alpha extra_error_delta_alpha dipole_delta_alpha dipole_angle sample source sigflag imrotator transitions
140 J000344-232355 2.28 0.4521 -4.59 7.87 11.993306 0.733603 74.9813 D VLT 3 0 a2 b1 b2 j4 j5 j6 j7 j8
141 J000344-232355 2.28 0.9491 -15.34 27.88 29.312061 0.733603 74.9813 D VLT 3 0 a2 b1 b2 j4 j6 j8
142 J000344-232355 2.28 1.5864 -4.10 10.03 13.509382 0.733603 74.9813 D VLT 3 0 a2 j1 j4 j5 j6 j7 j8 c1 e1
143 J000448-415728 2.76 1.9886 2.66 19.45 21.452389 3.182962 59.2265 D VLT 3 0 j4 j6 j7 j8 c1 d1 d2
144 J000448-415728 2.76 2.1679 13.81 9.44 13.077312 3.182962 59.2265 D VLT 3 0 b1 j1 j4 j5 j6 c1 e1

In [215]:
def offset(row, distortions):
    add_vshift = []
    for index, distortion in distortions.iterrows():
        if (row.wavelength > distortion.wav_min) & (row.wavelength < distortion.wav_max):
            add_vshift.append(row.wavelength * distortion.slope + distortion.intercept)
    if len(add_vshift) == 0:
        add_vshift.append(0.0)
    return np.average(add_vshift)

In [216]:
def chisq(system=vlt,
          fit_alpha='dipole_delta_alpha',
          alpha_col='delta_alpha',
          error_col='error_delta_alpha',):
    return np.sum((system[alpha_col] - system[fit_alpha]) ** 2.0 / (system[error_col] ** 2.0))

In [224]:
fit_dict = {}
yvals = [-50, 0, 50]
knot_positions = np.array([3024.5, 4518.8, 4999.4, 6742.75, 10605.7])
fig, ax = plt.subplots(figsize=(12, 8))
for ycombo in product(*[list(yvals) for _ in range(len(knot_positions)-1)]):
    y = np.array([0] + list(ycombo))
    fit_dict[tuple(y)] = {}
    interp = interp1d(knot_positions, y)
    fit_dict[tuple(y)]['interpolate'] = interp
    ax.scatter(knot_positions, y)
    ax.plot(x, interp(x))
ax.set_xlabel(r"Wavelength $\left[\AA\right]$")
ax.set_ylabel("Distortion [m/s]")
fig.tight_layout()



In [225]:
temp_vlt = create_blank_transitions_dataset(source=vlt)

In [226]:
chisq(temp_vlt_systems[temp_vlt_systems.source=='VLT'], fit_alpha='sim_fit_alpha')


Out[226]:
277.77318902254046

In [234]:
def create_fit_dict(yvals=[-50, 0, 50],
                    fit_dictionary=fit_dict,
                    clobber=False,
                    knot_positions = np.array([3024.5, 4518.8, 4999.4, 6742.75, 10605.7]),
                   ):
    if clobber:
        fit_dictionary = {}
        for ycombo in product(*[list(yvals) for _ in range(len(knot_positions)-1)]):
            y = np.array([0] + list(ycombo))
            fit_dictionary[tuple(y)] = {}
            interp = interp1d(knot_positions, y)
            fit_dictionary[tuple(y)]['interpolate'] = interp
    elif :


Out[234]:
(0, 0, 0, 50, 50)

In [238]:
blank_transitions = create_blank_transitions_dataset(source=vlt)
def fit_distortion_model(key=(0, 0, 0, 50, 50),
                         min_chisq=500.0,
                         fit_dictionary=fit_dict,
#                          blank_transitions=blank_transitions,
                        ):
    if 'chisq' not in fit_dictionary[key]:
        temp_vlt = blank_transitions.copy()
        temp_vlt['vshift'] = fit_dictionary[key]['interpolate'](temp_vlt.wavelength)
        temp_vlt_systems = fit_alpha(temp_vlt)
        chisquare = chisq(temp_vlt_systems[temp_vlt_systems.source=='VLT'],
                          fit_alpha='sim_fit_alpha')
        fit_dictionary[key]['chisq'] = chisquare
        print(pd.Timestamp('now').strftime("%Y-%m-%d %H:%M:%S"), key, chisquare)
    else:
        chisquare = fit_dictionary[key]['chisq']
        print("Already fit:", key, chisquare)
    if chisquare < min_chisq:
        print("    New minimum found! ")
        min_chisq = chisquare
    return chisquare

In [227]:
blank_transitions = create_blank_transitions_dataset(source=vlt)
min_chisq = 500.
for key in fit_dict:
    if 'chisq' not in fit_dict[key]:
        temp_vlt = blank_transitions.copy()
        temp_vlt['vshift'] = fit_dict[key]['interpolate'](temp_vlt.wavelength)
        temp_vlt_systems = fit_alpha(temp_vlt)
        chisquare = chisq(temp_vlt_systems[temp_vlt_systems.source=='VLT'],
                          fit_alpha='sim_fit_alpha')
        fit_dict[key]['chisq'] = chisquare
        print(pd.Timestamp('now').strftime("%Y-%m-%d %H:%M:%S"), key, chisquare)
    else:
        chisquare = fit_dict[key]['chisq']
        print("Already fit:", key, chisquare)
    if chisquare < min_chisq:
        print("    New minimum found! ")
        min_chisq = chisquare


2017-03-02 16:19:35 (0, 50, -50, 50, 0) 297.107831062
    New minimum found! 
2017-03-02 16:19:50 (0, -50, 50, 50, 0) 284.235720086
    New minimum found! 
2017-03-02 16:20:06 (0, 50, -50, -50, 0) 286.275054373
2017-03-02 16:20:22 (0, 0, 0, -50, -50) 270.324909718
    New minimum found! 
2017-03-02 16:20:39 (0, 50, -50, -50, -50) 280.704222664
2017-03-02 16:20:55 (0, -50, -50, -50, 50) 291.48678305
2017-03-02 16:21:11 (0, 50, 0, -50, -50) 270.805093392
2017-03-02 16:21:27 (0, -50, -50, -50, 0) 285.628059948
2017-03-02 16:21:44 (0, 50, 0, -50, 50) 281.519981072
2017-03-02 16:21:59 (0, -50, 50, 50, -50) 279.553561647
2017-03-02 16:22:15 (0, 50, 50, 50, 0) 275.44166904
2017-03-02 16:22:31 (0, -50, -50, 0, 0) 292.826962099
2017-03-02 16:22:48 (0, 50, -50, 0, -50) 284.804990233
2017-03-02 16:23:04 (0, -50, 0, 50, 0) 292.286985511
2017-03-02 16:23:21 (0, 0, 0, -50, 0) 275.465485349
2017-03-02 16:23:37 (0, -50, 0, -50, -50) 271.839329076
2017-03-02 16:23:52 (0, 0, -50, 0, 50) 296.478154027
2017-03-02 16:24:07 (0, 50, 0, 50, 0) 285.101242062
2017-03-02 16:24:23 (0, 50, -50, 0, 50) 296.279306442
2017-03-02 16:24:40 (0, 0, 50, -50, 50) 273.431325748
2017-03-02 16:24:56 (0, -50, 50, -50, 50) 275.676976391
2017-03-02 16:25:12 (0, 0, 50, 0, -50) 267.520160455
    New minimum found! 
2017-03-02 16:25:28 (0, -50, -50, 50, -50) 297.215518834
2017-03-02 16:25:44 (0, 0, 50, 50, 50) 283.920826648
2017-03-02 16:26:00 (0, 0, -50, 50, 0) 298.899247473
2017-03-02 16:26:16 (0, 50, 50, 50, 50) 280.557563898
2017-03-02 16:26:32 (0, -50, -50, 0, -50) 287.34313346
2017-03-02 16:26:48 (0, 50, 0, -50, 0) 275.982130279
2017-03-02 16:27:05 (0, 0, -50, 0, 0) 290.597050226
2017-03-02 16:27:21 (0, 50, 0, 50, 50) 290.61093174
2017-03-02 16:27:38 (0, 0, 0, 50, -50) 282.584397754
2017-03-02 16:27:55 (0, 0, 50, 50, 0) 278.841393047
2017-03-02 16:28:12 (0, 0, -50, 50, -50) 293.393038135
2017-03-02 16:28:29 (0, 50, 0, 0, -50) 274.049028507
2017-03-02 16:28:45 (0, 0, 50, 50, -50) 274.122773352
2017-03-02 16:29:01 (0, 0, 0, 0, -50) 275.124952403
2017-03-02 16:29:17 (0, 0, 50, 0, 0) 272.252860707
2017-03-02 16:29:34 (0, -50, 50, 0, 50) 281.148133078
2017-03-02 16:29:51 (0, 50, 0, 0, 50) 284.735755074
2017-03-02 16:30:07 (0, -50, 50, 0, -50) 271.39484118
2017-03-02 16:30:23 (0, -50, 0, 0, 50) 288.736360872
2017-03-02 16:30:39 (0, 0, -50, -50, -50) 279.419885192
2017-03-02 16:30:56 (0, 0, 0, 0, 50) 285.738756457
2017-03-02 16:31:14 (0, 0, -50, -50, 50) 290.849440003
2017-03-02 16:31:31 (0, 50, 50, 0, -50) 265.640082761
    New minimum found! 
2017-03-02 16:31:49 (0, 0, -50, 50, 50) 304.766270717
2017-03-02 16:32:05 (0, 50, 50, -50, -50) 263.252980099
    New minimum found! 
2017-03-02 16:32:23 (0, 50, -50, -50, 50) 292.206699987
2017-03-02 16:32:40 (0, 0, 0, 50, 0) 287.696812271
2017-03-02 16:32:58 (0, -50, 0, 0, -50) 278.195479331
2017-03-02 16:33:12 (0, 50, 50, 0, 0) 270.40924427
2017-03-02 16:33:29 (0, -50, -50, 0, 50) 298.671604644
2017-03-02 16:33:46 (0, -50, 0, -50, 50) 282.408371731
2017-03-02 16:34:03 (0, -50, 0, 0, 0) 283.285513148
2017-03-02 16:34:18 (0, 0, 50, -50, 0) 268.323731032
2017-03-02 16:34:34 (0, -50, 50, -50, -50) 265.895523378
2017-03-02 16:34:50 (0, 0, 50, 0, 50) 277.346374865
2017-03-02 16:35:07 (0, 0, 0, 0, 0) 280.251447477
2017-03-02 16:35:23 (0, 50, -50, 0, 0) 290.361741385
2017-03-02 16:35:39 (0, -50, -50, 50, 50) 308.515828904
2017-03-02 16:35:55 (0, -50, 0, -50, 0) 276.943443451
2017-03-02 16:36:11 (0, 50, 50, -50, 50) 273.180278136
2017-03-02 16:36:28 (0, 50, 50, 0, 50) 275.539219684
2017-03-02 16:36:44 (0, 50, 0, 0, 0) 279.211984838
2017-03-02 16:37:00 (0, 0, -50, -50, 0) 284.954255645
2017-03-02 16:37:17 (0, -50, 50, 50, 50) 289.278692431
2017-03-02 16:37:34 (0, -50, 50, -50, 0) 270.605842932
2017-03-02 16:37:51 (0, 50, 50, 50, -50) 270.686588089
2017-03-02 16:38:07 (0, 50, -50, 50, 50) 303.011315562
2017-03-02 16:38:23 (0, 0, -50, 0, -50) 285.076760331
2017-03-02 16:38:42 (0, 0, 50, -50, -50) 263.576950223
2017-03-02 16:38:58 (0, -50, 0, 50, 50) 297.723752678
2017-03-02 16:39:15 (0, -50, 0, 50, -50) 287.211032251
2017-03-02 16:39:31 (0, 0, 0, -50, 50) 280.966874886
2017-03-02 16:39:48 (0, -50, -50, 50, 0) 302.685266916
2017-03-02 16:40:04 (0, 50, 0, 50, -50) 279.952366288
2017-03-02 16:40:21 (0, -50, -50, -50, -50) 280.130150752
2017-03-02 16:40:38 (0, 50, 50, -50, 0) 268.036222165
2017-03-02 16:40:54 (0, -50, 50, 0, 0) 276.091080176
2017-03-02 16:41:10 (0, 50, -50, 50, -50) 291.565160467
2017-03-02 16:41:26 (0, 0, 0, 50, 50) 293.170040693

In [228]:
chisq()


Out[228]:
263.54873459627362

In [229]:
import operator

In [233]:
sorted([(fit_dict[key]['chisq'], key) for key in fit_dict])


Out[233]:
[(263.25298009871881, (0, 50, 50, -50, -50)),
 (263.5769502227231, (0, 0, 50, -50, -50)),
 (265.64008276087799, (0, 50, 50, 0, -50)),
 (265.89552337844214, (0, -50, 50, -50, -50)),
 (267.52016045453638, (0, 0, 50, 0, -50)),
 (268.03622216474253, (0, 50, 50, -50, 0)),
 (268.32373103241503, (0, 0, 50, -50, 0)),
 (270.32490971786831, (0, 0, 0, -50, -50)),
 (270.40924426979575, (0, 50, 50, 0, 0)),
 (270.6058429318021, (0, -50, 50, -50, 0)),
 (270.68658808852109, (0, 50, 50, 50, -50)),
 (270.8050933917736, (0, 50, 0, -50, -50)),
 (271.39484117990952, (0, -50, 50, 0, -50)),
 (271.83932907567777, (0, -50, 0, -50, -50)),
 (272.25286070712235, (0, 0, 50, 0, 0)),
 (273.18027813635501, (0, 50, 50, -50, 50)),
 (273.43132574769555, (0, 0, 50, -50, 50)),
 (274.0490285073131, (0, 50, 0, 0, -50)),
 (274.12277335183364, (0, 0, 50, 50, -50)),
 (275.12495240306191, (0, 0, 0, 0, -50)),
 (275.44166904033301, (0, 50, 50, 50, 0)),
 (275.46548534903241, (0, 0, 0, -50, 0)),
 (275.53921968430228, (0, 50, 50, 0, 50)),
 (275.67697639075084, (0, -50, 50, -50, 50)),
 (275.98213027926954, (0, 50, 0, -50, 0)),
 (276.09108017616359, (0, -50, 50, 0, 0)),
 (276.94344345050996, (0, -50, 0, -50, 0)),
 (277.34637486529698, (0, 0, 50, 0, 50)),
 (278.19547933052553, (0, -50, 0, 0, -50)),
 (278.84139304731366, (0, 0, 50, 50, 0)),
 (279.21198483770308, (0, 50, 0, 0, 0)),
 (279.41988519205074, (0, 0, -50, -50, -50)),
 (279.55356164686089, (0, -50, 50, 50, -50)),
 (279.95236628833652, (0, 50, 0, 50, -50)),
 (280.13015075195062, (0, -50, -50, -50, -50)),
 (280.25144747712011, (0, 0, 0, 0, 0)),
 (280.55756389773353, (0, 50, 50, 50, 50)),
 (280.70422266386549, (0, 50, -50, -50, -50)),
 (280.96687488578516, (0, 0, 0, -50, 50)),
 (281.14813307800637, (0, -50, 50, 0, 50)),
 (281.51998107235414, (0, 50, 0, -50, 50)),
 (282.40837173093087, (0, -50, 0, -50, 50)),
 (282.58439775373949, (0, 0, 0, 50, -50)),
 (283.28551314825177, (0, -50, 0, 0, 0)),
 (283.92082664838233, (0, 0, 50, 50, 50)),
 (284.235720086009, (0, -50, 50, 50, 0)),
 (284.73575507368173, (0, 50, 0, 0, 50)),
 (284.80499023278537, (0, 50, -50, 0, -50)),
 (284.95425564468695, (0, 0, -50, -50, 0)),
 (285.07676033062467, (0, 0, -50, 0, -50)),
 (285.10124206162061, (0, 50, 0, 50, 0)),
 (285.62805994825499, (0, -50, -50, -50, 0)),
 (285.73875645676691, (0, 0, 0, 0, 50)),
 (286.27505437283367, (0, 50, -50, -50, 0)),
 (287.21103225085722, (0, -50, 0, 50, -50)),
 (287.34313346017871, (0, -50, -50, 0, -50)),
 (287.69681227069168, (0, 0, 0, 50, 0)),
 (288.73636087156672, (0, -50, 0, 0, 50)),
 (289.27869243074588, (0, -50, 50, 50, 50)),
 (290.36174138464753, (0, 50, -50, 0, 0)),
 (290.59705022615498, (0, 0, -50, 0, 0)),
 (290.61093174049336, (0, 50, 0, 50, 50)),
 (290.84944000291188, (0, 0, -50, -50, 50)),
 (291.48678305014806, (0, -50, -50, -50, 50)),
 (291.56516046718923, (0, 50, -50, 50, -50)),
 (292.20669998739044, (0, 50, -50, -50, 50)),
 (292.2869855114775, (0, -50, 0, 50, 0)),
 (292.82696209937711, (0, -50, -50, 0, 0)),
 (293.17004069323258, (0, 0, 0, 50, 50)),
 (293.39303813468257, (0, 0, -50, 50, -50)),
 (296.27930644209835, (0, 50, -50, 0, 50)),
 (296.47815402727394, (0, 0, -50, 0, 50)),
 (297.10783106194543, (0, 50, -50, 50, 0)),
 (297.21551883389071, (0, -50, -50, 50, -50)),
 (297.72375267768655, (0, -50, 0, 50, 50)),
 (298.67160464416418, (0, -50, -50, 0, 50)),
 (298.89924747310693, (0, 0, -50, 50, 0)),
 (302.68526691598322, (0, -50, -50, 50, 0)),
 (303.01131556229035, (0, 50, -50, 50, 50)),
 (304.76627071712005, (0, 0, -50, 50, 50)),
 (308.51582890366439, (0, -50, -50, 50, 50))]

In [538]:
temp_transitions['vshift'] = temp_transitions.apply(lambda x:offset(x, temp4), axis=1)

In [539]:
temp_systems2 = fit_alpha(temp_transitions)

In [379]:
temp_systems = fit_alpha(temp_transitions)

In [381]:
plot_example_telescope_bins(nbins=13, 
                            alphacol='sim_fit_alpha', 
                            dataframe=temp_systems[temp_systems.source == 'VLT'])



In [535]:
plot_example_telescope_bins(nbins=13, 
                            alphacol='sim_fit_alpha', 
                            dataframe=temp_systems2[temp_systems2.source == 'VLT'])



In [540]:
plot_example_telescope_bins(nbins=13, 
                            alphacol='sim_fit_alpha', 
                            dataframe=temp_systems2[temp_systems2.source == 'VLT'])



In [ ]:


In [ ]:
row = all_systems.iloc[company]
    df = df_a[df_a.system==company]
    heights = df.x
    design_matrix = sm.add_constant(heights)

    results = sm.WLS(vshifts, design_matrix, weights=1.0/sigmas).fit()
    chisq = np.sum((vshifts - results.fittedvalues)**2.0 / (sigmas) ** 2.0)
    const, slope = results.params

    ax.scatter(heights, vshifts, color=sns.color_palette()[color_index], label='')
    ax.errorbar(heights, vshifts, yerr=sigmas, c=sns.color_palette()[color_index],  ls='none', label='')
    ax.plot(heights, results.fittedvalues, color='k', label='Fit slope: ' + str(round(slope, 2)) + " ppm")
    
    ax.legend(loc='best')
    ax.set_xlabel("Height")
    ax.set_ylabel("Salary")
    ax.set_title("Company: " + str(company) + " Generating slope: " + str(daa))
    fig.tight_layout()

In [215]:
def plot_example_telescope_bins(nbins = 12,
                                alphacol = 'delta_alpha',
                                errorcol = 'error_delta_alpha',
                                binned_lim = 25.0,
                                dataframe=keck,
                               ):
    fig, (ax, ax2) = plt.subplots(figsize=(12, 10), 
                                  nrows=2,
                                  sharex=True,
                                  gridspec_kw={'height_ratios':[1.5,1]},
                                 )


    for index, df in enumerate(np.array_split(dataframe.sort_values('z_absorption'), nbins)):
        color = sns.color_palette(n_colors=13)[index]

        x = df.z_absorption
        y = df[alphacol]
        e = df[errorcol]
        ax.scatter(x, (y), c=color, label='', s=40)
        ax.errorbar(x, (y), yerr=e, c=color,  ls='none', label='')

        x = np.average(df.z_absorption)
        y = np.average(df[alphacol], weights=(1.0 / (df[errorcol] ** 2.0)))
        e = np.sqrt(1.0 / np.sum(1.0 / (df[errorcol] ** 2.0)))
        label=''
        if index == 0:
            label=label
        else:
            label=''
        ax2.scatter(x, y, c=color, label=label)
        ax2.errorbar(x, y, yerr=e, c=color)

    ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
    ax.hlines(binned_lim, -2, 6, linestyles=':', lw=.5, color='k')
    ax.hlines(-binned_lim, -2, 6, linestyles=':', lw=.5, color='k')
    ax2.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')

    ax.set_ylabel(r"Slopes per Company ($\Delta \alpha/\alpha$)")
    ax2.set_ylabel(r"Weighted Binneds")
    ax2.set_xlabel(r"Decades Ago (Redshift [z])")
    ax.legend(loc='best')
    ax.set_xlim(0.3, 3.9)
    ax.set_ylim(-150, 150)
    ax2.set_ylim(-binned_lim, binned_lim)
    fig.tight_layout()

In [117]:
def fit_hypothesis(system=0, dataframe1=df_a, hypothesis='x'):
    """Return the chisq and the fit model object for a given dataframe and hypothesis."""
    plotdf1 = dataframe1[dataframe1.system == system]
    assert(hypothesis in ['x', 'w'])
    if hypothesis == 'x':
        X = sm.add_constant(plotdf1.x)
    elif hypothesis == 'w':
        X = sm.add_constant(plotdf1.wavelength)
    results = sm.WLS(plotdf1.vshift, X, weights=1.0/plotdf1.sigma).fit()
    chisq = np.sum((plotdf1.vshift - results.fittedvalues)**2.0 / (plotdf1.sigma) ** 2.0)
    return chisq, results

In [164]:
def plot_example_company(company=19, # which system to use
                         daa=2.5, # ppm of generated slope
                        ):

    row = all_systems.iloc[company]
    df = df_a[df_a.system==company]
    heights = df.x

    color_index = 0

    waves = []
    rest_waves = []
    vshifts = []
    qvals_list = []
    for tran in row['transitions'].split():
        vshift = 0.0
        rest_wave = qvals.loc[codes.loc[tran].trans].wave
        measured_wave = rest_wave * (1 + row.z_absorption)
        qval = qvals.loc[codes.loc[tran].trans].qval
        vshift += shifted_velocity(daa,
                                   qval,
                                   rest_wave)

        waves.append(measured_wave)
        rest_waves.append(rest_wave)
        vshifts.append(vshift)
        qvals_list.append(qval)

    waves = np.array(waves)
    rest_waves = np.array(rest_waves)
    vshifts = np.array(vshifts)
    qvals_list = np.array(qvals_list)
    sigmas = np.ones_like(waves) * 5.0
    
    vshifts += sigmas * np.random.randn(len(vshifts))
    fig, ax = plt.subplots(figsize=(12, 8))

    design_matrix = sm.add_constant(heights)

    results = sm.WLS(vshifts, design_matrix, weights=1.0/sigmas).fit()
    chisq = np.sum((vshifts - results.fittedvalues)**2.0 / (sigmas) ** 2.0)
    const, slope = results.params

    ax.scatter(heights, vshifts, color=sns.color_palette()[color_index], label='')
    ax.errorbar(heights, vshifts, yerr=sigmas, c=sns.color_palette()[color_index],  ls='none', label='')
    ax.plot(heights, results.fittedvalues, color='k', label='Fit slope: ' + str(round(slope, 2)) + " ppm")
    
    ax.legend(loc='best')
    ax.set_xlabel("Height")
    ax.set_ylabel("Salary")
    ax.set_title("Company: " + str(company) + " Generating slope: " + str(daa))
    fig.tight_layout()

In [ ]:
def plot_example_telescope_results(dataframe=keck):
    fig, ax = plt.subplots(figsize=(12, 8))
    x = dataframe.z_absorption
    y = dataframe.delta_alpha
    e = dataframe.error_delta_alpha
    color=0
    label='Slopes'
    ax.scatter(x, (y), c=sns.color_palette()[color], label=label)
    ax.errorbar(x, (y), yerr=e, c=sns.color_palette()[color],  ls='none', label='')
    ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
    ax.set_ylabel(r"Slopes per Company ($\Delta \alpha/\alpha$)")
    ax.set_xlabel(r"Decades Ago (Keck redshift [z])")
    ax.legend(loc='best')
    ax.set_xlim(0.3, 3.7)
    ax.set_ylim(-200, 200)
    fig.tight_layout()

In [ ]:
def generate_dataset(gen_dipole_alpha=True, 
                     wavelength_distortion=False,
                     seed=228,
                    ):
    df = pd.DataFrame(columns=['system',
                               'J2000',
                               'source',
                               'wavelength',
                               'vshift', 
                               'sigma', 
                               'qval',
                               'rest_wave'
                              ])
    count = 0
    abs_count = 0
    for index, row in all_systems.iterrows():
        waves = []
        rest_waves = []
        vshifts = []
        qvals_list = []
        for tran in row['transitions'].split():
            vshift = 0.0
            rest_wave = qvals.loc[codes.loc[tran].trans].wave
            measured_wave = rest_wave * (1 + row.z_absorption)
            qval = qvals.loc[codes.loc[tran].trans].qval
            if gen_dipole_alpha:
                vshift += shifted_velocity(row['dipole_delta_alpha'],
                                           qval,
                                           rest_wave)
            if wavelength_distortion:
                vshift += distorted_velocity(row, measured_wave)
            waves.append(measured_wave)
            rest_waves.append(rest_wave)
            vshifts.append(vshift)
            qvals_list.append(qval)
        vshifts = np.array(vshifts)
        errors = sigmas(vshifts)
        vshifts += errors * np.random.randn(len(vshifts))
        vshifts = vshifts - vshifts[0]
        for single in range(len(vshifts)):
            abs_count += 1
            df.loc[abs_count] = [int(index), #system
                                 row['J2000'],
                                 row.source,
                                 waves[single], #wavelength 
                                 vshifts[single],
                                 errors[single],
                                 qvals_list[single], # qvalues
                                 rest_waves[single],
                                ]
    df['system'] = df.system.astype(int)
    # For numerical stability during fitting, dividing the x-values by 1.0e14
    df['x'] = -2.0 * c * df['qval'] * df['rest_wave'] / 1.0e14
    return df

In [114]:
fig, ax = plt.subplots(figsize=(12, 8))
for company in keck.index:
    chisq, results = fit_hypothesis(system=company, dataframe1=df_a, hypothesis='x')
    results.params.x
    ax.scatter(keck.iloc[company].delta_alpha, results.params.x)



In [277]:
def read_systematic(infile='../data/run.17.json'):
    """Read systematic error file."""
#     with open('../data/run.17.json', 'w') as fp:
#         json.dump(run17, fp, indent=4, sort_keys=True)
    with open(infile, 'r') as fp:
        temp = dict(json.load(fp))
    return temp

run17 = {int(k):v for k, v in temp.items()}

def plot_example_systematic(region_dictionary=run17):
    color = "black"
    linewidth = 5.0
    fig, ax = plt.subplots(figsize=(12, 8))
    for index in region_dictionary:
        begin = region_dictionary[index]['waves_start']
        end = region_dictionary[index]['waves_end']
        wavelength = np.linspace(begin, end, 50)
        quad = region_dictionary[index]['quad']
        slope = region_dictionary[index]['slope']
        offset = region_dictionary[index]['offset']
        if index == 1:
            color = "blue"
            linestyle = "-"
        elif index == 3:
            color = "red"
            linestyle = '-'
        else:
            color = "black"
            linestyle = '--'
        ax.plot(wavelength, quad * wavelength ** 2.0 + wavelength * slope + offset,
                 color=color, 
                 linewidth=linewidth, 
                 linestyle=linestyle)

    ax.set_xlim(2900, 7500)
    ax.set_xlabel("Hairlength")
    ax.set_ylabel("Systematic Error of Salary")
    fig.tight_layout()

In [89]:
keck[keck.J2000.eq("J000149-015940")]


Out[89]:
J2000 z_emission z_absorption delta_alpha error_delta_alpha extra_error_delta_alpha dipole_delta_alpha dipole_angle sample source sigflag imrotator transitions
0 J000149-015940 2.31 2.0951 0.34 7.27 18.885386 -2.377815 93.5334 B1 Keck 2 0 d g h i j k l s t u v w
1 J000149-015940 2.31 2.1539 36.05 39.54 39.540000 -2.377815 93.5334 B1 Keck 1 0 d f g l

In [60]:
fig, ax = plt.subplots(figsize=(12, 8))
x = keck.z_absorption
y = keck.delta_alpha
e = keck.error_delta_alpha
color=0
label='measurement'
ax.scatter(x, (y), c=sns.color_palette()[color], label=label)
ax.errorbar(x, (y), yerr=e, c=sns.color_palette()[color],  ls='none', label='')
ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
ax.set_ylabel(r"$\Delta \alpha/\alpha$")
ax.set_xlabel(r"Decades Ago (Keck redshift [z]")
ax.legend(loc='best')
ax.set_xlim(0.3, 3.7)
ax.set_ylim(-200, 200)
fig.tight_layout()



In [70]:
fig, (ax, ax2) = plt.subplots(nrows=2)
x = keck.z_absorption
y = keck.delta_alpha
e = keck.error_delta_alpha
color=0
label='measurement'
ax.scatter(x, (y), c=sns.color_palette()[color], label='', s=40)
ax.errorbar(x, (y), yerr=e, c=sns.color_palette()[color],  ls='none', label='')
ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
ax.set_ylabel(r"$\Delta \alpha/\alpha$")
ax.set_xlabel(r"Decades Ago (Keck redshift [z])")
ax.legend(loc='best')
ax.set_xlim(0.3, 3.7)
ax.set_ylim(-200, 200)
plot_a_v_z(keck, nbins=13, ylims=200, ax=ax2)
ax2.set_ylim(-20, 20)
fig.tight_layout()



In [ ]:
plot_a_v_z(keck)

It's important to note that none of the measurements by themselves is enough to establish a measurement at a 5-$\sigma$ level, but taken as an ensemble, it approaches ~4.7$\sigma$.


In [ ]:

Simulated data

I'm going to give you a set of data. This set of data, the y-values will be generated in one of two ways (or a combination).

  • An $\alpha$ process ($\alpha$ process)
  • A wavelength process (w process)

Here is a single system, plotted as though it was generated in either $\alpha$ or w processes (and transitions smoothly between them to highlight the fact that that y-values are the same. It is merely the x-axis that gets changed:

Given a dataset, can you devise a test that quantifies the relative likelihood of either process $\alpha$ or w generating it?

One extra piece of information that is important: if $\alpha$ is generating a signal, it must be a straight line in the $\alpha$ plot. If it is a w-process, it can be more flexible.


In [3]:
%matplotlib inline
%config InlineBackend.figure_format='retina'
import warnings

warnings.filterwarnings('ignore')

In [4]:
from __future__ import absolute_import, division, print_function
import matplotlib as mpl
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.pyplot import GridSpec
from itertools import combinations, islice, takewhile
import seaborn as sns
import json
import mpld3
import numpy as np
import pandas as pd
import scipy.sparse

import os, sys
import warnings
from astropy import units as u
from astropy.coordinates import SkyCoord
import statsmodels.api as sm
sns.set_context('poster', font_scale=1.3)

from ipywidgets import interact, FloatSlider, SelectMultiple, interactive

# "constants"
from scipy.constants import c # speed of light in m/s

In [5]:
DIP_RA = 17.3
DIP_RA_ERR = 1.0
DIP_DEC = -61.0
DIP_DEC_ERR = 10.0
DIP_AMPLITUDE = 0.97e-5
DIP_AMPLITUDE_ERR = 0.21e-5 # average of asymmetric errors
DIP_MONOPOLE = -0.178e-5
DIP_MONOPOLE_ERR  = 0.084e-5

dipole = SkyCoord(DIP_RA, DIP_DEC, unit=(u.hourangle, u.deg))

# Data
vltqs = pd.read_csv('../data/vlt-transitions-new.tsv', sep='\t')
vltqs['trans'] = vltqs['transition'] + vltqs.four.astype(str)

keckqs = pd.read_csv("../data/keck-transitions-new.tsv", sep='\t')
keckqs['trans'] = keckqs.transition + keckqs.four.astype(str)

qvals = pd.read_csv("../data/qvalues.txt", sep=' ', index_col=0)

codes = pd.concat([keckqs.set_index('code')[['trans', 'wavelength']],
                   vltqs.set_index('code')[['trans', 'wavelength']]])

all_systems = pd.read_csv("../data/full-parse-new.tsv", sep='\t')

vlt_ccds = pd.read_csv("../data/vlt-ccd.csv", index_col='chip')

In [6]:
vlt = all_systems[all_systems.source.eq('VLT')].copy()
keck = all_systems[all_systems.source.eq('Keck')].copy()

In [62]:
def plot_a_v_z(dataframe, alphacol='delta_alpha', 
               errorcol='extra_error_delta_alpha', 
               nbins=13,
               color=0,
               ylims=(20),
               label='',
               fig=None,
               ax=None,
              ):
    if fig == None:
        fig = plt.gcf()
    if ax == None:
        ax = plt.gca()
    for index, df in enumerate(np.array_split(dataframe.sort_values('z_absorption'), nbins)):
        x = np.average(df.z_absorption)
        y = np.average(df[alphacol], weights=(1.0 / (df[errorcol] ** 2.0)))
        e = np.sqrt(1.0 / np.sum(1.0 / (df[errorcol] ** 2.0)))
        if index == 0:
            label=label
        else:
            label=''
        ax.scatter(x, y, c=sns.color_palette()[color], label=label)
        ax.errorbar(x, y, yerr=e, c=sns.color_palette()[color])
    ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
    ax.set_ylabel(r"$\Delta \alpha/\alpha$")
    ax.set_xlabel(r"Redshift [z]")
    ax.legend(loc='best')
    ax.set_xlim(0.3, 3.7)
    ax.set_ylim(-ylims, ylims)
    fig.tight_layout()

In [32]:
def plot_a_v_zresid(dataframe, 
                    dataframe2,
                    alphacol='delta_alpha',
                    alphacol2='dipole_delta_alpha',
                    errorcol='extra_error_delta_alpha', color=0, label=''):
    """Measured - model"""
    fig = plt.gcf()
    ax = plt.gca()
    nbins = 13
    for index in range(nbins):
        df = np.array_split(dataframe.sort_values('z_absorption'), nbins)[index]
        x = np.average(df.z_absorption)
        y = np.average(df[alphacol], weights=(1.0 / (df[errorcol] ** 2.0)))
        e = np.sqrt(1.0 / np.sum(1.0 / (df[errorcol] ** 2.0)))
        
        df2 = np.array_split(dataframe2.sort_values('z_absorption'), nbins)[index]
        x2 = np.average(df2.z_absorption)
        y2 = np.average(df2[alphacol2], weights=(1.0 / (df2[errorcol] ** 2.0)))
        e2 = np.sqrt(1.0 / np.sum(1.0 / (df2[errorcol] ** 2.0)))
        if index == 0:
            label=label
        else:
            label=''
        ax.scatter(x, (y - y2), c=sns.color_palette()[color], label=label)
        ax.errorbar(x, (y - y2), yerr=e, c=sns.color_palette()[color])
    ax.hlines(0, -2, 6, linestyles=':', lw=1.5, color='k')
    ax.set_ylabel(r"Residual $\Delta \alpha/\alpha$")
    ax.set_xlabel(r"Redshift [z]")
    ax.legend(loc='best')
    ax.set_xlim(0.3, 3.7)
    ax.set_ylim(-20, 20)
    fig.tight_layout()

In [33]:
all_systems.head()


Out[33]:
J2000 z_emission z_absorption delta_alpha error_delta_alpha extra_error_delta_alpha dipole_delta_alpha dipole_angle sample source sigflag imrotator transitions
0 J000149-015940 2.31 2.09510 0.34 7.27 18.885386 -2.377815 93.5334 B1 Keck 2 0 d g h i j k l s t u v w
1 J000149-015940 2.31 2.15390 36.05 39.54 39.540000 -2.377815 93.5334 B1 Keck 1 0 d f g l
2 J000322-260316 4.11 1.43420 -12.53 11.67 11.670000 1.115461 72.6324 C Keck 1 1 b c p r
3 J000322-260316 4.11 3.38970 -78.43 35.48 35.480000 1.115461 72.6324 C Keck 1 1 d g l m
4 J000520+052410 1.90 0.59137 -31.05 24.33 24.330000 -3.538861 100.4470 C Keck 1 0 b c n p q r

In [34]:
plot_a_v_zresid(vlt, vlt)



In [231]:
def plot_a_v_theta(dataframe,
                   alphacol='delta_alpha',
                   alphacol2='dipole_delta_alpha',
                   nbins=13,
                   errorcol='extra_error_delta_alpha', color=0, label=''):
    """Measured - model"""
    fig = plt.gcf()
    ax = plt.gca()
    for index in range(nbins):
        df = np.array_split(dataframe.sort_values('dipole_angle'), nbins)[index]
        x = np.average(df.dipole_angle)
        y = np.average(df[alphacol], weights=(1.0 / (df[errorcol] ** 2.0)))
        e = np.sqrt(1.0 / np.sum(1.0 / (df[errorcol] ** 2.0)))
        
        if index == 0:
            label=label
        else:
            label=''
        ax.scatter(x, (y), c=sns.color_palette()[color], label=label)
        ax.errorbar(x, (y), yerr=e, c=sns.color_palette()[color])
    ax.hlines(0, -2, 200, linestyles=':', lw=1.5, color='k')
    ax.vlines(90, -30, 30, linestyles=':', lw=1.5, color='k')
    ax.set_ylabel(r"Residual $\Delta \alpha/\alpha$")
    ax.set_xlabel(r"$\Theta$, angle from best-fitting dipole [degrees]")
    ax.legend(loc='best')
    ax.set_xlim(0.0, 180.0)
    ax.set_ylim(-25, 25)
    fig.tight_layout()

In [226]:
plot_a_v_theta(vlt, color=0)
plot_a_v_theta(keck, color=2)



In [37]:
plot_a_v_theta(keck, color=2)



In [229]:
first = keck[keck.sigflag == 1]
second = keck[keck.sigflag == 2]

In [230]:
plt.scatter(first.z_absorption, first.delta_alpha)
plt.scatter(second.z_absorption, second.delta_alpha)


Out[230]:
<matplotlib.collections.PathCollection at 0x1176f7400>

In [38]:
fig, ax = plt.subplots(figsize=(12, 8))
plot_a_v_z(vlt, color=2, label='VLT (measured)')
plot_a_v_zresid(vlt, vlt)



In [40]:
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(vlt.z_absorption, vlt.dipole_delta_alpha)
ax.scatter(vlt.z_absorption, vlt.delta_alpha)
plot_a_v_z(vlt, alphacol='dipole_delta_alpha', color=3, label='VLT (fill with dipole-fit values)')



In [41]:
fig, ax = plt.subplots(figsize=(12, 8))
plot_a_v_z(vlt, alphacol='dipole_delta_alpha', color=3, label='VLT (fill with dipole-fit values)')
plot_a_v_z(vlt, color=2, label='VLT (measured)')



In [43]:
fig, ax = plt.subplots(figsize=(12, 8))
plot_a_v_z(keck, color=2, label='Keck (measured)')



In [44]:
fig, ax = plt.subplots(figsize=(12, 8))
plot_a_v_z(keck, alphacol='dipole_delta_alpha', color=3, label='Keck (fill with dipole-fit values)')



In [190]:
fig, ax = plt.subplots(figsize=(12, 8))
plot_a_v_z(keck, color=2, label='Keck (measured)')
plot_a_v_z(keck, alphacol='dipole_delta_alpha', color=3, label='Keck (fill with dipole-fit values)')



In [151]:
def observed_shifts(telescope='VLT'):
    waves = []
    shifts = []
    for index, row in all_systems[all_systems.source.eq(telescope)].iterrows():
        for tran in row['transitions'].split():
            rest_wave = qvals.loc[codes.loc[tran].trans].wave
            measured_wave = rest_wave * (1 + row.z_absorption)
            qval = qvals.loc[codes.loc[tran].trans].qval
            waves.append(measured_wave)
            shifts.append(shifted_velocity(row.delta_alpha, qval, rest_wave))
    return np.array(waves), np.array(shifts)


def parse_j2000(name):
    """Takes the J2000 name stored in the results and returns it in a format astropy can understand."""
    return ' '.join([name[1:3], name[3:5], name[5:7], name[7:10], name[10:12], name[12:]])

def j2000_to_theta(name):
    """Returns the angle (degrees) between the position on the sky from 
    a given `name` and the position of the dipole model from 2012, King."""
    c = SkyCoord(parse_j2000(name), unit=(u.hourangle, u.deg))
    return float(c.separation(dipole).to_string(decimal=True))

def dipole_alpha(name):
    """Returns the value of Delta alpha/alpha as given by the best fit 2012 King model for 
    the given name (position).
    """
    theta = j2000_to_theta(name)
    return (DIP_AMPLITUDE * np.cos(np.deg2rad(theta)) + DIP_MONOPOLE) * 1e6

def plot_system(system, dataframe):
    plotdf = dataframe[dataframe.system == system]
    fig, (ax1, ax2) = plt.subplots(figsize=(12, 10), nrows=2)
    sns.regplot('wavelength', 'vshift', data=plotdf, ax=ax1)
    sns.regplot('x', 'vshift', data=plotdf, ax=ax2)
    fig.tight_layout()


def shifted_velocity(del_alpha, q, lamb):
    # vj =v0 + ∆α xj, xj =−2cqjλ0j,
    x = -2 * c * q * lamb
    return del_alpha * x * 1e-14


def VLT_distortion(measured_wave, 
                   cutoff=10000., 
                   slope1=0.06, 
                   intercept1 = -100.0,
                   slope2 =0.160,
                   intercept2=-1500.0,
                  ):
    """Telescope dependent distortion function for the VLT sample."""
    if measured_wave < cutoff:
        return measured_wave * slope1 + intercept1
    else:
        return measured_wave * slope2 + intercept2

def Keck_distortion(measured_wave, cutoff=10000.):
    """Telescope dependent distortion function for the Keck sample."""
    slope1 = .0600
    intercept1 = -100
    slope2 = .160
    intercept2 = -1500
    if measured_wave < cutoff:
        return measured_wave * slope1 + intercept1
    else:
        return measured_wave * slope2 + intercept2
    
def distorted_velocity(row, measured_wave):
    """Telescope dependent distortion function for the VLT sample."""
    if row.source == "VLT":
        return VLT_distortion(measured_wave)
    elif row.source == "Keck":
        return Keck_distortion(measured_wave)

In [152]:
df_a = generate_dataset(gen_dipole_alpha=True, wavelength_distortion=False)
df_w = generate_dataset(gen_dipole_alpha=False, wavelength_distortion=True)

In [104]:
def plot_hypotheses(system, dataframe1, dataframe2):
    plotdf1 = dataframe1[dataframe1.system == system]
    plotdf2 = dataframe2[dataframe2.system == system]
    fig, ((ax2, ax4), (ax3, ax1)) = plt.subplots(figsize=(14, 10), nrows=2, ncols=2, )
    
    chi_one, mod_one = fit_hypothesis(system=system, dataframe1=dataframe1, hypothesis='w')
    ax1.errorbar(plotdf1.wavelength, plotdf1.vshift, yerr=plotdf1.sigma,  ls='none', 
                 label=r'$\chi^2$ = ' + str(np.round(chi_one, 2)), color=sns.color_palette()[2])
    ax1.scatter(plotdf1.wavelength, plotdf1.vshift, color=sns.color_palette()[2], label='')
    ax1.plot(plotdf1.wavelength, mod_one.fittedvalues, color='k')

    chi_one, mod_one = fit_hypothesis(system=system, dataframe1=dataframe1, hypothesis='x')
    ax3.errorbar(plotdf1.x, plotdf1.vshift, yerr=plotdf1.sigma,  ls='none', 
                 label=r'$\chi^2$ = ' + str(np.round(chi_one, 2)), color=sns.color_palette()[0])
    ax3.scatter(plotdf1.x, plotdf1.vshift, color=sns.color_palette()[0], label='')
    ax3.plot(plotdf1.x, mod_one.fittedvalues, color='k')

    chi_one, mod_one = fit_hypothesis(system=system, dataframe1=dataframe2, hypothesis='w')
    ax2.errorbar(plotdf2.wavelength, plotdf2.vshift, yerr=plotdf2.sigma,  ls='none', 
                 label=r'$\chi^2$ = ' + str(np.round(chi_one, 2)), color=sns.color_palette()[0])
    ax2.scatter(plotdf2.wavelength, plotdf2.vshift, color=sns.color_palette()[0], label='')
    ax2.plot(plotdf2.wavelength, mod_one.fittedvalues, color='k')

    chi_one, mod_one = fit_hypothesis(system=system, dataframe1=dataframe2, hypothesis='x')
    ax4.errorbar(plotdf2.x, plotdf2.vshift, yerr=plotdf2.sigma,  ls='none', 
                 label=r'$\chi^2$ = ' + str(np.round(chi_one, 2)), color=sns.color_palette()[2])
    ax4.scatter(plotdf2.x, plotdf2.vshift, color=sns.color_palette()[2], label='')
    ax4.plot(plotdf2.x, mod_one.fittedvalues, color='k')

    autoAxis = ax2.axis()
    rec = Rectangle((autoAxis[0],
                     autoAxis[2]),
                    (autoAxis[1]-autoAxis[0]),
                    (autoAxis[3]-autoAxis[2]),
                    fill=False,lw=2)
    rec = ax2.add_patch(rec)
    rec.set_clip_on(False)
    
    autoAxis = ax3.axis()
    rec = Rectangle((autoAxis[0],
                     autoAxis[2]),
                    (autoAxis[1]-autoAxis[0]),
                    (autoAxis[3]-autoAxis[2]),
                    fill=False,lw=2)
    rec = ax3.add_patch(rec)
    rec.set_clip_on(False)
    
    ax3.set_title(r"$\alpha$ process $\alpha$ fit")
    ax2.set_title(r"W process W fit")
    ax1.set_title(r"$\alpha$ process W fit")
    ax4.set_title(r"W process $\alpha$ fit")
    ax2.set_ylabel("vshift [m/s]")
    ax3.set_ylabel("vshift [m/s]")
    ax1.set_xlabel("wavelength")
    ax2.set_xlabel("wavelength")
    ax3.set_xlabel("x")
    ax4.set_xlabel("x")
    leg = ax1.legend(handlelength=0, handletextpad=0, fancybox=True, frameon=True, facecolor='white', loc='best')
    for item in leg.legendHandles:
        item.set_visible(False)
    leg = ax2.legend(handlelength=0, handletextpad=0, fancybox=True, frameon=True, facecolor='white', loc='best')
    for item in leg.legendHandles:
        item.set_visible(False)
    leg = ax3.legend(handlelength=0, handletextpad=0, fancybox=True, frameon=True, facecolor='white', loc='best')
    for item in leg.legendHandles:
        item.set_visible(False)
    leg = ax4.legend(handlelength=0, handletextpad=0, fancybox=True, frameon=True, facecolor='white', loc='best')
    for item in leg.legendHandles:
        item.set_visible(False)
    fig.tight_layout()

To Do:

  • animation of transitioning between x and w on the x-axis
  • interactive plot that fits a spline with offsets
  • Binned model from paper with dipole result

In [15]:
plot_hypotheses(231, df_a, df_w)



In [70]:
plot_hypotheses(264, df_a, df_w)



In [17]:
df_a[df_a.system==264][['wavelength', 'x', 'vshift', 'sigma']]


Out[17]:
wavelength x vshift sigma
1755 6307.606019 -10.570881 0.000000 67.976694
1756 8880.132002 -15.395341 -183.532169 58.355253
1757 8703.930794 -35.545013 -137.297620 66.331114
1758 8726.270535 -20.339554 92.848987 75.616709
1759 7036.901052 -194.518419 -133.810894 53.660659
1760 7296.596596 -193.263758 -67.622065 79.228620
1761 7390.744395 -231.349696 -234.822291 54.856296
1762 8051.204649 -234.963760 -13.484665 74.865876
1763 8093.295722 -213.586285 -88.895179 75.334126
1764 5627.621790 -57.021421 39.181348 74.643997
1765 6400.314994 136.852178 83.482650 66.500987
1766 6418.916085 144.421525 1.191747 77.352706
1767 6431.141238 168.482323 110.252705 65.407266
1768 6306.554306 -302.252500 -151.424592 51.622702
1769 6420.236917 -196.023305 -95.371981 71.310984
1770 5321.313473 2.050106 -28.727743 57.401675
1771 5420.757525 146.189238 -27.174443 76.853809
1772 5453.012154 73.529547 57.321850 63.162448
1773 8020.831197 -197.150297 -160.717142 71.998956
1774 8112.791439 -135.805536 -101.281514 50.916272

In [230]:
chisqs = {# data_hypo
            'x_x':[],
            'x_w':[],
            'w_x':[],
            'w_w':[]}

data = {'x':df_a, 'w':df_w}
for system in sorted(df_w.system.unique()):
    for df in ['x', 'w']:
        for hypo in ['x', 'w']:
            chisq, results = fit_hypothesis(system=system, dataframe1=data[df], hypothesis=hypo)
            chisqs['_'.join([df, hypo])].append(chisq)
for item in chisqs:
    chisqs[item] = np.array(chisqs[item])

In [245]:
np.sum(chisqs['w_x']/chisqs['w_w'] < 1.0) / len(chisqs['w_x'])


Out[245]:
0.058020477815699661

In [253]:
np.sum(chisqs['x_x']/chisqs['x_w'] < 1.0) / len(chisqs['w_x'])


Out[253]:
0.90443686006825941

In [247]:
all_systems.head()


Out[247]:
#J2000 zem zabs da eda sample source sigflag imrotator transition
0 J000149-015940 2.31 2.09510 0.34 7.27 B1 Keck 2 0 d g h i j k l s t u v w
1 J000149-015940 2.31 2.15390 36.05 39.54 B1 Keck 1 0 d f g l
2 J000322-260316 4.11 1.43420 -12.53 11.67 C Keck 1 1 b c p r
3 J000322-260316 4.11 3.38970 -78.43 35.48 C Keck 1 1 d g l m
4 J000520+052410 1.90 0.59137 -31.05 24.33 C Keck 1 0 b c n p q r

In [251]:
angles = []
dipole_alphas = []
measured_alphas = []

for index, row in all_systems.iterrows():
    name = row['J2000']
    angles.append(j2000_to_theta(name))
    dipole_alphas.append(dipole_alpha(name))
    measured_alphas.append(row.delta_alpha)

In [264]:
plt.scatter(angles, dipole_alphas)
plt.scatter(angles, measured_alphas, alpha=0.5)


Out[264]:
<matplotlib.collections.PathCollection at 0x11dc76240>

In [263]:
plt.scatter(angles, measured_alphas)


Out[263]:
<matplotlib.collections.PathCollection at 0x11dc9a048>

In [ ]:
# Todo: ipywidgets minimization of spline + offsets for vshift

In [329]:
alpha = []
distortion = []
for (index, g) in df2.groupby('system'):
    X = sm.add_constant(g.x)
    model = sm.WLS(g.vshift, X, weights=1.0/g.sigma)
    results = model.fit()
    chisq = np.sum((g.vshift - results.fittedvalues)**2.0 / (g.sigma) ** 2.0)
    if chisq < 1e7:
        alpha.append(chisq)
    
    X = sm.add_constant(g.wavelength)
    model = sm.WLS(g.vshift, X, weights=1.0/g.sigma)
    results = model.fit()
    chisq = np.sum((g.vshift - results.fittedvalues)**2.0 / (g.sigma) ** 2.0)
    if chisq < 1e10:
        distortion.append(chisq)

alpha = np.array(alpha)
distortion = np.array(distortion)
fig, ax = plt.subplots(figsize=(12, 8))

ax.hist(alpha, alpha=0.5, 
                bins=np.linspace(0, 5, 50),
        label='Alpha hypothesis')
ax.hist(distortion, alpha=0.5, 
                        bins=np.linspace(0, 5, 50),
        label='Distortion hypothesis')
# ax.hist(alpha / distortion, 
#         bins=np.linspace(0, 5, 50), 
#         label='Wavelength',
#         alpha=0.5)
# ax.set_xlim(0, 5.0, )
ax.legend()
fig.tight_layout()


Alpha measurements as proxies for wavelength distortions

We have many measurements of $\alpha$ on both the Keck and VLT telescopes. Measurements with spectrographs both telescopes reveal statistically significant (though small in absolute terms) differences between the expected positions of absorption lines and the measured positions.

Hypothesis tests or statistical inference

There are many possible hypotheses that could explain these differences. To list just two:

  • the value of the fine-structure constant ($\alpha$) has changed
  • there are velocity distortions in the instrument.

We can consider any velocity shift measurements as a hypothesis test. Previous studies started with the assumption that any shift in absorption lines would come as a result of a change in $\alpha$. We have the reported $\frac{\Delta \alpha}{\alpha}$ values for both the Keck and VLT samples. If we invert the hypothesis: use the measured alpha values as a system of velocity shifts, we can effectively discriminate which hypothesis best fits the data.

Simulations

The goal will be to simulate what the velocity shifts look like if the dipole model is correct, and compare that to what we currently see.

Discussion

The end goal of this analysis is ultimately a change in how $\frac{\Delta \alpha}{\alpha}$ is measured. The proposed recipe is:

  1. Create the best fit velocity structure as usual.
  2. Fit for relative velocity shifts between all lines (vpfit already allows for this).
  3. Combine all velocity shifts for all measured systems with a particular telescope.
  4. Use standard hypothesis statistical tests to discriminate the most likely hypothesis: a wavelength distortion model, or a change in the fine-structure constant.

This will likely have rather low power for any single absorption system, but for ensembles, I think that it's likely the only case

Useful References (in reverse chronological order)


In [5]:
species = set([spec[0] for spec in qvals.index.str.split('I')])
colors = sns.color_palette(n_colors=len(species))
plot_colors = {}
for index, specie in enumerate(sorted(species)):
    plot_colors[specie] = colors[index]

In [11]:
def plot_absorption(specie=['Al', 'Mg'],
                    z=1.3,
                    daa=0.05,
                   ):
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.hlines(1, 0, 10000)
    for tran, row in qvals[qvals.index.str.startswith(tuple(specie))].iterrows():
        ax.vlines((1.0 + z) * row.wave + row.qval * ((daa + 1)**2.0-1.0), 
                      0, 1, color=plot_colors[tran[:2]])
        ax.vlines((1.0 + z) * row.wave, 
                      0, 1, color=plot_colors[tran[:2]], alpha=0.2)
        
    ax.set_xlim(3000, 8e3)
    ax.set_ylim(0, 1.1)
    ax.set_ylabel("Normalized Flux")
    ax.set_xlabel(r"Wavelength $\AA$")
    
    for spec in specie:
        ax.vlines(-1, -1, 0, color=plot_colors[spec], label=spec)
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    fig.tight_layout()

plot_absorption(specie=['Al', 'Mg', 'Fe', 'Ni'], )



In [51]:
def plot_shift(specie=['Al', 'Mg'],
                    z=1.3,
                    daa=0.05,
                   ):
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.hlines(0, 0, 10000)
    for tran, row in qvals[qvals.index.str.startswith(tuple(specie))].iterrows():
        ax.scatter((1.0 + z) * row.wave, 
#                    row.qval * ((daa + 1)**2.0-1.0),
                   shifted_velocity(daa, row.qval, (1.0 + z) * row.wave),
                   color=plot_colors[tran[:2]], zorder=3)
        
    ax.set_xlim(3000, 8e3)
    ax.set_ylim(-200, 200)
    ax.set_ylabel("Velocity Shift [m/s]")
    ax.set_xlabel(r"Observed Wavelength $\AA$")
    
    for spec in specie:
        ax.vlines(-1, -1, 0, color=plot_colors[spec], label=spec, zorder=-3)
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    fig.tight_layout()

plot_shift(specie=['Al', 'Mg', 'Fe', 'Ni'], )



In [52]:
redshift_slider = FloatSlider(value=2.1, min=0.0, max=3.5)
alpha_slider = FloatSlider(value=0.0, min=-5.0, max=5.0, step=0.1)
species_select = SelectMultiple(options=['Al', 'Cr', 'Fe', 'Mg', 'Mn', 'Ni', 'Si'],
                                description="Species",
                                value=['Al', 'Fe']
                               )
# w = interactive(plot_absorption, 
#                 specie=species_select,
#                 z=redshift_slider,
#                 daa=alpha_slider,
#                )
shift_interactive = interactive(plot_shift, 
                specie=species_select,
                z=redshift_slider,
                daa=alpha_slider,
               )

In [53]:
shift_interactive



In [42]:
all_systems.head()


Out[42]:
#J2000 zem zabs da eda sample source sigflag imrotator transition
0 J000149-015940 2.31 2.09510 0.34 7.27 B1 Keck 2 0 d g h i j k l s t u v w
1 J000149-015940 2.31 2.15390 36.05 39.54 B1 Keck 1 0 d f g l
2 J000322-260316 4.11 1.43420 -12.53 11.67 C Keck 1 1 b c p r
3 J000322-260316 4.11 3.38970 -78.43 35.48 C Keck 1 1 d g l m
4 J000520+052410 1.90 0.59137 -31.05 24.33 C Keck 1 0 b c n p q r

In [15]:
def plot_shifts(telescope='VLT'):
    fig, ax = plt.subplots(figsize=(12, 6))
    w, s = observed_shifts(telescope=telescope)
    sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.2}, ax=ax)
    ax.set_ylim(-200, 200)
    ax.set_xlabel(r"Observed Wavelength [$\AA$]")
    ax.set_ylabel("Velocity [m/s]")
    ax.set_title(telescope)
    fig.tight_layout()

In [16]:
plot_shifts(telescope='VLT')



In [17]:
plot_shifts(telescope='Keck')



In [59]:
def plot_all(title='Keck + VLT'):
    fig, ax = plt.subplots(figsize=(12, 8))
    telescope='VLT'
    w, s = observed_shifts(telescope=telescope)
    sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.5}, ax=ax, label=telescope)
    telescope='Keck'
    w, s = observed_shifts(telescope=telescope)
    sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.5}, ax=ax, label=telescope)
    ax.set_xlim(2000, 10000)
    ax.set_ylim(-200, 200)
    ax.legend()
    ax.set_xlabel(r"Observed Wavelength [$\AA$]")
    ax.set_ylabel("Velocity [m/s]")
    ax.set_title(title)
    fig.tight_layout()

In [60]:
plot_all()



In [61]:
def observed_shifts_by_sample(sample='A'):
    waves = []
    shifts = []
    for index, row in all_systems[all_systems['sample'] == sample].iterrows():
        for tran in row['transitions'].split():
            rest_wave = qvals.loc[codes.loc[tran].trans].wave
            measured_wave = rest_wave * (1 + row.z_absorption)
            qval = qvals.loc[codes.loc[tran].trans].qval
            waves.append(measured_wave)
            shifts.append(shifted_velocity(row.delta_alpha, qval, rest_wave))
    return np.array(waves), np.array(shifts)

def plot_samples():
    fig, ax = plt.subplots(figsize=(12, 8))
    telescope='VLT'
    for sample in sorted(all_systems['sample'].unique()):
        w, s = observed_shifts_by_sample(sample=sample)
        sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.5}, ax=ax, label=sample)
    ax.set_ylim(-400, 400)
    ax.set_xlim(2000, 10000)
    ax.legend()
#     frame = ax.get_frame()
#     frame.set_alpha(1.0)
    ax.set_xlabel(r"Observed Wavelength [$\AA$]")
    ax.set_ylabel("Velocity [m/s]")
    ax.set_title('Samples')
    fig.tight_layout()

In [62]:
plot_samples()


Simulations

Generate velocity shifts for the dipole hypothesis.


In [54]:
w



In [74]:
def plot_sim(telescope='VLT', use_dipole=True):
    fig, ax = plt.subplots(figsize=(12, 6))
    w, s = observed_shifts(telescope=telescope)
    sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.05}, ax=ax, label='Implied by Measured VLT')
    w, s = simulated_shifts(telescope=telescope, use_dipole=use_dipole)
    sns.regplot(w, s, lowess=True, scatter_kws={'alpha':0.05}, ax=ax, label='Implied by Dipole VLT')
    ax.set_ylim(-200, 200)
    ax.legend()
    ax.set_xlabel(r"Observed Wavelength [$\AA$]")
    ax.set_ylabel("Velocity [m/s]")
    ax.set_title('Simulated velocity shifts for ' + telescope)
    fig.tight_layout()

In [75]:
plot_sim()



In [69]:
def simulated_shifts(telescope='VLT', use_dipole=True):
    waves = []
    shifts = []
    for index, row in all_systems[all_systems.source.eq(telescope)].iterrows():
        for tran in row['transitions'].split():
            rest_wave = qvals.loc[codes.loc[tran].trans].wave
            measured_wave = rest_wave * (1 + row.z_absorption)
            qval = qvals.loc[codes.loc[tran].trans].qval
            waves.append(measured_wave)
#             da = row.delta_alpha
            if use_dipole:
                da = row['dipole_delta_alpha']
            else:
                da = -3.0
            shifts.append(shifted_velocity(da, qval, rest_wave))
    return np.array(waves), np.array(shifts)

plot_sim(use_dipole=False)



In [ ]:


In [ ]:

Hypothesis test

I think that a non-parametric test for $\alpha$ would be to fit for all velocity shifts as independent shifts.

The general procedure would be thus:

  • fit velocity shifts per wavelength region (with errors).

In [80]:



Out[80]:
#J2000 zem zabs da eda sample source sigflag imrotator transition
152 J004131-493611 3.24 2.2485 -12.3 6.72 D VLT 3 0 j1 j2 j3 j6 j7 j8 c1 d1 d2 e2 h1 h2 h3 l1 l2 k...

In [99]:
telescope = 'VLT'
row = all_systems[all_systems.source.eq(telescope)].sample(1,
                                                   random_state=2
                                                  ).iloc[0]
waves = []
shifts = []
for tran in row['transitions'].split():
    rest_wave = qvals.loc[codes.loc[tran].trans].wave
    measured_wave = rest_wave * (1 + row.z_absorption)
    qval = qvals.loc[codes.loc[tran].trans].qval
    waves.append(measured_wave)
#             da = row.delta_alpha
#     if use_dipole:
    da = row['dipole_delta_alpha']
#     else:
#         da = -3.0
    shifts.append(shifted_velocity(da, qval, rest_wave))
wav, shift = np.array(waves), np.array(shifts)



fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(wav, shift)
ax.set_ylim(-200, 200)
# ax.legend()
ax.set_xlabel(r"Observed Wavelength [$\AA$]")
ax.set_ylabel("Velocity [m/s]")
ax.set_title('Simulated velocity shifts for ' + telescope)
fig.tight_layout()



In [ ]:
telescope = 'VLT'
row = all_systems[all_systems.source.eq(telescope)].sample(1,
                                                   random_state=2
                                                  ).iloc[0]
waves = []
shifts = []
for tran in row['transitions'].split():
    rest_wave = qvals.loc[codes.loc[tran].trans].wave
    measured_wave = rest_wave * (1 + row.z_absorption)
    qval = qvals.loc[codes.loc[tran].trans].qval
    waves.append(measured_wave)
#             da = row.delta_alpha
#     if use_dipole:
    da = row['dipole_delta_alpha']
#     else:
#         da = -3.0
    shifts.append(shifted_velocity(da, qval, rest_wave))
wav, shift = np.array(waves), np.array(shifts)



fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(wav, shift)
ax.set_ylim(-200, 200)
# ax.legend()
ax.set_xlabel(r"Observed Wavelength [$\AA$]")
ax.set_ylabel("Velocity [m/s]")
ax.set_title('Simulated velocity shifts for ' + telescope + " " + str(np.round(da, 2)))
fig.tight_layout()

In [113]:
telescope = 'VLT'
row = all_systems[all_systems.source.eq(telescope)].sample(1,
                                                   random_state=3
                                                  ).iloc[0]
waves = []
shifts = []
for tran in row['transitions'].split():
    rest_wave = qvals.loc[codes.loc[tran].trans].wave
    measured_wave = rest_wave * (1 + row.z_absorption)
    qval = qvals.loc[codes.loc[tran].trans].qval
    waves.append(measured_wave)
#             da = row.delta_alpha
#     if use_dipole:
    da = row['dipole_delta_alpha']
#     else:
#         da = -3.0
    shifts.append(shifted_velocity(da, qval, rest_wave))
wav, shift = np.array(waves), np.array(shifts)



fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(wav, shift)
ax.set_ylim(-200, 200)
# ax.legend()
ax.set_xlabel(r"Observed Wavelength [$\AA$]")
ax.set_ylabel("Velocity [m/s]")
ax.set_title('Simulated velocity shifts for ' + telescope + " " + str(da))
fig.tight_layout()



In [112]:
telescope = 'VLT'
row = all_systems[all_systems.source.eq(telescope)].sample(1,
                                                   random_state=8
                                                  ).iloc[0]
waves = []
shifts = []
for tran in row['transitions'].split():
    rest_wave = qvals.loc[codes.loc[tran].trans].wave
    measured_wave = rest_wave * (1 + row.z_absorption)
    qval = qvals.loc[codes.loc[tran].trans].qval
    waves.append(measured_wave)
    da = row['dipole_delta_alpha']
    shifts.append(shifted_velocity(da, qval, rest_wave))
wav, shift = np.array(waves), np.array(shifts)


fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(wav, shift)

waves = []
shifts = []
for tran in row['transitions'].split():
    rest_wave = qvals.loc[codes.loc[tran].trans].wave
    measured_wave = rest_wave * (1 + row.z_absorption)
    qval = qvals.loc[codes.loc[tran].trans].qval
    waves.append(measured_wave)
    da = row.delta_alpha
    shifts.append(shifted_velocity(da, qval, rest_wave))
wav, shift = np.array(waves), np.array(shifts)

ax.scatter(wav, shift)
ax.set_xlabel(r"Observed Wavelength [$\AA$]")
ax.set_ylabel("Velocity [m/s]")
ax.set_title('Simulated velocity shifts for ' + telescope + " " + str(da))
fig.tight_layout()



In [ ]:


In [ ]:


In [ ]:


In [222]:
#  Todo consider including the outliers stripped via the LTS method
# # Outliers http://astronomy.swin.edu.au/~mmurphy/files/KingJ_12a_VLT+Keck.dat
# 127   J194454+770552     3.02    2.8433     -4.959     1.334    C          Keck    1             1
# http://astronomy.swin.edu.au/~mmurphy/MurphyM_09a.dat
# -          -     2.8433  dgl              -4.959 1.334 0.000  0.149 0.000 C

# 145   J000448-415728     2.76    1.5419     -5.270     0.906    D          VLT     3             1
# TableA1 
# b1b2j4j5j6j7j8


# # 
# for trans in vlt[vlt.J2000.str.startswith('J043037')]['transitions']:
#     trans = trans.split()
#     for tran in trans:
#         print(tran, codes.loc[tran].trans)