In [ ]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
from datetime import date, datetime, timedelta, time
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
from pkg_resources import get_distribution
In [ ]:
seaborn.set_context('talk')
seaborn.set_style('white')
This notebook is a data-exploration script accompanying the AMIsurvey package, described in Staley & Anderson (in prep). It explores the metadata recorded during reduction of the AMI-LA datasets.
Code versions used to generate this notebook:
In [ ]:
print get_distribution('drive-ami')
print get_distribution('drive-casa')
print get_distribution('chimenea')
print get_distribution('amisurvey')
In [ ]:
import json
from collections import OrderedDict
import chimenea
from chimenea.obsinfo import ObsInfo
In [ ]:
datafile = "./good_files_reduced3_processed.json"
In [ ]:
with open(datafile) as f:
    rawdata = json.load(f, cls=ObsInfo.Decoder)
In [ ]:
len(rawdata)
In [ ]:
rawdata_dict={obs.name: obs for obs in rawdata}
In [ ]:
# testobs=rawdata[1]
# print testobs.rms_delta
# # testobs.name
# testobs.maps_masked.ms.image
# testsrcs = testobs.meta['masked_sources']
# testsrcs
# max([s[4] for s in testsrcs])
Build a set of records by pulling key bits of data from the reduction metadata JSON-dump:
In [ ]:
def pull_metrics(obsinf):
    """Pull the key metrics for a single observation into a flat record."""
    brightest = 0.
    if 'masked_sources' in obsinf.meta:
        sources = obsinf.meta['masked_sources']
        brightest = max([src[4] for src in sources])
    record = dict(
        group=obsinf.group,
        rms_dirty=obsinf.rms_dirty,
        rms_dirty_naive=obsinf.rms_dirty_naive,
        rms_best=obsinf.rms_best,
        rms_delta=obsinf.rms_delta,
        n_rms_estimates=len(obsinf.rms_history),
        masked_clean=(obsinf.maps_masked.ms.image is not None),
        flagged=obsinf.meta['flagged_final_percent'],
        brightest_mjy=brightest * 1000.,
        duration=obsinf.meta['duration_hrs'],
    )
    if record['masked_clean']:
        # rms_history[0] is the dirty-map estimate; [1] follows the first masked clean.
        record['rms_single_clean'] = obsinf.rms_history[1]
    return (obsinf.name, record)
In [ ]:
mets = [pull_metrics(obs) for obs in rawdata]
idx, data = zip(*mets)
Load the records into a pandas dataframe, for convenience:
In [ ]:
all_obs=pd.DataFrame.from_records(index=idx, data=data)
all_obs.head()
Filter out GRB110328A, as it's part of a long-term monitoring program, not our rapid-response mode.
In [ ]:
all_obs = all_obs.loc[all_obs.group!='GRB110328A']
How much data are we dealing with?
In [ ]:
len(all_obs)
In [ ]:
print all_obs.duration.sum()/24, "days"
How many targets / fields of view?
In [ ]:
len(all_obs.group.unique())
Typical observation length?
In [ ]:
all_obs.duration.mean(), all_obs.duration.median()
In [ ]:
all_obs.duration.hist(bins=40)
ax=plt.gca()
ax.set_title('Observation durations')
ax.set_xlabel('Integration time [hours]')
In [ ]:
# all_obs[all_obs.duration>3.5].group.unique()
A quick comparison of RMS estimation methods, out of curiosity:
In [ ]:
all_obs['sd_bias']=all_obs.rms_dirty_naive / all_obs.rms_dirty
all_obs.hist('sd_bias')
ax=plt.gca()
ax.set_title('Comparison of naive and debiased std.dev. estimators, as applied to dirty maps')
ax.set_xlabel('Naive s.d. / debiased s.d.')
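To make the comparison above concrete, here is a purely illustrative sketch using synthetic data (not the chimenea implementation, whose estimators may differ in detail): a plain 'naive' standard deviation over all pixels is biased upward by bright-source pixels, whereas an iteratively sigma-clipped estimate recovers something much closer to the true noise level.
In [ ]:
# Illustrative sketch only, not the chimenea estimator: compare a naive standard
# deviation with a sigma-clipped estimate on synthetic 'map' pixels.
rng = np.random.RandomState(42)
pixels = rng.normal(0., 1., size=100000)   # pure-noise background, true RMS = 1
pixels[:200] += 10.                        # a handful of bright 'source' pixels

naive_rms = pixels.std()

clipped = pixels.copy()
for _ in range(5):                         # simple iterative 3-sigma clip
    clipped = clipped[np.abs(clipped - clipped.mean()) < 3 * clipped.std()]
debiased_rms = clipped.std()

print naive_rms / debiased_rms             # analogous to the 'sd_bias' ratio above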
In [ ]:
# all_obs[all_obs.sd_bias<0.9]
# all_obs[all_obs.sd_bias<1.1]
OK: find the observations which had a steady source in the deep image, used to create a Clean-mask:
In [ ]:
print all_obs.shape
masked_obs = all_obs.loc[all_obs.masked_clean].copy()
In [ ]:
print len(masked_obs), "observations from "
print len(masked_obs.group.unique()), "fields containing a source in the deep image."
We always obtain one RMS estimate initially, from the dirty map. After that, for masked re-cleans, we store an extra RMS estimate after each cycle. (We always perform at least one masked-clean cycle.) So we can check the typical number of clean cycles as follows:
In [ ]:
n_rms_est = masked_obs.n_rms_estimates.value_counts()
n_masked_clean_cycles = pd.Series(index=n_rms_est.index - 1, data=n_rms_est.values)
print "N recleans:"
print n_masked_clean_cycles
In [ ]:
n_masked_clean_cycles = n_masked_clean_cycles.sort_index()
ax = n_masked_clean_cycles.plot(kind='bar')
fig = ax.get_figure()
locs, labels = plt.xticks()
_ = plt.setp(labels, rotation=0)
# fig.autofmt_xdate()
ax.set_title('Number of masked re-clean cycles until convergence')
ax.set_xlabel('Number of masked re-clean cycles')
ax.set_ylabel('Number of observations')
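As a quick supplementary figure-of-merit (a sketch, not part of the original analysis), the counts above also give the fraction of masked observations that satisfy the convergence criterion after a single masked-clean cycle:
In [ ]:
# Supplementary sketch: fraction of masked observations converging after one
# masked-clean cycle, using the cycle-count Series built above.
print n_masked_clean_cycles.get(1, 0) / float(n_masked_clean_cycles.sum())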
What do the RMS decreases from the final re-clean iteration look like?
In [ ]:
masked_obs.hist(column='rms_delta')
ax = plt.gca()
ax.set_title('RMS decrease from final re-clean iteration')
ax.set_xlabel('RMS proportional decrease')
ax.set_ylabel('Number of observations')
And how about for those observations which took 3 re-clean iterations?
In [ ]:
masked_obs[masked_obs.n_rms_estimates==4].hist(column='rms_delta')
ax = plt.gca()
ax.set_title('RMS decrease from final re-clean iteration, where 3 iterations used')
ax.set_xlabel('RMS proportional decrease')
ax.set_ylabel('Number of observations')
It appears that only one observation in 943 fails our 're-clean convergence' criterion after 3 cycles. Inspection of the concatenated image suggests extended emission / blended sources in this field (SN2014C):
In [ ]:
masked_obs.loc[masked_obs.rms_delta>0.05]
We can also plot the total proportional decrease in RMS, comparing the RMS estimated from the dirty map and the final RMS value. For fields with only faint sources, we expect this to be close to 1.
In [ ]:
masked_obs['rms_drop_total']=masked_obs['rms_best']/masked_obs['rms_dirty']
masked_obs.hist('rms_drop_total')
ax=plt.gca()
ax.set_title('Total proportional RMS decrease (compared to initial dirty-map estimate)')
ax.set_xlabel('Total prop. RMS decrease')
ax.set_ylabel('Number of observations')
In [ ]:
# masked_obs.loc[masked_obs.rms_drop_total <0.5]
We can also compare the iterative 're-Clean' process against a single Clean operation (both using a Clean mask).
In [ ]:
masked_obs['rms_drop_single']=masked_obs['rms_best']/masked_obs['rms_single_clean']
masked_obs.hist('rms_drop_single')
ax=plt.gca()
ax.set_title('Proportional RMS decrease due to re-Clean (compared to single Clean operation)')
ax.set_xlabel('Prop. RMS decrease')
ax.set_ylabel('Number of observations')
In [ ]:
# masked_obs[masked_obs.rms_drop_single<0.9].sort('duration')
Even for those observations which undergo a fairly extreme RMS decrease, the RMS seems to have converged pretty well by the third re-clean cycle:
In [ ]:
for key in (masked_obs.loc[masked_obs['rms_drop_total'] < 0.4]).index:
    history = np.array(rawdata_dict[key].rms_history)
    plt.plot(history / history[-1])
# plt.ylim(.99,1.5)
ax = plt.gca()
ax.set_title('RMS history for observations with rms_drop<0.4')
ax.set_xlabel('Re-clean cycle')
ax.set_ylabel('RMS normalised by final RMS value')
Here's the same plot, zoomed in a bit:
In [ ]:
for key in (masked_obs.loc[masked_obs['rms_drop_total'] < 0.4]).index:
    history = np.array(rawdata_dict[key].rms_history)
    plt.plot(history / history[-1])
ax = plt.gca()
ax.set_title('RMS history for observations with rms_drop<0.4')
ax.set_xlabel('Re-clean cycle')
ax.set_ylabel('RMS normalised by final RMS value')
plt.xlim(0.5,3)
plt.ylim(.99,1.5)
A question: does the flagging percentage have a noticeable relation to the severity of the RMS decrease? First, let's look at the histogram of flagging percentages across all masked-clean observations:
In [ ]:
masked_obs.hist('flagged')
ax = plt.gca()
ax.set_title('Histogram of flagging percentages for all masked-clean observations')
Now, let's compare with the flagging histogram for those observations whose final RMS drops to less than 0.4 of the initial dirty-map estimate:
In [ ]:
masked_obs.loc[masked_obs.rms_drop_total < 0.4].hist('flagged')
ax = plt.gca()
ax.set_title('Histogram of flagging percentages for masked-clean obs. which undergo extreme RMS reduction')
Result: flagging does not appear to have a pronounced effect on RMS estimation; the flagging distribution is similar for those observations which see more extreme RMS reduction. If anything, observations with low levels of flagging are more likely to see a larger reduction in estimated RMS.
The flagged / rms_drop scatterplot seems to tell the same story:
In [ ]:
masked_obs.plot(kind='scatter', x='flagged', y='rms_drop_total')
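As a rough supplementary check (a sketch, not part of the original analysis), a Spearman rank correlation between flagging percentage and total RMS drop should sit close to zero if the two are only weakly related:
In [ ]:
# Supplementary sketch: rank correlation between flagging percentage and total
# proportional RMS drop; values near zero indicate little monotonic relation.
print masked_obs['flagged'].corr(masked_obs['rms_drop_total'], method='spearman')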
OK: now let's compare the RMS decrease against the flux of the brightest source in the field. We expect these to be correlated: a bright source causes sidelobe artefacts in the dirty map, which lead to increased (biased) estimates of the underlying RMS. As we iteratively apply masked cleans down to lower RMS thresholds, the sidelobes are cleaned away and the RMS estimates should bottom out at the intrinsic (unbiased) value.
To do this, we plot the total RMS decrease against the flux of the brightest source identified in the concatenated image. We expect fields without any bright sources to undergo minimal RMS decrease. At some point, as the brightest source flux increases, we might expect strange behaviour as the default AMI-LA calibration settings are no longer appropriate.
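Before splitting off the outliers, a coarse binned summary (a supplementary sketch, not part of the original analysis; the bin edges below are arbitrary choices for illustration) puts rough numbers on the expected trend of larger RMS drops in fields with brighter sources:
In [ ]:
# Supplementary sketch: median total RMS drop in coarse, arbitrarily chosen bins
# of brightest-masked-source flux.
flux_bins = pd.cut(masked_obs.brightest_mjy, bins=[0., 1., 2., 5., 10., 1e4])
print masked_obs.groupby(flux_bins).rms_drop_total.median()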
After some data exploration, a number of 'outlier' datasets have been identified in the faint-flux regime, which present more extreme levels of RMS decrease than might be expected given the general trend. Investigation reveals a few reasons for these anomalies:
In [ ]:
mobs = masked_obs
# mobs = masked_obs[masked_obs.duration < 1.5]
# Extended sources in field of XTEJ1098 / (J1908? typo somewhere)
filtered = mobs.loc[mobs.group != 'XTEJ1098+094']
# filtered=filtered[filtered.duration>3.5]
XTEJ1098 = mobs.loc[mobs.group == 'XTEJ1098+094']
# Bright single-epoch source in SWIFT_554620 dataset (GRB140327A)
filtered = filtered[[not g.startswith('SWIFT_554620') for g in filtered.group]]
SWIFT_554620 = mobs[[g.startswith('SWIFT_554620') for g in mobs.group]]
# Extended / multi-component sources in field of PTF09AXC
filtered = filtered[filtered.index != 'PTF09AXC-140525']
PTF09AXC = mobs.loc['PTF09AXC-140525']
# filtered= filtered[ (filtered.group != 'GRB110328A')]
# GRB110328A = mobs[ (mobs.group == 'GRB110328A')]
# short_filtered = filtered
In [ ]:
seaborn.set_context('poster')
In [ ]:
# SWIFT_554620
In [ ]:
# flagmax=40
# ax = filtered.loc[filtered.flagged<flagmax].plot(kind='scatter', x='brightest', y='rms_drop')
# filtered.loc[filtered.flagged>flagmax].plot(kind='scatter', x='brightest', y='rms_drop',
# marker='*', s=80, ax=ax, )
plt.clf()
def makeplot(ax, yquant):
    filtered.plot(kind='scatter', x='brightest_mjy', y=yquant,
                  label='All other observations',
                  ax=ax)
    # filtered[filtered.duration>3.5].plot(kind='scatter', x='brightest_mjy', y=yquant,
    #                                      label='All other observations with $t_{obs} > 3.5$ hrs',
    #                                      color='blue',
    #                                      ax=ax)
    # filtered[(filtered.duration>1.5)&(filtered.duration<2.5)].plot(kind='scatter', x='brightest_mjy', y=yquant,
    #                                      # label='All other datasets',
    #                                      label='Observations 1.5>t>2.5',
    #                                      marker='^',
    #                                      ax=ax)
    SWIFT_554620.plot(kind='scatter', x='brightest_mjy', y=yquant,
                      ax=ax, marker='v', s=80, c='g', label='SWIFT_554620')
    XTEJ1098.plot(kind='scatter', x='brightest_mjy', y=yquant,
                  ax=ax, marker='d', s=80, c='r', label='XTEJ1908+094')
    ax.scatter(PTF09AXC.brightest_mjy, PTF09AXC[yquant],
               marker='x', s=80, lw=5, c='r', label='PTF09AXC')
    # mdf.loc['PTF09AXC-140525'].plot(kind='scatter', x='brightest', y='rms_drop',
    #                                 ax=ax, marker='d', s=80, c='g', label='PTF09AXC-140525')
    # plt.legend(loc='upper right')
    ax.set_ylabel('Proportional drop in RMS')
    ax.set_xlabel('Brightest masked source flux [mJy]')
    # frame = plt.legend(loc='lower right')
    legend = plt.legend(frameon=1, loc='lower left', borderpad=1)
    frame = legend.get_frame()
    frame.set_color('white')
    frame.set_edgecolor('black')
# fig, axes = plt.subplots(2)
# ax0, ax1 = axes
ax0 = plt.gca()
yquant= 'rms_drop_total'
title = 'Total fractional RMS decrease'
# yquant= 'rms_drop_single'
# title = 'Fractional RMS decrease from re-Clean'
plt.gcf().suptitle(title, size=18)
makeplot(ax0, yquant)
# filtered.plot(kind='scatter', x='brightest', y=yquant,
# label='Observations conforming to general trend',ax=ax0)
ax0.set_title('Fields with sources < 2mJy')
ax0.set_xlim(0,2)
# ax0.set_title('Fields with sources < 10mJy')
# ax0.set_xlim(0,0.01)
# ax0.set_ylim(0.9,1.02)
plt.savefig('rms_to_2mjy.pdf')
# plt.savefig('rms_to_2mjy_4hrs.pdf')
In [ ]:
plt.gcf().suptitle(title, size=18)
ax1 = plt.gca()
makeplot(ax1, yquant)
# ax1.set_title('All observations')
ax1.set_title('Fields with sources < 10mJy')
ax1.set_xlim(0,10)
# ax1.set_ylim(0.9,1.05)
plt.savefig('rms_to_10mjy.pdf')
# plt.savefig('rms_to_10mjy_4hrs.pdf')
In [ ]:
mobs[(0.0009<mobs.brightest_mjy) & (mobs.brightest_mjy<0.01)].sort(yquant).head()
# len(mobs[(mobs.brightest>0.01)])
# filtered[(filtered.rms_drop<0.7) & (filtered.brightest<0.001)]
# filtered[(filtered.rms_drop<0.7) & (filtered.brightest<0.001)]
In [ ]:
# rawdata_dict['DEL2013-140703']
In [ ]:
# mobs[mobs.group=="GRB110328A"].duration.hist()
In [ ]:
# filtered[filtered.duration<0.4]
We also see a more pronounced RMS decrease when processing shorter observations (due to poorer uv-plane coverage).
In [ ]:
temp = filtered[ (3.5<filtered.duration) & (filtered.brightest_mjy<10)]
print len(temp)
len(temp.group.unique())
In [ ]:
filtered.plot(kind='scatter', x='duration', y=yquant,
label='All other datasets',)
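As before, a rank correlation (a supplementary sketch, not part of the original analysis) puts a rough number on the duration effect; a positive value would indicate that longer observations tend to retain RMS-drop ratios closer to 1, i.e. a less pronounced decrease:
In [ ]:
# Supplementary sketch: rank correlation between observation duration and total
# proportional RMS drop for the filtered sample.
print filtered['duration'].corr(filtered['rms_drop_total'], method='spearman')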
In [ ]: