In a previous document (BaselineFlagging), I examined the possibility of mitigating bad baselines by "flagging out" the worst offenders. In this document, I'll go into more detail on a specific data set to see how it performs.
I'll check the following criteria:
In [1]:
from pyspeckit.spectrum.readers import read_class
cl = read_class.ClassObject('/Users/adam/work/gc/apex/april2014/M-093.F-0009-2014-2014-04/M-093.F-0009-2014-2014-05-04.apex')
cl
Out[1]:
In [2]:
spectra = cl.get_spectra(sourcere='^MAP_115', telescope='AP-H201-X202')
In [3]:
# Get the data as an array ("spectra" is a list of spectrum,header pairs)
data = np.array([s for s,h in spectra])
header = spectra[0][1]
data.shape
Out[3]:
In [4]:
mean_spectrum = data.mean(axis=0)
dft = np.fft.fft(data, axis=1)
mode = 3 # the fourier mode to flag
mode_data = np.abs(dft[:, mode])
fraction = 0.05
threshold = np.percentile(mode_data, 100*(1-fraction))
good = mode_data < threshold
mean_spectrum_flagged = data[good].mean(axis=0)
In [24]:
import pylab as pl
pl.rcParams['figure.figsize'] = [16,8]
In [25]:
h,l,p = pl.hist(mode_data, bins=50)
lines = pl.vlines(threshold,0,h.max(),color='k',linestyle='--')
In [26]:
pl.plot(mean_spectrum, 'k,', alpha=0.5)
pl.plot(mean_spectrum_flagged, 'r,', alpha=0.5)
Out[26]:
In [27]:
pl.plot(mean_spectrum-mean_spectrum_flagged, 'k,')
Out[27]:
The 5% worst spectra have significant baseline ripples, so perhaps it's worth mitigating those.
In [39]:
frq = np.fft.fftfreq(data.shape[1])
vres = header['VRES']
fvr = data.shape[1]*vres
fvelo = fvr/(frq/frq[1])/2.
meanpsd = np.abs(dft).mean(axis=0)
meanpsdflagged = np.abs(dft[good,:]).mean(axis=0)
In [40]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.xlim(frq[meanpsd>0].min(), frq[meanpsd>0].max())
pl.ylim(1e2,1e3)
Out[40]:
This PSD shows that removing the top 5% of the mode #3 spectra didn't actually affect the power spectrum that much.
Unlike the high frequency spectra, though, there are no obvious peaks at high (fourier) frequency to flag out.
In [29]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.xlim(3e-3,9e-3)
pl.ylim(1.5e2,4e2)
from matplotlib.ticker import FormatStrFormatter
pl.gca().xaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
pl.gca().yaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
In [30]:
ohohfour = np.argmin(np.abs(frq-0.004))
ohohsix = np.argmin(np.abs(frq-0.006))
ohoheight = np.argmin(np.abs(frq-0.008))
peak1 = np.argmax(meanpsd[ohohfour:ohohsix]) + ohohfour
peak2 = np.argmax(meanpsd[ohohsix:ohoheight]) + ohohsix
print "Fourier frequencies of the peaks: ",frq[peak1],frq[peak2]
print "Velocity scales: ",fvelo[peak1],fvelo[peak2]
In [58]:
# See what happens if we flag these modes
mode_data1 = np.abs(dft[:, peak1])
threshold1 = np.percentile(mode_data1, 100*(1-fraction))
good1 = mode_data1 < threshold1
mean_spectrum_flagged1 = data[good1].mean(axis=0)
mode_data2 = np.abs(dft[:, peak2])
threshold2 = np.percentile(mode_data2, 100*(1-fraction))
good2 = mode_data2 < threshold2
mean_spectrum_flagged2 = data[good2].mean(axis=0)
Out[58]:
In [32]:
h,l,p = pl.hist(np.abs(dft[:,peak1]), bins=50, histtype='step', color='b')
lines = pl.vlines(threshold1,0,h.max(),color='b',linestyle='--')
h,l,p = pl.hist(np.abs(dft[:,peak2]), bins=50, histtype='step', color='g')
lines = pl.vlines(threshold2,0,h.max(),color='g',linestyle='--')
In [56]:
pl.title("Unflagged")
pl.plot(mean_spectrum, 'k.', alpha=0.5)
pl.xlim(10000,15000)
pl.ylim(0.25,0.5)
pl.figure()
pl.title("Flagged")
pl.plot(mean_spectrum_flagged1, 'b.', alpha=0.5)
pl.xlim(10000,15000)
pl.ylim(0.25,0.5)
pl.figure()
pl.plot(mean_spectrum-mean_spectrum_flagged1, 'b.', alpha=0.5)
pl.xlim(10000,15000)
pl.ylim(-0.03, 0.04)
Out[56]:
This is a very bad result: flagging out the 5% "worst" spectra in this bin made the result look terrible! This is disturbing, and probably indicates that the mean looks OK only because these gigantic and terrible looking features are getting averaged out. But, of course, they aren't getting averaged out in the maps, so the map spectra look like panel 3 here.
In [57]:
pl.plot(mean_spectrum, 'k.', alpha=0.5)
pl.xlim(10000,15000)
pl.ylim(0.25,0.5)
pl.figure()
pl.plot(mean_spectrum_flagged2, 'g.', alpha=0.5)
pl.xlim(10000,15000)
pl.ylim(0.25,0.5)
pl.figure()
pl.plot(mean_spectrum-mean_spectrum_flagged2, 'g.')
pl.xlim(10000,15000)
pl.ylim(-0.02,0.03)
Out[57]:
Now let's examine the effect on the PSDS:
In [64]:
meanpsdflagged1 = np.abs(dft[good1,:]).mean(axis=0)
meanpsdflagged2 = np.abs(dft[good2,:]).mean(axis=0)
In [65]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.loglog(frq, meanpsdflagged1, 'b')
pl.loglog(frq, meanpsdflagged2, 'g')
pl.xlim(3e-3,9e-3)
pl.ylim(1.5e2,4e2)
from matplotlib.ticker import FormatStrFormatter
pl.gca().xaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
pl.gca().yaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
There's nothing remarkable about the PSD; the suppression is spread over frequency.
In [35]:
lon = np.array([h['OFF1'] for s,h in spectra])
lat = np.array([h['OFF2'] for s,h in spectra])
In [49]:
pl.plot(lon, lat, 'ks', markeredgecolor='none', markersize=20)
pl.plot(lon[~good], lat[~good], 'r.', alpha=0.8)
Out[49]:
In [50]:
pl.plot(lon, lat, 'ks', markeredgecolor='none', markersize=20)
pl.plot(lon[~good1], lat[~good1], 'g.', alpha=0.8)
Out[50]:
In [51]:
pl.plot(lon, lat, 'ks', markeredgecolor='none', markersize=20)
pl.plot(lon[~good2], lat[~good2], 'b.', alpha=0.8)
Out[51]:
The spatial distribution of the bad spectra is mostly random, with occasional peaks on individual scans, so there is no obvious harm in flagging them.
The flagging above was based primarily on work done in another notebook on a different data set, so I want to assess whether there might be more logical fourier modes to flag on
In [63]:
for ii in range(1,13):
if (ii-1)%3 == 0: pl.figure()
pl.hist(np.abs(dft[:,ii]), bins=50, histtype='step', label=str(ii))
pl.legend(loc='best')
In [ ]: