In a previous document (BaselineFlagging), I examined the possibility of mitigating bad baselines by "flagging out" the worst offenders. In this document, I'll go into more detail on a specific data set to see how it performs.
I'll check the following criteria:
In [1]:
from pyspeckit.spectrum.readers import read_class
cl = read_class.ClassObject('/Users/adam/work/gc/apex/april2014/M-093.F-0009-2014-2014-04/M-093.F-0009-2014-2014-05-04.apex')
cl
Out[1]:
In [2]:
spectra = cl.get_spectra(sourcere='^MAP_115', telescope='AP-H201-X201')
In [3]:
# Get the data as an array ("spectra" is a list of spectrum,header pairs)
data = np.array([s for s,h in spectra])
header = spectra[0][1]
data.shape
Out[3]:
In [4]:
mean_spectrum = data.mean(axis=0)
dft = np.fft.fft(data, axis=1)
mode = 3 # the fourier mode to flag
mode_data = np.abs(dft[:, mode])
fraction = 0.05
threshold = np.percentile(mode_data, 100*(1-fraction))
good = mode_data < threshold
mean_spectrum_flagged = data[good].mean(axis=0)
In [20]:
import pylab as pl
pl.rcParams['figure.figsize'] = [16,8]
In [21]:
h,l,p = pl.hist(mode_data, bins=50)
lines = pl.vlines(threshold,0,h.max(),color='k',linestyle='--')
In [22]:
import pylab as pl
pl.plot(mean_spectrum, 'k,', alpha=0.5)
pl.plot(mean_spectrum_flagged, 'r,', alpha=0.5)
Out[22]:
In [23]:
pl.plot(mean_spectrum-mean_spectrum_flagged, 'k,')
Out[23]:
The 5% worst spectra have significant baseline ripples, so perhaps it's worth mitigating those.
In [25]:
frq = np.fft.fftfreq(data.shape[1])
vres = header['VRES']
fvr = data.shape[1]*vres
fvelo = fvr/(frq/frq[1])/2.
meanpsd = np.abs(dft).mean(axis=0)
meanpsdflagged = np.abs(dft[good,:]).mean(axis=0)
In [32]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.xlim(frq[meanpsd>0].min(), frq[meanpsd>0].max())
pl.ylim(1e2,2e3)
Out[32]:
This PSD shows that removing the top 5% of the mode #3 spectra didn't actually affect the power spectrum that much. Also, though, it lets us see that there are two narrow, high-frequency regions that have high amplitude.
In [27]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.xlim(3e-3,9e-3)
pl.ylim(1.5e2,4e2)
from matplotlib.ticker import FormatStrFormatter
pl.gca().xaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
pl.gca().yaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
In [28]:
ohohfour = np.argmin(np.abs(frq-0.004))
ohohsix = np.argmin(np.abs(frq-0.006))
ohoheight = np.argmin(np.abs(frq-0.008))
peak1 = np.argmax(meanpsd[ohohfour:ohohsix]) + ohohfour
peak2 = np.argmax(meanpsd[ohohsix:ohoheight]) + ohohsix
print "Fourier frequencies of the peaks: ",frq[peak1],frq[peak2]
print "Velocity scales: ",fvelo[peak1],fvelo[peak2]
In [29]:
# See what happens if we flag these modes
mode_data1 = np.abs(dft[:, peak1])
threshold1 = np.percentile(mode_data1, 100*(1-fraction))
good1 = mode_data1 < threshold1
mean_spectrum_flagged1 = data[good1].mean(axis=0)
mode_data2 = np.abs(dft[:, peak2])
threshold2 = np.percentile(mode_data2, 100*(1-fraction))
good2 = mode_data2 < threshold2
mean_spectrum_flagged2 = data[good2].mean(axis=0)
In [30]:
h,l,p = pl.hist(np.abs(dft[:,peak1]), bins=50, histtype='step', color='b')
lines = pl.vlines(threshold1,0,h.max(),color='b',linestyle='--')
h,l,p = pl.hist(np.abs(dft[:,peak2]), bins=50, histtype='step', color='g')
lines = pl.vlines(threshold2,0,h.max(),color='g',linestyle='--')
In [39]:
pl.plot(mean_spectrum, 'k,', alpha=0.5)
pl.plot(mean_spectrum_flagged1, 'b,', alpha=0.5)
#pl.xlim(5000,10000)
pl.figure()
pl.plot(mean_spectrum-mean_spectrum_flagged1, 'b.', alpha=0.5)
#pl.xlim(5000,10000)
pl.ylim(-0.005, 0.025)
Out[39]:
In [38]:
pl.plot(mean_spectrum, 'k,', alpha=0.5)
pl.plot(mean_spectrum_flagged2, 'g,', alpha=0.5)
#pl.xlim(5000,15000)
pl.figure()
pl.plot(mean_spectrum-mean_spectrum_flagged2, 'g.', alpha=0.5)
#pl.xlim(5000,15000)
pl.ylim(0.0,0.025)
Out[38]:
Despite the power in these fourier components, there is no really obvious effect on the spectra. There SHOULD be an effect on the PSDs though...
In [41]:
meanpsdflagged1 = np.abs(dft[good1,:]).mean(axis=0)
meanpsdflagged2 = np.abs(dft[good2,:]).mean(axis=0)
In [43]:
pl.loglog(frq, meanpsd, 'k')
pl.loglog(frq, meanpsdflagged, 'r')
pl.loglog(frq, meanpsdflagged1, 'b')
pl.loglog(frq, meanpsdflagged2, 'g')
pl.xlim(3e-3,9e-3)
pl.ylim(1.5e2,4e2)
from matplotlib.ticker import FormatStrFormatter
pl.gca().xaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
pl.gca().yaxis.set_minor_formatter(FormatStrFormatter("%.2g"))
The fact that we see no effect here indicates that these fourier components are present at a comparable level in all the spectra.
In [15]:
lon = np.array([h['OFF1'] for s,h in spectra])
lat = np.array([h['OFF2'] for s,h in spectra])
In [16]:
pl.plot(lon, lat, 'ks', markeredgecolor='none')
pl.plot(lon[~good], lat[~good], 'r.', alpha=0.8)
Out[16]:
In [17]:
pl.plot(lon, lat, 'ks', markeredgecolor='none')
pl.plot(lon[~good1], lat[~good1], 'g.', alpha=0.8)
Out[17]:
In [18]:
pl.plot(lon, lat, 'ks', markeredgecolor='none')
pl.plot(lon[~good2], lat[~good2], 'b.', alpha=0.8)
Out[18]:
The spatial distribution of the bad spectra is mostly random, with occasional peaks on individual scans, so there is no obvious harm in flagging them.
In [18]: