Developing the Training Set Sightlines


In [22]:
%matplotlib notebook

In [2]:
# imports
import numpy as np
from matplotlib import pyplot as plt

from astropy.coordinates import SkyCoord, match_coordinates_sky
from astropy import units as u

from specdb.specdb import IgmSpec
from pyigm.surveys.dlasurvey import DLASurvey

Load Complete DR5


In [3]:
sdssdr5 = DLASurvey.load_SDSS_DR5(sample='all')


SDSS-DR5: Loading DLA file /Users/xavier/local/Python/pyigm/pyigm/data/DLA/SDSS_DR5/dr5_alldla.fits.gz
SDSS-DR5: Loading QSOs file /Users/xavier/local/Python/pyigm/pyigm/data/DLA/SDSS_DR5/dr5_dlagz_s2n4.fits

DLAs


In [48]:
min_NHI = np.min(sdssdr5.NHI)
min_NHI


Out[48]:
20.300000000000001

Sightlines


In [4]:
sdssdr5.sightlines[0:4]


Out[4]:
<QTable length=4>
PLATE  FIB    RA         DEC          FLG_BAL  IQSO   MAG            S2N            Z_START        Z_END          ZEM            DX
              deg        deg
int32  int32  float64    float64      int16    int32  float64        float64        float64        float64        float64        float64
266    5      146.93861  -0.68701194  1        0      19.341999054   4.94595003128  2.39664643878  2.74649000168  2.8287498951   1.17426266257
266    92     146.22601  -0.72509875  0        4      19.0820007324  8.54980564117  2.20000004768  2.25759506226  2.29049992561  0.184012498912
270    254    152.23239  -0.97123272  0        9      19.0230007172  7.49763822556  2.30636157714  3.0556242466   3.09659004211  2.56581152274
271    391    154.14992  0.14750838   0        16     18.0650005341  18.982629776   2.20000004768  2.25551605225  2.28839993477  0.177634457341

Coords


In [5]:
dla_coord = sdssdr5.coord

In [6]:
sl_coord = SkyCoord(ra=sdssdr5.sightlines['RA'], dec=sdssdr5.sightlines['DEC'])

Identify sightlines

Find sightlines without a DLA


In [7]:
idx, d2d, d3d = match_coordinates_sky(sl_coord, dla_coord, nthneighbor=1)

In [8]:
clear = d2d > 1*u.arcsec

In [9]:
nclear = np.sum(clear)
nclear


Out[9]:
6532
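The matching step above keeps a sightline only when its nearest DLA quasar lies more than 1 arcsec away. A minimal sketch of that logic in plain Python, using made-up coordinates rather than the survey tables; `ang_sep_arcsec` and `clear_mask` are hypothetical helper names, and the brute-force loop stands in for the nearest-neighbour search that `match_coordinates_sky` performs:

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, DEC) points in degrees (haversine)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def clear_mask(sightlines, dlas, tol_arcsec=1.0):
    """True where the nearest DLA lies farther than tol_arcsec from the sightline."""
    return [min(ang_sep_arcsec(ra, dec, dra, ddec) for dra, ddec in dlas) > tol_arcsec
            for ra, dec in sightlines]

# Made-up coordinates: the second sightline sits 0.36 arcsec from a DLA quasar
toy_sightlines = [(150.0, 2.0), (150.5, 2.0), (151.0, -1.0)]
toy_dlas = [(150.5, 2.0001), (200.0, 10.0)]
toy_clear = clear_mask(toy_sightlines, toy_dlas)
print(toy_clear)
```

The 1 arcsec tolerance mirrors the cut used in the notebook; anything much larger risks flagging a neighbouring quasar as the DLA host.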

In [10]:
keep = clear

Sightlines without any hint of a BAL?

i.e. avoid FLG_BAL==1
But it may be good to train with those included [with and without an input DLA, of course]

In [11]:
np.max(sdssdr5.sightlines['FLG_BAL'])


Out[11]:
1

S/N?

Probably should cut at S/N=5

In [23]:
plt.clf()
ax = plt.gca()
ax.hist(sdssdr5.sightlines['S2N'][keep], bins=50, color='grey')
ax.set_xlim(0., 40.)
plt.show()



In [13]:
s2n_cut = 5.

In [14]:
gd_s2n = sdssdr5.sightlines['S2N'] > s2n_cut

In [15]:
keep = clear & gd_s2n
np.sum(keep)


Out[15]:
5034
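The cuts so far combine as element-wise ANDs, and a BAL cut, if it were wanted, would simply be one more term. A small sketch with toy values (not taken from the survey), where plain Python lists stand in for the NumPy arrays and `training_mask` and its `allow_bal` switch are hypothetical names:

```python
def training_mask(clear, s2n, flg_bal, s2n_cut=5.0, allow_bal=True):
    """AND the cuts together; the BAL term only applies when allow_bal is False."""
    return [c and (s > s2n_cut) and (allow_bal or b == 0)
            for c, s, b in zip(clear, s2n, flg_bal)]

# Toy values for four sightlines
toy_clear = [True, True, False, True]
toy_s2n = [4.9, 8.5, 12.0, 19.0]
toy_bal = [1, 0, 0, 1]

with_bal = training_mask(toy_clear, toy_s2n, toy_bal)
no_bal = training_mask(toy_clear, toy_s2n, toy_bal, allow_bal=False)
print(sum(with_bal), sum(no_bal))
```

Keeping the BAL term behind a switch makes it easy to build both versions of the training set and compare them.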

Cut on $\Delta X$?

This would mainly remove low-z quasars, whose redshift path between Z_START and Z_END is short

In [24]:
plt.clf()
ax = plt.gca()
ax.hist(sdssdr5.sightlines['DX'][keep], bins=50, color='orange')
#ax.set_xlim(0., 40.)
plt.show()



In [25]:
plt.clf()
ax = plt.gca()
ax.scatter(sdssdr5.sightlines['ZEM'][keep], sdssdr5.sightlines['DX'][keep], color='orange')
ax.set_xlabel(r'$z_{\rm em}$')
ax.set_ylabel(r'$\Delta X$')
plt.show()
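For reference, the absorption path length is conventionally defined as $\Delta X = \int_{z_{\rm start}}^{z_{\rm end}} (1+z)^2 \, [H_0/H(z)] \, dz$, which is why a short redshift window translates into a small $\Delta X$. A minimal sketch of that integral, assuming a flat $\Lambda$CDM cosmology with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ (an assumption; the cosmology behind the tabulated DX values is not stated here). The function names `dXdz` and `delta_X` are hypothetical:

```python
import math

def dXdz(z, omega_m=0.3, omega_l=0.7):
    """Integrand (1+z)^2 / E(z) for an assumed flat LambdaCDM cosmology."""
    return (1.0 + z) ** 2 / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def delta_X(z_start, z_end, n=200):
    """Composite Simpson integration of dX/dz from z_start to z_end (n must be even)."""
    h = (z_end - z_start) / n
    total = dXdz(z_start) + dXdz(z_end)
    for i in range(1, n):
        # Simpson weights: 4 on odd interior nodes, 2 on even ones
        total += (4 if i % 2 else 2) * dXdz(z_start + i * h)
    return total * h / 3.0

# Redshift limits copied from the first two rows of the sightline table
dx_wide = delta_X(2.39664643878, 2.74649000168)
dx_narrow = delta_X(2.20000004768, 2.25759506226)
print(round(dx_wide, 2), round(dx_narrow, 2))
```

With these assumed parameters the two values land within about a percent of the tabulated DX for the same rows (1.17 and 0.18), which suggests the catalog column follows the same definition.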


Assess sightlines

Number


In [18]:
nsight = np.sum(keep)
print("We have {:d} sightlines for the training set".format(nsight))


We have 5034 sightlines for the training set

$z_{\rm em}$

Given the distribution below, should we sample zem uniformly? Probably.
Or should we supplement with ESI?
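One way the uniform sampling could work is to bin the sightlines in zem and draw the same number from each bin, letting sparse high-z bins contribute whatever they have. This is only a sketch under that assumption, with toy redshifts and a hypothetical helper `sample_uniform_in_z`; it is not a decision about how the training set will actually be built:

```python
import random

def sample_uniform_in_z(zem, z_min, z_max, n_bins, n_per_bin, seed=0):
    """Return row indices, drawing up to n_per_bin sightlines per equal-width zem bin."""
    rng = random.Random(seed)
    width = (z_max - z_min) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, z in enumerate(zem):
        if z_min <= z < z_max:
            # min() guards against rounding at the upper edge
            bins[min(int((z - z_min) / width), n_bins - 1)].append(i)
    picked = []
    for members in bins:
        picked.extend(rng.sample(members, min(n_per_bin, len(members))))
    return sorted(picked)

# Toy redshifts, heavily weighted to low z like the histogram below
toy_zem = [2.25] * 20 + [2.75] * 10 + [3.25] * 4 + [3.75] * 2 + [4.25] * 1
toy_idx = sample_uniform_in_z(toy_zem, z_min=2.0, z_max=4.5, n_bins=5, n_per_bin=3)
print(len(toy_idx))
```

The cost of this approach is that the sample size is set by the emptiest bins, which is one argument for supplementing the high-z end with another data set.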

In [51]:
plt.clf()
ax = plt.gca()
ax.hist(sdssdr5.sightlines['ZEM'][keep], bins=50)
#ax.set_xlim(0., 40.)
ax.set_xlabel(r'$z_{\rm em}$')
plt.show()


Magnitude

These are pretty bright
But the value of adding ones with lower S/N is questionable

In [52]:
plt.clf()
ax = plt.gca()
ax.hist(sdssdr5.sightlines['MAG'][keep], bins=50, color='green')
ax.set_xlabel(r'$i$ (mag)')
plt.show()


Examine a few


In [30]:
igmsp = IgmSpec()


Using /raid/IGMSPEC_DB/IGMspec_DB_v02.hdf5 for the catalog file
Using /raid/IGMSPEC_DB/IGMspec_DB_v02.hdf5 for the DB file
Available surveys: [u'BOSS_DR12', u'HSTQSO', u'SDSS_DR7', u'KODIAQ_DR1', u'MUSoDLA', u'HD-LLS_DR1', u'2QZ', u'ESI_DLA', u'HDLA100', u'GGG', u'COS-Halos', u'HST_z2', u'COS-Dwarfs', u'XQ-100']
Database is igmspec
Created on 2016-Oct-25

In [21]:
kidx = np.where(keep)[0]

0

zem = 2.29

In [31]:
k0 = kidx[0]
s0 = sdssdr5.sightlines[k0]
s0


Out[31]:
<Row index=1>
PLATE  FIB    RA         DEC          FLG_BAL  IQSO   MAG            S2N            Z_START        Z_END          ZEM            DX
              deg        deg
int32  int32  float64    float64      int16    int32  float64        float64        float64        float64        float64        float64
266    92     146.22601  -0.72509875  0        4      19.0820007324  8.54980564117  2.20000004768  2.25759506226  2.29049992561  0.184012498912

In [35]:
spec, meta = igmsp.spec_from_coord((s0['RA'], s0['DEC']), isurvey=['SDSS_DR7'])
spec[0]


Your search yielded 1 match[es]
Staged 1 spectra totalling 6.4e-05 Gb
Loaded spectra
Out[35]:
<XSpectrum1D: file=none, nspec=1, select=0, wvmin=3823.84 Angstrom, wvmax=9202.38 Angstrom>

In [36]:
spec[0].plot()


100


In [37]:
k100 = kidx[100]
s100 = sdssdr5.sightlines[k100]
s100


Out[37]:
<Row index=129>
PLATE  FIB    RA         DEC          FLG_BAL  IQSO   MAG            S2N            Z_START        Z_END          ZEM            DX
              deg        deg
int32  int32  float64    float64      int16    int32  float64        float64        float64        float64        float64        float64
301    267    208.99597  -0.4037675   1        521    18.2290000916  14.378821373   2.20000004768  2.26157999039  2.3438398838   0.196775568722

In [38]:
spec100, meta = igmsp.spec_from_coord((s100['RA'], s100['DEC']), isurvey=['SDSS_DR7'])
spec100[0].plot()


Your search yielded 1 match[es]
Staged 1 spectra totalling 6.4e-05 Gb
Loaded spectra

Pick one at z=3


In [41]:
i3 = np.argmin(np.abs(sdssdr5.sightlines['ZEM'][keep]-3.))
k3 = kidx[i3]
s3 = sdssdr5.sightlines[k3]
s3


Out[41]:
<Row index=5601>
PLATE  FIB    RA         DEC          FLG_BAL  IQSO   MAG            S2N            Z_START        Z_END          ZEM            DX
              deg        deg
int32  int32  float64    float64      int16    int32  float64        float64        float64        float64        float64        float64
1595   148    147.77988  35.382061    0        23080  19.4699993134  5.45183944702  2.31856512541  2.96044540405  3.00044989586  2.17923800687

In [44]:
spec3, meta = igmsp.spec_from_coord((s3['RA'], s3['DEC']), isurvey=['SDSS_DR7'])
spec3[0].plot()


Your search yielded 1 match[es]
Staged 1 spectra totalling 6.4e-05 Gb
Loaded spectra

And one at z=4


In [45]:
i4 = np.argmin(np.abs(sdssdr5.sightlines['ZEM'][keep]-4.))
k4 = kidx[i4]
s4 = sdssdr5.sightlines[k4]
s4


Out[45]:
<Row index=3686>
PLATE  FIB    RA         DEC          FLG_BAL  IQSO   MAG            S2N            Z_START        Z_END          ZEM            DX
              deg        deg
int32  int32  float64    float64      int16    int32  float64        float64        float64        float64        float64        float64
1158   355    206.63966  58.419219    0        15853  19.3910007477  6.51481628418  2.85878287597  3.95101976395  4.00102996826  4.12456204506

In [46]:
spec4, meta = igmsp.spec_from_coord((s4['RA'], s4['DEC']), isurvey=['SDSS_DR7'])
spec4[0].plot()


Your search yielded 1 match[es]
Staged 1 spectra totalling 6.4e-05 Gb
Loaded spectra
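The z=3 and z=4 picks rely on a two-step index mapping: the argmin is taken over the filtered array, so its result has to go through kidx to reach the right row of the full table. A toy sketch of that mapping, with plain Python lists standing in for the NumPy arrays and `nearest_kept` as a hypothetical helper name:

```python
def nearest_kept(zem, keep, z_target):
    """Return the full-table row index of the kept sightline with zem closest to z_target."""
    kidx = [i for i, k in enumerate(keep) if k]   # analogue of np.where(keep)[0]
    zem_kept = [zem[i] for i in kidx]             # analogue of zem[keep]
    # Position within the filtered list, not within the full table
    j = min(range(len(zem_kept)), key=lambda n: abs(zem_kept[n] - z_target))
    return kidx[j]                                # map back to the full table

# Toy values: row 2 is close to z=3 but fails the cuts
toy_zem = [2.83, 2.29, 3.01, 2.29, 2.99, 4.00]
toy_keep = [False, True, False, True, True, True]
row_z3 = nearest_kept(toy_zem, toy_keep, 3.0)
row_z4 = nearest_kept(toy_zem, toy_keep, 4.0)
print(row_z3, row_z4)
```

In the toy case the filtered position for z=3 is 2, while the matching row in the full table is 4; using the filtered position directly would select a sightline that failed the cuts.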
