ESP: Estimating Spectra from Photometry

What is it?

ESP is a python package designed to estimate galaxy spectra only from their photometric colors. It requires a training set of spectra for a Principal Component Analysis that will produce a set of eigenspectra and coefficients that can reproduce the original spectra. These coefficients are then used to train a Gaussian Process for each coefficient where the input to each Gaussian Process are the colors of the training spectra. Therefore, when feeding the Gaussian Processes the colors of a new galaxy, they return a set of PCA coefficients that can be used to build a new estimated spectrum for the new galaxy.

Getting started

Setup ESP according to the README in the main folder of the repository.

Use PCA to create a basis set.

Here we are going to take a set of sample spectra and run the PCA. All this requires is a library of spectra. We have added an example set of spectra from the LSST sims_sed_library.



In [1]:

    
import esp
import numpy as np
import matplotlib.pyplot as plt
import os
from esp.lsst_utils import BandpassDict
%load_ext autoreload
%autoreload 2

Load the spectra and perform PCA



In [2]:

    
example_data_dir = '../data/example_data/' # Example spectral library
pca_obj = esp.pcaSED() # Initialize the pcaSED object
pca_obj.load_full_spectra(example_data_dir) # Load the spectra
pca_obj.PCA(comps=10) # Calculate using all components









    



Done loading spectra from file

Visualize the Eigenspectra

We will use ESP's plotting utilities to plot the eigenspectra in the visual wavelengths.



In [3]:

    
plotter = esp.plotUtils()
fig = plotter.plot_eigenspectra(pca_obj, 2)
fig









    Out[3]:

Save PCA results



In [4]:

    
os.mkdir('results')
pca_obj.write_output('results')

Estimation of Spectra from Photometry

This is what this software exists to do. Now that you have a set of basis spectra you can create an ESP estimation object and give it the colors from a photometric catalog. Then you can use it to estimate spectra for the objects in your catalog.

Load PCA results from file in a new `pcaSED` object

This is what you would do if you were running an analysis on a previously saved set of eigenspectra and coefficients.



In [5]:

    
new_pca_obj = esp.pcaSED()
new_pca_obj.load_pca_output('results')

Create a sample catalog

Let's start by randomly varying the colors from our original spectra library to create a sample set of colors for which we want to estimate spectra.

We can use the new pcaSED object to calculate the colors of the spectra that compose the training data for our estimation methods. We start by loading up a set of LSST bandpasses into a BandpassDict. We have saved a version from the LSST simulations team in the data folder of the repository, but these are also provided by the LSST simulation team here.



In [6]:

    
bandpass_dir = '../data/lsst_bandpasses/'
filters = ['u', 'g', 'r', 'i', 'z', 'y']
bandpass_dict = BandpassDict.loadTotalBandpassesFromFiles(bandpassNames = filters,
                                                          bandpassDir = bandpass_dir,
                                                          bandpassRoot = 'total_')

Then we can call calc_colors to calculate the training set colors.



In [8]:

    
colors = new_pca_obj.calc_colors(bandpass_dict, 10)
print(colors[0:3])









    



[[ 1.4929158   0.59936636  0.29770002  0.1690767   0.19888057]
 [ 1.24226315  0.52636294  0.24677608  0.10527891  0.07863345]
 [ 0.42452407 -0.00708825  0.01245164  0.02283399  0.02162798]]

And finally we create a new sample set of colors by adding random gaussian offsets to the original colors.



In [9]:

    
np.random.seed(42)
sample_cat_colors = colors + np.random.normal(0.0, 0.2, size=50).reshape((10, 5))

Nearest Neighbor Estimation

Included in ESP is a simple nearest neighbor interpolation to compare against the Gaussian Process results. It uses scikit-learn's k-nearest neighbor methods with a user specified k value.

Note that we also require a BandpassDict object here. We can reuse the one we created above.



In [10]:

    
nn_obj = esp.nearestNeighborEstimate(new_pca_obj, bandpass_dict, sample_cat_colors)



In [11]:

    
nn_spec = nn_obj.nn_predict(1)



In [13]:

    
nn_colors = nn_spec.calc_colors(bandpass_dict, 10)
print(nn_colors)









    



[[ 1.4929158   0.59936636  0.29770002  0.1690767   0.19888057]
 [ 1.24226315  0.52636294  0.24677608  0.10527891  0.07863345]
 [ 0.08565165 -0.22354494 -0.16804298 -0.09086358 -0.11219682]
 [ 0.42452407 -0.00708825  0.01245164  0.02283399  0.02162798]
 [ 2.07584574  0.87645962  0.50592775  0.27701911  0.28281606]
 [-0.27524941 -0.38768019 -0.28494435 -0.23936991 -0.21146007]
 [ 0.88854517  0.21089487  0.1067944   0.04039486  0.00634465]
 [ 1.4929158   0.59936636  0.29770002  0.1690767   0.19888057]
 [ 0.08565165 -0.22354494 -0.16804298 -0.09086358 -0.11219682]
 [ 0.4589896  -0.07862281  0.09024506  0.08877091  0.07480225]]

The nearestNeighborEstimate object can also take the arguments for the scikit-learn methods using the knr_args with a dictionary specifying the desired settings. For instance, we show how to change the weighting from uniform to distance dependent.



In [14]:

    
nn_spec = nn_obj.nn_predict(5, knr_args=dict(weights='distance'))



In [15]:

    
nn_colors = nn_spec.calc_colors(bandpass_dict, 10)
print(nn_colors)









    



[[ 1.44948234  0.62230859  0.33708666  0.18897176  0.19992778]
 [ 1.35087751  0.57335027  0.30710314  0.16766825  0.17430908]
 [ 0.23428443 -0.10451996 -0.0307044  -0.00499442 -0.01104218]
 [ 0.4302046  -0.0128338   0.02452903  0.0244875   0.00810127]
 [ 1.55177473  0.68197451  0.38436909  0.21636447  0.22626174]
 [ 0.0112906  -0.20954585 -0.11263455 -0.06697265 -0.05862302]
 [ 0.68901602  0.1696596   0.11687911  0.05959773  0.03705137]
 [ 1.40754883  0.60232254  0.32613501  0.18150403  0.1911944 ]
 [ 0.18323596 -0.13150027 -0.05408104 -0.01974857 -0.02437069]
 [ 0.43460892 -0.0136621   0.02830687  0.02773224  0.01155962]]

Gaussian Process Estimation

The best results using ESP should be obtained using Gaussian Processes to estimate the PCA coefficients that can be used to construct estimated spectra from colors. The gaussianProcessEstimate method does this for you. Once again you give the estimator the pcaSED object containing the eigenspectra and training coefficients, the BandpassDict for calculating colors and the colors for the objects for which you want to estimate spectra.



In [16]:

    
gp_obj = esp.gaussianProcessEstimate(new_pca_obj, bandpass_dict, sample_cat_colors)

The next step is to define the initial kernel function for the Gaussian Processes that estimate the PCA coefficients. Right now you can specify an exponential kernel using 'exp' or a squared exponential kernel using 'sq_exp'. The next two numbers are a scaling value that normalizes the variance between objects and a length parameter that defines the length scale on the kernel. If you are familiar with george and want to define your own more complicated kernel you can do that as well.



In [18]:

    
gp_kernel = gp_obj.define_kernel('exp', 1.0e-3, 1.0e-3, n_dim=len(sample_cat_colors[0]))

We then use the kernel to predict PCA coefficients that will create an estimated spectra based upon the input colors of the catalog objects. The Gaussian Process for each set of PCA coefficients uses George's optimization method to optimize the hyperparameters in the kernel.



In [20]:

    
gp_spec = gp_obj.gp_predict(gp_kernel, opt_bandpass_dict=bandpass_dict)

Finally, we can evaluate the colors the estimate spectrum give back as a way to measure how well our spectrum can reproduce the input data.



In [22]:

    
gp_colors = gp_spec.calc_colors(bandpass_dict, 10)
print(gp_colors)









    



[[  1.48259547e+00   6.34273658e-01   3.51637246e-01   1.97321181e-01
    2.06638718e-01]
 [  1.20720686e+00   5.24873317e-01   3.04178207e-01   1.73015286e-01
    1.81075159e-01]
 [  2.23148004e-01  -8.10239454e-02  -2.72290631e-02  -1.44995997e-02
   -2.40196514e-02]
 [  3.16087269e-01  -4.56999514e-02  -6.75139348e-03  -3.33504716e-03
   -2.06629060e-02]
 [  1.92675148e+00   7.88706703e-01   4.17093961e-01   2.30498172e-01
    2.40935345e-01]
 [ -1.87715089e-01  -3.62335529e-01  -2.51757218e-01  -1.73704598e-01
   -1.44281324e-01]
 [  7.67671533e-01   2.62621700e-01   1.82086743e-01   1.09516467e-01
    1.09411940e-01]
 [  1.32756024e+00   5.86262576e-01   3.34829321e-01   1.89706780e-01
    1.99055131e-01]
 [  1.19639982e-01  -1.58376073e-01  -8.73753082e-02  -5.52564616e-02
   -6.74650616e-02]
 [  3.25795909e-01  -4.18717282e-02  -4.04201270e-04   2.33524196e-03
   -1.36649579e-02]]

The gp_spec is a new pcaSED object that contains the coefficients that define the estimated spectra for each catalog object as well as the other information from the input pcaSED object that you need construct the spectra. Since it is a pcaSED object it means you can also just call reconstruct_spectra to produce the estimated spectra from the coefficients.



In [23]:

    
gp_spec.coeffs









    Out[23]:





array([[ -5.29392215e-003,   3.97502947e-004,  -5.97781593e-005,
          5.96706672e-007,   1.24511164e-009,  -5.31084407e-018,
          2.73158832e-044,  -5.49073909e-066,   1.13984502e-077,
          7.09408740e-035],
       [ -4.03598209e-003,   3.87650771e-005,  -1.04268173e-004,
         -5.26242933e-008,  -2.43345797e-012,   1.32022480e-017,
         -4.03094999e-048,   7.32808913e-074,   1.73176545e-084,
          4.79352253e-035],
       [  4.39785002e-003,  -1.25271918e-004,   5.18427065e-006,
          1.73318955e-006,  -1.06136653e-013,   4.38088631e-023,
         -3.92054757e-062,  -3.68202725e-091,  -1.49349738e-107,
         -1.10947387e-035],
       [  3.64618029e-003,  -5.37070365e-004,   2.38328629e-005,
          9.62296439e-007,   3.14418639e-011,  -5.11444263e-019,
          5.28709379e-051,   3.87384952e-074,  -4.84215160e-088,
          5.37050541e-035],
       [ -6.79193809e-003,   1.07672557e-003,   6.55241057e-005,
          3.55927310e-007,  -7.07629719e-011,   1.51831129e-019,
          2.59380307e-053,   2.78449109e-079,  -1.63859334e-094,
         -4.63134552e-036],
       [  8.98958521e-003,   1.90482837e-003,  -4.24383914e-005,
         -8.86460804e-006,  -9.74548806e-010,  -1.67577985e-015,
          2.10224689e-037,  -5.17758443e-054,  -2.12933541e-063,
         -1.73017950e-034],
       [ -8.48227096e-004,  -7.00850247e-004,  -4.30913196e-005,
         -6.80546016e-007,  -1.56230362e-011,  -4.44879773e-021,
          8.40882003e-059,  -2.50721322e-087,  -3.60800756e-102,
          5.81971540e-035],
       [ -4.71439363e-003,   2.86615210e-004,  -5.48685834e-005,
          1.68948124e-007,   9.10692955e-011,   9.32699925e-021,
          1.29724366e-052,  -1.01580300e-078,   9.60769758e-093,
          4.98645390e-035],
       [  5.61204119e-003,   1.97930435e-004,   3.58971157e-006,
          1.02594316e-005,  -1.25739590e-010,   9.84513591e-018,
         -2.01215393e-044,  -4.55548655e-064,  -1.77260145e-075,
         -1.11069026e-034],
       [  3.55049048e-003,  -5.71423682e-004,   4.65939729e-005,
         -4.40927913e-007,   3.09092821e-011,   2.84535233e-020,
         -2.79983690e-056,  -3.45519241e-083,  -8.25992654e-099,
          4.40119727e-035]])



In [24]:

    
gp_spec.reconstruct_spectra(10)









    Out[24]:





array([[  4.48352213e-05,   4.52598543e-05,   4.71503073e-05, ...,
          1.21002111e-04,   1.22372783e-04,   1.18107455e-04],
       [  8.45857546e-05,   8.49526397e-05,   8.59582642e-05, ...,
          1.04685580e-04,   1.05444766e-04,   1.01982340e-04],
       [  4.87095327e-04,   4.79709213e-04,   4.71545044e-04, ...,
          3.07910768e-05,   3.02259190e-05,   2.96785957e-05],
       ..., 
       [  6.64108858e-05,   6.66838931e-05,   6.80662039e-05, ...,
          1.14709755e-04,   1.15902988e-04,   1.11920270e-04],
       [  5.65995958e-04,   5.56072808e-04,   5.46200268e-04, ...,
          2.37984143e-05,   2.32903439e-05,   2.29166552e-05],
       [  4.18165229e-04,   4.13550985e-04,   4.06782868e-04, ...,
          3.40956608e-05,   3.34186298e-05,   3.28448122e-05]])

Finally, the gp_spec object also contains the hyperparameter settings for each set of PCA coefficients for further analysis.



In [25]:

    
np.exp(gp_spec.params)









    Out[25]:





array([[  1.29296503e-05,   1.00658054e+02],
       [  4.92717212e-07,   1.58662586e+00],
       [  4.06353554e-08,   3.22545915e-02],
       [  4.95798649e-09,   8.46472158e-03],
       [  1.58004090e-09,   8.84679628e-04],
       [  7.50512800e-10,   1.64048257e-04],
       [  4.87554603e-10,   1.47458472e-05],
       [  1.60449696e-10,   6.30120257e-06],
       [  3.74442361e-11,   4.49575522e-06],
       [  4.26342055e-29,   4.60469283e-02]])



In [ ]: