ESP is a python package designed to estimate galaxy spectra only from their photometric colors. It requires a training set of spectra for a Principal Component Analysis that will produce a set of eigenspectra and coefficients that can reproduce the original spectra. These coefficients are then used to train a Gaussian Process for each coefficient where the input to each Gaussian Process are the colors of the training spectra. Therefore, when feeding the Gaussian Processes the colors of a new galaxy, they return a set of PCA coefficients that can be used to build a new estimated spectrum for the new galaxy.
Setup ESP according to the README in the main folder of the repository.
Here we are going to take a set of sample spectra and run the PCA. All this requires is a library of spectra. We have added an example set of spectra from the LSST sims_sed_library.
In [1]:
import esp
import numpy as np
import matplotlib.pyplot as plt
import os
from esp.lsst_utils import BandpassDict
%load_ext autoreload
%autoreload 2
In [2]:
example_data_dir = '../data/example_data/' # Example spectral library
pca_obj = esp.pcaSED() # Initialize the pcaSED object
pca_obj.load_full_spectra(example_data_dir) # Load the spectra
pca_obj.PCA(comps=10) # Calculate using all components
In [3]:
plotter = esp.plotUtils()
fig = plotter.plot_eigenspectra(pca_obj, 2)
fig
Out[3]:
In [4]:
os.mkdir('results')
pca_obj.write_output('results')
In [5]:
new_pca_obj = esp.pcaSED()
new_pca_obj.load_pca_output('results')
Let's start by randomly varying the colors from our original spectra library to create a sample set of colors for which we want to estimate spectra.
We can use the new pcaSED
object to calculate the colors of the spectra that compose the training data for our estimation methods. We start by loading up a set of LSST bandpasses into a BandpassDict
. We have saved a version from the LSST simulations team in the data folder of the repository, but these are also provided by the LSST simulation team here.
In [6]:
bandpass_dir = '../data/lsst_bandpasses/'
filters = ['u', 'g', 'r', 'i', 'z', 'y']
bandpass_dict = BandpassDict.loadTotalBandpassesFromFiles(bandpassNames = filters,
bandpassDir = bandpass_dir,
bandpassRoot = 'total_')
Then we can call calc_colors
to calculate the training set colors.
In [8]:
colors = new_pca_obj.calc_colors(bandpass_dict, 10)
print(colors[0:3])
And finally we create a new sample set of colors by adding random gaussian offsets to the original colors.
In [9]:
np.random.seed(42)
sample_cat_colors = colors + np.random.normal(0.0, 0.2, size=50).reshape((10, 5))
Included in ESP is a simple nearest neighbor interpolation to compare against the Gaussian Process results. It uses scikit-learn
's k-nearest neighbor methods with a user specified k value.
Note that we also require a BandpassDict
object here. We can reuse the one we created above.
In [10]:
nn_obj = esp.nearestNeighborEstimate(new_pca_obj, bandpass_dict, sample_cat_colors)
In [11]:
nn_spec = nn_obj.nn_predict(1)
In [13]:
nn_colors = nn_spec.calc_colors(bandpass_dict, 10)
print(nn_colors)
The nearestNeighborEstimate
object can also take the arguments for the scikit-learn
methods using the knr_args
with a dictionary specifying the desired settings. For instance, we show how to change the weighting from uniform to distance dependent.
In [14]:
nn_spec = nn_obj.nn_predict(5, knr_args=dict(weights='distance'))
In [15]:
nn_colors = nn_spec.calc_colors(bandpass_dict, 10)
print(nn_colors)
The best results using ESP should be obtained using Gaussian Processes to estimate the PCA coefficients that can be used to construct estimated spectra from colors. The gaussianProcessEstimate
method does this for you. Once again you give the estimator the pcaSED
object containing the eigenspectra and training coefficients, the BandpassDict
for calculating colors and the colors for the objects for which you want to estimate spectra.
In [16]:
gp_obj = esp.gaussianProcessEstimate(new_pca_obj, bandpass_dict, sample_cat_colors)
The next step is to define the initial kernel function for the Gaussian Processes that estimate the PCA coefficients. Right now you can specify an exponential kernel using 'exp' or a squared exponential kernel using 'sq_exp'. The next two numbers are a scaling value that normalizes the variance between objects and a length parameter that defines the length scale on the kernel. If you are familiar with george and want to define your own more complicated kernel you can do that as well.
In [18]:
gp_kernel = gp_obj.define_kernel('exp', 1.0e-3, 1.0e-3, n_dim=len(sample_cat_colors[0]))
We then use the kernel to predict PCA coefficients that will create an estimated spectra based upon the input colors of the catalog objects. The Gaussian Process for each set of PCA coefficients uses George
's optimization method to optimize the hyperparameters in the kernel.
In [20]:
gp_spec = gp_obj.gp_predict(gp_kernel, opt_bandpass_dict=bandpass_dict)
Finally, we can evaluate the colors the estimate spectrum give back as a way to measure how well our spectrum can reproduce the input data.
In [22]:
gp_colors = gp_spec.calc_colors(bandpass_dict, 10)
print(gp_colors)
The gp_spec
is a new pcaSED
object that contains the coefficients that define the estimated spectra for each catalog object as well as the other information from the input pcaSED
object that you need construct the spectra. Since it is a pcaSED
object it means you can also just call reconstruct_spectra
to produce the estimated spectra from the coefficients.
In [23]:
gp_spec.coeffs
Out[23]:
In [24]:
gp_spec.reconstruct_spectra(10)
Out[24]:
Finally, the gp_spec
object also contains the hyperparameter settings for each set of PCA coefficients for further analysis.
In [25]:
np.exp(gp_spec.params)
Out[25]:
In [ ]: