In [ ]:
from __future__ import division, print_function, absolute_import, unicode_literals
import numpy as np
import astropy.coordinates as coords
import astropy.units as u
import matplotlib.pyplot as plt
%matplotlib inline

Hands-On Exercise 3: Finding RR Lyrae Candidates Using SDSS Photometric Colors


Version 0.2

As we learned earlier today, RR Lyrae stars (RRL) are a particularly interesting class of variable stars (see the talk by B. Sesar). In particular, RRL are pulsating variables (PVs) that can be used as standard(izable) candles. As RRL are relatively luminous, they can be used to measured distances in the outskirts of the Milky Way halo, and in some cases even measure the distances to other galaxies. Thus, RRL are some of the best studied variable stars, and the goal of this exercise is to identify RRL candidates from SDSS data.

In addition to identifying RRL, we will work on writing SQL queries and cross-matching sources between different data sets.


By AA Miller (c) 2016 Jul 14

Problem 1) Download SDSS data

The first step to search for candidates will be accessing the data from the SDSS archive. One way to do this is via the SDSS web-based SQL interface (you may want to bookmark this page for your future reference). It is also possible to query the SDSS database using the Python module astroquery, which is an affiliated package of astropy. To get a sense of the form of an SQL query, let's examine the example from the SDSS website:

-- This query does a table JOIN between the imaging (PhotoObj) and spectra
-- (SpecObj) tables and includes the necessary columns in the SELECT to upload
-- the results to the SAS (Science Archive Server) for FITS file retrieval.
SELECT TOP 10
   p.objid,p.ra,p.dec,p.u,p.g,p.r,p.i,p.z,
   p.run, p.rerun, p.camcol, p.field,
   s.specobjid, s.class, s.z as redshift,
   s.plate, s.mjd, s.fiberid
FROM PhotoObj AS p
JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE 
   p.u BETWEEN 0 AND 19.6
   AND g BETWEEN 0 AND 20

This query has all the elements that we have been introduced to: a SELECT statement to define the columns we want returned, a FROM statement to select which tables we want to query, a JOIN statement to select data from multiple tables, and finally a WHERE statement to place conditional constraints on the returned data. As noted in the comments (prepended with -- in SQL), this query combines imaging and spectroscopic data, to find sources with $u < 19.6$ and $g < 20$ mag. We will only be accessing photometric data today, so you will not need to worry about writing queries with JOINs.

Finally, for completeness, note that should you need to write complex or very long SDSS queries in the future, it is best to use CasJobs.

To start, we'll execute a simple example from the astroquery docs to make sure the library works as expected. (Note that this example doesn't use SQL!)


In [ ]:
from astroquery.sdss import SDSS

pos = coords.SkyCoord('0h8m05.63s +14d50m23.3s', frame='icrs')
xid = SDSS.query_region(pos, spectro=True)
print(xid)

Part A) Make an SDSS query

We will begin with a simple SDSS query to identify the colors of our favorite star (see Hands-on session 1). SDSS tables can be a little intimidating: there are over 500 columns to choose from in PhotoObjAll (the full schema for all the tables in the SDSS database can be found in the SDSS Schema Browser). Fortunately, we are only interested in a few columns for point sources. Obviously, we want to know the ra and dec, as well as the brightness in each of the $ugriz$ filters, psfMag_u, psfMag_g, psfMag_r, psfMag_i, psfMag_z, as well as the uncertainty on each of those brightness measurements (look up the name for these columns).

Another (potentially) confusing, but nevertheless very important, aspect of SDSS data is the large number of flags that are tabulated for each source. SDSS flags are typically reported via bitmasks. Working with the bitmasks can be complicated, but fortunately, SDSS provides a clean flag that essentially filters out all unreliable source detections. Thus, we will require all the queries we make to have the clause clean = 1 to include only good source detections.

While the tables are large, and the flags can be difficult to navigate, one (amazing) aspect of the SDSS database is that they provide several convenience functions for the user, commands that begin dbo, in order to simplify the process of getting the data you need. One of the most useful functions, dbo.fGetNearbyObjEq, allows a quick cross-match to access sources within some small radius. (Click on the previous link, and read point 5 to see an example query with this function.)

Problem A1 Query the SDSS database (using astroquery within python) to find the colors of our favorite star, which is located at $\alpha_\mathrm{J2000}, \delta_\mathrm{J2000} = (312.503802, -0.706603)$. [You will have to complete the query below, note - the results are returned as an astropy table]

Note that the radius is in units of arcmin. Choose a radius that only returns only our favorite star.

Hint - you will want both the magnitude measurements and the uncertainties on those measurements.


In [ ]:
# complete the code below
favSTARquery = """SELECT p.objid, p.ra, p.dec, p.psfMag_u, # [MORE THINGS FOR YOU TO INSERT HERE] 
                  FROM PhotoTag AS p , dbo.fGetNearbyObjEq(# RA, DEC, RADIUS) AS n
                  WHERE n.objID = p.objID
               """
res = SDSS.query_sql(favSTARquery)
res

Notice that the IPython notebook has a nice interface for the table data and that our features are listed in a nice easy to read format. There are a couple nice features about astropy tables. One is that we can select any individual feature by its column name:

ra = res["ra"]

Furthermore, it's also possible to slice through the entire table using Python conditional arrays. For instance, suppose we only want the query results for sources that are bright in the $u$ band, we can access that in the following way:

brightU_cond = (res["psfMag_u"] <= 16)
brightU_dat = ts[brightU_cond]

There are plenty of additional conveniences for manipulating the data in an astropy table, but we shouldn't need them to complete this problem.

Problem A2 Using the results from the SDSS query, determine $u - g$ and $g - r$ colors of our favorite star.


In [ ]:
# determine the u - g and the g - r colors of our star
ug = 
gr =

Problem A3 Notice that the SDSS position and the PTF position do not match. Calculate the distance between these two sources to ensure that you have the correct star.

Hint - make sure the distance is in arcsec.


In [ ]:
import astropy.coordinates as coords

PTFcoords  = coords.SkyCoord(312.503802, -0.706603, frame='icrs',unit='deg')
SDSScoords = coords.SkyCoord(

dist = PTFcoords.separation(SDSScoords)

print('The angular separation is {:.2f} arcsec'.format(dist.arcsec))

We will discuss the full specifications for selecting RRL below. But, one of the criteria stated in Sesar et al. 2007 is that: $$ 0.98 < (u - g) < 1.30 $$

Problem A4 Based on this criterion, would you consider out favorite star an RRL candidate?

type your answer in this markdown cell

Part B) Correcting the colors

The initial query performed above allowed us to access SDSS data on a single star. However, the information we received was a little difficult to manage (it extends beyond the length of the notebook), it did not directly include the colors for the star (we had to calculate that in python), and, of course, it was only one star (we want to search for many RRL candidates, and it would be silly to do this one star at a time). We will now build some more complicated queries to fetch all the data that we need.

Unsurprising news flash - our favorite star is in fact an RRL. Is it strange that our star is outside the color range where RRL are typically found? [take a second to think about this]

Ultimately, no, this is not strange, because the above query neglects the effects of interstellar reddenning. The presence of dust in the ISM between the Earth and astronomically observed sources, causes the observed colors of a particular source to be different from the intrinsic colors of that source. Thus, to adequately compare different sources, reddening corrections must be applied. Fortunately, SDSS has already computed these corrections and they are stored in the extinction_u, extinction_g etc. columns in PhotoObjAll. Moving forward, we will correct all color measurements using these values.

Furthermore, SQL has a nice feature where the user can define the output of the query in terms of different columns in a given table. For instance, if we wanted to know the average airmass of a source at the time of SDSS observations (note that the filtered observations are not taken simultaneously; also note - this is a somewhat silly thing to want to know), we could query for a user defined variable in the following way:

SELECT (airmass_u + airmass_g + airmass_r + airmass_i + airmass_z)/5 as airmassAvg 
...

Thus, instead out grabbing five different columns, and reading that data into Python before taking the average, that operation can be entirely performed within SQL so that only the desired information is retrieved.

Problem B1 Starting from the example of your initial query, write an improved query that selects the de-reddened $u - g, g - r, r - i, i - z$ colors from SDSS for our favorite star.

Hint - be careful about how you apply the reddening correction, the additions and subtractions can be confusing.


In [ ]:
# complete the code below
dered_query = """SELECT p.objid, p.ra, p.dec, 
                        [MORE THINGS FOR YOU TO INSERT HERE] as dered_ug, 
                        [MORE THINGS FOR YOU TO INSERT HERE] as dered_gr,
                        [MORE THINGS FOR YOU TO INSERT HERE] 
                        [MORE THINGS FOR YOU TO INSERT HERE]
                  FROM PhotoTag AS p , dbo.fGetNearbyObjEq(# RA, DEC, RADIUS) AS n
                  WHERE n.objID = p.objID
               """
res = SDSS.query_sql(dered_query)
res

Problem B2 Using the results from the SDSS query, determine de-reddened colors, $(u - g)_0$ and $(g - r)_0$, colors of our favorite star.


In [ ]:
# determine the de-reddened colors
ug = 
gr =

Problem B3 Is $(u-g)_0$ now what you might expect for an RRL star?

Part C) Complex Queries: colors for all the stars

The real power in SQL databases comes from the ability to write complex queries. Previously, we have not needed a complex query since we have utilized the SDSS database function to identify the source that matches our favorite star. Ultimately, we are interested in finding RRL candidates over the entire field that we have been studying. Thus, we will need to query the entire area of our PTF field [Field ID: 022683, CCD ID: 06] for the colors of stars (i.e. galaxies must be excluded). To do this we need a complex query, which allows multiple conditional statements for the WHERE portion of query.

As a basic example of a complex query we could search for RRL candidates using the $u - g$ condition that was specified above (this example does not have the reddening correction):

SELECT p.objid, p.ra, p.dec, p.psfMag_u - p.psfMag_g as ug
FROM PhotoTag AS p
WHERE ug > 0.98 AND ug < 1.30

The use of AND after the WHERE statement is what makes this a complex query. More conditions can be added to the query with additional ANDs. The above query can be made far more efficient with the BETWEEN condition (note - this condition can be particularly useful when searching for sources via their positions):

SELECT p.objid, p.ra, p.dec, p.psfMag_u - p.psfMag_g as ug
FROM PhotoTag AS p
WHERE ug BETWEEN 0.98 AND 1.30

Both of the above examples return the same results but the latter query runs much faster.

Problem C1 Determine the boundaries of the PTF field, so we can fetch the SDSS colors of all the stars in this field. [Note - there are several ways you can go about this]


In [ ]:
# determine the boundaries of our PTF field
ra1 = 
ra2 = 
dec1 = 
dec2 =

Problem C2 Write a complex query that returns the de-reddened colors of all the stars in our PTF field. Note that we only want photometry for sources that have reliable measurements, so one of your conditional statements should include the clean column. We also only want sources that have PTF light curves (since we will eventually cross match PTF and SDSS), so limit your query to sources with $r \le 20.5 \; \mathrm{mag}$. There are many different ways to write this query (including an SDSS database function, or the examples given above), if you have extra time try to figure out which query is the fastest.

Hint 1 - how can you be sure that you are only selecting stars in your query?

Hint 2 - if your query has fewer than ~5k sources, or many more than ~10k sources, then something about your conditional statements is incorrect.


In [ ]:
# complete the code below
colors_query = """SELECT p.objid, p.ra, p.dec, 
                        [MORE THINGS FOR YOU TO INSERT HERE]  
                        .
                        .
                        .
                  FROM PhotoTag AS p 
                  WHERE [COND1] AND [COND2] AND ...
               """
colors_res = SDSS.query_sql(colors_query)
colors_res

Problem C3 Save the astropy Table from your query into a comma separated value (csv) file titled "../data/PTF_d022683_f02_c06_SDSS_colors.csv". We will need this file for the remainding hands-on sessions.


In [ ]:
colors_res.write(

Problem 2) Identify RRL candidates

Part A

Now that we have the colors for all the stars in our PTF field we are going to slice through the data to find RRL candidates based on their de-reddened colors. First, let's begin by plotting all these stars in color-color space to see their distributions in the different colors.

Problem A1 Plot the $(u - g)_0$ and $(g - r)_0$ color-color diagram (CC diagram).

Hint - it is extremely important that this, and all plots you make are readily legible.


In [ ]:
plt.scatter(   # Note, this plot will feature a lot of data 
               # consider using edgecolor = "None" and alpha = 0.05 in your call to plt.scatter
               # to make a (potentially) even nicer looking plot, consider using plt.hexbin 
plt.xlim(
plt.ylabel(

Part B

The $(u - g)_0$ and $(g - r)_0$ CC diagram that you just created forms the basis for many studies that have been conducted with SDSS data. The plot can be divided into roughly 3 different regions, and we will learn more about some of them in the coming days.

Earlier, we learned that RRL are retricted to the following portion of color space, $0.98 < (u − g) < 1.30$, but using that criterion alone will lead to a lot of false positives. A set of five color criteria were laid out in Ivezic et al. (2005) for the selection of RRL stars: $$ \begin{equation} 0.98 < (u − g)_0 < 1.30, \\ -0.05 < D_{ug} < 0.35, \\ 0.06 < D_{gr} < 0.55, \\ -0.15 < (r - i)_0 < 0.22, \\ -0.21 < (i - z)_0 < 0.25. \end{equation} $$

where, $D_{ug} = (u - g)_0 + 0.67(g - r)_0 - 1.07$ and $D_{gr} = 0.45(u - g)_0 - (g - r)_0 - 0.12$. Now that we know the criteria for selecting RRL candidates, we will determine how many are in our PTF field.

Problem B1 Determine the number of RRL candidates in our PTF field using the criteria listed above. Also, extract the RA and Dec for these candidates so you can later examine their light curves from the PTF data.

Hint - there a multiple ways to select candidates, including a complex query to the SDSS database. However, since you already have the necessary data in hand in color_res, slicing that table will be faster than writing a new SDSS query.


In [ ]:
# there is a lot for you to fill in here
D_ug =

RRLcand = 

print("There are {:d} RRL cands in this field".format( # complete

Now that we have identified RRL cands, we should figure out where they reside in the CC diagram.

Problem B2 Recreate the plot from problem A1 and overplot the location of the RRL cands using a different symbol and color to highlight their location in the CC diagram.


In [ ]:
plt.scatter( # complete

plt.scatter( # complete

plt.legend(fancybox = True)

Problem 3 -- Challenge

If you finish early, work on the following problem, or continue working on this as homework for this evening.

Challenge Problem For each of the RRL candidates identified from SDSS colors, measure the best-fit period for the corresponding PTF light curves of those sources. Plot the phase-folded light curves of the RRL candidates. After this exercise, how many (and which) of these sources do you now think are genuine RRL stars?


In [ ]:
# lots of gaps to fill for this problem
import 

source_mjds, source_mags, source_magerrs = source_lightcurve(

plt.scatter(

References