A First Look at the SDSS Photometric "Galaxy" Catalog

  • The Sloan Digital Sky Survey imaged over 10,000 square degrees of sky (about 25% of the total), automatically detecting, measuring and cataloging millions of "objects".
  • While the primary data products of the SDSS were (and still are) its spectroscopic surveys, the photometric survey provides an important testing ground for dealing with pure imaging surveys like the one being carried out by DES and the one planned with LSST.
  • Let's download part of the SDSS photometric object catalog and explore it.

SDSS Data Release 12 (DR12) is described on the SDSS-III website and in the survey paper by Alam et al. (2015).

We will use the SDSS DR12 SQL query interface. For help designing queries, the sample queries page is invaluable, and you will probably want to check out the links to the "schema browser" at some point as well. Notice the "check syntax only" button on the SQL query interface: this is very useful for debugging SQL queries.

Small test queries can be executed directly in the browser. Larger ones (those that involve more than a few tens of thousands of objects, or a lot of processing) should be submitted via the CasJobs system. Try the browser first, and move to CasJobs when you need to.


In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
import numpy as np
import SDSS
import pandas as pd
import matplotlib
%matplotlib inline

In [4]:
objects = "SELECT top 10000 \
ra, \
dec, \
dered_u as u, \
dered_g as g, \
dered_r as r, \
dered_i as i, \
petroR50_i AS size \
FROM PhotoObjAll \
WHERE \
((type = '3' OR type = '6') AND \
 ra > 185.0 AND ra < 185.2 AND \
 dec > 15.0 AND dec < 15.2)"
print(objects)


SELECT top 10000 ra, dec, dered_u as u, dered_g as g, dered_r as r, dered_i as i, petroR50_i AS size FROM PhotoObjAll WHERE ((type = '3' OR type = '6') AND  ra > 185.0 AND ra < 185.2 AND  dec > 15.0 AND dec < 15.2)
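
The same query can also be assembled from parameters, which makes it easier to move the sky patch or change the row limit. This is only a minimal sketch and not part of the course's SDSS module: `box_query` and its arguments are hypothetical names introduced here. It relies on the SDSS photometric classification, in which `type = 3` marks galaxies and `type = 6` marks stars.

```python
def box_query(ra_min, ra_max, dec_min, dec_max, limit=10000):
    """Build an SQL query for galaxies and stars in an RA/Dec box.

    Hypothetical helper for illustration; it only builds the query string.
    """
    columns = [
        "ra",
        "dec",
        "dered_u AS u",
        "dered_g AS g",
        "dered_r AS r",
        "dered_i AS i",
        "petroR50_i AS size",
    ]
    # type 3 = galaxy, type 6 = star in the SDSS photometric classification
    where = (
        "(type = 3 OR type = 6) AND "
        "ra > {0} AND ra < {1} AND "
        "dec > {2} AND dec < {3}"
    ).format(ra_min, ra_max, dec_min, dec_max)
    return "SELECT TOP {0} {1} FROM PhotoObjAll WHERE {2}".format(
        int(limit), ", ".join(columns), where)


# Same patch of sky as the hand-written query above.
query = box_query(185.0, 185.2, 15.0, 15.2)
print(query)
```

Pasting the printed string into the SQL query interface and pressing "check syntax only" is a quick way to validate it before running it.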

In [5]:
# Download data. This can take a while...
sdssdata = SDSS.select(objects)
sdssdata


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-38e651c78ad3> in <module>()
      1 # Download data. This can take a while...
----> 2 sdssdata = SDSS.select(objects)
      3 sdssdata

/home/adrian/Phys366_StatMeth/StatisticalMethods/examples/SDSScatalog/SDSS.py in select(sql)
      3 def select(sql):
      4     from StringIO import StringIO # To read a string like a file
----> 5     import mechanize
      6     url = "http://skyserver.sdss3.org/dr12/en/tools/search/sql.aspx"
      7     br = mechanize.Browser()

ImportError: No module named mechanize
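
The traceback above shows that `SDSS.select` imports the third-party `mechanize` package, which was not installed when this cell was run. A small check like the one below can be run before the download cell. It is a sketch only, and `have_mechanize` is a name introduced here, not something defined in the SDSS module.

```python
# Check for the optional dependency before attempting the download.
try:
    import mechanize  # third-party package imported inside SDSS.select
    have_mechanize = True
except ImportError:
    have_mechanize = False

if not have_mechanize:
    print("mechanize is missing; install it (for example with pip) "
          "and re-run the download cell.")
```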

Notice:

  • Some values are large and negative, indicating a problem with the automated measurement routine. We will need to deal with these.
  • Sizes are "effective radii" in arcseconds. The typical resolution ("point spread function" effective radius) in an SDSS image is around 0.7".
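
Both points can be checked with a few lines of pandas. The sketch below uses a small made-up table rather than the real download, and it assumes that failed measurements appear as a large negative sentinel (written here as -9999), so that any non-positive magnitude or size can be treated as bad. The names `toy`, `good` and `resolved` are introduced here for illustration.

```python
import pandas as pd

# Toy catalog; the values are invented for illustration only.
toy = pd.DataFrame({
    "u": [19.2, -9999.0, 21.5, 18.7],
    "g": [18.1, 20.3, -9999.0, 17.9],
    "r": [17.6, 19.8, 20.9, 17.5],
    "i": [17.3, 19.5, 20.6, 17.3],
    "size": [2.4, 0.5, -9999.0, 0.9],
})

# Assumption: bad measurements are large negative sentinels, so a row is
# good only if every magnitude and the size are positive.
columns = ["u", "g", "r", "i", "size"]
good = (toy[columns] > 0).all(axis=1)
print("bad rows: {0}".format(int((~good).sum())))

# Objects no larger than the ~0.7" PSF radius are effectively unresolved.
psf_radius = 0.7  # arcsec, the typical value quoted above
resolved = good & (toy["size"] > psf_radius)
print("resolved rows: {0}".format(int(resolved.sum())))
```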

Let's save this download for further use.


In [5]:
!mkdir -p downloads
sdssdata.to_csv("downloads/SDSSobjects.csv")

Visualizing Data in N-dimensions

This is, in general, difficult.

Looking at all possible 1- and 2-dimensional histograms and scatter plots helps a lot.

Color coding can bring in a 3rd dimension (and even a 4th). Interactive plots and movies are also well worth thinking about.
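
As a minimal illustration of the color-coding idea, the sketch below plots random points in two dimensions, uses color for a third quantity and marker size for a fourth. The data are synthetic and the variable names are made up for this example; it is not the SDSS catalog.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: x and y are plotted directly, z and w are encoded visually.
rng = np.random.RandomState(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = x + y                      # third dimension, shown as color
w = 10.0 + 40.0 * rng.rand(200)  # fourth dimension, shown as marker area

fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(x, y, c=z, s=w, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Marker size is much harder to read quantitatively than color, so it tends to work best for a quantity where only a rough ordering matters.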

Here we'll follow a multi-dimensional visualization example due to Josh Bloom at UC Berkeley:


In [6]:
# We'll use astronomical g-r color as the colorizer, and then plot
# position, magnitude, size and color against each other.

data = pd.read_csv("downloads/SDSSobjects.csv",
                   usecols=["ra", "dec", "u", "g", "r", "i", "size"])

# Filter out objects with bad magnitude or size measurements:
data = data[(data["u"] > 0) & (data["g"] > 0) & (data["r"] > 0) & (data["i"] > 0) & (data["size"] > 0)]

# Log size, and g-r color, will be more useful:
data['log_size'] = np.log10(data['size'])
data['g-r_color'] = data['g'] - data['r']

# Drop the things we're not so interested in:
del data['u'], data['g'], data['r'], data['size']

data.head()


Out[6]:
           ra        dec         i  log_size  g-r_color
0  185.039442  15.048475  17.07611  0.802124    6.00185
1  185.039745  15.056901  24.27987 -0.605933   -1.39018
2  185.115121  15.068760  20.76789 -0.121693    0.75215
3  185.039157  15.142600  17.06890  0.499455    1.05669
4  185.039157  15.142600  17.13940  0.472507    1.07595

In [7]:
# Get ready to plot:
pd.set_option('display.max_columns', None)
# !pip install --upgrade seaborn 
import seaborn as sns
sns.set()

In [8]:
def plot_everything(data, colorizer, vmin=0.0, vmax=10.0):
    # Truncate the color map to retain contrast between faint objects.
    norm = matplotlib.colors.Normalize(vmin=vmin, vmax=vmax)
    cmap = matplotlib.cm.jet
    m = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
    # scatter_matrix lives in pd.plotting in current pandas
    # (the top-level pd.scatter_matrix has been removed).
    pd.plotting.scatter_matrix(data, alpha=0.2, figsize=[15, 15],
                               color=m.to_rgba(data[colorizer]))

plot_everything(data,'g-r_color',vmin=-1.0, vmax=3.0)


Size-magnitude

Let's zoom in and look at the objects' (log) sizes and magnitudes.


In [9]:
zoom = data.copy()
del zoom['ra'],zoom['dec'],zoom['g-r_color']
plot_everything(zoom,'i',vmin=15.0, vmax=21.5)


Q: What features do you notice in this plot?

Talk to your neighbor for a minute or two about all the things that might be going on, and be ready to point things out to the class.

