A First Look at the SDSS Photometric "Galaxy" Catalog

  • The Sloan Digital Sky Survey imaged over 10,000 sq degrees of sky (about 25% of the total), automatically detecting, measuring and cataloging millions of "objects".
  • While the primary data products of the SDSS were (and still are) its spectroscopic surveys, the photometric survey provides an important testing ground for dealing with pure imaging surveys, like the one being carried out by DES and the one planned with LSST.
  • Let's download part of the SDSS photometric object catalog and explore it.

SDSS Data Release 12 (DR12) is described at the SDSS-III website and in the survey paper by Alam et al. (2015).

We will use the SDSS DR12 SQL query interface. For help designing queries, the sample queries page is invaluable, and you will probably want to check out the links to the "schema browser" at some point as well. Notice the "check syntax only" button on the SQL query interface: this is very useful for debugging SQL queries.

Small test queries can be executed directly in the browser. Larger ones (involving more than a few tens of thousands of objects, or that involve a lot of processing) should be submitted via the CasJobs system. Try the browser first, and move to CasJobs when you need to.
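
For readers curious about what a helper like SDSS.select might do behind the scenes, here is a minimal sketch of submitting a small query over HTTP. The endpoint URL and the `cmd` and `format` parameter names are assumptions about the DR12 SkyServer interface, not taken from this notebook, and the function names are hypothetical. Check them against the SQL query page before relying on this.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: DR12 SkyServer SQL search endpoint; verify against the SDSS site.
SKYSERVER_URL = "http://skyserver.sdss.org/dr12/en/tools/search/x_sql.aspx"


def build_query_url(sql):
    """Return a GET URL asking the (assumed) endpoint for CSV output."""
    # ASSUMPTION: the service reads the query from 'cmd' and the
    # output type from 'format'.
    return SKYSERVER_URL + "?" + urlencode({"cmd": sql, "format": "csv"})


def fetch_csv_text(sql, timeout=60):
    """Run a small query and return the raw response text (needs network)."""
    with urlopen(build_query_url(sql), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access, so it is safe to run anywhere.
url = build_query_url("SELECT TOP 5 ra, dec FROM PhotoObjAll")
print(url)
```

Calling fetch_csv_text(...) with the same query would return the table as text, which could then be parsed with pandas. For anything larger than a quick test, CasJobs remains the better route.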


In [10]:
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [11]:
from __future__ import print_function
import numpy as np
import SDSS
import pandas as pd
import matplotlib
%matplotlib inline

In [12]:
objects = "SELECT top 10000 \
ra, \
dec, \
type, \
dered_u as u, \
dered_g as g, \
dered_r as r, \
dered_i as i, \
petroR50_i AS size \
FROM PhotoObjAll \
WHERE \
((type = '3' OR type = '6') AND \
 ra > 185.0 AND ra < 185.2 AND \
 dec > 15.0 AND dec < 15.2)"
print(objects)


SELECT top 10000 ra, dec, type, dered_u as u, dered_g as g, dered_r as r, dered_i as i, petroR50_i AS size FROM PhotoObjAll WHERE ((type = '3' OR type = '6') AND  ra > 185.0 AND ra < 185.2 AND  dec > 15.0 AND dec < 15.2)

In [13]:
# Download data. This can take a while...
sdssdata = SDSS.select(objects)
sdssdata


Out[13]:
ra dec type u g r i size
0 185.104500 15.076783 6 23.93621 22.860210 23.515700 24.054850 0.862570
1 185.103931 15.009721 3 24.29502 21.258540 20.069270 19.503720 1.833879
2 185.104045 15.013108 6 23.02020 22.781240 24.082850 22.922130 -9999.000000
3 185.029622 15.000569 3 23.13610 23.839410 22.034750 21.218000 1.158018
4 185.030353 15.037621 3 23.10468 23.252300 22.236040 22.151280 -9999.000000
5 185.069606 15.108364 3 25.64520 23.091820 22.321930 21.988100 0.953011
6 185.070180 15.168047 6 26.16938 25.131100 22.714440 25.725830 -9999.000000
7 185.104247 15.049747 6 25.41220 23.512220 22.668890 22.710130 0.769923
8 185.104513 15.059163 3 25.38387 23.326820 21.724700 20.640760 1.123005
9 185.169109 15.141563 6 24.36791 24.995240 24.715620 18.083400 0.783298
10 185.168937 15.150393 6 24.60732 23.056880 25.174350 24.297150 -9999.000000
11 185.168867 15.156847 3 22.89884 24.242060 21.775570 22.643720 0.764053
12 185.142776 15.138419 6 24.11364 24.847180 22.584850 23.668150 0.235861
13 185.142390 15.196530 6 22.71120 20.009290 18.503360 17.695920 0.785834
14 185.142380 15.196531 6 23.05053 20.002360 18.500390 17.700720 0.780532
15 185.029565 15.072514 6 20.36992 19.007290 18.395420 18.203240 14.138170
16 185.029546 15.072507 6 20.44737 19.038330 18.433300 18.257150 0.568954
17 185.142599 15.128546 3 23.33476 23.258080 22.255700 22.264410 1.205970
18 185.143181 15.137649 6 24.46177 24.336100 22.813440 23.488260 0.133696
19 185.142388 15.196555 6 22.16118 19.980980 18.491460 17.708610 0.598621
20 185.142369 15.196548 6 22.34409 19.969700 18.490170 17.699830 0.610676
21 185.142987 15.067263 6 21.53541 20.837600 20.393400 20.368400 4.825226
22 185.142990 15.067267 6 21.56060 20.913500 20.496940 20.471370 0.637678
23 185.103930 15.009731 3 21.76995 21.087640 19.851200 19.248380 1.100747
24 185.103930 15.009731 3 21.95971 21.087640 19.851200 19.248380 1.100747
25 185.142354 15.110229 3 24.24521 23.745400 21.645410 20.798560 0.817345
26 185.143063 15.109992 6 25.41504 23.535110 21.786360 21.801430 0.598746
27 185.104482 15.059340 3 23.08536 23.376790 21.880150 20.667800 0.939030
28 185.030229 15.012726 6 25.83741 23.977870 25.528600 24.844340 -9999.000000
29 185.168604 15.187115 3 21.90811 21.996150 22.141640 21.242390 5.406465
... ... ... ... ... ... ... ... ...
2447 185.084091 15.077797 3 21.56499 21.144660 20.863570 20.390990 1.073938
2448 185.083516 15.014812 6 22.84028 22.515410 21.779100 21.334210 0.804840
2449 185.083215 15.146584 3 22.18934 21.291250 19.835430 19.260870 1.006691
2450 185.171958 15.123057 6 22.62804 22.221490 22.016660 21.875100 0.289114
2451 185.171492 15.143947 3 12.65860 9.581748 8.614206 8.332913 4.384937
2452 185.171488 15.143942 3 12.63452 16.098600 11.595090 15.277470 4.456294
2453 185.171509 15.143950 3 12.61089 9.609232 8.643597 8.344616 4.510882
2454 185.171509 15.143950 3 12.60215 9.592584 8.629543 8.335053 4.518852
2455 185.115018 15.078589 6 22.48607 21.866440 22.104810 23.357950 0.029568
2456 185.084008 15.065666 6 20.62765 19.751300 19.454200 19.330710 0.614099
2457 185.083686 15.183006 6 21.91978 19.251690 18.135040 17.681740 0.628691
2458 185.083706 15.183014 6 21.72219 19.247880 18.133570 17.678110 0.631972
2459 185.010262 15.100717 6 21.81659 24.488180 24.018200 24.065830 0.384759
2460 185.083089 15.144409 6 24.21687 24.650300 22.695870 25.091690 0.086733
2461 185.048294 15.099660 3 22.46678 20.639790 19.501630 19.051940 1.708054
2462 185.048294 15.099660 3 22.34695 20.639790 19.501630 19.051940 1.708054
2463 185.047432 15.114498 6 23.57134 22.139150 21.834050 21.755760 0.524460
2464 185.084046 15.065663 6 20.58062 19.709460 19.445490 19.348670 0.793510
2465 185.084053 15.065663 6 20.51167 19.690000 19.421420 19.326190 8.149798
2466 185.083708 15.183003 6 21.77857 19.286630 18.138770 17.673620 0.732809
2467 185.083718 15.183003 6 21.85649 19.285770 18.135460 17.674800 0.731617
2468 185.171483 15.025601 3 22.00554 21.736230 21.034760 20.778830 1.121808
2469 185.146734 15.118959 6 23.72149 21.682940 20.642210 20.254400 0.769458
2470 185.146566 15.124328 3 23.30293 22.005350 21.417640 21.466960 -9999.000000
2471 185.146150 15.199498 3 22.66912 22.496380 21.608710 21.084070 1.032514
2472 185.171968 15.050929 6 23.81077 23.084750 21.480300 20.711710 0.817662
2473 185.171937 15.106745 3 23.82541 22.374110 22.088520 21.065000 0.760733
2474 185.171506 15.022855 3 23.98929 21.790070 20.059560 19.388560 1.363000
2475 185.171172 15.063561 3 21.99070 21.078450 20.052710 21.811340 8.163319
2476 185.170966 15.036737 6 22.74667 22.695270 22.405020 22.143920 -9999.000000

2477 rows × 8 columns

Notice:

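  • Some values (here, several of the sizes) are -9999. This is a placeholder indicating a problem with the automated measurement routine, not a real measurement. We will need to deal with these.
  • Sizes are "effective radii" in arcseconds: petroR50_i is the radius containing half of the Petrosian flux in the i band. The typical resolution ("point spread function" effective radius) in an SDSS image is around 0.7".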
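
As a quick illustration of the first point, the sketch below counts the failed size measurements and marks them as missing. It uses five rows copied from the output above instead of the full download, and the names `sample` and `BAD_VALUE` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rows 0-4 of the query output above (ra, dec, u, g and r omitted for brevity).
sample = pd.DataFrame({
    "type": [6, 3, 6, 3, 3],
    "i": [24.054850, 19.503720, 22.922130, 21.218000, 22.151280],
    "size": [0.862570, 1.833879, -9999.0, 1.158018, -9999.0],
})

BAD_VALUE = -9999.0  # placeholder value seen in the size column above

# Count the failed measurements...
n_bad = int((sample["size"] == BAD_VALUE).sum())

# ...and replace them with NaN, so they cannot be mistaken for real sizes.
cleaned = sample.replace(BAD_VALUE, np.nan)

print(n_bad, "of", len(sample), "sizes failed")
print(cleaned)
```

Marking bad values as NaN keeps the rows available for quantities that were measured successfully. Dropping the rows outright with a cut such as size > 0 is simpler when every column is needed.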

Let's save this download for further use.


In [14]:
!mkdir -p downloads
sdssdata.to_csv("downloads/SDSSobjects.csv")
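
One detail worth knowing about this step: by default, to_csv also writes the DataFrame's row index as an unnamed first column. The sketch below shows the difference on a two-row toy table written to a temporary directory. The file names are made up for the demonstration.

```python
import os
import tempfile

import pandas as pd

# Two positions copied from the query output above.
df = pd.DataFrame({"ra": [185.104500, 185.103931],
                   "dec": [15.076783, 15.009721]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    df.to_csv(with_index)                   # default: index is written too
    df.to_csv(without_index, index=False)   # index is left out

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)
print(cols_without)
```

Either choice works, as long as the file is read back consistently: select the wanted columns with usecols, or write the file with index=False in the first place.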

Visualizing Data in N-dimensions

This is, in general, difficult.

Looking at all possible 1- and 2-dimensional histograms and scatter plots helps a lot.

Color coding can bring in a 3rd dimension (and even a 4th). Interactive plots and movies are also well worth thinking about.
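
To make the color-coding idea concrete, here is a minimal sketch that shows four quantities in a single 2-D scatter plot: two as position, one as color and one as marker size. The data are random numbers generated purely for illustration, not SDSS measurements, and all variable names are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
third = x + y                          # shown as color
fourth = rng.uniform(10, 100, 200)     # shown as marker area

fig, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(x, y, c=third, s=fourth, cmap="viridis", alpha=0.6)
fig.colorbar(points, ax=ax, label="third quantity")
ax.set_xlabel("first quantity")
ax.set_ylabel("second quantity")
fig.savefig("color_coded_scatter.png")
```

Color tends to read more reliably than marker size, so it makes sense to put the more important of the two extra quantities on the color axis.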

Here we'll follow a multi-dimensional visualization example due to Josh Bloom at UC Berkeley:


In [15]:
# We'll use astronomical g-r color as the colorizer, and then plot
# position, magnitude, size and color against each other.

data = pd.read_csv("downloads/SDSSobjects.csv",
                   usecols=["ra", "dec", "u", "g", "r", "i", "size"])

# Filter out objects with bad magnitude or size measurements:
data = data[(data["u"] > 0) & (data["g"] > 0) & (data["r"] > 0) & (data["i"] > 0) & (data["size"] > 0)]

# Log size, and g-r color, will be more useful:
data['log_size'] = np.log10(data['size'])
data['g-r_color'] = data['g'] - data['r']

# Drop the things we're not so interested in:
del data['u'], data['g'], data['r'], data['size']

data.head()


Out[15]:
ra dec i log_size g-r_color
0 185.104500 15.076783 24.05485 -0.064206 -0.65549
1 185.103931 15.009721 19.50372 0.263371 1.18927
3 185.029622 15.000569 21.21800 0.063715 1.80466
5 185.069606 15.108364 21.98810 -0.020902 0.76989
7 185.104247 15.049747 22.71013 -0.113552 0.84333

In [16]:
# Get ready to plot:
pd.set_option('display.max_columns', None)
# !pip install --upgrade seaborn 
import seaborn as sns
sns.set()

In [17]:
def plot_everything(data,colorizer,vmin=0.0,vmax=10.0):
    # Truncate the color map to retain contrast between faint objects.
    norm = matplotlib.colors.Normalize(vmin=vmin, vmax=vmax)
    cmap = matplotlib.cm.jet
    m = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
    # Newer pandas releases provide scatter_matrix under pd.plotting
    # (the top-level pd.scatter_matrix alias has been removed).
    plot = pd.plotting.scatter_matrix(data, alpha=0.2, figsize=[15,15],
                                      color=m.to_rgba(data[colorizer]))
    return

plot_everything(data,'g-r_color',vmin=-1.0, vmax=3.0)


Size-magnitude

Let's zoom in and look at the objects' (log) sizes and magnitudes.


In [18]:
zoom = data.copy()
del zoom['ra'],zoom['dec'],zoom['g-r_color']
plot_everything(zoom,'i',vmin=15.0, vmax=21.5)


Q: What features do you notice in this plot?

Talk to your neighbor for a minute or two about all the things that might be going on, and be ready to point things out to the class.