A First Look at the SDSS Photometric "Galaxy" Catalog

  • The Sloan Digital Sky Survey imaged over 10,000 sq degrees of sky (about 25% of the total), automatically detecting, measuring and cataloging millions of "objects".
  • While the primary data products of the SDSS were (and still are) its spectroscopic surveys, the photometric survey provides an important testing ground for dealing with pure imaging surveys like the one being carried out by DES and the one planned for LSST.
  • Let's download part of the SDSS photometric object catalog and explore it.

SDSS Data Release 12 (DR12) is described at the SDSS-III website and in the survey paper by Alam et al. (2015).

We will use the SDSS DR12 SQL query interface. For help designing queries, the sample queries page is invaluable, and you will probably want to check out the links to the "schema browser" at some point as well. Notice the "check syntax only" button on the SQL query interface: this is very useful for debugging SQL queries.

Small test queries can be executed directly in the browser. Larger ones (involving more than a few tens of thousands of objects, or that involve a lot of processing) should be submitted via the CasJobs system. Try the browser first, and move to CasJobs when you need to.


In [10]:
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [11]:
import numpy as np
import SDSS
import pandas as pd
import matplotlib
%matplotlib inline

In [12]:
objects = "SELECT top 10000 \
ra, \
dec, \
type, \
dered_u as u, \
dered_g as g, \
dered_r as r, \
dered_i as i, \
petroR50_i AS size \
FROM PhotoObjAll \
WHERE \
((type = '3' OR type = '6') AND \
 ra > 185.0 AND ra < 185.2 AND \
 dec > 15.0 AND dec < 15.2)"
print(objects)


SELECT top 10000 ra, dec, type, dered_u as u, dered_g as g, dered_r as r, dered_i as i, petroR50_i AS size FROM PhotoObjAll WHERE ((type = '3' OR type = '6') AND  ra > 185.0 AND ra < 185.2 AND  dec > 15.0 AND dec < 15.2)

In [13]:
# Download data. This can take a while...
sdssdata = SDSS.select(objects)
sdssdata


Out[13]:
ra dec type u g r i size
0 185.104482 15.059340 3 23.08536 23.376790 21.880150 20.66780 0.939030
1 185.034948 15.026003 6 19.16464 17.983400 17.444010 17.23920 0.608007
2 185.034967 15.026010 6 19.13234 17.978250 17.439840 17.22829 18.455450
3 185.034238 15.031530 6 19.96964 17.426190 16.081630 15.49693 0.606615
4 185.034257 15.031537 3 19.91641 17.420710 16.078250 15.49137 28.067300
5 185.144354 15.147240 3 25.24765 23.857720 22.315320 21.73093 0.730428
6 185.034674 15.044998 6 24.15186 22.770340 23.263200 22.80923 0.745783
7 185.171483 15.025601 3 22.00554 21.736230 21.034760 20.77883 1.121808
8 185.171172 15.063561 3 21.99070 21.078450 20.052710 21.81134 8.163319
9 185.170966 15.036737 6 22.74667 22.695270 22.405020 22.14392 -9999.000000
10 185.034297 15.188846 3 22.80773 23.138470 22.342850 21.55695 0.969052
11 185.034279 15.160494 3 24.42778 22.248400 21.623540 20.82874 1.182909
12 185.034959 15.026000 6 19.12823 17.973030 17.477270 17.22804 0.687639
13 185.104500 15.076783 6 23.93621 22.860210 23.515700 24.05485 0.862570
14 185.171968 15.050929 6 23.81077 23.084750 21.480300 20.71171 0.817662
15 185.171937 15.106745 3 23.82541 22.374110 22.088520 21.06500 0.760733
16 185.034506 15.181244 3 25.29939 22.520600 21.909250 21.89507 1.083006
17 185.144182 15.147234 3 23.44359 23.459080 21.632500 20.72270 2.053757
18 185.143397 15.000424 6 24.82401 23.983580 22.895010 21.71220 0.652893
19 185.171506 15.022855 3 23.98929 21.790070 20.059560 19.38856 1.363000
20 185.035103 15.097675 6 24.40351 22.078730 20.681450 19.09828 0.728575
21 185.035227 15.013557 6 24.18616 21.348960 19.881640 19.07122 0.687393
22 185.034433 15.181255 3 23.01478 22.726590 21.674800 21.19698 1.505182
23 185.034579 15.143307 3 21.99415 22.539670 21.660930 21.22481 1.464107
24 185.171939 15.050963 6 24.77677 22.702960 21.354420 20.62455 0.634682
25 185.171920 15.123017 3 19.94064 19.160240 18.535020 28.88894 -9999.000000
26 185.171524 15.022868 3 24.23270 21.721910 20.140360 19.54562 0.893092
27 185.171512 15.143962 3 12.46776 9.559992 8.614470 11.87753 3.990055
28 185.171512 15.143965 3 12.46340 9.559937 8.614264 11.85494 4.122030
29 185.171531 15.143972 3 12.40582 9.567160 8.620600 11.95000 4.253851
... ... ... ... ... ... ... ... ...
2447 185.053431 15.080240 6 23.03788 22.003260 21.564660 21.48404 0.991410
2448 185.053262 15.142897 3 24.73906 22.183430 21.518830 21.20530 1.759358
2449 185.053262 15.142897 3 25.45567 22.183430 21.518830 21.20530 1.759358
2450 185.132380 15.030587 3 20.92007 19.872640 19.052540 18.67648 1.463706
2451 185.093784 15.124106 6 19.53112 18.607810 18.327750 18.21097 0.742270
2452 185.093791 15.124106 6 19.52049 18.604040 18.317890 18.20651 0.744036
2453 185.053830 15.053567 6 25.54181 25.539680 22.776880 23.92971 -9999.000000
2454 185.053629 15.054853 6 23.85557 24.279640 22.805660 24.65496 -9999.000000
2455 185.132359 15.070656 3 25.87104 25.011520 24.083430 21.20392 0.750912
2456 185.131641 15.110426 6 25.03502 25.180610 24.087230 21.87996 0.471935
2457 185.131779 15.171269 6 26.22074 24.613190 23.139790 20.99911 0.547384
2458 185.054072 15.125439 6 22.86715 21.701130 21.406760 21.35566 0.389780
2459 185.053397 15.080214 6 23.35620 22.437060 21.441580 21.25305 0.726784
2460 185.132465 15.070647 3 20.51569 21.211200 21.280830 19.77906 4.888654
2461 185.132743 15.037235 3 23.50838 24.739530 22.631350 21.44369 0.740849
2462 185.132121 15.198075 6 23.77910 24.560890 25.554020 21.65249 0.403492
2463 185.093343 15.159540 6 23.37502 23.226460 23.138480 24.12525 -9999.000000
2464 185.093248 15.058431 6 23.59816 22.951700 23.636430 23.89342 -9999.000000
2465 185.167454 15.063702 3 24.29908 22.406010 22.043480 25.14104 -9999.000000
2466 185.167270 15.066095 3 22.34697 21.961100 21.269050 20.92851 1.229776
2467 185.168235 15.187504 3 22.33530 -9999.000000 22.167600 21.86952 0.756736
2468 185.167954 15.003770 3 23.22597 23.113860 21.626360 21.15303 5.034228
2469 185.168224 15.187654 3 26.48117 23.781440 22.627780 20.83910 2.415193
2470 185.093748 15.124122 6 19.58636 18.642430 18.321820 18.22530 0.623525
2471 185.093769 15.124130 6 19.56442 18.637420 18.315640 18.21874 0.628939
2472 185.197521 15.039781 3 25.94781 23.185490 21.452710 20.46318 0.617621
2473 185.167863 15.003808 3 23.32225 23.381040 22.139900 21.61577 0.778119
2474 185.132727 15.037253 3 23.48824 24.641450 22.637780 21.55171 0.685787
2475 185.132123 15.198082 6 23.67965 24.402040 25.757230 21.70107 0.394497
2476 185.093558 15.006783 3 22.70092 23.113340 21.895820 21.30913 0.595542

2477 rows × 8 columns

Notice:

  • Some values are large and negative (-9999), indicating a problem with the automated measurement routine. We will need to deal with these.
  • Sizes are "effective radii" in arcseconds (here petroR50_i, the i-band Petrosian half-light radius). The typical resolution ("point spread function" effective radius) in an SDSS image is around 0.7".
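
As a rough illustration of the two points above, the sketch below marks the -9999 sentinel as missing and compares each size to the typical 0.7" PSF radius. The three rows are copied from the table above (rows 0, 9 and 2467); the names `sample`, `cleaned`, `PSF_RADIUS` and the `resolved` column are made up for this example, and treating -9999 as NaN is just one possible way of handling bad measurements.

```python
import numpy as np
import pandas as pd

# Three rows copied from the output table above (rows 0, 9 and 2467).
sample = pd.DataFrame({
    "g":    [23.376790, 22.695270, -9999.0],
    "i":    [20.66780, 22.14392, 21.86952],
    "size": [0.939030, -9999.0, 0.756736],
})

# Treat the -9999 sentinel as missing data rather than a real measurement.
cleaned = sample.replace(-9999.0, np.nan)

# Count the bad measurements in each column.
n_bad = cleaned.isna().sum()
print(n_bad)

# Assumed PSF effective radius in arcseconds, taken from the note above.
PSF_RADIUS = 0.7

# Flag objects that look larger than the PSF. Comparisons with NaN are
# False, so objects with a failed size measurement are not flagged.
cleaned["resolved"] = cleaned["size"] > PSF_RADIUS
print(cleaned)
```

Keeping the bad values as NaN (instead of dropping rows straight away) makes it easy to see how many objects each cut would remove before committing to it.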

Let's save this download for further use.


In [14]:
!mkdir -p downloads
sdssdata.to_csv("downloads/SDSSobjects.csv")

Visualizing Data in N-dimensions

This is, in general, difficult.

Looking at all possible 1- and 2-dimensional histograms and scatter plots helps a lot.

Color coding can bring in a 3rd dimension (and even a 4th). Interactive plots and movies are also well worth thinking about.
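
To make the idea of color coding concrete, here is a minimal sketch using synthetic data (not the SDSS catalog): a 2-D scatter plot where a third quantity sets the point color and its absolute value sets the marker size, giving a rough 4th dimension. The variables `x`, `y`, `z` and the choice of the "viridis" color map are assumptions made purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: two plotted coordinates and a third quantity to color by.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = x + y

fig, ax = plt.subplots(figsize=(5, 4))

# c= maps z onto the color map (3rd dimension); s= scales the marker
# area with |z| (a crude 4th dimension).
points = ax.scatter(x, y, c=z, s=20 + 10 * np.abs(z),
                    cmap="viridis", alpha=0.7)

# The colorbar tells the reader how to decode the color.
fig.colorbar(points, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

A colorbar (or legend) is essential here: without it, the extra dimension is visible but not readable.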

Here we'll follow a multi-dimensional visualization example due to Josh Bloom at UC Berkeley:


In [15]:
# We'll use astronomical g-r color as the colorizer, and then plot
# position, magnitude, size and color against each other.

data = pd.read_csv("downloads/SDSSobjects.csv",
                   usecols=["ra", "dec", "u", "g", "r", "i", "size"])

# Filter out objects with bad magnitude or size measurements:
data = data[(data["u"] > 0) & (data["g"] > 0) & (data["r"] > 0) & (data["i"] > 0) & (data["size"] > 0)]

# Log size, and g-r color, will be more useful:
data['log_size'] = np.log10(data['size'])
data['g-r_color'] = data['g'] - data['r']

# Drop the things we're not so interested in:
del data['u'], data['g'], data['r'], data['size']

data.head()


Out[15]:
ra dec i log_size g-r_color
0 185.104482 15.059340 20.66780 -0.027320 1.49664
1 185.034948 15.026003 17.23920 -0.216092 0.53939
2 185.034967 15.026010 17.22829 1.266125 0.53841
3 185.034238 15.031530 15.49693 -0.217086 1.34456
4 185.034257 15.031537 15.49137 1.448201 1.34246

In [7]:
# Get ready to plot:
pd.set_option('display.max_columns', None)
# !pip install --upgrade seaborn 
import seaborn as sns
sns.set()

In [8]:
def plot_everything(data,colorizer,vmin=0.0,vmax=10.0):
    # Truncate the color map to [vmin, vmax] to retain contrast between faint objects.
    norm = matplotlib.colors.Normalize(vmin=vmin, vmax=vmax)
    cmap = matplotlib.cm.jet
    m = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)
    # Color each point by its value in the colorizer column.
    plot = pd.plotting.scatter_matrix(data, alpha=0.2,figsize=[15,15],color=m.to_rgba(data[colorizer]))
    return

plot_everything(data,'g-r_color',vmin=-1.0, vmax=3.0)


Size-magnitude

Let's zoom in and look at the objects' (log) sizes and magnitudes.


In [9]:
zoom = data.copy()
del zoom['ra'],zoom['dec'],zoom['g-r_color']
plot_everything(zoom,'i',vmin=15.0, vmax=21.5)


Q: What features do you notice in this plot?

Talk to your neighbor for a minute or two about all the things that might be going on, and be ready to point things out to the class.

