In [1]:
%pylab inline
from pandas import read_csv
d = read_csv('NGC6181.txt', index_col=0) # made with SQL_NGC6181.txt; indexed by objID


Populating the interactive namespace from numpy and matplotlib

In [2]:
# How good were the photo-z's for the objects identified
# as satellites of NGC 6181?  I.e., how many sigmas away
# from z_host (where "sigma" is the photo-z error)?
objid_sat = [1237662698115433445, 1237662661610767569, 
             1237662698115432544, 1237662662147571761]
z_true = 0.007922 # z-spec of NGC 6181

# There are two flavors of SDSS photoz's: one based on a
# nearest neighbor algorithm (NN), and one based on a 
# random forest (RF) algorithm.
nsigma_nn = np.abs((d.zNN-z_true)/d.zNN_err)
nsigma_rf = np.abs((d.zRF-z_true)/d.zRF_err)

print('N_sigma_photozNN: ')
print(nsigma_nn.loc[objid_sat])
print(' ')
print('N_sigma_photozRF: ')
print(nsigma_rf.loc[objid_sat])


N_sigma_photozNN: 
1237662698115433445    0.716082
1237662661610767569    1.152024
1237662698115432544    1.552936
1237662662147571761    0.955990
dtype: float64
 
N_sigma_photozRF: 
1237662698115433445    0.536328
1237662661610767569    1.059515
1237662698115432544    1.768980
1237662662147571761    0.855470
dtype: float64

The takeaway is that all four SDSS photo-z's were within 1.8$\sigma$ of the host redshift. By what factor could we reduce the number of targets if we required both flavors of photo-z to be within some number of sigma of the host redshift?


In [3]:
nsigma = 3.0 # maximum (photoz-zHost)/photoz_error
r_max = 20.5 # maximum r mag
gr_max = 1.3 # maximum g-r color
ri_max = 0.7 # maximum r-i color

# Baseline cuts are (r mag, g-r color, r-i color)
wh_baseline = np.where((d.r<r_max) & 
                       ((d.g-d.r)<gr_max) & 
                       ((d.r-d.i)<ri_max))[0]

# New cuts are Baseline + nsigma_photoz.
wh_target = np.where((d.r<r_max) & 
                     ((d.g-d.r)<gr_max) & 
                     ((d.r-d.i)<ri_max) &
                     (nsigma_nn<nsigma) &
                     (nsigma_rf<nsigma))[0]

n_baseline = len(wh_baseline)
n_target = len(wh_target)
print('%i objects in baseline sample.' % n_baseline)
print('%i objects in new sample.' % n_target)
print('Reduction of %0.2f' % (1.*n_baseline/n_target))


2247 objects in baseline sample.
702 objects in new sample.
Reduction of 3.20

The reduction factor is even better if you keep only objects within 2.0$\sigma$ of the host redshift, and the reduction factors don't seem to depend much on the maximum $r$ magnitude of the sample:

| $r_{\rm{max}}$ | $\sigma_{\rm{photoZ}}$ cut | Reduction factor |
|----------------|----------------------------|------------------|
| 19.5           | 2.0                        | 6.6              |
| 20.5           | 2.0                        | 6.9              |
| 19.5           | 3.0                        | 3.4              |
| 20.5           | 3.0                        | 3.2              |
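
For the record, here's a minimal sketch of the grid scan behind this table, reusing the data table and photo-z sigmas computed above (the `reduction_factor` helper is my own naming, not something defined in the cells above):

def reduction_factor(r_max, nsigma, gr_max=1.3, ri_max=0.7):
    # Baseline cuts: r mag plus g-r and r-i colors.
    base = (d.r < r_max) & ((d.g - d.r) < gr_max) & ((d.r - d.i) < ri_max)
    # Require both photo-z flavors within nsigma of the host redshift.
    photoz = (nsigma_nn < nsigma) & (nsigma_rf < nsigma)
    return 1. * base.sum() / (base & photoz).sum()

for ns in (2.0, 3.0):
    for r_max in (19.5, 20.5):
        print('r_max=%0.1f, nsigma=%0.1f: reduction %0.1f'
              % (r_max, ns, reduction_factor(r_max, ns)))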

I'm sure there's a tradeoff here. If you cut on $\sigma_{\rm{photoZ}}$, you'll reduce the number of targets by a factor of 3–7 (and perhaps another factor of 2 if you also cut on SDSS's star/galaxy classification), but at the expense of completeness in the final satellite galaxy sample.
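
As a first look at that completeness cost, one can check whether the four known satellites themselves survive the combined cuts, reusing the variables defined above (the `passes` mask is my own naming; with only four objects this is a sanity check, not a completeness measurement):

# Do the known satellites of NGC 6181 survive baseline + photo-z cuts?
passes = ((d.r < r_max) &
          ((d.g - d.r) < gr_max) &
          ((d.r - d.i) < ri_max) &
          (nsigma_nn < nsigma) &
          (nsigma_rf < nsigma))
print(passes.loc[objid_sat])
print('%i of %i known satellites retained.'
      % (passes.loc[objid_sat].sum(), len(objid_sat)))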