Fit binned-floor acquisition probability model in 2019-08

This is the acquisition probability model calculated in 2019-08.

It is NOT promoted to flight due to lack of change from 2018-11.

Copied from the 2018-11 notebook, modified slightly and re-run.

Changes

  • Factored out the date and model name to single values at the top.
  • Set the data start time to a fixed 4.5 years before the end time.
  • Fixed the calculation of probability confidence intervals in the comparison to flight data at 120 arcsec box size.
  • Added another temperature bin in the flight comparison plot to show higher temperature data.
  • Used chandra_aca.star_probs to compute the new model instead of a local replica of that code.

Key features of the model for color != 1.5 stars (with good mag estimates)

  • Incorporates ASVT data for t_ccd >= -10 C in order to provide reasonable estimates of probability in regimes with little or no flight data.
  • Fits a quadratic model for p_fail (probit) as a function of t_ccd in a series of magnitude bins (see the sketch after this list). The mag bins are driven by the magnitudes of the ASVT simulated data.
  • Model now includes a floor parameter that sets a hard lower limit on p_fail. This is seen in the data and represents other factors, independent of t_ccd, that cause acquisition failure. In other words, even for an arbitrarily cold CCD there will still be a small fraction of acquisition failures. For flight data this can include spoilers or an ionizing radiation flag.
  • As in past models, the p_fail model is adjusted by a box_delta term which applies a search-box dependent offset in probit space. The box_delta term is defined to have a value of 0.0 for box halfwidth = 120.
  • The global model (for arbitrary mag) is computed by linearly interpolating the binned quadratic coefficients as a function of mag. The previous flight model (spline) did a global mag - t_ccd fit using a 5-element spline in the mag direction.
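
The core of the binned model is a quadratic in scaled t_ccd in probit space, offset by the box_delta term and clipped below at the fitted floor. The minimal sketch below mirrors the p_fail function defined later in this notebook; the parameter values are illustrative only, not fitted values.

import numpy as np
from scipy import stats

def p_fail_sketch(pars, t_ccd, box_delta=0.0):
    """Probit-space quadratic in scaled t_ccd, clipped below at the floor."""
    p0, p1, p2, floor = pars
    tc = (t_ccd + 8.0) / 8.0  # rescale t_ccd to roughly -1 to 1
    probit_p_fail = np.clip(p0 + p1 * tc + p2 * tc ** 2 + box_delta, floor, 10)
    return stats.norm.cdf(probit_p_fail)

# For an arbitrarily cold CCD the failure probability bottoms out at the floor
print(p_fail_sketch([-2.5, 2.4, -0.7, -2.1], t_ccd=np.array([-30.0, -12.0, -2.0])))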

Key features of the model for color == 1.5 stars (with poor mag estimates)

  • Post AGASC 1.7, there is inadequate data to independently perform the binned fitting.
  • Instead assume a magnitude error distribution informed by examining the observed distribution of dmag = mag_obs - mag_aca (observed - catalog). This turns out to be well-represented by an exp(-abs(dmag) / dmag_scale) distribution, in contrast to a gaussian, which falls off as exp(-dmag^2).
  • Use the assumed mag error distribution to sample the color != 1.5 star probabilities and compute the weighted mean failure probability (as sketched after this list).
  • Flight data show a steeper falloff for dmag > 0 (stars observed to be fainter than expected) than for dmag < 0. As noted by JC, this likely reflects a survival effect: stars that are actually much fainter don't get acquired and so do not enter the sample. Indeed, using the observed distribution gives a poor fit to flight data, so dmag_scale for dmag > 0 was arbitrarily increased from 2.8 to 4.0 to better fit flight data.
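
A minimal sketch of this weighting, assuming the two-sided exponential distribution described above with dmag_scale = 2.8 for dmag < 0 and the ad-hoc 4.0 for dmag > 0; it mirrors calc_1p5_mag_err_weights defined later in this notebook.

import numpy as np

dmags = np.linspace(-2.8, 4.0, 18)
log_wt = 3.8 * (1 - np.abs(dmags) / np.where(dmags > 0, 4.0, 2.8))
weights = 10 ** log_wt
weights /= weights.sum()

# The color = 1.5 failure probability is the weighted mean over assumed mag errors:
#   p_fail_1p5(mag, t_ccd) = sum_i weights[i] * p_fail(mag + dmags[i], t_ccd)
print(weights.round(4))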

Model details

  • In order to get a good match to flight data for faint stars near -11 C, it was necessary to apply an ad-hoc correction to ASVT data for mag > 10.1. The correction effectively made the model assume smaller search box sizes, so for the canonical 120 arcsec box the model p_fail is slightly increased relative to the raw failure rate from ASVT.
  • The mag = 8.0 data from ASVT show a dependence on search-box size that is flipped from the usual behavior: there are more failures for smaller search boxes, though we are dealing with small-number statistics (up to 3 fails per bin). This caused problems in the fitting, so for this bin the box_delta term was simply zeroed out, and a good fit was then obtained in the automatic fit process. Since p_fail is quite low in all cases this has little practical impact either way.
  • Fitting now uses binomial statistics to compute the fit statistic during model parameter optimization (a minimal sketch follows this list). Previously it used a poisson statistic, which is similar except near 1.0. The poisson statistic is built into Sherpa and was easier, but the new binomial statistic is formally correct and behaves better for probabilities near 1.0.
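
A minimal sketch of the per-attempt binomial (Bernoulli) negative log-likelihood described above; it mirrors calc_binom_stat defined later in this notebook. Here fail is 0.0 or 1.0 for each acquisition attempt and model is the corresponding model p_fail.

import numpy as np

def binom_neg_loglike(fail, model):
    # -log L = -sum of log(p_fail) for failures and log(1 - p_fail) for successes
    return -np.sum(np.log(np.where(fail.astype(bool), model, 1.0 - model)))

fail = np.array([0.0, 1.0, 0.0, 0.0])
model = np.array([0.10, 0.80, 0.30, 0.95])
print(binom_neg_loglike(fail, model))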

Paradigm shift for production model implementation

  • This model is complicated, and the color = 1.5 star case is computationally intensive.
  • Instead of transferring the analytic algorithm and fit values into chandra_aca.star_probs for production use, take a new approach of generating a 3-d grid of p_fail (in probit space) as a function of mag, t_ccd, and halfwidth (see the sketch after this list). Do this for both color != 1.5 and color = 1.5.
  • The ranges are 5.0 <= mag <= 12.0, -16 <= t_ccd <= -1, and 60 <= halfwidth <= 180. Values outside those ranges are clipped.
  • This separates the model generation from the production model calculation.
  • Gridded 3-d linear interpolation is used in chandra_aca and is quite fast.
  • The gridded value files are about 150 kB and make it easy to generate new models without changing code in chandra_aca (except for a hard-coded value for the default model).
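
A minimal sketch of this paradigm is below. The probit_p_fail function is a hypothetical stand-in for the fitted analytic model, and scipy's RegularGridInterpolator is used purely for illustration; chandra_aca does the equivalent gridded linear interpolation with its own code.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

mags = np.linspace(5.0, 12.0, 141)     # 0.05 mag spacing
t_ccds = np.linspace(-16.0, -1.0, 31)  # 0.5 degC spacing
halfws = np.linspace(60.0, 180.0, 7)   # 20 arcsec spacing

def probit_p_fail(mag, t_ccd, halfwidth):
    # Hypothetical stand-in for the fitted analytic model
    return 0.5 * (mag - 10.0) + 0.2 * (t_ccd + 8.0) + 0.001 * (halfwidth - 120.0)

# Model generation: evaluate the analytic model once on the 3-d grid
mag_g, t_ccd_g, halfw_g = np.meshgrid(mags, t_ccds, halfws, indexing='ij')
grid = probit_p_fail(mag_g, t_ccd_g, halfw_g).astype(np.float32)

# Production-side lookup: clip inputs to the grid range, then interpolate linearly
interp = RegularGridInterpolator((mags, t_ccds, halfws), grid)
point = [np.clip(10.3, 5.0, 12.0), np.clip(-11.2, -16.0, -1.0), np.clip(200, 60, 180)]
print(interp(point))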

In [1]:
import sys
import os
from itertools import count
from pathlib import Path

# Include utils.py for asvt_utils
sys.path.insert(0, str(Path(os.environ['HOME'], 'git', 'skanb', 'pea-test-set')))
import utils as asvt_utils

import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table, vstack
from astropy.time import Time
import tables
from scipy import stats
from scipy.interpolate import CubicSpline
from Chandra.Time import DateTime
from chandra_aca.star_probs import (get_box_delta, broadcast_arrays, 
                                    acq_success_prob, grid_model_acq_prob)
from chandra_aca import star_probs

%matplotlib inline

In [2]:
MODEL_DATE = '2019-08'
MODEL_NAME = f'grid-floor-{MODEL_DATE}'

In [3]:
np.random.seed(0)

In [4]:
SKA = Path(os.environ['SKA'])

Get acq stats data and clean


In [5]:
# Make a map of AGASC_ID to AGASC 1.7 MAG_ACA.  The acq_stats.h5 file has whatever MAG_ACA
# was in place at the time of planning the loads.
# Define new term `red_mag_err` which is used here in place of the 
# traditional COLOR1 == 1.5 test.
with tables.open_file(str(SKA / 'data' / 'agasc' / 'miniagasc_1p7.h5'), 'r') as h5:
    agasc_mag_aca = h5.root.data.col('MAG_ACA')
    agasc_id = h5.root.data.col('AGASC_ID')
    has_color3 = h5.root.data.col('RSV3') != 0  # Nonzero RSV3 indicates a COLOR3 mag estimate is available
    red_star = np.isclose(h5.root.data.col('COLOR1'), 1.5)
    mag_aca_err = h5.root.data.col('MAG_ACA_ERR') / 100
    red_mag_err = red_star & ~has_color3  # MAG_ACA, MAG_ACA_ERR is potentially inaccurate

In [6]:
agasc1p7_idx = {id: idx for id, idx in zip(agasc_id, count())}
agasc1p7 = Table([agasc_mag_aca, mag_aca_err, red_mag_err], 
                 names=['mag_aca', 'mag_aca_err', 'red_mag_err'], copy=False)

In [7]:
acq_file = str(SKA / 'data' / 'acq_stats' / 'acq_stats.h5')
with tables.open_file(str(acq_file), 'r') as h5:
    cols = h5.root.data.cols
    names = {'tstart': 'guide_tstart',
             'obsid': 'obsid',
             'obc_id': 'acqid',
             'halfwidth': 'halfw',
             'warm_pix': 'n100_warm_frac',
             'mag_aca': 'mag_aca',
             'mag_obs': 'mean_trak_mag',
             'known_bad': 'known_bad',
             'color': 'color1',
            'img_func': 'img_func', 
            'ion_rad': 'ion_rad',
            'sat_pix': 'sat_pix',
             'agasc_id': 'agasc_id',
             't_ccd': 'ccd_temp',
            'slot': 'slot'}
    acqs = Table([getattr(cols, h5_name)[:] for h5_name in names.values()],
                 names=list(names.keys()))

In [8]:
year_q0 = 1999.0 + 31. / 365.25  # Jan 31 approximately
acqs['year'] = Time(acqs['tstart'], format='cxcsec').decimalyear.astype('f4')
acqs['quarter'] = (np.trunc((acqs['year'] - year_q0) * 4)).astype('f4')

In [9]:
# Create 'fail' column, rewriting history as if the OBC always
# ignored the MS flag in ID'ing acq stars.
#
# CHECK: is ion_rad being ignored on-board?
# Answer: Not as of 2019-09
#
obc_id = acqs['obc_id']
obc_id_no_ms = (acqs['img_func'] == 'star') & ~acqs['sat_pix'] & ~acqs['ion_rad']
acqs['fail'] = np.where(obc_id | obc_id_no_ms, 0.0, 1.0)

# Re-map acq_stats database magnitudes for AGASC 1.7
acqs['mag_aca'] = [agasc1p7['mag_aca'][agasc1p7_idx[agasc_id]] for agasc_id in acqs['agasc_id']]
acqs['red_mag_err'] = [agasc1p7['red_mag_err'][agasc1p7_idx[agasc_id]] for agasc_id in acqs['agasc_id']]
acqs['mag_aca_err'] = [agasc1p7['mag_aca_err'][agasc1p7_idx[agasc_id]] for agasc_id in acqs['agasc_id']]

# Add a flag to distinguish flight from ASVT data
acqs['asvt'] = False

In [10]:
# Filter for year and mag
#
year_max = Time(f'{MODEL_DATE}-01').decimalyear
year_min = year_max - 4.5
acq_ok = ((acqs['year'] > year_min) & (acqs['year'] < year_max) & 
          (acqs['mag_aca'] > 7.0) & (acqs['mag_aca'] < 11) &
          (~np.isclose(acqs['color'], 0.7)))

In [11]:
# Filter known bad obsids.  NOTE: this is no longer doing anything, but
# consider updating the list of known bad obsids or obtaining it programmatically?

print('Filtering known bad obsids, start len = {}'.format(np.count_nonzero(acq_ok)))
bad_obsids = [
    # Venus
    2411,2414,6395,7306,7307,7308,7309,7311,7312,7313,7314,7315,7317,7318,7406,583,
    7310,9741,9742,9743,9744,9745,9746,9747,9749,9752,9753,9748,7316,15292,16499,
    16500,16501,16503,16504,16505,16506,16502,
    ]
for badid in bad_obsids:
    acq_ok = acq_ok & (acqs['obsid'] != badid)
print('Filtering known bad obsids, end len = {}'.format(np.count_nonzero(acq_ok)))


Filtering known bad obsids, start len = 62138
Filtering known bad obsids, end len = 62138

Get ASVT data and make it look more like acq stats data


In [12]:
peas = Table.read('pea_analysis_results_2018_299_CCD_temp_performance.csv', format='ascii.csv')
peas = asvt_utils.flatten_pea_test_data(peas)
peas = peas[peas['ccd_temp'] > -10.5]

In [13]:
# Version of ASVT PEA data that is more flight-like
fpeas = Table([peas['star_mag'], peas['ccd_temp'], peas['search_box_hw']],
              names=['mag_aca', 't_ccd', 'halfwidth'])
fpeas['year'] = np.random.uniform(2019.0, 2019.5, size=len(peas))
fpeas['color'] = 1.0
fpeas['quarter'] = (np.trunc((fpeas['year'] - year_q0) * 4)).astype('f4')
fpeas['fail'] = 1.0 - peas['search_success']
fpeas['asvt'] = True
fpeas['red_mag_err'] = False
fpeas['mag_obs'] = 0.0

Combine flight acqs and ASVT data


In [14]:
data_all = vstack([acqs[acq_ok]['year', 'fail', 'mag_aca', 't_ccd', 'halfwidth', 'quarter', 
                                'color', 'asvt', 'red_mag_err', 'mag_obs'], 
                   fpeas])
data_all.sort('year')

Compute box probit delta term based on box size


In [15]:
# Adjust probability (in probit space) for box size. 
data_all['box_delta'] = get_box_delta(data_all['halfwidth'])

# Put in an ad-hoc penalty on ASVT data that introduces up to a -0.3 shift
# on probit probability.  It goes from 0.0 for mag < 10.1 up to 0.3 at mag=10.4.
ok = data_all['asvt']
box_delta_tweak = (data_all['mag_aca'][ok] - 10.1).clip(0, 0.3)
data_all['box_delta'][ok] -= box_delta_tweak

# Another ad-hoc tweak: the mag=8.0 data show more failures at smaller
# box sizes.  This confounds the fitting.  For this case only just
# set the box deltas to zero and this makes the fit work.
ok = data_all['asvt'] & (data_all['mag_aca'] == 8)
data_all['box_delta'][ok] = 0.0

In [16]:
data_all = data_all.group_by('quarter')
data_all0 = data_all.copy()  # For later augmentation with simulated red_mag_err stars
data_mean = data_all.groups.aggregate(np.mean)

Model definition


In [17]:
def t_ccd_normed(t_ccd):
    return (t_ccd + 8.0) / 8.0

def p_fail(pars, 
           t_ccd, tc2=None,
           box_delta=0, rescale=True, probit=False):
    """
    Acquisition probability model

    :param pars: p0, p1, p2 (quadratic in t_ccd) and floor (min p_fail)
    :param t_ccd: t_ccd (degC) or scaled t_ccd if rescale is False.
    :param tc2: (scaled t_ccd) ** 2, this is just for faster fitting
    :param box_delta: delta p_fail for search box size
    :param rescale: rescale t_ccd to about -1 to 1 (makes P0, P1, P2 better-behaved)
    :param probit: return probability as probit instead of 0 to 1.
    """
    p0, p1, p2, floor = pars

    tc = t_ccd_normed(t_ccd) if rescale else t_ccd
    
    if tc2 is None:
        tc2 = tc ** 2
    
    # Make sure box_delta has right dimensions
    tc, box_delta = np.broadcast_arrays(tc, box_delta)

    # Compute the model.  Also clip at +10 to avoid values that are
    # exactly 1.0 at 64-bit precision.
    probit_p_fail = (p0 + p1 * tc + p2 * tc2 + box_delta).clip(floor, 10)

    # Possibly transform from probit to linear probability
    out = probit_p_fail if probit else stats.norm.cdf(probit_p_fail)
    return out

def p_acq_fail(data=None):
    """
    Sherpa fit function wrapper to ensure proper use of data in fitting.
    """
    if data is None:
        data = data_all
        
    tc = t_ccd_normed(data['t_ccd'])
    tc2 = tc ** 2
    box_delta = data['box_delta']
    
    def sherpa_func(pars, x=None):
        return p_fail(pars, tc, tc2, box_delta, rescale=False)

    return sherpa_func

Model fitting functions


In [18]:
def calc_binom_stat(data, model, staterror=None, syserror=None, weight=None, bkg=None):
    """
    Calculate log-likelihood for a binomial probability distribution
    for a single trial at each point.
    
    Defining p = model, then probability of seeing data == 1 is p and
    probability of seeing data == 0 is (1 - p).  Note here that ``data``
    is strictly either 0.0 or 1.0, and np.where interprets those float
    values as False or True respectively.
    """
    fit_stat = -np.sum(np.log(np.where(data, model, 1.0 - model)))    
    return fit_stat, np.ones(1)

In [19]:
def fit_poly_model(data):
    from sherpa import ui
    
    comp_names = ['p0', 'p1', 'p2', 'floor']

    data_id = 1
    ui.set_method('simplex')
    
    # Set up the custom binomial statistics
    ones = np.ones(len(data))
    ui.load_user_stat('binom_stat', calc_binom_stat, lambda x: ones)
    ui.set_stat(binom_stat)

    # Define the user model
    ui.load_user_model(p_acq_fail(data), 'model')
    ui.add_user_pars('model', comp_names)
    ui.set_model(data_id, 'model')
    ui.load_arrays(data_id, np.array(data['year']), np.array(data['fail'], dtype=np.float))

    # Initial fit values from fit of all data
    fmod = ui.get_model_component('model')

    # Define initial values / min / max
    # This is the p_fail value at t_ccd = -8.0
    fmod.p0 = -2.605
    fmod.p0.min = -10
    fmod.p0.max = 10

    # Linear slope of p_fail
    fmod.p1 = 2.5
    fmod.p1.min = 0.0
    fmod.p1.max = 10
    
    # Quadratic term.  Only allow negative curvature, and not too much at that.
    fmod.p2 = 0.0
    fmod.p2.min = -1
    fmod.p2.max = 0

    # Floor to p_fail.
    fmod.floor = -2.6
    fmod.floor.min = -2.6
    fmod.floor.max = -0.5

    ui.fit(data_id)

    return ui.get_fit_results()

Plotting and validation


In [20]:
def plot_fails_mag_aca_vs_t_ccd(mag_bins, data_all, year0=2015.0):
    ok = (data_all['year'] > year0) & ~data_all['fail'].astype(bool)
    da = data_all[ok]
    fuzzx = np.random.uniform(-0.3, 0.3, len(da))
    fuzzy = np.random.uniform(-0.125, 0.125, len(da))
    plt.plot(da['t_ccd'] + fuzzx, da['mag_aca'] + fuzzy, '.C0', markersize=4)

    ok = (data_all['year'] > year0) & data_all['fail'].astype(bool)
    da = data_all[ok]
    fuzzx = np.random.uniform(-0.3, 0.3, len(da))
    fuzzy = np.random.uniform(-0.125, 0.125, len(da))
    plt.plot(da['t_ccd'] + fuzzx, da['mag_aca'] + fuzzy, '.C1', markersize=4, alpha=0.8)
    
    # plt.xlim(-18, -10)
    # plt.ylim(7.0, 11.1)
    x0, x1 = plt.xlim()
    for y in mag_bins:
        plt.plot([x0, x1], [y, y], '-', color='r', linewidth=2, alpha=0.8)
    plt.xlabel('T_ccd (C)')
    plt.ylabel('Mag_aca')
    plt.title(f'Acq successes (blue) and failures (orange) since {year0}')
    plt.grid()

In [21]:
def plot_fit_grouped(data, group_col, group_bin, log=False, colors='br', label=None, probit=False):
    
    group = np.trunc(data[group_col] / group_bin)
    data = data.group_by(group)
    data_mean = data.groups.aggregate(np.mean)
    len_groups = np.diff(data.groups.indices)
    data_fail = data_mean['fail']
    model_fail = np.array(data_mean['model'])
    
    fail_sigmas = np.sqrt(data_fail * len_groups) / len_groups
    
    # Possibly plot the data and model probabilities in probit space
    if probit:
        dp = stats.norm.ppf(np.clip(data_fail + fail_sigmas, 1e-6, 1-1e-6))
        dm = stats.norm.ppf(np.clip(data_fail - fail_sigmas, 1e-6, 1-1e-6))
        data_fail = stats.norm.ppf(data_fail)
        model_fail = stats.norm.ppf(model_fail)
        fail_sigmas = np.vstack([data_fail - dm, dp - data_fail])
            
    plt.errorbar(data_mean[group_col], data_fail, yerr=fail_sigmas, 
                 fmt='.' + colors[1], label=label, markersize=8)
    plt.plot(data_mean[group_col], model_fail, '-' + colors[0])
    
    if log:
        ax = plt.gca()
        ax.set_yscale('log')

In [22]:
def mag_filter(mag0, mag1):
    ok = (data_all['mag_aca'] > mag0) & (data_all['mag_aca'] < mag1)
    return ok

In [23]:
def t_ccd_filter(t_ccd0, t_ccd1):
    ok = (data_all['t_ccd'] > t_ccd0) & (data_all['t_ccd'] < t_ccd1)
    return ok

In [24]:
def wp_filter(wp0, wp1):
    ok = (data_all['warm_pix'] > wp0) & (data_all['warm_pix'] < wp1)
    return ok

Define magnitude bins for fitting and show data


In [25]:
mag_centers = np.array([6.3, 8.1, 9.1, 9.55, 9.75, 10.0, 10.25, 10.55, 10.75, 11.0])
mag_bins = (mag_centers[1:] + mag_centers[:-1]) / 2
mag_means = np.array([8.0, 9.0, 9.5, 9.75, 10.0, 10.25, 10.5, 10.75])

In [26]:
for m0, m1, mm in zip(mag_bins[:-1], mag_bins[1:], mag_means):
    ok = (data_all['asvt'] == False) & (data_all['mag_aca'] >= m0) & (data_all['mag_aca'] < m1)
    print(f"m0={m0:.2f} m1={m1:.2f} mean_mag={data_all['mag_aca'][ok].mean():.2f} vs. {mm}")


m0=7.20 m1=8.60 mean_mag=8.05 vs. 8.0
m0=8.60 m1=9.32 mean_mag=8.98 vs. 9.0
m0=9.32 m1=9.65 mean_mag=9.48 vs. 9.5
m0=9.65 m1=9.88 mean_mag=9.76 vs. 9.75
m0=9.88 m1=10.12 mean_mag=9.99 vs. 10.0
m0=10.12 m1=10.40 mean_mag=10.23 vs. 10.25
m0=10.40 m1=10.65 mean_mag=10.50 vs. 10.5
m0=10.65 m1=10.88 mean_mag=10.75 vs. 10.75

In [27]:
plt.figure(figsize=(10, 14))
for subplot, halfwidth in enumerate([60, 80, 100, 120, 140, 160, 180]):
    plt.subplot(4, 2, subplot + 1)
    ok = (data_all['halfwidth'] > halfwidth - 10) & (data_all['halfwidth'] <= halfwidth + 10)
    plot_fails_mag_aca_vs_t_ccd(mag_bins, data_all[ok])
    plt.title(f'Acq success (blue) fail (orange) box={halfwidth}')
plt.tight_layout()


Color != 1.5 fit (this is MOST acq stars)


In [28]:
# fit = fit_sota_model(data_all['color'] == 1.5, ms_disabled=True)
mask_no_1p5 = ((data_all['red_mag_err'] == False) & 
               (data_all['t_ccd'] > -18) &
               (data_all['t_ccd'] < -0.5))

In [29]:
mag0s, mag1s = mag_bins[:-1], mag_bins[1:]
fits = {}
masks = []
for m0, m1 in zip(mag0s, mag1s):
    print(m0, m1)
    mask = mask_no_1p5 & mag_filter(m0, m1) #  & t_ccd_filter(-10.5, 0)
    print(np.count_nonzero(mask))
    masks.append(mask)
    fits[m0, m1] = fit_poly_model(data_all[mask])


7.2 8.6
16850
WARNING: imaging routines will not be available, 
failed to import sherpa.image.ds9_backend due to 
'RuntimeErr: DS9Win unusable: Could not find ds9 on your PATH'
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 594.808
Final fit statistic   = 489.751 at function evaluation 640
Data points           = 16850
Degrees of freedom    = 16846
Change in statistic   = 105.057
   model.p0       -2.82214    
   model.p1       1.89806     
   model.p2       -0.967379   
   model.floor    -2.6        
WARNING: parameter value model.floor is at its minimum boundary -2.6
8.6 9.325
19168
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 1917.9
Final fit statistic   = 1742.39 at function evaluation 670
Data points           = 19168
Degrees of freedom    = 19164
Change in statistic   = 175.511
   model.p0       -2.55334    
   model.p1       2.35371     
   model.p2       -0.705211   
   model.floor    -2.14837    
9.325 9.65
10188
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 3025.98
Final fit statistic   = 2260.44 at function evaluation 652
Data points           = 10188
Degrees of freedom    = 10184
Change in statistic   = 765.535
   model.p0       -2.26195    
   model.p1       3.87921     
   model.p2       -0.999994   
   model.floor    -1.66518    
9.65 9.875
6953
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 3921.12
Final fit statistic   = 2161.78 at function evaluation 598
Data points           = 6953
Degrees of freedom    = 6949
Change in statistic   = 1759.34
   model.p0       -0.784509   
   model.p1       2.28222     
   model.p2       -5.83471e-07
   model.floor    -1.52313    
9.875 10.125
6602
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 5693.39
Final fit statistic   = 2489.26 at function evaluation 602
Data points           = 6602
Degrees of freedom    = 6598
Change in statistic   = 3204.13
   model.p0       -0.190854   
   model.p1       1.97771     
   model.p2       -0.248602   
   model.floor    -1.50779    
10.125 10.4
4641
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 8210.43
Final fit statistic   = 2053.26 at function evaluation 585
Data points           = 4641
Degrees of freedom    = 4637
Change in statistic   = 6157.17
   model.p0       0.810736    
   model.p1       2.10795     
   model.p2       -0.885583   
   model.floor    -1.29356    
10.4 10.65
2767
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 8480.61
Final fit statistic   = 1139.14 at function evaluation 831
Data points           = 2767
Degrees of freedom    = 2763
Change in statistic   = 7341.47
   model.p0       1.48231     
   model.p1       1.95199     
   model.p2       -1          
   model.floor    -0.901686   
10.65 10.875
2034
Dataset               = 1
Method                = neldermead
Statistic             = binom_stat
Initial fit statistic = 7731.07
Final fit statistic   = 548.938 at function evaluation 707
Data points           = 2034
Degrees of freedom    = 2030
Change in statistic   = 7182.13
   model.p0       1.84089     
   model.p1       1.29966     
   model.p2       -0.684655   
   model.floor    -1.4808     

In [30]:
colors = [f'kC{i}' for i in range(9)]

plt.figure(figsize=(13, 4))
for subplot in (1, 2):
    plt.subplot(1, 2, subplot)
    probit = (subplot == 2)
    for m0_m1, color, mask, mag_mean in zip(list(fits), colors, masks, mag_means):
        fit = fits[m0_m1]
        data = data_all[mask]
        data['model'] = p_acq_fail(data)(fit.parvals)        
        plot_fit_grouped(data, 't_ccd', 2.0, 
                         probit=probit, colors=[color, color], label=str(mag_mean))
    plt.grid()
    if probit:
        plt.ylim(-3.5, 2.5)
    plt.ylabel('Probit(p_fail)' if probit else 'p_fail')
    plt.xlabel('T_ccd');
    plt.legend(fontsize='small')



In [31]:
# This computes probabilities for 120 arcsec boxes, corresponding to raw data
t_ccds = np.linspace(-16, -0, 20)
plt.figure(figsize=(13, 4))

for subplot in (1, 2):
    plt.subplot(1, 2, subplot)
    probit = (subplot == 2)
    for m0_m1, color, mag_mean in zip(list(fits), colors, mag_means):
        fit = fits[m0_m1]
        probs = p_fail(fit.parvals, t_ccds)
        if probit:
            probs = stats.norm.ppf(probs)
        plt.plot(t_ccds, probs, label=f'{mag_mean:.2f}')

    plt.legend()
    plt.xlabel('T_ccd')
    plt.ylabel('P_fail' if subplot == 1 else 'Probit(p_fail)')
    plt.title('P_fail for halfwidth=120')
    plt.grid()



In [32]:
mag_bin_centers = np.concatenate([[5.0], mag_means, [13.0]])
fit_parvals = []
for fit in fits.values():
    fit_parvals.extend(fit.parvals)

fit_parvals = np.array(fit_parvals).reshape(-1, 4)
parvals_mag12 = [[5, 0, 0, 0]]
parvals_mag5 = [[-5, 0, 0, -3]]
fit_parvals = np.concatenate([parvals_mag5, fit_parvals, parvals_mag12])
fit_parvals = fit_parvals.transpose()
for ps, parname in zip(fit_parvals, fit.parnames):
    plt.plot(mag_bin_centers, ps, '.-', label=parname)

plt.legend(fontsize='small')
plt.title('Model coefficients vs. mag')
plt.xlabel('Mag_aca')
plt.grid()


Define model for color=1.5 stars

  • Post AGASC 1.7, there is inadequate data to independently perform the binned fitting.
  • Instead assume a magnitude error distribution informed by examining the observed distribution of dmag = mag_obs - mag_aca (observed - catalog). This turns out to be well-represented by an exp(-abs(dmag) / dmag_scale) distribution, in contrast to a gaussian, which falls off as exp(-dmag^2).
  • Use the assumed mag error distribution to sample the color != 1.5 star probabilities and compute the weighted mean failure probability.

Examine distribution of mag error for color=1.5 stars


In [33]:
def plot_mag_errs(acqs, red_mag_err):
    ok = ((acqs['red_mag_err'] == red_mag_err) & 
          (acqs['mag_obs'] > 0) & 
          (acqs['img_func'] == 'star'))
    dok = acqs[ok]
    dmag = dok['mag_obs'] - dok['mag_aca']
    plt.figure(figsize=(14, 4.5))
    plt.subplot(1, 3, 1)
    plt.plot(dok['mag_aca'], dmag, '.')
    plt.plot(dok['mag_aca'], dmag, ',', alpha=0.3)
    plt.xlabel('mag_aca (catalog)')
    plt.ylabel('Mag err')
    plt.title('Mag err (observed - catalog) vs mag_aca')
    plt.xlim(5, 11.5)
    plt.ylim(-4, 2)
    plt.grid()
    
    plt.subplot(1, 3, 2)
    plt.hist(dmag, bins=np.arange(-3, 4, 0.2), log=True);
    plt.grid()
    plt.xlabel('Mag err')
    plt.title('Mag err (observed - catalog)')
    plt.xlim(-4, 2)
    
    plt.subplot(1, 3, 3)
    plt.hist(dmag, bins=100, cumulative=-1, normed=True)
    plt.xlim(-1, 1)
    plt.xlabel('Mag err')
    plt.title('Mag err (observed - catalog)')
    plt.grid()

In [34]:
plot_mag_errs(acqs, red_mag_err=True)
plt.subplot(1, 3, 2)
plt.plot([-2.8, 0], [1, 7000], 'r');
plt.plot([0, 4.0], [7000, 1], 'r');
plt.xlim(-4, 4);


Define an analytical approximation for distribution with ad-hoc positive tail


In [35]:
# Define parameters / metadata for floor model
FLOOR = {'fit_parvals': fit_parvals,
         'mag_bin_centers': mag_bin_centers}

In [36]:
def calc_1p5_mag_err_weights():
    x = np.linspace(-2.8, 4, 18)
    ly = 3.8 * (1 - np.abs(x) / np.where(x > 0, 4.0, 2.8))
    y = 10 ** ly
    return x, y / y.sum()

In [37]:
FLOOR['mag_errs_1p5'], FLOOR['mag_err_weights_1p5'] = calc_1p5_mag_err_weights()

In [38]:
plt.semilogy(FLOOR['mag_errs_1p5'], FLOOR['mag_err_weights_1p5'])
plt.grid()


Global model for arbitrary mag, t_ccd, color, and halfwidth


In [39]:
def floor_model_acq_prob(mag, t_ccd, color=0.6, halfwidth=120, probit=False):
    """
    Acquisition probability model

    :param mag: Star magnitude(s)
    :param t_ccd: CCD temperature(s)
    :param color: Star color (compared to 1.5 to decide which p_fail model to use)
    :param halfwidth: Search box size (arcsec)
    :param probit: Return probit of failure probability
    
    :returns: acquisition failure probability
    """

    parvals = FLOOR['fit_parvals']
    mag_bin_centers = FLOOR['mag_bin_centers']
    mag_errs_1p5 = FLOOR['mag_errs_1p5']
    mag_err_weights_1p5 = FLOOR['mag_err_weights_1p5']

    # Make sure inputs have right dimensions
    is_scalar, t_ccds, mags, halfwidths, colors = broadcast_arrays(t_ccd, mag, halfwidth, color)
    box_deltas = get_box_delta(halfwidths) 

    p_fails = []
    for t_ccd, mag, box_delta, color in zip(t_ccds.flat, mags.flat, box_deltas.flat, colors.flat):
        if np.isclose(color, 1.5):
            pars_list = [[np.interp(mag + mag_err_1p5, mag_bin_centers, ps) for ps in parvals]
                         for mag_err_1p5 in mag_errs_1p5]
            weights = mag_err_weights_1p5
            if probit:
                raise ValueError('cannot use probit=True with color=1.5 stars')
        else:
            pars_list = [[np.interp(mag, mag_bin_centers, ps) for ps in parvals]]
            weights = [1]

        pf = sum(weight * p_fail(pars, t_ccd, box_delta=box_delta, probit=probit)
                 for pars, weight in zip(pars_list, weights))
        p_fails.append(pf)
    
    out = np.array(p_fails).reshape(t_ccds.shape)
    return out

In [40]:
mags, t_ccds = np.mgrid[8.75:10.75:30j, -16:-4:30j]
plt.figure(figsize=(13, 4))
for subplot, color in enumerate([1.0, 1.5]):
    plt.subplot(1, 2, subplot + 1)
    p_fails = floor_model_acq_prob(mags, t_ccds, probit=False, color=color)

    cs = plt.contour(t_ccds, mags, p_fails, levels=[0.05, 0.1, 0.2, 0.5, 0.75, 0.9], 
                     colors=['g', 'g', 'b', 'c', 'm', 'r'])
    plt.clabel(cs, inline=1, fontsize=10)
    plt.grid()
    plt.xlim(-17, -4)
    plt.ylim(8.5, 11.0)
    plt.xlabel('T_ccd (degC)')
    plt.ylabel('Mag_ACA')
    plt.title(f'Failure probability color={color}');



In [41]:
mags = np.linspace(8, 11, 301)
plt.figure()
for t_ccd in np.arange(-16, -0.9, 1):
    p_fails = floor_model_acq_prob(mags, t_ccd, probit=True)
    plt.plot(mags, p_fails)
plt.grid()
plt.xlim(8, 11)


Out[41]:
(8, 11)

Compare to flight data for halfwidth=120

Selecting only data with halfwidth=120 is a clean, model-independent way to compare the model to raw flight statistics.

Setup functions to get appropriate data


In [42]:
# NOTE this is in chandra_aca.star_probs as of version 4.27

from scipy.stats import binom

def binom_ppf(k, n, conf, n_sample=1000):
    """
    Compute percent point function (inverse of CDF) for binomial, where
    the percentage is with respect to the "p" (binomial probability) parameter
    not the "k" parameter.
    
    The following example returns the 1-sigma (0.17 - 0.84) confidence interval
    on the true binomial probability for an experiment with 4 successes in 5 trials.
    
    Example::
    
      >>> binom_ppf(4, 5, [0.17, 0.84])
      array([ 0.55463945,  0.87748177])
      
    :param k: int, number of successes (0 < k <= n)
    :param n: int, number of trials
    :param conf: float, array of floats, percent point values
    :param n_sample: number of PMF samples for interpolation
    
    :return: percent point function values corresponding to ``conf``
    """
    ps = np.linspace(0, 1, n_sample)
    vals = binom.pmf(k=k, n=n, p=ps)
    return np.interp(conf, xp=np.cumsum(vals) / np.sum(vals), fp=ps)

In [43]:
binom_ppf(4, 5, [0.17, 0.84])


Out[43]:
array([ 0.55463945,  0.87748177])

In [44]:
n = 156
k = 127
binom_ppf(k, n, [0.17, 0.84])


Out[44]:
array([ 0.78004095,  0.84058371])

In [45]:
def calc_binned_pfail(data_all, mag, dmag, t_ccd, dt, halfwidth=120):
    da = data_all[~data_all['asvt'] & (data_all['halfwidth'] == halfwidth)]
    fail = da['fail'].astype(bool)
    ok = (np.abs(da['mag_aca'] - mag) < dmag) & (np.abs(da['t_ccd'] - t_ccd) < dt)
    n_fail = np.count_nonzero(fail[ok])
    n_acq = np.count_nonzero(ok)
    p_fail = n_fail / n_acq
    p_fail_lower, p_fail_upper = binom_ppf(n_fail, n_acq, [0.17, 0.84])
    mean_t_ccd = np.mean(da['t_ccd'][ok])
    mean_mag = np.mean(da['mag_aca'][ok])
    return p_fail, p_fail_lower, p_fail_upper, mean_t_ccd, mean_mag, n_fail, n_acq

In [46]:
halfwidth = 120
pfs_list = []
for mag in (10.0, 10.3, 10.55):
    pfs = []
    for t_ccd in np.linspace(-15, -10, 6):
        pf = calc_binned_pfail(data_all, mag, 0.2, t_ccd, 0.5, halfwidth=halfwidth)
        pfs.append(pf)
        print(f'mag={mag} mean_mag_aca={pf[4]:.2f} t_ccd={pf[3]:.2f} p_fail={pf[-2]}/{pf[-1]}={pf[0]:.2f}')
    pfs_list.append(pfs)


mag=10.0 mean_mag_aca=10.00 t_ccd=-14.92 p_fail=72/1056=0.07
mag=10.0 mean_mag_aca=10.00 t_ccd=-14.08 p_fail=83/1131=0.07
mag=10.0 mean_mag_aca=10.01 t_ccd=-13.00 p_fail=120/1248=0.10
mag=10.0 mean_mag_aca=10.02 t_ccd=-11.93 p_fail=87/671=0.13
mag=10.0 mean_mag_aca=10.03 t_ccd=-11.05 p_fail=63/333=0.19
mag=10.0 mean_mag_aca=10.02 t_ccd=-10.11 p_fail=48/170=0.28
mag=10.3 mean_mag_aca=10.26 t_ccd=-14.92 p_fail=74/612=0.12
mag=10.3 mean_mag_aca=10.26 t_ccd=-14.09 p_fail=84/642=0.13
mag=10.3 mean_mag_aca=10.26 t_ccd=-13.00 p_fail=163/745=0.22
mag=10.3 mean_mag_aca=10.26 t_ccd=-11.93 p_fail=144/440=0.33
mag=10.3 mean_mag_aca=10.25 t_ccd=-11.08 p_fail=76/182=0.42
mag=10.3 mean_mag_aca=10.26 t_ccd=-10.08 p_fail=59/106=0.56
mag=10.55 mean_mag_aca=10.50 t_ccd=-14.93 p_fail=53/223=0.24
mag=10.55 mean_mag_aca=10.49 t_ccd=-14.08 p_fail=61/231=0.26
mag=10.55 mean_mag_aca=10.50 t_ccd=-12.98 p_fail=132/292=0.45
mag=10.55 mean_mag_aca=10.50 t_ccd=-11.96 p_fail=96/185=0.52
mag=10.55 mean_mag_aca=10.47 t_ccd=-11.11 p_fail=43/60=0.72
mag=10.55 mean_mag_aca=10.44 t_ccd=-10.11 p_fail=16/27=0.59

Compare model to flight for color != 1.5 stars


In [47]:
def plot_floor_and_flight(color, halfwidth=120):

    # This computes probabilities for 120 arcsec boxes, corresponding to raw data
    t_ccds = np.linspace(-16, -6, 20)
    mag_acas = np.array([9.5, 10.0, 10.25, 10.5, 10.75])

    for ii, mag_aca in enumerate(reversed(mag_acas)):
        flight_probs = 1 - acq_success_prob(date='2018-05-01T00:00:00', 
                                            t_ccd=t_ccds, mag=mag_aca, color=color, halfwidth=halfwidth)
        new_probs = floor_model_acq_prob(mag_aca, t_ccds, color=color, halfwidth=halfwidth)
        plt.plot(t_ccds, flight_probs, '--', color=f'C{ii}')
        plt.plot(t_ccds, new_probs, '-', color=f'C{ii}', label=f'mag_aca={mag_aca}')

    if color != 1.5:
        # pf1, pf2 have p_fail, p_fail_lower, p_fail_upper, mean_t_ccd, mean_mag_aca, n_fail, n_acq
        for pfs, clr in zip(pfs_list, ('C3', 'C2', 'C1')):
            for pf in pfs:
                yerr = np.array([pf[0] - pf[1], pf[2] - pf[0]]).reshape(2, 1)
                plt.errorbar(pf[3], pf[0], xerr=0.5, yerr=yerr, color=clr)

    # plt.xlim(-16, None)
    plt.legend()
    plt.xlabel('T_ccd')
    plt.ylabel('P_fail')
    plt.title(f'P_fail (color={color}): new (solid) and flight (dashed)')
    plt.grid()

In [48]:
plot_floor_and_flight(color=1.0)


Compare model to flight for color = 1.5 stars


In [49]:
plt.figure(figsize=(13, 4))
plt.subplot(1, 2, 1)
for m0, m1, color in [(9, 9.5, 'C0'), (9.5, 10, 'C1'), (10, 10.3, 'C2'), (10.3, 10.7, 'C3')]:
    ok = data_all['red_mag_err'] & mag_filter(m0, m1) & t_ccd_filter(-16, -10)
    data = data_all[ok]
    data['model'] = floor_model_acq_prob(data['mag_aca'], data['t_ccd'], color=1.5, halfwidth=data['halfwidth'])
    plot_fit_grouped(data, 't_ccd', 2.0, 
                     probit=False, colors=[color, color], label=f'{m0}-{m1}')
plt.ylim(0, 1.0)
plt.legend(fontsize='small')
plt.grid()
plt.xlabel('T_ccd')
plt.title('COLOR1=1.5 acquisition probabilities')

plt.subplot(1, 2, 2)
plot_floor_and_flight(color=1.5)


Write model as a 3-d grid to a gzipped FITS file


In [50]:
def write_model_as_fits(model_name,
                        comment=None,
                        mag0=5, mag1=12, n_mag=141,  # 0.05 mag spacing
                        t_ccd0=-16, t_ccd1=-1, n_t_ccd=31,  # 0.5 degC spacing
                        halfw0=60, halfw1=180, n_halfw=7,  # 20 arcsec spacing
                        ):
    from astropy.io import fits
    
    mags = np.linspace(mag0, mag1, n_mag)
    t_ccds = np.linspace(t_ccd0, t_ccd1, n_t_ccd)
    halfws = np.linspace(halfw0, halfw1, n_halfw)
    mag, t_ccd, halfw = np.meshgrid(mags, t_ccds, halfws, indexing='ij')

    print('Computing probs, stand by...')
    
    # COLOR = 1.5 (stars with poor mag estimates)
    p_fails = floor_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=False, color=1.5)
    p_fails_probit_1p5 = stats.norm.ppf(p_fails)

    # COLOR not 1.5 (most stars)
    p_fails_probit = floor_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=True, color=1.0)
    
    hdu = fits.PrimaryHDU()
    if comment:
        hdu.header['comment'] = comment
    hdu.header['date'] = DateTime().fits
    hdu.header['mdl_name'] = model_name
    hdu.header['mag_lo'] = mags[0]
    hdu.header['mag_hi'] = mags[-1]
    hdu.header['mag_n'] = len(mags)
    hdu.header['t_ccd_lo'] = t_ccds[0]
    hdu.header['t_ccd_hi'] = t_ccds[-1]
    hdu.header['t_ccd_n'] = len(t_ccds)
    hdu.header['halfw_lo'] = halfws[0]
    hdu.header['halfw_hi'] = halfws[-1]
    hdu.header['halfw_n'] = len(halfws)

    hdu1 = fits.ImageHDU(p_fails_probit.astype(np.float32))
    hdu1.header['comment'] = 'COLOR1 != 1.5 (good mag estimates)'
    
    hdu2 = fits.ImageHDU(p_fails_probit_1p5.astype(np.float32))
    hdu2.header['comment'] = 'COLOR1 == 1.5 (poor mag estimates)'

    hdus = fits.HDUList([hdu, hdu1, hdu2])
    hdus.writeto(f'{model_name}.fits.gz', overwrite=True)

In [51]:
comment = f'Created with fit_acq_model-{MODEL_DATE}-binned-poly-binom-floor.ipynb in aca_stats repository'
write_model_as_fits(MODEL_NAME, comment=comment)


Computing probs, stand by...

In [52]:
# Fudge the chandra_aca.star_probs global STAR_PROBS_DATA_DIR temporarily
# in order to load the dev model that was just created locally
_dir_orig = star_probs.STAR_PROBS_DATA_DIR
star_probs.STAR_PROBS_DATA_DIR = '.'
grid_model_acq_prob(model=MODEL_NAME)
star_probs.STAR_PROBS_DATA_DIR = _dir_orig

In [53]:
# Remake standard plot comparing grouped data to model, but now use
# chandra_aca.star_probs grid_model_acq_prob function with the newly
# generated 3-d FITS model that we just loaded.

colors = [f'kC{i}' for i in range(9)]

plt.figure(figsize=(13, 4))
for subplot in (1, 2):
    plt.subplot(1, 2, subplot)
    probit = (subplot == 2)
    for m0_m1, color, mask, mag_mean in zip(list(fits), colors, masks, mag_means):
        fit = fits[m0_m1]
        data = data_all[mask]
        data['model'] = 1 - grid_model_acq_prob(data['mag_aca'], data['t_ccd'],
                                                halfwidth=data['halfwidth'],
                                                model=MODEL_NAME)
        plot_fit_grouped(data, 't_ccd', 2.0, 
                         probit=probit, colors=[color, color], label=str(mag_mean))
    plt.grid()
    if probit:
        plt.ylim(-3.5, 2.5)
    plt.ylabel('Probit(p_fail)' if probit else 'p_fail')
    plt.xlabel('T_ccd');
    plt.legend(fontsize='small')


/Users/aldcroft/miniconda3/envs/ska3/lib/python3.6/site-packages/chandra_aca/star_probs.py:298: UserWarning: 
Model grid-floor-2019-08 computed between t_ccd <= -16.0 <= -1.0, clipping input t_ccd(s) outside that range.
  .format(model, name, val_lo, val_hi, name))
/Users/aldcroft/miniconda3/envs/ska3/lib/python3.6/site-packages/chandra_aca/star_probs.py:298: UserWarning: 
Model grid-floor-2019-08 computed between halfw <= 60.0 <= 180.0, clipping input halfw(s) outside that range.
  .format(model, name, val_lo, val_hi, name))

In [54]:
# Check chandra_aca implementation vs. native model from this notebook
mags = np.linspace(5, 12, 40)
t_ccds = np.linspace(-16, -1, 40)
halfws = np.linspace(60, 180, 7)
mag, t_ccd, halfw = np.meshgrid(mags, t_ccds, halfws, indexing='ij')

# First color != 1.5
#  Notebook
nb_probs = floor_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=True, color=1.0)
#  Chandra_aca.  Note that grid_model returns p_success, so need to negate it.
ca_probs = -grid_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=True, color=1.0, 
                                model=MODEL_NAME)

assert nb_probs.shape == ca_probs.shape
print('Max difference is {:.3f}'.format(np.max(np.abs(nb_probs - ca_probs))))
assert np.allclose(nb_probs, ca_probs, rtol=0, atol=0.1)


Max difference is 0.067

In [55]:
d_probs = (nb_probs - ca_probs)[:, :, 3]
plt.imshow(d_probs, origin='lower', extent=[-16, -1, 5, 12], aspect='auto', cmap='jet')
plt.colorbar();
plt.title('Delta between probit p_fail: analytical vs. gridded');



In [56]:
mags = np.linspace(8, 11, 200)
plt.figure()
for ii, t_ccd in enumerate(np.arange(-16, -0.9, 2)):
    p_fails = floor_model_acq_prob(mags, t_ccd, probit=True)
    plt.plot(mags, p_fails, color=f'C{ii}')
    p_success = grid_model_acq_prob(mags, t_ccd, probit=True, model=MODEL_NAME)
    plt.plot(mags, -p_success, color=f'C{ii}')
plt.grid()
plt.xlim(8, 11)


Out[56]:
(8, 11)

Generate regression data for chandra_aca

The real testing is done here with a copy of the functions from chandra_aca; now generate some regression test data as a smoke test that things are working on all platforms.


In [57]:
mags = [9, 9.5, 10.5]
t_ccds = [-10, -5]
halfws = [60, 120, 160]
mag, t_ccd, halfw = np.meshgrid(mags, t_ccds, halfws, indexing='ij')

probs = floor_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=True, color=1.0)
print(repr(probs.round(3).flatten()))

probs = floor_model_acq_prob(mag, t_ccd, halfwidth=halfw, probit=False, color=1.5)
probs = stats.norm.ppf(probs)
print(repr(probs.round(3).flatten()))


array([-2.148, -2.148, -2.148, -2.148, -1.77 , -1.483, -1.665, -1.665,
       -1.665, -1.503, -0.948, -0.661,  0.377,  0.932,  1.218,  1.519,
        2.074,  2.36 ])
array([-1.623, -1.506, -1.433, -1.301, -1.038, -0.869, -1.155, -0.965,
       -0.867, -0.697, -0.385, -0.207,  0.369,  0.741,  0.92 ,  1.121,
        1.466,  1.63 ])