Chandra X-Ray Observatory ACA Statistics Project

Here's an overview of the problem and some of the data exploration we've done so far.


In [14]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import bayesRegress as bR
from sklearn import linear_model
import astropy
import functions as f
import statsmodels.api as sm

from numpy import genfromtxt
from astropy.time import Time

import warnings
warnings.filterwarnings('ignore')

Main Idea

  • The Chandra X-Ray Observatory was designed as a 5-year mission, and this past July it celebrated its 15th year of observation.
  • This continued success doesn't come without cost: engineers are constantly developing better models of the behavior of key spacecraft components to ensure future success.
  • One of the key science properties of Chandra is its sub-arcsecond imaging resolution.
  • This resolution depends on a "star tracking camera", the aspect camera assembly (ACA), which has seen degraded performance through the years.
  • This degradation has been linked to the magnitude of the stars being tracked as well as to the estimated warm pixel fraction; the current operational model uses only these two factors.

Goals of the Project

  • Main Goal: Create a formal model to accurately predict the "risk" of a safing action for a planned observation
    • Create a model that accurately estimates the probability that a given star will succeed or fail in its acquisition.
      • Include a posterior predictive distribution for a given star
    • Integrate prior knowledge of a particular acquisition star with the ensemble properties of "similar" stars
    • Extend point predictions and intervals for individual stars to entire star catalogs, for use in the star catalog review process.
    • Develop production code for integration into the current review process.
  • Mainly concerned with models incorporating Mag & Warm Pixel Fraction
  • Note: Mission planning constraints currently limit planning to stars between magnitude 6 and 10.6 unless otherwise approved by an engineer, so this is the magnitude range where the model must fit well.

Definitions & Covariates

  • Magnitude, Mag - Magnitude of the star being observed by the star tracker, nominally between 6.0 and 11.0
  • Warm Pixel Fraction - A proxy that captures the non-uniformity of pixel brightness due to ionizing radiation damage to the ACA CCD: the fraction of CCD pixels (currently ~5%) whose signal exceeds that of a "faint" star, defined as a magnitude 10.6 star. Note: Early in the mission it was possible to control the warm pixel fraction with the thermal systems, but those controls have since been saturated.
  • Many other covariates are included in the dataset: color of the star, y-angle and z-angle on the CCD (spatial characteristics), Julian date of the observation (temporal changes), etc.
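As a toy illustration of the definition above (simulated CCD data and a made-up threshold, not the actual ACA calibration), the warm pixel fraction is just the fraction of pixels whose dark-current signal exceeds the level of a magnitude 10.6 star:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 1024x1024 CCD dark-current image (arbitrary units); a lognormal
# gives the long bright tail characteristic of radiation-damaged pixels.
dark_current = rng.lognormal(mean=3.0, sigma=0.6, size=(1024, 1024))

# Hypothetical signal level of a magnitude 10.6 ("faint") star.
FAINT_STAR_LEVEL = 60.0

# Warm pixel fraction = fraction of pixels brighter than a faint star.
warm_pix_frac = np.mean(dark_current > FAINT_STAR_LEVEL)
print(warm_pix_frac)
```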

Methods Planned/Attempted

  • Probit/Logistic Regression to obtain maximum likelihood estimates using IRLS
    • Using the statsmodels package in Python, which gives results consistent with R.
  • Bayesian Probit Regression from a 2006 paper by Holmes and Held.
    • Coded my own function for this.
    • Runtime may become prohibitive: the dataset is ~160,000 points...
      • ~15,000 data points (1 year's data) takes about 25 mins to run for 3000 iterations

Bayesian Software Attempted

  • PyStan - The Stan Bayesian sampler. Took just as long as my script and sometimes crashed my laptop; I'm not familiar enough with the Stan language to know whether the fault was mine or theirs.
  • PyMC - A Python-based MCMC sampler. Wouldn't converge for logistic regression, and took just as long as PyStan and my function. Some of its functions are still in beta and not quite ready for production, so this isn't an option.
  • R's bayesglm - Stable and quick, but only an approximate Bayesian solution. Currently I'm not sure how to choose priors for my coefficients that would make them anything other than MLEs, so this may not be an issue at the moment. The other problem is the desire to have the functions written in Python for ease of integration into the current workflow.
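For reference, a minimal auxiliary-variable Gibbs sampler for Bayesian probit regression, in the spirit of Albert & Chib (1993) — the Holmes & Held (2006) sampler refines this with a joint update of the latents and coefficients. This sketch assumes a flat prior on the coefficients, synthetic data, and a deliberately short chain:

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(2)

# Synthetic probit data: y ~ Bernoulli(Phi(X @ beta_true)).
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(int)

V = np.linalg.inv(X.T @ X)          # posterior covariance under a flat prior
L = np.linalg.cholesky(V)
beta = np.zeros(p)
draws = []
for _ in range(500):
    # 1) Draw latent z_i from N(x_i'beta, 1) truncated to agree with y_i.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)   # standardized lower bounds
    hi = np.where(y == 1, np.inf, -mu)    # standardized upper bounds
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # 2) Draw beta | z ~ N(V X'z, V).
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    draws.append(beta)

post_mean = np.mean(draws[100:], axis=0)  # discard burn-in
print(post_mean)
```

The slow part in practice is the n truncated-normal draws per iteration, which is why the full ~160,000-point dataset becomes expensive.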

Data Visualized

Please ignore the code and look at the pretty pictures...

Joint Distribution of Warm Pixel Fraction and Star Magnitude.

Distribution of Failures in Red, Distribution of Successes in Blue


In [15]:
#Setup
#Loading data from file
acq_data = np.load('data/acq_table.npy')

#Adding fields required for analysis
acq_data = f.add_column(acq_data, 'tstart_jyear' , np.zeros(len(acq_data)))
acq_data = f.add_column(acq_data, 'tstart_quarter' , np.zeros(len(acq_data)))
acq_data = f.add_column(acq_data, 'mag_floor' , np.zeros(len(acq_data)))
acq_data = f.add_column(acq_data, 'year' , np.zeros(len(acq_data)))
acq_data = f.add_column(acq_data, 'success' , np.zeros(len(acq_data)))
acq_data['tstart_jyear'] = Time(acq_data['tstart'], format='cxcsec').jyear
acq_data['year'] = np.floor(acq_data.tstart_jyear)
acq_data['mag_floor'] = np.floor(acq_data['mag'])
acq_data['success'][acq_data['obc_id']=="ID"] = 1.  # flag successful acquisitions
# acq_data['tstart_quarter'] = f.quarter_bin(acq_data['tstart_jyear'])

for acq in acq_data:
    acq.tstart_quarter = f.quarter_bin(acq.tstart_jyear)

In [16]:
#Subsetting the data to a particular year
yr2013 = f.smlset(acq_data, 'year', 2013)
yr2013_success = f.smlset(yr2013, 'obc_id', 'ID')
yr2013_fail = f.smlset(yr2013, 'obc_id', 'NOID')

#Subsetting the data to a particular year
yr2005 = f.smlset(acq_data, 'year', 2005)
yr2005_success = f.smlset(yr2005, 'obc_id', 'ID')
yr2005_fail = f.smlset(yr2005, 'obc_id', 'NOID')

In [17]:
F = plt.figure()
p1, = plt.plot(yr2005_success['mag'], yr2005_success['warm_pix'], marker='.', linestyle='')
p2, = plt.plot(yr2005_fail['mag'], yr2005_fail['warm_pix'], marker='x', linestyle='', color='red')
plt.legend([p1,p2],['Successes', "Failures"])
plt.title('Star Acquisitions for the Year 2005')
plt.xlabel('Star Magnitude')
plt.ylabel('Warm Pixel Fraction')
plt.ylim((.04,0.18))
plt.xlim((5.5,11.5))
F.set_size_inches(10,6)
plt.show()



In [18]:
F = plt.figure()
p1, = plt.plot(yr2013_success['mag'], yr2013_success['warm_pix'], marker='.', linestyle='')
p2, = plt.plot(yr2013_fail['mag'], yr2013_fail['warm_pix'], marker='x', linestyle='', color='red')
plt.legend([p1,p2],['Successes', "Failures"])
plt.title('Star Acquisitions for the Year 2013')
plt.xlabel('Star Magnitude')
plt.ylabel('Warm Pixel Fraction')
plt.ylim((.04,0.18))
plt.xlim((5.5,11.5))
F.set_size_inches(10,6)
plt.show()



In [19]:
#Subsetting the data by the floor of star magnitude
mag8 = f.smlset(acq_data, 'mag_floor', 8.0)
mag9 = f.smlset(acq_data, 'mag_floor', 9.0)
mag10 = f.smlset(acq_data, 'mag_floor', 10.0)

def fails_by_quarter(arr):
    quarters = np.unique(arr['tstart_quarter'])
    obs_counts = []
    failure_counts = []
    warm_pix_avgs = []
    for q in quarters:
        obs_inquarter = f.smlset(arr, 'tstart_quarter', q)
        failures = len(f.smlset(obs_inquarter, 'obc_id', 'NOID'))
        warm_pix = np.mean(obs_inquarter['warm_pix'])
        counts = len(obs_inquarter)
        failure_counts.append(float(failures))
        warm_pix_avgs.append(float(warm_pix))
        obs_counts.append(float(counts))
    failure_rate = np.array(failure_counts) / np.array(obs_counts)
    return [quarters, failure_rate, warm_pix_avgs]

mag8_byquarter = fails_by_quarter(mag8)
mag9_byquarter = fails_by_quarter(mag9)
mag10_byquarter = fails_by_quarter(mag10)

def plot_failures(out, title_name, fname):
    F, ax1 = plt.subplots()
    ax1.plot(out[0],out[1], marker='o', linestyle="")
    ax1.set_xlabel('Quarter')
    ax1.set_ylabel('Acq Fail Rate (fraction)')
    ax1.set_title(title_name)
    
    ax2 = ax1.twinx()
    ax2.plot(out[0],out[2], marker='', linestyle="-",color="r")
    ax2.set_ylabel('N100 Warm Pixel CCD Fraction', color="r")

    F.set_size_inches(8,4)
    plt.show()

Warm Pixel Fraction and Proportion of Stars Failed

9th Magnitude Stars


In [20]:
plot_failures(mag9_byquarter, '9th Magnitude Stars', 'plots/mag9_byquarter.pdf')


10th Magnitude Stars


In [21]:
plot_failures(mag10_byquarter, '10th Magnitude Stars',  'plots/mag10_byquarter.pdf')


Distributions of Successes and Failures across Star Magnitude

2005


In [22]:
nbins = np.arange(5.5,11.75,0.25)

F = plt.figure()
p1 = plt.hist(yr2005_success['mag'], bins=nbins, color='b', alpha=0.5, label="Successes")
p2 = plt.hist(yr2005_fail['mag'], bins=nbins, color='r', alpha=0.5, label="Failures")
lgd = plt.legend()
F.set_size_inches(10,6)
plt.xlim((5.5,11.5))
plt.xlabel('Star Magnitude')
plt.ylabel('Frequency')
plt.title('Star Acquisitions for the Year 2005')
plt.show()


2013


In [23]:
nbins = np.arange(6,11.75,0.25)

F = plt.figure()
p1 = plt.hist(yr2013_success['mag'], bins=nbins, color='b', alpha=0.5, label="Successes")
p2 = plt.hist(yr2013_fail['mag'], bins=nbins, color='r', alpha=0.5, label="Failures")
lgd = plt.legend()
F.set_size_inches(10,6)
plt.xlim((6.,11.))
plt.xlabel('Star Magnitude')
plt.ylabel('Frequency')
plt.title('Star Acquisitions for the Year 2013')
plt.show()


Spatial Distribution of Failures

No apparent spatial pattern in the failures


In [24]:
def plot_failedheat(subset, fname):
    F = plt.figure()
    heatmap, xedges, yedges = np.histogram2d(subset.yang,subset.zang, bins=100)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    plt.imshow(heatmap, extent=extent)
    F.set_size_inches(8,8)
    plt.colorbar()
    plt.show()
plot_failedheat(acq_data[acq_data['obc_id']=="NOID"], 'plots/failedheat.pdf')


Current Model

  • Binary Probit Regression
    • Second-order polynomial for Magnitude
    • Logarithmic transformation of Warm Pixel Fraction
    • Interaction between Warm Pixel Fraction and Magnitude
    • Linear and trigonometric transformations of time (tstart_jyear)

Output from R... Python wasn't agreeing with me

14 years' worth of data... to predict 6 months...

> summary(probit_fit)

Call:
glm(formula = failure ~ poly(mag, 2) + poly(mag, 2) * log(warm_pix) + 
    sin(tstart_jyear) + cos(tstart_jyear) + tstart_jyear, family = binomial(link = "probit"), 
    data = trainingdata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6948  -0.2392  -0.1441  -0.0996   3.4847  

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  24.536851   6.019266   4.076 4.57e-05 ***
poly(mag, 2)1               487.577158  24.299838  20.065  < 2e-16 ***
poly(mag, 2)2               194.598080  19.850342   9.803  < 2e-16 ***
log(warm_pix)                 0.392143   0.035133  11.162  < 2e-16 ***
sin(tstart_jyear)            -0.057573   0.010216  -5.635 1.75e-08 ***
cos(tstart_jyear)            -0.064687   0.010073  -6.422 1.35e-10 ***
tstart_jyear                 -0.012751   0.002962  -4.304 1.68e-05 ***
poly(mag, 2)1:log(warm_pix) 120.816454   8.624619  14.008  < 2e-16 ***
poly(mag, 2)2:log(warm_pix)  21.937600   7.095100   3.092  0.00199 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 46819  on 166372  degrees of freedom
Residual deviance: 37458  on 166364  degrees of freedom
AIC: 37476

Number of Fisher Scoring iterations: 7

Current Model Fits

Model was fit with data from 2000-2014. The black line is observed data from 2014-2014.50; the red line is our model with a 95% confidence interval for the estimates; the green line is the current model that the Chandra team is using.


In [25]:
from IPython.display import Image
Image(filename="/Users/bvegetabile/git/aca_stats/2d_contour_overlay.png")


Out[25]:

In [26]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_9.png")


Out[26]:

In [27]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_10.png")


Out[27]:

In [28]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_11.png")


Out[28]:

In [29]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_12.png")


Out[29]:

In [30]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_13.png")


Out[30]:

In [31]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_14.png")


Out[31]:

In [32]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_15.png")


Out[32]:

In [33]:
Image(filename="/Users/bvegetabile/git/aca_stats/modelfit_wpf_16.png")


Out[33]: