Milan Williams and Phil Marshall, June 2016
In this notebook we work through the PyCS
"demo1" tutorial, to show how the PyCS
package enables the estimation of a lens time delay from example light curve data. The original tutorial is in the form of a set of 6 scripts, that can be viewed on the PyCS
website here. The demo1 code itself can be browsed in the PyCS
GitHub repository here.
The "demo1" tutorial uses a 4-image light curve dataset that comes with the PyCS
repository. Let's download this and use PyCS
to analyze it. If you haven't yet followed the SLTimer
installation instructions you should do that before attempting to import pycs
.
In [1]:
import os, urllib
from __future__ import print_function
import pycs
%matplotlib inline
We need to grab rdbfile
(the demo1 dataset) from webdir
(the appropriate PyCS
GitHub folder). We only need to download rdbfile
if it doesn't already exist.
In [2]:
webdir = 'https://raw.githubusercontent.com/COSMOGRAIL/PyCS/master/demo/demo1/data/'
rdbfile = 'trialcurves.txt'
url = os.path.join(webdir, rdbfile)
if not os.path.isfile(rdbfile):
urllib.urlretrieve(url, rdbfile)
!wc -l $rdbfile
First lets read in the data from the rdbfile, in this case from a simple text file with a one-line header. (Other formats are supported as well.)
In [3]:
lcs = [
pycs.gen.lc.rdbimport(rdbfile, 'A', 'mag_A', 'magerr_A', "Trial"),
pycs.gen.lc.rdbimport(rdbfile, 'B', 'mag_B', 'magerr_B', "Trial"),
pycs.gen.lc.rdbimport(rdbfile, 'C', 'mag_C', 'magerr_C', "Trial"),
pycs.gen.lc.rdbimport(rdbfile, 'D', 'mag_D', 'magerr_D', "Trial")
]
We'll want to plot each light curve with a different color:
In [4]:
pycs.gen.mrg.colourise(lcs)
If we shift the data by the "true" time shifts, the light curves will line up nicely for display purposes. We will infer the time delays from the data later in this notebook.
In [5]:
lcs[1].shifttime(-5.0)
lcs[2].shifttime(-20.0)
lcs[3].shifttime(-70.0)
Now to display our plot!
In [6]:
pycs.gen.lc.display(lcs, figsize=(20, 7), jdrange=(53900, 55500))
To save this figure to a file:
In [7]:
pycs.gen.lc.display(lcs, filename="fig_trialcurves_true-shifted.pdf")
It's useful to store the lightcurve objects in a pickle for easy re-starting - here's the function for doing that:
In [8]:
pycs.gen.util.writepickle(lcs, "trialcurves_true-shifted.pkl")
In further scripts, you can now import the data by reading this file.
In [9]:
lcs = pycs.gen.util.readpickle("trialcurves_true-shifted.pkl")
We will now undo these shifts, and from now on we will "forget" about the true delays.
In [10]:
for l in lcs:
l.resetshifts()
You can do a variety of things with this file to find out more information. For example, the longinfo()
method will provide you with the number of points, gap length, shifts, median, mean, maximum, minimum, and the colour it is plotted in.
In [11]:
for l in lcs: print(l.longinfo())
Another thing we can do is export the data into a text file called "out_trialcurves.txt". In this case, since we have not altered the original data, this file will contain the same information as trialcurves.txt.
In [12]:
pycs.gen.util.multilcsexport(lcs, "out_trialcurves.txt", separator="\t", verbose=True, properties=None)
Let's now fit a spline model to our lightcurve data. At first we'll ignore microlensing and just set up a single spline curve to capture the intrinsic AGN variability. Then we'll add in additional curves to model the microlensing, and refine the fit.
PyCS
contains library functions for optimizing such a model, but the optimization has to be carried out according to a user-defined schedule. Shown here is a simple attempt to get a multi-purpose schedule for optimizing a free-knot spline model: we optimize the spline three times, twice roughly with two different knotsteps, and then once to fine-tune the parameters.
In [13]:
def spl(lcs):
spline = pycs.spl.topopt.opt_rough(lcs, nit=5, knotstep=50)
for l in lcs:
l.resetml()
spline = pycs.spl.topopt.opt_rough(lcs, nit=5, knotstep=30)
spline = pycs.spl.topopt.opt_fine(lcs, nit=10, knotstep=20)
return spline
This optimization function (and each subroutine it calls) returns a spline
object.
In [14]:
spline = spl(lcs)
Now let's print out the measured time delays, computed from the current time shifts of each curve. Note that the lcs
objects now contain shifts provided to them by the spline
model.
In [15]:
basic_time_delays = pycs.gen.lc.getnicetimedelays(lcs, separator="\n", sorted=True)
print("Time Delays (no microlensing):")
print(basic_time_delays)
We can now redisplay our light curve plot, but with the spline model overlaid and the inferred shifts applied. Remember, it won't look very good, as the curves do not overlap without a microlensing model.
In [16]:
pycs.gen.lc.display(lcs, [spline], knotsize=0.01, figsize=(20, 7), jdrange=(53900, 55500))
In [17]:
pycs.gen.polyml.addtolc(lcs[1], nparams=2, autoseasonsgap=600.0)
pycs.gen.polyml.addtolc(lcs[2], nparams=3, autoseasonsgap=600.0)
pycs.gen.polyml.addtolc(lcs[3], nparams=3, autoseasonsgap=600.0)
Now, let's try the model optimization again. The result should be much better!
In [18]:
spline = spl(lcs)
In [19]:
pycs.gen.lc.display(lcs, [spline], knotsize=0.01, figsize=(20, 7), jdrange=(53900, 55500))
The new time delays should show the difference once microlensing is factored in. We'll compare our previous time delay output to this output.
In [20]:
polynomial_microlensing_time_delays = pycs.gen.lc.getnicetimedelays(lcs, separator="\n", sorted=True)
print("Time Delays (microlensing included, with polynomials):")
print(polynomial_microlensing_time_delays)
print("cf. Time Delays (no microlensing):")
print(basic_time_delays)
In [21]:
pycs.gen.splml.addtolc(lcs[0], knotstep=150)
pycs.gen.splml.addtolc(lcs[1], knotstep=150)
pycs.gen.splml.addtolc(lcs[2], knotstep=150)
pycs.gen.splml.addtolc(lcs[3], knotstep=150)
Let's re-optimize and see what happens to the time delays:
In [22]:
spline = spl(lcs)
In [23]:
spline_microlensing_time_delays = pycs.gen.lc.getnicetimedelays(lcs, separator="\n", sorted=True)
print("Time Delays (microlensing included, with splines):")
print(spline_microlensing_time_delays)
print("cf. Time Delays (microlensing included, with polynomials):")
print(polynomial_microlensing_time_delays)
We see differences, but the results from the two types of microlensing model are much closer together than either of them are with the time delays from the "no microlensing" model.
What do these spline microlensing models look like?
In [24]:
pycs.gen.lc.display(lcs, [spline], knotsize=0.01, figsize=(20, 7), jdrange=(53900, 55500))
Let's save this to a PDF figure:
In [25]:
pycs.gen.lc.display(lcs, [spline], knotsize=0.01, figsize=(20, 7), jdrange=(53900, 55500),filename="fig_modelfit.pdf")
Now that we have a well-optimized model, lets save it and the shifted light curves to a pickle file that we can use later.
In [26]:
pycs.gen.util.writepickle((lcs, spline), "optspline.pkl")
Error estimation with PyCS
is performed by re-sampling the data, re-optimizing the model, and accumulating statistics about the resulting time delays.
To evaluate the intrinsic variance of the optimizer, we make Ncopies
copies of the dataset, and re-optimize the model for each one. To probe the width of the likelihood function itself, we generate and use Nmocks
synthetic datasets, with slightly different time delays. For the latter step, we need to read in an optimized model and its shifted light curves. In each case we'll need to collect together the many optimized light curves' time delays at the end, and compute some statistics.
First, though, we need to clear out any previous simulations we did, because PyCS
works on all files in a simulation folder, and we don't want to mix up any old files with new ones.
In [27]:
!\rm -rfv sims_copies sims_mocks
!\rm -rfv sims_copies_opt_spl sims_copies_opt_disp sims_copies_opt_regdiff
!\rm -rfv sims_mocks_opt_spl sims_mocks_opt_disp sims_mocks_opt_regdiff
In [28]:
n, npkl = 1, 4
Ncopies = n*npkl
print("Making",Ncopies,"copies of the original dataset:")
pycs.sim.draw.multidraw(lcs, onlycopy=True, n=n, npkl=npkl, simset="copies")
Now the synthetic light curves:
In [29]:
(modellcs, modelspline) = pycs.gen.util.readpickle("optspline.pkl")
We want each synthetic light curve to have slightly different underlying microlensing signals as well as a slightly different set of time delays. This is achieved via a set of "tweak" functions, that implement some small scale extrinsic variability when we generate the synthetic curves. Note that the control parameters beta
, sigma
, and fmin
have to be asserted, these values can be changed.
In [30]:
def Atweakml(lcs):
return pycs.sim.twk.tweakml(lcs, beta=-1.5, sigma=0.25, fmin=1/500.0, fmax=None, psplot=False)
def Btweakml(lcs):
return pycs.sim.twk.tweakml(lcs, beta=-1.0, sigma=0.9, fmin=1/500.0, fmax=None, psplot=False)
def Ctweakml(lcs):
return pycs.sim.twk.tweakml(lcs, beta=-1.0, sigma=1.5, fmin=1/500.0, fmax=None, psplot=False)
def Dtweakml(lcs):
return pycs.sim.twk.tweakml(lcs, beta=-0.0, sigma=4.5, fmin=1/500.0, fmax=None, psplot=False)
At this place, only if you know what you are doing, you can manually adjust the microlensing or the delays. If not, run this script. This will run n
simulations for each element in the pickle file.
In [31]:
n, npkl = 1, 4
Nmocks = n*npkl
truetsr = 8.0
print("Making",Nmocks,"synthetic datasets, varying time delays by +/-",truetsr/2.0,"days")
pycs.sim.draw.saveresiduals(modellcs, modelspline)
pycs.sim.draw.multidraw(modellcs, modelspline, n=n, npkl=npkl, simset="mocks",
truetsr=truetsr, tweakml=[Atweakml, Btweakml, Ctweakml, Dtweakml])
Note that we chose the name of our set of synthetic light curves "1Kset1" via the simset
kwarg. "truetsr=8.0" means that the synthetic curves will get random true time shifts in a range of about 8.0 days around the time shifts of the model lcs
objects.
In [32]:
pycs.sim.run.multirun("copies", lcs, spl, optset="spl", tsrand=10.0, keepopt=True)
Now, we will run the free-knot spline technique on each of the Nmocks
synthetic light curve datasets we made earlier - again, this will take a while. Note the keepopt=True
kwarg: this will make it easier to read the residuals from the synthetic curves with the residuals from the observed data. You can change the name "optset" to anything, though it should best reflect the name of the full method, including the settings of the microlensing.
In [33]:
tsrand = 1.0
pycs.sim.run.multirun("mocks", lcs, spl, optset="spl", tsrand=tsrand, keepopt=True)
Note: The two lines of code above do not (always) run, possibly due to a bug that should be reported to PyCS. Here is the resulting error message.
RuntimeError: Knot spacing min = 7.283257, epsilon = 10.000000
First, let's get the results from the copies of the observed light curves.
In [34]:
dataresults = [
pycs.sim.run.collect("sims_copies_opt_spl", "blue", "Free-knot spline technique")
]
Now, we can turn this into a simple histogram that will give the instrinic variance. It will be saved to a file called "fig_instrinsicvariance.pdf", for readability. The option dataout=True
will save the delay point estimate, to be used below.
In [37]:
pycs.sim.plot.hists(dataresults, r=5.0, nbins=100, showqs=False,
filename="fig_intrinsicvariance.pdf", dataout=True)
We read the results obtained on the synthetic curves.
In [38]:
simresults = [
pycs.sim.run.collect("sims_mocks_opt_spl", "blue", "Free-knot spline technique")
]
Now we can perform the error analysis. This will be saved to a file called "fig_measvstrue.pdf". The option dataout=True
will save the random and systematic error, to be used below.
In [39]:
pycs.sim.plot.measvstrue(simresults, errorrange=3.5, r=5.0, nbins = 1, binclip=True, binclipr=20.0,
plotpoints=False, filename="fig_measvstrue.pdf", dataout=True)
With the same data we can also show the relationship between measurements. This will be written to a file called "fig_covplot.pdf".
In [40]:
pycs.sim.plot.covplot(simresults, filename="fig_covplot.pdf")
Finally we group the information saved by these steps to get the results in form of a summary plot. Let's define our variables.
In [43]:
dataresults = [(pycs.gen.util.readpickle("sims_copies_opt_spl_delays.pkl"),
pycs.gen.util.readpickle("sims_mocks_opt_spl_errorbars.pkl"))]
Now we can display our plot! It will be saved to a file called "fig_delays.pdf".
In [44]:
pycs.sim.plot.newdelayplot(dataresults, rplot=6.0, displaytext=True,
filename = "fig_delays.pdf", refshifts=[{"colour":"gray", "shifts":(0, -5, -20, -70)}])
PyCS
supports a number of different time delay estimation methods, as well as the free-knot spline curve-shifting model. Let's try a couple more of them now, and compare with the spline model (which we already optimized and explored). We'll start the other two the same way, on the original data and with a reasonable set of guessed initial shifts, for a fair comparison.
In [45]:
import pycs.regdiff
for l in lcs:
l.resetshifts()
lcs[1].shifttime(-7.0)
lcs[2].shifttime(-22.0)
lcs[3].shifttime(-65.0)
print(pycs.gen.lc.getnicetimedelays(lcs, separator="\n", sorted=True))
def regdiff(lcs):
return pycs.regdiff.multiopt.opt_ts(lcs, pd=5, scale=200.0, verbose=True)
def disp(lcs):
return pycs.disp.topopt.opt_full(lcs, rawdispersionmethod, nit=5, verbose=True)
In [46]:
rawdispersionmethod = lambda lc1, lc2 : pycs.disp.disps.linintnp(lc1, lc2, interpdist = 30.0)
dispersionmethod = lambda lc1, lc2 : pycs.disp.disps.symmetrize(lc1, lc2, rawdispersionmethod)
Next, let's factor in our microlensing models (polynomials).
In [47]:
pycs.gen.polyml.addtolc(lcs[0], nparams=2, autoseasonsgap = 60.0)
pycs.gen.polyml.addtolc(lcs[1], nparams=2, autoseasonsgap = 60.0)
pycs.gen.polyml.addtolc(lcs[2], nparams=2, autoseasonsgap = 60.0)
pycs.gen.polyml.addtolc(lcs[3], nparams=2, autoseasonsgap = 60.0)
Now we will run using the dispersion technique on our plain copies data, created earlier.
In [50]:
pycs.sim.run.multirun("copies", lcs, disp, optset="disp", tsrand=10.0)
And now we will run the dispersion technique on our synthetic light curve data.
In [51]:
pycs.sim.run.multirun("mocks", lcs, disp, optset="disp", tsrand=10.0)
In [52]:
pycs.sim.run.multirun("copies", lcs, regdiff, optset="regdiff", tsrand=10.0)
And then we run the regression difference technique on our synthetic light curve data.
In [53]:
pycs.sim.run.multirun("mocks", lcs, regdiff, optset="regdiff", tsrand=10.0)
In [54]:
dataresults = [
pycs.sim.run.collect("sims_copies_opt_spl", "blue", "Free-knot spline technique"),
pycs.sim.run.collect("sims_copies_opt_disp", "red", "Dispersion-like technique"),
pycs.sim.run.collect("sims_copies_opt_regdiff", "green", "Regression difference technique")
]
In [55]:
pycs.sim.plot.hists(dataresults, r=5.0, nbins=100, showqs=False,
filename="fig_intrinsicvariance.pdf", dataout=True)
In [56]:
simresults = [
pycs.sim.run.collect("sims_mocks_opt_spl", "blue", "Free-knot spline technique"),
pycs.sim.run.collect("sims_mocks_opt_disp", "red", "Dispersion-like technique"),
pycs.sim.run.collect("sims_mocks_opt_regdiff", "green", "Regression difference technique")
]
In [57]:
pycs.sim.plot.measvstrue(simresults, errorrange=3.5, r=5.0, nbins = 1, binclip=True, binclipr=20.0,
plotpoints=False, filename="fig_measvstrue.pdf", dataout=True)
In [58]:
pycs.sim.plot.covplot(simresults, filename="fig_covplot.pdf")
In [59]:
disp = (pycs.gen.util.readpickle("sims_copies_opt_disp_delays.pkl"),
pycs.gen.util.readpickle("sims_mocks_opt_disp_errorbars.pkl"))
regdiff = (pycs.gen.util.readpickle("sims_copies_opt_regdiff_delays.pkl"),
pycs.gen.util.readpickle("sims_mocks_opt_regdiff_errorbars.pkl"))
spl = (pycs.gen.util.readpickle("sims_copies_opt_spl_delays.pkl"),
pycs.gen.util.readpickle("sims_mocks_opt_spl_errorbars.pkl"))
Our final plot! Again, over-writing ("fig_delays.pdf"), but summarizing the performance of the three methods.
In [60]:
pycs.sim.plot.newdelayplot([disp, regdiff, spl], rplot=6.0, displaytext=True,
filename = "fig_delays.pdf", refshifts=[{"colour":"gray", "shifts":(0, -5, -20, -70)}])