In [40]:
from __future__ import print_function, division
import thinkbayes2
import thinkplot
%matplotlib inline

Ignore the first few cells for now -- they are experiments I am working on related to the prior.


In [41]:
mu = 1
pmf = thinkbayes2.MakeExponentialPmf(mu, high=1.0)
thinkplot.Pdf(pmf)


Warning: Brewer ran out of colors.
Ignore

In [42]:
mu = 5
pmf = thinkbayes2.MakeExponentialPmf(mu, high=1.0)
thinkplot.Pdf(pmf)


Ignore

In [43]:
metapmf = thinkbayes2.Pmf()
for lam, prob in pmf.Items():
    if lam == 0:
        continue
    # use a new name so we don't clobber the Pmf we're iterating over
    expo_pmf = thinkbayes2.MakeExponentialPmf(lam, high=30)
    metapmf[expo_pmf] = prob
    
interarrival = thinkbayes2.MakeMixture(metapmf)
thinkplot.Pdf(interarrival)


Ok, let's start here. Suppose we know $\lambda$. We can compute the distribution of interarrival times (times between logins).


In [44]:
lam = 0.1    # average arrival rate in logins per day
interarrival = thinkbayes2.MakeExponentialPmf(lam, high=90)
thinkplot.Pdf(interarrival)


If we observe someone at a random time, we are more likely to land during a longer interval, so the observed interval distribution is length-biased: each interval's probability gets weighted by its duration.


In [45]:
observed = interarrival.Copy()
for val, prob in observed.Items():
    observed[val] *= val
observed.Normalize()

print(interarrival.Mean(), observed.Mean())
thinkplot.Pdf(observed)


9.76490343621 19.9050636319
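The size-bias effect can be checked without thinkbayes2, using a plain NumPy grid in place of a Pmf (this is a sketch, not the library's API): for an exponential with rate 0.1, the unweighted mean is about 10 days, and weighting each interval by its length roughly doubles it.

```python
import numpy as np

lam = 0.1
xs = np.linspace(0, 90, 901)
ps = lam * np.exp(-lam * xs)
ps /= ps.sum()                      # discrete approximation of Exp(0.1) on [0, 90]
mean_interarrival = (xs * ps).sum()

biased = ps * xs                    # weight each interval by its duration
biased /= biased.sum()
mean_observed = (xs * biased).sum()

print(mean_interarrival, mean_observed)   # about 10 and about 20
```

The factor of two is what we see in the thinkbayes2 output above (9.76 vs. 19.9, with small differences due to grid resolution and truncation at 90).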

If we land during an interval of duration $x$, the time since last login is uniformly distributed between 0 and $x$. So the distribution of time since last login (timesince) is a mixture of uniform distributions.


In [46]:
metapmf = thinkbayes2.Pmf()
for time, prob in observed.Items():
    if time == 0:
        continue
    pmf = thinkbayes2.MakeUniformPmf(0, time, 101)
    metapmf[pmf] = prob
    
timesince = thinkbayes2.MakeMixture(metapmf)
print(timesince.Mean())
thinkplot.Pdf(timesince)


9.95253181595
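We can sanity-check this result without building the mixture explicitly: a uniform distribution on $[0, x]$ has mean $x/2$, so the mean of timesince is half the mean of the observed (length-biased) interval distribution. A sketch on the same NumPy grid as before (not the thinkbayes2 API):

```python
import numpy as np

lam = 0.1
xs = np.linspace(0, 90, 901)[1:]    # skip zero-length intervals
ps = lam * np.exp(-lam * xs)
biased = ps * xs                    # length-biased interval distribution
biased /= biased.sum()

# each interval of duration x contributes a Uniform(0, x) with mean x/2
mean_timesince = (biased * xs / 2).sum()
print(mean_timesince)               # about half the observed mean, ~10
```

This matches the 9.95 printed above: half of the observed mean of 19.9.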

The data come in the form of "time since last login", so we need to look up a time, $t$, and get the probability density at $t$. But we have a PMF with many discrete times in it, so we can't just look $t$ up. One option: compute the CDF, draw a sample, and estimate the PDF by KDE:


In [47]:
cdf = thinkbayes2.Cdf(timesince)
thinkplot.Cdf(cdf)


Out[47]:
{'xscale': 'linear', 'yscale': 'linear'}

Get a sample:


In [48]:
sample = cdf.Sample(10000)

Estimate the PDF:


In [49]:
pdf = thinkbayes2.EstimatedPdf(sample)
thinkplot.Pdf(pdf)


Second option: use numerical differentiation to compute the derivative of the CDF, which is the PDF:


In [50]:
import numpy
import scipy.misc

xs = numpy.linspace(0, 90, 101)
ys = [scipy.misc.derivative(cdf.Prob, x) for x in xs]

Numerical differentiation is more accurate, especially near zero. The value at zero is wrong: there are ways we could fix it, but it's not necessary because we won't get zero as data.
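To see why numerical differentiation works well here, we can test it on a case with a known answer. This sketch (using `numpy.gradient` rather than `scipy.misc.derivative`, just to stay self-contained) differentiates the analytic exponential CDF on a grid and compares against the analytic PDF:

```python
import numpy as np

lam = 0.1
xs = np.linspace(0, 90, 901)
cdf = 1 - np.exp(-lam * xs)         # analytic CDF of Exp(0.1)
pdf_est = np.gradient(cdf, xs)      # numerical derivative of the CDF
pdf_true = lam * np.exp(-lam * xs)

max_err = np.abs(pdf_est - pdf_true).max()
print(max_err)                      # small everywhere on this grid
```

`numpy.gradient` uses central differences in the interior and one-sided differences at the endpoints, which is why the edges (like the value at zero) are the least accurate points.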


In [51]:
thinkplot.plot(xs, ys)


Warning: Brewer ran out of colors.
