Suppose ${( t_i, x_i)}_{i=1}^N$ is a sample of some continuous process ${( X_t )}_{t\in T}$, where $x_i = X(t_i)$.
The crossing times are defined as follows: for a given base grid spacing $\delta>0$ and for $n,k\geq0$ they are the stopping times $$T^n_{k+1} = \inf\Big\{ t > T^n_k\,: \, \big | X_t - X_{T^n_k} \big | \geq \delta 2^n \Big\}$$ with $T^n_0 = 0$. Forcing the zeroth crossing time to $0$ aligns the grid with the process, so in effect, without loss of generality, we may consider processes that start at the origin.
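For a sampled path the stopping times above can only be located at the sample knots. Below is a minimal, literal sketch of the definition via a hypothetical `crossing_times` helper (it is not the library's `xtree_build`, which additionally handles grid alignment and the full tree):

```python
import numpy as np

def crossing_times(t, x, delta, n=0):
    # Literal reading of the definition for a sampled path: a level-n
    # crossing is registered at the first sample knot where the path has
    # moved at least delta * 2**n away from its value at the previous
    # crossing time (with T^n_0 = t[0]).
    width = delta * 2.0**n
    times, level = [t[0]], x[0]
    for ti, xi in zip(t[1:], x[1:]):
        if abs(xi - level) >= width:
            times.append(ti)
            level = xi
    return np.array(times)

# A linear ramp of slope 1/2 crosses a unit grid every two time steps.
t = np.arange(9, dtype=float)
assert crossing_times(t, 0.5 * t, delta=1.0).tolist() == [0.0, 2.0, 4.0, 6.0, 8.0]
```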
The parameter $\delta$ is the spacing of the finest grid, with respect to which the leaves of the crossing tree are computed. Parent nodes of these leaves represent crossings of a coarser grid, namely $2\delta$. The choice of $\delta$ affects the tree in the following way in the case of a sampled process: THIS SECTION MUST BE DISCUSSED
By construction, for a binary crossing grid, $N^l_k$, the number of subcrossings in any complete crossing between $T^l_k$ and $T^l_{k+1}$, is always an even number. This is because each crossing is registered as soon as two unidirectional subcrossings are encountered, as the following argument shows.
Suppose $T^{n+1}_k$ and $T^n_m$ are aligned in that $T^{n+1}_k=T^n_m$, and $T^{n+1}_{k+1}<+\infty$, i.e. the level-$(n+1)$ crossing is complete. First of all, $T^n_{m+1},T^n_{m+2}\leq T^{n+1}_{k+1}$, since otherwise the process would have left the $\pm\delta 2^{n+1}$ band before leaving the $\pm \delta 2^n$ band, even though the former is twice as wide and contains the latter.
The process is continuous, which means that almost surely for any $\epsilon>0$ there is $\eta>0$ such that for any $t$ with $\big|t-T^n_{m+1}\big| < \eta$ it holds that $\big |X_{T^n_{m+1}}-X_t\big|<\epsilon$. Also, for every $t\in\big[T^n_m, T^n_{m+1}\big)$ necessarily $\big | X_t - X_{T^n_m} \big | < \delta 2^n$. In particular, for $\epsilon = \frac{1}{2}\delta 2^n$ this means that for a short while after $T^n_{m+1}$ the process is almost surely still within the $\pm \delta 2^{n+1}$ band. Therefore $T^n_{m+1} < T^{n+1}_{k+1}$.
Thus $$1\leq \bigg | \frac{1}{\delta 2^n} \big(X_{T^n_{m+1}} - X_{T^n_m}\big)\bigg | < 2.$$ The next crossing of the $\pm\delta 2^n$ grid takes place at $T^n_{m+2}$, and there are two possibilities:
Since the crossing is complete, $T^{n+1}_{k+1}<+\infty$ implies that sooner or later a crossing of the $\pm\delta 2^{n+1}$ grid occurs, in which case $T^{n+1}_{k+1}=T^n_{m+2p}$ for some $p\geq1$, meaning that there were exactly $2p$ subcrossings.
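The evenness of the offspring counts can be checked directly on a simulated $\pm1$ random walk, for which every unit step is a level-0 crossing (a self-contained sketch, independent of the library code):

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=10000)
x = np.concatenate([[0], np.cumsum(steps)])

# Level-1 crossing times of the walk: a crossing is registered once the
# walk has moved 2 away from its value at the previous level-1 crossing.
times, last = [0], 0
for i, xi in enumerate(x[1:], 1):
    if abs(xi - last) >= 2:
        times.append(i)
        last = xi

# Each complete level-1 crossing covers a net displacement of +-2 made of
# unit steps, so the number of level-0 subcrossings inside it is even.
counts = np.diff(times)
assert counts.size > 0 and np.all(counts % 2 == 0)
```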
Let's test the hypothesis that the number of subcrossings follows a distribution similar to the geometric. Recall that $G$ is a geometrically distributed random variable, $G\sim \text{Geom}(\theta)$, if $\mathbb{P}(G=k) = {(1-\theta)}^{k-1}\theta$ for all $k\geq1$. Other properties of geometrically distributed random variables include memorylessness, $\mathbb{P}(G>m+k\,|\,G>m) = \mathbb{P}(G>k)$, the mean $\mathbb{E}G = \frac{1}{\theta}$, and the tail probability $\mathbb{P}(G\geq k) = {(1-\theta)}^{k-1}$.
The number of offspring of any $\delta 2^n$-grid crossing is the number of subcrossings of a finer grid with spacing $\delta 2^{n-1}$ during a typical crossing.
Our hypothesis is that $N$, the number of offspring, follows a geometric distribution on the even numbers, i.e. $\frac{1}{2}N\sim \text{Geom}(\theta)$. To test it we use a truncated geometric distribution with a fixed threshold $\bar{k}$, because this approach naturally and easily handles unbounded random variables. Truncating the data effectively means that two kinds of observations are registered:
Therefore, for even integers $k=2,4,\ldots,\bar{k}$ the distribution of the number of offspring is given by $$\mathbb{P}(N=k) = {(1-\theta)}^{\frac{k}{2}-1}\theta\,1_{k<\bar{k}} + {(1-\theta)}^{\frac{\bar{k}}{2}-1}\,1_{k = \bar{k}}$$ where $1_{(\cdot)}$ is the $0$-$1$ indicator.
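A quick simulation check of the displayed truncated pmf: draw $\frac{1}{2}N\sim\text{Geom}(\theta)$ and collapse everything at or above $\bar{k}$ into the tail (the particular values of $\theta$ and $\bar{k}$ here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
theta, k_bar = 0.4, 8

# N is geometric on the even numbers, then truncated at k_bar.
n = 2 * rng.geometric(theta, size=100000)
n = np.minimum(n, k_bar)

# Theoretical pmf from the displayed formula.
def pmf(k):
    if k < k_bar:
        return (1 - theta)**(k // 2 - 1) * theta
    return (1 - theta)**(k_bar // 2 - 1)

ks, freq = np.unique(n, return_counts=True)
emp = freq / n.size
assert np.allclose(emp, [pmf(k) for k in ks], atol=1e-2)
```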
Suppose ${(g_i)}_{i=1}^N$ is a sample of independent geometric random variables with distribution truncated by $\bar{k}$. Due to truncation there is a finite number of distinct values observed in any given sample. Therefore it is convenient to represent every sample in an equivalent value-frequency form: ${\big(j, f_j\big)}$ for even $j$ not greater than $\bar{k}$, with $f_j = \big\lvert\{ i \,:\, g_i=j \}\big\rvert$, the number of observations with the specified value. Without loss of generality, $f_j$ is set to zero for those $j$ that were not observed.
The log-likelihood function is given by $$\ln\mathcal{L} = \sum_{j\neq \bar{k}} f_j {\Big(\frac{j}{2}-1\Big)} \ln {(1-\theta)} + \sum_{j\neq \bar{k}} f_j \ln \theta + f_{\bar{k}} {\Big(\frac{\bar{k}}{2} - 1\Big)} \ln {(1-\theta)}$$ where the summation runs over all possible distinct values. The first-order condition on the optimal $\theta$ is $$\frac{d}{d \theta} \ln\mathcal{L}\,:\quad-\sum_j f_j {\Big(\frac{j}{2}-1\Big)} \frac{1}{1-\theta} + \sum_{j\neq \bar{k}} f_j \frac{1}{\theta} = 0$$ This is equivalent to $\frac{S-N}{1-\theta} = \frac{N-f_{\bar{k}}}{\theta}$, where $S = \sum_j \frac{j}{2} f_j$ and $N = \sum_j f_j$, because the frequencies sum to the total number of observations. Therefore the desired ML estimator of the probability parameter $\theta$ is $$\hat{\theta} = \frac{N-f_{\bar{k}}}{S-f_{\bar{k}}}$$
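The estimator $\hat{\theta} = (N-f_{\bar{k}})/(S-f_{\bar{k}})$ can be computed directly from the value-frequency representation. A sketch with a hypothetical helper `mle_theta_vf` (unlike the `mle_theta` routine further below, it takes explicit values rather than a dense count vector):

```python
import numpy as np

def mle_theta_vf(values, freqs, k_bar):
    # ML estimate of theta from value-frequency data (j, f_j) of the
    # truncated even-geometric distribution:
    #     theta_hat = (N - f_kbar) / (S - f_kbar),
    # with N = sum f_j and S = sum (j/2) f_j.
    values = np.asarray(values)
    freqs = np.asarray(freqs, dtype=float)
    N = freqs.sum()
    S = np.dot(values / 2.0, freqs)
    f_tail = freqs[values == k_bar].sum()
    return (N - f_tail) / (S - f_tail)

# Sanity check on a synthetic truncated sample with theta = 0.5, k_bar = 10.
rng = np.random.default_rng(1)
g = np.minimum(2 * rng.geometric(0.5, size=200000), 10)
vals, fr = np.unique(g, return_counts=True)
assert abs(mle_theta_vf(vals, fr, 10) - 0.5) < 0.01
```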
I strongly suspect that the MLE of the truncated geometric distribution is biased. If so, this complicates its use for Monte Carlo estimation purposes. Clearly the bias should depend on the truncation threshold, and $\hat{\theta}_{\bar{k}} \to \hat{\theta}$ as $\bar{k}\to \infty$.
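This suspicion can be probed with a small simulation: repeatedly draw truncated samples of moderate size and compare the average estimate with the true $\theta$ (the sample size and replication count here are arbitrary):

```python
import numpy as np

# A small Monte Carlo sketch of the finite-sample bias of the
# truncated-geometric MLE theta_hat = (N - f_tail) / (S - f_tail).
rng = np.random.default_rng(7)
theta, k_bar, n, reps = 0.5, 10, 50, 2000

est = np.empty(reps)
for r in range(reps):
    g = np.minimum(2 * rng.geometric(theta, size=n), k_bar)
    vals, fr = np.unique(g, return_counts=True)
    N, S = fr.sum(), np.dot(vals / 2.0, fr)
    f_tail = fr[vals == k_bar].sum()
    est[r] = (N - f_tail) / (S - f_tail)

# Any systematic deviation of the average estimate from theta is bias;
# it should shrink as the per-sample size n grows.
bias = est.mean() - theta
assert abs(bias) < 0.05
```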
The procedure below performs the specified number $M$ of Monte Carlo simulations of sample paths of the process specified by the generator object. The parameters $K$ and $T$ determine the truncation performed while collecting the crossing tree data.
For example, $K=4$ means that subcrossings of size up to and including 4 are recorded in full detail, while every subcrossing of larger size is aggregated and stored in the ``tail''.
Similarly for the parameter $T$, which sets the threshold level of the crossing tree, counted from the leaf level up, beyond which the data is stored in an aggregated manner.
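The count truncation can be sketched in isolation: counts above the (rounded-up) threshold $K$ collapse into the last bin of the offspring-distribution matrix, mirroring the `np.minimum( c, K )` line in `monte_carlo_kernel` below (the counts and frequencies here are made up):

```python
import numpy as np

K = 2 * (4 // 2 + 1)            # the kernel rounds K = 4 up to the next even value, 6
c = np.array([2, 4, 6, 8, 12])  # observed subcrossing counts
f = np.array([5, 3, 2, 1, 1])   # their frequencies

c_trunc = np.minimum(c, K)      # counts beyond K collapse into the tail
distr = np.zeros(K // 2, dtype=int)
for ci, fi in zip(c_trunc, f):
    distr[ci // 2 - 1] += fi    # bin index: count 2 -> 0, 4 -> 1, tail -> last

assert distr.tolist() == [5, 3, 4]
```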
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from synthfbmcircul import synth_fbm
from hsssi_processes import *
from Weierstrass import synth_Weier, synth_Weier_2
from crossing_tree import *
from monte_carlo import *
In [ ]:
genr = synth_fbm( 2**21, .65 )
%lprun -f synth_fgn.__call__ T, X = genr( )
In [ ]:
mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(np.floor(np.log2(np.max(X)-np.min(X)))-8) )
# mht, mhp, mhx, mex = xtree_build( T, X, delta = None )
In [ ]:
#(/\, \/, ±1)
# inx = ( mex[8][:,2] + 1 ) // 2
# print mex[8]#[:,inx]
# wed = np.where(mex[8][:,2]>0, mex[8][:,0], mex[8][:,1])
# vee = np.where(mex[8][:,2]<0, mex[8][:,0], mex[8][:,1])
In [ ]:
lex= mex[8]
lex[ lex[:,2]<0, :2 ]
In [ ]:
hatvee_cond = np.zeros( (len( mex ), 2, 2 ), dtype = np.int )
for i, lex in enumerate( mex, 0 ) :
    if lex.shape[0] < 1 : continue
    hatvee_cond[i,:,0] = lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
    hatvee_cond[i,:,1] = lex[ lex[:,2]<0, :2 ].sum( axis = 0 )
In [ ]:
print hatvee_cond.sum(axis = 2)
print [len(t) for t in mhx]
print [len(t) for t in mex]
lex = mex[4]
print lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
print lex[ lex[:,2]<0, :2 ].sum( axis = 0 )
print hatvee_cond[4,:]
In [ ]:
hatvee_cond[1,0,:]
In [ ]:
## Import the complementary (survival) chi-squared distribution function
from scipy.stats import chisqprob as cdchisq
## Choose the levels to test the hypothesis on
levels = range( -6, -2+1, 1 )
## The total number of crossings of \delta 2^n grid
Nn = np.array( [ len( mht[ l ] ) for l in levels ] ).reshape( (len(levels), -1 ) )
## The number of subcrossings of size \delta 2^{n-1} that
## make up the k-th crossing of n-th level.
Zn = [ mhx[ l ] for l in levels ]
## Compute the empirical distribution of the number of subcrossings in the pooled sample
bins, f_pool = np.unique( np.concatenate( tuple( Zn ) ), return_counts = True )
p_pool = f_pool / np.sum( f_pool, dtype = np.float64 )
In [ ]:
p_lvl = np.zeros( ( len( Nn ), len( bins ) ), dtype = np.float )
for l, z in enumerate( Zn, 0 ) :
    cat, freq = np.unique( z, return_counts = True )
    p_lvl[ l, np.searchsorted( bins, cat ) ] = freq / np.sum( freq, dtype = np.float64 )
In [ ]:
stat = np.sum( Nn * ( ( p_lvl - p_pool )**2 / p_pool ) )
pv = cdchisq( stat, ( len( levels ) - 1 ) * ( len( bins ) - 1 ) )
print stat, pv
In [ ]:
## Given the aggregated value-count data in x, compute the MLE of \theta of the geometric distribution.
## I suspect the MLE is biased... which would make averaging across simulations unreliable.
def mle_theta( x ) :
    N = np.sum( x )
    S = np.dot( x, np.arange( 1, len( x ) + 1, dtype = np.float ) )
    ## Note: the +1 terms deviate from the plain estimator (N - f_tail) / (S - f_tail)
    ## derived above, apparently to guard against degenerate samples.
    return ( N+1 - x[ -1 ] ) / ( S+1 - x[ -1 ] )
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from synthfbmcircul import synth_fbm
from hsssi_processes import *
from Weierstrass import synth_Weier, synth_Weier_2
from crossing_tree import *
from monte_carlo import *
In [ ]:
## The Monte Carlo kernel performs a single experiment and returns its
## K'xT slice of the result.
def monte_carlo_kernel( generator, K, T, delta = None ) :
    from scipy.stats import chisqprob as cdchisq
    ## K is the maximal number of subcrossings beyond which the data
    ## on subcrossings is aggregated into the tail (truncation parameter).
    K = 2 * ( K // 2 + 1 )
    ## T is the number of detailed levels of the crossing tree to store in
    ## the output. Any crossings of grids coarser than the threshold are
    ## aggregated.
    distr = np.zeros( ( T+1, K // 2 ), dtype = np.int )
    ## Generate a replication of the process
    t, x = generator( )
    ## G. Decrouez 2015-02-12: the selection of the spacing of the finest grid
    ## based on the scale of the process is crucial as it allows comparison
    ## of the crossing tree between different sample paths.
    if delta is None :
        # sigma = np.std( np.diff( x ) )
        ## Set the base scale so that the constructed tree is 6 layers deep.
        # delta = 2**( np.floor( np.log2( np.max( x ) - np.min( x ) ) ) - 6 )
        delta = None
        ## 2015-04-08 : I think a 6-level deep crossing tree is too poor for any
        ## analysis, which is why it is necessary to take the variance into account.
    ## Get the hitting times and points
    ht, hp, subx, excur = xtree_build_superfast( t, x, delta )
    ## Pool all the crossing counts together and construct the distribution matrix
    ## ToDo
    ## Chi-squared test for distributional match: choose the levels to test the hypothesis on
    levels = range( -5, -3+1, 1 )
    ## The total number of crossings of the \delta 2^n grid
    Nn = np.array( [ len( ht[ l ] ) for l in levels ] ).reshape( ( len( levels ), -1 ) )
    ## The number of subcrossings of size \delta 2^{n-1} that
    ## make up the k-th crossing of the n-th level.
    Zn = [ subx[ l ] for l in levels ]
    ## Compute the empirical distribution of the number of subcrossings in the pooled sample
    bins, f_pool = np.unique( np.concatenate( tuple( Zn ) ), return_counts = True )
    p_pool = f_pool / np.sum( f_pool, dtype = np.float64 )
    ## Calculate the empirical probabilities at each level
    p_lvl = np.zeros( ( len( Nn ), len( bins ) ), dtype = np.float )
    for l, z in enumerate( Zn, 0 ) :
        cat, freq = np.unique( z, return_counts = True )
        p_lvl[ l, np.searchsorted( bins, cat ) ] = freq / np.sum( freq, dtype = np.float64 )
    ## Run the chi-squared test across the levels of the tree
    stat = np.sum( Nn * ( ( p_lvl - p_pool )**2 / p_pool ) )
    dof = len( levels ) * ( len( bins ) - 1 )
    ## Now summarize the distribution: Simplify
    ## ToDo
    for level, xing in enumerate( subx[1:], 0 ) :
        ## Count the absolute frequencies
        c, f = np.unique( xing, return_counts = True )
        ## We collect the data on the distribution truncated by K (inclusive).
        ## Thus truncate the number of subcrossings by K -- everything less is
        ## detailed, otherwise -- aggregated into the tail.
        c = np.minimum( c, K )
        ## Similarly truncate the level by T + 1
        if level >= T : level = T
        ## Fill the offspring distribution matrix
        for ci, fi in zip( c, f ) :
            distr[ level, ci // 2 - 1 ] += fi
    ## Use a straightforward bivariate regression implementation.
    ## We could also use ML estimators, which is much better!
    ## See the handwritten notes.
    ## Estimate the conditional distribution of excursions
    ## lvl x (++ --) x (+- -+) # (/\, \/, +/-1) == (u,d,s)
    hatvee = np.zeros( ( len( excur ), 2, 2 ), dtype = np.int )
    for i, lex in enumerate( excur, 0 ) :
        if lex.shape[0] < 1 : continue
        hatvee[i,0,:] = lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
        hatvee[i,1,:] = lex[ lex[:,2]<0, :2 ].sum( axis = 0 )
    ## Estimate the probability distribution of the number of offspring
    ## ToDo
    return distr, ( stat, dof, cdchisq( stat, dof ) ), hatvee
Let's have a look at the estimate of the parameter of the offspring distribution. We set up the grid with base spacing equal to $2^{-6}$, as was done in Geoffrey's paper.
The continuous processes are confined to the $[0,1]$ interval and their sample paths are observed at the discrete knots of the uniformly spaced lattice ${\big(t_i\big)}_{i=0}^{N-1}$ with $t_0=0$ and $t_{N-1}=1$. In effect every single Monte Carlo replication deals with a sampled discretized path ${\big(t_k, x_k\big)}_{k=0}^{N-1}$ where $x_k = X(t_k)$.
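In code, the sampling lattice above is simply a uniform grid (a trivial sketch; the sine curve is only a deterministic stand-in for an actual sample path):

```python
import numpy as np

# The uniform sampling lattice on [0, 1]: N knots with t_0 = 0 and
# t_{N-1} = 1, and a discretized path x_k = X(t_k).
N = 2**4 + 1
t = np.linspace(0.0, 1.0, N)
x = np.sin(2.0 * np.pi * t)   # placeholder for a process sample

assert t[0] == 0.0 and t[-1] == 1.0
assert np.allclose(np.diff(t), 1.0 / (N - 1))
```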
In [ ]:
## Set the size of the monte carlo experiment
M = 10
## Determine how many levels of the crossing tree to keep
T = 5
## And set the truncation threshold so that subcrossings of size greater
## than or equal to 34 are put in the tail of the offspring distribution.
K = 32
## Set the time-precision of the sampled paths
N = 2**18
## Set the studied Hurst exponent
H = .95
## The base grid spacing is going to be 2^{-6}
# delta = 2**(-6)
## Setting \delta to None constructs the tree with an adaptive
## base spacing equal to the standard deviation of the increments
In [ ]:
## Study the Fractional Brownian motion.
%lprun -f xtree_build_superfast res_fbm = monte_carlo_serial( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
# res_fbm = monte_carlo_parallel( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
In [ ]:
distr = np.concatenate( [ [d] for d, _, _ in res_fbm[0] ] )
chisq = np.concatenate( [ [c] for _, c, _ in res_fbm[0] ] )
hatvee = np.concatenate( [ p for _, _, p in res_fbm[0] ] )
In [ ]:
distr
In [ ]:
## Import the complementary (survival) chi-squared distribution function
from scipy.stats import chisqprob as cdchisq
pv = np.array( [p for s,d,p in chisq], dtype = np.float )
print np.sum( pv < 0.05, dtype = np.float ) / len( pv )
In [ ]:
chisq
In [ ]:
plt.plot(pv)
In [ ]:
def get_probs( res ) :
    ## Compute the empirical distribution function of the number of offspring
    return np.apply_along_axis( lambda f : f * 1.0 / np.sum( f ), 2, res )
## Compute the probabilities
probs = get_probs( distr )
## In homage to the original code, call the mean probabilities and errors by PrZ and StZ
PrZ = np.mean( probs, axis = 0 )
StZ = np.std( probs, axis = 0 )
## Create an appropriate array of offspring values
offs = np.arange( 2, 2 * probs.shape[ 2 ] + 1, 2, np.float )
## Create a new plot
plt.figure( 1, figsize = ( 8, 6 ) )
theta = 2**(1.0-1.0/H)
offs_prb = (1.0-theta)**(offs//2-1)*theta
plt.plot( offs, np.log( offs_prb ), "k-", linewidth = 2, markersize = 15 )
## Plot the logarithm of the probability of observing a given number of subcrossings
for i, c in zip( range( 0, probs.shape[ 2 ] ), ["r", "g", "b", "y"] ) :
    plt.plot( offs, np.log( PrZ[i,:] ), "o" + c )
    # plt.errorbar( offs, PrZ[i,:], yerr = StZ[i,:], color = c, linewidth = 0 )
plt.show( )
In [ ]:
# P(-+|u) = P(+-|d) = 1/sqrt(\mu)
pu = list( ) ; pd = list( )
for _, _, p in res_fbm[0] :
    pp = p[1,:] + p[2,:] + p[3,:]
    tt = pp.sum( axis = 1 )
    print pp
    print tt
In [ ]:
## Examine the properties of the Weierstrass-Mandelbrot process
# %lprun -f monte_carlo_kernel res_wei = monte_carlo_serial( synth_Weier_2( N, 1.2, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
res_wei = monte_carlo_parallel( synth_Weier( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
In [ ]:
## Finally have a look at the Rosenblatt process
# res_ros = monte_carlo_parallel( synth_Rosenblatt( N // (2**6), H, K = 2**6 ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
In [ ]:
## Save the results for later reuse
import time as tm
## Make a file with a timestamped name
outfile = "./MC m%d p%d h%.3f %s.npz" % ( M, np.log2(N), H, tm.strftime("%Y%m%d-%H%M") )
## Save the copy of the monte carlo output
np.savez( outfile, fbm = res_fbm, wei = res_wei )
# np.savez( outfile, fbm = res_fbm, wei = res_wei, ros = res_ros )
In [ ]:
## Set the size of the monte carlo experiment
M = 100
## Determine how many levels of the crossing tree to keep
T = 5
## And set the truncation threshold so that subcrossings of size greater
## than or equal to 34 are put in the tail of the offspring distribution.
K = 32
## Set the time-precision of the sampled paths
N = 2**18
## Set the studied Hurst exponent
H = .85
## The base grid spacing is going to be 2^{-6}
# delta = 2**(-6)
## Setting \delta to None constructs the tree with an adaptive
## base spacing equal to the standard deviation of the increments
In [ ]:
## Study the Fractional Brownian motion.
res_fbm = monte_carlo_parallel( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
In [ ]:
## Examine the properties of the Weierstrass-Mandelbrot process
res_wei = monte_carlo_parallel( synth_Weier( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
In [ ]:
## Save the results for later reuse
import time as tm
## Make a file with a timestamped name
outfile = "./MC m%d p%d h%.3f %s.npz" % ( M, np.log2(N), H, tm.strftime("%Y%m%d-%H%M") )
## Save the copy of the monte carlo output
np.savez( outfile, fbm = res_fbm, wei = res_wei )
# np.savez( outfile, fbm = res_fbm, wei = res_wei, ros = res_ros )
In [ ]:
def get_probs( res ) :
    ## Compute the empirical distribution function of the number of offspring
    return np.apply_along_axis( lambda f : f * 1.0 / np.sum( f ), 2, res )
This procedure plots the empirical probability of observing a given number of offspring at each node of the crossing tree.
In [ ]:
def mc_plot_empirical_probs( res, H, title = None ) :
    ## Compute the probabilities
    probs = get_probs( res )
    ## In homage to the original code, call the mean probabilities and errors by PrZ and StZ
    PrZ = np.mean( probs, axis = 0 )
    StZ = np.std( probs, axis = 0 )
    ## Create an appropriate array of offspring values
    offs = np.arange( 2, 2 * probs.shape[ 2 ] + 1, 2, np.float )
    ## Create a new plot
    plt.figure( 1, figsize = ( 8, 6 ) )
    if title is not None :
        plt.title( title )
    theta = 2**( 1.0 - 1.0 / H )
    offs_prb = ( 1.0 - theta )**( offs // 2 - 1 ) * theta
    plt.plot( offs, np.log( offs_prb ), "k-", linewidth = 2, markersize = 15 )
    ## Plot the logarithm of the probability of observing a given number of subcrossings
    for i, c in zip( range( 0, probs.shape[ 2 ] ), ["r", "g", "b", "y"] ) :
        plt.plot( offs, np.log( PrZ[i,:] ), "o" + c )
        # plt.errorbar( offs, PrZ[i,:], yerr = StZ[i,:], color = c, linewidth = 0 )
    plt.show( )
In [ ]:
mc_plot_empirical_probs( res_fbm, H, "fBM %0.2f" % ( H ) )
In [ ]:
import re
outfiles = [ './MC m100 p18 h0.500 20150226-1252.npz', './MC m100 p18 h0.850 20150226-1301.npz']
for outfile in outfiles :
    opt = dict( ( x.group(1).upper(), float( x.group(2) ) )
                for x in re.finditer( r"\b([a-z])([-+]?(?:\d+(?:\.\d*)?|\.\d+))\b", outfile ) )
    with np.load( outfile ) as f_res :
        res_fbm, res_wei = f_res['fbm'], f_res['wei']
        # res_ros = f_res['ros']
    mc_plot_empirical_probs( res_fbm, opt['H'], "fBM %0.2f" % ( opt['H'] ) )
    mc_plot_empirical_probs( res_wei, opt['H'], "WM %0.2f" % ( opt['H'] ) )
    # mc_plot_empirical_probs( res_ros, "Rosenblat %0.2f" % ( H ) )
In [ ]:
res_fbm
In [ ]:
## The MC results array is MxTxK. Fixing axes 0 and 1, compute
## the ML estimate of the parameter over axis 2 -- the within
## tree-level sample of the offspring distribution.
theta = np.apply_along_axis( mle_theta, 2, res_fbm )
# mle_theta(DD[:,0,0])
# np.save( "./run 2b22 2015-02-14", DD_par )
# XX_par = np.load( "./run 2b22 2015-02-14.npy" )
# np.max(np.abs(DD_par - XX_par))
In [ ]:
plt.plot( theta[:,0], "r-")
plt.plot( theta[:,1], "b-")
plt.plot( theta[:,2], "k-")
## Invert theta = 2^{1-1/H} to get H = 1 / (1 - log2 theta)
1/(1-np.log2(np.mean(theta, axis=0)))
In [ ]:
plt.plot( 1/(1-np.log2(theta[:,0])), "r-")
plt.plot( 1/(1-np.log2(theta[:,1])), "b-")
plt.plot( 1/(1-np.log2(theta[:,2])), "k-")
In [ ]:
def mean_subx( x ) :
    ## The average number of offspring implied by the value-count data
    return np.dot( x, 2 * np.arange( 1, len( x ) + 1, dtype = np.float ) ) / np.sum( x )
moff = np.apply_along_axis( mean_subx, 2, res_fbm )
plt.plot( moff[:,0], "k-")
plt.plot( moff[:,1], "b-")
plt.plot( moff[:,2], "r-")
In [ ]:
DD_par[:,2,:]
In [ ]:
theta2 = np.apply_along_axis(mle_theta, 2, DD_par2)
plt.plot( theta2[:,0], "r-")
plt.plot( theta2[:,1], "b-")
plt.plot( theta2[:,2], "k-")
## Invert theta = 2^{1-1/H} to get H = 1 / (1 - log2 theta)
1/(1-np.log2(np.mean(theta2, axis=0)))
# DD_par2[:,:,3]
In [ ]:
def scaffolding_plot( t, x ) :
    ht, hp, xt, exc = xtree_build( t, x )
    l = len( ht ) - 5
    plt.figure( figsize = (15, 8) )
    plt.plot( t, x, "y-")
    # for p in tp[l] : plt.axhline( p, linewidth = 1, color = 'k' )
    plt.step( ht[l+0], hp[l+0], "g>" ) #, where = 'mid' )
    plt.step( ht[l+1], hp[l+1], "b>" ) #, where = 'mid' )
    plt.step( ht[l+2], hp[l+2], "r>" ) #, where = 'mid' )
    plt.step( ht[l+3], hp[l+3], "k>" ) #, where = 'mid' )
    plt.show( )
    # return [ zip( *np.unique( x, return_counts = True ) ) for x in xt ]
In [ ]:
genr = synth_Weier(N, H)
t,x = genr()
plt.plot(t,x, "y-")
scaffolding_plot(t,x)
In [ ]:
## The mean number of offspring
## The empirical distribution of the subcrossings
##
However, for estimation purposes it is useful to work with the complementary CDF instead of the ``density'' itself: $$\mathbb{P}(N\geq k) = {(1-\theta)}^{\frac{k}{2}-1}$$ This way no distinction between $k<\bar{k}$ and $k=\bar{k}$ needs to be made.
For this purpose, let's estimate the following regression model: $$\ln \hat{p}_k \sim C + \beta k$$ where $\hat{p}_k$ is the empirical probability that a crossing had at least $k$ subcrossings and $\beta = \ln \sqrt{1-\theta}$.
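A sketch of this regression on a synthetic (untruncated, for simplicity) even-geometric sample: regress $\ln \hat{p}_k$ on $k$ and invert $\beta = \ln\sqrt{1-\theta}$ to recover $\theta$ (the values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.4
n = 2 * rng.geometric(theta, size=100000)

# Empirical complementary CDF at even k, then an OLS fit of its logarithm.
ks = np.arange(2, 13, 2)
p_hat = np.array([(n >= k).mean() for k in ks])
beta, C = np.polyfit(ks, np.log(p_hat), 1)

# beta = ln sqrt(1 - theta), hence theta = 1 - exp(2 * beta).
theta_hat = 1.0 - np.exp(2.0 * beta)
assert abs(theta_hat - theta) < 0.02
```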
It is entirely possible that the distribution's shape depends on the depth. If it does, then there is no self-similarity.
In [ ]:
from scipy.stats import chi2_contingency as chisq_test
def ctable( x, y ) :
    ## This procedure makes a contingency table out of x and y (they should be discrete)
    lxy = np.ascontiguousarray( np.column_stack( ( x, y ) ) )
    lxy = lxy.view( dtype = np.dtype( [ ('x', x.dtype), ('y', y.dtype) ] ) )
    lc, lf = np.unique( lxy, return_counts = True )
    xc = np.unique( x ) ; yc = np.unique( y )
    txy = np.zeros( ( len( xc ), len( yc ) ), dtype = np.int )
    vix = lc.view( ( np.int, len( lc.dtype ) ) )
    ix = np.searchsorted( xc, vix[:,0] )
    iy = np.searchsorted( yc, vix[:,1] )
    txy[ ix, iy ] = lf
    return txy
In [ ]:
# scaffolding_plot( *genr( ) )
def run( ) :
    monte_carlo( genr, M = 100, K = 16, T = 3 )
%lprun -f xtree_integer_crossings_fast run()
In [ ]:
# plt.plot(d[0,:], "rx", d[1,:], "bx", d[2,:], "kx")
# f ~ a^(k-1)(1-a)
# l
plt.plot( range(2, 13, 2), np.log(np.mean( d, 1 )) )
In [ ]:
m = 0
for s in a[1:] :
    plt.step( s[0], np.log( s[1] * 2**m ), "b-")
    m += 1
plt.step( a[1][0], np.log( a[1][1] ), "r-")
In [ ]:
N = 2**10
H = .5
# draw(synth_fbm( N, H ))
# draw(synth_Weier( N, H ))
# draw(synth_Rosenblatt( N, H ))
# draw(synth_Hermite3( N, H ))
# draw(synth_Hermite4( N, H ))
In [ ]:
## Test the process generators
def draw( gen, seed = None ) :
    t, u = gen( seed )
    t, v = gen( seed )
    plt.plot( t, u, "r-", t, v, "b-" )
    plt.show( )
In [ ]:
genr = synth_fbm( 2**20, .5 )
In [ ]:
T, X = genr()
In [ ]:
mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(np.floor(np.log2(np.max(X)-np.min(X)))-8) )
In [ ]:
from scipy.stats import chisquare as chisq
xlevels = mhx[-6:-4+1]
xpool = np.concatenate( tuple( xlevels ) )
pc, pf = np.unique( xpool, return_counts = True )
xf = list()
for x in xlevels :
    c, f = np.unique( x, return_counts = True )
    zf = np.zeros( pf.shape, np.int )
    zf[ np.searchsorted( pc, c ) ] = f
    xf.append( zf )
[ chisq(x, pf) for x in xf ]
In [ ]:
print [len( t ) for t in mht]
print [x.sum() for x in mhx[1:]]
In [ ]:
print np.allclose( np.sum( np.abs( mex[2][:,:-1] ), axis = 1 ) * 2+2, mhx[2])
mex[5][:,:-1]
In [ ]:
from Weierstrass import *
N = 2**20
T, X = genr()
# T, X = synthweier( N, 0.7, 1.2, 1000, deterministic = True, seed = None )
# T = np.arange(N, dtype=np.float)/ (N-1)
delta = np.std( np.diff( X ), ddof = 1 )
Z = ( X - X[ 0 ] ) / delta
delta
lht, lhp, lhx, exc = xtree_integer_crossings_fast( T, Z )
In [ ]:
ht, hp, hx, exc = xtree_integer_crossings( T, Z )
print np.sum(np.abs(ht-lht))+np.sum(np.abs(hp-lhp))
# lht, lhp, lhx, exc = xtree_super_crossing( lht, lhp, 2 )
In [ ]:
print len(lht), len(lhp)
lhp[ np.append( [False], lhp[1:]==lhp[:-1]) ]
print len(T), len(Z)
np.std( np.diff( lht ), ddof = 1 )-5.3e-05
scaffolding_plot( T, X )
In [ ]:
def test( M = 10 ) :
    genr = synth_fbm( 2**12, .5 )
    for i in xrange( M ) :
        T, X = genr( )
        mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(-6) )
%lprun -f xtree_integer_crossings_fast test( 50 )
In [ ]:
genr = synth_fbm( 2**10, .5 )
for i in xrange( 100 ) :
    T, X = genr( )
    j2 = int( np.floor( np.log2( max( X ) - min( X ) ) ) )
    base = float( 2**( j2 - 6 ) )
    mht, mhp, mhx, mex = xtree_build( T, X, delta = base )
    ght, ghp, ghx = f_get_w( T, X, delta = 1, levels = range( j2-6, j2+1 ), deleteFirst = False )
    for m, g in zip( mhp, ghp ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0] != e[1] ] )
    for m, g in zip( mht, ght ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0] != e[1] ] )
    for m, g in zip( mhx[1:], ghx ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0] != e[1] ] )
# print [len(x)-len(y) for x, y in zip( ghp, mhp )]
# print [len(x)-len(y) for x, y in zip( ght, mht )]
# print [len(x)-len(y) for x, y in zip( ghx, mhx[1:] )]
# print [ np.nanmax(np.abs(m/g-1)) for m,g in zip( mht, ght ) ]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mhp, ghp )]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mht, ght )]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mhx[1:], ghx )]