Studying self-similarity using the crossing tree

Nazarov Ivan

Supervisor: DeCrouez GG

Crossing times

Suppose ${( t_i, x_i)}_{i=1}^N$ is a sample of some continuous process ${( X_t )}_{t\in T}$, where $x_i = X(t_i)$.

The crossing times are defined as follows: for a given base grid spacing $\delta>0$ and for $n,k\geq0$ they are the stopping times $$T^n_{k+1} = \inf\Big\{ t > T^n_k\,: \, \big | X_t - X_{T^n_k} \big | \geq \delta 2^n \Big\}$$ with $T^n_0 = 0$. Forcing the zeroth crossing time to $0$ aligns the grid with the process, so in effect, without loss of generality, we may consider processes that start at the origin.
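
As a minimal illustration of this definition, here is a sketch (a hypothetical helper of my own, not the library's xtree_build) that approximates the crossing times of a sampled path in a single pass, snapping the recorded levels to the grid:

In [ ]:
import numpy as np

## Record the first sample at which the process has moved delta * 2^n away
##  from the last recorded grid level. Levels are snapped to the grid, and a
##  sample is assumed never to jump across more than one grid line at once.
def crossing_times( t, x, delta, n = 0 ) :
    w = delta * 2**n
    x = np.asarray( x ) - x[ 0 ]             # align the grid: X starts at 0
    times, levels = [ t[ 0 ] ], [ 0.0 ]
    for ti, xi in zip( t, x ) :
        if abs( xi - levels[ -1 ] ) >= w :
            times.append( ti )
            ## snap the new level to the grid line that was crossed
            levels.append( levels[ -1 ] + w * np.sign( xi - levels[ -1 ] ) )
    return np.array( times ), np.array( levels )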

The parameter $\delta$ is the spacing of the finest grid, with respect to which the leaves of the crossing tree are computed. Parent nodes of these leaves represent crossings of a coarser grid, namely $2\delta$. In the case of a sampled process, the choice of $\delta$ affects the tree in the following ways (TODO: this must be discussed in more detail):

  1. A grid with too small a value of $\delta$ would be crossed by the straight line segments between each pair of consecutive sample observations. This could poison the distribution with some unfavourable yet unknown mixture and lead to an excessive number of seemingly meaningless crossings.
  2. Too large a $\delta$ leads to a very poor, undersampled crossing tree. (Two heuristics for choosing $\delta$ are sketched below.)
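
Two base-spacing heuristics appear later in this notebook; the sketch below collects them in one place (the helper names are mine):

In [ ]:
import numpy as np

## A range-based delta, which makes the tree a fixed number of levels deep,
##  and an adaptive delta equal to the standard deviation of the increments.
def delta_from_range( x, levels = 8 ) :
    return 2**( np.floor( np.log2( np.max( x ) - np.min( x ) ) ) - levels )

def delta_from_increments( x ) :
    return np.std( np.diff( x ), ddof = 1 )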

By construction, for a binary crossing grid, $N^l_k$, the number of subcrossings in any complete crossing between $T^l_k$ and $T^l_{k+1}$, is always an even number. This is because each crossing is registered as soon as two unidirectional subcrossings are encountered, as the following argument shows.

Suppose $T^{n+1}_k$ and $T^n_m$ are aligned in that $T^{n+1}_k=T^n_m$, and $T^{n+1}_{k+1}<+\infty$, i.e. the $(n+1)$-grid crossing is complete. First of all, $T^n_{m+1},T^n_{m+2}\leq T^{n+1}_{k+1}$, since otherwise the process would have left the $\pm\delta 2^{n+1}$ band, which is twice as wide, before leaving the $\pm\delta 2^n$ band, which is impossible.

The process is continuous, which means that almost surely for any $\epsilon>0$ there is $\eta>0$ such that for any $t$ with $\big|t-T^n_{m+1}\big| < \eta$ it holds that $\big |X_{T^n_{m+1}}-X_t\big|<\epsilon$. Also, for every $t\in\big[T^n_m, T^n_{m+1}\big)$ we necessarily have $\big | X_t - X_{T^n_m} \big | < \delta 2^n$. In particular, for $\epsilon = \frac{1}{2}\delta 2^n$ this means that for a short while after $T^n_{m+1}$ the process is almost surely still within the $\pm \delta 2^{n+1}$ band. Therefore $T^n_{m+1} < T^{n+1}_{k+1}$.

Thus $$1\leq \bigg | \frac{1}{\delta 2^n} \big(X_{T^n_{m+1}} - X_{T^n_m}\big)\bigg | < 2\,.$$ The next crossing of the $\pm\delta 2^n$ grid takes place at $T^n_{m+2}$, and there are two possibilities:

  1. the process crossed a new $\pm\delta 2^{n+1}$ grid line: $\big|X_{T^n_{m+2}} - X_{T^n_m}\big|\geq 2\cdot\delta 2^n$. In this case $T^{n+1}_{k+1}\leq T^n_{m+2}$ and there are two subcrossings.
  2. the process moved back to the level $X_{T^n_m}$, which does not incur a crossing of a $\pm\delta 2^{n+1}$ grid line, yet is registered by the $\pm\delta 2^n$ grid. In this case $T^n_{m+2} < T^{n+1}_{k+1}$ and $X_{T^n_{m+2}} = X_{T^{n+1}_k}$ -- back to the beginning of this argument.

Since the crossing is complete, $T^{n+1}_{k+1}<+\infty$ implies that sooner or later a crossing of the $\pm\delta 2^{n+1}$ grid occurs, in which case $T^{n+1}_{k+1}=T^n_{m+2p}$ for some $p\geq1$, meaning that there were exactly $2p$ subcrossings.
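
A quick numerical check of the even-offspring property (a sketch reusing crossing_times from above; a crude Gaussian random walk stands in for a continuous process):

In [ ]:
## Verify on a simulated path that every complete crossing of the 2 delta
##  grid consists of an even number of delta-grid subcrossings.
rng = np.random.RandomState( 0 )
t = np.linspace( 0., 1., 2**16 )
x = np.cumsum( rng.randn( t.size ) * np.sqrt( t[ 1 ] ) )
ht0, _ = crossing_times( t, x, delta = .05, n = 0 )
ht1, _ = crossing_times( t, x, delta = .05, n = 1 )
## every coarse crossing time coincides with a fine crossing time, so the
##  subcrossing counts are differences of positions within ht0
counts = np.diff( np.searchsorted( ht0, ht1 ) )
print np.all( counts % 2 == 0 )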

A model of the offspring distribution

Let's test the hypothesis that the number of subcrossings follows a distribution similar to the geometric. Recall that $G$ is a geometrically distributed random variable, $G\sim \text{Geom}(\theta)$, if $\mathbb{P}(G=k) = {(1-\theta)}^{k-1}\theta$ for all $k\geq1$. Other properties of geometrically distributed random variables include the following (sanity-checked numerically after this list):

  • Complementary CDF $\mathbb{P}(G\geq k) = {(1-\theta)}^{k-1}$ for any $k\geq1$;
  • Expectation $\mathbb{E}(G) = \theta^{-1}$;
  • memorylessness.
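
These identities can be sanity-checked against scipy's geometric distribution, which uses the same support $k\geq1$:

In [ ]:
import numpy as np
from scipy.stats import geom

## Check the listed properties numerically for an arbitrary theta
theta = .3 ; k = np.arange( 1, 8 )
print np.allclose( geom.pmf( k, theta ), ( 1. - theta )**( k - 1 ) * theta )
## P(G >= k) is the survival function evaluated at k - 1
print np.allclose( geom.sf( k - 1, theta ), ( 1. - theta )**( k - 1 ) )
print np.isclose( geom.mean( theta ), 1. / theta )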

The number of offspring of any $\delta 2^n$-grid crossing is the number of subcrossings of a finer grid with spacing $\delta 2^{n-1}$ during a typical crossing.

Our hypothesis is that $N$, the number of offspring, follows a geometric distribution on the even numbers, i.e. $\frac{1}{2}N\sim \text{Geom}(\theta)$. To test it we utilize a truncated geometric distribution with a fixed threshold $\bar{k}$, because this approach naturally and easily handles unbounded random variables. Truncating the data effectively means that two kinds of observations are registered:

  • a value less than $\bar{k}$;
  • an event $\big\{N\geq \bar{k}\big\}$, indicating that the value was not less than $\bar{k}$. Basically such a truncated random variable behaves like a typical geometric one on the values $\big\{ 2m \,\big\rvert\, 2m < \bar{k}\big\}$, but has an unusual concentration of probability at the upper truncation level $\bar{k}$.

Therefore, for even integers $k=2,4,\ldots,\bar{k}$ the distribution of the number of offspring is given by $$\mathbb{P}(N=k) = {(1-\theta)}^{\frac{k}{2}-1}\theta\,1_{k<\bar{k}} + {(1-\theta)}^{\frac{\bar{k}}{2}-1}\,1_{k = \bar{k}}$$ where $1_{(\cdot)}$ is the $0$--$1$ indicator.
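
A direct transcription of this pmf (a sketch; $\bar{k}$ is assumed even):

In [ ]:
import numpy as np

## The truncated offspring pmf: geometric in k / 2 below the threshold,
##  with the entire tail P(N >= kbar) lumped at k = kbar.
def offspring_pmf( k, theta, kbar ) :
    k = np.asarray( k )
    body = ( 1. - theta )**( k // 2 - 1 ) * theta
    tail = ( 1. - theta )**( kbar // 2 - 1 )
    return np.where( k < kbar, body, tail * ( k == kbar ) )

## the probabilities over k = 2, 4, ..., kbar sum to one
print np.sum( offspring_pmf( np.arange( 2, 13, 2 ), .4, 12 ) )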

Maximum Likelihood Estimation

Suppose ${(g_i)}_{i=1}^N$ is a sample of independent geometric random variables with distribution truncated at $\bar{k}$. Due to truncation, only a finite number of distinct values can be observed in any given sample. Therefore it is convenient to represent every sample in an equivalent value-frequency form: ${\big(j, f_j\big)}$ for even $j$ not greater than $\bar{k}$, with $f_j = \big\lvert\big\{ i\,\big\rvert\,g_i=j\big\}\big\rvert$ -- the number of observations with the specified value. Without loss of generality, $f_j$ can be set to zero for those $j$ that were not observed.

The log-likelihood function is given by $$\ln\mathcal{L} = \sum_{j\neq \bar{k}} f_j {\Big(\frac{j}{2}-1\Big)} \ln {(1-\theta)} + \sum_{j\neq \bar{k}} f_j \ln \theta + f_{\bar{k}} {\Big(\frac{\bar{k}}{2} - 1\Big)} \ln {(1-\theta)}$$ where the summation is done over all possible distinct values. The first-order condition on the optimal $\theta$ is given by $$\frac{d}{d \theta} \ln\mathcal{L}\,:\quad-\sum_j f_j {\Big(\frac{j}{2}-1\Big)} \frac{1}{1-\theta} + \sum_{j\neq \bar{k}} f_j \frac{1}{\theta} = 0$$ This is equivalent to $\frac{S-N}{1-\theta} = \frac{N-f_{\bar{k}}}{\theta}$, where $S = \sum_j \frac{j}{2} f_j$ and $N = \sum_j f_j$, because the frequencies sum up to the total number of observations. Therefore the desired ML estimator of the probability parameter $\theta$ is $$\hat{\theta} = \frac{N-f_{\bar{k}}}{S-f_{\bar{k}}}$$
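
A direct implementation of this estimator (a sketch operating on raw observations; the notebook's own mle_theta below works on value-count vectors instead):

In [ ]:
import numpy as np

## The closed-form MLE theta_hat = (N - f_kbar) / (S - f_kbar), computed
##  straight from a raw sample after applying the truncation.
def mle_theta_raw( g, kbar ) :
    g = np.minimum( np.asarray( g ), kbar )
    N = g.size                               # N = sum_j f_j
    S = np.sum( g ) / 2.                     # S = sum_j (j / 2) f_j
    f_bar = np.sum( g == kbar )              # frequency of the tail bin
    return ( N - f_bar ) / ( S - f_bar )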

I strongly suspect that the MLE of the truncated geometric distribution is biased. If so, this renders it unreliable for Monte Carlo estimation purposes. Clearly the bias should depend on the truncation threshold, and $\hat{\theta}_{\bar{k}} \to \hat{\theta}$ as $\bar{k}\to \infty$.
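
A quick Monte Carlo check of this suspicion (a sketch reusing mle_theta_raw from above): sample truncated geometric offspring counts, estimate $\theta$, and compare the average estimate with the truth.

In [ ]:
## Empirical bias of the truncated-geometric MLE
theta, kbar, M, n = .55, 10, 1000, 200
rng = np.random.RandomState( 0 )
est = np.array( [ mle_theta_raw( 2 * rng.geometric( theta, size = n ), kbar )
                  for m in xrange( M ) ] )
print theta, est.mean( ), est.mean( ) - theta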

The Monte Carlo routine

The procedure below performs the specified number $M$ of Monte Carlo simulations of the sample paths of the process specified by the generator object. The parameters $K$ and $T$ determine the truncation performed while collecting the crossing tree data.
For example, $K=4$ means that subcrossings of size up to and including $4$ are recorded in full detail, while every subcrossing of larger size is aggregated and stored in the ``tail''.

Similarly for the parameter $T$, which sets the threshold level of the crossing tree, counted from the leaf level up, beyond which the data is stored in an aggregated manner.
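
A toy illustration of this $(K, T)$ bookkeeping (a sketch of what the kernel below does; the counts here are made up):

In [ ]:
import numpy as np

## Offspring counts are clipped at K into a tail bin, and levels deeper
##  than T are pooled into the last row of the distribution matrix.
K, T = 6, 2
distr = np.zeros( ( T + 1, K // 2 ), dtype = np.int )
toy_levels = [ np.array( [ 2, 2, 4, 8, 10 ] ), np.array( [ 2, 6, 6 ] ),
               np.array( [ 4, 12 ] ), np.array( [ 2 ] ) ]
for level, xing in enumerate( toy_levels ) :
    c, f = np.unique( np.minimum( xing, K ), return_counts = True )
    distr[ min( level, T ), c // 2 - 1 ] += f
print distr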


In [ ]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

from synthfbmcircul import synth_fbm
from hsssi_processes import *
from Weierstrass import synth_Weier, synth_Weier_2

from crossing_tree import *
from monte_carlo import *

In [ ]:
genr = synth_fbm( 2**21, .65 )

%lprun -f synth_fgn.__call__ T, X = genr( )

In [ ]:
mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(np.floor(np.log2(np.max(X)-np.min(X)))-8) )
# mht, mhp, mhx, mex = xtree_build( T, X, delta = None )

In [ ]:
#(/\, \/, ±1)
# inx = ( mex[8][:,2] + 1 ) // 2
# print  mex[8]#[:,inx]
# wed = np.where(mex[8][:,2]>0, mex[8][:,0], mex[8][:,1])
# vee = np.where(mex[8][:,2]<0, mex[8][:,0], mex[8][:,1])

In [ ]:
lex= mex[8]
lex[ lex[:,2]<0, :2 ]

In [ ]:
hatvee_cond = np.zeros( (len( mex ), 2, 2 ), dtype = np.int )
for i, lex in enumerate( mex, 0 ) :
    if lex.shape[0]<1 : continue
    hatvee_cond[i,:,0] = lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
    hatvee_cond[i,:,1] = lex[ lex[:,2]<0, :2 ].sum( axis = 0 )

In [ ]:
print hatvee_cond.sum(axis = 2)
print [len(t) for t in mhx]
print [len(t) for t in mex]
lex = mex[4]
print lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
print lex[ lex[:,2]<0, :2 ].sum( axis = 0 )
print hatvee_cond[4,:]

In [ ]:
hatvee_cond[1,0,:]

In [ ]:
## Import the complementary chi squared distribution
from scipy.stats import chisqprob as cdchisq

## Choose the levels to test the hypothesis on
levels = range( -6, -2+1, 1 )

## The total number of crossings of \delta 2^n grid
Nn = np.array( [ len( mht[ l ] ) for l in levels ] ).reshape( (len(levels), -1 ) )

## The number of subcrossings of size \delta 2^{n-1} that
##  make up the k-th crossing of n-th level.
Zn = [ mhx[ l ] for l in levels ]

## Compute the empirical distribution of the number of subcrossings in the pooled sample
bins, f_pool = np.unique( np.concatenate( tuple( Zn ) ), return_counts = True )
p_pool = f_pool / np.sum( f_pool, dtype = np.float64 )

In [ ]:
p_lvl = np.zeros( ( len( Nn ), len( bins ) ), dtype = np.float )
for l, z in enumerate( Zn, 0 ) :
    cat, freq = np.unique( z, return_counts = True )
    p_lvl[ l, np.searchsorted( bins, cat ) ] = freq / np.sum( freq, dtype = np.float64 )

In [ ]:
stat = np.sum( Nn * ( ( p_lvl - p_pool )**2 / p_pool ) )
pv = cdchisq( stat, ( len( levels ) - 1 ) * ( len( bins ) - 1 ) )
print stat, pv

In [ ]:
## Given the aggregated value-count data in x, compute the MLE of \theta of the geometric distribution.
## I suspect this MLE is biased, which would make averaging it across simulations unreliable.
def mle_theta( x ) :
    N = np.sum( x )
    S = np.dot( x, np.arange( 1, len( x ) + 1, dtype = np.float ) )
## The +1 offsets appear to regularise the estimator against degenerate
##  samples; the exact MLE is ( N - x[-1] ) / ( S - x[-1] ).
    return ( N+1 - x[ -1 ] ) / ( S+1 - x[ -1 ] )


In [ ]:
## The Monte Carlo kernel performs a single experiment and returns its
## K'xT slice of the result.
def monte_carlo_kernel( generator, K, T, delta = None ) :
	from scipy.stats import chisqprob as cdchisq
## K is the maximal number of subcrossings beyond which the data
##  on subcrossings is aggregated into the tail (truncation parameter).
	K = 2 * ( K // 2 + 1 )
## T is the number of detailed levels of the crossing tree to be stored
##  in the output. Any crossings of grids coarser than the threshold are
##  aggregated.
	distr = np.zeros( ( T+1, K // 2 ), dtype = np.int )
## Generate a replication of the process
	t, x = generator( )
## G. Decrouez 2015-02-12: the selection of the spacing of the finest grid
##  based on the scale of the process is crucial as it allows comparison
##  of the crossing tree between different sample paths.
	if delta is None :
# 		sigma = np.std( np.diff( x ) )
## Set the base scale so that the constructed tree is 6 layers deep.
# 		delta = 2**( np.floor( np.log2( np.max( x ) - np.min( x ) ) ) - 6 )
## Left as None: the tree builder falls back to the adaptive base spacing
##  (the standard deviation of the increments).
		delta = None
## 2015-04-08 : I think 6-level deep crossing tree is too poor for any analysis
##  which is why it is necessary to take the variance into account
## Get the hitting times and points
	ht, hp, subx, excur = xtree_build_superfast( t, x, delta )
## Pool all the crossing counts together and construct the distribution matrix 
## ToDo
## ChiSquare test for distributional match: choose the levels to test the hypothesis on
	levels = range( -5, -3+1, 1 )
## The total number of crossings of \delta 2^n grid
	Nn = np.array( [ len( ht[ l ] ) for l in levels ] ).reshape( (len(levels), -1 ) )
## The number of subcrossings of size \delta 2^{n-1} that
##  make up the k-th crossing of n-th level.
	Zn = [ subx[ l ] for l in levels ]
## Compute the empirical distribution of the number of subcrossings in the pooled sample
	bins, f_pool = np.unique( np.concatenate( tuple( Zn ) ), return_counts = True )
	p_pool = f_pool / np.sum( f_pool, dtype = np.float64 )
## Calculate the empirical probabilities at each level
	p_lvl = np.zeros( ( len( Nn ), len( bins ) ), dtype = np.float )
	for l, z in enumerate( Zn, 0 ) :
		cat, freq = np.unique( z, return_counts = True )
		p_lvl[ l, np.searchsorted( bins, cat ) ] = freq / np.sum( freq, dtype = np.float64 )
## Run the chi_square test on each level of the tree
	stat = np.sum( Nn * ( ( p_lvl - p_pool )**2 / p_pool ) )
	dof = len( levels ) * ( len( bins ) - 1 )
## Now summarize the distribution: Simplify
## ToDo
	for level, xing in enumerate( subx[1:], 0 ) :
## Count the absolute frequencies
		c, f = np.unique( xing, return_counts = True )
## We collect the data on the distribution truncated at K (inclusive).
## Thus truncate the number of subcrossings at K -- everything smaller is
##  detailed, otherwise -- aggregated into the tail
		c = np.minimum( c, K )
## Similarly truncate the level by T + 1
		if level >= T : level = T
## Fill the offspring distribution matrix
		for cj, fj in zip( c, f ) :
			distr[ level, cj // 2 - 1 ] += fj
## Use straightforward bivariate regression implementation.
## We could also use ML estimators, which is much better!
## See the handwritten notes.
## Estimate the conditional distribution of excursions
## lvl x (++ --) x (+- -+) # (/\, \/, +/-1) ==  (u,d,s)
	hatvee = np.zeros( (len( excur ), 2, 2 ), dtype = np.int )
	for i, lex in enumerate( excur, 0 ) :
		if lex.shape[0] < 1 : continue
		hatvee[i,0,:] = lex[ lex[:,2]>0, :2 ].sum( axis = 0 )
		hatvee[i,1,:] = lex[ lex[:,2]<0, :2 ].sum( axis = 0 )
## Estimate the probability distribution of the number of offspring
## ToDo
	return distr, (stat,dof, cdchisq(stat, dof)), hatvee

Fractional Brownian Motion

Let's have a look at the estimate of the parameter of the offspring distribution. We set up the grid with base spacing equal to $2^{-6}$, as has been done in Geoffrey's paper.

The continuous processes are confined to the $[0,1]$ interval and their sample paths are observed at the discrete knots of the uniformly spaced lattice ${\big(t_i\big)}_{i=0}^{N-1}$ with $t_0=0$ and $t_{N-1}=1$. In effect every single Monte Carlo replication deals with a sampled, discretized path ${\big(t_k, x_k\big)}_{k=0}^{N-1}$ where $x_k = X(t_k)$.
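
Concretely, the lattice is just $t_i = \frac{i}{N-1}$ (the generators used below construct it internally):

In [ ]:
## The uniform sampling lattice on [0, 1]: t_0 = 0, t_{N-1} = 1
N = 2**18
t = np.linspace( 0., 1., N )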


In [ ]:
## Set the size of the monte carlo experiment
M = 10
## Determine how many levels of the crossing tree to keep
T = 5
## And set the truncation threshold so that subcrossings of size greater
##  than or equal to 34 are put in the tail of the offspring distribution.
K = 32
## Set the time-precision of the sampled paths
N = 2**18
## Set the studied Hurst exponent
H = .95
## The base grid spacing is going to be 2^{-6}
# delta = 2**(-6)
## Leaving \delta unset (None) constructs the tree with adaptive
##  base spacing equal to the standard deviation of the increments

In [ ]:
## Study the Fractional Brownian motion.
%lprun -f xtree_build_superfast res_fbm = monte_carlo_serial( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
# res_fbm = monte_carlo_parallel( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )

In [ ]:
distr  = np.concatenate( [ [d] for d, _, _ in res_fbm[0] ] )
chisq  = np.concatenate( [ [c] for _, c, _ in res_fbm[0] ] )
hatvee = np.concatenate( [ p for _, _, p in res_fbm[0] ] )

In [ ]:
distr

In [ ]:
## Import the complementary chi squared distribution
from scipy.stats import chisqprob as cdchisq
pv = np.array( [p for s,d,p in chisq], dtype = np.float )
print np.sum( pv < 0.05, dtype = np.float ) / len( pv )

In [ ]:
chisq

In [ ]:
plt.plot(pv)

In [ ]:
def get_probs( res ) :
## Compute the empirical distribution function of the number of offspring 
    return np.apply_along_axis( lambda f : f * 1.0 / np.sum( f ), 2, res )
## Compute the probabilities
probs = get_probs( distr )
## In homage to the original code, call the mean probabilities and errors by PrZ and StZ
PrZ = np.mean( probs, axis = 0 )
StZ =  np.std( probs, axis = 0 )
## Create an appropriate array of offspring values
offs = np.arange( 2, 2 * probs.shape[ 2 ] + 1, 2, np.float )
## Create a new plot
plt.figure( 1, figsize = ( 8, 6 ) )
theta = 2**(1.0-1.0/H)
offs_prb = (1.0-theta)**(offs//2-1)*theta
plt.plot( offs, np.log( offs_prb ), "k-", linewidth = 2, markersize = 15 )
## Plot the logarithm of the probability of observing a given number of subcrossings 
for i, c in zip( range( 0, probs.shape[ 2 ] ), ["r", "g", "b", "y"] ) :
    plt.plot( offs, np.log( PrZ[i,:] ), "o" + c )
#         plt.errorbar( offs, PrZ[i,:], yerr = StZ[i,:], color = c, linewidth = 0 )
plt.show( )

In [ ]:
# P(-+|u) = P(+-|d) = 1/sqrt(\mu)
pu = list( ); pd = list( )
for _,_,p in res_fbm[0]:
    pp = p[1,:]+p[2,:]+p[3,:]
    tt = pp.sum(axis=1)
    
print pp
print tt


In [ ]:
## Examine the properties of the Weierstrass-Mandelbrot process
# %lprun -f monte_carlo_kernel res_wei = monte_carlo_serial( synth_Weier_2( N, 1.2, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )
res_wei = monte_carlo_parallel( synth_Weier( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )

In [ ]:
## Finally have a look at the Rosenblatt process
# res_ros = monte_carlo_parallel( synth_Rosenblatt( N // (2**6), H, K = 2**6 ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )

In [ ]:
## Save the results for later reuse
import time as tm
## Make a file with a timestamped name
outfile = "./MC m%d p%d h%.3f %s.npz" % ( M, np.log2(N), H, tm.strftime("%Y%m%d-%H%M") )

## Save the copy of the monte carlo output
np.savez( outfile, fbm = res_fbm, wei = res_wei )
# np.savez( outfile, fbm = res_fbm, wei = res_wei, ros = res_ros )

In [ ]:
## Set the size of the monte carlo experiment
M = 100
## Determine how many levels of the crossing tree to keep
T = 5
## And set the truncation threshold so that subcrossings of size greater
##  than or equal to 34 are put in the tail of the offspring distribution.
K = 32
## Set the time-precision of the sampled paths
N = 2**18
## Set the studied Hurst exponent
H = .85
## The base grid spacing is going to be 2^{-6}
# delta = 2**(-6)
## Leaving \delta unset (None) constructs the tree with adaptive
##  base spacing equal to the standard deviation of the increments

In [ ]:
## Study the Fractional Brownian motion.
res_fbm = monte_carlo_parallel( synth_fbm( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )

In [ ]:
## Examine the properties of the Weierstrass-Mandelbrot process
res_wei = monte_carlo_parallel( synth_Weier( N, H ), monte_carlo_kernel, M = M, quiet = True, K = K, T = T )

In [ ]:
## Save the results for later reuse
import time as tm
## Make a file with a timestamped name
outfile = "./MC m%d p%d h%.3f %s.npz" % ( M, np.log2(N), H, tm.strftime("%Y%m%d-%H%M") )

## Save the copy of the monte carlo output
np.savez( outfile, fbm = res_fbm, wei = res_wei )
# np.savez( outfile, fbm = res_fbm, wei = res_wei, ros = res_ros )

Analysis


In [ ]:
def get_probs( res ) :
## Compute the empirical distribution function of the number of offspring 
    return np.apply_along_axis( lambda f : f * 1.0 / np.sum( f ), 2, res )

This procedure plots the empirical probability of observing a given number of offspring at each node of the crossing tree.


In [ ]:
def mc_plot_empirical_probs( res, H, title = None ) :
## Compute the probabilities
    probs = get_probs( res )
## In homage to the original code, call the mean probabilities and errors by PrZ and StZ
    PrZ = np.mean( probs, axis = 0 )
    StZ =  np.std( probs, axis = 0 )
## Create an appropriate array of offspring values
    offs = np.arange( 2, 2 * probs.shape[ 2 ] + 1, 2, np.float )
## Create a new plot
    plt.figure( 1, figsize = ( 8, 6 ) )
    if title is not None :
        plt.title( title )
    theta = 2**(1.0-1.0/H)
    offs_prb = (1.0-theta)**(offs//2-1)*theta
    plt.plot( offs, np.log( offs_prb ), "k-", linewidth = 2, markersize = 15 )
## Plot the logarithm of the probability of observing a given number of subcrossings 
    for i, c in zip( range( 0, probs.shape[ 2 ] ), ["r", "g", "b", "y"] ) :
        plt.plot( offs, np.log( PrZ[i,:] ), "o" + c )
#         plt.errorbar( offs, PrZ[i,:], yerr = StZ[i,:], color = c, linewidth = 0 )
    plt.show( )

Simulation results

Below, interspersed with the listings of code, are the results obtained so far for the crossing tree Monte Carlo simulation.


In [ ]:
mc_plot_empirical_probs( res_fbm, H, "fBM %0.2f" % ( H ) )

In [ ]:
import re
outfiles = [ './MC m100 p18 h0.500 20150226-1252.npz', './MC m100 p18 h0.850 20150226-1301.npz']
for outfile in outfiles :
    opt = dict( ( x.group(1).upper(), float(x.group(2)) )
        for x in re.finditer( r"\b([a-z])([-+]?(?:\d+(?:\.\d*)?|\.\d+))\b", outfile ) )
    with np.load( outfile ) as f_res :
        res_fbm, res_wei = f_res['fbm'], f_res['wei']
        # res_ros = f_res['ros']
        mc_plot_empirical_probs( res_fbm, opt['H'], "fBM %0.2f" % ( opt['H'] ) )
        mc_plot_empirical_probs( res_wei, opt['H'], "WM %0.2f" % ( opt['H'] ) )
# mc_plot_empirical_probs( res_ros, "Rosenblatt %0.2f" % ( H ) )


In [ ]:
res_fbm

In [ ]:
## The MC results array is MxTxK. Fixing axes 0 and 1, compute
##  the ML estimate of the parameter over the axis 2 -- the within
##  tree-level sample of the offspring distribution.
theta = np.apply_along_axis( mle_theta, 2, res_fbm )
# mle_theta(DD[:,0,0])

# np.save( "./run 2b22 2015-02-14", DD_par )
# XX_par = np.load( "./run 2b22 2015-02-14.npy" )

# np.max(np.abs(DD_par - XX_par))

In [ ]:
plt.plot( theta[:,0], "r-")
plt.plot( theta[:,1], "b-")
plt.plot( theta[:,2], "k-")
## Invert theta = 2^{1 - 1/H} to recover the Hurst exponent: H = 1 / (1 - log2 theta)
1/(1-np.log2(np.mean(theta, axis=0)))

In [ ]:
plt.plot( 1/(1-np.log2(theta[:,0])), "r-")
plt.plot( 1/(1-np.log2(theta[:,1])), "b-")
plt.plot( 1/(1-np.log2(theta[:,2])), "k-")

In [ ]:
def mean_subx( x ) :
    return np.dot( x, 2*np.arange( 1, len( x ) + 1, dtype = np.float ) ) / np.sum( x )

moff = np.apply_along_axis( mean_subx, 2, res_fbm )
plt.plot( moff[:,0], "k-")
plt.plot( moff[:,1], "b-")
plt.plot( moff[:,2], "r-")

In [ ]:
DD_par[:,2,:]

In [ ]:
theta2 = np.apply_along_axis(mle_theta, 2, DD_par2)

plt.plot( theta2[:,0], "r-")
plt.plot( theta2[:,1], "b-")
plt.plot( theta2[:,2], "k-")
1/(1-np.log2(np.mean(theta2, axis=0)))
# DD_par2[:,:,3]

In [ ]:
def scaffolding_plot( t, x ) :
    ht, hp, xt, exc = xtree_build( t, x )
    l = len( ht ) - 5
    plt.figure( figsize = (15, 8) )
    plt.plot( t, x, "y-")
# for p in tp[l] : plt.axhline( p, linewidth = 1, color = 'k' )
    plt.step( ht[l+0], hp[l+0], "g>" ) #, where = 'mid' )
    plt.step( ht[l+1], hp[l+1], "b>" ) #, where = 'mid' )
    plt.step( ht[l+2], hp[l+2], "r>" ) #, where = 'mid' )
    plt.step( ht[l+3], hp[l+3], "k>" ) #, where = 'mid' )
    plt.show( )
#     return [zip(*np.unique( x, return_counts = True)) for x in xt ]

In [ ]:
genr = synth_Weier(N, H)
t,x = genr()
plt.plot(t,x, "y-")
scaffolding_plot(t,x)

In [ ]:
## The mean number of offspring
## The empirical distribution of the subcrossings
##

Straightforward, dumb linear regression

However, for estimation purposes it is useful to work with the complementary CDF instead of the ``density'' itself: $$\mathbb{P}(N\geq k) = {(1-\theta)}^{\frac{k}{2}-1}$$ This way no distinction between $k<\bar{k}$ and $k=\bar{k}$ needs to be made.

For this purpose, let's estimate the following regression model: $$\ln \hat{p}_k \sim C + \beta k$$ where $\hat{p}_k$ is the empirical probability that a crossing had no less than $k$ subcrossings and $\beta = \ln \sqrt{(1-\theta)}$.
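
A minimal implementation of this regression (a sketch; it assumes a raw, untruncated sample of offspring counts and uses an ordinary least-squares fit):

In [ ]:
## Fit ln p_k = C + beta k over the observed support and recover
##  theta = 1 - exp(2 beta), since beta = ln sqrt(1 - theta).
def theta_from_ccdf( g ) :
    g = np.asarray( g )
    k = np.unique( g )
    p_hat = np.array( [ np.mean( g >= kk ) for kk in k ] )  # P(N >= k)
    beta, C = np.polyfit( k, np.log( p_hat ), 1 )
    return 1. - np.exp( 2. * beta )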

It is entirely possible that the distribution's shape depends on the depth. If it does, then there is no self-similarity.


In [ ]:
from scipy.stats import chi2_contingency as chisq_test
def ctable( x, y ) :
## This procedure makes a contingency table out of x and y (they should be discrete)
    xc = np.unique( x ) ; yc = np.unique( y )
    txy = np.zeros( ( len( xc ), len( yc ) ), dtype = np.int )
## Map every observation pair onto its cell and accumulate the counts
##  (np.add.at handles repeated index pairs correctly)
    np.add.at( txy, ( np.searchsorted( xc, x ), np.searchsorted( yc, y ) ), 1 )
    return txy

Appendix

Testing on the deterministic cascade


In [ ]:
# scaffolding_plot( *genr( ) )
def run() :
    monte_carlo( genr, M=100, K = 16, T = 3 )
%lprun -f xtree_integer_crossings_fast run()

In [ ]:
# plt.plot(d[0,:], "rx", d[1,:], "bx", d[2,:], "kx")
# f ~ a^(k-1)(1-a)
# l
plt.plot( range(2, 13, 2), np.log(np.mean( d, 1 )) )

In [ ]:
m = 0
for s in a[1:]:
    plt.step( s[0], np.log( s[1] * 2**m ), "b-")
    m+=1
plt.step( a[1][0], np.log( a[1][1] ), "r-")

In [ ]:
N = 2**10
H = .5

# draw(synth_fbm( N, H ))
# draw(synth_Weier( N, H ))
# draw(synth_Rosenblatt( N, H ))
# draw(synth_Hermite3( N, H ))
# draw(synth_Hermite4( N, H ))

In [ ]:
## Test the process generators
def draw( gen, seed = None ) :
	t, u = gen( seed )
	t, v = gen( seed )
	plt.plot( t, u, "r-", t, v, "b-" )
	plt.show()

Code testing


In [ ]:
genr = synth_fbm( 2**20, .5 )

In [ ]:
T, X = genr()

In [ ]:
mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(np.floor(np.log2(np.max(X)-np.min(X)))-8) )

In [ ]:
from scipy.stats import chisquare as chisq
xlevels = mhx[-6:-4+1]
xpool = np.concatenate( tuple( xlevels ) )
pc, pf = np.unique( xpool, return_counts = True )
xf = list()
for x in xlevels :
    c, f = np.unique( x, return_counts = True )
    zf = np.zeros( pf.shape, np.int )
    zf[ np.searchsorted( pc, c ) ] = f
    xf.append( zf )
[ chisq(x, pf) for x in xf ]

In [ ]:
print [len( t ) for t in mht]
print [x.sum() for x in mhx[1:]]

In [ ]:
print np.allclose( np.sum( np.abs( mex[2][:,:-1] ), axis = 1 ) * 2+2, mhx[2])
mex[5][:,:-1]

In [ ]:
from Weierstrass import *
N = 2**20
T, X = genr()
# T, X = synthweier( N, 0.7, 1.2, 1000, deterministic = True, seed = None )
# T = np.arange(N, dtype=np.float)/ (N-1)

delta = np.std( np.diff( X ), ddof = 1 )
Z = ( X - X[ 0 ] ) / delta
delta
lht, lhp, lhx, exc = xtree_integer_crossings_fast( T, Z )

In [ ]:
ht, hp, hx, exc = xtree_integer_crossings( T, Z )
print np.sum(np.abs(ht-lht))+np.sum(np.abs(hp-lhp))
# lht, lhp, lhx, exc = xtree_super_crossing( lht, lhp, 2 )

In [ ]:
print len(lht), len(lhp)
lhp[ np.append( [False], lhp[1:]==lhp[:-1]) ]
print len(T), len(Z)
np.std( np.diff( lht ), ddof = 1 )-5.3e-05
scaffolding_plot( T, X )

In [ ]:
def test(M=10) :
    genr = synth_fbm( 2**12, .5 )
    for i in xrange(M) :
        T, X = genr()
        mht, mhp, mhx, mex = xtree_build( T, X, delta = 2**(-6) )

%lprun -f xtree_integer_crossings_fast test( 50 )

In [ ]:
genr = synth_fbm( 2**10, .5 )
for i in xrange(100) :
    T, X = genr()
    j2 = int( np.floor( np.log2( max( X ) - min( X ) ) ) )
    base = float( 2**(j2-6) )
    mht, mhp, mhx, mex = xtree_build( T, X, delta = base )
    ght, ghp, ghx = f_get_w( T, X, delta = 1, levels = range( j2-6, j2+1 ), deleteFirst=False )
    for m,g in zip( mhp, ghp ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ])
    for m,g in zip( mht, ght ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ])
    for m,g in zip( mhx[1:], ghx ) :
        assert( not [ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ])
# print [len(x)-len(y) for x, y in zip( ghp, mhp )]
# print [len(x)-len(y) for x, y in zip( ght, mht )]
# print [len(x)-len(y) for x, y in zip( ghx, mhx[1:] )]
# print [ np.nanmax(np.abs(m/g-1)) for m,g in zip( mht, ght ) ]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mhp, ghp )]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mht, ght )]
# print [[ i for i, e in enumerate( zip( m, g ) ) if e[0]!=e[1] ] for m,g in zip( mhx[1:], ghx )]