In [1]:
from __future__ import division, absolute_import
%matplotlib inline
import numpy as np
from numpy import log
import matplotlib.pyplot as plt
from IPython.html.widgets import *
from wtmm import wtmm, perform_cwt, skeletor

Wavelet Transform Modulus Maxima

Tail Trim Update

Goal

Given the below sample ridgeline, find the first discontinuity. This sample can be found in wtmm/tails_trim.py and in cell #8 of this page.

Data

The sample ridgeline data has been corrected to make it easier to think about. Usually is comes as a list of (col, row) tuples that are found when walking the matrix. This is function (1-1) of the y-axis which I personally find as an annoying way to thing about functions.

The sample set is the list of zip(row, col[::-1]) making thinking about the ridge easier as it becomes function of the x-axis.

Description

These ridges are created during the WTMM process and the first discontinuity cooresponds to the point at with the tail seperates from the body of the ridge. The tail is not useful data to us, whereas the body is.

The rate of the change in slope (2nd derivative) of the ridge line should temporarily leap to a high value relative to its neighbors. The one caveat is that there can be some local jumpyness from point to point so we'll need compare a local peak to its neighbors' average values. This, combinded with some epsilon we use to determine a discontinuity has occured (probably 3x the last measurement) yield to us the measurement we want.

If we can capture this effect, we're golden. Also, as we can be certian that this event only happens once per line and that it occurs usually very near the start of the line, we need only find the first one.

I've done some research below to verify that this is a valid approach to the problem.

Problem Definition

Using the below ridgeline data, find the point where the body begins. Visually, this point is known to be x = 42.

Let's generate some test data!

Decomposing signals created from financial cartoons


In [2]:
def forgery(Iterations=0, Multifractal=1):
    if Multifractal:
        turns=((0.25,0.5),(0.75, 0.25))
    else:
        turns=((0.4,0.6),(0.6, 0.4))
    first_turn, second_turn = turns
    ys = [0,1]
    ts = [0,1]
    for i in range(0, Iterations + 1):
        
        j=0
        while ts[j] < 1:
            dt = ts[j+1] - ts[j] 
            dy = ys[j+1] - ys[j]
            
            ts.insert(j+1, ts[j] + first_turn[0]*dt)
            ts.insert(j+2, ts[j] + second_turn[0]*dt)
            ys.insert(j+1, ys[j] + first_turn[1]*dy)
            ys.insert(j+2, ys[j] + second_turn[1]*dy)
            
            j += 3
    
    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'axisbg':'#EEEEEE', 'axisbelow':True})
    ax.grid(color='w', linewidth=2, linestyle='solid')
    ax.plot(ts, ys, color='b', alpha=0.4)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    
    return fig

def forgery_prices(Iterations=10, Multifractal=1):
    if Multifractal:
        turns=((0.25,0.5),(0.75, 0.25))
    else:
        turns=((0.4,0.6),(0.6, 0.4))
    first_turn, second_turn = turns
    ys = [0,1]
    ts = [0,1]
    for i in range(0, Iterations + 1):
        
        j=0
        while ts[j] < 1:
            dt = ts[j+1] - ts[j] 
            dy = ys[j+1] - ys[j]
            
            ts.insert(j+1, ts[j] + first_turn[0]*dt)
            ts.insert(j+2, ts[j] + second_turn[0]*dt)
            ys.insert(j+1, ys[j] + first_turn[1]*dy)
            ys.insert(j+2, ys[j] + second_turn[1]*dy)
            
            j += 3
    
    return np.array(ts), np.array(ys)

cartoon_t, cartoon_s = forgery_prices(5, True)
print("cartoon's signal is {} in length".format(len(cartoon_s)))
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'axisbg':'#EEEEEE', 'axisbelow':True})
ax.grid(color='w', linewidth=2, linestyle='solid')
ax.plot(cartoon_t, cartoon_s, color='b', alpha=0.4)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.show()


cartoon's signal is 730 in length

Interactive Example


In [3]:
interact(forgery, Multifractal=CheckboxWidget(value=True), Iterations=(0, 9, 1))


Out[3]:
<function __main__.forgery>

Run the Direct CWT

This process creates the beautiful and distinctive 'coronae'/arches. Each arch corresponds to a pair of points are relative maxima (in an absolute sense) regarding the local Holder exponent. The local Holder is a "roughness"/inverse-of-similarity measure.

My interpretation is that these points also denote a bifurcation in the signal itself, signalling a regime change. They denote points where to either side lay a region of higher similarity. Simply put, they denote the most dissimilar point(s) in a given region of space.

Instead of clustering by finding the most likely sets of points, WTMM clusters by finding the regions that would bound these sets. For example take:

[1,2,3,5,1,2,3,8,1,2,3,2,3]

If you were to cluster on this list, you would expect the clusters to look like:

[1,2,3] [5] [1,2,3] [8] [1,2,3,2,3]

Current 1D clustering algorithms attempt to find the sets of like points. However, I posit that the same result can be found via finding the most dissimilar set


In [4]:
%%time
sig = cartoon_s
sig_t = cartoon_t

w_coefs = perform_cwt(sig, width_step=0.25, max_scale=100)


CPU times: user 8.97 s, sys: 214 ms, total: 9.18 s
Wall time: 9.12 s

Some plots of the w_coef matrix


In [5]:
plt.figure(figsize=(14,10))
plt.pcolormesh(w_coefs, cmap='Greys')
plt.colorbar()
ax = plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.show()

plt.figure(figsize=(14,10))
plt.pcolormesh(w_coefs[:100,:100], cmap='Greys')
plt.colorbar()
ax = plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.show()


Create the bifucration skeleton


In [6]:
bifucations = skeletor(w_coefs, smallest_scale=1)

plt.figure(figsize=(14,10))
for n, (k, v) in enumerate(bifucations.iteritems()):
    rows, cols = zip(*v)
    plt.plot(cols, rows)
ax = plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.show()


Pick out a good sample ridgeline to protoype on

And store the ridgeline so that we can tinker with it later


In [7]:
which_ridgeline = 4

for i, key in enumerate(bifucations.iterkeys()):
    if i == which_ridgeline:
        ridgeline = bifucations[key]
        rows, cols = zip(*ridgeline)
        ridgeline = zip(rows, cols[::-1])[::-1]
        break
print(ridgeline)


[(0, 106), (1, 104), (2, 103), (3, 101), (4, 100), (5, 99), (6, 98), (7, 97), (8, 96), (9, 96), (10, 95), (11, 94), (12, 94), (13, 93), (14, 93), (15, 92), (16, 92), (18, 91), (19, 91), (20, 90), (22, 90), (23, 89), (24, 89), (25, 89), (26, 88), (27, 88), (29, 88), (30, 87), (31, 87), (32, 87), (33, 87), (34, 86), (35, 86), (36, 86), (37, 86), (38, 85), (39, 85), (40, 85), (41, 85), (42, 85), (43, 90), (44, 92), (45, 94), (46, 96), (47, 97), (48, 98), (49, 99), (50, 100), (51, 101), (52, 102), (53, 102), (54, 103), (55, 103), (56, 105), (57, 105), (58, 105), (59, 106), (60, 107), (61, 107), (62, 107), (63, 108), (64, 109), (65, 109), (66, 109), (67, 109), (68, 110), (69, 110), (70, 111), (71, 111), (72, 112), (73, 112), (74, 112), (75, 112), (76, 113), (77, 113), (78, 113), (79, 113), (81, 114), (82, 114), (83, 114), (84, 114), (86, 115), (87, 115), (88, 115), (89, 115), (90, 116), (91, 116), (92, 116), (93, 116), (94, 117), (95, 117), (96, 117), (97, 117), (98, 118), (99, 118), (100, 118), (101, 118), (102, 119), (103, 119), (104, 118), (105, 118), (106, 119), (107, 119), (108, 119), (109, 119), (110, 120), (111, 120), (112, 120), (113, 120), (114, 121), (115, 121), (116, 121), (117, 121), (118, 121), (119, 121), (120, 121), (121, 122), (122, 122), (123, 122), (124, 122), (125, 122), (126, 122), (127, 122), (128, 122), (129, 123), (130, 123), (131, 123), (133, 123), (134, 123), (135, 123), (137, 123), (138, 124), (139, 124), (140, 124), (141, 124), (142, 124), (143, 124), (144, 124), (146, 125), (147, 125), (148, 125), (149, 125), (150, 125), (151, 125), (152, 125), (153, 125), (154, 126), (155, 126), (156, 126), (157, 126), (158, 126), (159, 126), (160, 126), (161, 127), (162, 127), (163, 126), (164, 126), (165, 127), (166, 127), (167, 127), (168, 127), (169, 127), (170, 127), (171, 127), (172, 127), (173, 128), (174, 128), (175, 127), (176, 127), (177, 128), (178, 128), (179, 128), (180, 128), (181, 128), (182, 128), (183, 128), (184, 128), (186, 129), (187, 128), (188, 128), (189, 128), (190, 129), (191, 129), (192, 128), (193, 128), (194, 129), (195, 129), (196, 129), (197, 129), (198, 129), (199, 129), (201, 129), (202, 129), (203, 129), (204, 129), (205, 129), (206, 129), (207, 129), (209, 129), (210, 129), (211, 129), (212, 129), (213, 130), (214, 129), (215, 129), (216, 129), (217, 130), (218, 129), (219, 129), (220, 129), (221, 130), (222, 129), (223, 129), (224, 129), (226, 130), (227, 129), (228, 129), (229, 129), (230, 130), (231, 129), (232, 129), (233, 129), (234, 130), (235, 129), (236, 129), (237, 129), (238, 130), (239, 129), (240, 129), (241, 129), (242, 130), (243, 129), (244, 129), (245, 129), (246, 130), (247, 129), (248, 129), (249, 129), (250, 130), (251, 130), (252, 129), (253, 129), (254, 130), (255, 130), (256, 130), (257, 130), (258, 130), (259, 130), (260, 130), (261, 130), (262, 130), (263, 130), (264, 131), (265, 131), (266, 131), (267, 131), (268, 133), (269, 133), (270, 133), (271, 133), (272, 135), (273, 135), (274, 136), (275, 136), (276, 138), (277, 138), (278, 138), (279, 139), (280, 140), (281, 140), (282, 140), (283, 141), (284, 142), (285, 142), (286, 142), (287, 142), (288, 143), (289, 143), (290, 144), (291, 144), (292, 145), (293, 145), (294, 145), (295, 145), (296, 146), (297, 146), (298, 145), (299, 145), (300, 146), (301, 146), (302, 146), (303, 146), (304, 147), (305, 147), (306, 147), (307, 147), (308, 147), (309, 147), (310, 147), (311, 147), (312, 147), (313, 147), (314, 145), (315, 145), (316, 144), (317, 143), (318, 143), (319, 142), (320, 142), (321, 142), (322, 142), (323, 141), (324, 141), (325, 142), (326, 142), (327, 142), (328, 142), (329, 142), (330, 142), (331, 142), (332, 142), (333, 143), (334, 143), (335, 143), (336, 143), (337, 144)]

In [8]:
# Store the ridgeline for future use

ridgeline = [(0, 106), (1, 104), (2, 103), (3, 101), (4, 100), (5, 99), (6, 98), (7, 97), (8, 96), (9, 96), (10, 95), (11, 94), (12, 94), (13, 93), (14, 93), (15, 92), (16, 92), (18, 91), (19, 91), (20, 90), (22, 90), (23, 89), (24, 89), (25, 89), (26, 88), (27, 88), (29, 88), (30, 87), (31, 87), (32, 87), (33, 87), (34, 86), (35, 86), (36, 86), (37, 86), (38, 85), (39, 85), (40, 85), (41, 85), (42, 85), (43, 90), (44, 92), (45, 94), (46, 96), (47, 97), (48, 98), (49, 99), (50, 100), (51, 101), (52, 102), (53, 102), (54, 103), (55, 103), (56, 105), (57, 105), (58, 105), (59, 106), (60, 107), (61, 107), (62, 107), (63, 108), (64, 109), (65, 109), (66, 109), (67, 109), (68, 110), (69, 110), (70, 111), (71, 111), (72, 112), (73, 112), (74, 112), (75, 112), (76, 113), (77, 113), (78, 113), (79, 113), (81, 114), (82, 114), (83, 114), (84, 114), (86, 115), (87, 115), (88, 115), (89, 115), (90, 116), (91, 116), (92, 116), (93, 116), (94, 117), (95, 117), (96, 117), (97, 117), (98, 118), (99, 118), (100, 118), (101, 118), (102, 119), (103, 119), (104, 118), (105, 118), (106, 119), (107, 119), (108, 119), (109, 119), (110, 120), (111, 120), (112, 120), (113, 120), (114, 121), (115, 121), (116, 121), (117, 121), (118, 121), (119, 121), (120, 121), (121, 122), (122, 122), (123, 122), (124, 122), (125, 122), (126, 122), (127, 122), (128, 122), (129, 123), (130, 123), (131, 123), (133, 123), (134, 123), (135, 123), (137, 123), (138, 124), (139, 124), (140, 124), (141, 124), (142, 124), (143, 124), (144, 124), (146, 125), (147, 125), (148, 125), (149, 125), (150, 125), (151, 125), (152, 125), (153, 125), (154, 126), (155, 126), (156, 126), (157, 126), (158, 126), (159, 126), (160, 126), (161, 127), (162, 127), (163, 126), (164, 126), (165, 127), (166, 127), (167, 127), (168, 127), (169, 127), (170, 127), (171, 127), (172, 127), (173, 128), (174, 128), (175, 127), (176, 127), (177, 128), (178, 128), (179, 128), (180, 128), (181, 128), (182, 128), (183, 128), (184, 128), (186, 129), (187, 128), (188, 128), (189, 128), (190, 129), (191, 129), (192, 128), (193, 128), (194, 129), (195, 129), (196, 129), (197, 129), (198, 129), (199, 129), (201, 129), (202, 129), (203, 129), (204, 129), (205, 129), (206, 129), (207, 129), (209, 129), (210, 129), (211, 129), (212, 129), (213, 130), (214, 129), (215, 129), (216, 129), (217, 130), (218, 129), (219, 129), (220, 129), (221, 130), (222, 129), (223, 129), (224, 129), (226, 130), (227, 129), (228, 129), (229, 129), (230, 130), (231, 129), (232, 129), (233, 129), (234, 130), (235, 129), (236, 129), (237, 129), (238, 130), (239, 129), (240, 129), (241, 129), (242, 130), (243, 129), (244, 129), (245, 129), (246, 130), (247, 129), (248, 129), (249, 129), (250, 130), (251, 130), (252, 129), (253, 129), (254, 130), (255, 130), (256, 130), (257, 130), (258, 130), (259, 130), (260, 130), (261, 130), (262, 130), (263, 130), (264, 131), (265, 131), (266, 131), (267, 131), (268, 133), (269, 133), (270, 133), (271, 133), (272, 135), (273, 135), (274, 136), (275, 136), (276, 138), (277, 138), (278, 138), (279, 139), (280, 140), (281, 140), (282, 140), (283, 141), (284, 142), (285, 142), (286, 142), (287, 142), (288, 143), (289, 143), (290, 144), (291, 144), (292, 145), (293, 145), (294, 145), (295, 145), (296, 146), (297, 146), (298, 145), (299, 145), (300, 146), (301, 146), (302, 146), (303, 146), (304, 147), (305, 147), (306, 147), (307, 147), (308, 147), (309, 147), (310, 147), (311, 147), (312, 147), (313, 147), (314, 145), (315, 145), (316, 144), (317, 143), (318, 143), (319, 142), (320, 142), (321, 142), (322, 142), (323, 141), (324, 141), (325, 142), (326, 142), (327, 142), (328, 142), (329, 142), (330, 142), (331, 142), (332, 142), (333, 143), (334, 143), (335, 143), (336, 143), (337, 144)]

Plot the ridgeline so we can see what we're working with


In [9]:
plt.figure(figsize=(14,10))
xs, ys = zip(*ridgeline)
plt.plot(xs, ys)
ax = plt.gca()
ax.set_ylim(ax.get_ylim())
ax.xaxis.tick_top()
plt.show()


Let's do some research

As a sanity check to see what the first and second derivative graphs look like

First Derivative Graph


In [10]:
def get_derivative_points(xy0, xy1):
    x0, y0 = xy0
    x1, y1 = xy1
    return (x0, (y0 - y1)/(x0 - x1))

xys = ridgeline
firsts = [get_derivative_points(xys[i], xys[i+1]) for i in range(0, len(xys) - 1)]

plt.figure(figsize=(14,10))
xs, ys = zip(*firsts)
plt.plot(xs, ys)
ax = plt.gca()
ax.set_ylim(ax.get_ylim())
ax.xaxis.tick_top()
plt.show()


Second Derivative Graph


In [11]:
seconds = []

for i in range(0, len(firsts) - 1):
    x = firsts[i][0]
    y = (firsts[i][1] - firsts[i+1][1])/(firsts[i][0] - firsts[i+1][0])
    seconds.append((x, y)) 

plt.figure(figsize=(14,10))
xs, ys = zip(*seconds)
plt.plot(xs, ys)
ax = plt.gca()
ax.set_ylim(ax.get_ylim())
ax.xaxis.tick_top()
plt.show()


Final sanity check (part 1)

Seems to be working, their is a distinct peak in the graph. When we print the data behind the graph we can visually find the x-coord cooresponding to the peak. In this case, the tail and body meet at x = 41.


In [12]:
seconds[30:50]


Out[12]:
[(33, 1.0),
 (34, -0.0),
 (35, -0.0),
 (36, -1.0),
 (37, 1.0),
 (38, -0.0),
 (39, -0.0),
 (40, -0.0),
 (41, 5.0),
 (42, -3.0),
 (43, -0.0),
 (44, -0.0),
 (45, -1.0),
 (46, -0.0),
 (47, -0.0),
 (48, -0.0),
 (49, -0.0),
 (50, -0.0),
 (51, -1.0),
 (52, 1.0)]

Final sanity check (part 2)

Just to make sure we have the tracking right, plot the graph after striping out any points where x <= 41


In [13]:
plt.figure(figsize=(14,10))
xs, ys = zip(*[i for i in ridgeline if i[0] > 41])
plt.plot(xs, ys)
ax = plt.gca()
ax.set_ylim(ax.get_ylim())
ax.xaxis.tick_top()
plt.show()