Example of finding the best pattern similarities using Dynamic Time Warping


In [5]:
import os.path
import numpy as np
import blaze
from blaze.ts.ucr_dtw import ucr

Conversion code from text files into the native Blaze format


In [6]:
# Convert a txt file into Blaze native format (unless already converted)
def convert(filetxt, storage):
    if not os.path.exists(storage):
        blaze.Array(np.loadtxt(filetxt),
                    params=blaze.params(storage=storage))

In [7]:
# Make sure that data is converted into a persistent Blaze array
convert("Data.txt", "Data")
convert("Query.txt", "Query")
convert("Query2.txt", "Query2")

Open the Blaze arrays on disk


In [8]:
# Open Blaze arrays on-disk (will not be loaded in memory)
data = blaze.open("Data")
query = blaze.open("Query")
query2 = blaze.open("Query2")
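
The convert-once, open-lazily pattern above does not depend on Blaze. As a rough analogue only, the sketch below does the same thing with plain NumPy: it caches the parsed text as a binary `.npy` file and then memory-maps it. This is a swapped-in technique, not the Blaze storage format, and the file names and the `convert_to_npy` helper are hypothetical.

```python
import os
import tempfile

import numpy as np


def convert_to_npy(filetxt, storage):
    """Parse a text file into a binary .npy file, unless that file already exists."""
    if not os.path.exists(storage):
        np.save(storage, np.loadtxt(filetxt))


# Hypothetical demo data written to a temporary directory.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "Demo.txt")
npy_path = os.path.join(workdir, "Demo.npy")
np.savetxt(txt_path, np.sin(np.linspace(0.0, 20.0, 1000)))

# The slow text parse happens only on the first call.
convert_to_npy(txt_path, npy_path)
convert_to_npy(txt_path, npy_path)  # no-op: the binary copy is already there

# mmap_mode="r" maps the file instead of reading it all into memory.
demo = np.load(npy_path, mmap_mode="r")

# Only the requested slice is copied into an ordinary in-memory array.
window = np.asarray(demo[100:228])
```

The design point is the same as in the Blaze cells: pay the text-parsing cost once, then read only the parts of the array that a computation touches.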

Find the best similarities using the DTW or ED (Euclidean Distance) algorithms


In [9]:
# Play with different methods & parameters here...
#%time loc, dist = ucr.ed(data, query, 128)
%time loc, dist = ucr.dtw(data, query, 0.1, 128, verbose=False)
#%time loc, dist = ucr.dtw(data, query2, 0.1, 128)


CPU times: user 0.56 s, sys: 0.01 s, total: 0.57 s
Wall time: 0.57 s

Notice that times here can be up to 4x faster than with the original code based on text files. Blaze format is fast to read!
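
For readers unfamiliar with the two measures, here is a minimal brute-force sketch of what a subsequence search with ED or DTW computes. It is not the `ucr` implementation: the UCR suite adds lower bounds and early abandoning to avoid most of this work. The sketch assumes that the `0.1` passed to `ucr.dtw` is the warping window as a fraction of the query length and that each window is z-normalized before comparison, as in the original UCR suite. The names `znorm`, `ed`, `dtw` and `best_match` are illustrative only.

```python
import numpy as np


def znorm(x):
    """Rescale a sequence to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def ed(a, b):
    """Euclidean distance between two equal-length sequences."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))


def dtw(a, b, r):
    """DTW distance restricted to a Sakoe-Chiba band of half-width r * len(a)."""
    n, m = len(a), len(b)
    w = max(int(r * n), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def best_match(data, query, r):
    """Slide the query over the data and return (location, distance) of the best window."""
    q = znorm(query)
    m = len(q)
    best_loc, best_dist = -1, np.inf
    for loc in range(len(data) - m + 1):
        dist = dtw(znorm(data[loc:loc + m]), q, r)
        if dist < best_dist:
            best_loc, best_dist = loc, dist
    return best_loc, best_dist


# Synthetic random walk; the query is a rescaled, shifted copy of data[120:152].
rng = np.random.default_rng(42)
data = np.cumsum(rng.standard_normal(300))
query = 2.0 * data[120:152] + 5.0

loc, dist = best_match(data, query, 0.1)
```

Because the straight diagonal alignment always lies inside the band, `dtw(a, b, r)` can never exceed `ed(a, b)` for equal-length inputs; DTW only ever finds an alignment that is at least as good.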


In [10]:
print "Location : ", loc
print "Distance : ", dist
print "Data Scanned : ", data.size


Location :  756562
Distance :  3.20559486181
Data Scanned :  1000000

Check that patterns are really similar


In [12]:
from pylab import plot
plot(data[loc:loc+128])


Out[12]:
[<matplotlib.lines.Line2D at 0x1083b7650>]

In [ ]:
plot(query[:])
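
The two raw plots can differ in offset and scale even when the match is good, if the search compares z-normalized windows (true of the original UCR suite, and assumed here for the Blaze port). A small sketch of normalizing both sequences before overlaying them, using synthetic data and a hypothetical `znorm` helper:

```python
import numpy as np


def znorm(x):
    """Rescale a sequence to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


# Synthetic stand-ins for the notebook's data, loc and query.
rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal(1000))
loc, m = 400, 128
query = 3.0 * data[loc:loc + m] + 10.0  # same shape, different offset and scale

match = znorm(data[loc:loc + m])
target = znorm(query)

# Largest pointwise gap before and after normalization.
raw_gap = float(np.max(np.abs(data[loc:loc + m] - query)))
norm_gap = float(np.max(np.abs(match - target)))
```

Passing `match` and `target` to `plot` in the same cell draws both curves on one set of axes, which makes the similarity easier to judge than two separate figures.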
