Example of finding the best pattern similarities using Dynamic Time Warping



In [5]:

    
import os.path
import numpy as np
import blaze
from blaze.ts.ucr_dtw import ucr

Conversion code from text files into the native Blaze format



In [6]:

    
# Convert txt file into Blaze native format (if it is not yet)
def convert(filetxt, storage):
    if not os.path.exists(storage):
        blaze.Array(np.loadtxt(filetxt),
                    params=blaze.params(storage=storage))



In [7]:

    
# Make sure that data is converted into a persistent Blaze array
convert("Data.txt", "Data")
convert("Query.txt", "Query")
convert("Query2.txt", "Query2")

Open the Blaze arrays on disk



In [8]:

    
# Open Blaze arrays on-disk (will not be loaded in memory)
data = blaze.open("Data")
query = blaze.open("Query")
query2 = blaze.open("Query2")

Find the best similarities using the DTW or ED (Euclidean Distance) algorithms



In [9]:

    
# Play with different methods & parameters here...
#%time loc, dist = ucr.ed(data, query, 128)
%time loc, dist = ucr.dtw(data, query, 0.1, 128, verbose=False)
#%time loc, dist = ucr.dtw(data, query2, 0.1, 128)









    



CPU times: user 0.56 s, sys: 0.01 s, total: 0.57 s
Wall time: 0.57 s

Notice that times here can be up to 4x faster than the original code based on text files. The Blaze format is fast to read!
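
To make the `loc` and `dist` values returned by the search concrete, here is a minimal pure-Python sketch of a brute-force subsequence search under DTW. It is not the code behind `ucr.dtw`: the UCR Suite that the module is named after relies on lower bounds and early abandoning to skip most of this work. The helper names (`znorm`, `dtw_distance`, `best_match`) are hypothetical, and reading the arguments `0.1` and `128` as a warping-window fraction and a query length is an assumption suggested by the `data[loc:loc+128]` slice used in the plotting cell.

```python
import math

def znorm(seq):
    """Z-normalize a sequence (zero mean, unit standard deviation)."""
    n = len(seq)
    mean = sum(seq) / float(n)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / float(n))
    if std == 0.0:
        return [0.0] * n  # constant window: no shape to compare
    return [(x - mean) / std for x in seq]

def dtw_distance(a, b, window):
    """DTW distance between two equal-length sequences, with the warping
    path restricted to |i - j| <= window (a Sakoe-Chiba band)."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - window), min(n, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three neighbouring alignments
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[n])

def best_match(data, query, warp_ratio):
    """Slide the query over data and return (location, distance) of the
    window with the smallest DTW distance. Assumption: the band width
    is warp_ratio times the query length."""
    m = len(query)
    window = int(warp_ratio * m)
    q = znorm(query)
    best_loc, best_dist = -1, float("inf")
    for loc in range(len(data) - m + 1):
        d = dtw_distance(znorm(data[loc:loc + m]), q, window)
        if d < best_dist:
            best_loc, best_dist = loc, d
    return best_loc, best_dist
```

With `window` set to 0 the band collapses to the diagonal and `dtw_distance` reduces to the plain Euclidean distance, which is one way to see how the ED and DTW searches relate.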



In [10]:

    
print "Location : ", loc
print "Distance : ", dist
print "Data Scanned : ", data.size









    



Location :  756562
Distance :  3.20559486181
Data Scanned :  1000000

Check that patterns are really similar



In [12]:

    
from pylab import plot
plot(data[loc:loc+128])









    Out[12]:





[<matplotlib.lines.Line2D at 0x1083b7650>]



In [ ]:

    
plot(query[:])
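
The two plots can differ in offset and scale even when the shapes agree, so a numeric check on normalized values can complement the visual one. The sketch below z-normalizes both sequences and reports their Euclidean distance. The function name `normalized_ed` is hypothetical, and the assumption that the search itself compares z-normalized windows (as the original UCR Suite does) is not confirmed by this notebook.

```python
import math

def normalized_ed(window, query):
    """Euclidean distance between two equal-length sequences after
    z-normalizing each one (zero mean, unit standard deviation)."""
    def znorm(seq):
        n = len(seq)
        mean = sum(seq) / float(n)
        std = math.sqrt(sum((x - mean) ** 2 for x in seq) / float(n))
        if std == 0.0:
            return [0.0] * n  # constant sequence: no shape to compare
        return [(x - mean) / std for x in seq]

    w, q = znorm(window), znorm(query)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(w, q)))
```

In this notebook it could be called as `normalized_ed(list(data[loc:loc+128]), list(query[:]))`, assuming both slices yield plain sequences of numbers; a value close to zero indicates nearly identical shapes.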


