hashing vs array_equal

A quick test to see how the performance of hashing compares to np.array_equal.

The idea here is that we may want to compute and store a hash of the mesh so that we can quickly compare two meshes to see if they are equal.


In [1]:
import numpy as np
import hashlib

In [2]:
# create some test data:

arr1 = np.random.random_sample((3,10000))
arr2 = np.random.random_sample((3,10000))

In [3]:
% timeit np.array_equal(arr1, arr2)


10000 loops, best of 3: 28.6 µs per loop

In [4]:
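# do a hash version:
def hash_compare(a1, a2, hash_func=hashlib.sha1):
    # hash the raw bytes of each array and compare the digests
    # (parameter is named hash_func so it does not shadow the builtin hash())
    h1 = hash_func(a1.tobytes()).digest()
    h2 = hash_func(a2.tobytes()).digest()
    return h1 == h2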
    

print(hash_compare(arr1, arr2))
print(hash_compare(arr1, arr1.copy()))


False
True
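One caveat with hashing only `tobytes()`: the raw bytes carry no shape or dtype, so two arrays with different shapes but the same underlying data hash identically. A minimal sketch of one way around this, assuming a hypothetical helper `array_digest` (not part of the notebook above) that feeds the dtype and shape into the hash along with the bytes:

```python
import hashlib
import numpy as np


def array_digest(arr, hash_func=hashlib.sha1):
    """Digest of an array's dtype, shape, and raw bytes (hypothetical helper)."""
    h = hash_func()
    # include dtype and shape so arrays that differ only in layout do not collide
    h.update(str(arr.dtype).encode("ascii"))
    h.update(str(arr.shape).encode("ascii"))
    h.update(arr.tobytes())
    return h.digest()


a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = a.reshape(3, 2)  # same data, different shape

# hashing the bytes alone cannot tell a and b apart
same_bytes = hashlib.sha1(a.tobytes()).digest() == hashlib.sha1(b.tobytes()).digest()
# including shape and dtype distinguishes them
same_digest = array_digest(a) == array_digest(b)
```

Here `same_bytes` is True while `same_digest` is False, which matches what `np.array_equal(a, b)` reports for arrays of different shape.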

In [5]:
% timeit np.array_equal(arr1, arr2)


10000 loops, best of 3: 28.7 µs per loop

In [6]:
% timeit hash_compare(arr1, arr2)


1000 loops, best of 3: 700 µs per loop

In [7]:
% timeit hash_compare(arr1, arr2, hashlib.md5)


1000 loops, best of 3: 744 µs per loop

In [8]:
% timeit hash_compare(arr1, arr2, hashlib.sha256)


1000 loops, best of 3: 1.85 ms per loop

In [9]:
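def hash_compare_builtin(a1, a2):
    # same idea, using Python's builtin hash() on the bytes instead of hashlib
    # note: under Python 3 the builtin hash of bytes is salted per process,
    # so it is only comparable within a single run
    h1 = hash(a1.tobytes())
    h2 = hash(a2.tobytes())
    return h1 == h2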

In [10]:
% timeit hash_compare_builtin(arr1, arr2)


1000 loops, best of 3: 585 µs per loop

In [11]:
# ratio of builtin-hash compare time to np.array_equal time (both in microseconds)
585 / 28.7


Out[11]:
20.38327526132404
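The factor of about 20 is for hashing both arrays on every comparison. The original motivation was to store the hash, in which case the hashing cost is paid once per array and each later comparison only compares two short digests. A rough sketch of that pattern, assuming a hypothetical wrapper class `HashedArray` (the name and design are illustrative, and this variant was not timed here):

```python
import hashlib
import numpy as np


class HashedArray(object):
    """Hypothetical wrapper that hashes its array once, at construction."""

    def __init__(self, arr, hash_func=hashlib.sha1):
        self.arr = arr
        # computed once; goes stale if self.arr is later modified in place
        self.digest = hash_func(arr.tobytes()).digest()

    def same_as(self, other):
        # compares two short digests, independent of the array size
        return self.digest == other.digest


rng = np.random.RandomState(0)
m1 = HashedArray(rng.random_sample((3, 10000)))
m2 = HashedArray(rng.random_sample((3, 10000)))
m3 = HashedArray(m1.arr.copy())

differs = m1.same_as(m2)
matches = m1.same_as(m3)
```

Whether this beats calling `np.array_equal` directly depends on how often a mesh changes relative to how often it is compared, since every in-place change would require recomputing the digest.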
