For this assignment you will investigate hash function and hash chains. You may complete the assignment in any language. For a given hash function h, a hash chain is defined as follows:
$H_0=x$
$H_i=h(H_{i-1})\ for\ i\ge 1$
For the purpose of this assignment, our hash function is the last k bits of MD5. In other words, $h_k(x)$ = least significant k bits of MD5(x). Before completing the assignment, read Section 2.1.6 from the Handbook of Applied Crypto (available: http://www.cacr.math.uwaterloo.ca/hac/about/chap2.pdf).
(70 points) Write a computer program to compute the number of components, average/max tail length, min/average/max cycle length when we use $h_{16}(x)$. Your output should print these 6 numbers. Tail and cycle length are defined in HAC Definition 2.35. For the following figure, there are 2 components, 4 tails of lengths 3, 2, 1, and 1 for an average tail length of 1.75. The two cycles are length 4 and 3.
In [78]:
from hashlib import md5 #Hash source
from collections import defaultdict as dd #Dynamic Dictionary
import networkx as nx #Graph Theory
import time
from multiprocessing import Pool
DG=nx.DiGraph() #Main DiGraph
CE=[] #Cycle Elements
CL=[] #Cycle Lengths
TL=[] #Tail Lengths
#RevMap=dd()
def get_hash(name,size): #Hash function
temp=bin(int(md5(name).hexdigest(),16)%2**size)[2:]
temp='0'*(size-len(temp))+temp
return temp
def trymap(size): #Create Graph
for i in range(2**size):
name=bin(i)[2:]
temp=get_hash(name,size)
name="0"*(size-len(name))+name
DG.add_edge(int(name,2),int(temp,2))
#Return a path from start to the
#Beginning of a cycle if 0
#End of a cycle if 1
def findpath(start):
templist=[start]
temp=start
while not(DG.successors(temp)[0] in templist):
if (DG.successors(temp)[0] in CE):
return (templist,0)
temp=DG.successors(temp)[0]
templist+=[temp]
return (templist,1) #templist[-1] is the start of a cycle
def taillen(tailpath):
global CE
global CL
global TL
cycle=findpath(tailpath[-1])[0]
CE+=cycle #Add to Cycle Watchlist
CL+= [len(cycle)] #Add cycle length to
TL+= [len(tailpath)-len(cycle)]
def pathrunner(start):
global TL
temp=findpath(start)
if temp[1]:
taillen(temp[0])
else:
TL+=[len(temp[0])]
complexity=16
#pool = Pool(processes=(complexity/2))
timer=time.time()
trymap(complexity)
if complexity<=8:
nx.draw(DG)
pylab.show()
tails=[i for i in DG.pred.keys() if len(DG.pred[i])==0] #Nodes with no predeccesors
timer1=time.time()-timer
print "Tail-enumeration time: {0:.2}s".format(timer1)
#pool.map(pathrunner,tails)
map(pathrunner,tails)
#print "Cycle Elements: ",CE
print "Cycle Lengths: ",CL
#print "Tail Lengths: ",TL
print "Cycle Lengths: Max: {0}, Avg: {1:.3}, Min: {2}, Count: {3}".format(max(CL),average(CL),min(CL),len(CL))
print "Tail Lengths: Max: {0}, Avg: {1:.3}, Min: {2}, Count: {3}".format(max(TL),average(TL),min(TL),len(TL))
print "Total Components: {0}".format(DG.number_of_selfloops()+len(CL))
timer2=time.time()-timer1-timer
print "Time: {0:.6}s for {1} hashes, {2:.6} hashes per second".format(timer2,2**complexity,(2**complexity)/timer2)
#Clean memory since a lot is being stored
del DG #Main DiGraph
del CE #Cycle Elements
del CL #Cycle Lengths
del TL #Tail Lengths
The variable above labeled 'complexity' determines k in $h_k(x)$
For the purposes of chaining, keep track of an iteration variable i and add i to the input of each number before calculating the next value.
$H_i=h(H_{i-1}+i)\ for\ i\ge 1$
In [ ]: