Hash Tables

Hashing Idea

Uniformly map the keys

TableSize is prefered to be a prime number. (10007 is a prime number).

  • Hashing text keys:

$37^2 x_2^2 + 37 x_1 + x_0$ can be written as $37(37 x_2 + x_1) + x_0$

unsigned int hash(const String & key, int tablesize) {
    unsigned int hashval = 0;

    for (ch : key) 
        hashval = hashval * 37 + ch;

    return hashval % tablesize;
}

Collision

  • Placing more than one item into the same array location
  • Example: $h(4567) = h(7597) = 22$
    • both search keys 4567 and 7597 are placed into index 22

Resolving Collisions

Open-addressing

When inserting a new element reaches a filled array location:

  • Probe for another array location
  • Removal requires knowing the state of an address:
    • occupied, emptied, removed

Open addressing can be done :

  • linearly
  • quadratically
  • double hashing

Problem: It can create clustering. Double hashing can reduce clustering.

Linear Probing

Operations

  • Add
  • Remove

Quadratic probing

Different Types of Clustering:

  • Primary clustering: Groups of consecutively occupied locations Clusters can merge together and become larger Caused by linear-probing
  • Secondary clustering

In [ ]: