data-structures algorithms

Definition

Hash Table

A hash table (or hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index (or hash code) into an array of buckets or slots, from which the desired value can be found.

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ some form of collision resolution.

Collision Resolution

Separate Chaining

In this implementation, each cell of the array points to a singly linked list (or other dynamic collection) of records that have the same hash function value.

Open Addressing

In open addressing, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found.

Linear Probing

The interval between probes is fixed—often at 1.

Primary Clustering

Linear probing is subject to primary clustering, where long runs of occupied slots build up, increasing the average search time.

Quadratic Probing

The interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash.

where are properly chosen constants.

Secondary Clustering

Primary clustering is mitigated, but another phenomenon, secondary clustering, could occur (where keys that hash to the same initial position follow the same probe sequence).

Double Hashing

The interval between probes is computed by a second hash function.

with .

Choice of

For all keys , the probing sequence must reach all slots . This means that must hold and it must not divide . Usually, is chosen as a prime number, and is chosen independently of .

Brent’s Improvement

Idea: Whenever a key is inserted at a probed slot with (already occupied), check if moving to its next position in its own probe sequence is more beneficial than continuing to probe for .

Specifically, set:

If is occupied but is free, then move to to free the position for .

Benefit: The average-case runtime of a successful search is in worst case .

 def insert_brent(T: HashTable, k: Key) -> None:
	j = h1(k)
	while T[j].status == "used":
		k_prime = T[j].key
		j1 = (j + h2(k)) % m
		j2 = (j + h2(k_prime)) % m
		if T[j1].status != "used" or T[j2].status == "used":
			j = j1
		else:
			T[j] = k
			k = k_prime
			j = j2
	T[j] = k
	T[j].status = "used"

Complexity

The following table outlines the time complexity (assuming a good hash function and uniform hashing):

OperationAverage CaseWorst Case
Search
Insertion
Deletion