Hash Map/Set is used for fast key-value pair storage and lookup. They are commonly implemented using arrays, combined with hashing.

Set Data Structures Comparison

| | Hash Set/Map | Trie | Sorted Set/Map: Binary Search Tree (BST) (If duplicates are not allowed) | Sorted Set/Map: Sorted Array (If duplicates are not allowed) | | --- | --- | --- | --- | --- | | In-Order Traversal | Not supported | Not supported | O(n) | O(n) | | Search | O(1) | O(m)

*m is the word length* | O(log n ~ n) | O(log n) (binary search) | | Insertion | O(1) | O(m)

*m is the word length* | O(log n ~ n) | O(n) (binary search + shifting) | | Deletion | O(1) | O(m)

*m is the word length* | O(log n ~ n) | O(n) (binary search + shifting) | | Note | | 1. For large dataset, it is space efficient due to sharing memory. 2. It supports prefix search. | | |

When to Use

The O(1) look up time is useful for:

  1. Hash Map saves references of Doubly Linked Lists (with unique values) to achieve random access and streamline operations.

The unique key is useful for:

  1. Collecting unique values (Hash Set) → 128. Longest Consecutive Sequence (Use Map Traversal trying to meet next value in the sequence), Sliding Windows Questions (If duplications are in the window)
  2. Finding other half of target value (Hash Map) → 1. Two Sum, 15. 3Sum (Find another half to reach the target)
  3. Counting value frequency (Hash Map) → 242.Valid Anagram, 76.Minimum Window Substring (Collect characters’ count of both strings and compare), 347.Top k Frequent Elements (Count unique values and assign to array to heapify or to bucket sort), Sliding Windows Questions (Count duplications in the window)
  4. Mixing 2. and 3. (Hash Map) → 560. Subarray Sum Equals K (Prefix) (To see how many subarrays in current subarray can be chopped off to reach the target.)

<aside> 💡

We can use additional n length of array as Map, Set, or Bucket, using its index as key, if the keys are integer value in the range of 0 to n-1(or to n).

<aside> 💡

When we are collection a combination of characters of a string, we can not use its combined Unicode points as a key. Because different group of characters may have the same total Unicode points. Instead, we use string itself or [26]int (an array counts a strings characters, allowing anagrams to have the same array) as key.

</aside>

Hashing

hash function is responsible for converting a key (such as a string) into an integer, which is used as an index to store the key-value pair in the underlying array. This process is sometimes referred to as pre-hashing. The same key will always produce the same integer.

  1. Convert the key into its ASCII codes.
  2. Sum the ASCII values.
  3. Apply the modulo operator (%) to map the sum to a valid index within the array, ensuring that no matter how large the hashValue is, the result will always be a valid index.
func hash(key string, capacity int) int {
	hashValue := 0
	for _, char := range key {
		hashValue += int(char)
	}
	return hashValue % capacity
}

Go's hash/fnv package offers the FNV-1a hash algorithm