• BTree (not included in Exam 3 during SP2022)

    • Insertion

      There are two cases: after insertion the node still has at most m-1 keys, or it overflows. If insertion pushes the number of keys past m-1, split the node.

      1. Normal Insertion

        see root, is our number <, in the middle or > than root? recurse to that subtree.

        continue until you reach the correct group of keys

        insert element in the correct order in keys

        check to make sure # keys ≤ m-1

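
        The descent step above can be sketched as a small helper (hypothetical names; assumes a node's int keys are kept sorted in a vector):

        ```cpp
        #include <cassert>
        #include <vector>

        // Given a node's sorted keys, return the index of the child subtree
        // to recurse into for a target value: child i sits "between"
        // keys[i-1] and keys[i].
        std::size_t childIndex(const std::vector<int>& keys, int value) {
            std::size_t i = 0;
            while (i < keys.size() && value > keys[i]) ++i;
            return i;
        }

        int main() {
            std::vector<int> keys = {10, 20};
            assert(childIndex(keys, 5) == 0);    // less than all keys: leftmost child
            assert(childIndex(keys, 15) == 1);   // between 10 and 20: middle child
            assert(childIndex(keys, 25) == 2);   // greater than all: rightmost child
        }
        ```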

      2. Splitting the Child

        see root, is our number <, in the middle or > than root? recurse to that subtree.

        continue until you reach the correct group of keys

        insert element in the correct order in keys

        check to make sure # keys ≤ m-1

        if # keys $>$ m-1, split the overfull node:

        take the middle key and throw it up into the parent.

        if the parent is now full too:

        split the parent

        and so on up the tree (splitting the root adds a new level).

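
        A minimal sketch of the split itself, assuming the overfull node's keys sit in a sorted std::vector (the names SplitResult and splitKeys are hypothetical):

        ```cpp
        #include <cassert>
        #include <vector>

        // Split an overfull key list around its middle key. The middle key is
        // "thrown up" to the parent; the halves become sibling nodes.
        struct SplitResult {
            int middle;                 // key promoted to the parent
            std::vector<int> left;      // keys < middle
            std::vector<int> right;     // keys > middle
        };

        SplitResult splitKeys(const std::vector<int>& keys) {
            auto mid = keys.size() / 2;
            return { keys[mid],
                     std::vector<int>(keys.begin(), keys.begin() + mid),
                     std::vector<int>(keys.begin() + mid + 1, keys.end()) };
        }

        int main() {
            // Order m = 4: at most m-1 = 3 keys per node, so 4 keys overflow.
            SplitResult s = splitKeys({2, 5, 8, 11});
            assert(s.middle == 8);
            assert(s.left == std::vector<int>({2, 5}));
            assert(s.right == std::vector<int>({11}));
        }
        ```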

    • Difference between a key and a node in a BTree

    at most M-1 keys in each node, and each node points to at most M children

    • The order of a BTree

    M is the "order" of a BTree. The max number of keys per node is M-1 and the max number of children is M.

    If a leaf holds x keys, then x ≤ M-1 and (since non-root nodes must be at least half full) x ≥ ⌈M/2⌉-1, so x+1 ≤ M ≤ 2x+2

    • Structural properties of a BTree
      • Keys within a node are stored in sorted order.
      • Everything in the left subtree is less than the left key.
      • Everything in the right subtree is greater than the right key.
      • The middle subtree holds the data between the left and right keys.


  • Hash Table

    • Hash functions

      Converts a key to an integer that is used as the index into the hash table

    • Properties of a good hash function

      A good hash function is deterministic, fast to compute, and spreads keys uniformly across the table. A Perfect Hash Function is a one-to-one function: for each input, it produces a unique output.
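
      As an illustration, a common (non-perfect) hash for strings is a polynomial rolling hash; the multiplier 31 is a conventional choice, not a requirement:

      ```cpp
      #include <cassert>
      #include <string>

      // Polynomial rolling hash: fold each character into the running value,
      // then map the integer into a valid table index with a modulo.
      unsigned hashString(const std::string& key, unsigned tableSize) {
          unsigned h = 0;
          for (char c : key)
              h = h * 31 + static_cast<unsigned char>(c);
          return h % tableSize;
      }

      int main() {
          unsigned i = hashString("hello", 13);
          assert(i < 13);                           // always a valid index
          assert(i == hashString("hello", 13));     // deterministic
      }
      ```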

    • SUHA

      Simple Uniform Hashing Assumption: each key is equally likely to hash into any slot, independently of where other keys hash, so every slot ends up with about the same number of elements.

    • Load Factor

      α = n/M, where n is the number of inserted elements and M is the size of the array. This factor tells us when we have to resize the array and rehash.

      • open hashing can have a load factor that exceeds 1, but for closed hashing the load factor must stay in [0, 1]
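
      A tiny sketch of the load-factor check (the 0.7 threshold is a typical choice for closed hashing, not a fixed rule):

      ```cpp
      #include <cassert>

      // Load factor alpha = n / M: inserted elements over table capacity.
      double loadFactor(int n, int M) { return static_cast<double>(n) / M; }

      // Hypothetical resize policy: grow once alpha crosses a threshold.
      bool needsResize(int n, int M, double threshold = 0.7) {
          return loadFactor(n, M) >= threshold;
      }

      int main() {
          assert(!needsResize(6, 10));   // alpha = 0.6: still fine
          assert(needsResize(8, 10));    // alpha = 0.8: time to grow + rehash
      }
      ```
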
    • Hash Table Collisions

      When items with two different keys result in the same hash (and would theoretically be inserted into the same place in the hash table)

    • Open Hashing vs. Closed Hashing

      Open hashing stores colliding elements outside the array slot (e.g., in a chain), so many elements can share one index; closed hashing stores every element inside the array itself, so each slot holds at most one element.

    • Separate Chaining (ex of Open Hashing)

      Collision management strategy where each slot of the table holds a linked list of all the elements that hash to that index.

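
      A minimal separate-chaining sketch, assuming int keys and a simple modulo hash (all names are hypothetical):

      ```cpp
      #include <cassert>
      #include <list>
      #include <vector>

      // Each slot holds a list of all keys that hashed there, so a
      // collision just extends the chain.
      struct ChainedSet {
          std::vector<std::list<int>> slots;
          explicit ChainedSet(int n) : slots(n) {}
          int hash(int key) const { return key % static_cast<int>(slots.size()); }
          void insert(int key) { slots[hash(key)].push_front(key); }  // O(1)
          bool contains(int key) const {
              for (int k : slots[hash(key)])   // O(chain length)
                  if (k == key) return true;
              return false;
          }
      };

      int main() {
          ChainedSet s(5);
          s.insert(7);
          s.insert(12);   // 12 % 5 == 2 == 7 % 5: collides, shares the chain
          assert(s.contains(7) && s.contains(12));
          assert(!s.contains(2));
      }
      ```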

    • Linear Probing (ex of Closed Hashing)

      Collision management strategy where you advance to the next available slot in the hash table in the case of collision.

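
      A minimal linear-probing sketch, assuming int keys, a modulo hash, and -1 as the empty marker (names hypothetical):

      ```cpp
      #include <cassert>
      #include <vector>

      // On collision, step forward one slot (wrapping around) until an
      // empty slot is found.
      struct ProbingSet {
          std::vector<int> slots;
          explicit ProbingSet(int n) : slots(n, -1) {}
          void insert(int key) {
              int i = key % static_cast<int>(slots.size());
              while (slots[i] != -1)                      // probe until empty
                  i = (i + 1) % static_cast<int>(slots.size());
              slots[i] = key;
          }
          bool contains(int key) const {
              int i = key % static_cast<int>(slots.size());
              while (slots[i] != -1) {
                  if (slots[i] == key) return true;
                  i = (i + 1) % static_cast<int>(slots.size());
              }
              return false;
          }
      };

      int main() {
          ProbingSet s(7);
          s.insert(3);
          s.insert(10);            // 10 % 7 == 3: collides, lands in slot 4
          assert(s.slots[3] == 3 && s.slots[4] == 10);
          assert(s.contains(10) && !s.contains(17));
      }
      ```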

    • Double Hashing

      Collision management strategy where you use a second hash function (aka the step function) to offset the first one in case of collisions. The step size has to

      • never equal 0
      • cover all the elements of the array
        • array size would be prime or a relative prime to the range of the second hash function
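
      One common form of the step function is step(k) = R - (k mod R) for a prime R smaller than the (prime) table size; a sketch:

      ```cpp
      #include <cassert>

      // Second hash: never 0, and with a prime table size every step size
      // eventually visits all slots.
      int step(int key, int R = 7) { return R - (key % R); }

      // Probe sequence for a table of size N: h1, h1 + step, h1 + 2*step, ...
      int probe(int key, int attempt, int N) {
          return (key % N + attempt * step(key)) % N;
      }

      int main() {
          assert(step(14) == 7);          // never 0, even when key % R == 0
          assert(probe(5, 0, 11) == 5);   // first try: plain first hash
          assert(probe(5, 1, 11) == 7);   // collision: offset by step(5) == 2
      }
      ```
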
    • Re-Hashing

      When the load factor reaches its threshold, we double the size of the array and rehash every value into the new array. This keeps operation costs down and decreases collisions.

    • Runtime of a hash table

      • due to hash table collisions, the worst-case big O for most operations is O(n)
      • separate chaining's insert is O(1) even in the worst case
      • under SUHA (and with resizing amortized), average behavior is O(1) for all operations.
    • Load factor’s impact on running times

      Higher load factors decrease the space overhead (from having to double the table when there are too many values) but increase the lookup cost.

    • Properties of a good hash table

      • length should be a prime number or odd number
      • whatever data structure you use to implement the hash table should be easily resizable
  • Heaps

    • Construction of a Heap

      • complete binary tree with the smallest element at the root (for a minHeap)


      • children are larger than the parent
      • a binary tree T is a minHeap if:
        • T = {} (empty), or
        • T = ${r, T_l, T_r}$ where $T_l, T_r$ are minHeaps and r is less than or equal to the roots of $T_l$ and $T_r$.
      • Then you map the tree onto an array/vector with the level order traversal of the tree.
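
      With that 1-indexed level-order layout (index 0 unused), parent/child relationships are pure index arithmetic:

      ```cpp
      #include <cassert>

      // Level-order array layout starting at index 1, as in the heap code
      // in these notes.
      unsigned parent(unsigned i) { return i / 2; }
      unsigned leftChild(unsigned i) { return 2 * i; }
      unsigned rightChild(unsigned i) { return 2 * i + 1; }

      int main() {
          // e.g. array [_, 4, 5, 6, 15, 9, 7]: node 5 sits at index 2,
          // its children 15 and 9 at indices 4 and 5, its parent 4 at index 1.
          assert(parent(2) == 1);
          assert(leftChild(2) == 4 && rightChild(2) == 5);
      }
      ```
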
    • Heap runtime

      Insertion: O(lg n)

      BuildHeap O(n)
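
      A sketch of the O(n) buildHeap, assuming a 1-indexed int array (index 0 unused): heapify down every internal node, from the last parent back to the root:

      ```cpp
      #include <cassert>
      #include <utility>
      #include <vector>

      // Sink a[i] until both children are larger (min-heap order).
      void heapifyDown(std::vector<int>& a, std::size_t n, std::size_t i) {
          std::size_t smallest = i;
          if (2 * i <= n && a[2 * i] < a[smallest]) smallest = 2 * i;
          if (2 * i + 1 <= n && a[2 * i + 1] < a[smallest]) smallest = 2 * i + 1;
          if (smallest != i) {
              std::swap(a[i], a[smallest]);
              heapifyDown(a, n, smallest);
          }
      }

      // Leaves are already heaps, so start at the last parent (n / 2).
      void buildHeap(std::vector<int>& a, std::size_t n) {
          for (std::size_t i = n / 2; i >= 1; --i)
              heapifyDown(a, n, i);
      }

      int main() {
          std::vector<int> a = {0, 9, 4, 7, 1, 3};   // index 0 unused
          buildHeap(a, 5);
          assert(a[1] == 1);                          // min ends up at the root
      }
      ```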

    • HeapifyUp

      starts from the inserted node, assumes that the heap is valid below that node. If the curr element is not the root and smaller than the parent, swap the curr element and the parent and continue heapify up on the parent

      Runtime: takes O(h) or O(log n) time since the tree is complete.

      Insertion Runtime: growArray is amortized constant and each comparison/swap is constant, so the total runtime is O(log n)

      template <class T>
      void Heap<T>::_heapifyUp(unsigned index) {
          // index 1 is the root, so stop there
          if (index > 1) {
              if (item_[index] < item_[parent(index)]) {
                  std::swap(item_[index], item_[parent(index)]);
                  _heapifyUp(parent(index));
              }
          }
      }
      
    • HeapifyDown

      When we remove the top element from the heap, we replace it with the last element in the array. Then we recurse down the heap, swapping the current element with its smaller child whenever that child is smaller, to restore the min-heap property.

      runtime: O(log n)

      template <class T>
      void Heap<T>::_heapifyDown(int index) {
          if (!_isLeaf(index)) {
              int minChildIndex = _minChild(index);
              if (item_[minChildIndex] < item_[index]) {
                  std::swap(item_[index], item_[minChildIndex]);
                  _heapifyDown(minChildIndex);
              }
          }
      }
      
      template <class T>
      T Heap<T>::_removeMin() {
          // swap the root with the last value
          T minValue = item_[1];
          item_[1] = item_[size_];
          size_--;
          // restore the heap property
          _heapifyDown(1);
          // return the min value
          return minValue;
      }
      
    • Array Representation

      • see the construction part above
    • Priority queue ADT

      • insert
      • remove: pops out element with predetermined priority (like the largest or smallest value). you cannot tell it what to remove. (similar to stack/queue)
      • isEmpty

      stores ordered data

      must implement < operator
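
      The standard library already provides this ADT as std::priority_queue; std::greater turns it into a min-priority queue, and the stored type only needs a comparison operator:

      ```cpp
      #include <cassert>
      #include <functional>
      #include <queue>
      #include <vector>

      // Pop every element; the queue fixes the order (smallest first here),
      // we cannot tell it what to remove.
      std::vector<int> drain(const std::vector<int>& values) {
          std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
          for (int v : values) pq.push(v);      // insert
          std::vector<int> out;
          while (!pq.empty()) {                 // isEmpty
              out.push_back(pq.top());
              pq.pop();                         // remove: always the min
          }
          return out;
      }

      int main() {
          assert(drain({5, 1, 3}) == std::vector<int>({1, 3, 5}));
      }
      ```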

  • Disjoint Sets

    each element exists in exactly one set.

    ADT: make set, union and find

    • Union

      point the set (or UpTree) with fewer nodes at the one that has more nodes. (The code below is the naive union: it just links root r2 under root r1.)

      // union is a reserved keyword in C++, so the method needs another name
      void DisjointSets::setUnion(int r1, int r2) {
          s[r2] = r1;
      }
      
    • Smart Union (Size and Height)

      Size: the metadata at the root is the negative of the number of nodes in the tree. Union points the root of the smaller tree at the root of the larger one, which keeps trees shallow.

      Height: the metadata at the root is -(height + 1). Union points the shorter tree at the taller one, which minimizes the number of nodes whose height increases.

      //too lazy to write the pseudocode but it's in the lecture notes :)
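
      A hedged sketch of union-by-size on the array representation (roots store -size; this mirrors the lecture-notes idea, but the exact names here are mine):

      ```cpp
      #include <cassert>
      #include <utility>
      #include <vector>

      // Array-based UpTrees: s[i] is the parent index, or -(set size) at a root.
      struct UpTrees {
          std::vector<int> s;
          explicit UpTrees(int n) : s(n, -1) {}            // n singleton sets
          int find(int i) { return s[i] < 0 ? i : find(s[i]); }
          void unionBySize(int a, int b) {
              int r1 = find(a), r2 = find(b);
              if (r1 == r2) return;
              if (s[r1] > s[r2]) std::swap(r1, r2);        // make r1 the larger set
              s[r1] += s[r2];                              // sizes accumulate
              s[r2] = r1;                                  // smaller points at larger
          }
      };

      int main() {
          UpTrees u(4);
          u.unionBySize(0, 1);       // {0,1} {2} {3}
          u.unionBySize(2, 3);       // {0,1} {2,3}
          u.unionBySize(1, 3);       // {0,1,2,3}
          assert(u.find(3) == u.find(0));
          assert(u.s[u.find(0)] == -4);                    // root stores -size
      }
      ```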

    • Find

      the array value is negative if we have found the representative (root) element, or the index of the parent if we haven't. Follow the parent indices up the tree and return the index of the representative.

      int DisjointSets::find(int i) {
          if (s[i] < 0) { return i; }
          else { return find(s[i]); }
      }
      
    • Path Compression

      Every time we find an element, we point every node along the path directly at the representative of the subset. The next time we call find on any of those nodes, the runtime is shorter.

      Any sequence of m union and find operations has a worst-case running time of O(m lg* n) (average cost per operation is O(lg* n)), where n is the number of items in the Disjoint Sets.
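
      Find with path compression can be sketched in a few lines, assuming the array representation described above (a negative value marks a root):

      ```cpp
      #include <cassert>
      #include <vector>

      // After locating the root, re-point every node visited on the way
      // directly at it, flattening the tree for future finds.
      int findCompress(std::vector<int>& s, int i) {
          if (s[i] < 0) return i;                  // negative value marks the root
          return s[i] = findCompress(s, s[i]);     // point i straight at the root
      }

      int main() {
          // A chain 3 -> 2 -> 1 -> 0, with 0 as root (size 4 stored as -4).
          std::vector<int> s = {-4, 0, 1, 2};
          assert(findCompress(s, 3) == 0);
          assert(s[3] == 0 && s[2] == 0);          // the whole path now points at 0
      }
      ```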

    • Array Representation

      • Array Based (Cache-optimized)
        • Sorted Array
          • good Binary Search O(lg n)
          • terrible insert O(n)
        • Unsorted Array
          • good insert O(1), bad search O(n)
          • Stacks, Queues, providing limited ordering
          • Hashing, Heaps, Priority Queues, UpTrees (Disjoint Set).