• BTree (not included in Exam 3 during SP2022)

    • Insertion

      There are two cases: after insertion the node still has at most m-1 keys, or it overflows. If insertion pushes the number of keys past m-1, split the node.

      1. Normal Insertion

        see root, is our number <, in the middle or > than root? recurse to that subtree.

        continue until you reach the correct group of keys

        insert element in the correct order in keys

        check to make sure # keys ≤ m-1

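
        The descent step above can be sketched as a small helper (hypothetical names; assumes a node's int keys are kept sorted in a vector):

        ```cpp
        #include <cassert>
        #include <vector>

        // Given a node's sorted keys, return the index of the child subtree
        // to recurse into for a target value: child i sits "between"
        // keys[i-1] and keys[i].
        std::size_t childIndex(const std::vector<int>& keys, int value) {
            std::size_t i = 0;
            while (i < keys.size() && value > keys[i]) ++i;
            return i;
        }

        int main() {
            std::vector<int> keys = {10, 20};
            assert(childIndex(keys, 5) == 0);    // less than all keys: leftmost child
            assert(childIndex(keys, 15) == 1);   // between 10 and 20: middle child
            assert(childIndex(keys, 25) == 2);   // greater than all: rightmost child
        }
        ```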

      2. Splitting the Child

        see root, is our number <, in the middle or > than root? recurse to that subtree.

        continue until you reach the correct group of keys

        insert element in the correct order in keys

        check to make sure # keys ≤ m-1

        if # keys $>$ m-1, split the overfull node:

        take the middle key and throw it up into the parent.

        if the parent is now full too:

        split the parent

        and so on up the tree (splitting the root adds a new level).

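
        A minimal sketch of the split itself, assuming the overfull node's keys sit in a sorted std::vector (the names SplitResult and splitKeys are hypothetical):

        ```cpp
        #include <cassert>
        #include <vector>

        // Split an overfull key list around its middle key. The middle key is
        // "thrown up" to the parent; the halves become sibling nodes.
        struct SplitResult {
            int middle;                 // key promoted to the parent
            std::vector<int> left;      // keys < middle
            std::vector<int> right;     // keys > middle
        };

        SplitResult splitKeys(const std::vector<int>& keys) {
            auto mid = keys.size() / 2;
            return { keys[mid],
                     std::vector<int>(keys.begin(), keys.begin() + mid),
                     std::vector<int>(keys.begin() + mid + 1, keys.end()) };
        }

        int main() {
            // Order m = 4: at most m-1 = 3 keys per node, so 4 keys overflow.
            SplitResult s = splitKeys({2, 5, 8, 11});
            assert(s.middle == 8);
            assert(s.left == std::vector<int>({2, 5}));
            assert(s.right == std::vector<int>({11}));
        }
        ```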

    • Difference between a key and a node in a BTree

    at most M-1 keys in each node, and each node points to at most M children

    • The order of a BTree

    M is the "order" of a BTree. The max number of keys per node is M-1 and the max number of children is M.

    If a leaf holds x keys, then x ≤ M-1 and (since non-root nodes must be at least half full) x ≥ ⌈M/2⌉-1, so x+1 ≤ M ≤ 2x+2

    • Structural properties of a BTree
      • Keys within a node are stored in sorted order.
      • Everything in the left subtree is less than the left key.
      • Everything in the right subtree is greater than the right key.
      • The middle subtree holds the data between the left and right keys.


  • Hash Table

    • Hash functions

      Converts a key to an integer that is used as the index into the hash table

    • Properties of a good hash function

      A good hash function is deterministic, fast to compute, and spreads keys uniformly across the table. A Perfect Hash Function is a one-to-one function: for each input, it produces a unique output.
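
      As an illustration, a common (non-perfect) hash for strings is a polynomial rolling hash; the multiplier 31 is a conventional choice, not a requirement:

      ```cpp
      #include <cassert>
      #include <string>

      // Polynomial rolling hash: fold each character into the running value,
      // then map the integer into a valid table index with a modulo.
      unsigned hashString(const std::string& key, unsigned tableSize) {
          unsigned h = 0;
          for (char c : key)
              h = h * 31 + static_cast<unsigned char>(c);
          return h % tableSize;
      }

      int main() {
          unsigned i = hashString("hello", 13);
          assert(i < 13);                           // always a valid index
          assert(i == hashString("hello", 13));     // deterministic
      }
      ```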

    • SUHA

      Simple Uniform Hashing Assumption: each key is equally likely to hash into any slot, independently of where other keys hash, so every slot ends up with about the same number of elements.

    • Load Factor

      α = n/M, where n is the number of inserted elements and M is the size of the array. This factor tells us when we have to resize the array and rehash.

      • open hashing can have a load factor that exceeds 1, but for closed hashing the load factor must stay in [0, 1]
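
      A tiny sketch of the load-factor check (the 0.7 threshold is a typical choice for closed hashing, not a fixed rule):

      ```cpp
      #include <cassert>

      // Load factor alpha = n / M: inserted elements over table capacity.
      double loadFactor(int n, int M) { return static_cast<double>(n) / M; }

      // Hypothetical resize policy: grow once alpha crosses a threshold.
      bool needsResize(int n, int M, double threshold = 0.7) {
          return loadFactor(n, M) >= threshold;
      }

      int main() {
          assert(!needsResize(6, 10));   // alpha = 0.6: still fine
          assert(needsResize(8, 10));    // alpha = 0.8: time to grow + rehash
      }
      ```
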
    • Hash Table Collisions

      When items with two different keys result in the same hash (and would theoretically be inserted into the same place in the hash table)

    • Open Hashing vs. Closed Hashing

      Open hashing stores colliding elements outside the array slot (e.g., in a chain), so many elements can share one index; closed hashing stores every element inside the array itself, so each slot holds at most one element.

    • Separate Chaining (ex of Open Hashing)

      Collision management strategy where each slot of the table holds a linked list of all the elements that hash to that index.

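
      A minimal separate-chaining sketch, assuming int keys and a simple modulo hash (all names are hypothetical):

      ```cpp
      #include <cassert>
      #include <list>
      #include <vector>

      // Each slot holds a list of all keys that hashed there, so a
      // collision just extends the chain.
      struct ChainedSet {
          std::vector<std::list<int>> slots;
          explicit ChainedSet(int n) : slots(n) {}
          int hash(int key) const { return key % static_cast<int>(slots.size()); }
          void insert(int key) { slots[hash(key)].push_front(key); }  // O(1)
          bool contains(int key) const {
              for (int k : slots[hash(key)])   // O(chain length)
                  if (k == key) return true;
              return false;
          }
      };

      int main() {
          ChainedSet s(5);
          s.insert(7);
          s.insert(12);   // 12 % 5 == 2 == 7 % 5: collides, shares the chain
          assert(s.contains(7) && s.contains(12));
          assert(!s.contains(2));
      }
      ```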

    • Linear Probing (ex of Closed Hashing)

      Collision management strategy where you advance to the next available slot in the hash table in the case of collision.

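
      A minimal linear-probing sketch, assuming int keys, a modulo hash, and -1 as the empty marker (names hypothetical):

      ```cpp
      #include <cassert>
      #include <vector>

      // On collision, step forward one slot (wrapping around) until an
      // empty slot is found.
      struct ProbingSet {
          std::vector<int> slots;
          explicit ProbingSet(int n) : slots(n, -1) {}
          void insert(int key) {
              int i = key % static_cast<int>(slots.size());
              while (slots[i] != -1)                      // probe until empty
                  i = (i + 1) % static_cast<int>(slots.size());
              slots[i] = key;
          }
          bool contains(int key) const {
              int i = key % static_cast<int>(slots.size());
              while (slots[i] != -1) {
                  if (slots[i] == key) return true;
                  i = (i + 1) % static_cast<int>(slots.size());
              }
              return false;
          }
      };

      int main() {
          ProbingSet s(7);
          s.insert(3);
          s.insert(10);            // 10 % 7 == 3: collides, lands in slot 4
          assert(s.slots[3] == 3 && s.slots[4] == 10);
          assert(s.contains(10) && !s.contains(17));
      }
      ```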

    • Double Hashing

      Collision management strategy where you use a second hash function (aka the step function) to offset the first one in case of collisions. The step size has to

      • never equal 0
      • cover all the elements of the array
        • array size would be prime or a relative prime to the range of the second hash function
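
      One common form of the step function is step(k) = R - (k mod R) for a prime R smaller than the (prime) table size; a sketch:

      ```cpp
      #include <cassert>

      // Second hash: never 0, and with a prime table size every step size
      // eventually visits all slots.
      int step(int key, int R = 7) { return R - (key % R); }

      // Probe sequence for a table of size N: h1, h1 + step, h1 + 2*step, ...
      int probe(int key, int attempt, int N) {
          return (key % N + attempt * step(key)) % N;
      }

      int main() {
          assert(step(14) == 7);          // never 0, even when key % R == 0
          assert(probe(5, 0, 11) == 5);   // first try: plain first hash
          assert(probe(5, 1, 11) == 7);   // collision: offset by step(5) == 2
      }
      ```
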
    • Re-Hashing

      When the load factor reaches its threshold, we double the size of the array and rehash every value into the new array. This keeps operation costs down and decreases collisions.

    • Runtime of a hash table

      • due to hash table collisions, the worst-case big O for most operations is O(n)
      • separate chaining's insert is O(1) even in the worst case
      • under SUHA (and with resizing amortized), average behavior is O(1) for all operations.
    • Load factor’s impact on running times

      Higher load factors decrease the space overhead (from having to double the table when there are too many values) but increase the lookup cost.

    • Properties of a good hash table

      • length should be a prime number or odd number
      • whatever data structure you use to implement the hash table should be easily resizable
  • Heaps

    • Construction of a Heap

      • complete binary tree with the smallest element at the root (for a minHeap)


      • children are larger than the parent
      • a binary tree T is a minHeap if:
        • T = {} (empty), or
        • T = ${r, T_l, T_r}$ where $T_l, T_r$ are minHeaps and r is less than or equal to the roots of $T_l$ and $T_r$.
      • Then you map the tree onto an array/vector with the level order traversal of the tree.
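
      With that 1-indexed level-order layout (index 0 unused), parent/child relationships are pure index arithmetic:

      ```cpp
      #include <cassert>

      // Level-order array layout starting at index 1, as in the heap code
      // in these notes.
      unsigned parent(unsigned i) { return i / 2; }
      unsigned leftChild(unsigned i) { return 2 * i; }
      unsigned rightChild(unsigned i) { return 2 * i + 1; }

      int main() {
          // e.g. array [_, 4, 5, 6, 15, 9, 7]: node 5 sits at index 2,
          // its children 15 and 9 at indices 4 and 5, its parent 4 at index 1.
          assert(parent(2) == 1);
          assert(leftChild(2) == 4 && rightChild(2) == 5);
      }
      ```
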
    • Heap runtime

      Insertion: O(lg n)

      BuildHeap O(n)
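
      A sketch of the O(n) buildHeap, assuming a 1-indexed int array (index 0 unused): heapify down every internal node, from the last parent back to the root:

      ```cpp
      #include <cassert>
      #include <utility>
      #include <vector>

      // Sink a[i] until both children are larger (min-heap order).
      void heapifyDown(std::vector<int>& a, std::size_t n, std::size_t i) {
          std::size_t smallest = i;
          if (2 * i <= n && a[2 * i] < a[smallest]) smallest = 2 * i;
          if (2 * i + 1 <= n && a[2 * i + 1] < a[smallest]) smallest = 2 * i + 1;
          if (smallest != i) {
              std::swap(a[i], a[smallest]);
              heapifyDown(a, n, smallest);
          }
      }

      // Leaves are already heaps, so start at the last parent (n / 2).
      void buildHeap(std::vector<int>& a, std::size_t n) {
          for (std::size_t i = n / 2; i >= 1; --i)
              heapifyDown(a, n, i);
      }

      int main() {
          std::vector<int> a = {0, 9, 4, 7, 1, 3};   // index 0 unused
          buildHeap(a, 5);
          assert(a[1] == 1);                          // min ends up at the root
      }
      ```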

    • HeapifyUp

      starts from the inserted node, assumes that the heap is valid below that node. If the curr element is not the root and smaller than the parent, swap the curr element and the parent and continue heapify up on the parent

      Runtime: takes O(h) or O(log n) time since the tree is complete.

      Insertion Runtime: growArray is amortized constant and each comparison/swap is constant, so the total runtime is O(log n)

      template <class T>
      void Heap<T>::_heapifyUp(unsigned index) {
          // index 1 is the root, so stop there
          if (index > 1) {
              if (item_[index] < item_[parent(index)]) {
                  std::swap(item_[index], item_[parent(index)]);
                  _heapifyUp(parent(index));
              }
          }
      }
      
    • HeapifyDown

      When we remove the top element from the heap, we replace it with the last element in the array. Then we recurse down the heap, swapping the current element with its smaller child whenever that child is smaller, to restore the min-heap property.

      runtime: O(log n)

      template <class T>
      void Heap<T>::_heapifyDown(int index) {
          if (!_isLeaf(index)) {
              int minChildIndex = _minChild(index);
              if (item_[minChildIndex] < item_[index]) {
                  std::swap(item_[index], item_[minChildIndex]);
                  _heapifyDown(minChildIndex);
              }
          }
      }
      
      template <class T>
      T Heap<T>::_removeMin() {
          // swap the root with the last value
          T minValue = item_[1];
          item_[1] = item_[size_];
          size_--;
          // restore the heap property
          _heapifyDown(1);
          // return the min value
          return minValue;
      }
      
    • Array Representation

      • see the construction part above
    • Priority queue ADT

      • insert
      • remove: pops out element with predetermined priority (like the largest or smallest value). you cannot tell it what to remove. (similar to stack/queue)
      • isEmpty

      stores ordered data

      must implement < operator
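
      The standard library already provides this ADT as std::priority_queue; std::greater turns it into a min-priority queue, and the stored type only needs a comparison operator:

      ```cpp
      #include <cassert>
      #include <functional>
      #include <queue>
      #include <vector>

      // Pop every element; the queue fixes the order (smallest first here),
      // we cannot tell it what to remove.
      std::vector<int> drain(const std::vector<int>& values) {
          std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
          for (int v : values) pq.push(v);      // insert
          std::vector<int> out;
          while (!pq.empty()) {                 // isEmpty
              out.push_back(pq.top());
              pq.pop();                         // remove: always the min
          }
          return out;
      }

      int main() {
          assert(drain({5, 1, 3}) == std::vector<int>({1, 3, 5}));
      }
      ```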

  • Disjoint Sets

    each element exists in exactly one set.

    ADT: make set, union and find

    • Union

      point the set (or UpTree) with fewer nodes at the one that has more nodes. (The code below is the naive union: it just links root r2 under root r1.)

      // union is a reserved keyword in C++, so the method needs another name
      void DisjointSets::setUnion(int r1, int r2) {
          s[r2] = r1;
      }
      
    • Smart Union (Size and Height)

      Size: the metadata at the root is the negative of the number of nodes in the tree. Union points the root of the smaller tree at the root of the larger one, which keeps trees shallow.

      Height: the metadata at the root is -(height + 1). Union points the shorter tree at the taller one, which minimizes the number of nodes whose height increases.

      //too lazy to write the pseudocode but it's in the lecture notes :)
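
      A hedged sketch of union-by-size on the array representation (roots store -size; this mirrors the lecture-notes idea, but the exact names here are mine):

      ```cpp
      #include <cassert>
      #include <utility>
      #include <vector>

      // Array-based UpTrees: s[i] is the parent index, or -(set size) at a root.
      struct UpTrees {
          std::vector<int> s;
          explicit UpTrees(int n) : s(n, -1) {}            // n singleton sets
          int find(int i) { return s[i] < 0 ? i : find(s[i]); }
          void unionBySize(int a, int b) {
              int r1 = find(a), r2 = find(b);
              if (r1 == r2) return;
              if (s[r1] > s[r2]) std::swap(r1, r2);        // make r1 the larger set
              s[r1] += s[r2];                              // sizes accumulate
              s[r2] = r1;                                  // smaller points at larger
          }
      };

      int main() {
          UpTrees u(4);
          u.unionBySize(0, 1);       // {0,1} {2} {3}
          u.unionBySize(2, 3);       // {0,1} {2,3}
          u.unionBySize(1, 3);       // {0,1,2,3}
          assert(u.find(3) == u.find(0));
          assert(u.s[u.find(0)] == -4);                    // root stores -size
      }
      ```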

    • Find

      the array value is negative if we have found the representative (root) element, or the index of the parent if we haven't. Follow the parent indices up the tree and return the index of the representative.

      int DisjointSets::find(int i) {
          if (s[i] < 0) { return i; }
          else { return find(s[i]); }
      }
      
    • Path Compression

      Every time we find an element, we point every node along the path directly at the representative of the subset. The next time we call find on any of those nodes, the runtime is shorter.

      Any sequence of m union and find operations has a worst-case running time of O(m lg* n) (average cost per operation is O(lg* n)), where n is the number of items in the Disjoint Sets.
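
      Find with path compression can be sketched in a few lines, assuming the array representation described above (a negative value marks a root):

      ```cpp
      #include <cassert>
      #include <vector>

      // After locating the root, re-point every node visited on the way
      // directly at it, flattening the tree for future finds.
      int findCompress(std::vector<int>& s, int i) {
          if (s[i] < 0) return i;                  // negative value marks the root
          return s[i] = findCompress(s, s[i]);     // point i straight at the root
      }

      int main() {
          // A chain 3 -> 2 -> 1 -> 0, with 0 as root (size 4 stored as -4).
          std::vector<int> s = {-4, 0, 1, 2};
          assert(findCompress(s, 3) == 0);
          assert(s[3] == 0 && s[2] == 0);          // the whole path now points at 0
      }
      ```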

    • Array Representation

      • Array Based (Cache-optimized)
        • Sorted Array
          • good Binary Search O(lg n)
          • terrible insert O(n)
        • Unsorted Array
          • good insert O(1), bad search O(n)
          • Stacks, Queues, providing limited ordering
          • Hashing, Heaps, Priority Queues, UpTrees (Disjoint Set).