BTree (not included in Exam 3 during SP2022)
Insertion
There are two cases: the node has room (fewer than m-1 keys) or it is full. If insertion causes the number of keys to exceed m-1, split the child.
Normal Insertion
see the root: is our number less than, between, or greater than the root's keys? recurse into that subtree.
continue until you reach the correct leaf node
insert the element into the node's keys in sorted order
check that the number of keys is still at most m-1
Splitting the Child
see the root: is our number less than, between, or greater than the root's keys? recurse into that subtree.
continue until you reach the correct leaf node
insert the element into the node's keys in sorted order
if the number of keys is now > m-1, split the child:
take the middle element and push it up into the parent.
if the parent node is also full:
split the parent as well
repeat up the tree; if the root splits, the tree grows one level taller.
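The descent described above can be sketched with a hypothetical minimal node type (the real structure would also track the order m and handle the splitting itself):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal BTree node, for illustration only.
struct BTreeNode {
    std::vector<int> keys;             // sorted keys in this node
    std::vector<BTreeNode*> children;  // empty when this node is a leaf
};

// Walk down to the leaf where `key` belongs: at each node, find the first
// key greater than ours and recurse into the child at that index.
BTreeNode* findLeaf(BTreeNode* node, int key) {
    if (node->children.empty()) return node;  // reached a leaf
    size_t i = std::upper_bound(node->keys.begin(), node->keys.end(), key)
               - node->keys.begin();
    return findLeaf(node->children[i], key);
}
```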
Difference between a key and a node in a BTree
at MOST m-1 keys in each "node", which points to at MOST m nodes/children
m is the "order" of a BTree. The max number of keys per node is m-1 and the max number of children is m.
A node with x keys has exactly x+1 children; every non-root node must be at least half full (at least ⌈m/2⌉ - 1 keys).
Hash Table
Hash functions
Converts a key to an integer that is used as the index into the hash table's array
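A toy example of such a function, assuming string keys and a simple polynomial rolling hash (illustrative only, not a perfect hash):

```cpp
#include <cassert>
#include <string>

// A simple hash function for string keys: polynomial rolling hash,
// reduced modulo the table size to get an array index.
unsigned hashString(const std::string& key, unsigned tableSize) {
    unsigned h = 0;
    for (char c : key)
        h = h * 31 + static_cast<unsigned char>(c);  // mix each character in
    return h % tableSize;
}
```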
Properties of a good hash function
A Perfect Hash Function is a one-to-one function: every distinct input produces a unique output, so there are no collisions.
SUHA
Simple Uniform Hashing Assumption: any given key is equally likely to hash into any of the table's slots, independently of where other keys hash (uniform and independent). Under SUHA, each slot ends up with about the same number of elements.
Load Factor
α = n/M (a ratio, not a big-O bound), where n is the number of inserted elements and M is the size of the array. This factor tells us when we have to resize the array and rehash every element.
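As a quick sketch of the calculation (the function name is just for illustration):

```cpp
#include <cassert>

// Load factor alpha = n / M: the fraction of the table that is "full",
// e.g. 7 inserted elements in a table of size 10 gives alpha = 0.7.
double loadFactor(int n, int tableSize) {
    return static_cast<double>(n) / tableSize;
}
```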
Hash Table Collisions
When items with two different keys result in the same hash index (and would therefore be inserted into the same place in the hash table)
Open Hashing vs. Closed Hashing
Open Hashing means that multiple values can be stored at the same hash index (e.g., in a list), whereas in Closed Hashing each array slot holds at most one element, so colliding elements must be placed elsewhere in the table.
Separate Chaining (ex of Open Hashing)
Collision management strategy where each table slot holds a linked list of all elements that hash to that index.
Linear Probing (ex of Closed Hashing)
Collision management strategy where you advance to the next available slot in the hash table in the case of collision.
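A minimal sketch of linear probing under the same assumptions (int keys, key % size hash; deletion and resizing are not handled):

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Linear probing: on collision, step forward one slot at a time
// (wrapping around) until an empty slot is found.
struct ProbingHashTable {
    std::vector<std::optional<int>> table;  // empty optional = free slot
    explicit ProbingHashTable(size_t size) : table(size) {}

    void insert(int key) {
        size_t i = key % table.size();
        while (table[i].has_value())        // occupied: probe the next slot
            i = (i + 1) % table.size();
        table[i] = key;
    }
};
```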
Double Hashing
Collision management strategy where you use a second hash function (aka the step function) to offset the first one in case of collisions. The step function must never evaluate to 0, and the table size is typically prime so the probe sequence can reach every slot.
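A sketch of the resulting probe sequence, assuming the common step function h2(k) = 1 + (k mod (size - 1)):

```cpp
#include <cassert>

// Double hashing: the i-th probe for a key is
// (h1(k) + i * h2(k)) mod tableSize, where h2 is never 0.
unsigned probe(unsigned key, unsigned attempt, unsigned tableSize) {
    unsigned h1 = key % tableSize;
    unsigned h2 = 1 + (key % (tableSize - 1));  // step function, never 0
    return (h1 + attempt * h2) % tableSize;
}
```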
Re-Hashing
When the load factor reaches its threshold, we double the size of the array and rehash every value into the new array. This keeps operations fast and decreases collisions.
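A sketch of rehashing a separate-chaining table (a real implementation would typically grow to the next prime size rather than exactly doubling):

```cpp
#include <cassert>
#include <list>
#include <vector>

// Rehash: allocate a table twice the size and re-insert every key,
// since the new modulus generally sends keys to different slots.
std::vector<std::list<int>> rehash(const std::vector<std::list<int>>& old) {
    std::vector<std::list<int>> bigger(old.size() * 2);
    for (const auto& chain : old)
        for (int key : chain)
            bigger[key % bigger.size()].push_front(key);
    return bigger;
}
```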
Runtime of a hash table
Under SUHA, insert/find/remove take O(1) expected time; the worst case is O(n).
Load factor’s impact on running times
Higher load factors decrease the space overhead (fewer empty slots and fewer doublings of the table) but increase the expected lookup cost.
Properties of a good hash table
A good hash function, a load factor kept below a threshold (resize when it is exceeded), and a table size that suits the collision strategy (e.g., prime for double hashing).
Heaps
Construction of a Heap
buildHeap: call heapifyDown on every non-leaf node, from index n/2 down to the root.
Heap runtime
Insertion: O(lg n)
BuildHeap O(n)
HeapifyUp
starts from the inserted node and assumes the heap is valid below it. If the current element is not the root and is smaller than its parent (in a min-heap), swap it with the parent and continue heapifying up from the parent's position.
Runtime: takes O(h) or O(log n) time since the tree is complete.
Insertion Runtime: growing the array (amortized) and placing the element both take constant time, so the total is dominated by heapifyUp: O(log n)
template <class T>
void Heap<T>::_heapifyUp(unsigned index) {
    if (index > 1) {                                  // not the root
        if (item_[index] < item_[parent(index)]) {    // violates min-heap order
            std::swap(item_[index], item_[parent(index)]);
            _heapifyUp(parent(index));
        }
    }
}
HeapifyDown
When we remove the top element from the heap, we replace it with the last element in the array. Then we recurse down the heap, swapping the element with its smaller child (in a min-heap) until the heap property is restored.
runtime: O(log n)
template <class T>
void Heap<T>::_heapifyDown(int index) {
    if (!_isLeaf(index)) {
        int minChildIndex = _minChild(index);         // smaller of the children
        if (item_[minChildIndex] < item_[index]) {    // child violates min-heap order
            std::swap(item_[index], item_[minChildIndex]);
            _heapifyDown(minChildIndex);
        }
    }
}
template <class T>
T Heap<T>::_removeMin() {
    // swap the root with the last value
    T minValue = item_[1];
    item_[1] = item_[size_];
    size_--;
    // restore the heap property
    _heapifyDown(1);
    // return the min value
    return minValue;
}
Array Representation
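With 1-indexed storage (item_[1] is the root, matching the heapify code above), a node at index i has its relatives at fixed offsets, so no pointers are needed:

```cpp
#include <cassert>

// Index arithmetic for a complete binary tree stored 1-indexed in an array.
unsigned parent(unsigned i)     { return i / 2; }      // root has no parent
unsigned leftChild(unsigned i)  { return 2 * i; }
unsigned rightChild(unsigned i) { return 2 * i + 1; }
```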
Priority queue ADT
stores data so that the highest-priority element can be accessed efficiently
the stored type must implement the < operator
Disjoint Sets
each element exists in exactly one set.
ADT: make set, union and find
Union
point the set (or UpTree) with fewer nodes at the one that has more nodes.
void DisjointSets::setunion(int r1, int r2) {
    // naive union: r1 and r2 must be roots ("union" is a reserved word in C++)
    s[r2] = r1;
}
Smart Union (Size and Height)
Size: the metadata at the root is the height of the tree. you want to optimize for keeping the height of the tree as small as possible.
Height: minimize the number of nodes that increase in heights. the metadata at the root is the node with the smallest subtree.
//too lazy to write the psuedocode but it's in the lecture notes :)
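A sketch of union-by-size in the array representation (negative root values store tree sizes; the class layout here is assumed):

```cpp
#include <cassert>
#include <vector>

// s[i] is the parent of i; a root stores the negative of its tree's size.
struct DisjointSets {
    std::vector<int> s;
    explicit DisjointSets(int n) : s(n, -1) {}  // n singleton sets

    void unionBySize(int r1, int r2) {  // r1 and r2 must be roots
        if (s[r1] <= s[r2]) {           // r1's tree is at least as big
            s[r1] += s[r2];             // combined size lives at the new root
            s[r2] = r1;                 // point the smaller tree at r1
        } else {
            s[r2] += s[r1];
            s[r1] = r2;
        }
    }
};
```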
Find
the value of the array is negative if we have found the representative element, or it is the index of the parent if we haven't. Recurse through the parents and return the index of the representative.
int DisjointSets::find(int i) {
    if (s[i] < 0) { return i; }       // negative value marks the representative
    else { return find(s[i]); }
}
Path Compression
Every time we find an element, we point every node along the search path directly at the representative of the set. The next time we call find on any of those nodes, the path is shorter.
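A sketch of find with path compression, using the same array representation (negative values marking roots):

```cpp
#include <cassert>
#include <vector>

// Path-compressed find: after locating the root, reparent every node on
// the search path directly to it, flattening the tree for future finds.
int findPC(std::vector<int>& s, int i) {
    if (s[i] < 0) return i;          // negative value marks the root
    return s[i] = findPC(s, s[i]);   // compress: point i straight at the root
}
```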
Any sequence of m union and find operations results in a worst-case running time of O(m log* n) (average cost per operation is O(log* n)), where n is the number of items in the Disjoint Sets.
Array Representation
s[i] holds the parent index of i, or a negative value at a root (the size/height metadata).