Many problems involve finding a “similar set”:

Given high-dimensional datapoints $x_1, x_2, ..., x_n$ and some distance function $d(x_1, x_2)$:

Recall that we can find identical items in a collection of $n$ via hash table

Locality Sensitive Hashing (LSH)

Suppose that for objects $X, X'$, we have that $d(X, X') = c$.

Given $X, h(X)$: