In order to be prepared for class, you need to understand (1) unique decodability – why this code property is useful for lossless compression, and how to check it; (2) expected and optimal code length, and how to use them to evaluate the asymptotic performance of compression algorithms; (3) how adaptive methods address a changing source distribution. In addition, you need to be able to implement data compression algorithms such as Huffman coding, arithmetic coding, and Lempel-Ziv coding at a high level.

If we want to compress a message to minimal length, why use symbol codes? In the previous session we saw that, by accepting a small probability of error (delta), fixed-length block codes can bring the expected length down to the limit NH (as N approaches infinity). But this coding scheme requires enumerating the entire typical set, so we need more practical algorithms for encoding and decoding. Prefix codes, a class of symbol codes, let the decoder emit a symbol as soon as it has read that symbol's complete codeword. This prefix property also lets us reason about the minimum achievable expected codelength.

Prefix code:

$$ \forall X_1, X_2 \in A,\; X_1 \neq X_2 \implies C(X_1) \text{ is not a prefix of } C(X_2) $$

Prefix codes imply unique decodability: no encoded string can be produced by two different source sequences. One can picture a prefix code over a code alphabet with D possible values as a D-ary tree that has the codewords as its leaves.
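As a quick illustration, here is a minimal sketch of greedy prefix-code decoding in Python. The code table {a: 0, b: 10, c: 11} and the function names are hypothetical choices for this example, not from the lecture. Because no codeword is a prefix of another, the decoder can emit a symbol the moment its codeword is matched:

```python
# Hypothetical prefix code: no codeword is a prefix of another.
CODE = {"a": "0", "b": "10", "c": "11"}
DECODE = {v: k for k, v in CODE.items()}  # invert the code table

def encode(symbols):
    """Concatenate the codewords of the source symbols."""
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Emit a symbol as soon as the buffered bits match a codeword."""
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in DECODE:      # prefix property: the match is unambiguous
            out.append(DECODE[buffer])
            buffer = ""
    assert buffer == "", "bit string did not end on a codeword boundary"
    return out

print(decode(encode(["a", "b", "c", "a"])))  # ['a', 'b', 'c', 'a']
```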

Uniquely decodable:

$$ \forall X_1^n \in A^n,\; \forall X_2^m \in A^m,\; X_1^n \neq X_2^m \implies C(X_1^n) \neq C(X_2^m) $$

where $n$ and $m$ are the lengths of the source sequences, and $C$ is extended from symbols to sequences by concatenating the codewords of the individual symbols.

The Kraft inequality says that if a symbol code $C$ is a prefix code over a code alphabet of size $D$, with codelengths $(l_1, \ldots, l_{|A|})$, then the inequality below holds. Conversely, if a set of codelengths satisfies the inequality, then there exists a prefix code with those codelengths.

Kraft Inequality:

$$ \sum_{i=1}^{|A|} D^{-l_i} \leq 1 $$

where $l_i$ is the codelength of the $i$-th symbol and $D$ is the size of the code alphabet.
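A quick numerical sketch (the helper name `kraft_sum` is ours): binary codelengths (1, 2, 2) satisfy the inequality with equality, while (1, 1, 2) violates it, so no binary prefix code with those lengths can exist.

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality for a D-ary code."""
    return sum(D ** -l for l in lengths)

print(kraft_sum([1, 2, 2]))  # 1.0  -> a prefix code exists, e.g. {0, 10, 11}
print(kraft_sum([1, 1, 2]))  # 1.25 -> exceeds 1, no binary prefix code possible
```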

The goal is to find the shortest code that losslessly encodes the message. You can derive the bound by treating the problem as a constrained optimization: minimize the expected length $\min \sum_i p_i l_i$ subject to the Kraft inequality. At the minimum, the optimal codelength for outcome $x$ is its Shannon information content $l^*(x) = \log_D (1/p(x))$, so the smallest expected codelength of a prefix code is the entropy of the source.
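Filling in that optimization step (a standard Lagrange-multiplier sketch, relaxing the $l_i$ to real values and taking the Kraft constraint to be tight):

$$ J = \sum_i p_i l_i + \lambda \Big( \sum_i D^{-l_i} - 1 \Big), \qquad \frac{\partial J}{\partial l_i} = p_i - \lambda (\ln D)\, D^{-l_i} = 0 $$

Solving gives $D^{-l_i} = p_i / (\lambda \ln D)$; substituting into $\sum_i D^{-l_i} = 1$ forces $\lambda \ln D = 1$, so $D^{-l_i^*} = p_i$, i.e. $l_i^* = \log_D (1/p_i)$.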

Optimal code length:

$$ H(X) \leq L(C,X) = \sum_i p_i \left\lceil \log \frac{1}{p_i} \right\rceil < H(X) + 1 $$
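To make the bound concrete, here is a small sketch (the distribution is an illustrative choice of ours) computing the entropy and the expected length of a binary Shannon code, whose codeword lengths are the rounded-up information contents:

```python
import math

p = [0.4, 0.3, 0.3]  # an illustrative source distribution

H = sum(pi * math.log2(1 / pi) for pi in p)             # entropy, ~1.571 bits
L = sum(pi * math.ceil(math.log2(1 / pi)) for pi in p)  # Shannon code, 2.0 bits

assert H <= L < H + 1  # the bound above
print(f"H = {H:.3f}, L = {L:.3f}")
```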

The Huffman code is the prefix code with the shortest expected length. It is built by a greedy algorithm that constructs a binary tree from the leaves upward: at every step, take the two symbols with the smallest probabilities, assign them equal codelengths (as siblings in the tree), merge them into a single symbol, and repeat on the shrunken alphabet until only one symbol remains.

Huffman code pseudocode:

Pseudocode: see Lecture 7 of Singh, A. (2012).
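Since the pseudocode from the slides is not reproduced here, below is a minimal sketch of Huffman coding in Python (our own illustrative implementation, not the lecture's): it repeatedly pops the two least probable nodes from a heap, merges them, and reads the codewords off the resulting tree.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} dict."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy step: merge the two least probable nodes into one.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    code = {}
    def assign(node, word):
        if isinstance(node, tuple):   # internal node: recurse on both children
            assign(node[0], word + "0")
            assign(node[1], word + "1")
        else:                         # leaf: record the symbol's codeword
            code[node] = word or "0"  # edge case: single-symbol alphabet
    assign(tree, "")
    return code

print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
# e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'} (ties may vary)
```

The unique counter in each heap entry is a small design choice: when two subtrees have equal probability, Python would otherwise try to compare the node payloads themselves, which can raise a TypeError.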