Arithmetic Coding
Huffman coding works well for larger alphabets and gets to within one bit of the entropy lower bound per symbol. Can we do better? Yes.
Basic idea in arithmetic coding:
- Represent each string x of length n by an interval [l,r) in [0,1).
- The width r-l of the interval [l,r) represents the probability of x occurring.
- The interval [l,r) can itself be represented by any number, called a tag, within the half-open interval.
- The first k significant bits of the tag .t1t2t3... form the code of x, where k is chosen so that the truncated value .t1t2...tk000... still lies in the interval [l,r). A small code sketch follows below.
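The sketch below illustrates the interval/tag idea under an i.i.d. model with fixed symbol probabilities; it is not a full arithmetic coder (no incremental output, no rescaling, and it uses floating point, so it only works for short strings). The names probs, interval_for, and tag_bits are illustrative, not from any library.

```python
from itertools import accumulate

def interval_for(x, probs):
    """Return the half-open interval [l, r) assigned to string x."""
    symbols = sorted(probs)                      # fixed symbol order
    cum = dict(zip(symbols,                      # cum[s] = total prob of symbols before s
                   accumulate([0.0] + [probs[s] for s in symbols])))
    l, w = 0.0, 1.0                              # current interval is [l, l+w)
    for s in x:
        l = l + w * cum[s]                       # shift left endpoint by cumulative prob
        w = w * probs[s]                         # shrink width by P(s); final w = P(x)
    return l, l + w

def tag_bits(l, r):
    """Emit bits t1 t2 ... tk until .t1t2...tk (zeros appended) lies in [l, r)."""
    bits, value, step = [], 0.0, 0.5
    while not (l <= value < r):
        if value + step < r:                     # setting this bit keeps the value below r
            value += step
            bits.append(1)
        else:
            bits.append(0)
        step /= 2
    return bits

probs = {'a': 0.8, 'b': 0.2}
l, r = interval_for('aab', probs)
print(l, r, tag_bits(l, r))   # [0.512, 0.64), code 101, i.e. .101 = 0.625 is in the interval
```

For 'aab' the interval width is P(aab) = 0.8*0.8*0.2 = 0.128, and the code uses 3 bits, close to -log2(0.128) ≈ 2.97, consistent with the idea that the code length is roughly the negative log probability of the string.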