Compression


Suggested Readings

  • Introduction to data compression by Peter Gutmann. Overview of Huffman, Shannon-Fano, Arithmetic, LZ78, LZW, and LZ77 coding techniques.
  • LZW explained by Steve Blackstock. A nice 200 line description of Lempel-Ziv Welch compression along with the particular implementation used in the GIF standard.
  • Image Compression Techniques.
    This has little technical content, but gives a broad background of the various techniques.
  • Video Compression. This has more technical content, but basically only describes JPEG and MPEG.
  • Benchmarks

  • A comparison of various compression codes for text and black and white images. Shows progression of the state-of-the-art over the years. The data is somewhat out of date (e.g. the best bpc for the Calgary Corpus is now around 2).
  • Comparison of compression ratios of over 40 archivers. The overall winner is BZIP, which is freely available and patent free (at least for now). It is based on the Burrows-Wheeler block sorting algorithm and it compresses to about 75% the size of gzip and is slightly faster.
  • Optional Readings

  • Khalid Sayood. Introduction to Data Compression. Morgan Kaufmann, 1996.
    Looks at both Theoretical and practical aspects of data compression. Discusses a wide range of lossless and lossy compression methods, including fractals, wavelets, and subband coding.
  • Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text compression. Prentice Hall, 1990.
  • Gilbert Held and Thomas R. Marshall. Data and Image Compression: Tools and Techniques. Wiley 1996 (4th ed.)
    This book is quite basic and does not cover many important topics. On the other hand, it includes source code and a detailed description of most of the basic algorithms.
  • Mark Nelson. The Data Compression Book. M&T Books, 1995.
  • James A. Storer (Editor). Image and Text Compression. Kluwer, 1992.
    This a collection of articles on a broad set of topics.
  • Further Readings and Links

  • General links
  • comp.compression faq. This is surely the best source of information on compression available on the web.
  • Compression Pointers (comprehensive, but badly organized).
  • An online overview of lossless compression techniques.
  • Data compression links.
  • Yet more links
  • Entropy and Information Theory
  • Entropy on the World Wide Web
  • LZ78 and LZW
  • Terry A. Welch. A Technique for High Performance Data Compression. IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
  • The GIF format, which uses LZW, is described in GIF87(5) - GIF 87 and GIF89a(5) - GIF 89a standards.
  • The GIF Licensing Controversy. The GIF format is patented by ComputServe and the LZW algorithm used by the format is patented by Unisys.
  • LZ77 (sliding window compression)
  • E. Fiala and D. Greene. Data compression with finite windows (abstract).
  • A.D. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. (abstract).
  • The PPM algorithm.
  • A. Moffat. Implementing the PPM data compression scheme. (abstract).
  • Bill Teahan's home page has pointers to several papers on PPM.
  • The Burrows-Wheeler algorithm
  • The Burrows-Wheeler block sorting algorithm off of the compression faq.
  • BZIP, an implementation that uses Burrows-Wheeler.
  • JPEG and MPEG.
  • JPEG FAQ
  • MPEG Starting Points and FAQs.
  • A nice MPEG FAQ.
  • Fractal compression
  • Introduction to fractal compression off of the compression faq.
  • What is the state of fractal compression?.
  • Links to various resources organized by Yuval Fisher. He also has a book on the topic.
  • Wavelet compression
  • Amara's Wavelet Page. This page is comprehensive.
  • Summus. A company that sells products based on wavelet compression and has a nice short overview of Wavelet theory off of their home page.

  • Back to the topics list.