I have been curious about data compression and the Zip file format
in particular for a long time. At some point I decided to address
that by learning how it works and writing my own Zip program. The
implementation turned into an exciting programming exercise; there
is great pleasure to be had from creating a well-oiled machine that
takes data apart, jumbles its bits into a more efficient
representation, and puts it all back together again. Hopefully it is
interesting to read about too.

This article explains how the Zip file format and its compression
scheme work in great detail: LZ77 compression, Huffman coding,
Deflate and all. It tells some of the history, and provides a
reasonably efficient example implementation written from scratch in
C.