How LZW Compression Works


Last updated: April 4, 2026

Quick Answer: LZW compression is a lossless data compression algorithm that builds a dictionary of strings it has already seen in the input. When one of those strings appears again, the algorithm writes its dictionary code instead of the string itself, reducing the overall file size. It is most effective on data with repeated patterns, such as text and simple images.

Key Facts

LZW stands for Lempel-Ziv-Welch.
It was published by Terry Welch in 1984, building on the LZ78 algorithm that Abraham Lempel and Jacob Ziv introduced in 1978.
LZW is a dictionary-based compression algorithm.
It is a lossless compression method, meaning no data is lost during compression and decompression.
It was widely used in GIF image files and early Unix 'compress' utilities.

What is LZW Compression?

LZW (Lempel-Ziv-Welch) compression is a historically significant lossless data compression algorithm. Published by Terry Welch in 1984 as a refinement of the earlier LZ78 algorithm by Lempel and Ziv, LZW gained widespread adoption due to its efficiency and relative simplicity. It works by dynamically building a dictionary of strings (sequences of bytes or characters) encountered in the input data. As the algorithm processes the data, it assigns each newly seen string a unique code, and later occurrences of that string are written as the code alone. Because a single code can stand for a long string, the output is usually smaller than the input.

How Does LZW Compression Work?

The core principle behind LZW compression is dictionary building. Imagine you are reading a book and you notice the phrase "the quick brown fox" appears many times. Instead of writing it out each time, you could assign it a short symbol, say, '#'. Every time you see "the quick brown fox", you just write '#'. LZW compression does something similar, but it builds its dictionary automatically as it scans the data.

The Compression Process:

Initialization: The algorithm starts with a predefined dictionary containing all possible single characters (e.g., ASCII characters).
Scanning and Matching: It reads the input one character at a time while maintaining a current string, which starts empty. At each step it checks whether the current string plus the next input character is already in the dictionary.
Dictionary Update: If the extended string is in the dictionary, it becomes the new current string. If it is not (meaning a new string pattern has been identified), the code for the current string is output, the extended string is added to the dictionary under the next unused code, and the current string is reset to the new character alone.
Outputting Codes: The emitted codes are written to the output stream in order. Implementations typically pack them into a fixed or gradually growing number of bits per code.
End of Input: When the end of the input data is reached, the code for the final current string is output.
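The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name lzw_compress is hypothetical, the initial dictionary is assumed to hold the 256 single-byte values, and codes are returned as plain integers rather than packed into bits.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Return the LZW codes for data as plain integers (no bit packing)."""
    # Initialization: one entry per possible byte value, codes 0..255.
    # (Assumption: byte-oriented input; real formats may reserve extra codes.)
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256

    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Match found: keep extending the current string.
            current = candidate
        else:
            # No match: emit the code for the current string, register
            # the new string, and restart from the new byte alone.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])

    # End of input: flush the final current string, if there is one.
    if current:
        codes.append(dictionary[current])
    return codes
```

For example, lzw_compress(b"ABABABA") returns [65, 66, 256, 258]: the seven input bytes become four codes, where 256 stands for "AB" and 258 for "ABA".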

The Decompression Process:

Decompression is essentially the reverse process. The decompressor starts from the same initial dictionary and rebuilds the rest in step with the compressor, so the dictionary itself never has to be transmitted. It reads the incoming codes one at a time, looks each one up, and outputs the corresponding string. After each code, it adds a new dictionary entry made of the previous output string plus the first character of the current one, which mirrors the entry the compressor created at that point. If it encounters a code that is not yet in its dictionary (a special case that arises when the compressor uses a string in the step right after adding it), it can deduce the string by taking the previous output string and appending that string's own first character.
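The decoding procedure described above might look like this in Python. It is a sketch only: the name lzw_decompress is hypothetical, and it assumes the codes are plain integers produced by an encoder whose initial dictionary holds the 256 single-byte values.

```python
def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from a list of integer LZW codes."""
    if not codes:
        return b""

    # Same starting dictionary as the encoder, but mapping code -> string.
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256

    # The first code always refers to a single byte (KeyError otherwise).
    previous = dictionary[codes[0]]
    output = [previous]

    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the encoder used a string it had only just
            # added. It must be the previous string plus its first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")

        output.append(entry)
        # Mirror the encoder: previous string + first byte of this one.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry

    return b"".join(output)
```

Decoding [65, 66, 256, 258] yields b"ABABABA". The last code, 258, arrives before the decoder has defined it, so it exercises the special case.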

Key Features and Advantages:

Lossless: LZW compression does not discard any data. The original file can be perfectly reconstructed from the compressed version.
Adaptive: The dictionary is built dynamically based on the input data, making it effective for a wide range of data types, especially those with repeating patterns.
Relatively Simple Implementation: Compared to some other compression algorithms, LZW is conceptually straightforward to implement.

Disadvantages and Limitations:

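Patent Issues: Historically, LZW was encumbered by patents (held by Unisys), which limited its free use in certain commercial applications for a period. The US patent expired in 2003 and its counterparts in other countries in 2004, so the algorithm is now free to use, but this history slowed its adoption and prompted the development of alternatives.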
Not Always Optimal: For certain types of data, other compression algorithms might achieve higher compression ratios. For example, algorithms like Huffman coding or arithmetic coding can sometimes be more efficient, especially when symbol probabilities are highly skewed.
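Dictionary Size: The dictionary gains an entry for every new string, which costs memory, so implementations cap it at a fixed maximum (GIF, for example, limits codes to 12 bits, or 4,096 entries). Once the dictionary is full, it must be frozen or reset, and either choice can hurt compression when the data's patterns change.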
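To make the dictionary-size limit concrete, here is a hedged Python sketch. The helper names code_width and lzw_compress_capped are hypothetical, the 12-bit default is just an assumed cap, and the "stop adding entries when full" policy is only one option (some formats reset the dictionary instead).

```python
def code_width(dictionary_size: int) -> int:
    """Bits needed to write any code in range(dictionary_size)."""
    return max(1, (dictionary_size - 1).bit_length())


def lzw_compress_capped(data: bytes, max_bits: int = 12) -> tuple[list[int], int]:
    """LZW encoder that freezes the dictionary at 2**max_bits entries.

    Returns (codes, final_dictionary_size).
    """
    if max_bits < 8:
        raise ValueError("max_bits must be at least 8 to cover all byte values")
    max_entries = 1 << max_bits

    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate
            continue
        codes.append(dictionary[current])
        if len(dictionary) < max_entries:
            # Room left: learn the new string under the next unused code.
            dictionary[candidate] = len(dictionary)
        # Otherwise the dictionary stays frozen and memory use is bounded.
        current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes, len(dictionary)
```

With this helper, code_width(256) is 8 and code_width(257) is 9, which shows why codes need more bits as the dictionary grows. A matching decoder would have to stop adding entries at the same point for the two dictionaries to stay in sync.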

Common Applications of LZW:

LZW compression found significant use in several areas:

GIF Images: The Graphics Interchange Format (GIF) uses LZW to compress its image data, and the smaller file sizes contributed to its popularity for web graphics.
Unix 'compress' utility: This standard Unix command-line utility utilized LZW for file compression.
TIFF Images: Some implementations of the Tagged Image File Format (TIFF) also support LZW compression.
PostScript: The page description language PostScript (Level 2 and later) provides LZW as a built-in filter for compressing data embedded in documents.

LZW vs. Other Compression Methods:

LZW is a type of dictionary-based compression. Other dictionary-based algorithms include LZ77 and LZ78; LZW is derived from LZ78. These algorithms work by finding repeated strings and replacing them with references. In contrast, statistical compression methods like Huffman coding or arithmetic coding assign shorter codes to more frequent symbols and longer codes to less frequent symbols based on their probability. LZW does not model symbol probabilities explicitly, but because its dictionary fills with strings that actually occur in the input, it adapts to the statistical structure of the data implicitly.

While LZW was revolutionary in its time and remains a valuable algorithm, modern compression techniques have surpassed it in compression efficiency for many general-purpose tasks. These often combine dictionary-based and statistical methods, as DEFLATE does by pairing LZ77 with Huffman coding. However, LZW's legacy in early digital imaging and file compression is undeniable.