
Go Compress Lab

An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples.

Features

  • 🔧 Multiple Compression Algorithms:

    • RLE (Run-Length Encoding) - Best for data with consecutive repeated characters
    • Huffman Coding - Variable-length encoding based on character frequency
    • LZ77 - Dictionary-based compression using a sliding window
    • LZW (Lempel-Ziv-Welch) - Dictionary-based compression that grows its dictionary as it reads the input
  • 📊 Comprehensive Metrics:

    • Compression ratio
    • Space savings percentage
    • Shannon entropy analysis
    • Compression/decompression performance timing
  • 📈 Visualization:

    • ASCII bar charts for compression ratios
    • Entropy visualization with interpretation
    • Side-by-side algorithm comparison
    • Color-coded output for clarity
  • 🎯 Educational Focus:

    • Clean, readable algorithm implementations
    • Detailed metrics to understand compression behavior
    • Support for comparing multiple algorithms

Installation

Prerequisites

  • Go 1.16 or higher

Build from Source

git clone https://github.com/BaseMax/go-compress-lab.git
cd go-compress-lab
go build -o compress-lab ./cmd/compress-lab

Optionally, install it globally:

go install ./cmd/compress-lab

Usage

Basic Commands

# Compress text with all algorithms (comparison mode)
./compress-lab -text="Hello World" -compare

# Compress a file with a specific algorithm
./compress-lab -input=file.txt -algo=huffman

# Compress and save to file
./compress-lab -input=file.txt -algo=lzw -output=compressed.bin

# Decompress a file
./compress-lab -input=compressed.bin -algo=lzw -decompress -output=original.txt

Command-Line Flags

  • -text - Text string to compress (alternative to input file)
  • -input - Input file path
  • -output - Output file path for compressed/decompressed data
  • -algo - Algorithm to use: rle, huffman, lz77, lzw, or all
  • -compare - Compare all algorithms (shows detailed metrics)
  • -decompress - Decompress mode instead of compress

Examples

Example 1: Comparing Algorithms on Repetitive Data

./compress-lab -text="AAAABBBCCCCCCDDDDDD" -compare

Output:

Comparing all compression algorithms...
Input size: 19 bytes

====================================================================================================
COMPRESSION RESULTS
====================================================================================================
Algorithm    |     Original |   Compressed |    Ratio |  Savings % |  Entropy |    Comp Time |  Decomp Time
----------------------------------------------------------------------------------------------------
RLE          |         19 B |          8 B |     2.38 |     57.89% |     1.94 |      2.284µs |        360ns
Huffman      |         19 B |         23 B |     0.83 |    -21.05% |     1.94 |     13.244µs |      8.937µs
LZ77         |         19 B |         25 B |     0.76 |    -31.58% |     1.94 |      1.503µs |      1.353µs
LZW          |         19 B |         22 B |     0.86 |    -15.79% |     1.94 |     40.856µs |     36.188µs
====================================================================================================

COMPRESSION RATIO VISUALIZATION
------------------------------------------------------------
RLE          | ████████████████████████████████████████ 2.38:1
Huffman      | █████████████ 0.83:1
LZ77         | ████████████ 0.76:1
LZW          | ██████████████ 0.86:1
------------------------------------------------------------

ENTROPY ANALYSIS
------------------------------------------------------------
Shannon Entropy: 1.9440 bits/byte
Randomness: 24.30% [▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]

Interpretation:
  → Very low entropy: Highly repetitive data, excellent for compression
------------------------------------------------------------

Analysis: RLE performs best on this data because it efficiently encodes consecutive runs of identical characters.

Example 2: Natural Language Text

./compress-lab -text="The quick brown fox jumps over the lazy dog." -compare

Analysis: LZW and Huffman typically perform better on natural language text due to repeated patterns and character frequency distribution.

Example 3: File Compression and Decompression

# Compress a file
./compress-lab -input=document.txt -algo=lzw -output=document.lzw

# Decompress it
./compress-lab -input=document.lzw -algo=lzw -decompress -output=restored.txt

Example 4: Single Algorithm Analysis

./compress-lab -input=data.txt -algo=huffman

This shows detailed metrics for just the Huffman algorithm.

Understanding the Metrics

Compression Ratio

  • Formula: Original Size / Compressed Size
  • Interpretation: Higher is better. Ratio > 1 means compression, < 1 means expansion.

Space Savings

  • Formula: (1 - Compressed/Original) × 100%
  • Interpretation: Positive percentage = compression, negative = expansion.

Shannon Entropy

  • Range: 0 to 8 bits/byte
  • Interpretation:
    • 0-2: Very low entropy, highly compressible
    • 2-4: Low entropy, good compression potential
    • 4-6: Medium entropy, moderate compression
    • 6-8: High entropy, difficult to compress

Algorithm Overview

RLE (Run-Length Encoding)

  • Best for: Data with long runs of repeated characters
  • How it works: Replaces sequences of repeated bytes with (count, byte) pairs
  • Example: "AAAA" → (4, 'A')
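The (count, byte) scheme above can be sketched in a few lines of Go. This is a simplified illustration, not the toolkit's rle.go; note it reproduces the 19 B → 8 B result from Example 1 (4 runs × 2 bytes each):

```go
package main

import "fmt"

// rleEncode emits (count, byte) pairs; counts are capped at 255 so each fits in one byte.
func rleEncode(data []byte) []byte {
	var out []byte
	for i := 0; i < len(data); {
		j := i
		for j < len(data) && data[j] == data[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), data[i])
		i = j
	}
	return out
}

// rleDecode expands each (count, byte) pair back into a run.
func rleDecode(data []byte) []byte {
	var out []byte
	for i := 0; i+1 < len(data); i += 2 {
		for k := byte(0); k < data[i]; k++ {
			out = append(out, data[i+1])
		}
	}
	return out
}

func main() {
	enc := rleEncode([]byte("AAAABBBCCCCCCDDDDDD"))
	fmt.Printf("19 bytes -> %d bytes\n", len(enc)) // 4 runs × 2 bytes = 8 bytes
	fmt.Println(string(rleDecode(enc)) == "AAAABBBCCCCCCDDDDDD")
}
```

The cap at 255 is the classic tradeoff of byte-oriented RLE: longer runs are simply split into multiple pairs.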

Huffman Coding

  • Best for: Data with non-uniform character distribution
  • How it works: Assigns shorter codes to frequent characters
  • Example: 'e' might be encoded as "01" while 'z' as "1101100"
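The code-assignment step can be sketched by repeatedly merging the two lowest-frequency nodes into a tree, then reading codes off the root-to-leaf paths. This simplified sketch (not the toolkit's huffman.go) re-sorts a slice each round instead of using a heap, for clarity:

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	freq        int
	sym         byte
	left, right *node
}

// huffmanCodes builds a Huffman tree and returns each symbol's bit string.
// Degenerate single-symbol inputs get the empty code in this sketch.
func huffmanCodes(data []byte) map[byte]string {
	freq := map[byte]int{}
	for _, b := range data {
		freq[b]++
	}
	var nodes []*node
	for s, f := range freq {
		nodes = append(nodes, &node{freq: f, sym: s})
	}
	for len(nodes) > 1 {
		// Merge the two lowest-frequency nodes into one parent.
		sort.Slice(nodes, func(i, j int) bool { return nodes[i].freq < nodes[j].freq })
		merged := &node{freq: nodes[0].freq + nodes[1].freq, left: nodes[0], right: nodes[1]}
		nodes = append(nodes[2:], merged)
	}
	codes := map[byte]string{}
	var walk func(n *node, prefix string)
	walk = func(n *node, prefix string) {
		if n.left == nil && n.right == nil {
			codes[n.sym] = prefix
			return
		}
		walk(n.left, prefix+"0")
		walk(n.right, prefix+"1")
	}
	if len(nodes) == 1 {
		walk(nodes[0], "")
	}
	return codes
}

func main() {
	// Example 1 input: frequencies A=4, B=3, C=6, D=6 yield 2-bit codes for all four symbols.
	for sym, code := range huffmanCodes([]byte("AAAABBBCCCCCCDDDDDD")) {
		fmt.Printf("%c -> %s\n", sym, code)
	}
}
```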

LZ77

  • Best for: Data with repeated patterns at various distances
  • How it works: Uses a sliding window to find and reference previous occurrences
  • Example: References previous data with (offset, length, next_char) triplets
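The triplet scheme can be illustrated with a deliberately naive Go sketch (brute-force match search, fixed window; not the toolkit's lz77.go). On "ABABABA" it finds the overlapping match and emits just three triplets:

```go
package main

import "fmt"

type triplet struct {
	offset, length int
	next           byte
}

// lz77Encode searches the sliding window for the longest earlier match
// and emits (offset, length, next_char) triplets.
func lz77Encode(data []byte, window int) []triplet {
	var out []triplet
	for i := 0; i < len(data); {
		bestOff, bestLen := 0, 0
		start := i - window
		if start < 0 {
			start = 0
		}
		for j := start; j < i; j++ {
			l := 0
			// Matches may overlap the current position (j+l can reach past i).
			for i+l < len(data)-1 && data[j+l] == data[i+l] {
				l++
			}
			if l > bestLen {
				bestOff, bestLen = i-j, l
			}
		}
		out = append(out, triplet{bestOff, bestLen, data[i+bestLen]})
		i += bestLen + 1
	}
	return out
}

// lz77Decode copies byte-by-byte so overlapping references expand correctly.
func lz77Decode(ts []triplet) []byte {
	var out []byte
	for _, t := range ts {
		start := len(out) - t.offset
		for k := 0; k < t.length; k++ {
			out = append(out, out[start+k])
		}
		out = append(out, t.next)
	}
	return out
}

func main() {
	ts := lz77Encode([]byte("ABABABA"), 16)
	fmt.Printf("%d triplets\n", len(ts)) // (0,0,'A'), (0,0,'B'), (2,4,'A')
	fmt.Println(string(lz77Decode(ts)) == "ABABABA") // true
}
```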

LZW

  • Best for: Data with repeated substrings
  • How it works: Builds a dictionary of patterns on the fly
  • Example: Commonly used in GIF and TIFF formats
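The dynamic-dictionary idea can be sketched as a classic LZW encoder in Go (an illustration, not the toolkit's lzw.go): start with the 256 single-byte entries, extend the current phrase while it stays in the dictionary, and emit one code per longest known prefix:

```go
package main

import "fmt"

// lzwEncode emits one dictionary code per longest known prefix,
// adding each newly seen phrase to the dictionary as it goes.
func lzwEncode(data []byte) []int {
	dict := map[string]int{}
	// Seed with all single-byte strings (codes 0..255).
	for i := 0; i < 256; i++ {
		dict[string([]byte{byte(i)})] = i
	}
	var out []int
	cur := ""
	for _, b := range data {
		next := cur + string([]byte{b})
		if _, ok := dict[next]; ok {
			cur = next // phrase still known: keep extending
		} else {
			out = append(out, dict[cur])
			dict[next] = len(dict) // learn the new phrase
			cur = string([]byte{b})
		}
	}
	if cur != "" {
		out = append(out, dict[cur])
	}
	return out
}

func main() {
	// "AB" becomes code 256 and "ABA" code 258, so 7 bytes shrink to 4 codes.
	fmt.Println(lzwEncode([]byte("ABABABA"))) // [65 66 256 258]
}
```

Note the seed loop uses string([]byte{byte(i)}) rather than string(rune(i)), since the latter would produce multi-byte UTF-8 sequences for values above 127.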

Project Structure

go-compress-lab/
├── cmd/
│   └── compress-lab/
│       └── main.go           # CLI application entry point
├── pkg/
│   ├── algorithms/
│   │   ├── compressor.go     # Compressor interface
│   │   ├── rle.go           # RLE implementation
│   │   ├── huffman.go       # Huffman coding implementation
│   │   ├── lz77.go          # LZ77 implementation
│   │   └── lzw.go           # LZW implementation
│   ├── metrics/
│   │   └── metrics.go       # Compression metrics and calculations
│   └── visualization/
│       └── display.go       # Terminal output formatting
├── go.mod
└── README.md

Educational Use Cases

  1. Understanding Compression Theory: Compare algorithms on different types of data to see which performs best
  2. Entropy Analysis: Learn about information theory and data randomness
  3. Performance Benchmarking: Measure compression speed vs. ratio tradeoffs
  4. Algorithm Behavior: Observe how each algorithm handles different data patterns

Contributing

Contributions are welcome! This is an educational project, so please keep implementations clear and well-documented.

License

MIT License - See LICENSE file for details

Author

Max Base

Acknowledgments

This project is designed for educational purposes to help understand compression algorithms and their behavior on different types of data.