
Go Compress Lab

An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples.

Features

  • 🔧 Multiple Compression Algorithms:

    • RLE (Run-Length Encoding) - Best for data with consecutive repeated characters
    • Huffman Coding - Variable-length encoding based on character frequency
    • LZ77 - Dictionary-based compression using a sliding window
    • LZW (Lempel-Ziv-Welch) - Dictionary-based compression that grows its dictionary as it reads the input
  • 📊 Comprehensive Metrics:

    • Compression ratio
    • Space savings percentage
    • Shannon entropy analysis
    • Compression/decompression performance timing
  • 📈 Visualization:

    • ASCII bar charts for compression ratios
    • Entropy visualization with interpretation
    • Side-by-side algorithm comparison
    • Color-coded output for clarity
  • 🎯 Educational Focus:

    • Clean, readable algorithm implementations
    • Detailed metrics to understand compression behavior
    • Support for comparing multiple algorithms

Installation

Prerequisites

  • Go 1.16 or higher

Build from Source

git clone https://github.com/BaseMax/go-compress-lab.git
cd go-compress-lab
go build -o compress-lab ./cmd/compress-lab

Optionally, install it globally:

go install ./cmd/compress-lab

Usage

Basic Commands

# Compress text with all algorithms (comparison mode)
./compress-lab -text="Hello World" -compare

# Compress a file with a specific algorithm
./compress-lab -input=file.txt -algo=huffman

# Compress and save to file
./compress-lab -input=file.txt -algo=lzw -output=compressed.bin

# Decompress a file
./compress-lab -input=compressed.bin -algo=lzw -decompress -output=original.txt

Command-Line Flags

  • -text - Text string to compress (alternative to input file)
  • -input - Input file path
  • -output - Output file path for compressed/decompressed data
  • -algo - Algorithm to use: rle, huffman, lz77, lzw, or all
  • -compare - Compare all algorithms (shows detailed metrics)
  • -decompress - Decompress mode instead of compress

Examples

Example 1: Comparing Algorithms on Repetitive Data

./compress-lab -text="AAAABBBCCCCCCDDDDDD" -compare

Output:

Comparing all compression algorithms...
Input size: 19 bytes

====================================================================================================
COMPRESSION RESULTS
====================================================================================================
Algorithm    |     Original |   Compressed |    Ratio |  Savings % |  Entropy |    Comp Time |  Decomp Time
----------------------------------------------------------------------------------------------------
RLE          |         19 B |          8 B |     2.38 |     57.89% |     1.94 |      2.284µs |        360ns
Huffman      |         19 B |         23 B |     0.83 |    -21.05% |     1.94 |     13.244µs |      8.937µs
LZ77         |         19 B |         25 B |     0.76 |    -31.58% |     1.94 |      1.503µs |      1.353µs
LZW          |         19 B |         22 B |     0.86 |    -15.79% |     1.94 |     40.856µs |     36.188µs
====================================================================================================

COMPRESSION RATIO VISUALIZATION
------------------------------------------------------------
RLE          | ████████████████████████████████████████ 2.38:1
Huffman      | █████████████ 0.83:1
LZ77         | ████████████ 0.76:1
LZW          | ██████████████ 0.86:1
------------------------------------------------------------

ENTROPY ANALYSIS
------------------------------------------------------------
Shannon Entropy: 1.9440 bits/byte
Randomness: 24.30% [▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]

Interpretation:
  → Very low entropy: Highly repetitive data, excellent for compression
------------------------------------------------------------

Analysis: RLE performs best on this data because it efficiently encodes consecutive runs of identical characters.

Example 2: Natural Language Text

./compress-lab -text="The quick brown fox jumps over the lazy dog." -compare

Analysis: LZW and Huffman typically perform better on natural language text due to repeated patterns and character frequency distribution.

Example 3: File Compression and Decompression

# Compress a file
./compress-lab -input=document.txt -algo=lzw -output=document.lzw

# Decompress it
./compress-lab -input=document.lzw -algo=lzw -decompress -output=restored.txt

Example 4: Single Algorithm Analysis

./compress-lab -input=data.txt -algo=huffman

This shows detailed metrics for just the Huffman algorithm.

Understanding the Metrics

Compression Ratio

  • Formula: Original Size / Compressed Size
  • Interpretation: Higher is better. Ratio > 1 means compression, < 1 means expansion.

Space Savings

  • Formula: (1 - Compressed/Original) × 100%
  • Interpretation: Positive percentage = compression, negative = expansion.

Shannon Entropy

  • Range: 0 to 8 bits/byte
  • Interpretation:
    • 0-2: Very low entropy, highly compressible
    • 2-4: Low entropy, good compression potential
    • 4-6: Medium entropy, moderate compression
    • 6-8: High entropy, difficult to compress

Algorithm Overview

RLE (Run-Length Encoding)

  • Best for: Data with long runs of repeated characters
  • How it works: Replaces sequences of repeated bytes with (count, byte) pairs
  • Example: "AAAA" → (4, 'A')
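The (count, byte) scheme above can be sketched in a few lines of Go. This is a simplified illustration, not the toolkit's rle.go; note it reproduces the 19 B → 8 B result from Example 1 (4 runs × 2 bytes each):

```go
package main

import "fmt"

// rleEncode emits (count, byte) pairs; counts are capped at 255 so each fits in one byte.
func rleEncode(data []byte) []byte {
	var out []byte
	for i := 0; i < len(data); {
		j := i
		for j < len(data) && data[j] == data[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), data[i])
		i = j
	}
	return out
}

// rleDecode expands each (count, byte) pair back into a run.
func rleDecode(data []byte) []byte {
	var out []byte
	for i := 0; i+1 < len(data); i += 2 {
		for k := byte(0); k < data[i]; k++ {
			out = append(out, data[i+1])
		}
	}
	return out
}

func main() {
	enc := rleEncode([]byte("AAAABBBCCCCCCDDDDDD"))
	fmt.Printf("19 bytes -> %d bytes\n", len(enc)) // 4 runs × 2 bytes = 8 bytes
	fmt.Println(string(rleDecode(enc)) == "AAAABBBCCCCCCDDDDDD")
}
```

The cap at 255 is the classic tradeoff of byte-oriented RLE: longer runs are simply split into multiple pairs.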

Huffman Coding

  • Best for: Data with non-uniform character distribution
  • How it works: Assigns shorter codes to frequent characters
  • Example: 'e' might be encoded as "01" while 'z' as "1101100"
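The code-assignment step can be sketched by repeatedly merging the two lowest-frequency nodes into a tree, then reading codes off the root-to-leaf paths. This simplified sketch (not the toolkit's huffman.go) re-sorts a slice each round instead of using a heap, for clarity:

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	freq        int
	sym         byte
	left, right *node
}

// huffmanCodes builds a Huffman tree and returns each symbol's bit string.
// Degenerate single-symbol inputs get the empty code in this sketch.
func huffmanCodes(data []byte) map[byte]string {
	freq := map[byte]int{}
	for _, b := range data {
		freq[b]++
	}
	var nodes []*node
	for s, f := range freq {
		nodes = append(nodes, &node{freq: f, sym: s})
	}
	for len(nodes) > 1 {
		// Merge the two lowest-frequency nodes into one parent.
		sort.Slice(nodes, func(i, j int) bool { return nodes[i].freq < nodes[j].freq })
		merged := &node{freq: nodes[0].freq + nodes[1].freq, left: nodes[0], right: nodes[1]}
		nodes = append(nodes[2:], merged)
	}
	codes := map[byte]string{}
	var walk func(n *node, prefix string)
	walk = func(n *node, prefix string) {
		if n.left == nil && n.right == nil {
			codes[n.sym] = prefix
			return
		}
		walk(n.left, prefix+"0")
		walk(n.right, prefix+"1")
	}
	if len(nodes) == 1 {
		walk(nodes[0], "")
	}
	return codes
}

func main() {
	// Example 1 input: frequencies A=4, B=3, C=6, D=6 yield 2-bit codes for all four symbols.
	for sym, code := range huffmanCodes([]byte("AAAABBBCCCCCCDDDDDD")) {
		fmt.Printf("%c -> %s\n", sym, code)
	}
}
```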

LZ77

  • Best for: Data with repeated patterns at various distances
  • How it works: Uses a sliding window to find and reference previous occurrences
  • Example: References previous data with (offset, length, next_char) triplets
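The triplet scheme can be illustrated with a deliberately naive Go sketch (brute-force match search, fixed window; not the toolkit's lz77.go). On "ABABABA" it finds the overlapping match and emits just three triplets:

```go
package main

import "fmt"

type triplet struct {
	offset, length int
	next           byte
}

// lz77Encode searches the sliding window for the longest earlier match
// and emits (offset, length, next_char) triplets.
func lz77Encode(data []byte, window int) []triplet {
	var out []triplet
	for i := 0; i < len(data); {
		bestOff, bestLen := 0, 0
		start := i - window
		if start < 0 {
			start = 0
		}
		for j := start; j < i; j++ {
			l := 0
			// Matches may overlap the current position (j+l can reach past i).
			for i+l < len(data)-1 && data[j+l] == data[i+l] {
				l++
			}
			if l > bestLen {
				bestOff, bestLen = i-j, l
			}
		}
		out = append(out, triplet{bestOff, bestLen, data[i+bestLen]})
		i += bestLen + 1
	}
	return out
}

// lz77Decode copies byte-by-byte so overlapping references expand correctly.
func lz77Decode(ts []triplet) []byte {
	var out []byte
	for _, t := range ts {
		start := len(out) - t.offset
		for k := 0; k < t.length; k++ {
			out = append(out, out[start+k])
		}
		out = append(out, t.next)
	}
	return out
}

func main() {
	ts := lz77Encode([]byte("ABABABA"), 16)
	fmt.Printf("%d triplets\n", len(ts)) // (0,0,'A'), (0,0,'B'), (2,4,'A')
	fmt.Println(string(lz77Decode(ts)) == "ABABABA") // true
}
```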

LZW

  • Best for: Data with repeated substrings
  • How it works: Builds a dictionary of patterns on the fly
  • Example: Commonly used in GIF and TIFF formats
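The dynamic-dictionary idea can be sketched as a classic LZW encoder in Go (an illustration, not the toolkit's lzw.go): start with the 256 single-byte entries, extend the current phrase while it stays in the dictionary, and emit one code per longest known prefix:

```go
package main

import "fmt"

// lzwEncode emits one dictionary code per longest known prefix,
// adding each newly seen phrase to the dictionary as it goes.
func lzwEncode(data []byte) []int {
	dict := map[string]int{}
	// Seed with all single-byte strings (codes 0..255).
	for i := 0; i < 256; i++ {
		dict[string([]byte{byte(i)})] = i
	}
	var out []int
	cur := ""
	for _, b := range data {
		next := cur + string([]byte{b})
		if _, ok := dict[next]; ok {
			cur = next // phrase still known: keep extending
		} else {
			out = append(out, dict[cur])
			dict[next] = len(dict) // learn the new phrase
			cur = string([]byte{b})
		}
	}
	if cur != "" {
		out = append(out, dict[cur])
	}
	return out
}

func main() {
	// "AB" becomes code 256 and "ABA" code 258, so 7 bytes shrink to 4 codes.
	fmt.Println(lzwEncode([]byte("ABABABA"))) // [65 66 256 258]
}
```

Note the seed loop uses string([]byte{byte(i)}) rather than string(rune(i)), since the latter would produce multi-byte UTF-8 sequences for values above 127.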

Project Structure

go-compress-lab/
├── cmd/
│   └── compress-lab/
│       └── main.go           # CLI application entry point
├── pkg/
│   ├── algorithms/
│   │   ├── compressor.go     # Compressor interface
│   │   ├── rle.go           # RLE implementation
│   │   ├── huffman.go       # Huffman coding implementation
│   │   ├── lz77.go          # LZ77 implementation
│   │   └── lzw.go           # LZW implementation
│   ├── metrics/
│   │   └── metrics.go       # Compression metrics and calculations
│   └── visualization/
│       └── display.go       # Terminal output formatting
├── go.mod
└── README.md

Educational Use Cases

  1. Understanding Compression Theory: Compare algorithms on different types of data to see which performs best
  2. Entropy Analysis: Learn about information theory and data randomness
  3. Performance Benchmarking: Measure compression speed vs. ratio tradeoffs
  4. Algorithm Behavior: Observe how each algorithm handles different data patterns

Contributing

Contributions are welcome! This is an educational project, so please keep implementations clear and well-documented.

License

MIT License - See LICENSE file for details

Author

Max Base

Acknowledgments

This project is designed for educational purposes to help understand compression algorithms and their behavior on different types of data.