Mirudull-D/ByteShrink

ByteShrink

A command-line file compression tool built in C++ using Huffman Encoding with optional Run-Length Encoding (RLE) preprocessing. ByteShrink compresses text files into compact binary format and restores them losslessly.


Features

  • Lossless compression and decompression
  • Huffman encoding with a frequency-based min-heap tree
  • Optional RLE preprocessing for repetitive data
  • Binary file output with embedded frequency table header
  • Clean CLI interface

Project Structure

byteshrink/
├── include/
│   └── Huffman.h        # Node struct, Compare functor, function declarations
├── src/
│   ├── Huffman.cpp      # Tree building, encoding, decoding logic
│   ├── RLE.cpp          # Run-Length Encoding / Decoding
│   └── FileM.cpp        # Compress() and Decompress() file I/O
├── main.cpp             # CLI entry point
└── README.md

How It Works

Compression (-c)

  1. Reads the input text file into memory
  2. (Optional) Applies RLE to reduce repeated character sequences
  3. Counts character frequencies and builds a min-heap priority queue
  4. Constructs a Huffman tree by repeatedly merging the two lowest-frequency nodes, so lower-frequency characters get longer codes
  5. Generates a binary code (bitstring) for each character
  6. Writes the compressed binary file with this structure:
[ Map Size (int) ]
[ char + frequency pairs × N ]
[ Total Bits (int) ]
[ Encoded payload (packed bits) ]
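Steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the code in Huffman.cpp: the `Node` layout, the index-based node pool, and the names `buildCodes` and `assignCodes` are assumptions made for the example, and the real `Node` struct and `Compare` functor in Huffman.h may look different.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node layout for this sketch (children are pool indices).
struct Node {
    char ch;    // meaningful only for leaves
    int freq;   // total frequency of the subtree
    int left;   // index into the node pool, -1 for a leaf
    int right;  // index into the node pool, -1 for a leaf
};

using CodeTable = std::map<char, std::string>;

// Walk the tree, appending '0' for a left edge and '1' for a right edge.
static void assignCodes(const std::vector<Node>& pool, int idx,
                        const std::string& prefix, CodeTable& codes) {
    const Node& n = pool[idx];
    if (n.left < 0 && n.right < 0) {
        // A single-symbol input would otherwise get an empty code.
        codes[n.ch] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(pool, n.left, prefix + "0", codes);
    assignCodes(pool, n.right, prefix + "1", codes);
}

CodeTable buildCodes(const std::string& text) {
    CodeTable codes;
    if (text.empty()) return codes;

    // Step 3: count character frequencies.
    std::map<char, int> freq;
    for (char c : text) ++freq[c];

    // Min-heap of (frequency, pool index); the smallest frequency is on top.
    std::vector<Node> pool;
    using Entry = std::pair<int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& [ch, count] : freq) {
        pool.push_back({ch, count, -1, -1});
        heap.push({count, static_cast<int>(pool.size()) - 1});
    }

    // Step 4: merge the two lowest-frequency nodes until one root remains.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        pool.push_back({'\0', a.first + b.first, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(pool.size()) - 1});
    }

    // Step 5: derive one bitstring per character.
    assignCodes(pool, heap.top().second, "", codes);
    return codes;
}
```

Calling `buildCodes("aaaaabbbcc")` yields a 1-bit code for `a` and 2-bit codes for `b` and `c`. Which of `b` and `c` receives `10` or `11` depends on how ties are broken in the heap, so the decoder must rebuild the tree with exactly the same rules.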

Decompression (-d)

  1. Reads the header to reconstruct the frequency map
  2. Rebuilds the identical Huffman Tree from the frequency map
  3. Reads the packed bit payload and decodes it by traversing the tree
  4. (Optional) Applies RLE decoding to recover original text
  5. Writes the restored text to the output file
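Step 3 can be illustrated with a small decoder. This is a sketch under assumptions, not the project's implementation: the pointer-based `Node`, the helper `makeExampleTree`, and the most-significant-bit-first packing order are all chosen for the example, and the actual bit order used by FileM.cpp may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node for this sketch.
struct Node {
    char ch = '\0';
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    bool isLeaf() const { return !left && !right; }
};

// Decode `totalBits` bits from `packed`, assuming the first bit of the
// stream is stored in the most significant bit of the first byte.
std::string decodeBits(const Node& root,
                       const std::vector<std::uint8_t>& packed,
                       std::size_t totalBits) {
    // A tree with a single leaf has no edges to follow: one bit per symbol.
    if (root.isLeaf()) return std::string(totalBits, root.ch);

    std::string out;
    const Node* cur = &root;
    for (std::size_t i = 0; i < totalBits && i / 8 < packed.size(); ++i) {
        bool bit = ((packed[i / 8] >> (7 - i % 8)) & 1u) != 0;
        cur = bit ? cur->right.get() : cur->left.get();
        if (!cur) break;               // malformed input: no such branch
        if (cur->isLeaf()) {           // reached a character, restart at root
            out.push_back(cur->ch);
            cur = &root;
        }
    }
    return out;
}

// Hand-built tree for the codes a -> 0, b -> 10, c -> 11.
std::unique_ptr<Node> makeExampleTree() {
    auto leaf = [](char c) {
        auto n = std::make_unique<Node>();
        n->ch = c;
        return n;
    };
    auto inner = std::make_unique<Node>();
    inner->left = leaf('b');
    inner->right = leaf('c');
    auto root = std::make_unique<Node>();
    root->left = leaf('a');
    root->right = std::move(inner);
    return root;
}
```

The stored bit count matters here: the last byte is usually padded, and without `totalBits` the decoder would turn the padding bits into extra characters.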

Building

g++ -std=c++17 main.cpp src/Huffman.cpp src/RLE.cpp src/FileM.cpp -Iinclude -o byteshrink

Requires a C++17-compatible compiler (for structured bindings).


Usage

# Compress a text file
./byteshrink -c input.txt compressed.bin

# Decompress a binary file
./byteshrink -d compressed.bin restored.txt

# Verify output matches original (Windows)
fc.exe input.txt restored.txt

# Verify output matches original (Linux/macOS)
diff input.txt restored.txt

RLE (Run-Length Encoding)

RLE is included but disabled by default in FileM.cpp. It is only beneficial for inputs with long runs of repeated characters (e.g., binary bitmap data, highly repetitive logs).

⚠️ Do not enable RLE for normal text files. English prose has very few character runs, so RLE expands the data before Huffman sees it, which typically makes the output file larger rather than smaller.

To enable, uncomment both lines in FileM.cpp. A file compressed with RLE enabled must also be decompressed with RLE enabled:

// std::string rle = to_RLE(content);   ← uncomment to enable
// std::string content = from_RLE(rle); ← uncomment to enable
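To see why RLE helps on long runs but hurts on prose, here is a minimal sketch of one common RLE scheme that stores each run as a (count, byte) pair. The functions `rleEncode` and `rleDecode` are hypothetical names for this example only; the format actually produced by `to_RLE` and `from_RLE` in RLE.cpp is not documented here and may differ.

```cpp
#include <cstddef>
#include <string>

// Encode each run as two bytes: (run length, character).
// Runs longer than 255 are split so the length fits in one byte.
std::string rleEncode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) {
            ++run;
        }
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Reverse the encoding by expanding every (count, character) pair.
std::string rleDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        std::size_t count = static_cast<unsigned char>(in[i]);
        out.append(count, in[i + 1]);
    }
    return out;
}
```

With this scheme, `"aaaaabbbcc"` (10 bytes) shrinks to 6 bytes, while `"hello world"` (11 bytes) grows to 20 bytes because almost every character is its own run.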

Example

Input:  "aaaaabbbcc" (10 bytes)

Frequencies: a=5, b=3, c=2

Huffman Codes:        
  a → 0       (1 bit)
  b → 10      (2 bits)
  c → 11      (2 bits)

Encoded: 00000 101010 1111  →  15 bits, packed into 2 bytes of payload instead of 10 (header not counted)
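The packing of the 15-bit example into 2 bytes can be sketched as below. The function name `packBits` and the most-significant-bit-first order are assumptions made for illustration; the order used by the actual writer in FileM.cpp may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, filling each byte from
// its most significant bit. Unused bits in the last byte stay zero.
std::vector<std::uint8_t> packBits(const std::string& bits) {
    std::vector<std::uint8_t> out((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            out[i / 8] |= static_cast<std::uint8_t>(1u << (7 - i % 8));
        }
    }
    return out;
}
```

For the bitstring `000001010101111`, this produces the two bytes `0x05` and `0x5E`, with one zero padding bit at the end of the second byte. On an input this small the header (map size, three char and frequency pairs, and the bit count) outweighs the 2-byte payload, so the savings only show up on larger files.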

Dependencies

  • C++17 standard library (<map>, <queue>, <fstream>, <string>)
  • No external libraries required
