A command-line file compression tool built in C++ using Huffman Encoding with optional Run-Length Encoding (RLE) preprocessing. ByteShrink compresses text files into compact binary format and restores them losslessly.
- Lossless compression and decompression
- Huffman encoding with a frequency-based min-heap tree
- Optional RLE preprocessing for repetitive data
- Binary file output with embedded frequency table header
- Clean CLI interface
byteshrink/
├── include/
│ └── Huffman.h # Node struct, Compare functor, function declarations
├── src/
│ ├── Huffman.cpp # Tree building, encoding, decoding logic
│ ├── RLE.cpp # Run-Length Encoding / Decoding
│ └── FileM.cpp # Compress() and Decompress() file I/O
├── main.cpp # CLI entry point
└── README.md
- Reads the input text file into memory
- (Optional) Applies RLE to reduce repeated character sequences
- Counts character frequencies and builds a min-heap priority queue
- Constructs a Huffman Tree — lower frequency characters get longer codes
- Generates a binary code (bitstring) for each character
- Writes the compressed binary file with this structure:
[ Map Size (int) ]
[ char + frequency pairs × N ]
[ Total Bits (int) ]
[ Encoded payload (packed bits) ]
- Reads the header to reconstruct the frequency map
- Rebuilds the identical Huffman Tree from the frequency map
- Reads the packed bit payload and decodes it by traversing the tree
- (Optional) Applies RLE decoding to recover original text
- Writes the restored text to the output file
g++ main.cpp src/Huffman.cpp src/RLE.cpp src/FileM.cpp -Iinclude -o byteshrinkRequires a C++17-compatible compiler (for structured bindings).
# Compress a text file
./byteshrink -c input.txt compressed.bin
# Decompress a binary file
./byteshrink -d compressed.bin restored.txt
# Verify output matches original (Windows)
fc.exe input.txt restored.txt
# Verify output matches original (Linux/macOS)
diff input.txt restored.txtRLE is included but disabled by default in FileM.cpp. It is only beneficial for inputs with long runs of repeated characters (e.g., binary bitmap data, highly repetitive logs).
⚠️ Do not enable RLE for normal text files. English prose has very few character runs, so RLE will expand the data before Huffman sees it, resulting in a much larger output file.
To enable, uncomment in FileM.cpp:
// std::string rle = to_RLE(content); ← uncomment to enable
// std::string content = from_RLE(rle); ← uncomment to enableInput: "aaaaabbbcc" (10 bytes)
Frequencies: a=5, b=3, c=2
Huffman Codes:
a → 0 (1 bit)
b → 10 (2 bits)
c → 11 (2 bits)
Encoded: 00000101010 11 11 → ~2 bytes instead of 10
- C++17 standard library (
<map>,<queue>,<fstream>,<string>) - No external libraries required