All notable changes to TechCompressor will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.0.0 - 2025-10-25
- LZW Compression Algorithm: Dictionary-based compression with automatic dictionary reset for unlimited input support
- Huffman Coding Algorithm: Frequency-based compression with optimal tree construction
- DEFLATE Algorithm: Industry-standard hybrid LZ77 + Huffman compression
- AES-256-GCM Encryption: Authenticated encryption with PBKDF2 key derivation (100,000 iterations)
- TCAF Archive Format: Custom archive format supporting folder compression with metadata preservation
- Per-file and Single-stream Compression Modes: Flexible archiving strategies for different use cases
- CLI Interface: Full-featured command-line interface with commands for compress, decompress, create, extract, list, and verify
- Tkinter GUI: User-friendly graphical interface with multi-tab design, background threading, and real-time progress updates
- Security Features: Path traversal protection, symlink validation, recursion detection, and authenticated encryption
- Progress Callbacks: Cancellable operations with real-time progress reporting for GUI and CLI
- Streaming Support: Efficient handling of large files (>16MB) with chunked I/O
- Comprehensive Test Suite: 137 passing tests covering algorithms, encryption, archiving, and integration scenarios
- Performance Benchmarking: Built-in bench.py script and CLI --benchmark flag for algorithm comparison
- Magic Header Validation: Automatic format detection and wrong-algorithm prevention
- Python 3.10+ Support: Modern type hints with PEP 604 union syntax (|)
- LZW: 4096-entry dictionary, 2-byte big-endian codes, automatic reset
- Huffman: Heap-based tree construction, serialized tree format, single-byte edge case handling
- DEFLATE: 32KB sliding window, 258-byte max match, two-pass compression
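The LZW parameters above (4096-entry dictionary, 2-byte big-endian codes, automatic reset) can be illustrated with a minimal sketch. The function names and the exact reset rule (clear the dictionary when a new entry would exceed 4096) are assumptions made for illustration, not TechCompressor's actual implementation or stream format.

```python
import struct

MAX_DICT_SIZE = 4096  # dictionary entry limit quoted in the notes above


def _fresh_encoder_table() -> dict[bytes, int]:
    return {bytes([i]): i for i in range(256)}


def lzw_compress(data: bytes) -> bytes:
    """Encode data as a sequence of 2-byte big-endian LZW codes."""
    table = _fresh_encoder_table()
    out = bytearray()
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
            continue
        out += struct.pack(">H", table[current])
        if len(table) < MAX_DICT_SIZE:
            table[candidate] = len(table)
        else:
            # Dictionary is full: reset to single-byte entries (assumed rule).
            table = _fresh_encoder_table()
        current = bytes([byte])
    if current:
        out += struct.pack(">H", table[current])
    return bytes(out)


def lzw_decompress(blob: bytes) -> bytes:
    """Invert lzw_compress, mirroring the same reset rule."""
    if len(blob) % 2:
        raise ValueError("truncated LZW stream")
    codes = [code for (code,) in struct.iter_unpack(">H", blob)]
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    if codes[0] not in table:
        raise ValueError("invalid first code")
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table) and len(table) < MAX_DICT_SIZE:
            # Code refers to the entry the encoder added one step ago.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if len(table) < MAX_DICT_SIZE:
            table[len(table)] = prev + entry[:1]
        else:
            table = {i: bytes([i]) for i in range(256)}
        prev = entry
    return bytes(out)
```

Because every code is below 4096, each one fits in two bytes, and resetting on overflow keeps memory bounded regardless of input length.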
- Two compression modes: per-file (better random access) and single-stream (better ratio)
- Metadata preservation: relative paths, timestamps, and file attributes
- Security: path validation, symlink blocking, recursion detection, sanitized extraction
- Format: TCAF header with version, algorithm ID, compression mode flag, and entry metadata
- AES-256-GCM authenticated encryption with 16-byte authentication tag
- PBKDF2-HMAC-SHA256 key derivation with 100,000 iterations
- Random salt (16 bytes) and nonce (12 bytes) per encryption
- Automatic encryption detection via TCE1 magic header
- No password recovery mechanism (intentional security design)
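The key-derivation and header parameters listed above can be sketched with the standard library alone. The byte layout used here (TCE1 magic, then salt, then nonce, then ciphertext and tag) and all function names are assumptions for illustration; the changelog does not specify the real container layout, and the AES-GCM step itself is omitted.

```python
import hashlib
import os

MAGIC = b"TCE1"        # magic header named in the notes above
SALT_LEN = 16          # random salt size
NONCE_LEN = 12         # GCM nonce size
TAG_LEN = 16           # GCM authentication tag size
ITERATIONS = 100_000   # PBKDF2 iteration count


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte (AES-256) key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def new_header() -> tuple[bytes, bytes, bytes]:
    """Return (header, salt, nonce) with fresh random salt and nonce."""
    salt = os.urandom(SALT_LEN)
    nonce = os.urandom(NONCE_LEN)
    return MAGIC + salt + nonce, salt, nonce


def is_encrypted(blob: bytes) -> bool:
    """Detect encryption by checking for the magic header."""
    return blob.startswith(MAGIC)


def split_container(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a container into (salt, nonce, ciphertext_with_tag).

    Hypothetical layout: MAGIC | salt | nonce | ciphertext + tag.
    """
    if not is_encrypted(blob):
        raise ValueError("missing TCE1 magic header")
    body = blob[len(MAGIC):]
    if len(body) < SALT_LEN + NONCE_LEN + TAG_LEN:
        raise ValueError("container too short")
    salt = body[:SALT_LEN]
    nonce = body[SALT_LEN:SALT_LEN + NONCE_LEN]
    return salt, nonce, body[SALT_LEN + NONCE_LEN:]
```

A fresh salt and nonce per encryption means the same password and plaintext never produce the same output twice.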
- create / c: Create archives from files or folders
- extract / x: Extract archives with optional password
- compress: Single file compression without archive overhead
- decompress: Single file decompression
- list / l: Show archive contents without extraction
- verify: Check archive integrity
- --benchmark: Run inline performance tests
- --gui: Launch GUI from command line
- Entry points: techcompressor, techcmp, techcompressor-gui
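The command set above maps naturally onto argparse subcommands with aliases. The sketch below is one possible wiring; the positional argument names and the --password option spelling are assumptions, since the changelog lists only the command names and the two top-level flags.

```python
import argparse

# Short aliases from the command list above, mapped to canonical names.
ALIASES = {"c": "create", "x": "extract", "l": "list"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="techcompressor")
    parser.add_argument("--benchmark", action="store_true",
                        help="run inline performance tests")
    parser.add_argument("--gui", action="store_true",
                        help="launch the GUI")
    sub = parser.add_subparsers(dest="command")

    create = sub.add_parser("create", aliases=["c"],
                            help="create an archive from files or folders")
    create.add_argument("source")    # hypothetical argument name
    create.add_argument("archive")   # hypothetical argument name

    extract = sub.add_parser("extract", aliases=["x"],
                             help="extract an archive")
    extract.add_argument("archive")
    extract.add_argument("dest")
    extract.add_argument("--password", default=None)  # assumed option name

    for name in ("compress", "decompress"):
        single = sub.add_parser(name, help=f"{name} a single file")
        single.add_argument("input")
        single.add_argument("output")

    listing = sub.add_parser("list", aliases=["l"],
                             help="show archive contents")
    listing.add_argument("archive")

    verify = sub.add_parser("verify", help="check archive integrity")
    verify.add_argument("archive")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse argv and normalize aliases to their canonical command name."""
    args = build_parser().parse_args(argv)
    args.command = ALIASES.get(args.command, args.command)
    return args
```

argparse stores the alias exactly as typed, so normalizing it once in parse() lets the dispatch code switch on a single canonical name.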
- Multi-tab interface: Compress, Extract, Settings, Logs
- Background threading with ThreadPoolExecutor for non-blocking operations
- Real-time progress bars and status updates
- Password fields with show/hide toggle
- Algorithm selection (LZW, HUFFMAN, DEFLATE)
- Per-file mode toggle for archives
- Operation cancellation support
- Custom logging handler for GUI text widget
- Keyboard shortcuts: Ctrl+Shift+C (compress), Ctrl+Shift+E (extract)
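Background threading, progress updates, and cancellation fit together roughly as follows. This is a sketch with hypothetical names (process_chunks, run_in_background) and a stand-in for the real compression work; it leaves out the Tkinter side, where a progress callback would typically hand updates to the main thread (for example through a widget's after method) rather than touch widgets directly.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Sequence

ProgressCallback = Callable[[int, int], None]  # (chunks_done, chunks_total)


def process_chunks(
    chunks: Sequence[bytes],
    on_progress: ProgressCallback,
    cancel: threading.Event,
) -> int:
    """Process chunks one by one, reporting progress and honoring cancel."""
    processed = 0
    total = len(chunks)
    for index, chunk in enumerate(chunks, start=1):
        if cancel.is_set():
            break  # stop between chunks so partial state stays consistent
        processed += len(chunk)  # stand-in for compressing the chunk
        on_progress(index, total)
    return processed


def run_in_background(
    chunks: Sequence[bytes],
    on_progress: ProgressCallback,
    cancel: threading.Event,
) -> Future:
    """Submit the job to a worker thread and return its Future."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(process_chunks, chunks, on_progress, cancel)
    executor.shutdown(wait=False)  # the submitted job still runs to completion
    return future
```

Checking the cancel event between chunks keeps cancellation responsive without interrupting a chunk halfway through.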
- Modular architecture: core, archiver, crypto, cli, gui, utils
- Standardized logging via utils.get_logger()
- Type hints throughout codebase
- Comprehensive docstrings with algorithm explanations
- Test organization by component and integration level
- Performance regression tests
- cryptography>=41.0.0: AES-GCM and PBKDF2 implementation
- tqdm>=4.65.0: Progress bars for CLI operations
- pytest>=7.0.0: Testing framework (dev dependency)
- Algorithm-specific tests: test_lzw.py, test_huffman.py, test_deflate.py
- Encryption tests: test_crypto.py (password validation, wrong password detection)
- Archive tests: test_archiver.py (security validation, path traversal prevention)
- Integration tests: test_integration.py (cross-algorithm workflows)
- Performance tests: test_perf_sanity.py (regression checks)
- GUI tests: test_gui_basic.py (basic functionality)
- DEFLATE: Best compression ratio (~1% on repetitive data), 6 MB/s
- LZW: Fastest compression (3 MB/s), decent ratio (~9%)
- Huffman: Middle ground (~44% ratio, 2.5 MB/s)
- Encryption overhead: 50-100ms for PBKDF2 (intentional security feature)
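Figures like those above can be reproduced with a small timing harness. The sketch below assumes that "ratio" means compressed size as a percentage of original size, and it uses zlib from the standard library as a stand-in compressor, because this changelog does not show TechCompressor's Python API.

```python
import time
import zlib
from typing import Callable


def benchmark(name: str, compress: Callable[[bytes], bytes], data: bytes) -> dict:
    """Measure ratio (compressed size as % of original) and throughput."""
    if not data:
        raise ValueError("benchmark input must be non-empty")
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "ratio_percent": 100.0 * len(compressed) / len(data),
        "mb_per_s": (len(data) / 1_000_000) / elapsed if elapsed > 0 else float("inf"),
    }


# Highly repetitive sample, similar in spirit to "repetitive data" above.
sample = b"TechCompressor " * 50_000
result = benchmark("zlib (stand-in)", zlib.compress, sample)
print(f"{result['name']}: {result['ratio_percent']:.2f}% at {result['mb_per_s']:.1f} MB/s")
```

Throughput numbers depend heavily on the machine and the input, so they are best compared between algorithms on the same run.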
- Comprehensive README with quickstart, API reference, CLI examples
- Architecture documentation in .github/copilot-instructions.md
- Algorithm explanations in code docstrings
- Security best practices and warnings
- Dictionary reset handling in LZW for unlimited input size
- Single-byte edge case in Huffman tree construction
- Magic header validation for format detection
- Path traversal prevention in archive extraction
- Symlink handling to prevent infinite loops
- Archive recursion detection (output inside source)
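The single-byte edge case mentioned above arises because a Huffman tree with one leaf assigns that symbol an empty code. A minimal heap-based sketch shows one way to handle it; the function name and the choice of "0" as the lone symbol's code are assumptions, not necessarily what TechCompressor does.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(data: bytes) -> dict[int, str]:
    """Build a prefix-free bit-string code for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # Edge case: a one-leaf tree would yield an empty code, so the
        # only symbol gets the 1-bit code "0" instead (assumed convention).
        (symbol,) = freq
        return {symbol: "0"}

    tiebreak = count()  # unique counter so heap never compares the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

Without the special case, input such as a file of repeated "a" bytes would encode to zero bits and lose its length.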
- Path traversal attack prevention via sanitized extraction paths
- Symlink validation to prevent directory traversal
- Recursion detection to prevent archive-in-source issues
- Authenticated encryption with GCM mode
- Intentionally slow key derivation (PBKDF2, 100K iterations)
- No password recovery mechanism (by design)
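The path traversal and recursion checks above can be expressed with pathlib. This is a sketch with hypothetical function names; it assumes that resolving a path (which also follows symlinks) and comparing it against the destination root is an acceptable validation strategy.

```python
from pathlib import Path


def safe_extract_path(dest_dir: str, member_name: str) -> Path:
    """Return the resolved target for an archive entry, or raise ValueError
    if the entry would land outside dest_dir (e.g. '../x' or '/etc/passwd')."""
    dest = Path(dest_dir).resolve()
    target = (dest / member_name).resolve()
    if target == dest or not target.is_relative_to(dest):
        raise ValueError(f"unsafe archive entry path: {member_name!r}")
    return target


def output_inside_source(source_dir: str, archive_path: str) -> bool:
    """Detect the archive-in-source case: writing the archive into the
    folder being archived would make it try to include itself."""
    source = Path(source_dir).resolve()
    archive = Path(archive_path).resolve()
    return archive.is_relative_to(source)
```

Path.is_relative_to requires Python 3.9 or later, which is within the 3.10+ baseline stated for this release.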
- Future algorithm additions (Arithmetic coding, BWT, etc.)
- Additional archive formats (ZIP, TAR interoperability)
- Advanced compression options (dictionary size tuning)
- GPU-accelerated compression
- Parallel compression for multi-file archives
TechCompressor v1.0.0 is the first production-ready release. It features three compression algorithms (LZW, Huffman, DEFLATE), AES-256-GCM encryption, a custom TCAF archive format, and both CLI and GUI interfaces. All 137 tests pass, security features are fully implemented, and performance is optimized for typical use cases.
Breaking Changes: None (initial release)
Migration Guide: Not applicable (initial release)
Known Issues:
- Compression reveals data patterns even with encryption (semantic security limitation inherent to compress-then-encrypt)
- PBKDF2 key derivation is intentionally slow (~50-100ms) for security
- No password recovery: encrypted data cannot be restored without the password
Recommended Usage:
- Use DEFLATE for general-purpose compression
- Use LZW for speed-critical applications
- Use per-file mode for mixed content or selective extraction
- Use single-stream mode for similar files (source code, text)
- Always use strong passwords (12+ characters) for encryption