Changelog

All notable changes to TechCompressor will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

1.0.0 - 2025-10-25

Added

  • LZW Compression Algorithm: Dictionary-based compression with automatic dictionary reset, so input size is not limited by the dictionary
  • Huffman Coding Algorithm: Frequency-based compression with optimal tree construction
  • DEFLATE Algorithm: Industry-standard hybrid LZ77 + Huffman compression
  • AES-256-GCM Encryption: Authenticated encryption with PBKDF2-HMAC-SHA256 key derivation (100,000 iterations)
  • TCAF Archive Format: Custom archive format supporting folder compression with metadata preservation
  • Per-file and Single-stream Compression Modes: Flexible archiving strategies for different use cases
  • CLI Interface: Full-featured command-line interface with commands for compress, decompress, create, extract, list, and verify
  • Tkinter GUI: User-friendly graphical interface with multi-tab design, background threading, and real-time progress updates
  • Security Features: Path traversal protection, symlink validation, recursion detection, and authenticated encryption
  • Progress Callbacks: Cancellable operations with real-time progress reporting for GUI and CLI
  • Streaming Support: Efficient handling of large files (>16 MB) with chunked I/O
  • Comprehensive Test Suite: 137 passing tests covering algorithms, encryption, archiving, and integration scenarios
  • Performance Benchmarking: Built-in bench.py script and CLI --benchmark flag for algorithm comparison
  • Magic Header Validation: Automatic format detection and wrong-algorithm prevention
  • Python 3.10+ Support: Modern type hints with PEP 604 union syntax (|)
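The sketch below shows heap-based Huffman code construction, including the edge case where the input has only one distinct byte. It builds only the code table, with no tree serialization or bit packing. `build_codes` is a hypothetical helper, not TechCompressor's API.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(data: bytes) -> dict[int, str]:
    """Return a byte -> bit-string code table built with a min-heap (hypothetical helper)."""
    freqs = Counter(data)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Edge case: a lone symbol would otherwise get an empty code, so give it "0".
        (symbol,) = freqs
        return {symbol: "0"}

    tiebreak = count()  # unique counter keeps heap entries comparable when weights tie
    # Heap entries are (weight, tiebreak, node); a node is a byte value or a (left, right) pair.
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))  # ...merged into one

    # Walk the finished tree: left edges append "0", right edges append "1".
    codes: dict[int, str] = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


codes = build_codes(b"abracadabra")
encoded = "".join(codes[b] for b in b"abracadabra")
```

Tie-breaking can change the individual codes, but any optimal code for `b"abracadabra"` uses 23 bits in total.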

Compression Features

  • LZW: 4096-entry dictionary, 2-byte big-endian codes, automatic reset
  • Huffman: Heap-based tree construction, serialized tree format, single-byte edge case handling
  • DEFLATE: 32 KB sliding window, 258-byte maximum match length, two-pass compression
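The sketch below implements LZW with the parameters listed above: a 4096-entry dictionary, 2-byte big-endian codes, and automatic reset. It assumes a specific reset rule: both tables are cleared the moment the dictionary fills, with no explicit clear code or stream header. That rule is not confirmed by the changelog, and `lzw_compress` / `lzw_decompress` are hypothetical names.

```python
MAX_ENTRIES = 4096  # dictionary size from the changelog; emitted codes stay below 4096


def _fresh_encoder_table() -> dict[bytes, int]:
    return {bytes([i]): i for i in range(256)}


def _fresh_decoder_table() -> dict[int, bytes]:
    return {i: bytes([i]) for i in range(256)}


def lzw_compress(data: bytes) -> bytes:
    table = _fresh_encoder_table()
    codes: list[int] = []
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
            continue
        codes.append(table[w])
        table[wc] = len(table)
        if len(table) == MAX_ENTRIES:
            # Assumed reset rule: start over as soon as the table is full.
            table = _fresh_encoder_table()
        w = bytes([byte])
    if w:
        codes.append(table[w])
    # Each code is written as a 2-byte big-endian integer.
    return b"".join(code.to_bytes(2, "big") for code in codes)


def lzw_decompress(blob: bytes) -> bytes:
    if len(blob) % 2:
        raise ValueError("truncated LZW stream")
    table = _fresh_decoder_table()
    out = bytearray()
    prev: bytes | None = None
    for i in range(0, len(blob), 2):
        code = int.from_bytes(blob[i:i + 2], "big")
        if code in table:
            entry = table[code]
        elif prev is not None and code == len(table):
            entry = prev + prev[:1]  # "KwKwK" case: the code refers to the entry being defined now
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if prev is not None:
            # The decoder learns each entry one code later than the encoder did.
            table[len(table)] = prev + entry[:1]
            if len(table) == MAX_ENTRIES:
                table = _fresh_decoder_table()
        prev = entry
    return bytes(out)
```

The decoder learns each entry one code later than the encoder, so its reset also fires one code later. This stays in sync because the first code emitted after a reset is always a single-byte literal, which decodes identically in the old and the fresh table.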

Archive Features

  • Two compression modes: per-file (better random access) and single-stream (better ratio)
  • Metadata preservation: relative paths, timestamps, and file attributes
  • Security: path validation, symlink blocking, recursion detection, sanitized extraction
  • Format: TCAF header with version, algorithm ID, compression mode flag, and entry metadata
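The header fields listed above can be packed and validated with `struct`. Several details here are assumptions for illustration: the byte layout, the use of `b"TCAF"` as the magic bytes, and the numeric algorithm IDs. Entry metadata is omitted.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: 4-byte magic, then version, algorithm ID and mode flag (one byte each).
HEADER = struct.Struct(">4sBBB")
MAGIC = b"TCAF"  # assumed magic bytes; the real value may differ
ALGORITHMS = {1: "LZW", 2: "HUFFMAN", 3: "DEFLATE"}  # assumed ID assignment


@dataclass(frozen=True)
class ArchiveHeader:
    version: int
    algorithm: str
    per_file: bool  # True = per-file mode, False = single-stream mode


def pack_header(header: ArchiveHeader) -> bytes:
    algo_id = {name: i for i, name in ALGORITHMS.items()}[header.algorithm]
    return HEADER.pack(MAGIC, header.version, algo_id, int(header.per_file))


def unpack_header(blob: bytes) -> ArchiveHeader:
    if len(blob) < HEADER.size:
        raise ValueError("file too short to be a TCAF archive")
    magic, version, algo_id, mode = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a TCAF archive (bad magic)")  # format detection
    if algo_id not in ALGORITHMS:
        raise ValueError(f"unknown algorithm ID {algo_id}")  # wrong-algorithm prevention
    return ArchiveHeader(version, ALGORITHMS[algo_id], bool(mode))
```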

Encryption Features

  • AES-256-GCM authenticated encryption with 16-byte authentication tag
  • PBKDF2-HMAC-SHA256 key derivation with 100,000 iterations
  • Random salt (16 bytes) and nonce (12 bytes) per encryption
  • Automatic encryption detection via TCE1 magic header
  • No password recovery mechanism (intentional security design)
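The sketch below covers key derivation and container framing using only the standard library (`hashlib.pbkdf2_hmac`). It assumes a field order of magic, salt, nonce, then ciphertext with the tag appended. That layout and the helper names are hypothetical.

```python
import hashlib
import os

MAGIC = b"TCE1"  # magic header named in the changelog
SALT_LEN, NONCE_LEN, TAG_LEN = 16, 12, 16
ITERATIONS = 100_000


def derive_key(password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 producing a 32-byte (AES-256) key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


def frame(salt: bytes, nonce: bytes, ciphertext_and_tag: bytes) -> bytes:
    # Assumed layout: magic | salt | nonce | ciphertext || 16-byte GCM tag
    if len(salt) != SALT_LEN or len(nonce) != NONCE_LEN:
        raise ValueError("bad salt or nonce length")
    return MAGIC + salt + nonce + ciphertext_and_tag


def is_encrypted(blob: bytes) -> bool:
    return blob.startswith(MAGIC)


def parse_frame(blob: bytes) -> tuple[bytes, bytes, bytes]:
    header_len = len(MAGIC) + SALT_LEN + NONCE_LEN
    if not is_encrypted(blob):
        raise ValueError("not encrypted (missing TCE1 magic)")
    if len(blob) < header_len + TAG_LEN:
        raise ValueError("truncated encrypted payload")
    salt = blob[len(MAGIC):len(MAGIC) + SALT_LEN]
    nonce = blob[len(MAGIC) + SALT_LEN:header_len]
    return salt, nonce, blob[header_len:]


# A fresh random salt and nonce for every encryption.
salt, nonce = os.urandom(SALT_LEN), os.urandom(NONCE_LEN)
key = derive_key("correct horse battery staple", salt)
```

The `cryptography` dependency provides `AESGCM(key).encrypt(nonce, data, None)`, which returns the ciphertext with the 16-byte tag appended. That is the value `frame` expects as its last argument.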

CLI Features

  • create/c: Create archives from files or folders
  • extract/x: Extract archives with optional password
  • compress: Single file compression without archive overhead
  • decompress: Single file decompression
  • list/l: Show archive contents without extraction
  • verify: Check archive integrity
  • --benchmark: Run inline performance tests
  • --gui: Launch GUI from command line
  • Entry points: techcompressor, techcmp, techcompressor-gui
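The command set above maps naturally onto `argparse` subparsers with aliases. The positional arguments and the `--algo`, `--per-file` and `--password` options in this sketch are hypothetical and may not match TechCompressor's real flags.

```python
import argparse

ALGORITHMS = ["LZW", "HUFFMAN", "DEFLATE"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="techcompressor")
    parser.add_argument("--gui", action="store_true", help="launch the GUI")
    parser.add_argument("--benchmark", action="store_true", help="run inline performance tests")
    sub = parser.add_subparsers(dest="command")

    # aliases= makes "c", "x" and "l" resolve to the same subparser as the long names.
    create = sub.add_parser("create", aliases=["c"], help="create an archive from files or folders")
    create.add_argument("source")
    create.add_argument("archive")
    create.add_argument("--algo", choices=ALGORITHMS, default="DEFLATE")
    create.add_argument("--per-file", action="store_true")
    create.add_argument("--password")
    create.set_defaults(action="create")

    extract = sub.add_parser("extract", aliases=["x"], help="extract an archive")
    extract.add_argument("archive")
    extract.add_argument("dest")
    extract.add_argument("--password")
    extract.set_defaults(action="extract")

    for name, aliases in (("list", ["l"]), ("verify", [])):
        p = sub.add_parser(name, aliases=aliases, help=f"{name} an archive")
        p.add_argument("archive")
        p.set_defaults(action=name)

    compress = sub.add_parser("compress", help="compress a single file without archive overhead")
    compress.add_argument("input")
    compress.add_argument("output")
    compress.add_argument("--algo", choices=ALGORITHMS, default="DEFLATE")
    compress.add_argument("--password")
    compress.set_defaults(action="compress")

    # No --algo here: the magic header identifies the algorithm on decompression.
    decompress = sub.add_parser("decompress", help="decompress a single file")
    decompress.add_argument("input")
    decompress.add_argument("output")
    decompress.add_argument("--password")
    decompress.set_defaults(action="decompress")

    return parser
```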

GUI Features

  • Multi-tab interface: Compress, Extract, Settings, Logs
  • Background threading with ThreadPoolExecutor for non-blocking operations
  • Real-time progress bars and status updates
  • Password fields with show/hide toggle
  • Algorithm selection (LZW, HUFFMAN, DEFLATE)
  • Per-file mode toggle for archives
  • Operation cancellation support
  • Custom logging handler for GUI text widget
  • Keyboard shortcuts: Ctrl+Shift+C (compress), Ctrl+Shift+E (extract)
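The background-threading and cancellation pattern can be shown without Tkinter. A worker runs in a `ThreadPoolExecutor`, reports progress through a callback, and checks a `threading.Event` between chunks. `compress_in_background` and the `(done, total)` callback signature are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical callback signature: (chunks_done, chunks_total)
ProgressCallback = Callable[[int, int], None]


class OperationCancelled(Exception):
    """Raised inside the worker when the cancel flag is set."""


def compress_in_background(chunks: list[bytes], on_progress: ProgressCallback,
                           cancel: threading.Event) -> int:
    total = len(chunks)
    processed = 0
    for done, chunk in enumerate(chunks, start=1):
        if cancel.is_set():
            raise OperationCancelled()
        processed += len(chunk)  # placeholder for the real per-chunk compression
        on_progress(done, total)
    return processed


executor = ThreadPoolExecutor(max_workers=1)
updates = queue.Queue()          # worker -> GUI thread; the worker never touches widgets
cancel_flag = threading.Event()  # a Cancel button would call cancel_flag.set()

future = executor.submit(
    compress_in_background,
    [b"x" * 10] * 5,
    lambda done, total: updates.put((done, total)),
    cancel_flag,
)
```

In the GUI, the main thread would drain `updates` from a function rescheduled with `root.after(...)` and update the progress bar there. Tkinter widgets should only be touched from the main thread.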

Developer Features

  • Modular architecture: core, archiver, crypto, cli, gui, utils
  • Standardized logging via utils.get_logger()
  • Type hints throughout codebase
  • Comprehensive docstrings with algorithm explanations
  • Test organization by component and integration level
  • Performance regression tests

Dependencies

  • cryptography>=41.0.0: AES-GCM and PBKDF2 implementation
  • tqdm>=4.65.0: Progress bars for CLI operations
  • pytest>=7.0.0: Testing framework (dev dependency)

Testing

  • Algorithm-specific tests: test_lzw.py, test_huffman.py, test_deflate.py
  • Encryption tests: test_crypto.py (password validation, wrong password detection)
  • Archive tests: test_archiver.py (security validation, path traversal prevention)
  • Integration tests: test_integration.py (cross-algorithm workflows)
  • Performance tests: test_perf_sanity.py (regression checks)
  • GUI tests: test_gui_basic.py (basic functionality)

Performance

  • DEFLATE: Best compression ratio (~1% of original size on repetitive data), 6 MB/s
  • LZW: ~9% ratio, 3 MB/s
  • Huffman: ~44% ratio, 2.5 MB/s
  • Encryption overhead: 50-100 ms for PBKDF2 key derivation (intentional security feature)
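The ratio figures above read as compressed size as a percentage of the original, where lower is better. The sketch below takes such a measurement using the standard library's `zlib`, a DEFLATE implementation, as a stand-in. Its numbers will not match TechCompressor's own implementation, and `measure` is a hypothetical helper.

```python
import time
import zlib
from typing import Callable


def measure(compress: Callable[[bytes], bytes], data: bytes) -> tuple[float, float]:
    """Return (ratio in percent, throughput in MB/s) for one compression call."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = 100.0 * len(compressed) / len(data)  # compressed size as % of original; lower is better
    # MB taken as 10**6 bytes here (assumption; the changelog does not define it)
    throughput = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


repetitive = b"The quick brown fox jumps over the lazy dog. " * 20_000
ratio, mbps = measure(lambda d: zlib.compress(d, 6), repetitive)
print(f"zlib (DEFLATE) ratio: {ratio:.2f}%, throughput: {mbps:.1f} MB/s")
```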

Documentation

  • Comprehensive README with quickstart, API reference, CLI examples
  • Architecture documentation in .github/copilot-instructions.md
  • Algorithm explanations in code docstrings
  • Security best practices and warnings

Fixed

  • Dictionary reset handling in LZW for unlimited input size
  • Single-byte edge case in Huffman tree construction
  • Magic header validation for format detection
  • Path traversal prevention in archive extraction
  • Symlink handling to prevent infinite loops
  • Archive recursion detection (output inside source)

Security

  • Path traversal attack prevention via sanitized extraction paths
  • Symlink validation to prevent directory traversal
  • Recursion detection to prevent archive-in-source issues
  • Authenticated encryption with GCM mode
  • Intentionally slow key derivation (PBKDF2, 100K iterations)
  • No password recovery mechanism (by design)

[Unreleased]

  • Future algorithm additions (Arithmetic coding, BWT, etc.)
  • Additional archive formats (ZIP, TAR interoperability)
  • Advanced compression options (dictionary size tuning)
  • GPU-accelerated compression
  • Parallel compression for multi-file archives

Release Notes

v1.0.0 - Production Release

TechCompressor v1.0.0 is the first production-ready release featuring three compression algorithms (LZW, Huffman, DEFLATE), AES-256-GCM encryption, custom TCAF archive format, and both CLI and GUI interfaces. All 137 tests pass, security features are fully implemented, and performance is optimized for typical use cases.

Breaking Changes: None (initial release)

Migration Guide: Not applicable (initial release)

Known Issues:

  • Compress-then-encrypt leaks information through output length: highly compressible plaintext yields noticeably shorter ciphertext, and encryption does not hide this
  • PBKDF2 key derivation is intentionally slow (~50-100 ms) for security
  • No password recovery: encrypted data is permanently lost if the password is forgotten
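The length side channel noted above is easy to demonstrate using `zlib` as a stand-in compressor. Two inputs of the same size produce very different encrypted sizes, because AES-GCM output is as long as its input plus a 16-byte tag. `ciphertext_length` is a hypothetical helper that ignores the fixed salt, nonce and header overhead.

```python
import os
import zlib

GCM_TAG_LEN = 16  # AES-GCM appends a 16-byte tag; the ciphertext body is as long as its input


def ciphertext_length(plaintext: bytes) -> int:
    """Approximate encrypted size under compress-then-encrypt (fixed overhead ignored)."""
    return len(zlib.compress(plaintext)) + GCM_TAG_LEN


repetitive = b"A" * 4096
random_like = os.urandom(4096)
for label, plaintext in (("repetitive", repetitive), ("random", random_like)):
    print(f"{label}: {len(plaintext)} bytes -> {ciphertext_length(plaintext)} bytes encrypted")
```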

Recommended Usage:

  • Use DEFLATE for general-purpose compression
  • Use LZW for speed-critical applications
  • Use per-file mode for mixed content or selective extraction
  • Use single-stream mode for similar files (source code, text)
  • Always use strong passwords (12+ characters) for encryption