# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2025-02-24

First stable release. Published as `context-compression-engine` (renamed from `@cce/core`).

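As a rough illustration of the exact-dedup and verbatim-store design in this release (the function names, message shape, and reference format below are hypothetical, not this package's API), a hash-based pass can replace all but the most recent copy of a duplicate message with a compact reference, while keying every original by ID so the transform stays lossless:

```typescript
import { createHash } from "node:crypto";

interface Message { role: string; content: string; }

const sha = (text: string): string =>
  createHash("sha256").update(text).digest("hex").slice(0, 12);

// Replace all but the last copy of each duplicate body with a compact
// reference, and store every original by ID so the pass is lossless.
function dedup(messages: Message[]): { messages: Message[]; verbatim: Map<string, string> } {
  const lastIndex = new Map<string, number>();
  messages.forEach((m, i) => lastIndex.set(sha(m.content), i));

  const verbatim = new Map<string, string>();
  const out = messages.map((m, i) => {
    verbatim.set(`msg_${i}`, m.content);
    const keep = lastIndex.get(sha(m.content))!;
    return keep === i ? m : { role: m.role, content: `[dup: see msg_${keep}]` };
  });
  return { messages: out, verbatim };
}

// Lossless restore: swap every message back for its stored original.
function restore(messages: Message[], verbatim: Map<string, string>): Message[] {
  return messages.map((m, i) => ({ role: m.role, content: verbatim.get(`msg_${i}`)! }));
}
```

Here `restore` consults the store for every ID; the library itself only documents `uncompress(messages, verbatim)` as restoring originals from a verbatim map, so the reference syntax shown is invented for the sketch.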
### Added

- Lossless context compression with `compress()` and `uncompress()`
- Code-aware classification: fences, SQL, JSON/YAML, API keys, URLs, file paths preserved verbatim
- Paragraph-aware sentence scoring in `summarize()`
- Code-bearing message splitting to compress surrounding prose
- Exact and fuzzy cross-message deduplication (exact enabled by default, fuzzy opt-in)
- LLM-powered summarization with `createSummarizer()` and `createEscalatingSummarizer()`
- Three-level fallback: LLM → deterministic → size guard
- `tokenBudget` with binary search over `recencyWindow`
- `forceConverge` hard-truncation pass for guaranteed budget convergence
- Pluggable `tokenCounter` option (default: `ceil(content.length / 3.5)`)
- `embedSummaryId` option to embed summary IDs directly into message content
- Provenance tracking via `_cce_original` metadata (origin IDs, summary hashes, version chains)
- Verbatim store for lossless round-trip (`VerbatimMap` or lookup function)
- Recursive `uncompress()` for multi-round compression chains
- `preserve` option for role-based message protection
- `recencyWindow` to protect recent messages from compression
- Tool/function result compression through the classifier
- Compression stats: `ratio`, `token_ratio`, `messages_compressed`, `messages_removed`
- Input validation on public API surface
- 333 tests with coverage across all compression paths
- Benchmark suite with synthetic and real-session scenarios
- LLM benchmark with multi-provider support (Claude, GPT, Gemini, Grok, Ollama)

[1.0.0]: https://github.com/SimplyLiz/ContextCompressionEngine/releases/tag/v1.0.0
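The `tokenBudget` entry above can be sketched in isolation: binary-search the largest recency window whose estimated token total fits the budget, using the default `ceil(length / 3.5)` estimator. This is a minimal sketch assuming a simple `{ role, content }` message shape; `fitRecencyWindow` is an illustrative name, not this package's API.

```typescript
interface Message { role: string; content: string; }

// Default-style estimator from the changelog: ceil(content.length / 3.5).
const countTokens = (text: string): number => Math.ceil(text.length / 3.5);

// Binary-search the largest suffix (recency window) whose estimated token
// total still fits the budget. Feasibility is monotone in window size,
// which is what makes binary search valid here.
function fitRecencyWindow(messages: Message[], budget: number): Message[] {
  const tokensOfLast = (n: number): number =>
    messages
      .slice(messages.length - n)
      .reduce((sum, m) => sum + countTokens(m.content), 0);

  let lo = 0;               // empty window always fits
  let hi = messages.length; // full history may overshoot
  while (lo < hi) {
    const mid = Math.floor((lo + hi + 1) / 2);
    if (tokensOfLast(mid) <= budget) lo = mid;
    else hi = mid - 1;
  }
  return messages.slice(messages.length - lo);
}
```

In the library itself, messages outside the window are compressed rather than dropped, and `forceConverge` hard-truncates when even the smallest window exceeds the budget; both behaviors are beyond this sketch.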