Commit 935d39d

chore: add CLAUDE.md, SECURITY.md, CHANGELOG.md, .nvmrc, bump setup-node to v6
1 parent 003525d commit 935d39d

5 files changed

Lines changed: 134 additions & 48 deletions

.github/workflows/ci.yml

Lines changed: 4 additions & 4 deletions
@@ -12,7 +12,7 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - uses: actions/checkout@v6
- - uses: actions/setup-node@v4
+ - uses: actions/setup-node@v6
  with:
  node-version: 22
  cache: npm
@@ -23,7 +23,7 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - uses: actions/checkout@v6
- - uses: actions/setup-node@v4
+ - uses: actions/setup-node@v6
  with:
  node-version: 22
  cache: npm
@@ -38,7 +38,7 @@ jobs:
  node-version: [18, 20, 22]
  steps:
  - uses: actions/checkout@v6
- - uses: actions/setup-node@v4
+ - uses: actions/setup-node@v6
  with:
  node-version: ${{ matrix.node-version }}
  cache: npm
@@ -61,7 +61,7 @@ jobs:
  id-token: write
  steps:
  - uses: actions/checkout@v6
- - uses: actions/setup-node@v4
+ - uses: actions/setup-node@v6
  with:
  node-version: 22
  registry-url: https://registry.npmjs.org

.nvmrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+ 22

CHANGELOG.md

Lines changed: 35 additions & 44 deletions
@@ -1,46 +1,37 @@
  # Changelog

- ## 1.0.0
-
- First stable release. Published as `context-compression-engine` (renamed from `@cce/core`).
-
- ### Features
-
- - **Pluggable token counter** — `tokenCounter` option for accurate budget decisions with real tokenizers
- - **`forceConverge`** — hard-truncate non-recency messages when binary search bottoms out and budget is still exceeded
- - **`embedSummaryId`** — embed `summary_id` in compressed content for downstream reference
- - **Dedup target IDs** — dedup references now carry target IDs for provenance tracking
- - **Fuzzy dedup** — line-level Jaccard similarity catches near-duplicate content (opt-in)
- - **Cross-message deduplication** — exact-duplicate detection enabled by default
- - **LLM benchmark suite** — multi-provider (OpenAI, Anthropic, Ollama) head-to-head comparison
- - **Escalating summarizer** — `createEscalatingSummarizer` with three-level fallback (normal → aggressive → deterministic)
-
- ### Fixes
-
- - Fix TDZ bug in summarizer initialization
- - Fix field drops and double-counting in compression stats
- - Fix pattern boundary false positives in classifier
- - Add input validation for public API entry points
-
- ## 0.1.0
-
- Initial release.
-
- ### Features
-
- - **Lossless context compression** — compress/uncompress round-trip restores byte-identical originals
- - **Code-aware classification** — fences, SQL, JSON, API keys, URLs, file paths stay verbatim
- - **Paragraph-aware sentence scoring** — deterministic summarizer picks highest-signal sentences
- - **Code-split messages** — prose compressed, code fences preserved inline
- - **Exact dedup** — hash-based duplicate detection replaces earlier copies with compact references (on by default)
- - **Fuzzy dedup** — line-level Jaccard similarity catches near-duplicate content (opt-in)
- - **LLM summarizer** — `createSummarizer` and `createEscalatingSummarizer` for pluggable LLM-powered compression
- - **Token budget** — `tokenBudget` option binary-searches recency window to fit a target token count
- - **Verbatim store** — originals keyed by ID for lossless retrieval via `uncompress()`
-
- ### API
-
- - `compress(messages, options?)` — sync or async depending on whether `summarizer` is provided
- - `uncompress(messages, verbatim)` — restore originals from compressed messages + verbatim map
- - `createSummarizer(callLlm)` — wrap an LLM call with an optimized summarization prompt
- - `createEscalatingSummarizer(callLlm)` — three-level summarizer (normal → aggressive → deterministic)
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [1.0.0] - 2025-02-24
+
+ First stable release. Published as `context-compression-engine`.
+
+ ### Added
+
+ - Lossless context compression with `compress()` and `uncompress()`
+ - Code-aware classification: fences, SQL, JSON/YAML, API keys, URLs, file paths preserved verbatim
+ - Paragraph-aware sentence scoring in `summarize()`
+ - Code-bearing message splitting to compress surrounding prose
+ - Exact and fuzzy cross-message deduplication (enabled by default)
+ - LLM-powered summarization with `createSummarizer()` and `createEscalatingSummarizer()`
+ - Three-level fallback: LLM → deterministic → size guard
+ - `tokenBudget` with binary search over `recencyWindow`
+ - `forceConverge` hard-truncation pass for guaranteed budget convergence
+ - Pluggable `tokenCounter` option (default: `ceil(content.length / 3.5)`)
+ - `embedSummaryId` option to embed summary IDs directly into message content
+ - Provenance tracking via `_cce_original` metadata (origin IDs, summary hashes, version chains)
+ - Verbatim store for lossless round-trip (`VerbatimMap` or lookup function)
+ - Recursive `uncompress()` for multi-round compression chains
+ - `preserve` option for role-based message protection
+ - `recencyWindow` to protect recent messages from compression
+ - Tool/function result compression through the classifier
+ - Compression stats: `ratio`, `token_ratio`, `messages_compressed`, `messages_removed`
+ - Input validation on public API surface
+ - 333 tests with coverage across all compression paths
+ - Benchmark suite with synthetic and real-session scenarios
+ - LLM benchmark with multi-provider support (Claude, GPT, Gemini, Grok, Ollama)
+
+ [1.0.0]: https://github.com/SimplyLiz/ContextCompressionEngine/releases/tag/v1.0.0

CLAUDE.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Commands
6+
7+
```bash
8+
npm install # Install dependencies (uses npm ci in CI)
9+
npm run build # Compile TypeScript (tsc)
10+
npm test # Run Vitest once
11+
npm run test:coverage # Run tests with coverage (requires Node 20+)
12+
npm run lint # ESLint check
13+
npm run format # Prettier write
14+
npm run format:check # Prettier check
15+
npm run bench # Run benchmark suite
16+
```
17+
18+
Run a single test file:
19+
20+
```bash
21+
npx vitest run tests/classify.test.ts
22+
```
23+
24+
## Architecture
25+
26+
Single-package ESM library with zero dependencies. Compresses LLM message arrays by summarizing prose while preserving code, structured data, and technical content verbatim. Every compression is losslessly reversible via a verbatim store.
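To make the "losslessly reversible via a verbatim store" claim concrete, here is a minimal sketch of a recursive expansion step. It is not the library's `uncompress()`: the `expand` function and the `Message` shape are hypothetical, and it assumes the `_cce_original.ids` field, the `missing_ids` report, and the depth cap of 10 that this file describes elsewhere. Keeping the compressed message when none of its originals can be found is a choice made for this sketch only.

```typescript
// Hypothetical shapes for illustration; the real library types may differ.
type Message = {
  id: string;
  content: string;
  metadata?: { _cce_original?: { ids: string[] } };
};
type VerbatimMap = Record<string, Message>;

/**
 * Replaces each compressed message with its stored originals, following
 * chains of compressed-then-recompressed messages up to maxDepth levels.
 */
function expand(messages: Message[], verbatim: VerbatimMap, maxDepth = 10) {
  const missingIds: string[] = [];

  const visit = (message: Message, depth: number): Message[] => {
    const ids = message.metadata?._cce_original?.ids;
    // Uncompressed message, or depth cap reached: return it unchanged.
    if (!ids || depth >= maxDepth) return [message];

    const restored: Message[] = [];
    for (const id of ids) {
      const original = verbatim[id];
      if (original === undefined) {
        missingIds.push(id); // reported to the caller, not thrown
      } else {
        restored.push(...visit(original, depth + 1));
      }
    }
    // Sketch-only choice: if nothing could be restored, keep the summary.
    return restored.length > 0 ? restored : [message];
  };

  const result: Message[] = [];
  for (const message of messages) result.push(...visit(message, 0));
  return { messages: result, missing_ids: missingIds };
}
```

Because a missing entry is only reported, the caller can decide whether a partial restore is acceptable, which is one reason the messages and the verbatim store need to be persisted together.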
+
+ ### Compression pipeline
+
+ ```
+ messages → classify → dedup → merge → summarize → size guard → result
+ ```
+
+ - **classify** (`src/classify.ts`) — three-tier classification (T0 = preserve verbatim, T2 = compressible prose, T3 = filler/removable). Uses structural pattern detection (code fences, JSON, YAML, LaTeX), SQL/API-key anchors, and prose density scoring.
+ - **dedup** (`src/dedup.ts`) — exact (djb2 hash + full comparison) and fuzzy (line-level Jaccard similarity) duplicate detection. Earlier duplicates are replaced with compact references.
+ - **compress** (`src/compress.ts`) — orchestrator. Handles message merging, code-bearing message splitting (prose compressed, fences preserved inline), budget binary search over `recencyWindow`, and `forceConverge` hard-truncation.
+ - **summarize** (internal in `compress.ts`) — deterministic sentence scoring: rewards technical identifiers (camelCase, snake_case), emphasis phrases, status words; penalizes filler. Paragraph-aware to keep topic boundaries.
+ - **summarizer** (`src/summarizer.ts`) — LLM-powered summarization. `createSummarizer` wraps an LLM call with a prompt template. `createEscalatingSummarizer` adds three-level fallback: normal → aggressive → deterministic.
+ - **expand** (`src/expand.ts`) — `uncompress()` restores originals from a `VerbatimMap` or lookup function. Supports recursive expansion for multi-round compression chains (max depth 10).
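The dedup stage named in the list above combines two standard techniques, and a short sketch may help when reading `src/dedup.ts`. The code below is an illustration under assumptions, not the file's actual contents: the function names are hypothetical, and trimming lines and ignoring blank ones before the Jaccard comparison is a choice made here, not something the commit states.

```typescript
// Illustrative sketch only: function names are hypothetical, not the library's exports.

/** djb2 string hash (hash * 33 + char code), kept in the unsigned 32-bit range. */
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

/**
 * Exact dedup: maps the index of each earlier copy to the index of the latest
 * identical message. The hash narrows the candidates; the full string
 * comparison guards against hash collisions.
 */
function findExactDuplicates(contents: string[]): Map<number, number> {
  const keptByHash = new Map<number, number[]>();
  const duplicates = new Map<number, number>();
  for (let i = contents.length - 1; i >= 0; i--) {
    const hash = djb2(contents[i]);
    const bucket = keptByHash.get(hash) ?? [];
    const match = bucket.find((kept) => contents[kept] === contents[i]);
    if (match !== undefined) {
      duplicates.set(i, match);
    } else {
      bucket.push(i);
      keptByHash.set(hash, bucket);
    }
  }
  return duplicates;
}

/** Distinct non-empty lines of a text, trimmed (an assumption of this sketch). */
function lineSet(text: string): Set<string> {
  const lines = new Set<string>();
  text.split('\n').forEach((line) => {
    const trimmed = line.trim();
    if (trimmed.length > 0) lines.add(trimmed);
  });
  return lines;
}

/** Fuzzy dedup score: |A ∩ B| / |A ∪ B| over the two line sets. */
function lineJaccard(a: string, b: string): number {
  const setA = lineSet(a);
  const setB = lineSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((line) => {
    if (setB.has(line)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}
```

Scanning from the newest message backwards means the most recent copy is the one kept, which matches the note that earlier duplicates are the ones replaced with references. A fuzzy pass would compare `lineJaccard` against some threshold; the commit does not state what value the library uses.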
+
+ ### Key data flow concepts
+
+ - **Provenance** — every compressed message carries `metadata._cce_original` with `ids` (source message IDs into `verbatim`), `summary_id` (djb2 hash), and `parent_ids` (chain from prior compressions).
+ - **Verbatim store** — `compress()` returns `{ messages, verbatim }`. Both must be persisted atomically. `uncompress()` reports `missing_ids` when verbatim entries are absent.
+ - **Token budget** — when `tokenBudget` is set, binary search finds the largest `recencyWindow` that fits. Each iteration runs the full pipeline. `forceConverge` hard-truncates if the search bottoms out.
+ - **Sync/async** — `compress()` is synchronous by default. Providing a `summarizer` makes it return a `Promise`.
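The token-budget behaviour described in the list above can be sketched as a binary search over the window size. This is a simplified illustration, not the code in `src/compress.ts`: `runPipeline` is a hypothetical stand-in that just cuts older messages to 20 characters, `fitRecencyWindow` is an invented name, and the search assumes that a larger window never yields fewer tokens. The estimator is the default quoted in the changelog, `ceil(content.length / 3.5)`.

```typescript
type Message = { role: string; content: string };

// Default estimator quoted in the changelog: ceil(content.length / 3.5).
const estimateTokens = (content: string): number => Math.ceil(content.length / 3.5);

// Hypothetical stand-in for the real pipeline: messages outside the recency
// window are cut to 20 characters; the most recent `recencyWindow` are kept.
function runPipeline(messages: Message[], recencyWindow: number): Message[] {
  const cutoff = messages.length - recencyWindow;
  return messages.map((m, i) => (i < cutoff ? { ...m, content: m.content.slice(0, 20) } : m));
}

function totalTokens(messages: Message[], count: (content: string) => number): number {
  return messages.reduce((sum, m) => sum + count(m.content), 0);
}

/**
 * Finds the largest recencyWindow whose pipeline output fits the budget.
 * Each probe reruns the whole pipeline. `fits: false` marks the case where
 * even a window of 0 is over budget, which is where a hard-truncation pass
 * such as forceConverge would take over.
 */
function fitRecencyWindow(
  messages: Message[],
  tokenBudget: number,
  count: (content: string) => number = estimateTokens,
): { recencyWindow: number; fits: boolean } {
  let lo = 0;
  let hi = messages.length;
  let best = -1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (totalTokens(runPipeline(messages, mid), count) <= tokenBudget) {
      best = mid; // fits: try protecting more recent messages
      lo = mid + 1;
    } else {
      hi = mid - 1; // over budget: shrink the window
    }
  }
  return best >= 0 ? { recencyWindow: best, fits: true } : { recencyWindow: 0, fits: false };
}
```

Passing a real tokenizer as `count` mirrors the pluggable `tokenCounter` idea: the search logic stays the same and only the measurement changes.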
+
+ ## Branching Strategy
+
+ ```
+ main ← develop ← feature branches
+ ```
+
+ - **`develop`** — default branch, all day-to-day work and PRs target here
+ - **`main`** — stable releases only, merge develop → main when releasing
+ - **Feature branches** — branch off `develop`, PR back to `develop`
+ - **Tags** `v*.*.*` on `main` — trigger CI → publish to npm
+ - **Dependabot** PRs target `develop`
+
+ ## Code Conventions
+
+ - **TypeScript:** ES2020 target, NodeNext module resolution, strict mode, ESM-only
+ - **Unused params** must be prefixed with `_` (ESLint enforced)
+ - **Prettier:** 100 char width, 2-space indent, single quotes, trailing commas, semicolons
+ - **Tests:** Vitest 4, test files in `tests/`, coverage via `@vitest/coverage-v8` (Node 20+ only)
+ - **Node version:** ≥18 (.nvmrc: 22)
+ - **Always run `npm run format` before committing** — CI enforces `format:check`
+ - **No author/co-author attribution** in commits, code, or docs

SECURITY.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# Security Policy
2+
3+
## Supported Versions
4+
5+
| Version | Supported |
6+
| ------- | --------- |
7+
| 1.x | Yes |
8+
9+
## Reporting a Vulnerability
10+
11+
If you discover a security issue, please report it responsibly.
12+
13+
**Do not open a public GitHub issue for security vulnerabilities.**
14+
15+
Instead, email [lisa@tastehub.io](mailto:lisa@tastehub.io) with:
16+
17+
- A description of the vulnerability
18+
- Steps to reproduce
19+
- Potential impact
20+
- Suggested fix (if any)
21+
22+
You can expect an initial response within 72 hours. We will work with you to understand the issue and coordinate a fix before any public disclosure.
23+
24+
## Scope
25+
26+
This policy applies to the `context-compression-engine` package published to npm, as well as the source code in this repository.
