chore: add CLAUDE.md, SECURITY.md, CHANGELOG.md, .nvmrc, bump setup-node to v6

SimplyLiz · SimplyLiz · commit 935d39d47999 · 2026-02-25T07:06:09.000+01:00
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -12,7 +12,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v6
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: 22
           cache: npm
@@ -23,7 +23,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v6
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: 22
           cache: npm
@@ -38,7 +38,7 @@ jobs:
         node-version: [18, 20, 22]
     steps:
       - uses: actions/checkout@v6
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: ${{ matrix.node-version }}
           cache: npm
@@ -61,7 +61,7 @@ jobs:
       id-token: write
     steps:
       - uses: actions/checkout@v6
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: 22
           registry-url: https://registry.npmjs.org
diff --git a/.nvmrc b/.nvmrc
@@ -0,0 +1 @@
+22
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,46 +1,37 @@
 # Changelog
 
-## 1.0.0
-
-First stable release. Published as `context-compression-engine` (renamed from `@cce/core`).
-
-### Features
-
-- **Pluggable token counter** — `tokenCounter` option for accurate budget decisions with real tokenizers
-- **`forceConverge`** — hard-truncate non-recency messages when binary search bottoms out and budget is still exceeded
-- **`embedSummaryId`** — embed `summary_id` in compressed content for downstream reference
-- **Dedup target IDs** — dedup references now carry target IDs for provenance tracking
-- **Fuzzy dedup** — line-level Jaccard similarity catches near-duplicate content (opt-in)
-- **Cross-message deduplication** — exact-duplicate detection enabled by default
-- **LLM benchmark suite** — multi-provider (OpenAI, Anthropic, Ollama) head-to-head comparison
-- **Escalating summarizer** — `createEscalatingSummarizer` with three-level fallback (normal → aggressive → deterministic)
-
-### Fixes
-
-- Fix TDZ bug in summarizer initialization
-- Fix field drops and double-counting in compression stats
-- Fix pattern boundary false positives in classifier
-- Add input validation for public API entry points
-
-## 0.1.0
-
-Initial release.
-
-### Features
-
-- **Lossless context compression** — compress/uncompress round-trip restores byte-identical originals
-- **Code-aware classification** — fences, SQL, JSON, API keys, URLs, file paths stay verbatim
-- **Paragraph-aware sentence scoring** — deterministic summarizer picks highest-signal sentences
-- **Code-split messages** — prose compressed, code fences preserved inline
-- **Exact dedup** — hash-based duplicate detection replaces earlier copies with compact references (on by default)
-- **Fuzzy dedup** — line-level Jaccard similarity catches near-duplicate content (opt-in)
-- **LLM summarizer** — `createSummarizer` and `createEscalatingSummarizer` for pluggable LLM-powered compression
-- **Token budget** — `tokenBudget` option binary-searches recency window to fit a target token count
-- **Verbatim store** — originals keyed by ID for lossless retrieval via `uncompress()`
-
-### API
-
-- `compress(messages, options?)` — sync or async depending on whether `summarizer` is provided
-- `uncompress(messages, verbatim)` — restore originals from compressed messages + verbatim map
-- `createSummarizer(callLlm)` — wrap an LLM call with an optimized summarization prompt
-- `createEscalatingSummarizer(callLlm)` — three-level summarizer (normal → aggressive → deterministic)
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.0.0] - 2026-02-24
+
+First stable release. Published as `context-compression-engine`.
+
+### Added
+
+- Lossless context compression with `compress()` and `uncompress()`
+- Code-aware classification: fences, SQL, JSON/YAML, API keys, URLs, file paths preserved verbatim
+- Paragraph-aware sentence scoring in `summarize()`
+- Code-bearing message splitting to compress surrounding prose
+- Cross-message deduplication: exact (enabled by default) and fuzzy (opt-in)
+- LLM-powered summarization with `createSummarizer()` and `createEscalatingSummarizer()`
+- Three-level fallback in `createEscalatingSummarizer()`: normal → aggressive → deterministic
+- `tokenBudget` with binary search over `recencyWindow`
+- `forceConverge` hard-truncation pass for guaranteed budget convergence
+- Pluggable `tokenCounter` option (default: `ceil(content.length / 3.5)`)
+- `embedSummaryId` option to embed summary IDs directly into message content
+- Provenance tracking via `_cce_original` metadata (origin IDs, summary hashes, version chains)
+- Verbatim store for lossless round-trip (`VerbatimMap` or lookup function)
+- Recursive `uncompress()` for multi-round compression chains
+- `preserve` option for role-based message protection
+- `recencyWindow` to protect recent messages from compression
+- Tool/function result compression through the classifier
+- Compression stats: `ratio`, `token_ratio`, `messages_compressed`, `messages_removed`
+- Input validation on public API surface
+- 333 tests with coverage across all compression paths
+- Benchmark suite with synthetic and real-session scenarios
+- LLM benchmark with multi-provider support (Claude, GPT, Gemini, Grok, Ollama)
+
+[1.0.0]: https://github.com/SimplyLiz/ContextCompressionEngine/releases/tag/v1.0.0
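Note on the `tokenCounter` and `tokenBudget` entries above: the sketch below illustrates how a default estimator of `ceil(content.length / 3.5)` could combine with a binary search over `recencyWindow`. It is not the library's code. The `Message` shape, `compressWithWindow`, `totalTokens`, and `largestFittingWindow` are hypothetical names introduced only for illustration, and the stand-in compressor simply swaps older messages for a fixed stub.

```typescript
// Hypothetical message shape; the library's real type may differ.
interface Message {
  role: string;
  content: string;
}

type TokenCounter = (message: Message) => number;

// Default estimator quoted in the changelog: ceil(content.length / 3.5).
const defaultTokenCounter: TokenCounter = (message) =>
  Math.ceil(message.content.length / 3.5);

// Hypothetical stand-in for the pipeline: messages outside the recency
// window become a short stub, messages inside it are kept verbatim.
function compressWithWindow(messages: Message[], recencyWindow: number): Message[] {
  const cutoff = Math.max(0, messages.length - recencyWindow);
  return messages.map((m, i) => (i < cutoff ? { ...m, content: '[summary]' } : m));
}

function totalTokens(messages: Message[], count: TokenCounter): number {
  return messages.reduce((sum, m) => sum + count(m), 0);
}

// Binary search for the largest recencyWindow whose output fits the budget.
// Assumes the token total does not shrink as the window grows.
function largestFittingWindow(
  messages: Message[],
  tokenBudget: number,
  count: TokenCounter = defaultTokenCounter,
): number {
  let lo = 0;
  let hi = messages.length;
  let best = -1; // -1: even a window of 0 exceeds the budget
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (totalTokens(compressWithWindow(messages, mid), count) <= tokenBudget) {
      best = mid; // fits, so try a larger window
      lo = mid + 1;
    } else {
      hi = mid - 1; // too large, so shrink the window
    }
  }
  return best;
}
```

A result of `-1` in this sketch corresponds to the case the changelog describes as the search bottoming out, which is where a `forceConverge`-style truncation pass would take over. Swapping `count` for a real tokenizer changes only the cost function, not the search.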
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,68 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Commands
+
+```bash
+npm install              # Install dependencies (CI uses npm ci)
+npm run build            # Compile TypeScript (tsc)
+npm test                 # Run Vitest once
+npm run test:coverage    # Run tests with coverage (requires Node 20+)
+npm run lint             # ESLint check
+npm run format           # Prettier write
+npm run format:check     # Prettier check
+npm run bench            # Run benchmark suite
+```
+
+Run a single test file:
+
+```bash
+npx vitest run tests/classify.test.ts
+```
+
+## Architecture
+
+Single-package ESM library with zero dependencies. Compresses LLM message arrays by summarizing prose while preserving code, structured data, and technical content verbatim. Every compression is losslessly reversible via a verbatim store.
+
+### Compression pipeline
+
+```
+messages → classify → dedup → merge → summarize → size guard → result
+```
+
+- **classify** (`src/classify.ts`) — three-tier classification (T0 = preserve verbatim, T2 = compressible prose, T3 = filler/removable). Uses structural pattern detection (code fences, JSON, YAML, LaTeX), SQL/API-key anchors, and prose density scoring.
+- **dedup** (`src/dedup.ts`) — exact (djb2 hash + full comparison) and fuzzy (line-level Jaccard similarity) duplicate detection. Earlier duplicates are replaced with compact references.
+- **compress** (`src/compress.ts`) — orchestrator. Handles message merging, code-bearing message splitting (prose compressed, fences preserved inline), budget binary search over `recencyWindow`, and `forceConverge` hard-truncation.
+- **summarize** (internal in `compress.ts`) — deterministic sentence scoring: rewards technical identifiers (camelCase, snake_case), emphasis phrases, status words; penalizes filler. Paragraph-aware to keep topic boundaries.
+- **summarizer** (`src/summarizer.ts`) — LLM-powered summarization. `createSummarizer` wraps an LLM call with a prompt template. `createEscalatingSummarizer` adds three-level fallback: normal → aggressive → deterministic.
+- **expand** (`src/expand.ts`) — `uncompress()` restores originals from a `VerbatimMap` or lookup function. Supports recursive expansion for multi-round compression chains (max depth 10).
+
+### Key data flow concepts
+
+- **Provenance** — every compressed message carries `metadata._cce_original` with `ids` (source message IDs into `verbatim`), `summary_id` (djb2 hash), and `parent_ids` (chain from prior compressions).
+- **Verbatim store** — `compress()` returns `{ messages, verbatim }`. Both must be persisted atomically. `uncompress()` reports `missing_ids` when verbatim entries are absent.
+- **Token budget** — when `tokenBudget` is set, binary search finds the largest `recencyWindow` that fits. Each iteration runs the full pipeline. `forceConverge` hard-truncates if the search bottoms out.
+- **Sync/async** — `compress()` is synchronous by default. Providing a `summarizer` makes it return a `Promise`.
+
+## Branching Strategy
+
+```
+main ← develop ← feature branches
+```
+
+- **`develop`** — default branch, all day-to-day work and PRs target here
+- **`main`** — stable releases only, merge develop → main when releasing
+- **Feature branches** — branch off `develop`, PR back to `develop`
+- **Tags** `v*.*.*` on `main` — trigger CI → publish to npm
+- **Dependabot** PRs target `develop`
+
+## Code Conventions
+
+- **TypeScript:** ES2020 target, NodeNext module resolution, strict mode, ESM-only
+- **Unused params** must be prefixed with `_` (ESLint enforced)
+- **Prettier:** 100 char width, 2-space indent, single quotes, trailing commas, semicolons
+- **Tests:** Vitest 4, test files in `tests/`, coverage via `@vitest/coverage-v8` (Node 20+ only)
+- **Node version:** ≥18 (.nvmrc: 22)
+- **Always run `npm run format` before committing** — CI enforces `format:check`
+- **No author/co-author attribution** in commits, code, or docs
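Note on the dedup description in CLAUDE.md above: the two techniques it names (djb2 hashing followed by a full comparison, and line-level Jaccard similarity) can be sketched as follows. This is an illustration under assumptions, not the contents of `src/dedup.ts`. The function names are hypothetical, and the line normalization (trimming, dropping blank lines) is a guess at one reasonable choice.

```typescript
// djb2 string hash: hash = hash * 33 + charCode, kept in unsigned 32-bit range.
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Exact duplicate check: cheap hash comparison first, then a full string
// comparison to rule out hash collisions.
function isExactDuplicate(a: string, b: string): boolean {
  return djb2(a) === djb2(b) && a === b;
}

// Line-level Jaccard similarity: |shared lines| / |all distinct lines|.
// Assumption: lines are trimmed and blank lines are ignored.
function lineJaccard(a: string, b: string): number {
  const linesOf = (s: string): Set<string> =>
    new Set(
      s
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.length > 0),
    );
  const setA = linesOf(a);
  const setB = linesOf(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts count as identical
  let shared = 0;
  setA.forEach((line) => {
    if (setB.has(line)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}
```

A fuzzy pass would compare `lineJaccard` against a similarity threshold. The source does not state the threshold the library uses, so none is assumed here.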
diff --git a/SECURITY.md b/SECURITY.md
@@ -0,0 +1,26 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported |
+| ------- | --------- |
+| 1.x     | Yes       |
+
+## Reporting a Vulnerability
+
+If you discover a security issue, please report it responsibly.
+
+**Do not open a public GitHub issue for security vulnerabilities.**
+
+Instead, email [lisa@tastehub.io](mailto:lisa@tastehub.io) with:
+
+- A description of the vulnerability
+- Steps to reproduce
+- Potential impact
+- Suggested fix (if any)
+
+You can expect an initial response within 72 hours. We will work with you to understand the issue and coordinate a fix before any public disclosure.
+
+## Scope
+
+This policy applies to the `context-compression-engine` package published to npm, as well as the source code in this repository.