Changelog

All notable changes to CKB will be documented in this file.

[Unreleased]

[9.2.0] - 2026-04-25

Added

  • analyzeOutgoingImpact — forward call graph (MCP + CLI) — mirror of analyzeImpact answering "what does this symbol call?" instead of "who calls it?". New Engine.AnalyzeOutgoingImpact drives off LIP v2.3.5's query_outgoing_impact RPC, folds the result through the same ImpactItem pipeline as the incoming side (with direct-callee / transitive-callee kinds), and surfaces semantically coupled callees alongside the static graph. Degrades cleanly when LIP isn't running: the response is empty with a provenance warning, never an error. Surfaces include ckb impact outgoing <symbolId> (with --min-score for the semantic threshold), the analyzeOutgoingImpact MCP tool, and a new ProvenanceCLI.Warnings field so LIP-degradation messages reach JSON consumers.
  • symbolExists MCP tool — exact-match boolean oracle that returns {exists, kind, location?} for a fully-qualified symbol ID. Built for LLMs to ground references before they cite them in code, without spending tokens on a 20-result searchSymbols payload. Cheaper than getSymbol for the "does this thing actually exist" check.
  • LIP enrichment folds into analyzeImpact — tier-1 tree-sitter callers that LIP discovers (when scip-go emits no Call roles, e.g. Go method dispatch) are now folded into the same directImpact / transitiveImpact lists as SCIP's own results, deduplicated by (file, name). Driven by a new BlastRadiusEnricher interface so the fold path is the single source of truth for both incoming and outgoing impact analysis. Items LIP marks edges_source=empty are skipped (LIP signalling no static evidence); tier1, scip_with_tier1_edges, and scip_only all fold the same way. Risk score now picks up semantic-coupling signals via the same enricher pipeline.
  • register_project_root on LIP handshake — Engine startup now registers the repo root with the daemon so LIP canonicalises file URIs against a known anchor, matching the v2.3.1 contract. Eliminates the URI-shape drift that previously caused tier-1 callers to dedup incorrectly against SCIP results.

Changed

  • analyzeImpact risk score now weighted by bridge centrality: calculateAggregatedRisk multiplies the weighted-mean score by 1 + max(BridgeScore)/1000 (capped at 2.0) over the changed files, so a change landing on a critical architectural path (high betweenness) is reported as riskier than the same-shape change in a leaf module. Implements the behaviour that CARTOGRAPHER_STRATEGY.md had already documented but the code was not actually doing. Bridge lookups match by both Path and ModuleID; if no changed file matches the graph, the multiplier is 1.0 and no informational factor is appended. Only runs when the binary was built with -tags cartographer (graph is a no-op otherwise). A new bridge_centrality informational factor surfaces in RiskScore.Factors when the multiplier fires; its Weight is 0 because it applies multiplicatively, not as a weighted-mean input.
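The multiplier arithmetic in the entry above can be illustrated with a minimal Go sketch. This is not CKB's calculateAggregatedRisk; the function name bridgeMultiplier, the plain float64 slice, and the sample scores are hypothetical, and the sketch assumes the 2.0 cap applies to the multiplier itself.

```go
package main

import "fmt"

// bridgeMultiplier returns 1 + max(bridgeScores)/1000, capped at 2.0.
// An empty slice (no changed file matched the graph) yields 1.0.
// Assumption: the cap applies to the multiplier, not to the final score.
func bridgeMultiplier(bridgeScores []float64) float64 {
	maxScore := 0.0
	for _, s := range bridgeScores {
		if s > maxScore {
			maxScore = s
		}
	}
	m := 1 + maxScore/1000
	if m > 2.0 {
		m = 2.0
	}
	return m
}

func main() {
	weightedMean := 0.40 // hypothetical weighted-mean risk score
	for _, scores := range [][]float64{nil, {120, 500}, {2500}} {
		m := bridgeMultiplier(scores)
		fmt.Printf("scores=%v multiplier=%.2f risk=%.2f\n", scores, m, weightedMean*m)
	}
}
```

Because the factor is multiplicative, a change touching no graph node leaves the weighted mean untouched, which matches the "multiplier is 1.0" case described above.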

Cartographer

  • Vendored Cartographer fully synced to upstream 3.0.0 — the vendored tree under third_party/cartographer/mapper-core/cartographer/ was 391 lines behind on diagram.rs alone, and 10 .rs files plus Cargo.toml had drifted. Full sync brings in doc-node graph support (cartographer_doc_index, cartographer_doc_context, cartographer_query_docs FFI entry points — Go bindings can be added as a follow-up), LIP-style Range / at_range on GraphEdge, PascalCase bare-identifier resolution for doc backtick refs, and the overlays feature on diagrams. New scripts/sync-cartographer.sh is now the supported path for future syncs — rsync-based, explicit path list, emits next-step commands. No local patches needed against upstream.
  • Diagram overlays in renderArchitecture / ckb diagram — the vendored diagram.rs was synced from upstream Cartographer, so the Mermaid/DOT output now decorates the base import graph with architectural signals: cycle members get a thick red border (pivots dashed), cycle-internal edges a heavy red arrow, layer violations pick up per-type dashed/dotted edge styling, and hot nodes (hotspot_score ≥ 70) get an orange border plus DOT size scaling. Mermaid is border-only for hot nodes (no sizing primitive). Cycle red takes precedence over hot orange on the same node — architectural signal wins over performance signal.
  • renderArchitecture MCP tool — returns the project's module-level import graph as Mermaid or Graphviz (DOT), ready to paste into IDEs that render Mermaid inline (Cursor, Claude Desktop, VS Code markdown preview, GitHub). With focus set, returns an undirected BFS neighborhood around the anchor module to depth (default 2); without, returns the top-N most-connected nodes (default cap 40). Response includes truncated: true when the node cap kicked in. Backed by the new cartographer_render_architecture FFI export; CLI and MCP outputs are produced by the same shared renderer.
  • Go binding cartographer.RenderArchitecture() in internal/cartographer/bridge.go (+ no-op stub for the no-tag build).

Fixed

  • Vendored Cartographer rebuild_graph deadlock — upstream ApiState::rebuild_graph held the mapped_files Mutex across its loop and then called resolve_import_target, which re-acquired the same non-reentrant std::sync::Mutex. Any project with a resolvable import deadlocked — the cartographer diagram / cartographer health CLIs hung, and the Go bridge's cartographer.MapProject would block any time CKB fed it a repo with imports. Fixed in the vendored tree (and contributed back upstream) by splitting the resolver: a public method that locks, and a private helper that takes the already-held map; rebuild_graph now calls the helper. Discovered during end-to-end smoke testing against CKB itself (1093 files). Regression test added upstream.
  • localize-tree-sitter-symbols.sh dropped grammar C parsers — the script extracted archive members via ar x, which silently clobbers files when multiple members share a name. Cargo emits a parser.o and scanner.o per grammar crate (tree-sitter-c, -cpp, -rust, -go, etc.), so ar x left only the last grammar's C parser on disk, producing a localized archive missing _tree_sitter_c / _tree_sitter_cpp. The script now feeds the archive directly to ld -r with -force_load (Mach-O) / --whole-archive (ELF), which pulls every member in without touching the filesystem. The rust_tree_sitter C ABI refs to _tree_sitter_c and _tree_sitter_cpp now resolve inside the combined object as expected.
  • Tree-sitter symbol collisions at link time: libcartographer.a previously exported its bundled tree-sitter runtime and grammar symbols, which collided with go-tree-sitter when building CKB with -tags cartographer (ld: 246 duplicate symbols). make build-cartographer now post-processes the archive via scripts/localize-tree-sitter-symbols.sh (vendored under third_party/cartographer/mapper-core/cartographer/scripts/), which partial-links archive members into one combined object and localizes ts_* / tree_sitter_*. cartographer_* FFI exports stay global. Beyond the duplicate-symbol error, this also rules out a silent memory-corruption class of bug where Cartographer's Rust code could have bound to the consumer's tree-sitter copy at global resolution time if the two versions' struct layouts ever drifted.
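The shape of the rebuild_graph deadlock fix listed above (a public method that locks, plus a private helper that works on the already-held map) can be sketched in Go, whose sync.Mutex is also non-reentrant. The actual fix lives in Cartographer's Rust code; every type and function name below is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a hypothetical stand-in for Cartographer's ApiState.
type state struct {
	mu          sync.Mutex
	mappedFiles map[string][]string // file -> import targets
}

// ResolveImportTarget is the public entry point: it takes the lock itself.
func (s *state) ResolveImportTarget(imp string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return resolveLocked(s.mappedFiles, imp)
}

// resolveLocked assumes the caller already holds the mutex guarding files.
func resolveLocked(files map[string][]string, imp string) (string, bool) {
	_, ok := files[imp]
	return imp, ok
}

// RebuildGraph holds the lock across its loop, so it must call the helper.
// Calling ResolveImportTarget here would lock the same mutex twice and hang.
func (s *state) RebuildGraph() map[string][]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	edges := make(map[string][]string)
	for file, imports := range s.mappedFiles {
		for _, imp := range imports {
			if target, ok := resolveLocked(s.mappedFiles, imp); ok {
				edges[file] = append(edges[file], target)
			}
		}
	}
	return edges
}

func main() {
	s := &state{mappedFiles: map[string][]string{
		"a.go": {"b.go", "missing.go"},
		"b.go": nil,
	}}
	fmt.Println(s.RebuildGraph())
}
```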

[9.1.0] - 2026-04-16

Added

  • LIP v2.1 utilisation — three high-ROI LIP RPCs wired into the query engine, gated on the handshake's supported_messages:
    • stream_context (v2.1) → explainFile attaches up to 10 semantically-related symbols (2048-token budget) in facts.related. New streaming transport reads N symbol_info frames + end_stream.
    • query_expansion (v1.6) → searchSymbols expands ≤ 2-token queries with up to 5 related terms before FTS5, recovering vocabulary-mismatch recall without touching precision on compound queries.
    • explain_match (v2.0) → semantic search hits carry up to two ranked evidence chunks with line ranges, text, and per-chunk scores (top-5 hits, bounded round-trip cost).
  • lip.Handshake runs on engine startup and the daemon's supported_messages list is stashed for feature gating (Engine.lipSupports). Daemon version and supported-count logged.
  • LIP index status probing: probeHandshake now follows up with IndexStatus and caches the result. New Engine.LIPStatus() returns {Reachable, IndexedFiles} so consumers can distinguish "daemon down" from "daemon up, nothing indexed."
  • ckb review warns when LIP index is empty — stderr advisory with lip index <repo> command when daemon is reachable but has no content. Suppressed in --ci to keep CI logs clean.
  • NoAutoFetch option on SummarizePROptions and SummarizeDiffOptions for parity with ReviewPROptions.
  • Troubleshooting section in docs/plans/review-cicd.md covering shallow CI clones, auth-failure remediation, air-gapped pipelines, and depth-0 checkout alternatives.
  • Auth-error detection on auto-fetch with clear remediation guidance.
  • ckb review --no-auto-fetch flag for air-gapped pipelines.
  • Test coverage for GitAdapter.EnsureRef — happy path, missing-ref auto-fetch, unreachable origin, and empty-input guard.

Changed

  • LIP health: push-driven, not polled — Engine opens a long-lived connection to the daemon at startup (internal/lip/subscribe.go) with index_changed frames and per-ping index_status snapshots instead of 60 s TTL polling. Worst-case staleness drops from 60 s to ~3 s.
  • lipFileURI path normalisation — handles absolute paths and already-prefixed file:// URIs without producing malformed results.

Fixed

  • Bug-pattern false positive on sync.Mutex.Lock() — removed "Lock" from LikelyReturnsError heuristic patterns; sync.Mutex.Lock returns nothing and dominated real-world matches with false positives.

  • err shadowing in subscribe.go — four shadow sites eliminated by reusing outer err or renaming to pingErr/readErr where scope isolation requires it.

  • LIP rerank: coherence gate + position-weighted seeding (#209) — the Fast-tier semantic rerank (internal/query/lip_ranker.go) used to average the top-5 seed embeddings with uniform weight and always apply the result. When the top-5 pointed in different directions the centroid collapsed toward zero and amplified noise; when the top seed was strong the blend still diluted it. Seeds are now L2-normalised and position-weighted (1/(rank+1)), the resulting centroid norm is read as a coherence score in [0, 1], and the rerank falls back to pure lexical order when coherence is below MinCoherence (default 0.35). Blend weights, seed count, and threshold are surfaced as RerankConfig so future tuning does not need to touch call sites. Injected embedBatchFn makes the ranker unit-testable without a running daemon.

  • LIP rerank: gate on !MixedModels (#208) — when the LIP index contains vectors from more than one embedding model (e.g. partial re-index during a model upgrade), cosine similarity across those vectors is mathematically meaningless. RerankWithLIP and SemanticSearchWithLIP now consult a cached Engine.lipSemanticAvailable() check (60 s TTL, single IndexStatus RPC) and fall back to lexical ranking when the daemon is down or reports mixed_models. A new lip_mixed_models degradation warning (70% capability) surfaces in response metadata so users learn why results look weaker instead of silently ranking on garbage.
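The coherence gate described for #209 can be approximated with a short Go sketch. It is not the code in internal/query/lip_ranker.go: the function names are hypothetical, and it assumes the position weights 1/(rank+1) are normalised to sum to 1 so that the centroid norm falls in [0, 1].

```go
package main

import (
	"fmt"
	"math"
)

// l2Normalize returns v scaled to unit length; a zero vector stays zero.
func l2Normalize(v []float64) []float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	out := make([]float64, len(v))
	n := math.Sqrt(sum)
	if n == 0 {
		return out
	}
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// coherence builds a position-weighted centroid of the normalised seeds
// (weight 1/(rank+1), weights normalised to sum to 1) and returns its norm.
// All seeds are assumed to have the same dimension.
func coherence(seeds [][]float64) float64 {
	if len(seeds) == 0 {
		return 0
	}
	centroid := make([]float64, len(seeds[0]))
	var totalWeight float64
	for rank, s := range seeds {
		w := 1 / float64(rank+1)
		totalWeight += w
		for i, x := range l2Normalize(s) {
			centroid[i] += w * x
		}
	}
	var sum float64
	for _, x := range centroid {
		x /= totalWeight
		sum += x * x
	}
	return math.Sqrt(sum)
}

// shouldRerank reports whether the semantic rerank should be applied;
// below the threshold the caller keeps the pure lexical order.
func shouldRerank(seeds [][]float64, minCoherence float64) bool {
	return coherence(seeds) >= minCoherence
}

func main() {
	agreeing := [][]float64{{1, 0}, {2, 0}, {3, 0}}
	opposed := [][]float64{{1, 0}, {-1, 0}}
	fmt.Printf("agreeing: %.3f rerank=%v\n", coherence(agreeing), shouldRerank(agreeing, 0.35))
	fmt.Printf("opposed:  %.3f rerank=%v\n", coherence(opposed), shouldRerank(opposed, 0.35))
}
```

Seeds that point the same way keep the centroid near unit length, while seeds that cancel each other shrink it, which is why the norm can serve as the gate.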

[9.0.1] - 2026-04-15

Fixed

  • ckb review in shallow CI clones — Azure Pipelines, GitHub Actions, and GitLab default to shallow single-branch checkouts, so ckb review --base=main failed with exit 128 because the base ref was not present locally. The review path (and summarizePr / summarizeDiff) now auto-fetch the base ref from origin when it is missing, falling through to origin/<branch>. No pipeline changes required. No cost for full clones.
  • Opaque git errors: GitAdapter.executeGitCommand previously wrapped git failures as "Git command failed: exit status 128" with git's actual stderr hidden in a details map. The stderr (e.g. fatal: bad revision) is now part of the error message, making CI failures diagnosable without reproduction.

[9.0.0] - 2026-04-13

Added

LIP v2.0 semantic integration

CKB now speaks the LIP v2.0 wire protocol correctly and integrates semantic embeddings across the tool suite. The existing internal/lip client had the wrong JSON discriminator ("action" instead of "type") and wrong action strings, meaning all LIP calls were silently failing. The client has been rewritten with the correct Serde-tagged format and 25 new functions covering LIP v1.5–v2.0.

Wire protocol fix — all requests now use "type" as the discriminator with snake_case variant names matching Rust's #[serde(tag = "type", rename_all = "snake_case")]. Field names corrected throughout (e.g. "symbol_uri" not "uri" for annotation queries).

New LIP client functions (internal/lip/client.go):

| Function | LIP version | Purpose |
| --- | --- | --- |
| Handshake | v1.5 | Protocol handshake, returns daemon + protocol versions |
| BatchNearestByText | v1.5 | Parallel nearest-neighbour for multiple queries |
| NearestBySymbol | v1.5 | Nearest neighbours by lip:// symbol URI |
| BatchAnnotationGet | v1.5 | Bulk annotation lookup |
| ReindexFiles | v1.6 | Trigger reindex for specific URIs |
| Similarity | v1.6 | Cosine similarity between two files |
| QueryExpansion | v1.6 | Expand a query with semantically related terms |
| Cluster | v1.6 | Group files by semantic proximity |
| ExportEmbeddings | v1.6 | Raw embedding export |
| NearestByContrast | v1.7 | Like-URI minus unlike-URI retrieval |
| Outliers | v1.7 | Semantically isolated files |
| SemanticDrift | v1.7 | Cosine distance between two files |
| SimilarityMatrix | v1.7 | Pairwise similarity matrix |
| FindSemanticCounterpart | v1.7 | Best match for a file within a candidate set |
| Coverage | v1.7 | Embedding coverage stats by directory |
| FindBoundaries | v1.8 | Semantic boundary detection within a file |
| SemanticDiff | v1.8 | Compare two text blobs by embedding distance |
| NearestInStore | v1.8 | Nearest neighbours against an in-memory store |
| NoveltyScore | v1.8 | Per-file novelty (0–1, higher = fewer neighbours) |
| ExtractTerminology | v1.8 | Domain term extraction from a file set |
| PruneDeleted | v1.8 | Remove embeddings for deleted files |
| GetCentroid | v1.9 | Mean embedding vector for a file set |
| StaleEmbeddings | v1.9 | Files with out-of-date embeddings |
| NearestByTextFiltered | v1.9 | Nearest-by-text with glob filter + min score |
| NearestByFileFiltered | v1.9 | Nearest-by-file with glob filter + min score |
| ExplainMatch | v2.0 | Chunk-level explanation of why a file matched a query |

Response types added: HandshakeInfo, CoverageInfo, DirCoverage, BoundaryRange, SemanticDiffInfo, NoveltyInfo, NoveltyItem, TermItem, ExplanationChunk. IndexStatusInfo gains MixedModels bool and ModelsInIndex []string. FileStatusInfo gains EmbeddingModel string.

searchSymbols — semantic fallback + re-ranking with filter

SemanticSearchWithLIP now accepts filter string and minScore float32 parameters and delegates to NearestByTextFiltered. The filter accepts glob patterns (e.g. "internal/api/**") to restrict semantic results to a subtree. Call site in symbols.go updated; existing callers pass "", 0 for unchanged behaviour.

reviewPR: semantic-novelty check

A new semantic-novelty check runs alongside the existing 20 review checks. It calls NoveltyScore on changed files and flags any with a score ≥ 0.7 as "semantically novel" — files with few neighbours in the embedding index that may lack test coverage. Degrades silently when LIP is unavailable; skipped automatically when fewer than 2 files are changed.

getAffectedTests — semantic test discovery

After the SCIP-based test collection pass, a LIP pass runs NearestByFileFiltered(fileURI, 5, "*_test.go", 0.6) for each changed source file. Matching test files are added to the result with Reason: "semantic-proximity" and Confidence set to the LIP score. Files already found by SCIP are not duplicated.
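A minimal sketch of how the semantic pass might be merged into the SCIP results follows. The struct shapes, the mergeSemanticHits helper, and the "scip" reason string are hypothetical; only the "semantic-proximity" reason, the score-as-confidence rule, and the no-duplicates rule come from the description above.

```go
package main

import "fmt"

// affectedTest and semanticHit are hypothetical shapes for illustration.
type affectedTest struct {
	File       string
	Reason     string
	Confidence float64
}

type semanticHit struct {
	File  string
	Score float64
}

// mergeSemanticHits appends LIP hits that the SCIP pass did not already
// find, tagging them with the semantic-proximity reason and the LIP score.
func mergeSemanticHits(found []affectedTest, hits []semanticHit) []affectedTest {
	seen := make(map[string]bool, len(found))
	for _, t := range found {
		seen[t.File] = true
	}
	for _, h := range hits {
		if seen[h.File] {
			continue // already reported by SCIP (or by an earlier hit)
		}
		seen[h.File] = true
		found = append(found, affectedTest{
			File:       h.File,
			Reason:     "semantic-proximity",
			Confidence: h.Score,
		})
	}
	return found
}

func main() {
	scip := []affectedTest{{File: "auth_test.go", Reason: "scip", Confidence: 1.0}}
	hits := []semanticHit{
		{File: "auth_test.go", Score: 0.91},
		{File: "session_test.go", Score: 0.72},
	}
	for _, t := range mergeSemanticHits(scip, hits) {
		fmt.Printf("%s reason=%s confidence=%.2f\n", t.File, t.Reason, t.Confidence)
	}
}
```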

explainFile — semantic boundary detection

After the existing symbol analysis, toolExplainFile calls FindBoundaries(fileURI, 0, 0, "") (defaults: 30-line chunks, 0.3 threshold) and appends a semantic_boundaries array to the response:

[{"start_line": 1, "end_line": 45, "shift_magnitude": 0.71, "nearest_symbol": "handleAuth"}]

Silently omitted when LIP is unavailable or returns no boundaries.

getArchitecture — semantic coupling matrix

After structural architecture data is assembled, toolGetArchitecture collects representative file URIs for each module, calls SimilarityMatrix, and adds a semantic_coupling field:

{"modules": ["internal/auth", "internal/api"], "matrix": [[1.0, 0.74], [0.74, 1.0]]}

Also calls GetCentroid over up to 500 repo files and records repo_centroid_included: N in the response metadata. Both are silently omitted when LIP is unavailable or fewer than 2 modules are embedded.

doctor — LIP coverage + stale embedding + model provenance

The LIP health section in ckb doctor now reports:

  • Coverage: N% embedded (Y/Z files)
  • Stale embeddings: count of files with out-of-date vectors
  • Mixed-model warning when multiple embedding models are present in the index
  • List of active embedding models

Changed

  • NearestByFile and NearestByText are now thin wrappers over NearestByFileFiltered and NearestByTextFiltered respectively
  • GetEmbedding and GetSymbolEmbedding delegate to GetEmbeddingsBatch (the old "embedding_get" and "symbol_embedding" wire variants had no corresponding Rust enum variants)

[8.5.0] - 2026-04-11

Added

Cartographer bundled as git subtree (third_party/cartographer/)

Cartographer is now vendored directly into the repo instead of requiring a sibling directory at ../../../../Cartographer/. Contributors no longer need two repos co-located. Update via:

git subtree pull --prefix third_party/cartographer \
  https://github.com/SimplyLiz/Cartographer.git master --squash

Three new MCP tools (Cartographer-backed)

detectShotgunSurgery — Detect files that historically required simultaneous edits across many unrelated files. Ranked by co-change dispersion score.

detectShotgunSurgery(repo_path: "/path/to/repo", min_partners: 3, limit: 100)

getArchitecturalEvolution — Architectural health snapshots over git history. Returns health score trend (improving/stable/degrading), debt indicators, and recommendations.

getArchitecturalEvolution(repo_path: "/path/to/repo", days: 90)

getBlastRadius — Graph-theoretic blast radius for a file or module. Works without a SCIP index; complements analyzeImpact for unindexed repos.

getBlastRadius(repo_path: "/path/to/repo", target: "src/core/engine.go", max_related: 50)

LIP semantic search (GetEmbedding)

internal/lip now exposes GetEmbedding(uri, model) — requests a TurboQuant-quantized embedding vector from the LIP daemon for a given file URI. Returns []float32 suitable for direct dot-product similarity ranking without dequantization. Degrades silently when LIP is not running.

Performance

SCIP loader: lazy CallerIndex — eliminates load-time regression on small indexes

The caller inverted index (CallerIndex) is now built on the first FindCallers call rather than at LoadIndex time. This removes ~22k persistent heap objects from the initial SCIP load on small indexes (1k docs), which were causing elevated GC pressure and a measurable load-time regression. Medium/large indexes are unaffected — the index is built once and cached thereafter.
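The lazy-build pattern described above can be illustrated with sync.Once. This is only a sketch: whether CKB uses sync.Once is an assumption, and the scipIndex and call types are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// call is a hypothetical caller -> callee edge.
type call struct{ Caller, Callee string }

// scipIndex is a hypothetical stand-in for the loaded SCIP index.
type scipIndex struct {
	calls   []call
	once    sync.Once
	callers map[string][]string // callee -> callers, nil until first lookup
}

// buildCallerIndex inverts the call edges; it runs at most once.
func (ix *scipIndex) buildCallerIndex() {
	ix.callers = make(map[string][]string)
	for _, c := range ix.calls {
		ix.callers[c.Callee] = append(ix.callers[c.Callee], c.Caller)
	}
}

// FindCallers builds the inverted index on first use, then serves every
// lookup from the cached map.
func (ix *scipIndex) FindCallers(callee string) []string {
	ix.once.Do(ix.buildCallerIndex)
	return ix.callers[callee]
}

func main() {
	ix := &scipIndex{calls: []call{
		{Caller: "Handle", Callee: "Save"},
		{Caller: "Import", Callee: "Save"},
	}}
	fmt.Println("built at load:", ix.callers != nil)
	fmt.Println("callers of Save:", ix.FindCallers("Save"))
	fmt.Println("built after lookup:", ix.callers != nil)
}
```

A load that never asks for callers never pays for the map, which is the case the small-index regression fix targets.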

Benchmark impact vs v8.4.0 (small, 1k docs): load alloc count is unchanged (~375.6k in both versions — the CallerIndex for 1k docs is not large enough to register in alloc counts). The win is GC liveness: ~22k heap objects that would have been promoted to old-gen are no longer live after load. No change for medium/large.

SCIP loader: DiscardUnknown proto decode

Both proto.Unmarshal calls in the document stream parser now use proto.UnmarshalOptions{DiscardUnknown: true}. This skips the reflection-based unknown-field accumulator, reducing allocations during SCIP file decode.

Measured vs v8.4.0 (medium, 10k docs):

  • B/op: 909 MiB → 781 MiB (-14.10%)
  • allocs/op: 6.94M → 6.64M (-4.27%)

Small and large indexes show no measurable change (unknown-field savings are proportionally smaller there).

CallerIndex builder: generation-counter deduplication

buildCallerIndex now reuses the ivs interval slice across documents (resliced to zero, grown only when needed) and replaces the per-document map[edge]bool with a generation counter (map[edge]uint64). Eliminates ~2k per-load allocs on the 1k-doc case and removes all per-document map allocs on medium/large.
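The generation-counter idea can be shown in isolation. The sketch below is not buildCallerIndex; the edge type and the uniqueEdgesPerDoc helper are hypothetical, and it only demonstrates how one shared map[edge]uint64 can stand in for a fresh map[edge]bool per document.

```go
package main

import "fmt"

// edge is a hypothetical caller -> callee pair.
type edge struct{ caller, callee string }

// uniqueEdgesPerDoc counts distinct edges within each document using a
// single shared map. An edge counts as "seen in this document" only when
// its stored generation equals the current one, so bumping the counter
// resets the set without allocating or clearing a map.
func uniqueEdgesPerDoc(docs [][]edge) []int {
	seen := make(map[edge]uint64)
	var gen uint64
	counts := make([]int, 0, len(docs))
	for _, doc := range docs {
		gen++ // start a new generation for this document
		n := 0
		for _, e := range doc {
			if seen[e] == gen {
				continue // duplicate within the current document
			}
			seen[e] = gen
			n++
		}
		counts = append(counts, n)
	}
	return counts
}

func main() {
	docs := [][]edge{
		{{"a", "b"}, {"a", "b"}, {"a", "c"}},
		{{"a", "b"}},
	}
	fmt.Println(uniqueEdgesPerDoc(docs))
}
```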

PopulateFromFullIndexStreaming: two-pass streaming to prevent OOM on large repos

PopulateFromFullIndex has always called LoadSCIPIndex which materialises the entire *SCIPIndex in memory before processing a single file. On a 50k-doc monorepo this peaks at ~15 GB and causes sustained GC pressure (observed: 485s first run vs a consistent 83s with streaming).

PopulateFromFullIndexStreaming replaces this with a two-pass strategy over the on-disk SCIP file (via scip.StreamDocuments), never materialising the full index:

  • Pass 1: build the symbol→file map — one *scippb.Document live at a time, freed by GC before the next arrives. Peak live heap ≈ the symbolToFile map alone.
  • Pass 2: stream documents again, extract deltas via the new proto-native extractFileDeltaFromProto (skips all convertDocument allocations), write SQL in 1000-file batches.

extractFileDeltaFromProto works directly on *scippb.Document so there are no intermediate *scip.Document / *scip.Occurrence / *scip.SymbolInformation allocations per document per pass.
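The two-pass strategy can be sketched with an in-memory stand-in for the document stream. None of the names below are CKB's: document, fileDelta, streamFn, and populateStreaming are hypothetical, and the delta here is reduced to a list of files a document depends on.

```go
package main

import "fmt"

// document is a hypothetical, simplified view of one streamed SCIP document.
type document struct {
	Path    string
	Symbols []string // symbols defined in this file
	Refs    []string // symbols referenced by this file
}

// fileDelta is a hypothetical per-file result written in batches.
type fileDelta struct {
	Path     string
	DepFiles []string
}

// streamFn visits documents one at a time; each call re-reads the source.
type streamFn func(visit func(document) error) error

// populateStreaming runs two passes over the stream and never holds more
// than one document plus the current batch. The batch slice is reused, so
// write must not retain it after returning.
func populateStreaming(stream streamFn, batchSize int, write func([]fileDelta) error) error {
	// Pass 1: build the symbol -> file map.
	symbolToFile := make(map[string]string)
	if err := stream(func(d document) error {
		for _, s := range d.Symbols {
			symbolToFile[s] = d.Path
		}
		return nil
	}); err != nil {
		return err
	}

	// Pass 2: extract deltas and flush them in fixed-size batches.
	batch := make([]fileDelta, 0, batchSize)
	if err := stream(func(d document) error {
		delta := fileDelta{Path: d.Path}
		for _, r := range d.Refs {
			if f, ok := symbolToFile[r]; ok && f != d.Path {
				delta.DepFiles = append(delta.DepFiles, f)
			}
		}
		batch = append(batch, delta)
		if len(batch) == batchSize {
			if err := write(batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
		return nil
	}); err != nil {
		return err
	}
	if len(batch) > 0 {
		return write(batch)
	}
	return nil
}

func main() {
	docs := []document{
		{Path: "a.go", Symbols: []string{"A"}, Refs: []string{"B"}},
		{Path: "b.go", Symbols: []string{"B"}},
		{Path: "c.go", Symbols: []string{"C"}, Refs: []string{"A", "B"}},
	}
	stream := func(visit func(document) error) error {
		for _, d := range docs {
			if err := visit(d); err != nil {
				return err
			}
		}
		return nil
	}
	err := populateStreaming(stream, 2, func(b []fileDelta) error {
		fmt.Printf("writing batch of %d: %v\n", len(b), b)
		return nil
	})
	fmt.Println("err:", err)
}
```

The second pass needs the complete symbol map, which is why the stream is read twice rather than interleaving the two steps.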

Benchmark vs PopulateFromFullIndex (50k docs, Apple M4 Pro, -count=2):

| Metric | current | streaming | delta |
| --- | --- | --- | --- |
| B/op | 15.69 GB | 15.23 GB | -2.9% |
| allocs/op | 166.4M | 181.8M | +9.3% |
| time (cold) | 485s | 83s | -83% |
| time (warm) | 122s | 83s | -32% |

The extra allocs/op come from two proto-unmarshal passes vs one (plus convertDocument in the current path). The time improvement reflects reduced GC pressure: streaming never has more than one document live at a time, so GC never needs to scan or collect the 15 GB of live SCIPIndex data.

Incremental write path: major throughput improvements (landed in v8.4.0)

The following improvements shipped in v8.4.0 and are reflected in the v8.4.0 benchmark baseline. Documented here for completeness:

  • Parallel extractFileDelta: GOMAXPROCS worker goroutines extract file deltas concurrently during PopulateFromFullIndex, so large-repo population time scales down roughly with the number of available cores.
  • Batched transactions (1000 files/tx): WAL stays bounded on 50k-file indexes instead of growing to multi-GB. Eliminates the 10h+ timeout on large repos.
  • PRAGMA synchronous=OFF during bulk load: safe because a failed full index is always re-run from scratch.
  • Bulk INSERT for file_symbols: 499-row multi-value INSERT batches reduce round-trips from 50k to ~100 for large repos.
  • Hoisted prepared statements in ApplyDelta: symbol, callgraph, and file_deps statements prepared once per delta instead of once per file.
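The bulk INSERT bullet above can be made concrete with a small sketch of multi-value batching. The column names file_path and symbol_id are hypothetical (the changelog names only the file_symbols table), and the helpers are illustrative rather than CKB's own.

```go
package main

import (
	"fmt"
	"strings"
)

// bulkInsertSQL builds one multi-value INSERT for n two-column rows.
// The column names are hypothetical.
func bulkInsertSQL(n int) string {
	if n <= 0 {
		return ""
	}
	placeholders := strings.TrimSuffix(strings.Repeat("(?, ?), ", n), ", ")
	return "INSERT INTO file_symbols (file_path, symbol_id) VALUES " + placeholders
}

// chunk splits rows into batches of at most size rows each.
func chunk(rows [][2]string, size int) [][][2]string {
	if size <= 0 {
		return [][][2]string{rows}
	}
	var out [][][2]string
	for len(rows) > size {
		out = append(out, rows[:size])
		rows = rows[size:]
	}
	if len(rows) > 0 {
		out = append(out, rows)
	}
	return out
}

func main() {
	rows := make([][2]string, 50000)
	batches := chunk(rows, 499)
	fmt.Println("statements needed:", len(batches))
	fmt.Println(bulkInsertSQL(2))
}
```

With 499 rows per statement, 50,000 rows need 101 statements, in line with the "~100" round-trips quoted above.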

Benchmark vs v8.2.1 (v8.4.0 baseline):

  • ApplyDelta/large (50k files): 50s → 42s (-16%)
  • ExtractFileDelta/50syms: 109µs → 90µs (-17%)
  • GetDependencies/1000files: 7.0ms → 6.3ms (-10%)
  • SCIP allocs geomean: -12% (backing-slice OccurrenceRef optimization)

SCIP loader: O(1) FindCallers via CallerIndex (landed in v8.4.0)

FindCallers was O(docs × funcs × occs). It now uses an inverted map built from Documents, making every caller lookup O(1). The index uses a sorted interval scan with early-break for function containment and a generation-counter for cross-document edge deduplication.

[8.3.0] - 2026-03-27

Added

Compliance Audit (ckb audit compliance)

Full regulatory compliance auditing with 131 checks across 20 frameworks:

ckb audit compliance --framework=gdpr,iso27001    # Specific frameworks
ckb audit compliance --framework=all              # All 20 frameworks
ckb audit compliance --recommend                  # Auto-detect applicable frameworks
ckb audit compliance --framework=gdpr --ci        # CI mode with exit codes

20 frameworks: GDPR, CCPA, ISO 27701, EU AI Act, ISO 27001, NIST 800-53, OWASP ASVS, SOC 2, PCI DSS, HIPAA, DORA, NIS2, FDA 21 CFR Part 11, EU CRA, SBOM/SLSA, DO-178C, IEC 61508, ISO 26262, MISRA C, IEC 62443.

Cross-framework mapping: A single finding (e.g., hardcoded credential) automatically surfaces all applicable regulations with specific clause references and CWE IDs.

Framework recommendation (--recommend): Scans codebase for indicators (HTTP handlers, PII fields, database imports, payment SDKs) and recommends applicable frameworks with confidence scores.

Output formats: human, json, markdown, sarif.

MCP tool: auditCompliance — runs compliance audit via MCP using the persistent SCIP index.

MCP Tools: listSymbols and getSymbolGraph

listSymbols — Bulk symbol listing without search query:

listSymbols(scope: "src/services/", kinds: ["function"], minLines: 30, sortBy: "complexity")

Returns complete symbol inventory with body ranges (lines, endLine) and complexity metrics (cyclomatic, cognitive). Replaces exploring 40 files one-by-one.

getSymbolGraph — Batch call graph for multiple symbols:

getSymbolGraph(symbolIds: [...30], depth: 1, direction: "callers")

Returns deduplicated nodes and edges with complexity per node. One call replaces 30 serial getCallGraph calls.

searchSymbols Enhancements

  • Complexity metrics: Results now include lines, cyclomatic, cognitive per symbol via tree-sitter enrichment
  • Server-side filtering: minLines, minComplexity, excludePatterns params — filter 80% of noise server-side instead of client-side
  • batchGet with includeCounts: Returns referenceCount, callerCount, calleeCount per symbol (parallel SCIP lookups)

Symbol Body Ranges (startLine, endLine, lines)

searchSymbols, explore keySymbols, and getSymbolGraph now return full body ranges via tree-sitter enrichment. Consumers no longer need to read source files for brace-matching.

Explore keySymbols Improvements

  • Functions rank above struct fields (behavioral analysis priority)
  • Tree-sitter supplement fills in functions when SCIP returns only types
  • Per-symbol cyclomatic and cognitive complexity

getFileComplexity in Refactor Preset

Previously only available in full preset (96 tools). Now in refactor (39 tools).

Fixed

Bug-Pattern False Positives (42 → 0)

  • defer-in-loop: Recognize func(){}() closure pattern as correct (defer fires per iteration)
  • discarded-error: Skip closure bodies in IIFE patterns; add singleReturnNew allowlist (NewScanner, NewReader, etc.); add noErrorMethods (Scan, WriteHeader, WriteJSON, WriteError, BadRequest, NotFound, InternalError)
  • missing-defer-close: Remove NewReader/NewWriter from resource-opening functions (bufio wrappers don't need Close)
  • nil-after-deref: 30-line gap threshold filters cross-scope false matches
  • shadowed-err: Only flag when outer err is standalone function-body-level :=; treat if/for/switch initializer := as scoped

All fixes use FindNodesSkipping — scope-aware tree-sitter node search that stops recursion at func_literal boundaries.

Secrets Scanner

  • Shell variable interpolation (${VAR:-default}, ${VAR:?error}) in Docker Compose URLs no longer flagged as password_in_url
  • Shell environment leak: env -i wrapper prevents user profile (.zshrc) from corrupting subprocess output

Test-Gap Detection

  • vi.mock/jest.mock module-level mocking recognized — functions covered by module mocks no longer flagged
  • Barrel/re-export files (export * from '...') skipped — pure re-exports have no logic to test

Coupling Check

  • Expanded noise filter: test files, dependency manifests (go.mod, package.json), documentation, generated directories (dist/, build/, l10n/, generated/)
  • Generated file suffixes: .pb.go, .pb.h, .pb.cc, .pb.ts, _grpc.pb.go, _pb2.py, .g.dart, .freezed.dart, .mocks.dart, _string.go, wire_gen.go, _mock.go, .bundle.js, .arb, .d.ts
  • Flutter l10n false positive fixed (#185): .arb files excluded from coupling analysis

Compliance Audit FP Reduction (11,356 → ~50 findings)

  • Deep-nesting: threshold 4→6, reset at function boundaries, 3-per-file cap
  • Dead-code: skip Go files (handled by AST-based bug-patterns)
  • Dynamic-memory: skip garbage-collected languages
  • Global-state: exclude regexp.MustCompile, errors.New, sync primitives
  • Swallowed-errors: remove overly broad _ = obj.Method() pattern
  • Eval-injection: skip Go and .github/ directories
  • Insecure-random: inline import scanning for crypto/rand vs math/rand; skip import lines
  • Path-traversal: skip filepath.Join, HasPrefix comparisons, testdata/
  • Non-FIPS-crypto: skip strings.Contains pattern matching
  • SQL injection (PCI DSS): add parameterized query detection, #nosec support
  • TODO detection: case-sensitive TEMP, skip "Stub:/Placeholder:/Note:" comments, require comment context

FTS Empty Query Bug

FTS.Search("") returned empty results (early return for empty query). Added listAll() method that queries symbols_fts_content directly. Fixes listSymbols and searchSymbols("") returning 0 on MCP.

MCP Server Warmup

Changed warmup from SearchSymbols("", 1) (cached empty results before SCIP loaded) to RefreshFTS() (populates FTS from SCIP without caching search results).

IEC 61508 Tree-Sitter Crash

complexityExceededCheck bypassed thread-safe AnalyzeFileComplexity() wrapper, calling ComplexityAnalyzer.AnalyzeFile() directly — SIGABRT when concurrent checks hit CGO.

Daemon API Endpoints (7 stubs → implementations)

  • Schedule list/detail/cancel via scheduler.ListSchedules()
  • Repo list/detail via repos.LoadRegistry()
  • Federation list/detail via federation.List()/LoadConfig()
  • CLI daemon status: HTTP health query with version/uptime display

Query Engine Stubs (4 → implementations)

  • Ownership refresh: CODEOWNERS parsing + git-blame analysis
  • Hotspot refresh: git churn data with 90-day window
  • Responsibility refresh: module responsibility extraction
  • Ownership history: storage table query

Changed

  • Score calculation: floor is 0 (not 20), per-rule deduction cap of 10 documented
  • LikelyReturnsError: removed "Scan" from error patterns, added singleReturnNew and noErrorMethods maps
  • Generated file detection: 20+ new patterns (protobuf, Go generators, Dart/Flutter, GraphQL, bundlers)
  • Per-check findings cap (50 max) in compliance engine
  • Compliance config: DefaultDaemonPort constant replaces hardcoded 9120

Performance

  • batchGet with includeCounts: parallel reference/caller/callee lookups (10-concurrent semaphore)
  • FTS multiplier: 2x → 10x when filters active (handles SCIP struct field flooding)
  • MCP index warmup: background RefreshFTS() on engine init

[8.2.0] - 2026-03-21

Added

Unified PR Review Engine (ckb review)

A comprehensive code review command that orchestrates 20 quality checks in parallel:

ckb review --base=main              # Human-readable review
ckb review --base=main --ci         # CI mode (exit 0=pass, 1=fail, 2=warn)
ckb review --base=main --post=123   # Post as PR comment
ckb review --staged                 # Review staged changes
ckb review --checks=secrets,breaking,bug-patterns  # Specific checks only

20 checks: breaking changes (SCIP), secrets, tests, complexity (tree-sitter), health scoring (8-factor weighted), coupling (git co-change), hotspots (churn ranking), risk scoring, dead code (SCIP + grep verification), test gaps, blast radius (SCIP, framework-filtered), bug patterns (10 AST rules), PR split suggestion, comment/code drift, format consistency, critical paths, traceability, reviewer independence, generated file detection, change classification.

7 output formats: human, json, markdown, sarif, codeclimate, github-actions, compliance.

Bug Pattern Detection (10 AST Rules)

Tree-sitter-based bug detection with differential analysis (only new issues reported):

  • defer-in-loop — resource leak from deferred calls in loops
  • unreachable-code — statements after return/panic
  • empty-error-branch — if err != nil { } with no handling
  • unchecked-type-assert — x.(string) without comma-ok
  • self-assignment — x = x (likely typo)
  • nil-after-deref — variable used before nil check
  • identical-branches — if/else with same body
  • shadowed-err — err redeclared with := in inner scope
  • discarded-error — error return value ignored (with receiver-type allowlist for strings.Builder, bytes.Buffer, hash.Hash)
  • missing-defer-close — resource opened without defer Close()

All 10 rules validated against known-buggy and clean-code corpus tests.

HoldTheLine Enforcement

Findings are post-filtered to only changed lines when HoldTheLine: true (default). Pre-existing issues on unchanged lines are suppressed. Test-gap and hotspot findings are exempt (file-level concerns).
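
The post-filter described above can be sketched as follows. The `Finding` type, the check names `test-gaps` and `hotspots`, and the changed-line map are assumptions made for illustration; CKB's internal types may differ.

```go
package main

import "fmt"

// Finding is a hypothetical, simplified review finding.
type Finding struct {
	Check string
	File  string
	Line  int
}

// fileLevel lists the checks treated as file-level and therefore exempt.
// The exact check identifiers are assumed.
var fileLevel = map[string]bool{"test-gaps": true, "hotspots": true}

// holdTheLine keeps findings that sit on a changed line, plus file-level
// findings. changed maps a file path to its set of changed line numbers.
func holdTheLine(findings []Finding, changed map[string]map[int]bool) []Finding {
	var kept []Finding
	for _, f := range findings {
		if fileLevel[f.Check] || changed[f.File][f.Line] {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	changed := map[string]map[int]bool{"cmd/ckb/review.go": {42: true}}
	findings := []Finding{
		{Check: "bug-patterns", File: "cmd/ckb/review.go", Line: 42}, // changed line: kept
		{Check: "bug-patterns", File: "cmd/ckb/review.go", Line: 7},  // pre-existing: dropped
		{Check: "hotspots", File: "cmd/ckb/daemon.go"},               // file-level: kept
	}
	fmt.Println(len(holdTheLine(findings, changed)))
}
```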

Multi-Provider LLM Narrative (--llm)

Optional AI-powered review narrative that replaces the deterministic summary:

ckb review --base=main --llm   # Requires ANTHROPIC_API_KEY or GEMINI_API_KEY

  • Auto-detects provider from environment (Gemini or Anthropic)
  • Self-enrichment: CKB verifies own findings via findReferences and analyzeImpact before sending to LLM
  • Triage field on enriched findings (confirmed/likely-fp/verify) guides LLM reasoning
  • LLM identifies CKB false positives and deprioritizes framework noise

Finding Dismissal Store

Users can dismiss findings by editing .ckb/review-dismissals.json:

{"dismissals": [{"ruleId": "ckb/hotspots/volatile-file", "file": "cmd/ckb/daemon.go", "reason": "Expected churn"}]}

Dismissed findings are filtered from all future reviews.
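
A sketch of how a consumer might read that file and apply it. The JSON field names come from the example above; matching on the exact (ruleId, file) pair and the `Finding` type are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dismissal mirrors one entry of .ckb/review-dismissals.json.
type Dismissal struct {
	RuleID string `json:"ruleId"`
	File   string `json:"file"`
	Reason string `json:"reason"`
}

type dismissalFile struct {
	Dismissals []Dismissal `json:"dismissals"`
}

// Finding is a hypothetical, simplified review finding.
type Finding struct {
	RuleID string
	File   string
}

// parseDismissals decodes the JSON shape shown in the changelog.
func parseDismissals(data []byte) ([]Dismissal, error) {
	var f dismissalFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return f.Dismissals, nil
}

// filterDismissed drops findings whose (ruleId, file) pair was dismissed.
func filterDismissed(findings []Finding, ds []Dismissal) []Finding {
	dismissed := make(map[[2]string]bool, len(ds))
	for _, d := range ds {
		dismissed[[2]string{d.RuleID, d.File}] = true
	}
	var kept []Finding
	for _, f := range findings {
		if !dismissed[[2]string{f.RuleID, f.File}] {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	raw := []byte(`{"dismissals": [{"ruleId": "ckb/hotspots/volatile-file", "file": "cmd/ckb/daemon.go", "reason": "Expected churn"}]}`)
	ds, err := parseDismissals(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	findings := []Finding{
		{RuleID: "ckb/hotspots/volatile-file", File: "cmd/ckb/daemon.go"},
		{RuleID: "ckb/hotspots/volatile-file", File: "cmd/ckb/review.go"},
	}
	fmt.Println(len(filterDismissed(findings, ds)))
}
```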

MCP Tool: reviewPR

New MCP tool with compact mode for AI consumers:

reviewPR(baseBranch: "main", compact: true)

Compact mode returns ~1k tokens instead of ~30k — verdict, non-pass checks, top 10 findings, health summary. Reduces AI assistant context usage by 97%.

Supports staged, scope, compact, failOnLevel, criticalPaths parameters.

Claude Code Skill (/ckb-review)

ckb setup --tool=claude-code now installs a /ckb-review slash command that orchestrates CKB's structural analysis with LLM semantic review. Interactive setup prompts for skill installation.

PR Comment Posting (--post)

ckb review --base=main --post=123   # Posts markdown review as PR comment via gh CLI

CI Integration

  • GitHub Actions workflow with SARIF upload, PR comments, and inline annotations
  • GitLab CI with CodeClimate report
  • GitHub Action (action/ckb-review/action.yml)

Noise Reduction

  • Framework symbol filter for blast-radius (skips variables/constants — works across Go, C++, Java, Python via SCIP symbol kinds)
  • Hotspot findings capped to top 10 by churn score
  • Complexity findings require +5 cyclomatic delta minimum
  • Per-rule score cap (10 points max per ruleId)
  • Receiver-type allowlist for discarded-error (strings.Builder, bytes.Buffer, hash.Hash)
  • Dead-code grep verification catches cross-package references SCIP misses

Fixed

  • daemon.go: followLogs() deadlocked on EOF (select{} → sleep+poll)
  • daemon.go: file.Seek() error silently ignored
  • handlers_review.go: context.Background() → context.WithTimeout(r.Context(), 5min)
  • cmd/ckb/review.go: err shadow at postReviewComment
  • cmd/ckb/setup.go: err shadow at promptInstallSkills
  • Config merge: DeadCodeMinConfidence and TestGapMinLines overrides from config file now work (default values no longer block merge)
  • Go bumped to 1.26.1 (4 stdlib CVEs)
  • gosec findings annotated/resolved across codebase

Changed

  • Version: 8.1.0 → 8.2.0
  • Schema version: 8.2
  • complexity.findNodes exported as FindNodes for use by bug-pattern rules
  • LLMConfig added to config with Provider, APIKey, Model fields
  • MCP reviewPR tool description updated (20 checks, staged/scope/compact params)
  • CLAUDE.md updated with review documentation

Performance

  • Tree-sitter checks serialized with proper mutex discipline (cgo safety)
  • Hotspot scores pre-computed once and shared between checks
  • Health check subprocess calls reduced ~60%
  • Batch git-blame operations for repo metrics

[8.1.0] - 2026-01-31

Added

Coverage Configuration Options

Coverage file detection is now configurable via .ckb/config.json:

{
  "coverage": {
    "paths": ["coverage/custom-lcov.info"],
    "autoDetect": true,
    "maxAge": "168h"
  }
}

  • paths: Custom paths to check for coverage files
  • autoDetect: Use language-specific auto-detection (default: true)
  • maxAge: Max age before marking as stale (default: 7 days)

Orphaned Index Detection

ckb doctor now includes an orphaned-indexes check that scans for indexes pointing to repos that no longer exist:

$ ckb doctor

✓ orphaned-indexes: Index cache: 234 MB (12 repos), 2 orphaned
  → ckb cache clean --orphaned

Test Mapping (ckb affected-tests)

New command to find tests affected by current changes:

$ ckb affected-tests

Affected Tests
──────────────────────────────────────────────────────────

Found 8 test files:
  • 5 direct (test references changed code)
  • 3 transitive (test uses affected code)

Run command:
  go test ./internal/query/... ./internal/diff/...

Features:

  • Maps changed symbols to affected test files via SCIP
  • Finds corresponding test files by naming convention (e.g., foo.go → foo_test.go)
  • Generates language-appropriate run commands
  • --format=list for CI integration
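
The naming-convention lookup can be illustrated for Go files. `testFileFor` is a hypothetical helper; the sketch covers only the foo.go → foo_test.go case and says nothing about how CKB handles other languages.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// testFileFor returns the conventional Go test file for a source file.
// ok is false for non-Go files and for files that are already tests.
func testFileFor(path string) (name string, ok bool) {
	if filepath.Ext(path) != ".go" || strings.HasSuffix(path, "_test.go") {
		return "", false
	}
	return strings.TrimSuffix(path, ".go") + "_test.go", true
}

func main() {
	if name, ok := testFileFor("internal/query/impact.go"); ok {
		fmt.Println(name)
	}
}
```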

--include-tests Flag Wiring

The --include-tests flag now works end-to-end in ckb impact diff:

  • Properly sets IsTest flag on references based on file path
  • Filters test files from changed symbols when --include-tests=false

Dependency Cycle Detection (findCycles)

Detect circular dependencies in module, directory, or file dependency graphs using Tarjan's SCC algorithm:

# Via MCP
findCycles { "granularity": "directory", "targetPath": "internal/" }

  • Uses Tarjan's strongly connected components to find real cycles
  • Recommends which edge to break (lowest coupling cost)
  • Severity classification: size ≥5 = high, ≥3 = medium, 2 = low
  • Available in refactor preset
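
For readers unfamiliar with the technique, here is a compact sketch of Tarjan's algorithm over a string-keyed dependency graph, combined with the severity thresholds listed above. The graph representation and function names are assumptions; components of size 1 (including self-loops) are ignored here, and the edge-break recommendation is not modelled.

```go
package main

import "fmt"

type tarjan struct {
	graph   map[string][]string
	index   map[string]int
	low     map[string]int
	onStack map[string]bool
	stack   []string
	next    int
	sccs    [][]string
}

func (t *tarjan) visit(v string) {
	t.index[v], t.low[v] = t.next, t.next
	t.next++
	t.stack = append(t.stack, v)
	t.onStack[v] = true

	for _, w := range t.graph[v] {
		if _, seen := t.index[w]; !seen {
			t.visit(w)
			if t.low[w] < t.low[v] {
				t.low[v] = t.low[w]
			}
		} else if t.onStack[w] && t.index[w] < t.low[v] {
			t.low[v] = t.index[w]
		}
	}

	if t.low[v] == t.index[v] { // v is the root of a component
		var scc []string
		for {
			w := t.stack[len(t.stack)-1]
			t.stack = t.stack[:len(t.stack)-1]
			t.onStack[w] = false
			scc = append(scc, w)
			if w == v {
				break
			}
		}
		t.sccs = append(t.sccs, scc)
	}
}

// findCycles returns the components with at least two nodes.
func findCycles(graph map[string][]string) [][]string {
	t := &tarjan{graph: graph, index: map[string]int{}, low: map[string]int{}, onStack: map[string]bool{}}
	for v := range graph {
		if _, seen := t.index[v]; !seen {
			t.visit(v)
		}
	}
	var cycles [][]string
	for _, scc := range t.sccs {
		if len(scc) >= 2 {
			cycles = append(cycles, scc)
		}
	}
	return cycles
}

// severity applies the documented thresholds: >=5 high, >=3 medium, 2 low.
func severity(size int) string {
	switch {
	case size >= 5:
		return "high"
	case size >= 3:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	graph := map[string][]string{
		"internal/a": {"internal/b"},
		"internal/b": {"internal/c"},
		"internal/c": {"internal/a"},
	}
	for _, c := range findCycles(graph) {
		fmt.Println(len(c), severity(len(c)))
	}
}
```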

Move/Relocate Change Type

prepareChange and planRefactor now support changeType: "move" with a targetPath parameter:

prepareChange { "target": "internal/old/handler.go", "changeType": "move", "targetPath": "pkg/handler.go" }

  • Scans all source files for import path references that need updating
  • Detects target directory conflicts (existing files with same name)
  • Generates move-specific refactoring steps in planRefactor

Extract Variable Flow Analysis

prepareChange with changeType: "extract" now provides tree-sitter-based variable flow analysis when CGO is available:

  • Identifies parameters (variables defined outside selection, used inside)
  • Identifies return values (variables defined inside, used after selection)
  • Classifies local variables (defined and consumed within selection)
  • Generates language-appropriate function signatures (Go, Python, JS/TS)
  • Graceful degradation: falls back to line-count heuristics without CGO

Suggested Refactoring Detection (suggestRefactorings)

Proactive detection of refactoring opportunities by combining existing analyzers in parallel:

suggestRefactorings { "scope": "internal/query", "minSeverity": "medium" }

  • Complexity: High cyclomatic/cognitive functions → extract_function, simplify_function
  • Coupling: Highly correlated file pairs → reduce_coupling, split_file
  • Dead code: Unused symbols → remove_dead_code
  • Test gaps: High-risk untested code → add_tests
  • Each suggestion includes severity, effort estimate, and priority score
  • Available in refactor preset

[8.0.2] - 2026-01-22

Added

Grok Support in ckb setup

Grok is now a supported AI coding tool in the setup wizard:

ckb setup --tool=grok          # project-level (.grok/settings.json)
ckb setup --tool=grok --global # global (~/.grok/user-settings.json)

Uses grok mcp add CLI when available, falls back to file-based configuration. Grok's MCP format includes name and transport fields alongside the standard command/args.

MCP Registry Support

Added mcpName field to npm package.json for publishing to the official MCP Registry (io.github.simplyliz/ckb).

Compound Tool NFR Scenarios

NFR test suite expanded from 28 to 39 scenarios, adding coverage for v8.0 compound tools:

  • explore (small, large) — aggregates explainFile + searchSymbols + callGraph + hotspots
  • understand (small, large) — aggregates symbol detail + findReferences + callGraph
  • prepareChange (small, large) — aggregates impact + affectedTests + coupling
  • batchGet (small, large) — batch symbol retrieval (up to 50)
  • batchSearch (small, medium, large) — multiple concurrent searches

Changed

Dynamic NFR Baselines

NFR tests now compare PR results against the base branch (dynamic baseline) instead of static hardcoded values. Two parallel CI jobs run the tests on both HEAD and base, then a comparison job reports regressions. This catches real regressions relative to the target branch rather than drifting static numbers.

NFR Tests Scope

NFR tests now only run on PRs targeting main (develop → main merges), reducing CI noise on feature branches.

[8.0.1] - 2026-01-22

Improved

Human-Readable Output by Default

All CLI commands now default to --format=human instead of --format=json. This makes the CLI more friendly for interactive use while still supporting --format=json for scripting and automation.

Quieter Indexer Output

External SCIP indexers (scip-go, scip-typescript, etc.) no longer spam stdout during ckb index. Output is now captured and only shown on error or when using -v verbose mode.

Better Error Messages

  • ckb dead-code now clearly indicates it's for telemetry-based analysis and suggests using ckb telemetry dead-code
  • ckb impact diff no longer shows confusing "Symbol not found: diff" error; instead provides helpful guidance
  • Symbol not found errors now suggest using ckb search to find valid symbol IDs

Index Missing Guidance

ckb status now shows helpful guidance when no SCIP index is found:

  • Lists commands that work without index (git-based): hotspots, ownership, reviewers, diff-summary, pr-summary
  • Lists commands that need SCIP index: search, refs, callgraph, impact, dead-code, trace, explain

Fixed

  • Consistent --format=human support for diff-summary, concepts, and impact diff commands

[8.0.0] - 2026-01-21

Theme: Reliability, clarity, and compound operations for AI workflows.

Added

Compound Operations (5 New Tools)

Reduce AI tool calls by 60-70% with smart aggregation tools that combine multiple primitives into single, focused operations.

explore — Area Exploration

Comprehensive exploration of files, directories, or modules. Replaces the common pattern of explainFile → searchSymbols → getCallGraph → getHotspots.

{
  "target": "internal/query",
  "depth": "standard",    // "shallow" | "standard" | "deep"
  "focus": "structure"    // "structure" | "dependencies" | "changes"
}

Returns: module overview, key symbols (ranked by importance), dependencies, recent changes, hotspots, and drilldown suggestions.

understand — Symbol Deep-Dive

Complete symbol understanding with ambiguity handling. Replaces searchSymbols → getSymbol → explainSymbol → findReferences → getCallGraph.

{
  "query": "HandleRequest",
  "includeReferences": true,
  "includeCallGraph": true,
  "maxReferences": 50
}

Returns: full symbol detail, explanation, references grouped by file, callers/callees, related tests, and disambiguation info when multiple matches exist.

prepareChange — Pre-Change Analysis

Impact analysis before modifying code. Combines analyzeImpact + getAffectedTests + analyzeCoupling + risk calculation.

{
  "target": "ckb:repo:sym:abc123",
  "changeType": "modify"    // "modify" | "rename" | "delete" | "extract"
}

Returns: direct dependents, transitive impact metrics, related tests, co-change files, and risk assessment with severity levels and mitigation suggestions.

batchGet — Multiple Symbols by ID

Retrieve up to 50 symbols in a single call. Returns results and errors keyed by symbol ID.

batchSearch — Multiple Searches

Execute up to 10 symbol searches in one call. Each query can have its own kind filter and scope.

SSE Streaming

Real-time feedback for long-running operations via Server-Sent Events.

Protocol:

// Request with streaming
{
  "name": "findReferences",
  "arguments": {
    "symbolId": "ckb:repo:sym:abc123",
    "stream": true,
    "chunkSize": 20
  }
}

// Initial response
{
  "streamId": "abc123",
  "streaming": true,
  "meta": { "chunkSize": 20 }
}

// MCP notifications: ckb/streamMeta, ckb/streamChunk, ckb/streamProgress, ckb/streamComplete

Streamable Tools:

  • findReferences — Stream references in chunks with progress updates
  • searchSymbols — Stream symbol search results

Event Types:

| Event | Purpose |
|-------|---------|
| meta | Stream metadata (total count, chunk size, backends) |
| chunk | Batch of items with sequence number |
| progress | Phase updates with percentage |
| done | Stream complete with summary |
| error | Error with code and remediation |
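
The chunk events can be pictured as fixed-size batches carrying a sequence number. The sketch below shows count-based chunking only; `ChunkEvent` and `chunkEvents` are hypothetical names, and byte-size chunking is not covered.

```go
package main

import "fmt"

// ChunkEvent is a hypothetical shape for one "chunk" event.
type ChunkEvent[T any] struct {
	Seq   int
	Items []T
}

// chunkEvents splits items into consecutive batches of at most size
// elements and numbers them from 0.
func chunkEvents[T any](items []T, size int) []ChunkEvent[T] {
	if size <= 0 {
		return nil
	}
	var events []ChunkEvent[T]
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		events = append(events, ChunkEvent[T]{Seq: len(events), Items: items[start:end]})
	}
	return events
}

func main() {
	refs := make([]string, 45) // stand-in for 45 references
	for _, ev := range chunkEvents(refs, 20) {
		fmt.Println(ev.Seq, len(ev.Items))
	}
}
```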

Enhanced getStatus

System health with actionable remediation guidance.

{
  "backends": {
    "scip": { "status": "available", "latencyMs": 12 },
    "git": { "status": "available" },
    "lsp": {
      "status": "unavailable",
      "reason": "No LSP server configured",
      "remediation": "Configure LSP server in .ckb/config.json"
    }
  },
  "index": {
    "fresh": false,
    "commitsBehind": 3,
    "lastIndexed": "2h ago",
    "symbolCount": 4521,
    "fileCount": 156
  },
  "overallHealth": "degraded",
  "suggestions": [
    "Run 'ckb index' to refresh stale index",
    "Configure LSP for enhanced code intelligence"
  ]
}

Health Tiers:

  • available — Backend working normally
  • degraded — Backend available but with warnings
  • unavailable — Backend not available, includes remediation

reindex Tool

Trigger index refresh via MCP with scope control.

// Input
{ "scope": "full", "async": false }

// Output
{
  "status": "action_required",
  "message": "Index is 3 commits behind. Run 'ckb index' to refresh."
}

Status values: skipped, action_required, started, completed

Structured Error Codes

All MCP errors now include actionable remediation guidance.

| Code | When | Remediation |
|------|------|-------------|
| AMBIGUOUS_QUERY | Multiple symbols match | Narrow with scope, kind, or more specific name |
| PARTIAL_RESULT | Some backends failed | Result incomplete; check backend health |
| INVALID_PARAMETER | Bad input | Check parameter format |
| RESOURCE_NOT_FOUND | Symbol/file doesn't exist | Verify ID or path |
| PRECONDITION_FAILED | Required condition not met | Check index freshness, backend availability |
| OPERATION_FAILED | General failure | Check logs, retry |

Response Metadata

All tool responses now include structured metadata for AI transparency.

ConfidenceFactor: Explains why a confidence score was assigned

{
  "score": 0.85,
  "tier": "medium",
  "factors": [
    { "factor": "scip_exact_match", "weight": 0.9 },
    { "factor": "index_slightly_stale", "weight": -0.05 }
  ]
}

CacheInfo: Cache hit/miss transparency

{
  "hit": true,
  "tier": "query_cache",
  "age": "45s",
  "key": "findReferences:abc123"
}

Code Analysis Tools

findDeadCode — Static dead code detection

Identifies symbols with no references (excluding test files, entrypoints, and interface implementations).

{
  "candidates": [
    {
      "symbolId": "ckb:repo:sym:abc123",
      "name": "unusedHelper",
      "kind": "function",
      "file": "internal/util/helpers.go",
      "confidence": 0.95,
      "reason": "No references found"
    }
  ],
  "excludedReasons": {
    "entrypoint": 12,
    "interface_impl": 8,
    "test_only": 23
  }
}

getAffectedTests — Test coverage mapping

Maps changed symbols to affected test files.

{
  "symbolId": "ckb:repo:sym:abc123",
  "affectedTests": [
    { "file": "auth/handler_test.go", "confidence": 0.95, "reason": "direct_reference" },
    { "file": "api/routes_test.go", "confidence": 0.75, "reason": "transitive" }
  ],
  "runCommand": "go test ./internal/auth/... ./internal/api/..."
}

compareAPI — Breaking change detection

Compares API surface between commits/branches.

{
  "base": "main",
  "head": "HEAD",
  "breaking": [
    {
      "symbol": "ValidateToken",
      "change": "renamed",
      "newName": "ValidateUserToken",
      "affectedCallers": 12
    }
  ],
  "additions": [...],
  "compatible": false
}

Golden Test Suite

Multi-language test fixtures for regression testing across Go, TypeScript, Python, and Rust.

Changed

  • All MCP tool handlers now use structured CkbError instead of raw fmt.Errorf
  • getStatus response includes streaming capabilities info
  • Confidence scores now include explanation factors via FromProvenance()
  • Cached responses include cache tier and age information

Files Added

Compound Operations:

  • internal/query/compound.go — Explore(), Understand(), PrepareChange(), BatchGet(), BatchSearch()
  • internal/query/compound_test.go — Compound operation tests
  • internal/mcp/tool_impls_compound.go — MCP handlers for compound tools

Streaming:

  • internal/streaming/stream.go — Core Stream type with event sending, heartbeat
  • internal/streaming/chunker.go — Generic chunking by count and byte size
  • internal/streaming/mcp.go — MCP notification writer for streams
  • internal/mcp/streaming.go — StreamingHandler type, registry, wrapForStreaming
  • internal/mcp/tool_impls_streaming.go — Streaming implementations

Error Handling:

  • internal/errors/codes.go — Error code taxonomy with constructors
  • internal/errors/remediation.go — Remediation message generation

Metadata:

  • internal/envelope/confidence.go — ConfidenceFactor type and FromProvenance()
  • internal/envelope/cache.go — CacheInfo type

[7.6.0]

Added

Real Transitive Impact Analysis

The analyzeImpact tool now provides real transitive caller analysis using call graph traversal, replacing the previous stub implementation.

What's New:

  • Transitive callers: See not just direct callers, but callers-of-callers up to depth 4
  • Blast radius summary: Quick metrics showing module count, file count, unique callers, and risk level
  • Distance tracking: Each transitive caller includes its distance from the target symbol
  • Confidence decay: Confidence decreases with depth (0.85 → 0.75 → 0.65 for depths 2/3/4)

Example Output:

{
  "blastRadius": {
    "moduleCount": 4,
    "fileCount": 12,
    "uniqueCallerCount": 18,
    "riskLevel": "medium"
  },
  "transitiveImpact": [
    { "kind": "transitive_caller", "symbolId": "...", "distance": 2, "confidence": 0.85 },
    { "kind": "transitive_caller", "symbolId": "...", "distance": 3, "confidence": 0.75 }
  ]
}

Blast Radius Thresholds:

| Level | Criteria |
|-------|----------|
| Low | ≤2 modules AND ≤5 callers |
| High | >5 modules OR >20 callers |
| Medium | Everything in between |
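
The thresholds translate directly into a small classifier. `riskLevel` is a hypothetical function name; the comparisons follow the criteria above.

```go
package main

import "fmt"

// riskLevel classifies a blast radius from module and caller counts.
func riskLevel(modules, callers int) string {
	switch {
	case modules > 5 || callers > 20:
		return "high"
	case modules <= 2 && callers <= 5:
		return "low"
	default:
		return "medium"
	}
}

func main() {
	fmt.Println(riskLevel(1, 3))  // low
	fmt.Println(riskLevel(3, 10)) // medium
	fmt.Println(riskLevel(6, 2))  // high
}
```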

Usage:

# CLI
ckb impact <symbol-id> --depth 3

# MCP
analyzeImpact({ symbolId: "...", depth: 3 })

Files Changed:

  • internal/impact/types.go — Added BlastRadius struct with classification
  • internal/impact/analyzer.go — TransitiveCallerProvider interface, transitive analysis
  • internal/query/impact.go — scipCallerProvider using SCIP call graph
  • internal/mcp/tool_impls.go — Added blastRadius to MCP output

[7.5.0]

Added

Change Impact Analysis

Analyze the impact of code changes from git diffs before committing. This feature answers: "What downstream code might break?"

CLI:

ckb impact diff                # Analyze working tree changes
ckb impact diff --staged       # Analyze only staged changes
ckb impact diff --base=main    # Compare against a branch
ckb impact diff --depth=3      # Deeper transitive analysis
ckb impact diff --strict       # Fail if index is stale

MCP Tool: analyzeChange

Key Features:

  • Git diff parsing — Uses sourcegraph/go-diff to parse unified diffs into structured hunks
  • Symbol mapping — Maps changed lines to SCIP symbol definitions with confidence scoring
  • Confidence levels — 1.0 (exact definition), 0.8 (body change), 0.7 (reference), 0.3 (file-level)
  • Aggregated risk — Weighted factors: symbols changed (20%), direct impact (30%), transitive impact (20%), module spread (30%)
  • Index staleness — Warns when SCIP index is behind HEAD; --strict mode fails if stale
  • Recommendations — Actionable suggestions (review, test, split) based on analysis
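
The aggregated risk weighting can be read as a weighted sum. In this sketch only the four weights come from the list above; how each factor is normalized to the range 0 to 1 is not stated in the changelog, so `saturate` and its limits are purely illustrative.

```go
package main

import "fmt"

// Factors holds the four inputs, each already normalized to [0, 1].
type Factors struct {
	SymbolsChanged   float64
	DirectImpact     float64
	TransitiveImpact float64
	ModuleSpread     float64
}

// saturate is a hypothetical normalization: count/limit, clamped to [0, 1].
func saturate(count, limit int) float64 {
	if limit <= 0 || count <= 0 {
		return 0
	}
	if count >= limit {
		return 1
	}
	return float64(count) / float64(limit)
}

// aggregateRisk applies the documented weights: 20% / 30% / 20% / 30%.
func aggregateRisk(f Factors) float64 {
	return 0.20*f.SymbolsChanged +
		0.30*f.DirectImpact +
		0.20*f.TransitiveImpact +
		0.30*f.ModuleSpread
}

func main() {
	f := Factors{
		SymbolsChanged:   saturate(5, 20), // limits are assumed values
		DirectImpact:     saturate(12, 30),
		TransitiveImpact: saturate(40, 100),
		ModuleSpread:     saturate(3, 10),
	}
	fmt.Printf("%.2f\n", aggregateRisk(f))
}
```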

Files Added:

  • internal/impact/interfaces.go — Core types (ChangedSymbol, ParsedDiff, ChangeType)
  • internal/diff/gitdiff.go — Git diff parser with source file filtering
  • internal/diff/symbolmap.go — Diff-to-symbol mapper with confidence scoring
  • internal/diff/scipadapter.go — SCIP index adapter for symbol lookup

Files Changed:

  • internal/query/impact.go — Added AnalyzeChangeSet() engine method
  • internal/mcp/tools.go — Added analyzeChange tool definition
  • internal/mcp/tool_impls.go — Added analyzeChange handler
  • cmd/ckb/impact.go — Added ckb impact diff subcommand

See [[Change-Impact-Analysis]] in the wiki for full documentation.

Token Efficiency Visibility

Users can now see CKB's token savings compared to bloated MCP servers:

Startup Banner:

CKB MCP Server v7.5.0
  Active tools: 14 / 76 (18%)
  Estimated context: ~1k tokens
  Preset: core

getStatus Response:

"preset": {
  "active": "core",
  "exposed": 14,
  "total": 76,
  "estimatedTokens": 1529,
  "fullPresetTokens": 9040,
  "tokenSavingsPercent": 83
}

This addresses community feedback about MCP tools consuming 50-80k tokens before conversations even start. CKB's preset system delivers 83% token reduction while maintaining full functionality.

Preset Discoverability (--list-presets):

$ ckb mcp --list-presets

Available presets:

  PRESET        TOOLS         TOKENS  DESCRIPTION
  ------        -----         ------  -----------
  core             14     ~2k tokens  Quick navigation, search, impact analysis (default)
  review           19     ~2k tokens  Code review with ownership and PR summaries
  refactor         19     ~2k tokens  Refactoring analysis with coupling and dead code
  federation       28     ~3k tokens  Multi-repo queries and cross-repo visibility
  docs             20     ~2k tokens  Documentation-symbol linking and coverage
  ops              25     ~2k tokens  Diagnostics, daemon, webhooks, jobs
  full             76     ~9k tokens  Complete feature set (all tools)

Use: ckb mcp --preset=<name>

Future: Per-tool token breakdown (--tokens flag showing individual tool costs) planned for a later release.

Files Changed:

  • cmd/ckb/mcp.go — Multi-line startup banner with token info, --list-presets flag
  • internal/mcp/server.go — Added EstimateActiveTokens(), EstimateFullTokens() methods
  • internal/mcp/presets.go — Added FormatTokens(), GetAllPresetInfo(), PresetDescriptions
  • internal/mcp/tool_impls.go — Token fields in getStatus response

Auto Index Updates

Automatic index maintenance across all CKB interfaces—keep your index fresh without manual intervention:

Watch Mode (CLI):

# Watch for changes and auto-reindex (standalone)
ckb index --watch
ckb index --watch --watch-interval 15s

# Watch with MCP server (existing, now configurable)
ckb mcp --watch
ckb mcp --watch --watch-interval 1m

Daemon File Watcher: The daemon's file watcher now triggers actual incremental refreshes instead of just logging. When git changes are detected, the daemon queues and executes an incremental update.

Webhook API:

# Trigger incremental refresh via HTTP
curl -X POST http://localhost:9120/api/v1/refresh

# Force full reindex
curl -X POST http://localhost:9120/api/v1/refresh -d '{"full": true}'

# Specify repository
curl -X POST http://localhost:9120/api/v1/refresh -d '{"repo": "/path/to/repo"}'

Response:

{
  "status": "queued",
  "repo": "/path/to/repo",
  "type": "incremental"
}

Index Staleness Visibility:

  • ckb status now shows commits behind HEAD and index age
  • MCP getStatus response includes index.commitsBehind, index.indexAge, index.reason
  • Fresh indexes show ✓, stale indexes show ⚠ with specific reason

Files Added:

  • internal/daemon/refresh.go — RefreshManager for incremental/full reindex
  • cmd/ckb/status_test.go — Status type tests
  • internal/daemon/refresh_test.go — RefreshManager unit tests

Files Changed:

  • cmd/ckb/index.go — Added --watch and --watch-interval flags
  • cmd/ckb/mcp.go — Added --watch-interval flag (min 5s, max 5m)
  • cmd/ckb/status.go — Added staleness display with commits behind
  • internal/daemon/daemon.go — Connected file watcher to RefreshManager
  • internal/daemon/server.go — Added /api/v1/refresh endpoint
  • internal/index/metadata.go — Added Staleness type and GetStaleness() method
  • internal/mcp/tool_impls.go — Added index staleness to getStatus response
  • CLAUDE.md — Added "Keeping Your Index Fresh" section

See [[Index-Management]] in the wiki for detailed documentation.

Multi-Language Incremental Indexing

Incremental indexing now supports multiple languages via a unified indexer registry:

Supported languages:

  • Go (scip-go)
  • TypeScript/JavaScript (scip-typescript)
  • Python (scip-python)
  • Dart (scip_dart)
  • Rust (rust-analyzer)

Features:

  • Automatic indexer detection and path resolution (including ~/go/bin)
  • Graceful degradation with install hints when indexer is missing
  • Language-specific SupportsIncremental flag for safe fallback
  • Unified IndexIncrementalWithLang(ctx, since, lang) API

Usage:

# Auto-detects language and uses incremental if available
ckb index

# Incremental not available message includes install command
Incremental not available: scip-python not installed (run: pip install scip-python)

[7.4.0]

Added

Tool Presets & Pagination

Token-optimized tool discovery reducing context overhead by up to 83%:

Presets:

| Preset | Tools | Tokens | Use Case |
|--------|-------|--------|----------|
| core (default) | 14 | ~1,531 | Essential navigation and analysis |
| review | 19 | ~2,294 | Code review: PR summary, ownership |
| refactor | 19 | ~2,216 | Refactoring: coupling, dead code |
| docs | 20 | ~2,093 | Documentation: coverage, staleness |
| ops | 25 | ~2,366 | Operations: jobs, webhooks, metrics |
| federation | 28 | ~3,122 | Multi-repo: cross-repo search |
| full | 76 | ~9,043 | All tools (legacy behavior) |

Features:

  • MCP-compliant pagination — tools/list cursor-based pagination per spec
  • Core-first ordering — Page 1 always contains functional toolset for non-paginating clients
  • Cursor invalidation — Cursor rejected when preset or toolset changes
  • expandToolset meta-tool — Dynamic preset expansion with rate limiting (once per session)
  • tools.listChanged capability — Enables dynamic tool list updates
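
One way to picture cursor invalidation is an opaque cursor that embeds both the offset and a hash of the active toolset, so a cursor issued under one preset is rejected under another. The encoding below (base64-wrapped JSON) and all names are assumptions for illustration; the changelog does not describe CKB's cursor format.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// cursor is a hypothetical cursor payload.
type cursor struct {
	Offset      int    `json:"o"`
	ToolsetHash string `json:"h"`
}

var errStaleCursor = errors.New("cursor does not match current toolset")

func encodeCursor(offset int, hash string) string {
	b, _ := json.Marshal(cursor{Offset: offset, ToolsetHash: hash})
	return base64.RawURLEncoding.EncodeToString(b)
}

func decodeCursor(s, currentHash string) (int, error) {
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return 0, err
	}
	var c cursor
	if err := json.Unmarshal(b, &c); err != nil {
		return 0, err
	}
	if c.ToolsetHash != currentHash || c.Offset < 0 {
		return 0, errStaleCursor
	}
	return c.Offset, nil
}

// listPage returns one page of tools and the next cursor ("" when done).
// pageSize is assumed to be positive.
func listPage(tools []string, cur, hash string, pageSize int) ([]string, string, error) {
	offset := 0
	if cur != "" {
		var err error
		if offset, err = decodeCursor(cur, hash); err != nil {
			return nil, "", err
		}
	}
	if offset > len(tools) {
		offset = len(tools)
	}
	end := offset + pageSize
	if end >= len(tools) {
		return tools[offset:], "", nil
	}
	return tools[offset:end], encodeCursor(end, hash), nil
}

func main() {
	tools := []string{"searchSymbols", "getSymbol", "findReferences", "analyzeImpact", "getStatus"}
	page, next, _ := listPage(tools, "", "core-v1", 2)
	fmt.Println(page, next != "")
	_, _, err := listPage(tools, next, "review-v1", 2) // preset changed
	fmt.Println(err)
}
```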

CLI:

ckb mcp                      # Default: core preset (14 tools)
ckb mcp --preset=review      # Code review workflow
ckb mcp --preset=full        # All 76 tools (legacy)

Setup Wizard:

  • ckb setup now prompts for preset selection
  • --preset flag for non-interactive configuration

Files Added:

  • internal/mcp/presets.go — Preset definitions and core-first ordering
  • internal/mcp/cursor.go — MCP-compliant cursor pagination

Files Changed:

  • internal/mcp/server.go — Preset management and toolset hash
  • internal/mcp/handler.go — Paginated handleListTools
  • internal/mcp/tools.go — expandToolset tool definition
  • internal/mcp/tool_impls.go — expandToolset handler with rate limiting
  • internal/mcp/capabilities.go — tools.listChanged: true
  • cmd/ckb/mcp.go — --preset flag
  • cmd/ckb/setup.go — Preset selection in wizard

Wide-Result Metrics Tracking

Infrastructure for monitoring tool output sizes and truncation rates:

  • getWideResultMetrics tool — Expose wide-result statistics
  • SQLite persistence — Historical tracking for optimization work
  • Per-tool aggregation — Invocations, bytes, tokens, truncations
  • Response byte tracking — Actual JSON payload size for each tool response
  • ckb metrics CLI — View aggregated metrics with --days, --tool, --format flags
  • ckb metrics export — Export versioned metrics to JSON for cross-version comparison

Tracked tools: searchSymbols, findReferences, analyzeImpact, getCallGraph, getHotspots, summarizePr

Telemetry Findings:

| Tool | Truncation Rate | Needs Frontier? |
|------|-----------------|-----------------|
| searchSymbols | 45% | Yes |
| getHotspots | 50% | Yes |
| findReferences | 18% | No |
| getCallGraph | 0% | No |
| analyzeImpact | 0% | No |

Files Added:

  • internal/mcp/wide_result_metrics.go — In-memory aggregation with DB persistence
  • internal/storage/metrics_store.go — SQLite metrics storage
  • cmd/ckb/metrics.go — CLI metrics command

Performance

SCIP Backend Optimizations

Major performance improvements to the SCIP backend through pre-computed indexes:

| Operation | Before | After | Improvement |
|-----------|--------|-------|-------------|
| FindReferences | 340μs | 2.5μs | 136x faster |
| SearchSymbols | 930μs | 136μs | 7x faster |
| FindSymbolLocation | 70μs | 28ns | 2,500x faster |
| GetCachedSymbol | 210ns | 7.5ns | 28x faster |

Changes:

  • RefIndex: Inverted reference index built during SCIP load for O(1) reference lookups instead of O(n×m) scans
  • ConvertedSymbols Cache: Pre-converted symbols avoid repeated parsing of SCIP identifiers, visibility inference, and location lookups
  • ContainerIndex: Maps occurrence positions to containing symbols for O(1) containment lookup instead of O(n²) nested loops
  • Fast Location Lookup: findSymbolLocationFast uses RefIndex for O(k) definition lookup where k = number of occurrences
  • RateLimiter Cleanup: Added graceful shutdown with Stop() method to prevent goroutine leaks
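
The idea behind RefIndex, an inverted index built once at load time so that a reference query becomes a map lookup, can be sketched as follows. The `Occurrence` type and function names are simplified stand-ins, not the actual SCIPIndex structures.

```go
package main

import "fmt"

// Occurrence is a simplified stand-in for a SCIP occurrence.
type Occurrence struct {
	Symbol string
	File   string
	Line   int
}

// buildRefIndex groups occurrences by symbol in a single pass, so later
// lookups do not need to rescan every document.
func buildRefIndex(occs []Occurrence) map[string][]Occurrence {
	idx := make(map[string][]Occurrence)
	for _, o := range occs {
		idx[o.Symbol] = append(idx[o.Symbol], o)
	}
	return idx
}

// findReferences is a constant-time map lookup on the prebuilt index.
func findReferences(idx map[string][]Occurrence, symbol string) []Occurrence {
	return idx[symbol]
}

func main() {
	idx := buildRefIndex([]Occurrence{
		{Symbol: "pkg.Foo", File: "a.go", Line: 10},
		{Symbol: "pkg.Bar", File: "a.go", Line: 12},
		{Symbol: "pkg.Foo", File: "b.go", Line: 3},
	})
	fmt.Println(len(findReferences(idx, "pkg.Foo")))
}
```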

Files Changed:

  • internal/backends/scip/loader.go — Added OccurrenceRef, RefIndex, ConvertedSymbols, ContainerIndex to SCIPIndex
  • internal/backends/scip/references.go — FindReferences uses inverted index, added findContainingSymbolFast
  • internal/backends/scip/symbols.go — Added GetCachedSymbol, findSymbolLocationFast, cached SearchSymbols
  • internal/backends/limiter.go — Added done channel and Stop() method

Tests Added:

  • internal/backends/scip/performance_test.go — 11 unit tests + 10 benchmarks
  • internal/backends/limiter_test.go — 5 unit tests + 1 benchmark

Git Backend Optimizations

Major performance improvement to getHotspots by consolidating git commands:

| Operation | Before | After | Improvement |
|-----------|--------|-------|-------------|
| getHotspots | 26.7s | 498ms | 53x faster |

Problem: For each changed file, ran 4 separate git commands (rev-list, shortlog, log × 2). With 100+ files = 400+ process spawns.

Solution: Single git log --format=%H|%an|%aI --numstat command parses all data in one pass.
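
A sketch of the single-pass parse. It works on captured output of the command above rather than invoking git, and the `FileChurn` type is hypothetical. It assumes header lines look like hash|author|date and numstat lines are tab-separated; renames reported as "old => new" are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// FileChurn is a hypothetical per-file aggregate.
type FileChurn struct {
	Commits      int
	Added        int
	Deleted      int
	Authors      map[string]bool
	LastModified string // author date of the newest commit touching the file
}

// parseChurn aggregates per-file stats from `git log --format=%H|%an|%aI --numstat`.
func parseChurn(out string) map[string]*FileChurn {
	stats := map[string]*FileChurn{}
	var author, date string
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" {
			continue
		}
		// numstat line: added<TAB>deleted<TAB>path
		if parts := strings.SplitN(line, "\t", 3); len(parts) == 3 {
			fc := stats[parts[2]]
			if fc == nil {
				// git log is newest-first, so the first sighting is the latest change.
				fc = &FileChurn{Authors: map[string]bool{}, LastModified: date}
				stats[parts[2]] = fc
			}
			added, _ := strconv.Atoi(parts[0]) // "-" (binary file) counts as 0
			deleted, _ := strconv.Atoi(parts[1])
			fc.Commits++
			fc.Added += added
			fc.Deleted += deleted
			fc.Authors[author] = true
			continue
		}
		// header line: the author sits between the first and last '|'
		first, last := strings.Index(line, "|"), strings.LastIndex(line, "|")
		if first < 0 || first == last {
			continue
		}
		author, date = line[first+1:last], line[last+1:]
	}
	return stats
}

func main() {
	sample := "h1|Alice|2026-01-02T10:00:00Z\n\n3\t1\tcmd/ckb/daemon.go\n\n" +
		"h2|Bob|2026-01-01T09:00:00Z\n\n2\t2\tcmd/ckb/daemon.go\n"
	fc := parseChurn(sample)["cmd/ckb/daemon.go"]
	fmt.Println(fc.Commits, fc.Added, fc.Deleted, len(fc.Authors), fc.LastModified)
}
```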

Files Changed:

  • internal/backends/git/churn.go — Rewrote GetHotspots to use single git command

Added

Standardized Response Envelope

All 76 MCP tool responses now include structured metadata in a consistent envelope format:

Envelope Schema:

{
  "schemaVersion": "1.0",
  "data": { },
  "meta": {
    "confidence": { "score": 0.85, "tier": "medium", "reasons": [] },
    "provenance": { "backends": ["scip", "git"], "repoStateId": "..." },
    "freshness": { "indexAge": { "commitsBehind": 3, "staleReason": "behind-head" } },
    "truncation": { "isTruncated": true, "shown": 10, "total": 47, "reason": "max-symbols" }
  },
  "warnings": [],
  "suggestedNextCalls": [{ "tool": "findReferences", "params": {...}, "reason": "..." }]
}

Key Features:

  • Confidence Tiers — Results scored as high (≥0.95), medium (0.70-0.94), low (0.30-0.69), or speculative (<0.30)
  • Provenance Tracking — See which backends (SCIP, LSP, git) contributed to results
  • Freshness Info — Know how stale your index is (commits behind, uncommitted changes)
  • Truncation Metadata — See how many results were trimmed and why
  • Suggested Next Calls — AI-friendly drilldown suggestions as structured tool calls
  • Cross-repo Marking — Federation queries automatically marked as speculative tier
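
The tier boundaries can be written as a simple mapping. `tierFor` is a hypothetical name, and the sketch treats the listed ranges as contiguous, so a score between 0.94 and 0.95 falls in medium.

```go
package main

import "fmt"

// tierFor maps a confidence score to its tier using the documented cut-offs.
func tierFor(score float64) string {
	switch {
	case score >= 0.95:
		return "high"
	case score >= 0.70:
		return "medium"
	case score >= 0.30:
		return "low"
	default:
		return "speculative"
	}
}

func main() {
	fmt.Println(tierFor(0.85)) // medium, matching the envelope example above
}
```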

Files Added:

  • internal/envelope/envelope.go — Core types (Response, Meta, Confidence, etc.)
  • internal/envelope/builder.go — Fluent builder API
  • internal/envelope/confidence.go — Score to tier mapping
  • internal/envelope/envelope_test.go — Comprehensive tests
  • internal/mcp/tool_helpers.go — Convenience wrappers for tool implementations
  • internal/mcp/tool_helpers_test.go — Tool helper tests

Files Changed:

  • internal/mcp/tools.go — Updated ToolHandler signature
  • internal/mcp/handler.go — Updated handleCallTool for envelope format
  • All internal/mcp/tool_impls*.go files — Refactored to return envelope responses

Update Notifications

Automatic update checking for all installation methods:

  • GitHub Releases API — Single source of truth for all install methods (npm, go install, binary)
  • Deferred notification — Shows at command START from cache (instant, no HTTP during execution)
  • Background refresh — Cache updated asynchronously for next run
  • 24-hour cache — Checks GitHub at most once per day, stored in ~/.ckb/update-check.json
  • Smart upgrade message — npm users see npm update, others see GitHub releases URL
  • Protocol-safe — Skips mcp and serve commands to avoid breaking protocols

Disable with:

export CKB_NO_UPDATE_CHECK=1

Example output (npm install):

╭─────────────────────────────────────────────────────╮
│  Update available: 7.3.0 → 7.4.0                    │
│  Run: npm update -g @tastehub/ckb                   │
╰─────────────────────────────────────────────────────╯

Example output (go install / binary):

╭─────────────────────────────────────────────────────╮
│  Update available: 7.3.0 → 7.4.0                    │
│  https://github.com/SimplyLiz/CodeMCP/releases      │
╰─────────────────────────────────────────────────────╯

Hybrid Retrieval with PPR

Graph-based retrieval enhancement using Personalized PageRank:

Results:

Metric | Before | After | Improvement
Recall@10 | 62.1% | 100% | +61%
MRR | 0.546 | 0.914 | +67%

Components:

  • Eval Suite - ckb eval command measures recall@K, MRR, latency
  • PPR Algorithm — Personalized PageRank over SCIP symbol graph
  • Seed Expansion — Expands struct fields to include methods for better cross-module discovery
  • Combined Scoring — FTS position + PPR score fusion (0.6/0.4 weights)
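
To make the PPR and score-fusion components more concrete, here is a minimal power-iteration sketch. It is not the code in `internal/graph/ppr.go`: the function names, the restart probability, the handling of dangling nodes, the `1/(1+position)` transform, and the assignment of 0.6 to the full-text signal and 0.4 to PPR are all assumptions made for illustration.

```go
package main

import "fmt"

// personalizedPageRank runs power iteration with restart probability alpha.
// graph maps each node to its outgoing neighbors; seeds receive the restart mass.
func personalizedPageRank(graph map[string][]string, seeds []string, alpha float64, iters int) map[string]float64 {
	restart := make(map[string]float64, len(seeds))
	for _, s := range seeds {
		restart[s] += 1 / float64(len(seeds))
	}
	rank := make(map[string]float64, len(restart))
	for n, v := range restart {
		rank[n] = v
	}
	for i := 0; i < iters; i++ {
		next := make(map[string]float64, len(rank))
		for n, v := range restart {
			next[n] += alpha * v
		}
		for node, score := range rank {
			out := graph[node]
			if len(out) == 0 {
				// Dangling node: hand its mass back to the seeds so the total stays 1.
				for n, v := range restart {
					next[n] += (1 - alpha) * score * v
				}
				continue
			}
			share := (1 - alpha) * score / float64(len(out))
			for _, nb := range out {
				next[nb] += share
			}
		}
		rank = next
	}
	return rank
}

// fuse combines a full-text rank position (0 = best) with a PPR score.
// Which weight belongs to which signal is an assumption.
func fuse(ftsPosition int, ppr float64) float64 {
	return 0.6/float64(1+ftsPosition) + 0.4*ppr
}

func main() {
	graph := map[string][]string{
		"Engine":       {"Engine.Start", "Store"},
		"Engine.Start": {"Store"},
	}
	ppr := personalizedPageRank(graph, []string{"Engine"}, 0.15, 50)
	for pos, name := range []string{"Engine.Start", "Store"} {
		fmt.Printf("%-13s fused=%.3f\n", name, fuse(pos, ppr[name]))
	}
}
```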

See Wiki for full documentation.

Files Added

  • internal/update/check.go — Core update check logic with npm registry API
  • internal/update/cache.go — 24-hour cache in ~/.ckb/update-check.json
  • internal/update/check_test.go — Tests for version comparison and caching
  • cmd/ckb/eval.go — Eval CLI command
  • internal/eval/suite.go — Eval framework with metrics
  • internal/eval/fixtures/*.json — Test fixtures
  • internal/graph/ppr.go — PPR algorithm with seed expansion
  • internal/graph/builder.go — Graph construction from SCIP
  • internal/query/ranking.go — PPR-based reranking

[7.3.0]

Added

npm Package Improvements

Better npmjs.com presence and npx reliability:

  • README on npmjs.com - Package now displays full README on npm registry
  • LICENSE included - MIT license file bundled with npm package
  • Issue tracker link - "Report a bug" link on npm page
  • npx sandbox fix - Node shim auto-detects repo root, fixing #1 support issue

How the npx fix works: The Node.js shim walks up from process.cwd() looking for .ckb/ or .git/ and sets CKB_REPO automatically. This means npx @tastehub/ckb now works from subdirectories and from MCP clients that don't guarantee a working directory.
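
The real shim is written in Node.js; the upward walk it performs can be sketched in Go as follows. `findRepoRoot` and `onDisk` are hypothetical names, and the existence check is injected so the walk can be exercised without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findRepoRoot walks upward from start until it finds a directory that
// contains .ckb or .git. exists reports whether a path is present.
func findRepoRoot(start string, exists func(string) bool) (string, bool) {
	dir := filepath.Clean(start)
	for {
		for _, marker := range []string{".ckb", ".git"} {
			if exists(filepath.Join(dir, marker)) {
				return dir, true
			}
		}
		parent := filepath.Dir(dir)
		if parent == dir { // reached the filesystem root
			return "", false
		}
		dir = parent
	}
}

// onDisk is the real-filesystem existence check.
func onDisk(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	if root, ok := findRepoRoot(cwd, onDisk); ok {
		fmt.Println("CKB_REPO =", root)
	} else {
		fmt.Println("no .ckb/ or .git/ found above", cwd)
	}
}
```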

Incremental Indexing v4 (Production-Grade)

Fast, reliable incremental indexing for large codebases:

Delta Artifacts:

  • ckb diff command - Generate delta manifests between snapshots
  • CI-generated diffs - O(delta) ingestion instead of O(N) comparison
  • Delta validation - Schema version, base snapshot, hash verification
  • POST /delta/ingest - Ingest delta artifacts via API
  • POST /delta/validate - Validate without applying

FTS5 Search:

  • SQLite FTS5 - Instant full-text search (replaces LIKE scans)
  • Automatic triggers - Real-time sync with symbol changes
  • FTS maintenance - Rebuild, vacuum, integrity-check
  • LIKE fallback - Graceful degradation for edge cases

Operational Hardening:

  • Compaction scheduler - Automatic snapshot cleanup, journal pruning, FTS vacuum
  • GET /health/detailed - Per-repo metrics, storage info, memory usage
  • GET /metrics - Prometheus metrics (counters, histograms, gauges)
  • Load shedding - Priority endpoints, circuit breakers, adaptive shedding

Language Quality Assessment

Per-language quality metrics and environment detection:

Quality Tiers:

  • Tier 1 (Full) - Go: full support, all features, stable
  • Tier 2 (Standard) - TypeScript, JavaScript, Python: full support, known edge cases
  • Tier 3 (Basic) - Rust, Java, Kotlin, C++, Ruby, Dart: basic support, callgraph may be incomplete
  • Tier 4 (Experimental) - C#, PHP: experimental

New Endpoints:

  • GET /meta/languages - Language quality dashboard with tier info, metrics, recommendations
  • GET /meta/python-env - Python venv detection with activation recommendations
  • GET /meta/typescript-monorepo - TypeScript monorepo detection (pnpm, lerna, nx, yarn)

Environment Detection:

  • Python virtual environment detection (.venv, venv, env, VIRTUAL_ENV)
  • Python package managers (pyproject.toml, requirements.txt, Pipfile)
  • TypeScript monorepo workspaces with per-package tsconfig status

Remote Federation Client (v3 Federation Phase 5)

Connect to remote CKB index servers and query them alongside local repositories—transforming federation from local-only aggregation to a distributed code intelligence network.

Core Features:

  • Remote Server Management — Add, remove, enable, disable remote CKB index servers
  • Hybrid Queries — Search symbols across local federation repos AND remote servers in parallel
  • Source Attribution — Results show whether they came from "local" or a named remote server
  • Graceful Degradation — Queries succeed even when some remotes are unavailable

Caching:

  • Repository list cached for 1 hour
  • Metadata cached for 1 hour
  • Symbol searches cached for 15 minutes
  • Refs and call graph always fresh (not cached)
  • Configurable per-server cache TTL

HTTP Client:

  • Bearer token authentication with environment variable expansion (${VAR})
  • Exponential backoff retry logic (max 3 retries)
  • Configurable timeouts per server
  • Response body limiting (10MB max)

CLI Commands:

# Add a remote CKB index server
ckb federation add-remote <federation> <name> --url=<url> [--token=<token>] [--cache-ttl=1h] [--timeout=30s]

# Remove a remote server
ckb federation remove-remote <federation> <name>

# List remote servers
ckb federation list-remote <federation> [--json]

# Sync metadata from remote server(s)
ckb federation sync-remote <federation> [name] [--json]

# Check remote server status
ckb federation status-remote <federation> <name> [--json]

# Enable/disable remote server
ckb federation enable-remote <federation> <name>
ckb federation disable-remote <federation> <name>

MCP Tools (7 new):

  • federationAddRemote — Add a remote server to a federation
  • federationRemoveRemote — Remove a remote server
  • federationListRemote — List remote servers in a federation
  • federationSyncRemote — Sync metadata from remote servers
  • federationStatusRemote — Get status of a remote server
  • federationSearchSymbolsHybrid — Search symbols across local + remote
  • federationListAllRepos — List repos from local and remote sources

Configuration:

[[remote_servers]]
name = "prod"
url = "https://ckb.company.com"
token = "${CKB_PROD_TOKEN}"   # Environment variable expansion
cache_ttl = "1h"
timeout = "30s"
enabled = true

Authentication & API Keys (v3 Federation Phase 4)

Scoped API key authentication for the index server, enabling secure multi-tenant access with fine-grained permissions.

Scoped API Keys:

  • read — GET requests, symbol lookup, search
  • write — POST requests, upload indexes, create repos
  • admin — Full access including token management and deletions

Per-Repository Restrictions:

  • Limit keys to specific repos using glob patterns (e.g., myorg/*)
  • Prevents cross-tenant data access in shared deployments
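
As a rough illustration of per-repository restrictions, the sketch below checks a repo ID against a key's glob patterns using Go's standard `path.Match`. Whether CKB uses `path.Match` semantics (where `*` does not cross `/`) is an assumption, as are the name `repoAllowed` and the choice to treat an empty pattern list as unrestricted.

```go
package main

import (
	"fmt"
	"path"
)

// repoAllowed reports whether repo matches any of the key's patterns.
// An empty pattern list is treated as "no restriction" (an assumption).
func repoAllowed(patterns []string, repo string) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, p := range patterns {
		if ok, err := path.Match(p, repo); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"myorg/*"}
	for _, repo := range []string{"myorg/api", "otherorg/api"} {
		fmt.Printf("%s allowed: %v\n", repo, repoAllowed(patterns, repo))
	}
}
```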

Rate Limiting:

  • Token bucket algorithm with configurable limits per key
  • Returns 429 Too Many Requests with Retry-After header
  • Customizable default limits and burst sizes
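
The token bucket idea can be sketched as below. This is not the server's implementation: the type and method names are hypothetical, time is passed in as seconds to keep the example deterministic, and the Retry-After calculation is one plausible choice.

```go
package main

import (
	"fmt"
	"math"
)

// tokenBucket refills at ratePerSec tokens per second, up to burst tokens.
type tokenBucket struct {
	ratePerSec float64
	burst      float64
	tokens     float64
	last       float64 // time of the last update, in seconds
}

// newTokenBucket starts with a full bucket. perMinute must be positive.
func newTokenBucket(perMinute, burst int) *tokenBucket {
	return &tokenBucket{
		ratePerSec: float64(perMinute) / 60,
		burst:      float64(burst),
		tokens:     float64(burst),
	}
}

// allow consumes one token if available. When it returns false, retryAfter
// is the number of whole seconds until one token has been refilled.
func (b *tokenBucket) allow(now float64) (ok bool, retryAfter int) {
	if now > b.last {
		b.tokens = math.Min(b.burst, b.tokens+(now-b.last)*b.ratePerSec)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	return false, int(math.Ceil((1 - b.tokens) / b.ratePerSec))
}

func main() {
	b := newTokenBucket(60, 10) // 60 requests per minute, burst of 10
	for i := 1; i <= 11; i++ {
		ok, wait := b.allow(0)
		fmt.Printf("request %2d: allowed=%v retryAfter=%ds\n", i, ok, wait)
	}
}
```

A rejected request would map to 429 Too Many Requests, with `retryAfter` sent in the Retry-After header.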

Token Management CLI:

# Create a new token
ckb token create --name "CI Upload" --scopes write
ckb token create --name "Read-only" --scopes read --repos "myorg/*"
ckb token create --name "Admin" --scopes admin --expires 30d

# List all tokens
ckb token list
ckb token list --show-revoked

# Revoke a token
ckb token revoke ckb_key_abc123

# Rotate a token (new secret, same ID)
ckb token rotate ckb_key_abc123

Token Format:

  • Token: ckb_sk_ prefix + 64 hex chars (shown once at creation)
  • Key ID: ckb_key_ prefix + 16 hex chars (used for management)
  • Secure bcrypt hashing for storage
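
The token and key ID shapes above can be produced from random bytes with the standard library, as in this sketch. `randomHex` is a hypothetical helper; the bcrypt hashing step is omitted because it depends on a package outside the standard library.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomHex returns prefix followed by 2*nBytes hex characters
// drawn from a cryptographically secure source.
func randomHex(prefix string, nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + hex.EncodeToString(buf), nil
}

func main() {
	secret, err := randomHex("ckb_sk_", 32) // 32 bytes -> 64 hex chars
	if err != nil {
		panic(err)
	}
	keyID, err := randomHex("ckb_key_", 8) // 8 bytes -> 16 hex chars
	if err != nil {
		panic(err)
	}
	fmt.Println("key id:", keyID)
	fmt.Println("secret length:", len(secret))
}
```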

Configuration:

[index_server.auth]
enabled = true
require_auth = true                    # false = unauthenticated gets read-only
legacy_token = "${CKB_LEGACY_TOKEN}"   # Backward compatibility

[[index_server.auth.static_keys]]
id = "ci-upload"
name = "CI Upload Key"
token = "${CI_CKB_TOKEN}"
scopes = ["write"]
repo_patterns = ["myorg/*"]
rate_limit = 100

[index_server.auth.rate_limiting]
enabled = true
default_limit = 60                     # Requests per minute
burst_size = 10

HTTP Headers:

  • Authorization: Bearer <token> — Authentication
  • X-RateLimit-Key: <key_id> — Rate limit tracking (response)
  • Retry-After: <seconds> — When rate limited (response)

Error Responses:

  • 401 Unauthorized — Missing/invalid/expired/revoked token
  • 403 Forbidden — Insufficient scope or repo not allowed
  • 429 Too Many Requests — Rate limited

Backward Compatibility:

  • Legacy single-token mode still works via legacy_token config
  • When require_auth = false, unauthenticated requests get read-only access

Enhanced Uploads (v3 Federation Phase 3)

Compression support, progress reporting, and incremental (delta) updates for the index upload system. Reduces upload sizes by 70-90% for typical updates.

Compression Support:

  • gzip - Content-Encoding: gzip for 60-80% compression
  • zstd - Content-Encoding: zstd for 70-90% compression (faster than gzip)
  • Automatic decompression on the server
  • Response includes compression_ratio showing savings

Progress Reporting:

  • Logs progress at 10MB intervals for large uploads
  • Includes bytes received, MB count, and percentage when Content-Length is known

Delta Uploads (Incremental):

  • POST /index/repos/{repo}/upload/delta — Upload only changed files
  • Requires X-CKB-Base-Commit header matching current index
  • Returns 409 Conflict with current_commit if base doesn't match
  • Suggests full upload when >50% files changed (configurable)
  • Reuses existing incremental infrastructure for efficient processing

Configuration:

[index_server]
enable_compression = true           # Default true
supported_encodings = ["gzip", "zstd"]
enable_delta_upload = true          # Default true
delta_threshold_percent = 50        # Suggest full upload if >N% changed

Delta Upload Example:

curl -X POST http://localhost:8080/index/repos/company/core-lib/upload/delta \
  -H "Content-Type: application/octet-stream" \
  -H "Content-Encoding: gzip" \
  -H "X-CKB-Base-Commit: abc123" \
  -H "X-CKB-Target-Commit: def456" \
  -H 'X-CKB-Changed-Files: [{"path":"src/main.go","change_type":"modified"}]' \
  --data-binary @partial-index.scip.gz

Index Upload (v3 Federation Phase 2)

Push SCIP indexes to the index server via HTTP, eliminating the need for local filesystem paths. This transforms CKB from a "bring your database" model to a centralized index hosting service.

REST API Endpoints:

  • POST /index/repos — Create a new repo ready for upload
  • POST /index/repos/{repo}/upload — Upload SCIP index file (supports gzip, zstd compression)
  • POST /index/repos/{repo}/upload/delta — Delta upload (incremental changes only)
  • DELETE /index/repos/{repo} — Delete an uploaded repo

Upload Features:

  • Stream large files (100MB+) without memory issues
  • Auto-create repos on first upload (configurable)
  • Metadata headers: X-CKB-Commit, X-CKB-Language, X-CKB-Indexer-Name
  • Full SCIP processing: symbols, refs, call graph extraction
  • Compression support: gzip and zstd
  • Progress logging for large uploads

Configuration:

[index_server]
enabled = true
data_dir = "~/.ckb-server"      # Server data directory
max_upload_size = 524288000     # 500MB default
allow_create_repo = true        # Allow repo creation via API
enable_compression = true       # Accept compressed uploads
enable_delta_upload = true      # Enable incremental updates

Data Directory Structure:

~/.ckb-server/
├── repos/
│   └── company-core-lib/
│       ├── ckb.db        # SQLite database
│       └── meta.json     # Repo metadata
└── uploads/              # Temp directory for uploads

Remote Index Serving (v3 Federation Phase 1)

Serve symbol indexes over HTTP for remote federation clients. This enables cross-repository code intelligence without requiring clients to have direct database access.

Core Features:

  • Index Server Mode — New --index-server flag for ckb serve enables remote index endpoints
  • Multi-Repo Support — Serve multiple repositories from a single CKB instance
  • TOML Configuration — Configure repos, privacy settings, and pagination limits via config file
  • Read-Only Connections — Index server opens databases in read-only mode for safety

REST API Endpoints:

  • GET /index/repos — List all indexed repositories
  • GET /index/repos/{repo}/meta — Repository metadata and capabilities
  • GET /index/repos/{repo}/files — List files with cursor pagination
  • GET /index/repos/{repo}/symbols — List symbols with filtering and pagination
  • GET /index/repos/{repo}/symbols/{id} — Get single symbol by ID
  • POST /index/repos/{repo}/symbols:batchGet — Batch get multiple symbols
  • GET /index/repos/{repo}/refs — List references (call edges) with pagination
  • GET /index/repos/{repo}/callgraph — List call graph edges with filtering
  • GET /index/repos/{repo}/search/symbols — Search symbols by name
  • GET /index/repos/{repo}/search/files — Search files by path

Security & Privacy:

  • HMAC-Signed Cursors — Pagination cursors are signed to prevent tampering
  • Privacy Redaction — Per-repo controls for exposing paths, docs, and signatures
  • Path Prefix Stripping — Remove sensitive path prefixes from responses
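
To show what an HMAC-signed cursor can look like, here is a small sketch using HMAC-SHA256. The cursor layout (base64 payload, a dot, base64 signature), the hash choice, and the function names are assumptions, not the format used in `index_cursor.go`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// signCursor encodes payload and appends an HMAC-SHA256 signature.
func signCursor(secret []byte, payload string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	enc := base64.RawURLEncoding
	return enc.EncodeToString([]byte(payload)) + "." + enc.EncodeToString(mac.Sum(nil))
}

// verifyCursor returns the payload only if the signature matches.
func verifyCursor(secret []byte, cursor string) (string, bool) {
	body, sig, found := strings.Cut(cursor, ".")
	if !found {
		return "", false
	}
	enc := base64.RawURLEncoding
	payload, err := enc.DecodeString(body)
	if err != nil {
		return "", false
	}
	got, err := enc.DecodeString(sig)
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	if !hmac.Equal(got, mac.Sum(nil)) { // constant-time comparison
		return "", false
	}
	return string(payload), true
}

func main() {
	secret := []byte("example-secret")
	cursor := signCursor(secret, "offset=200")
	payload, ok := verifyCursor(secret, cursor)
	fmt.Println(payload, ok)
}
```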

CLI:

  • ckb serve --index-server — Enable index-serving endpoints
  • ckb serve --index-config <path> — Load configuration from TOML file

Configuration Example:

[index_server]
enabled = true
max_page_size = 10000

[[repos]]
id = "company/core-lib"
name = "Core Library"
path = "/repos/core-lib"

[default_privacy]
expose_paths = true
expose_docs = true
expose_signatures = true

Doc-Symbol Linking

Bridge documentation and code with automatic symbol detection:

Core Features:

  • Backtick detection - Automatically detect Symbol.Name references in markdown
  • Directive support - <!-- ckb:symbol --> for explicit references, <!-- ckb:module --> for module linking
  • Suffix resolution - Resolve UserService.Auth to full SCIP symbol ID with confidence scoring
  • Staleness detection - Find broken references when symbols are deleted or renamed
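
A rough sketch of backtick detection and suffix resolution follows. The regular expression, the dotted qualified names (real SCIP symbol IDs have a different shape), and both function names are assumptions for illustration; confidence scoring is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backtickRef matches backtick-quoted identifiers with at least two
// dot-separated segments, such as `UserService.Auth`.
var backtickRef = regexp.MustCompile("`([A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)+)`")

// findSymbolRefs returns the candidate symbol references in a markdown string.
func findSymbolRefs(markdown string) []string {
	var refs []string
	for _, m := range backtickRef.FindAllStringSubmatch(markdown, -1) {
		refs = append(refs, m[1])
	}
	return refs
}

// resolveSuffix returns the qualified names whose trailing segments equal ref.
func resolveSuffix(ref string, qualified []string) []string {
	var matches []string
	for _, q := range qualified {
		if q == ref || strings.HasSuffix(q, "."+ref) {
			matches = append(matches, q)
		}
	}
	return matches
}

func main() {
	doc := "Call `UserService.Auth` before `run`."
	known := []string{"app.auth.UserService.Auth", "app.billing.Service.Auth"}
	for _, ref := range findSymbolRefs(doc) {
		fmt.Println(ref, "->", resolveSuffix(ref, known))
	}
}
```

A reference that resolves to exactly one name is unambiguous; zero or several matches would be candidates for a lower confidence score or a staleness warning.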

v1.1 Enhancements:

  • CI enforcement - --fail-under flag for ckb docs coverage to enforce minimum coverage in CI
  • Rename detection - Detect when documented symbols are renamed via alias chain, suggest new names
  • known_symbols directive - <!-- ckb:known_symbols Engine, Start --> allows single-segment detection
  • Fence symbol scanning - Extract identifiers from fenced code blocks using tree-sitter (8 languages)

CLI Commands:

  • ckb docs index - Scan and index documentation for symbol references
  • ckb docs symbol <name> - Find docs referencing a symbol
  • ckb docs file <path> - Show symbols in a document
  • ckb docs stale [path] - Check for stale references (or --all for all docs)
  • ckb docs coverage - Documentation coverage statistics
  • ckb docs module <id> - Find docs linked to a module

MCP Tools:

  • indexDocs - Scan and index documentation
  • getDocsForSymbol - Find docs referencing a symbol
  • getSymbolsInDoc - List symbols in a document
  • getDocsForModule - Find docs linked to a module
  • checkDocStaleness - Check for stale references
  • getDocCoverage - Coverage statistics

Multi-Repo Management

Quick context switching between multiple repositories in MCP sessions:

Core Features:

  • Global registry - Named repo shortcuts stored at ~/.ckb/repos.json
  • Smart --repo flag - Auto-detects if argument is a path or registry name
  • Multi-engine support - Up to 5 engines in memory with LRU eviction
  • Per-repo config - Each engine loads its own .ckb/config.json
  • Repo state tracking - valid, uninitialized, missing states
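
The "up to 5 engines with LRU eviction" behavior can be illustrated with a small cache built on `container/list`. The types here are hypothetical, and a string stands in for the real per-repo engine.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry pairs a repo name with its engine (a string placeholder here).
type entry struct {
	name   string
	engine string
}

// engineCache keeps at most capacity engines, evicting the least recently used.
type engineCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newEngineCache(capacity int) *engineCache {
	return &engineCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// get returns the engine for name and marks it as recently used.
func (c *engineCache) get(name string) (string, bool) {
	el, ok := c.items[name]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).engine, true
}

// put stores an engine and returns the name of any evicted repo.
func (c *engineCache) put(name, engine string) (evicted string) {
	if el, ok := c.items[name]; ok {
		el.Value.(*entry).engine = engine
		c.order.MoveToFront(el)
		return ""
	}
	c.items[name] = c.order.PushFront(&entry{name: name, engine: engine})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		evicted = oldest.Value.(*entry).name
		delete(c.items, evicted)
	}
	return evicted
}

func main() {
	cache := newEngineCache(5)
	for _, repo := range []string{"r1", "r2", "r3", "r4", "r5"} {
		cache.put(repo, "engine for "+repo)
	}
	cache.get("r1") // touch r1 so r2 becomes the oldest
	fmt.Println("evicted:", cache.put("r6", "engine for r6"))
}
```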

CLI Commands:

  • ckb repo add [name] [path] - Register a repository (path defaults to cwd)
  • ckb repo list - List repos grouped by state
  • ckb repo remove <name> - Unregister a repo
  • ckb repo rename <old> <new> - Rename a repo alias
  • ckb repo default [name] - Get or set default repo
  • ckb repo info [name] - Show detailed repo info
  • ckb repo which - Print current repo (for scripts)
  • ckb repo check - Validate all registered repos

MCP Tools:

  • listRepos - List registered repos with state and active status
  • switchRepo - Switch active repo context
  • getActiveRepo - Get current repo info

Command Flags:

  • ckb mcp --repo <name> - Start MCP with specific repo active
  • ckb serve --repo <name> - Start HTTP server for specific repo

Incremental Indexing (Go only)

Index updates in seconds instead of full reindex—O(changed files) instead of O(entire repo).

Core Features:

  • Git-based change detection — Uses git diff -z with NUL separators for accurate tracking
  • Rename support — Properly tracks git mv with old path cleanup
  • Delta extraction — Only processes SCIP documents for changed files
  • Delete+insert pattern — Clean updates without complex diffing logic
  • Index state tracking — Tracks "partial" vs "full" state with staleness warnings
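
To illustrate why NUL separators matter (paths may contain spaces or newlines), here is a sketch that parses the output of `git diff --name-status -z`. The changelog only states that `git diff -z` is used; the `--name-status` format and all names in the code are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// change describes one entry from `git diff --name-status -z`.
type change struct {
	status  string // single letter: M, A, D, R, C, ...
	path    string
	oldPath string // set only for renames and copies
}

// parseNameStatusZ splits NUL-separated records. Renames and copies
// carry two paths (old, new); all other statuses carry one.
func parseNameStatusZ(out string) ([]change, error) {
	fields := strings.Split(strings.TrimSuffix(out, "\x00"), "\x00")
	var changes []change
	for i := 0; i < len(fields); {
		status := fields[i]
		if status == "" {
			break // empty output
		}
		paths := 1
		if status[0] == 'R' || status[0] == 'C' {
			paths = 2
		}
		if i+paths >= len(fields) {
			return nil, fmt.Errorf("truncated record for status %q", status)
		}
		c := change{status: status[:1]}
		if paths == 2 {
			c.oldPath, c.path = fields[i+1], fields[i+2]
		} else {
			c.path = fields[i+1]
		}
		changes = append(changes, c)
		i += paths + 1
	}
	return changes, nil
}

func main() {
	out := "M\x00cmd/ckb/index.go\x00R100\x00old name.go\x00new.go\x00"
	changes, err := parseNameStatusZ(out)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%s %q (from %q)\n", c.status, c.path, c.oldPath)
	}
}
```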

Incremental Callgraph (v1.1)

Extends incremental indexing with call graph maintenance—outgoing calls from changed files are always accurate.

  • Call edge extraction — Extracts caller→callee edges during incremental updates
  • Tiered callable detection — Uses SymbolInformation.Kind first, falls back to (). heuristic
  • Caller resolution — Resolves enclosing function for each call site via line range matching
  • Location-anchored storage — Call edges stored with (caller_file, line, col, callee_id) for precision
  • Caller-owned edges — Edges deleted and rebuilt with their owning file (no stale outgoing calls)

Transitive Invalidation (v2)

Tracks file-level dependencies and automatically queues dependent files for rescanning when their dependencies change.

  • File dependency tracking - file_deps table tracks which files reference symbols from other files
  • Rescan queue - rescan_queue table with BFS depth tracking and attempt counting
  • Four invalidation modes:
    • none — Disabled (no dependency tracking)
    • lazy — Enqueue dependents, drain on next full reindex (default)
    • eager — Enqueue and drain immediately with configurable budgets
    • deferred — Enqueue and drain periodically in background
  • Budget-limited draining - MaxRescanFiles (default: 200) and MaxRescanMs (default: 1500ms) limits
  • Cascade depth control - Depth setting limits BFS traversal (default: 1 = direct dependents only)
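
The depth-limited, budget-limited traversal described above can be sketched as a breadth-first walk over reverse file dependencies. This is a simplified illustration with hypothetical names: it models only the file-count budget and the depth limit, not the time budget or the persisted queue.

```go
package main

import "fmt"

// dependentsToRescan walks reverse dependencies breadth-first from the
// changed files, stopping at maxDepth levels or maxFiles queued files.
// revDeps maps a file to the files that reference its symbols.
func dependentsToRescan(revDeps map[string][]string, changed []string, maxDepth, maxFiles int) []string {
	seen := make(map[string]bool, len(changed))
	for _, f := range changed {
		seen[f] = true
	}
	frontier := changed
	var queue []string
	for depth := 1; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, f := range frontier {
			for _, dep := range revDeps[f] {
				if seen[dep] {
					continue
				}
				if len(queue) >= maxFiles {
					return queue // budget exhausted
				}
				seen[dep] = true
				queue = append(queue, dep)
				next = append(next, dep)
			}
		}
		frontier = next
	}
	return queue
}

func main() {
	revDeps := map[string][]string{
		"store.go":   {"indexer.go", "updater.go"},
		"indexer.go": {"main.go"},
	}
	fmt.Println("depth 1:", dependentsToRescan(revDeps, []string{"store.go"}, 1, 200))
	fmt.Println("depth 2:", dependentsToRescan(revDeps, []string{"store.go"}, 2, 200))
}
```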

Accuracy Guarantees:

Query Type | After Incremental | After Queue Drained
Go to definition | Always accurate | Always accurate
Find refs FROM changed files | Always accurate | Always accurate
Find refs TO changed symbols | May be stale | Accurate
Call graph (callees/outgoing) | Always accurate | Always accurate
Call graph (callers/incoming) | May be stale | Accurate

Automatic Fallback:

  • Falls back to full reindex when >50% files changed
  • Falls back on schema version mismatch
  • Falls back when no tracked commit exists

CLI Changes:

  • ckb index — Incremental by default for Go projects
  • ckb index --force — Force full reindex when accuracy is critical

Configuration (.ckb/config.json):

{
  "incremental": {
    "threshold": 50,
    "indexTests": false,
    "excludes": ["vendor", "testdata"]
  },
  "transitive": {
    "enabled": true,
    "mode": "lazy",
    "depth": 1,
    "maxRescanFiles": 200,
    "maxRescanMs": 1500
  }
}

Files Added

Incremental Indexing v4:

  • internal/diff/ - Delta artifact generation
    • types.go - Delta JSON schema types
    • generator.go - Delta generation (compare two DBs)
    • validator.go - Delta validation logic
    • hasher.go - Canonical hash computation
  • internal/storage/fts.go - FTS5 maintenance (rebuild, vacuum, integrity-check)
  • internal/daemon/compaction.go - Compaction scheduler
  • internal/api/metrics.go - Prometheus metrics exporter
  • internal/api/middleware_load.go - Load shedding middleware
  • internal/api/handlers_delta.go - Delta ingestion endpoints
  • cmd/ckb/diff.go - ckb diff CLI command

Language Quality:

  • internal/project/quality.go - Language quality assessment module
  • internal/api/handlers_quality.go - Language quality API endpoints

Remote Federation Client:

  • internal/federation/ - Remote federation client
    • remote_types.go — Response types matching index server API
    • remote_config.go — Remote server configuration and env var expansion
    • remote_client.go — HTTP client with retry logic and all API methods
    • remote_cache.go — Caching wrapper with TTL management
    • hybrid.go — Local + remote query merging engine
    • remote_test.go — Tests for remote client and configuration
  • cmd/ckb/federation_remote.go - CLI commands for remote federation
  • internal/mcp/tool_impls_v74.go - MCP tool implementations for remote federation
  • internal/api/ - Remote index serving and upload
    • index_config.go — Configuration types and TOML loading (Phase 3: compression, delta config)
    • index_types.go — API response types
    • index_cursor.go — HMAC-signed cursor pagination
    • index_repos.go — Repository handle management (Phase 1 + 2 + 3)
    • index_redaction.go — Privacy redaction logic
    • index_queries.go — Database queries for symbols, files, refs, callgraph
    • index_storage.go — Server data directory management (Phase 2)
    • index_processor.go — SCIP processing pipeline (Phase 2 + 3 delta processing)
    • handlers_index.go — HTTP handlers for all index endpoints
    • handlers_upload.go — HTTP handlers with compression/progress (Phase 2 + 3)
    • handlers_upload_delta.go — Delta upload handler (Phase 3)
    • handlers_index_test.go — Tests for cursors, redaction, handlers
    • handlers_upload_test.go — Tests for upload, compression, delta (Phase 2 + 3)

Doc-Symbol Linking:

  • internal/docs/ - New package for doc-symbol linking
    • types.go - Core types (Document, DocReference, StalenessReport, etc.)
    • scanner.go - Markdown scanning with backtick/directive/fence detection
    • resolver.go - Symbol resolution with suffix matching
    • staleness.go - Staleness checking with rename detection
    • indexer.go - Document indexing orchestration
    • store.go - SQLite persistence for documents and references
    • coverage.go - Coverage analysis
    • fence_parser.go - Tree-sitter identifier extraction from fences
  • cmd/ckb/docs.go - CLI commands
  • internal/query/docs.go - Query engine integration
  • internal/mcp/handlers_docs.go - MCP tool handlers
  • internal/incremental/ — New package for incremental indexing
    • types.go — Core types (FileState, ChangeSet, FileDelta, DeltaStats, CallEdge, TransitiveConfig)
    • store.go — SQLite persistence for indexed_files, file_symbols, index_meta
    • detector.go — Git-based and hash-based change detection
    • extractor.go — SCIP delta extraction for changed files only
    • updater.go — Database updates with delete+insert pattern
    • deps.go — Transitive invalidation with file dependency tracking and rescan queue
    • indexer.go — Orchestration and state management
    • indexer_test.go, deps_test.go, types_test.go — Tests

Changed

  • internal/federation/config.go — Added RemoteServers field to Config struct
  • internal/federation/index.go — Schema v3 with remote_servers, remote_repos, remote_cache tables
  • internal/mcp/tools.go — Registered 7 new MCP tools for remote federation
  • internal/api/server.go — Added IndexRepoManager, NewServer now returns error
  • internal/api/routes.go — Added /index/* route registration
  • cmd/ckb/serve.go — Added --index-server and --index-config flags
  • internal/storage/schema.go — Schema v8 with callgraph, file_deps, and rescan_queue tables
  • cmd/ckb/index.go — Incremental indexing flow with --force flag

[7.2.0]

Added

ckb setup - Multi-Tool MCP Configuration

  • Interactive setup wizard for configuring CKB with AI coding tools
  • Support for 6 AI tools:
    • Claude Code - .mcp.json (project) or claude mcp add (global)
    • Cursor - .cursor/mcp.json (project/global)
    • Windsurf - ~/.codeium/mcp_config.json (global only)
    • VS Code - .vscode/mcp.json (project) or code --add-mcp (global)
    • OpenCode - opencode.json (project/global)
    • Claude Desktop - Platform-specific paths (global only)
  • --tool flag to skip interactive menu
  • --npx flag for portable npx-based setup
  • Windows path support for Windsurf and Claude Desktop

ckb index - Extended Language Support

  • Added 5 new languages:
    • C/C++ via scip-clang with --compdb flag for compile_commands.json
    • Dart via scip-dart
    • Ruby via scip-ruby with sorbet/config validation
    • C# via scip-dotnet with *.csproj detection
    • PHP via scip-php with vendor/bin check
  • Bounded-depth glob scanning for nested project detection
  • Language-specific validation and prerequisite checks

Smart Indexing

  • Skip-if-fresh: ckb index automatically skips reindexing when index matches current repo state
  • Freshness tracking: Detects commits behind HEAD and uncommitted changes to tracked files
  • Index metadata: Persists index info to .ckb/index-meta.json (commit hash, file count, duration)
  • Lock file: Prevents concurrent indexing with flock-based .ckb/index.lock

ckb status - Index Freshness Display

  • New "Index Status" section showing freshness with commit hash
  • Shows stale reasons: "3 commit(s) behind HEAD", "uncommitted changes detected"
  • Displays file count for fresh indexes

ckb mcp --watch - Auto-Reindex Mode

  • New --watch flag for poll-based auto-reindexing
  • Polls every 30 seconds, reindexes when stale
  • Uses lock file to prevent conflicts with manual ckb index
  • Logs reindex activity to stderr

Explicit Analysis Tiers

  • User-controllable analysis tiers: fast, standard, full
  • CLI flag: ckb search "foo" --tier=fast
  • Environment variable: CKB_TIER=standard
  • Config file: Add "tier": "standard" to .ckb/config.json
  • Tier display in ckb status shows mode (explicit vs auto-detected)
  • Precedence: CLI flag > env var > config > auto-detect
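
The precedence rule can be written as a small resolver. `resolveTier` is a hypothetical name, the config value in `main` is made up for the example, and an empty string is used here to mean "not set".

```go
package main

import (
	"fmt"
	"os"
)

// resolveTier applies the precedence CLI flag > env var > config > auto-detect.
// It returns the chosen tier (empty when auto-detection should run) and its source.
func resolveTier(flag, env, config string) (tier, source string) {
	switch {
	case flag != "":
		return flag, "flag"
	case env != "":
		return env, "env"
	case config != "":
		return config, "config"
	default:
		return "", "auto-detect"
	}
}

func main() {
	// No --tier flag given; the environment wins over the config file if set.
	tier, source := resolveTier("", os.Getenv("CKB_TIER"), "standard")
	fmt.Printf("tier=%q source=%s\n", tier, source)
}
```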

ckb doctor --tier - Tier-Aware Diagnostics

  • New --tier flag for tier-specific tool requirement checks
  • Shows per-language tool status (installed, version, path)
  • Displays missing tools with OS-specific install commands
  • Validates prerequisites (go.mod, package.json, Cargo.toml, etc.)
  • Accepts both naming conventions: basic/fast, enhanced/standard, full
  • Capability matrix showing which features are available per language
  • JSON output with --format json for scripting

Changed

  • Tier names rebranded: Basic → Fast, Enhanced → Standard, Full → Full

  • Multi-language detection now errors instead of silently defaulting to a language

Fixed

  • Fixed Kotlin indexer URL in documentation
  • Fixed PHP indexer URL in documentation

[7.1.0] - 2024-12-XX

Zero-Friction Operation - CKB v7.1 enables code intelligence without requiring a SCIP index upfront.

Added

Tree-sitter Symbol Fallback

  • Symbol extraction for 8 languages (Go, TypeScript, JavaScript, TSX, Python, Rust, Java, Kotlin)
  • searchSymbols works without SCIP index
  • Results include Source: "treesitter" and Confidence: 0.7 for transparency

ckb index Command

  • Auto-detects project language from manifests (go.mod, package.json, Cargo.toml, etc.)
  • Checks if SCIP indexer is installed, shows install instructions if not
  • --force flag for re-indexing, --dry-run to preview
  • Language-specific troubleshooting tips on failure

Universal MCP Documentation

  • Setup instructions for Claude Code, Cursor, Windsurf, VS Code, OpenCode, Claude Desktop
  • Windows cmd /c wrapper instructions

Files Added

  • internal/symbols/treesitter.go - Tree-sitter symbol extraction
  • internal/symbols/treesitter_test.go - Tests for all 8 languages
  • internal/project/detect.go - Language and indexer detection

[7.0.0] - 2024-12-XX

Added

  • Initial npm package release via @tastehub/ckb
  • 58 MCP tools for code intelligence

[6.5.0] - 2024-12-XX

Added

Developer Intelligence

  • Symbol Origins - explainOrigin: Why does this code exist? Git history, linked issues/PRs
  • Co-change Coupling - analyzeCoupling: Find files that historically change together
  • LLM Export - exportForLLM: Token-efficient codebase summaries with importance ranking
  • Risk Audit - auditRisk: 8-factor scoring (complexity, coverage, bus factor, security, staleness, errors, coupling, churn)

[6.4.0] - 2024-12-XX

Added

Runtime Observability

  • OpenTelemetry Integration - getTelemetryStatus: See real call counts, not just static analysis
  • Dead Code Confidence - findDeadCodeCandidates: Find symbols with zero runtime calls
  • Observed Callers - getObservedUsage: Enrich impact analysis with production data

[6.3.0] - 2024-12-XX

Added

Contract-Aware Analysis

  • API Boundary Detection - listContracts: Protobuf and OpenAPI contract discovery
  • Consumer Tracking — Three evidence tiers for cross-repo dependencies
  • Cross-Repo Impact - analyzeContractImpact: "What breaks if I change this shared API?"
  • Contract Dependencies - getContractDependencies: See consumers and dependencies

[6.2.0] - 2024-12-XX

Added

Federation & Cross-Repository

  • Federation — Query across multiple repos organization-wide
  • Federation Tools - listFederations, federationStatus, federationSearchModules, federationSearchOwnership, federationGetHotspots
  • Daemon Mode — Always-on service with HTTP API, scheduled tasks, file watching, webhooks
  • Daemon Tools - daemonStatus, listSchedules, listWebhooks
  • Tree-sitter Complexity - getFileComplexity: Language-agnostic cyclomatic/cognitive complexity for 7 languages

[6.1.0] - 2024-12-XX

Added

Production Ready

  • Background Jobs — Queue long operations, track progress, cancel jobs
  • Job Tools - getJobStatus, listJobs, cancelJob
  • CI/CD Integration - summarizePr: PR risk analysis, ownership drift detection
  • Ownership Drift - getOwnershipDrift: CODEOWNERS vs actual ownership

[6.0.0] - 2024-12-XX

Added

Architectural Memory

  • Ownership Intelligence - getOwnership: CODEOWNERS + git blame with time-weighted analysis
  • Module Responsibilities - getModuleResponsibilities: What does this module do?
  • Architectural Decisions - recordDecision, getDecisions: ADRs with full-text search
  • Module Annotations - annotateModule: Add module metadata
  • Architecture Refresh - refreshArchitecture: Rebuild architectural model

[5.2.0] - 2024-12-XX

Added

Discovery & Flow

  • Usage Tracing - traceUsage: How is this symbol reached?
  • Entrypoints - listEntrypoints: System entrypoints (API, CLI, jobs)
  • File Orientation - explainFile: File-level orientation
  • Path Explanation - explainPath: Why does this path exist?
  • Diff Summary - summarizeDiff: What changed, what might break?
  • Architecture Overview - getArchitecture: Module dependency overview
  • Hotspots - getHotspots: Volatile areas with trends
  • Key Concepts - listKeyConcepts: Domain concepts in codebase
  • Recently Relevant - recentlyRelevant: What matters now?

[5.1.0] - 2024-12-XX

Added

Core Navigation

  • Symbol Search - searchSymbols: Find symbols by name with filtering
  • Symbol Details - getSymbol: Get symbol details
  • References - findReferences: Find all usages
  • Symbol Explanation - explainSymbol: AI-friendly symbol explanation
  • Symbol Justification - justifySymbol: Keep/investigate/remove verdict
  • Call Graph - getCallGraph: Caller/callee relationships
  • Module Overview - getModuleOverview: Module statistics
  • Impact Analysis - analyzeImpact: Change risk analysis
  • System Status - getStatus: System health
  • Diagnostics - doctor: System diagnostics