Commit ac4389d

feat: rework decision-card to make it based on AST parsing (#41)
* test: decision card output shape and scope-prefixed snippets
* docs: release notes for ranked search and edit decision card
* sanitize docs, one last moment fix
1 parent 03964b3 commit ac4389d

8 files changed

Lines changed: 589 additions & 49 deletions

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ These are non-negotiable. Every PR, feature, and design decision must respect th
 - **Never stage/commit `.planning/**`\*\* (or any other local workflow artifacts) unless the user explicitly asks in that message.
 - **Never use `gsd-tools ... commit` wrappers** in this repo. Use plain `git add <exact files>` and `git commit -m "..."`.
 - **Before every commit:** run `git status --short` and confirm staged files match intent; abort if any `.planning/**` is staged.
-
+- **Avoid using `any` Type AT ALL COSTS.
 ## Evaluation Integrity (NON-NEGOTIABLE)

 These rules prevent metric gaming, overfitting, and false quality claims. Violation of these rules means the feature CANNOT ship.

CHANGELOG.md

Lines changed: 13 additions & 27 deletions
@@ -2,35 +2,16 @@

 ## [Unreleased]

-## [Unshipped - Phase 09] - High-Signal Search + Decision Card
-
-Cleaned up the edit decision card and sharpened search ranking for exact-name queries.
-
 ### Added

-- **Definition-first ranking (SEARCH-01)**: For exact-name queries (PascalCase/camelCase), the file that *defines* a symbol now ranks above files that merely use it. Symbol-level dedup ensures multiple methods from the same class don't clog the top slots.
-- **Smart snippets with scope headers (SEARCH-02)**: When `includeSnippets: true`, code chunks from symbol-aware analysis include a scope comment header (`// ClassName.methodName`) before the snippet, giving structural context without extra disk reads.
-- **Clean decision card (PREF-01-04)**: The preflight response for `intent="edit"|"refactor"|"migrate"` is now a decision card: `ready`, `nextAction` (if not ready), `warnings`, `patterns` (do/avoid capped at 3), `bestExample` (top golden file), `impact` (caller coverage + top files), and `whatWouldHelp`. Internal fields like `evidenceLock`, `riskLevel`, `confidence` are no longer exposed.
-- **Impact coverage gating (PREF-02)**: When result files have known callers (from import graph), the card shows caller coverage: "X/Y callers in results". Low coverage (< 40% with > 3 total callers) triggers an epistemic stress alert.
-- **whatWouldHelp recommendations (PREF-03)**: When `ready=false`, concrete next steps appear: search more specifically, call `get_team_patterns`, search for uncovered callers, or check memories. Each is actionable in 1-2 sentences.
-
-### Changed
-
-- **Preflight shape**: `{ ready, reason?, ... }` → `{ ready, nextAction?, warnings?, patterns?, bestExample?, impact?, whatWouldHelp? }`. `reason` renamed to `nextAction` for clarity. No breaking changes to `ready` (stays top-level).
-
-### Fixed
-
-- Agents no longer parse unstable internal fields. Preflight output is stable by design.
-- Snippets now include scope context, reducing ambiguity for symbol-heavy edits.
-
-## [Unreleased]
-
-### Added
-
-- **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
-- **Crash-safe rebuilds (Phase 06)**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
-- **Relationship sidecar (Phase 07)**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
-- **References confidence + hints (Phase 08)**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. `get_component_usage` removed from MCP surface (11→10 tools).
+- **Definition-first ranking**: Exact-name searches now show the file that *defines* a symbol before files that use it. For example, searching `parseConfig` shows the function definition first, then callers.
+- **Scope headers in code snippets**: When requesting snippets (`includeSnippets: true`), each code block now starts with a comment like `// UserService.login()` so agents know where the code lives without extra file reads.
+- **Edit decision card**: When searching with `intent="edit"`, `intent="refactor"`, or `intent="migrate"`, results now include a decision card telling you whether there's enough evidence to proceed safely. The card shows: whether you're ready (`ready: true/false`), what to do next if not (`nextAction`), relevant team patterns to follow, a top example file, how many callers appear in results (`impact.coverage`), and what searches would help close gaps (`whatWouldHelp`).
+- **Caller coverage tracking**: The decision card shows how many of a symbol's callers are in your search results. Low coverage (less than 40% when there are lots of callers) triggers an alert so you know to search more before editing.
+- **Index versioning**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
+- **Crash-safe rebuilds**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
+- **Relationship sidecar**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
+- **References confidence + hints**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. **`get_component_usage` removed from MCP surface (11→10 tools).** If you previously used `get_component_usage`, use `get_symbol_references` for symbol usage evidence (usageCount, top snippets, callers/consumers).
 - Tree-sitter-backed symbol extraction is now used by the Generic analyzer when available (with safe fallbacks).
 - Expanded language/extension detection to improve indexing coverage (e.g. `.pyi`, `.php`, `.kt`/`.kts`, `.cc`/`.cxx`, `.cs`, `.swift`, `.scala`, `.toml`, `.xml`).
 - New tool: `get_symbol_references` for concrete symbol usage evidence (usageCount + top snippets).
@@ -39,8 +20,13 @@ Cleaned up the edit decision card and sharpened search ranking for exact-name qu
 - Second frozen eval fixture plus an in-repo controlled TypeScript codebase for fully-offline eval runs.
 - Regression tests covering Tree-sitter Unicode slicing, parser cleanup/reset behavior, and large/generated file skipping.

+### Changed
+
+- **Preflight response shape**: Renamed `reason` to `nextAction` for clarity. Removed internal fields (`evidenceLock`, `riskLevel`, `confidence`) so the output is stable and doesn't change shape unexpectedly.
+
 ### Fixed

+- Null-pointer crash in GenericAnalyzer when chunk content is undefined.
 - Tree-sitter symbol extraction now treats node offsets as UTF-8 byte ranges and evicts cached parsers on failures/timeouts.

 ## [1.6.2] - 2026-02-17
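
Reviewer note on the CHANGELOG entries above: the caller-coverage rule (an alert when coverage is under 40% and there are more than 3 total callers) can be sketched roughly as below. This is an illustration only, not the project's implementation. `ImpactCoverage`, `computeCoverage`, and `isLowCoverage` are hypothetical names; only the thresholds and the "X/Y callers in results" wording come from the changelog text.

```typescript
// Hypothetical shape; the project's real `impact` object may differ.
interface ImpactCoverage {
  covered: number; // known callers that appear in the search results
  total: number;   // all known callers (e.g. from an import graph)
  label: string;   // human-readable form, e.g. "2/4 callers in results"
}

// Count how many distinct known callers are present among the result files.
function computeCoverage(knownCallers: string[], resultFiles: string[]): ImpactCoverage {
  const inResults = new Set(resultFiles);
  const unique = [...new Set(knownCallers)];
  const covered = unique.filter((file) => inResults.has(file)).length;
  return {
    covered,
    total: unique.length,
    label: `${covered}/${unique.length} callers in results`,
  };
}

// Thresholds as stated in the changelog: < 40% coverage with > 3 total callers.
function isLowCoverage(coverage: ImpactCoverage): boolean {
  return coverage.total > 3 && coverage.covered / coverage.total < 0.4;
}
```

Under these assumptions, a symbol with four callers of which two are in the results has 50% coverage, so the gap is visible in the label but does not trip the alert; one of five would.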

README.md

Lines changed: 20 additions & 7 deletions
@@ -16,7 +16,7 @@ Here's what codebase-context does:

 **Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.

-**Checks before editing** - A preflight card with risk level, patterns to use and avoid, failure warnings, and a `readyToEdit` evidence check. Catches the "confidently wrong" problem: when code, team memories, and patterns contradict each other, it tells the agent to ask instead of guess. If evidence is thin or contradictory, it says so.
+**Checks before editing** - Before editing something, you get a decision card showing whether there's enough evidence to proceed. If a symbol has four callers and only two appear in your search results, the card shows that coverage gap. If coverage is low, `whatWouldHelp` lists the specific searches to run before you touch anything. When code, team memories, and patterns contradict each other, it tells you to look deeper instead of guessing.

 One tool call returns all of it. Local-first - your code never leaves your machine.

@@ -119,12 +119,21 @@ This is where it all comes together. One call returns:
 - **Code results** with `file` (path + line range), `summary`, `score`
 - **Type** per result: compact `componentType:layer` (e.g., `service:data`) — helps agents orient
 - **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
-- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
+- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests) — so you see suggested next reads and know what you haven't looked at yet
 - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
 - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
 - **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.

-Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
+Snippets are optional (`includeSnippets: true`). When enabled, snippets that have symbol metadata (e.g. from the Generic analyzer's AST chunking or Angular component chunks) start with a scope header so you know where the code lives (e.g. `// AuthService.getToken()` or `// SpotifyApiService`). Example:
+
+```ts
+// AuthService.getToken()
+getToken(): string {
+  return this.token;
+}
+```
+
+Default output is lean — if the agent wants code, it calls `read_file`.

 ```json
 {
@@ -189,7 +198,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
 | ------------------------------ | ------------------------------------------------------------------------------------------- |
 | `search_codebase` | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, caller coverage, and `whatWouldHelp`. |
 | `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
-| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets + confidence + completeness) |
+| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets). `confidence: "syntactic"` = static/source-based only; no runtime or dynamic dispatch. |
 | `remember` | Record a convention, decision, gotcha, or failure |
 | `get_memory` | Query team memory with confidence decay scoring |
 | `get_codebase_metadata` | Project structure, frameworks, dependencies |
@@ -200,7 +209,7 @@ Record a decision once. It surfaces automatically in search results and prefligh

 ## Evaluation Harness (`npm run eval`)

-Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught.
+Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught. **For contributors and CI:** run before releases or after changing search/ranking/chunking to guard against regressions.

 - Two codebases: `npm run eval -- <codebaseA> <codebaseB>`
 - Defaults: fixture A = `tests/fixtures/eval-angular-spotify.json`, fixture B = `tests/fixtures/eval-controlled.json`
@@ -214,11 +223,13 @@ npm run eval -- tests/fixtures/codebases/eval-controlled tests/fixtures/codebase
 ```

 - Flags: `--help`, `--fixture-a`, `--fixture-b`, `--skip-reindex`, `--no-rerank`, `--no-redact`
+- To save a report for later comparison, redirect stdout (e.g. `pnpm run eval -- <path-to-angular-spotify> --skip-reindex > internal-docs/tests/eval-runs/angular-spotify-YYYY-MM-DD.txt`).

 ## How the Search Works

 The retrieval pipeline is designed around one goal: give the agent the right context, not just any file that matches.

+- **Definition-first ranking** - for exact-name lookups (e.g. a symbol name), the file that *defines* the symbol ranks above files that only use it.
 - **Intent classification** - knows whether "AuthService" is a name lookup or "how does auth work" is conceptual. Adjusts keyword/semantic weights accordingly.
 - **Hybrid fusion (RRF)** - combines keyword and semantic search using Reciprocal Rank Fusion instead of brittle score averaging.
 - **Query expansion** - conceptual queries automatically expand with domain-relevant terms (auth → login, token, session, guard).
@@ -229,13 +240,15 @@ The retrieval pipeline is designed around one goal: give the agent the right con
 - **Version gating** - index artifacts are versioned; mismatches trigger automatic rebuild so mixed-version data is never served.
 - **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.

+**Index reliability:** Rebuilds write to a staging directory and swap atomically only on success, so a failed rebuild never corrupts the active index. Version mismatches or corruption trigger an automatic full re-index (no user action required).
+
 ## Language Support

-Over **30+ languages** are supported for indexing + retrieval: TypeScript/JavaScript, Python (incl `.pyi`), PHP, Ruby, Java, Kotlin (`.kt`/`.kts`), Go, Rust, C/C++ (incl `.cc`/`.cxx`), C#, Swift, Scala, Shell, plus common config/markup formats (JSON/YAML/TOML/XML, etc.).
+**10 languages** have full symbol extraction (Tree-sitter): TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust. **30+ languages** have indexing and retrieval coverage (keyword + semantic), including PHP, Ruby, Swift, Scala, Shell, and config/markup (JSON/YAML/TOML/XML, etc.).

 Enrichment is framework-specific: right now only **Angular** has a dedicated analyzer for rich conventions/context (signals, standalone components, control flow, DI patterns).

-For non-Angular projects, the **Generic** analyzer still provides broad coverage, and will use Tree-sitter symbol extraction when a grammar is available (otherwise it falls back to safe parsing).
+For non-Angular projects, the **Generic** analyzer uses **AST-aligned chunking** when a Tree-sitter grammar is available: symbol-bounded chunks with **scope-aware prefixes** (e.g. `// ClassName.methodName`) so snippets show where code lives. Without a grammar it falls back to safe line-based chunking.

 Structured filters available: `framework`, `language`, `componentType`, `layer` (presentation, business, data, state, core, shared).
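
Reviewer note on the README's "Hybrid fusion (RRF)" bullet: Reciprocal Rank Fusion scores each item as the sum of `1 / (k + rank)` over every ranked list it appears in, so only rank positions matter and raw keyword and semantic scores never need to be put on a common scale. The sketch below is a generic illustration, not the project's code. `fuseByRrf` is a hypothetical name, `k = 60` is a commonly used default rather than a value taken from this repository, and the file names are made up.

```typescript
// Generic Reciprocal Rank Fusion over any number of ranked ID lists.
// Ranks are 1-based; k dampens the influence of top positions (60 is an assumed default).
function fuseByRrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Illustrative inputs: one keyword ranking and one semantic ranking.
const keywordRanking = ['auth.service.ts', 'login.component.ts', 'token.util.ts'];
const semanticRanking = ['session.guard.ts', 'auth.service.ts', 'login.component.ts'];
const fused = fuseByRrf([keywordRanking, semanticRanking]);
```

Files that appear in both lists accumulate two terms and so rise above files that top only one list, which is the behavior the bullet contrasts with score averaging.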

docs/capabilities.md

Lines changed: 4 additions & 4 deletions
@@ -4,15 +4,15 @@ Technical reference for what `codebase-context` ships today. For the user-facing

 ## Tool Surface

-10 MCP tools + 1 optional resource (`codebase://context`).
+10 MCP tools + 1 optional resource (`codebase://context`). **Migration:** `get_component_usage` was removed; use `get_symbol_references` for symbol usage evidence.

 ### Core Tools

 | Tool | Input | Output |
 | ----------------------- | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `search_codebase` | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` + decision card (`ready`, `nextAction`, `patterns`, `bestExample`, `impact`, `whatWouldHelp`) when `intent="edit"`. Hints capped at 3 per category. |
 | `get_team_patterns` | optional `category` | Pattern frequencies, trends, golden files, conflicts |
-| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` ("syntactic") + `isComplete` boolean |
+| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` + `isComplete`. `confidence: "syntactic"` means static/source-based only (no runtime or dynamic dispatch). Replaces the removed `get_component_usage`. |
 | `remember` | `type`, `category`, `memory`, `reason` | Persists to `.codebase-context/memory.json` |
 | `get_memory` | optional `category`, `type`, `query`, `limit` | Memories with confidence decay scoring |

@@ -121,12 +121,12 @@ Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`
 ## Analyzers

 - **Angular**: signals, standalone components, control flow syntax, lifecycle hooks, DI patterns, component metadata
-- **Generic**: 30+ languages TypeScript, JavaScript, Python, Java, Kotlin, C/C++, C#, Go, Rust, PHP, Ruby, Swift, Scala, Shell, config/markup formats
+- **Generic**: 30+ have indexing/retrieval coverage including PHP, Ruby, Swift, Scala, Shell, config/markup., 10 languages have full symbol extraction (Tree-sitter: TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust).

 Notes:

 - Language detection covers common extensions including `.pyi`, `.kt`/`.kts`, `.cc`/`.cxx`, and config formats like `.toml`/`.xml`.
-- When Tree-sitter grammars are present, the Generic analyzer can derive symbol components from Tree-sitter extraction (with fallbacks).
+- When Tree-sitter grammars are present, the Generic analyzer uses AST-aligned chunking and scope-aware prefixes for symbol-aware snippets (with fallbacks).

 ## Evaluation Harness
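
Reviewer note on the scope-aware prefixes mentioned in the Analyzers notes above: the idea of prepending a `// ClassName.methodName` comment to a symbol-bounded chunk can be sketched as below. This is an assumption-laden illustration, not the analyzer's actual code. `SymbolChunk`, its field names, and `withScopeHeader` are hypothetical, and the `//` comment syntax only suits C-style languages.

```typescript
// Hypothetical chunk metadata; the project's real chunk type is not shown in this diff.
interface SymbolChunk {
  content: string;
  symbolName?: string; // e.g. "getToken"
  parentName?: string; // e.g. "AuthService"
}

// Prefix a snippet with a one-line scope header when symbol metadata is available.
function withScopeHeader(chunk: SymbolChunk): string {
  if (!chunk.symbolName) {
    return chunk.content; // no symbol metadata: leave the snippet untouched
  }
  const scope = chunk.parentName
    ? `${chunk.parentName}.${chunk.symbolName}`
    : chunk.symbolName;
  return `// ${scope}\n${chunk.content}`;
}
```

Falling back to the bare content when no symbol name is known mirrors the "with fallbacks" wording in the notes, where line-based chunks carry no scope information.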

src/analyzers/generic/index.ts

Lines changed: 7 additions & 1 deletion
@@ -498,6 +498,10 @@ export class GenericAnalyzer implements FrameworkAnalyzer {
     const fileName = path.basename(chunk.filePath);
     const { language, componentType, content } = chunk;

+    if (!content) {
+      return `${language} ${componentType || 'code'} in ${fileName}`;
+    }
+
     // Try to extract meaningful information
     const firstComment = this.extractFirstComment(content);
     if (firstComment) {
@@ -526,7 +530,9 @@ export class GenericAnalyzer implements FrameworkAnalyzer {
     return `${language} code in ${fileName}: ${firstLine ? firstLine.trim().slice(0, 60) + '...' : 'code definition'}`;
   }

-  private extractFirstComment(content: string): string {
+  private extractFirstComment(content: string | null | undefined): string {
+    if (!content) return '';
+
     // Try JSDoc style
     const jsdocMatch = content.match(/\/\*\*\s*\n?\s*\*\s*(.+?)(?:\n|\*\/)/);
     if (jsdocMatch) return jsdocMatch[1].trim();
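
Reviewer note on the guard added above: the effect of widening the parameter to `string | null | undefined` and returning early can be checked in isolation with a standalone function. The sketch below reuses the JSDoc regex shown in the hunk. Returning `''` when nothing matches is an assumption, since the rest of the original method is not visible in this diff.

```typescript
// Standalone copy of the guarded helper for illustration only.
// Assumption: an empty string is returned when no JSDoc comment is found.
function extractFirstComment(content: string | null | undefined): string {
  if (!content) return ''; // covers undefined, null, and the empty string

  // Multi-line JSDoc: capture the first text line after the opening "/**".
  const jsdocMatch = content.match(/\/\*\*\s*\n?\s*\*\s*(.+?)(?:\n|\*\/)/);
  if (jsdocMatch) return jsdocMatch[1].trim();

  return '';
}

const fromUndefined = extractFirstComment(undefined);
const fromJsdoc = extractFirstComment('/**\n * Parses config\n */\nfunction parseConfig() {}');
```

Before the change, calling `.match` on an undefined `content` would throw a `TypeError`, which matches the "Null-pointer crash in GenericAnalyzer" entry in the changelog.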
