Merged
19 changes: 13 additions & 6 deletions .agents/skills/codestory-grounding/SKILL.md
@@ -38,7 +38,13 @@ checkout is only the tool artifact unless the user is editing CodeStory itself.
- When `packet` reports `sufficient` and `follow_up_commands` is empty, answer
from the packet; budget truncation alone is not a gap. Preserve supported-claim
wording and include a compact "Support files" list from `answer.citations` and
-`sufficiency.avoid_opening`.
+`sufficiency.avoid_opening`. Do not run ordinary source reads, `rg`, `grep`, or
+`git show` only to verify packet citations; run more commands only for a named
+unresolved gap, an edit target, or a user-requested worktree proof.
+- When `packet` reports `partial`, read `sufficiency.follow_up_commands` and run
+those commands in order. Prefer listed targeted `search --why` commands before
+escalating to a larger packet budget. As soon as a follow-up packet becomes
+sufficient, stop exploration and answer from that packet.
- When `search --why` emits `search_plan`, use its subqueries, anchor groups,
bridge evidence, next commands, and source-truth checks as the follow-up plan,
not as final answer prose.
@@ -48,9 +54,10 @@ checkout is only the tool artifact unless the user is editing CodeStory itself.
- Treat repo-text, semantic suggestions, speculative OpenAPI edges, and
cross-language framework hits as navigation hints until typed graph evidence,
snippets, trails, or direct source reads support the claim.
-- If `doctor` reports semantic retrieval as partial, stale, or failed, prefer
-`search --repo-text on --why`, `symbol`, `trail`, and `snippet` until a full
-refresh and embedding setup restore healthy retrieval.
+- If `doctor` reports retrieval as partial, stale, stubbed, hash-vector, or
+failed, treat product retrieval as unavailable until `retrieval_mode=full` is
+restored. Repo-text output is diagnostic only; do not use it as a substitute
+for mandatory sidecar evidence.
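
The packet-handling rules in this hunk can be read as a small decision function. The sketch below is illustrative only and is not CodeStory code: the dictionary layout, the `status` key, the `next_step` helper, and the `inspect` fallback are assumptions made for the example. Only `sufficient`, `partial`, `sufficiency.follow_up_commands`, `answer.citations`, and `sufficiency.avoid_opening` come from the text above.

```python
from typing import Any


def next_step(packet: dict[str, Any]) -> dict[str, Any]:
    """Decide how to act on a packet, following the rules in this hunk.

    Assumed (hypothetical) shape:
      packet["sufficiency"]["status"] is "sufficient" or "partial"
      packet["sufficiency"]["follow_up_commands"] is a list of command strings
    """
    sufficiency = packet.get("sufficiency", {})
    status = sufficiency.get("status")
    follow_ups = list(sufficiency.get("follow_up_commands", []))

    if status == "sufficient" and not follow_ups:
        # Answer from the packet; no extra rg/grep/git show just to verify citations.
        support = list(packet.get("answer", {}).get("citations", []))
        support += list(sufficiency.get("avoid_opening", []))
        return {"action": "answer", "support_files": support}

    if status == "partial":
        # Run the listed commands in order; the caller stops as soon as a
        # follow-up packet comes back sufficient.
        return {"action": "run_follow_ups", "commands": follow_ups}

    # Cases the text does not cover are handed back to the caller (assumption).
    return {"action": "inspect", "commands": follow_ups}
```

The function only chooses the next action; actually running the follow-up commands and re-checking the resulting packet is left to the caller.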

## Command Routing

@@ -75,15 +82,15 @@ Detailed argument tables, output examples, and usage patterns for each command:
- [ground](references/ground.md) - Compact codebase context snapshot
- [doctor](references/doctor.md) - Read-only project/cache/index/retrieval health check
- [packet](references/packet.md) - Broad task packet with sufficiency contract
-- [search](references/search.md) - Search indexed symbols and repo text
+- [search](references/search.md) - Search mandatory sidecar indexes
- [context](references/context.md) - Deep evidence packet for a concrete target
- [symbol](references/symbol.md) - Inspect a symbol's details and relationships
- [trail](references/trail.md) - Follow a symbol's call/reference graph
- [snippet](references/snippet.md) - Fetch source code context around a symbol
- [drill](references/drill.md) - Build a repeatable evidence packet for agent-grounding drills
- [drill-suite](references/drill-suite.md) - Run a manifest-defined cross-repo real-repo agent drill matrix
- [query](references/query.md) - Structured graph query pipelines
-- [explore](references/explore.md) - Interactive terminal exploration with Markdown/JSON fallback
+- [explore](references/explore.md) - Interactive terminal exploration with Markdown/JSON output
- [files](references/files.md) - Indexed file inventory and coverage markers
- [affected](references/affected.md) - Changed-file impact analysis
- [bookmark](references/bookmark.md) - Save reusable investigation focus nodes
10 changes: 5 additions & 5 deletions .agents/skills/codestory-grounding/references/doctor.md
@@ -21,15 +21,15 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at

| Path | Command | Expected result |
|------|---------|-----------------|
-| Normal path | `<codestory-cli> doctor --project <target-workspace>` | Reports project root, cache path, indexed stats, retrieval state, managed embedding setup, environment hints, and next commands. |
-| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if managed embeddings are missing, run `setup embeddings --project <target-workspace>`; if semantic reports `semantic partial`, `semantic stale`, or `semantic failed`, rebuild before `context` or continue with `search --repo-text on --why` plus focused `symbol`/`trail`/`snippet`. | Separates missing index, missing managed assets, stale semantic docs, partial semantic docs, and lexical fallback. |
+| Normal path | `<codestory-cli> doctor --project <target-workspace>` | Reports project root, cache path, indexed stats, retrieval state, sidecar embedding setup, environment hints, and next commands. |
+| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if semantic reports `semantic partial`, `semantic stale`, or `semantic failed`, rebuild before trusting broad packet/search evidence. | Separates missing index, stale semantic docs, partial semantic docs, and mandatory retrieval setup failures. |
| Integration edge | Use doctor before `ground`, `search --why`, `explore`, `context`, or `serve`; its next commands are the safe follow-up loop. | Prevents read commands from silently querying the wrong or empty cache. |

## Notes

- `doctor` does not accept `--refresh`; it is a read-only health surface.
- The `attention:` block repeats warnings first so agents do not miss semantic partial/stale/failure messages buried in the full check list.
-- Environment rows report retrieval-related variables such as `CODESTORY_EMBED_PROFILE`, `CODESTORY_EMBED_BACKEND`, and `CODESTORY_EMBED_RUNTIME_MODE`.
-- The `managed_embeddings` check distinguishes missing managed ONNX assets, installed assets, disabled/hash mode, and intentionally selected external legacy llama.cpp backend state.
-- Treat `semantic ok` as the only health state suitable for broad repository explanation prompts. Treat `semantic partial`, `semantic stale`, and `semantic failed` as instructions to rebuild or use lexical/repo-text fallback.
+- Environment rows report retrieval-related variables such as `CODESTORY_EMBED_BACKEND`, `CODESTORY_EMBED_LLAMACPP_URL`, and sidecar enablement flags.
+- The embedding checks distinguish product llama.cpp sidecar state from hash, ONNX, disabled, or stale diagnostic states.
+- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Treat `semantic partial`, `semantic stale`, `semantic failed`, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence.
- Prefer JSON for CI or doc-contract checks.
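
The revised health rule in the notes above reduces to a two-condition gate. The following is a minimal sketch, assuming the semantic state and retrieval mode have already been extracted from `doctor` output as plain strings; the helper name `retrieval_is_trustworthy` and its parameters are hypothetical and do not describe the CLI's actual JSON layout.

```python
def retrieval_is_trustworthy(semantic_state: str, retrieval_mode: str) -> bool:
    """Return True only for `semantic ok` together with `retrieval_mode=full`.

    Any other combination (partial, stale, failed, or a non-full mode)
    means setup should be repaired or the index rebuilt first.
    """
    return semantic_state == "ok" and retrieval_mode == "full"


# Hypothetical usage: gate a broad explanation prompt on the health check.
if not retrieval_is_trustworthy("stale", "full"):
    print("repair setup or rebuild before trusting agent-facing evidence")
```

Keeping the gate as a conjunction mirrors the change in this hunk: `semantic ok` alone was the previous rule, and the new wording adds the retrieval-mode requirement.
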
12 changes: 11 additions & 1 deletion .agents/skills/codestory-grounding/references/drill-suite.md
@@ -83,6 +83,7 @@ Allowed claim classifications are `correct`, `partial`, `misleading`, and
| `--output-dir` | path | **required** | Directory for aggregate suite reports and per-case drill artifacts |
| `--refresh` | enum | `full` | Refresh strategy passed to each per-case drill: `auto`, `full`, `incremental`, `none` |
| `--format` | enum | `json` | Primary aggregate output format: `json` or `markdown` |
+| `--jobs` | integer | `1` | Read-only workers for `--refresh none`; multiple cases run in parallel, a single case parallelizes anchors and bridge checks |

## Output

@@ -98,11 +99,20 @@ retrieval mode, anchor resolution, bridge status, source-truth check counts,
expected-file recall, source-truth target roles/ranking reasons, bridge
`evidence_kind`, claim classification counts, and next actions. A case can
be mechanically healthy but still `degraded` when source-truth verification is
-required, bridge evidence is partial, retrieval is symbolic-only, freshness is
+required, bridge evidence is partial, retrieval needs repair, freshness is
stale, expected files were missed, or the ledger records partial/materially
revised claims. A failed case is recorded as `blocked` instead of aborting the
whole suite, so other manifest cases still produce evidence.

+`--jobs` is default-off and only applies to read-only `--refresh none` loops.
+It leaves refreshing or indexing runs serialized, caps worker count
+automatically, preserves final manifest order in aggregate reports, and writes
+each single-case drill's anchor and bridge artifacts in deterministic report
+order.
+Measure it on the target suite before treating it as a speed-up: multi-case
+manifests can benefit from parallel isolated cases, while single-case anchor
+and bridge checks may be limited by storage and graph traversal contention.
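
The ordering and serialization guarantees described for `--jobs` can be illustrated with a generic worker pool. This is a sketch of the general technique, not the CodeStory implementation: `run_cases`, `run_one`, and the CPU-count cap are assumptions chosen for the example, and the real capping rule is not stated in this diff.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_cases(cases, run_one, jobs=1, refresh="none"):
    """Run cases and return their results in manifest (input) order.

    `cases` is a list; `run_one` is a hypothetical per-case callable.
    """
    if refresh != "none" or jobs <= 1:
        # Refreshing or indexing runs stay serialized.
        return [run_one(case) for case in cases]

    # Cap the worker count (assumed rule: never more than cases or CPUs).
    workers = max(1, min(jobs, len(cases), os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever order
        # the individual cases happen to finish in.
        return list(pool.map(run_one, cases))
```

Because results come back in input order, an aggregate report built from the returned list keeps manifest order even when cases complete out of order.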

Per-case `drill` runs include the broad question search plus bounded
supplemental searches for terms such as public pages, home components, Payload
collections, social feeds, comments, and store crates. Those hits are added as
19 changes: 16 additions & 3 deletions .agents/skills/codestory-grounding/references/drill.md
@@ -20,6 +20,7 @@ Runs a deterministic evidence collection pass for a realistic codebase question.
| `--output-dir` | path | **required** | Directory for the drill report and artifacts; created if missing |
| `--refresh` | enum | `full` | Refresh strategy: `auto`, `full`, `incremental`, `none` |
| `--format` | enum | `markdown` | Primary output format: `markdown` or `json` |
+| `--jobs` | integer | `1` | Read-only anchor and bridge evidence workers for `--refresh none`; capped automatically |

## Output

@@ -36,7 +37,7 @@ The report includes:
- chosen anchor, endpoint files, and source-truth verification targets
- an `evidence_packet` with typed evidence items, repo-text hints, negative evidence, source locations, confidence, and readiness status
- an Answer Readiness report with `safe_to_say`, `inferred_claims`, `needs_verification`, `next_commands`, and `source_truth_checks`
-- compact mechanical status, retrieval/freshness status, bridge counts, source-truth file list plus target roles/ranking reasons, and verdict/next action in `drill-summary.json`
+- compact mechanical status, retrieval/freshness status, drill runtime timings, bridge counts, source-truth file list plus target roles/ranking reasons, and verdict/next action in `drill-summary.json`
- an answer-quality contract requiring a CodeStory-only draft before source reads and source-truth verification afterward
- a fillable claim-ledger template for source-truth classification, correction counts, and material-revision tracking
- a verification checklist requiring `correct`, `partial`, `misleading`, or `unsupported` classifications
@@ -49,15 +50,27 @@ The report includes:

# JSON-first run for automation, while still writing Markdown too
<codestory-cli> drill --project <target-workspace> --refresh none --anchors EntryPoint,Coordinator,BackingStore --output-dir target/drill/entrypoint-flow --format json

+# Optional read-only anchor and bridge workers against an already-fresh local index
+<codestory-cli> drill --project <target-workspace> --refresh none --anchors EntryPoint,Coordinator,BackingStore --output-dir target/drill/entrypoint-flow --format json --jobs 4
```

## Interpretation

Use the drill report as the CodeStory-only phase. Draft the architecture answer from those artifacts first, then open only files named or implied by the artifacts and classify each claim against source truth. If the answer changes materially after source reads, record that as a CodeStory or agent-UX finding.

-Start with `drill-summary.json` for compact health, retrieval/freshness state, bridge status, bridge `evidence_kind`, source-truth target roles, and the verdict next action, then read `evidence_packet.readiness`. Claims in `safe_to_say` are anchored enough for a draft. Claims in `inferred_claims` or `needs_verification` must stay uncertain until the listed `source_truth_checks` or equivalent source reads confirm them. Repo-text and cross-language framework hits are navigation hints unless supported by typed symbol/trail/snippet evidence or source-truth verification. A `source_truth_only` bridge is deliberately not proof; it means CodeStory found the concrete files to read but no typed graph/framework/data path strong enough to answer without source verification.
+Start with `drill-summary.json` for compact health, retrieval/freshness state, drill runtime timings, bridge status, bridge `evidence_kind`, source-truth target roles, and the verdict next action, then read `evidence_packet.readiness`. Claims in `safe_to_say` are anchored enough for a draft. Claims in `inferred_claims` or `needs_verification` must stay uncertain until the listed `source_truth_checks` or equivalent source reads confirm them. Repo-text and cross-language framework hits are navigation hints unless supported by typed symbol/trail/snippet evidence or source-truth verification. A `source_truth_only` bridge is deliberately not proof; it means CodeStory found the concrete files to read but no typed graph/framework/data path strong enough to answer without source verification.

+`mechanical.drill_timings` breaks the evidence-collection runtime into setup, question search, anchor resolution, supplemental search, bridge evidence, and evidence assembly. Per-anchor `timings`, command `duration_ms`, and summary `slowest_command` fields further split anchor work into search, query resolution, consumer-summary, and artifact-command costs. Use these fields to localize slow drills before changing ranking or graph traversal logic; they are diagnostic timing, not answer-quality evidence by themselves.

+Consumer summaries inspect direct incoming production consumers for the selected anchor first. Related payload/API/native targets are searched only when the selected anchor has no visible graph consumers, so ordinary drills do not pay broad related-target search costs unless the direct graph evidence is missing.

+If `drill-summary.json` reports stale freshness, refresh the index before promoting claims. If retrieval is not full or semantic diagnostics report degraded state, repair sidecars before trusting broad natural-language recall; use symbol, trail, snippet, and source-truth files deliberately while the run is degraded.

-If `drill-summary.json` reports stale freshness, refresh the index before promoting claims. If retrieval is symbolic-only or semantic fallback is reported, broad natural-language recall is degraded even when exact anchors resolve; use repo-text, symbol, trail, snippet, and source-truth files deliberately.
+`--jobs` is default-off and read-only. Use it only with `--refresh none` after
+the index is fresh, and measure the run: multi-case suites can benefit from
+parallel case execution, while single-case anchor resolution and bridge checks
+may be limited by storage and graph traversal contention on some repos.

The optional `question_search` artifact and any `question_supplemental_searches` are intentionally partial discovery evidence. They can add public page, component, collection, and store files to the source-truth checklist when the broad question points there, but they do not prove the architecture by themselves. Use them to avoid missing verification files, then rely on each anchor's symbol/trail/explore/snippet artifacts and focused source reads before promoting claims.
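
The advice to use `mechanical.drill_timings` to localize slow drills amounts to finding the dominant phase before touching ranking or traversal logic. A minimal sketch follows, assuming the timings have been loaded into a flat mapping of phase name to milliseconds; the key names, the millisecond unit, and the `slowest_phase` helper are assumptions, since this diff names the phases but not the exact JSON keys.

```python
def slowest_phase(drill_timings: dict[str, float]) -> tuple[str, float]:
    """Return the (phase, duration) pair with the largest duration."""
    if not drill_timings:
        raise ValueError("no timings recorded")
    return max(drill_timings.items(), key=lambda item: item[1])


# Hypothetical values; real key names may differ from these.
timings = {
    "setup": 12.0,
    "question_search": 340.5,
    "anchor_resolution": 88.0,
    "bridge_evidence": 95.0,
}
phase, duration = slowest_phase(timings)
print(f"slowest phase: {phase} ({duration} ms)")
```

As the interpretation notes say, a result like this is diagnostic timing only and says nothing about answer quality.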

10 changes: 5 additions & 5 deletions .agents/skills/codestory-grounding/references/index.md
@@ -53,14 +53,14 @@ High-signal environment toggles:

| Variable | Use |
|----------|-----|
-| `CODESTORY_HYBRID_RETRIEVAL_ENABLED=false` | Disable hybrid retrieval and use symbolic ranking. |
| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include all-symbol semantic docs. Accepted all-symbol aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. |
-| `CODESTORY_EMBED_BACKEND=onnx` | Use the managed ONNX backend. |
-| `CODESTORY_EMBED_RUNTIME_MODE=hash` | Use deterministic hash embeddings for local smoke checks. |
+| `CODESTORY_EMBED_BACKEND=llamacpp` | Use the mandatory local llama.cpp embedding sidecar. |
+| `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings` | Product embedding endpoint for bge-base sidecar vectors. |
| `CODESTORY_SUMMARY_ENDPOINT=local` | Enable deterministic local summaries with `--summarize`. |

-Use other embedding, alias, batch-size, tokenizer, provider, llama.cpp, and
-summary tuning variables only for focused profiling or compatibility work.
+Use other embedding, alias, batch-size, tokenizer, provider, hash, ONNX, and
+summary tuning variables only for focused diagnostics or historical comparisons.
+Agent-facing retrieval requires full sidecar readiness.
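
A pre-flight check for the two sidecar variables in the table might look like the sketch below. It is an assumption-laden illustration, not part of the CLI: the `sidecar_env_problems` helper is hypothetical, and treating an unset variable as a problem may be stricter than CodeStory's own defaults, which this diff does not state.

```python
import os


def sidecar_env_problems(env):
    """List problems with the sidecar-related variables in `env`.

    `env` is any mapping of variable names to values, e.g. os.environ.
    """
    problems = []
    if env.get("CODESTORY_EMBED_BACKEND") != "llamacpp":
        problems.append("CODESTORY_EMBED_BACKEND is not llamacpp")
    if not env.get("CODESTORY_EMBED_LLAMACPP_URL"):
        problems.append("CODESTORY_EMBED_LLAMACPP_URL is unset")
    return problems


# Report anything missing in the current process environment.
for problem in sidecar_env_problems(os.environ):
    print(problem)
```

A check like this covers only the environment; `doctor` remains the source of truth for whether retrieval is actually healthy.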

## Output
