The current stack (deterministic baseline + literal index + GLiNER2 enrichment) already performs strongly. GLiREL should be treated as an optional, targeted capability, not a default indexing path.
- Implicit cross-file relationships with weak AST linkage:
- config keys, feature flags, event names, queue topics, metric IDs, permission strings.
- Legacy aliasing and semantic drift:
- same concept represented under different historical names.
- Multi-step operational flows spread across inconsistent modules:
- action -> service -> job -> datastore -> audit.
- Prose-to-code linkage in old repos:
- behavior documented in comments/docs but weakly represented in symbols.
- Policy/business-rule impact mapping where wording varies.
- Keep GLiREL optional and off by default.
- Activate only for hard retrieval scenarios where deterministic recall is insufficient.
- Low recall from `find_symbols` + `find_text` + `find_ml_entities`.
- Large ambiguous candidate sets where structural edges are missing.
- Explicit user request for semantic dependency/impact/workflow mapping.
- Add targeted legacy-style case suites.
- Measure recall lift vs indexing/runtime cost.
- Proceed only if lift is material on scenarios that deterministic retrieval cannot reliably solve.
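
The activation triggers listed above could be encoded as a small gate that stays off unless at least one trigger fires. This is only a sketch: `RetrievalStats`, its fields, and the thresholds `min_hits` and `max_candidates` are hypothetical names and placeholder values, and hit count is used as a rough proxy for "low recall".

```python
from dataclasses import dataclass


@dataclass
class RetrievalStats:
    # All fields are illustrative assumptions, not an existing API.
    deterministic_hits: int   # combined hits from find_symbols + find_text + find_ml_entities
    candidate_count: int      # size of the candidate set
    structural_edges: int     # graph edges linking the candidates
    user_requested_semantic: bool = False


def should_activate_glirel(stats, min_hits=3, max_candidates=200):
    """Return (activate, reasons). GLiREL stays off unless a trigger fires."""
    reasons = []
    if stats.user_requested_semantic:
        reasons.append("explicit_request")
    if stats.deterministic_hits < min_hits:
        reasons.append("low_recall")
    if stats.candidate_count > max_candidates and stats.structural_edges == 0:
        reasons.append("ambiguous_candidates")
    return bool(reasons), reasons
```

Returning the reasons alongside the decision keeps the activation explainable in the same spirit as the provenance fields on `find_*` outputs.
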
- Stable pagination + cursors everywhere (Completed)
- Implemented across `tools/list`, `resources/list`, and high-volume query/list tools.
- Deterministic ordering + resumable cursor semantics now in place.
- Explainability metadata (Completed)
- `find_*` outputs now include compact provenance (`matched_by`, `why_matched`, `signals`).
- Path aliases / scopes (Completed)
- Scope-aware filtering added in CLI + MCP with default aliases and repo-local config support.
- Tool surface reduction / feature flags (Completed, iterate as needed)
- MCP tool-level disabling is supported and reflected in `tools/list`.
- Continue consolidation based on real agent usage telemetry.
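
As an illustration of the "deterministic ordering + resumable cursor" idea, here is a minimal sketch of keyset-style pagination with an opaque cursor. It does not reflect the project's actual implementation: `list_page`, the cursor layout, and the assumption that items are unique, sortable strings are all hypothetical.

```python
import base64
import json


def encode_cursor(last_key):
    """Pack the last returned key into an opaque, URL-safe token."""
    raw = json.dumps({"after": last_key}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the last returned key from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after"]


def list_page(items, limit, cursor=None):
    """Return (page, next_cursor) over unique string items in sorted order."""
    ordered = sorted(items)  # deterministic ordering
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [x for x in ordered if x > after]  # resume strictly after the last key
    page = ordered[:limit]
    next_cursor = encode_cursor(page[-1]) if len(ordered) > limit else None
    return page, next_cursor
```

Because the cursor stores the last key rather than an offset, a client can resume after items are added or removed without skipping or repeating entries that were already returned.
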
These are now the first blockers to clear before trusting full enrichment re-evaluation runs.
- Make eval targeted, not full-repo enrichment (Completed)
- Update enrichment eval flow to process only probe-relevant paths or bounded candidate sets.
- Avoid whole-repo enrichment during eval when the metric is probe coverage.
- Keep output format comparable with prior reports.
- Ensure inference-only mode explicitly (Completed)
- Enforce inference-only execution (`eval` + no-grad/inference mode) in the GLiNER2 path.
- Add explicit checks/tests so regression to training-mode behavior is caught.
- Report runtime mode in eval metadata.
- Isolate memory per repo (Completed)
- Run enrichment evaluation per repo in isolated subprocesses.
- Ensure model/process memory is released between repos.
- Preserve per-repo metrics and strict pass/fail behavior in aggregated reports.
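
Per-repo isolation with strict aggregation might look roughly like the following sketch. It launches a fresh interpreter per repo so that all model and process memory is returned to the OS when the child exits. The `WORKER` script is a hypothetical stand-in for the real enrichment eval entry point, and the result fields (`repo`, `passed`, `error`) are assumptions.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the real per-repo enrichment eval.
# A real worker would load the model, run enrichment, and compute metrics.
WORKER = """
import json, sys
repo = sys.argv[1]
print(json.dumps({"repo": repo, "passed": not repo.endswith("-bad")}))
"""


def eval_repo_isolated(repo, timeout=600):
    """Run one repo's eval in a child interpreter; memory is freed on exit."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER, repo],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        # A crashed child counts as a failure instead of being dropped.
        return {"repo": repo, "passed": False, "error": proc.stderr.strip()}
    return json.loads(proc.stdout)


def eval_all(repos):
    """Keep per-repo results and apply strict pass/fail to the aggregate."""
    results = [eval_repo_isolated(r) for r in repos]
    return {"results": results, "passed": all(r["passed"] for r in results)}
```

Running repos sequentially, one child at a time, trades wall-clock time for a predictable memory ceiling.
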
- Reframe enrichment quality gates around practical lift
- Keep exact phrase coverage as a strict diagnostic metric.
- Treat candidate/path coverage as the primary usefulness metric for semantic-span models.
- Add explicit pass criteria per metric (exact, path, semantic, candidate).
- Tune GLiNER2 extraction for phrase-heavy probes
- Add chunk-level heuristics for prose-heavy spans (comments/docstrings/long strings).
- Add model-configurable chunk size/overlap and per-language defaults.
- Evaluate recall/latency/memory tradeoffs with controlled sweeps.
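
Configurable chunk size and overlap could be sketched as a sliding window over the text. This version is character-based for simplicity; a real implementation would more likely count tokens. The function name, the `DEFAULTS` table, and its values are placeholders, not tuned settings.

```python
# Placeholder per-language defaults as (size, overlap); values are illustrative only.
DEFAULTS = {"python": (400, 50), "markdown": (800, 100)}


def chunk_text(text, size=400, overlap=50):
    """Split text into (start_offset, chunk) windows that overlap by `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + size]))
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_for_language(text, language):
    """Apply the per-language default, falling back to chunk_text's defaults."""
    size, overlap = DEFAULTS.get(language, (400, 50))
    return chunk_text(text, size=size, overlap=overlap)
```

Keeping the start offset with each chunk lets extracted spans be mapped back to file positions, which matters when comparing them against phrase-heavy probes.
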
- Publish eval baselines and CI policy split
- Introduce two CI gates:
- deterministic graph/literal strict gate (hard fail)
- enrichment practical-lift gate (soft/hard by threshold)
- Store current real-model baseline values in versioned report snapshots.
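
The two-gate split could be expressed with one evaluation function and a `hard` flag, as in the sketch below. The metric names follow the list above (exact, path, semantic, candidate), but the function, the threshold values in the usage note, and the treatment of a missing metric as 0.0 are assumptions.

```python
def evaluate_gate(metrics, thresholds, hard=True):
    """Compare metrics to per-metric minimums.

    Returns status "pass", "fail" (hard gate), or "warn" (soft gate),
    plus the failing metrics as {name: (actual, minimum)}.
    A metric missing from `metrics` is treated as 0.0.
    """
    failures = {}
    for name, minimum in thresholds.items():
        actual = metrics.get(name, 0.0)
        if actual < minimum:
            failures[name] = (actual, minimum)
    if not failures:
        status = "pass"
    else:
        status = "fail" if hard else "warn"
    return {"status": status, "failures": failures}


def ci_exit_code(gate_results):
    """Only a hard failure breaks the build; warnings are reported but do not."""
    return 1 if any(g["status"] == "fail" for g in gate_results) else 0
```

For example, a deterministic gate might be run with `hard=True` and thresholds of 1.0, while the enrichment gate starts with `hard=False` and is promoted to a hard gate once the stored baselines are stable.
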
- Improve retrieval fusion and ranking
- Add optional fusion strategy: deterministic candidates + ML entity evidence.
- Emit compact ranking signals to explain why candidates were promoted.
- Validate token-efficiency and agent answer completeness in case-study reruns.
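
One possible shape for the optional fusion strategy is sketched here: deterministic candidates remain the only source of results, and ML entity evidence only re-ranks them. The input shapes, the additive boost, and `ml_weight` are assumptions chosen for illustration, not a proposed scoring formula.

```python
def fuse(deterministic, ml_evidence, ml_weight=0.5):
    """Re-rank deterministic candidates using ML entity evidence.

    deterministic: {path: base_score}
    ml_evidence:   {path: [entity labels found in that path]}
    Paths that appear only in ml_evidence are ignored on purpose.
    """
    ranked = []
    for path, score in deterministic.items():
        entities = ml_evidence.get(path, [])
        boost = ml_weight * len(entities)
        # Compact signals explain why a candidate was promoted.
        signals = ["deterministic"] + [f"ml_entity:{e}" for e in entities]
        ranked.append({"path": path, "score": score + boost, "signals": signals})
    # Sort by score descending, then path, so ties break deterministically.
    ranked.sort(key=lambda r: (-r["score"], r["path"]))
    return ranked
```

Keeping ML evidence as a re-ranking signal rather than a candidate source preserves the deterministic baseline's precision while still letting enrichment influence ordering.
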
- Prepare targeted GLiREL pilot (after enrichment gate stabilization)
- Start with legacy-style suites where deterministic + GLiNER2 still underperform.
- Require clear lift against the practical-lift baseline before expanding scope.
- Temporal queries
- "What's changed since snapshot X?"
- Diff snapshots to show added/removed/modified symbols.
- Track file authorship and modification frequency.
- Query: "Who modified this file most recently?"
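
Snapshot diffing could be as simple as the sketch below, assuming each snapshot is a mapping from a symbol identifier to a content hash. That snapshot shape and the function name are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two {symbol_id: content_hash} snapshots."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        # Present in both snapshots but with a different hash.
        "modified": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }
```

Sorting each bucket keeps the output stable between runs, which makes diffs easy to page through and compare.
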
- Cross-language expansion
- Index configuration files: JSON, YAML, TOML, INI.
- Index build scripts: Make, shell, Dockerfile.
- Framework-specific patterns:
- React components (props, hooks usage)
- FastAPI routes (decorators, dependencies)
- Terraform resources
- Deeper call graphs
- Transitive closure: N hops from a symbol.
- Interface/trait implementations (Go interfaces, Java interfaces, Rust traits).
- Event-driven chains: callbacks, signal handlers, pub/sub.
- Async call flows (Python asyncio, JS promises).
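
The "N hops from a symbol" query can be sketched as a bounded breadth-first search over a call graph. The adjacency-dict representation and the function name are assumptions made for illustration.

```python
from collections import deque


def reachable_within(graph, start, max_hops):
    """Return {symbol: hop_distance} for symbols reachable from `start`
    in at most `max_hops` call edges. `graph` maps caller -> list of callees."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for callee in graph.get(node, ()):
            if callee not in distance:  # also guards against cycles
                distance[callee] = distance[node] + 1
                queue.append(callee)
    del distance[start]  # the start symbol is not part of its own result
    return distance
```

Breadth-first order guarantees that each recorded distance is the shortest hop count, so results can be ranked by proximity to the starting symbol.
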