Roadmap

Direction to Consider: Targeted GLiREL for Legacy Semantic Gaps

The current deterministic baseline, literal index, and GLiNER2 enrichment already perform strongly together. GLiREL should be treated as an optional, targeted capability, not a default indexing path.

When GLiREL could add real value

  • Implicit cross-file relationships with weak AST linkage:
    • config keys, feature flags, event names, queue topics, metric IDs, permission strings.
  • Legacy aliasing and semantic drift:
    • same concept represented under different historical names.
  • Multi-step operational flows spread across inconsistent modules:
    • action -> service -> job -> datastore -> audit.
  • Prose-to-code linkage in old repos:
    • behavior documented in comments/docs but weakly represented in symbols.
  • Policy/business-rule impact mapping where wording varies.

Product stance

  • Keep GLiREL optional and off by default.
  • Activate only for hard retrieval scenarios where deterministic recall is insufficient.

Suggested activation triggers

  • Low recall from find_symbols + find_text + find_ml_entities.
  • Large ambiguous candidate sets where structural edges are missing.
  • Explicit user request for semantic dependency/impact/workflow mapping.
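The triggers above could be combined into a single activation check. The following is a minimal sketch only: `RetrievalOutcome`, `should_activate_glirel`, and both threshold values are hypothetical names and placeholder numbers, not part of the existing tooling.

```python
from dataclasses import dataclass


@dataclass
class RetrievalOutcome:
    """Hypothetical summary of one deterministic retrieval pass."""
    hits: int                # combined results from find_symbols, find_text, find_ml_entities
    candidates: int          # size of the candidate set before ranking
    structural_edges: int    # graph edges that link those candidates
    user_requested_semantic: bool = False


def should_activate_glirel(outcome: RetrievalOutcome,
                           min_hits: int = 3,
                           max_candidates: int = 50) -> bool:
    """Return True if any suggested trigger fires (thresholds are placeholders)."""
    # Trigger: explicit request for semantic dependency/impact/workflow mapping.
    if outcome.user_requested_semantic:
        return True
    # Trigger: low recall from the deterministic and GLiNER2 tools.
    if outcome.hits < min_hits:
        return True
    # Trigger: large ambiguous candidate set with no structural edges.
    if outcome.candidates > max_candidates and outcome.structural_edges == 0:
        return True
    return False
```

Keeping the check as a pure function of a retrieval summary would make it straightforward to keep GLiREL off by default and to unit-test each trigger in isolation.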

Evaluation requirement before implementation

  • Add targeted legacy-style case suites.
  • Measure recall lift vs indexing/runtime cost.
  • Proceed only if lift is material on scenarios that deterministic retrieval cannot reliably solve.

Active Priorities (Near-Term)

  1. Stable pagination + cursors everywhere (Completed)
  • Implemented across tools/list, resources/list, and high-volume query/list tools.
  • Deterministic ordering + resumable cursor semantics now in place.
  2. Explainability metadata (Completed)
  • find_* outputs now include compact provenance (matched_by, why_matched, signals).
  3. Path aliases / scopes (Completed)
  • Scope-aware filtering added in CLI + MCP with default aliases and repo-local config support.
  4. Tool surface reduction / feature flags (Completed, iterate as needed)
  • MCP tool-level disabling is supported and reflected in tools/list.
  • Continue consolidation based on real agent usage telemetry.
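As an illustration of what "deterministic ordering + resumable cursor semantics" can look like, here is a minimal sketch, assuming results are sorted by a unique string key and the cursor is an opaque encoding of the last key returned. The function names and cursor layout are hypothetical and may differ from the project's actual implementation.

```python
import base64
import json
from typing import Iterable, List, Optional, Tuple


def encode_cursor(last_key: str) -> str:
    """Pack the last returned key into an opaque, URL-safe token."""
    raw = json.dumps({"after": last_key}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> str:
    """Recover the last returned key from a token made by encode_cursor."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw)["after"]


def list_page(names: Iterable[str], limit: int,
              cursor: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of names plus the cursor for the next page, if any."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Sorting on a unique key gives the same order on every call.
    ordered = sorted(set(names))
    if cursor is not None:
        after = decode_cursor(cursor)
        # Resume strictly after the last key the caller has already seen.
        ordered = [name for name in ordered if name > after]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return page, next_cursor
```

Because the cursor stores a key rather than an offset, a page request in this sketch resumes at the right place even if items before the cursor were added or removed between calls.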

Immediate Execution Priority: Real GLiNER2 Eval Stability

These are now the first blockers to clear before trusting full enrichment re-evaluation runs.

  1. Make eval targeted, not full-repo enrichment (Completed)
  • Update enrichment eval flow to process only probe-relevant paths or bounded candidate sets.
  • Avoid whole-repo enrichment during eval when the metric is probe coverage.
  • Keep output format comparable with prior reports.
  2. Ensure inference-only mode explicitly (Completed)
  • Enforce inference-only execution (eval mode plus no-grad/inference mode) in the GLiNER2 path.
  • Add explicit checks/tests so regression to training-mode behavior is caught.
  • Report runtime mode in eval metadata.
  3. Isolate memory per repo (Completed)
  • Run enrichment evaluation per repo in isolated subprocesses.
  • Ensure model/process memory is released between repos.
  • Preserve per-repo metrics and strict pass/fail behavior in aggregated reports.
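The per-repo isolation described above can be sketched with the standard library alone. This is an assumption-laden illustration: `WORKER` is a stand-in script that loads no model, the `bad` prefix is an artificial way to simulate a failing repo, and `eval_repo_isolated` and `eval_all` are hypothetical names.

```python
import json
import subprocess
import sys

# Stand-in worker: a real one would load the model and run the probes for the repo.
# Here a repo name starting with "bad" simulates a failing evaluation.
WORKER = r"""
import json, sys
repo = sys.argv[1]
print(json.dumps({"repo": repo, "passed": not repo.startswith("bad")}))
"""


def eval_repo_isolated(repo: str, timeout: float = 600.0) -> dict:
    """Run one repo's eval in a fresh interpreter so its memory is freed on exit."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER, repo],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # A crashed worker counts as a strict failure for that repo.
        return {"repo": repo, "passed": False, "error": proc.stderr.strip()}
    return json.loads(proc.stdout)


def eval_all(repos) -> dict:
    """Aggregate per-repo results; the run passes only if every repo passes."""
    results = [eval_repo_isolated(repo) for repo in repos]
    return {"passed": all(r["passed"] for r in results), "repos": results}
```

Running each repo in its own process means the operating system reclaims all model memory when the worker exits, without relying on in-process cleanup.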

Next Natural Priorities

  1. Reframe enrichment quality gates around practical lift
  • Keep exact phrase coverage as a strict diagnostic metric.
  • Treat candidate/path coverage as the primary usefulness metric for semantic-span models.
  • Add explicit pass criteria per metric (exact, path, semantic, candidate).
  2. Tune GLiNER2 extraction for phrase-heavy probes
  • Add chunk-level heuristics for prose-heavy spans (comments/docstrings/long strings).
  • Add model-configurable chunk size/overlap and per-language defaults.
  • Evaluate recall/latency/memory tradeoffs with controlled sweeps.
  3. Publish eval baselines and CI policy split
  • Introduce two CI gates:
    • deterministic graph/literal strict gate (hard fail)
    • enrichment practical-lift gate (soft/hard by threshold)
  • Store current real-model baseline values in versioned report snapshots.
  4. Improve retrieval fusion and ranking
  • Add optional fusion strategy: deterministic candidates + ML entity evidence.
  • Emit compact ranking signals to explain why candidates were promoted.
  • Validate token-efficiency and agent answer completeness in case-study reruns.
  5. Prepare targeted GLiREL pilot (after enrichment gate stabilization)
  • Start with legacy-style suites where deterministic + GLiNER2 still underperform.
  • Require clear lift against practical-lift baseline before expanding scope.
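The per-metric pass criteria and the hard/soft CI split listed above might be expressed along these lines. This is a sketch under stated assumptions: the `Gate` type, the `evaluate_gates` helper, and every threshold in the usage example are hypothetical, not measured baselines.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Gate:
    metric: str       # e.g. "exact", "path", "semantic", "candidate"
    threshold: float  # minimum acceptable value for this metric
    hard: bool        # hard gates fail CI; soft gates only warn


def evaluate_gates(metrics: Dict[str, float], gates: Iterable[Gate]) -> dict:
    """Check each gate; a missing metric is treated as not meeting its gate."""
    failures, warnings = [], []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is not None and value >= gate.threshold:
            continue
        # Route the miss to failures or warnings depending on gate severity.
        (failures if gate.hard else warnings).append(gate.metric)
    return {"ci_pass": not failures, "failures": failures, "warnings": warnings}
```

A usage example with made-up numbers: `evaluate_gates({"path": 0.9, "exact": 0.4}, [Gate("path", 0.8, hard=True), Gate("exact", 0.6, hard=False)])` passes CI while listing `exact` as a warning, which matches the idea of keeping exact phrase coverage as a diagnostic and path coverage as the gating metric.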

Agent Experience Enhancements

  1. Temporal queries
  • "What's changed since snapshot X?"
  • Diff snapshots to show added/removed/modified symbols.
  • Track file authorship and modification frequency.
  • Query: "Who modified this file most recently?"
  2. Cross-language expansion
  • Index configuration files: JSON, YAML, TOML, INI.
  • Index build scripts: Make, shell, Dockerfile.
  • Framework-specific patterns:
    • React components (props, hooks usage)
    • FastAPI routes (decorators, dependencies)
    • Terraform resources
  3. Deeper call graphs
  • Transitive closure: N hops from a symbol.
  • Interface/trait implementations (Go interfaces, Java interfaces, Rust traits).
  • Event-driven chains: callbacks, signal handlers, pub/sub.
  • Async call flows (Python asyncio, JS promises).
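The "N hops from a symbol" query under deeper call graphs can be sketched as a bounded breadth-first search. The adjacency-dict representation and the name `callees_within` are assumptions made for illustration; the real index may store call edges differently.

```python
from collections import deque
from typing import Dict, Iterable, Mapping


def callees_within(graph: Mapping[str, Iterable[str]],
                   start: str, max_hops: int) -> Dict[str, int]:
    """Return {symbol: hop_distance} for symbols reachable in at most max_hops calls."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Do not expand nodes that already sit at the hop limit.
        if distances[current] == max_hops:
            continue
        for callee in graph.get(current, ()):
            # First visit in BFS order is always the shortest hop distance.
            if callee not in distances:
                distances[callee] = distances[current] + 1
                queue.append(callee)
    # The start symbol itself is not one of its own callees.
    del distances[start]
    return distances
```

Tracking visited symbols in `distances` keeps the search finite on recursive or mutually recursive call chains, and the hop count per symbol can double as a ranking signal.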