MAP framework hardening: skill routing, conftest PYTHONPATH, pyright gate, spec citations#149
Merged
Conversation
…t_* calls Surface friction from the hogback-gap run: the skill body referenced helpers like `detect_symbol_blast_radius` and `build_json_retry_prompt` without naming a script, leaving callers to guess between map_orchestrator.py and map_step_runner.py (calling the wrong one fails with an `invalid choice` error). Routing isn't by name prefix — `record_subtask_result` lives on the orchestrator while `record_test_baseline` lives on step_runner, etc. - Add a one-line routing rule to Execution Rules in SKILL.md. - Move the full per-command table + `record_*` / `validate_*` disambiguation into efficient-reference.md (keeps SKILL.md under the 500-line compact cap enforced by test_high_traffic_workflow_skills_keep_active_bodies_compact). - Prefix every bare `detect_*`, `build_json_retry_prompt`, `log_agent_failure`, and `detect_cross_subtask_regression_risk` call in the MONITOR/ACTOR gates with `python3 .map/scripts/map_step_runner.py`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worktree-vs-editable-install footgun: running pytest from a .warp/worktrees/<feature>/ directory while `pip install -e .` still points at the main checkout silently imports stale `mapify_cli` code. Same trap hits subprocess.Popen([sys.executable, …]) helpers — fresh interpreters inherit only PYTHONPATH, not the parent's sys.path. - Prepend <this-repo>/src to sys.path so in-process imports prefer the current worktree. - Prepend <this-repo>/src to PYTHONPATH env var so test-spawned subprocesses inherit the same priority. - Idempotent: repeated insertions deduplicated, so re-importing conftest is a no-op. Also silence Pylance's "_restore_cwd is not accessed" hint via an explicit module-namespace sentinel — pytest discovers autouse fixtures by name lookup, which static analyzers can't see. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`make check` previously ran ruff + mypy on src/, but pyright deprecation diagnostics (e.g. @contextmanager + Iterator return type, the bug we just hit on hogback-gap ST-003) sailed past Monitor and only surfaced because the IDE language-server happened to emit them — without that luck, the deprecated code would have shipped. - Makefile lint: append `pyright src/` after mypy. - pyproject [project.optional-dependencies].dev: add `pyright>=1.1.400` so `pip install -e ".[dev,ssl]"` brings the binary in. - .github/workflows/ci.yml linters step: best-effort pyright run with the same `which … || echo skipping` pattern as ruff/mypy, so non- pyright runners stay green while pyright-equipped ones gate hard. Current `src/` is 0 errors / 0 warnings / 0 informations, so no existing diagnostics need fixing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hogback-gap spec cited src/mapify_cli/__init__.py:96 for MAP_DEBUG
when the symbol had actually moved to :207, and the same spec cited
tests/test_template_sync.py:303-355 for a sync-test class that lived
at :317-356. Both stale citations propagated unchecked into research,
Actor prompts, and the final review — silent drift between the spec
and reality.
- .map/scripts/validate_spec_citations.py: greps every file:line
citation out of a spec, checks the path exists, the line range is in
bounds, and (when a backticked identifier sits within ~80 chars) that
the cited line(s) actually contain the symbol. Returns a JSON verdict
on stdout; exit 1 on any stale-citation / file-missing / line-out-of-
range, exit 2 on invocation error.
- tests/test_validate_spec_citations.py: 10 unit tests covering happy
path, stale identifier, missing file, line out of range, line-range
validation, no-identifier-window fallback, external-path skip,
extension allowlist, path-escapes-repo-root, and nearest-identifier
precedence.
- .claude/skills/map-plan/SKILL.md: insert mandatory Step 2a.5 between
spec writing and the devil's-advocate review so a red validator
blocks decomposition.
- Templates synced (src/mapify_cli/templates/{map/scripts,skills/map-plan/}).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens MAP workflow tooling by clarifying script routing, making tests prefer the current worktree source, adding Pyright to local checks, and introducing a citation validator for /map-plan specs.
Changes:
- Adds spec citation validation script, template copy, mandatory map-plan step, and unit tests.
- Documents map-efficient script routing and updates several command examples to use explicit runner paths.
- Prepends worktree
src/for pytest/import subprocess consistency and adds Pyright to dev linting.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/ci.yml |
Adds best-effort Pyright invocation to CI lint step. |
.map/scripts/validate_spec_citations.py |
Adds source validator for spec file:line citations. |
.claude/skills/map-efficient/SKILL.md |
Adds script routing rule and explicit step-runner command examples. |
.claude/skills/map-efficient/efficient-reference.md |
Adds script routing dispatcher reference. |
.claude/skills/map-plan/SKILL.md |
Adds mandatory spec citation validation step. |
Makefile |
Adds pyright src/ to lint target. |
pyproject.toml |
Adds Pyright to dev dependencies. |
src/mapify_cli/templates/map/scripts/validate_spec_citations.py |
Adds packaged template copy of citation validator. |
src/mapify_cli/templates/skills/map-efficient/SKILL.md |
Syncs map-efficient skill routing updates into templates. |
src/mapify_cli/templates/skills/map-efficient/efficient-reference.md |
Syncs routing reference into templates. |
src/mapify_cli/templates/skills/map-plan/SKILL.md |
Syncs map-plan citation validation step into templates. |
tests/conftest.py |
Ensures pytest and subprocesses import from current worktree src/. |
tests/test_validate_spec_citations.py |
Adds unit coverage for the new citation validator. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "reason": f"could not read file: {exc}", | ||
| } | ||
|
|
||
| if line_no < 1 or end_no > len(lines): |
| "reason": f"could not read file: {exc}", | ||
| } | ||
|
|
||
| if line_no < 1 or end_no > len(lines): |
| - `save_*` / `load_*`: `save_research`, `load_research`, `save_artifact_manifest`, `load_artifact_manifest`, `save_learning_metrics`, `load_learning_metrics`, `load_blueprint` | ||
| - `refresh_*`: `refresh_blueprint_affected_files` | ||
| - `validate_*` (non-state): `validate_blueprint_contract`, `validate_mutation_boundary`, `validate_retry_quarantine`, `validate_run_health_report`, `validate_checkpoint` | ||
| - `record_*` (artifacts, not state): `record_test_baseline`, `record_diagnostics_baseline`, `record_scope_baseline`, `record_subtask_baseline`, `record_token_event`, `record_learning_consumption`, `record_repeated_learning_violations`, `record_workflow_fit`, `record_plan_artifacts`, `record_test_contract_handoff`, `record_review_ordering`, `record_token_budget_decision` |
| - `save_*` / `load_*`: `save_research`, `load_research`, `save_artifact_manifest`, `load_artifact_manifest`, `save_learning_metrics`, `load_learning_metrics`, `load_blueprint` | ||
| - `refresh_*`: `refresh_blueprint_affected_files` | ||
| - `validate_*` (non-state): `validate_blueprint_contract`, `validate_mutation_boundary`, `validate_retry_quarantine`, `validate_run_health_report`, `validate_checkpoint` | ||
| - `record_*` (artifacts, not state): `record_test_baseline`, `record_diagnostics_baseline`, `record_scope_baseline`, `record_subtask_baseline`, `record_token_event`, `record_learning_consumption`, `record_repeated_learning_violations`, `record_workflow_fit`, `record_plan_artifacts`, `record_test_contract_handoff`, `record_review_ordering`, `record_token_budget_decision` |
| - **`python3 .map/scripts/map_step_runner.py <cmd>`** — pure analysis/persistence helpers (no state-machine side effect): | ||
| - `detect_*` family: `detect_truncated_agent_output`, `detect_already_done`, `detect_cross_subtask_regression_risk`, `detect_actor_files_changed_mismatch`, `detect_symbol_blast_radius` | ||
| - `build_*` family: `build_context_block`, `build_json_retry_prompt`, `build_acceptance_coverage_report`, `build_prior_stage_consumption_report`, `build_retry_quarantine`, `build_handoff_bundle`, `build_review_handoff`, `build_review_prompts` | ||
| - `save_*` / `load_*`: `save_research`, `load_research`, `save_artifact_manifest`, `load_artifact_manifest`, `save_learning_metrics`, `load_learning_metrics`, `load_blueprint` |
| - **`python3 .map/scripts/map_step_runner.py <cmd>`** — pure analysis/persistence helpers (no state-machine side effect): | ||
| - `detect_*` family: `detect_truncated_agent_output`, `detect_already_done`, `detect_cross_subtask_regression_risk`, `detect_actor_files_changed_mismatch`, `detect_symbol_blast_radius` | ||
| - `build_*` family: `build_context_block`, `build_json_retry_prompt`, `build_acceptance_coverage_report`, `build_prior_stage_consumption_report`, `build_retry_quarantine`, `build_handoff_bundle`, `build_review_handoff`, `build_review_prompts` | ||
| - `save_*` / `load_*`: `save_research`, `load_research`, `save_artifact_manifest`, `load_artifact_manifest`, `save_learning_metrics`, `load_learning_metrics`, `load_blueprint` |
Three findings, all substantive:
1. validate_spec_citations.py line-range bounds check was incomplete:
- reversed ranges (`file.py:20-10`) passed because both bounds were
compared independently against the file length, not each other.
- out-of-bounds start with in-bounds end (`file.py:50-5` on a 10-line
file) passed because only `end_no > len(lines)` was guarded.
Fixed by validating all four conditions independently and reporting
a reversed-range hint in the failure reason. Two new tests cover
both edge cases: test_flags_reversed_range,
test_flags_out_of_bounds_start_with_in_bounds_end.
2. efficient-reference.md routing table listed save_*/load_* functions
that aren't CLI-dispatchable (`save_artifact_manifest`,
`save_learning_metrics`, `load_learning_metrics`, `load_blueprint`)
— they exist in map_step_runner.py as internal helpers but have no
`func_name` dispatch branch, so `python3 map_step_runner.py
save_learning_metrics` fails with "Unknown function". Pruned to the
three actually-dispatchable names: save_research, load_research,
load_artifact_manifest. Added validate_prior_stage_consumption
(was missing). Added an "if you see 'Unknown function', grep
map_step_runner.py for func_name ==" note so the dispatcher
remains the ground truth even if this list drifts.
3. Same problem in the record_* family: `record_repeated_learning_violations`
and `record_token_budget_decision` are internal helpers, not CLI
commands. `record_review_ordering` IS dispatched, but with a HYPHEN
(`record-review-ordering`) not an underscore — pure trap. Removed
the two non-dispatchable names; renamed to `record-review-ordering`
with a callout that this single command uses hyphen syntax.
Also inlined the artifact-writer family list (six write_* commands)
directly into the routing table instead of leaving it as a separate
prose sentence — now every CLI-callable name appears in one place.
make check: ruff + mypy + pyright clean; 1713 passed / 4 skipped / 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four friction fixes surfaced while running
/map-efficienton the hogback-gap PR. Each was a real footgun that either wasted iteration tokens or risked shipping a bug; each commit is independent and atomic.d13a9ad— map-efficient: document script-routing dispatcherThe skill body referenced helpers like
detect_symbol_blast_radiusandbuild_json_retry_promptwithout naming a script. Calling the wrong one fails withinvalid choice— and the routing isn't by name prefix (record_subtask_resultlives on the orchestrator whilerecord_test_baselinelives onstep_runner). Added a one-line routing rule to Execution Rules in SKILL.md (within the 500-line compact cap), a full per-command table toefficient-reference.md, and prefixed every baredetect_*/build_json_retry_prompt/log_agent_failurecall in the MONITOR/ACTOR gates with the correctpython3 .map/scripts/map_step_runner.pyinvocation.481329e— conftest: prepend worktree src to sys.path + PYTHONPATHWorktree-vs-editable-install footgun: running pytest from
.warp/worktrees/<feature>/whilepip install -e .still points at the main checkout silently imports stalemapify_cli. Same trap hitssubprocess.Popen([sys.executable, …])test helpers — fresh interpreters inherit onlyPYTHONPATH, not the parent'ssys.path. Conftest now prepends<this-repo>/srcto both, idempotently. (Also silenced Pylance's_restore_cwd is not accessedhint via an explicit module-namespace sentinel.)044895e— make check: gate on pyright src/ toomake checkpreviously ranruff+mypyonsrc/. Pyright deprecation diagnostics (e.g.@contextmanager+Iterator[T]— exactly the issue hit on hogback-gap ST-003) sailed past Monitor and only surfaced because the IDE language-server happened to emit them. Without that luck, the deprecated code would have shipped. Addedpyright src/to the Makefile lint target,pyright>=1.1.400to the[dev]extra, and a best-effortwhich pyrightline to the CI linters step.src/is already 0/0/0, so no existing diagnostics need fixing.f5847f5— /map-plan: gate spec on file:line citation validatorThe hogback-gap spec cited
src/mapify_cli/__init__.py:96forMAP_DEBUGwhen the symbol had moved to:207, and citedtests/test_template_sync.py:303-355for a class living at:317-356. Both stale citations propagated unchecked into research, Actor prompts, and review. New script.map/scripts/validate_spec_citations.pygreps everyfile:linecitation, checks the path exists and the line range is in bounds, and (when a backticked identifier sits within ~80 chars) verifies the cited line actually contains it. Returns a JSON verdict + exit 1 on stale-citation / file-missing / out-of-range. Wired into/map-planas mandatory Step 2a.5 (before devil's-advocate review). 10 unit tests for the validator.Test plan
make check— ruff clean, mypy clean (36 src files), pyright clean (37 src files, new gate), full pytest 1709 passed / 4 skipped / 0 failed / 12 deselectedpytest tests/test_validate_spec_citations.py -v— 10 passed (covering happy path, stale identifier, missing file, out-of-range line, line-range validation, no-identifier-window fallback, external-path skip, extension allowlist, path-escapes-repo-root, nearest-identifier precedence)pytest tests/test_skills.py— 223 passed (high-traffic skill compactness + run-health closeout ordering still satisfied after Script Routing edit)src/mapify_cli/__init__.py:96andtests/test_template_sync.py:303-355)make sync-templates)🤖 Generated with Claude Code