fix(pr-analyze): area-via-git-diff + skip-uncached-cross-PR#390
Merged
Conversation
…lock) T14 round-2 routing test (PRs #386 / #387 / #388) surfaced two bugs: 1. Comment-only and docs PRs analyzed with area=null even when the only changed path was a clear area match (README.md, parser comment, cli WHY-comment). `ImpactJson::changed_files()` derived from `changed_symbols[].filePath`, but `ecp impact` returns 0 changed_symbols for comment-only diffs (no symbol semantically changed). Empty changed_files → `classify_area(&[]) → None` → no `ecp:area-*` label → wrong Mergify queue. Pull the diff file list from git directly via a new `git_diff_files(baseline, pr_head)` helper that runs `git diff --name-only $baseline..$pr_head`. Now docs PRs route to docs-only queue and parser-comment PRs route to parser-changes. The unused `ImpactJson::changed_files` is removed; the `file_path` field on `ChangedSymbol` stays for future consumers and to keep the deserializer happy. 2. Cross-PR conflict deadlock: when 3 PRs pushed in parallel, each `analyze` run reads the other two's caches before they've been written. Old code reported each missing cache as a conflict (`__pending_analysis__` sentinel), so all 3 PRs ended up with `ecp/cross-pr-conflict=pending` and Mergify queued none. Spec actually said missing-cache should fall through to "let Mergify speculative trial catch real conflicts" — the conservative implementation was inconsistent with the design. Switch the `None => …` arm to `continue` so uncached siblings are skipped, not blocked. Real symbol overlap on cached siblings still reports normally. Updates `cross_pr_conflict_missing_cache_is_conservative` test (renamed `_skipped`) to assert the new behaviour, and tightens `cross_pr_conflict_excludes_self` to use cached impact (so it actually exercises the overlap path rather than the now-skipped missing-cache path).
Contributor
|
["run","cross_pr_conflict_disjoint","cross_pr_conflict_overlap","cross_pr_conflict_missing_cache_skipped","cross_pr_conflict_excludes_self"] |
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of __pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
PR #390 changed `run()` to use `git_diff_files()` for area classification (comment-only / whitespace-only diff coverage), which left `ImpactJson::changed_files` without an in-crate caller. The struct-level `#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings` in the pre-push hook flags this method. The method itself is intentionally retained as a library-API derived view alongside `impact_set_names` / `changed_symbol_names` — see the doc comment for the rationale. Add a method-local `#[allow(dead_code)]` so the deferred-API status is encoded in the source rather than relying on the struct attribute (which a reader could miss).
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of __pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
PR #390 changed `run()` to use `git_diff_files()` for area classification (comment-only / whitespace-only diff coverage), which left `ImpactJson::changed_files` without an in-crate caller. The struct-level `#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings` in the pre-push hook flags this method. The method itself is intentionally retained as a library-API derived view alongside `impact_set_names` / `changed_symbol_names` — see the doc comment for the rationale. Add a method-local `#[allow(dead_code)]` so the deferred-API status is encoded in the source rather than relying on the struct attribute (which a reader could miss).
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
…de (6 atomic commits) (#385) * test(cypher): pin executor row-width invariant (FU-2026-05-23-040) The CLI's `cypher::build_payload` collapses single-column rows to flat scalars and falls back to `Value::Null` when a row is empty. The fallback is intentionally tolerant — but if the executor ever violates `row.len() == columns.len()` (the implicit projection contract), the silent null would mimic a legitimate OPTIONAL MATCH miss. Pin the invariant on the executor side so future projection refactors can't silently regress: - `execute_inner` carries a `debug_assert!` before returning that every emitted row has `row.len() == columns.len()`. Catches violations in dev / test builds. - `row_width_matches_columns_for_all_projection_shapes` covers the three representative shapes — single-column, multi-column, and aggregation (COUNT). All 46 existing executor tests continue to pass since the invariant was already true; the assertion now codifies it. * refactor(pr-analyze): 3 minor cleanups (FU-2026-05-23-043) (1) Eliminate `changed_files_pb: Vec<PathBuf>` redundant state — was only consumed by `classify_area`. Generalised `classify_area` to `<P: AsRef<Path>>(&[P])` so callers can pass `&[String]` directly. (2) Make the non-JSON output format path an explicit error instead of a catch-all stderr fallback. `pr-analyze` output is consumed by Mergify CI hooks which require JSON; silently routing `--format llm/toon/text` to a one-line summary misleads anyone testing the flag and risks hiding bugs in the JSON serialisation path. (3) `ImpactJson::changed_files` / `impact_set_names` now hold the dedup `HashSet<&str>` against the borrowed source field instead of cloning into the set on every insert. The final `.map(...).clone()` only fires for items that survived the dedup filter, so the total clone count drops from `2 × n` to `1 × (unique_count)`. Tests: 14 pr_analyze unit tests + integration tests pass. classify_area signature change is callable from `[PathBuf]` / `[String]` alike via `AsRef<Path>` blanket impl. * refactor(shape-check): widen render_text to consume typed counters (FU-2026-05-23-038) `render_text` was reaching back into the just-serialised JSON value to read `unparseable_fetches` / `unknown_target_shapes_total` / `unknown_target_shapes_truncated` — a leaky abstraction that round-trips typed `u64` / `bool` through `serde_json::Value::as_u64()` / `as_bool()` after `build_payload_with_hints` already held the same values in proper typed form. Extend `ShapeCheckHints` with the three counter fields and pass `&hints` into `render_text` alongside the JSON value. The JSON payload still carries the same keys for the JSON/Toon output paths (unchanged contract); text rendering now bypasses the JSON detour for fields it already knew about. Tests: shape_check_cmd / _ab / _route_filter / _build_payload — all 10 passes preserved. * refactor(tests): hoist write + init_and_analyze into common module (FU-2026-05-23-037) 18 integration-test fixtures redefined the same three helpers verbatim (`ecp_bin`, `write`, `init_and_analyze`) on every file. `tests/common/mod.rs` already housed `ecp_bin` + `write_graph` + `run_git`; promote the remaining two helpers there and switch all 18 fixtures to consume from `common::*`. Two variants existed for `init_and_analyze`: - canonical (blank lines between git arg groups) — 13 files - compact (no blank lines) — 5 of the inspect_*_qualified_calls fixtures Both produced identical behaviour (single-commit main branch + `ecp admin index --repo .` with HOME=tempdir). The canonical form is what now lives in common; per-file copies removed. Files migrated: - 6 silent_drop_*.rs (silent-drop audit fixtures from PR #378) - 5 inspect_*_qualified_calls.rs - inspect_cmd / inspect_fqn / inspect_enum_variants_and_decorators - class_membership_inspect / context_cmd / impact_fqn / imports_edge_inspect All 18 fixtures (64 sub-tests across them) pass. Per-file `setup_repo*` helpers that build different git state (branches, remotes, multi-commit) remain unchanged — common's doc comment already explained when to keep them per-file. * feat(path-literal): sink-override for ext-change callees (FU-2026-05-23-024) `File::new(...).with_extension("json")` / `path.with_file_name("session.toml")` calls produce `string_literal` nodes whose values (`"json"`, `"toml"`) are too short / too plain to pass `is_path_shaped`. They were silently dropped from the path-literal index, so `ecp impact --literal config.json` (or any value-grouped query) saw an incomplete sink set. The classifier always had `with_extension` / `with_file_name` mapped to `(ExtChange, High)`, but the gate was upstream of it. Resolution: - New helper `is_ext_change_callee(callee: Option<&str>) -> bool` recognises HIGH-confidence ext-change names (`with_extension`, `with_file_name`, `set_extension`, `set_file_name`) using the same `trailing_ident` trim as `classify_sink`. - Per-lang extractors (14 languages) reordered: pull `callee` BEFORE the `is_path_shaped` check, then accept the literal when either the shape check passes OR the callee is an ext-change op. - Conservative: bare extension strings appearing in unknown callees (`format_picker("json")`) still get filtered by `is_path_shaped` so option/error tokens don't pollute the index. Tests: - New `is_ext_change_callee_recognises_canonical_names` + `classify_sink_extchange_high_for_ext_change_names` unit tests. - All 14 per-language `*_path_literals.rs` integration tests pass unchanged (46 sub-tests total) — no regression on the existing is-path-shaped path. * chore(tests): drop unused Path import in 6 silent_drop_* fixtures Leftover from the FU-037 fixture-helpers migration — the fixtures previously took `Path` as a parameter for the inline `init_and_analyze` and `write` helpers. Now that both live in `common::*`, the per-file import is dead. clippy flagged it; remove. * feat(path-literal): Kotlin chained-call sink promotion (FU-2026-05-23-023, kotlin only) `File("session_meta.json").readText()` / `.writeText(data)` chains used to classify as `sink:join|medium` because the immediate parent of the string literal is the constructor (`File`), not the read/write terminal. The LLM consumer couldn't tell read-only sinks from write-only ones — exactly the split-brain detection use case PathLiteral is built to support. Resolution (Kotlin only — Swift / Python / Rust deferred to follow-ups): - `enclosing_callee` now walks outward through `value_arguments → call_suffix → call_expression`. When the inner call is wrapped in a `navigation_expression` that's the receiver of an outer call_expression, the outer call's method name is treated as the terminal sink if it's in the high-confidence read/write whitelist (`readText` / `readBytes` / `readLines` / `writeText` / `writeBytes` / `appendText` / `appendBytes`). - `classify_sink` HIGH writes list extended with `writeText`, `writeBytes`, `appendText`, `appendBytes` to make the promotion actually classify. `readText` / `read_text` were already present. Fixture updates (`kotlin_path_literals.rs`): - `free_function_with_read_sink_classified_as_read_high` now asserts `sink:read|confidence:high` (was P2-deferred). - `method_in_class_with_write_sink_resolves_owner` asserts `sink:write|confidence:high`. Per-language Swift / Python / Rust / C++ AST shapes differ; each deserves a separate commit. Tracked under FU-2026-05-23-023. * feat(path-literal): Python chained-call sink promotion (FU-2026-05-23-023, python) Mirrors the Kotlin chain promotion from commit d052333 for Python's `pathlib.Path` idiom: ```python Path("config.json").read_text() # was: sink:join|medium Path("output.json").write_text(data) # was: sink:join|medium ``` After promotion both classify as `sink:{read,write}|confidence:high`. Python AST chain shape (from spike): ``` call (outer — read_text()) attribute (Path("x.json").read_text) call (inner — Path(...)) ← immediate parent of argument_list identifier "Path" argument_list string "x.json" ← str_node "." identifier "read_text" ← last identifier child of attribute argument_list () ``` Differs from Kotlin: `navigation_expression` → `attribute`; no `navigation_suffix` wrapper; method name is the last identifier child of the attribute node. classify_sink HIGH list extended with the snake_case pathlib equivalents of the Kotlin camelCase entries (added in commit d052333): - HIGH reads: `read_bytes` (already had `read_text`) - HIGH writes: `write_text`, `write_bytes` Tests: `pathlib_chain_read_text_classified_high` + `pathlib_chain_write_text_classified_high`. Full ecp-analyzer suite passes (no regression). * feat(path-literal): Swift labelled-arg sink promotion (FU-2026-05-23-023, swift) Swift's path-I/O idiom routes through *argument labels*, not chained method names like Kotlin / Python. Two shapes covered: ```swift let s = try String(contentsOfFile: "config.json") // labelled-arg constructor try "x".write(toFile: "output.json", atomically: ..., encoding: ...) // labelled-arg method ``` In both, the literal sits inside a `value_argument` whose label (`contentsOfFile`, `toFile`) carries the file-op semantics, not the function callee (`String` / `write`). Pre-existing latent bug: `child_by_field_name("function")` returned None unconditionally because the Swift grammar has no named `function` field, so every Swift path literal was emitting `sink:free|confidence:high` regardless of context. Resolution: - `enclosing_callee` walks up via `child(0)` (which actually works) and tracks the nearest enclosing `value_argument`. When the argument label is in the HIGH-confidence whitelist (`contentsOfFile` / `contentsOf` / `toFile`), prefer it over the function callee text. Applies whether the function is a bare `simple_identifier` (constructor) or a `navigation_expression` (method chain). - `is_high_confidence_label` whitelist matches `classify_sink`'s new Swift entries below; spurious match on non-file labels avoided. classify_sink HIGH list extended: - HIGH reads: `contentsOfFile`, `contentsOf` (String / NSString / Data initialisers) - HIGH writes: `toFile` (String / Data `write(toFile:...)`) Tests: - `function_with_read_sink` — assertion tightened from no-sink-check to `sink:read|confidence:high`. - `labelled_arg_constructor_contentsoffile` (new). - `navigation_chain_write_tofile` (new). Full ecp-analyzer suite continues passing (no regression). * feat(path-literal): Rust chained-call sink promotion (FU-2026-05-23-023, rust) Closes the 4th and final lang for FU-023. Mirrors the Kotlin / Python chain promotion pattern for Rust's `File::open + chain` idiom: ```rust File::open("session_meta.json").unwrap().read_to_string(&mut s).unwrap(); File::create("output.json").unwrap().write_all(data).unwrap(); ``` Both used to surface as `sink:open-read|medium` / `sink:open-write|high` (stuck on the constructor leg); now promote to `sink:read|high` and `sink:write|high` respectively. Rust AST chain shape (from spike): ``` string_literal ↑ arguments ↑ call_expression (inner: File::open(...)) ↑ field_expression (.unwrap) ← adapter step ↑ call_expression (.unwrap()) ↑ field_expression (.read_to_string) ← terminal step ↑ call_expression (.read_to_string(...)) ← TERMINAL ↑ ... (further .unwrap()/.expect() ignored) ``` Differs from Kotlin / Python — Rust uses `field_expression` (with `field_identifier` as the method name) instead of Kotlin's `navigation_expression` / Python's `attribute`. `.unwrap()` / `.expect()` / `.ok()` etc. are real chain links rather than implicit in the language; the walk steps THROUGH them via the new `is_transparent_adapter` helper rather than terminating. Whitelist: `read_to_string`, `read_to_end`, `read_text`, `write_all`, `write_text`, plus the four `with_*_extension` / `set_*_file_name` ext-change names already used by FU-024. No new entries needed in `classify_sink` — Rust idioms were already covered there. Tests: `file_open_chain_with_unwrap_promotes_to_read_high` + `file_create_chain_write_all_promotes_to_write_high`. Full ecp-analyzer suite passes (12 / 12 in rust_path_literals; no regression). * feat(path-literal): C++ chained-call sink promotion (FU-2026-05-23-023, c++) C++ chain idiom turns out structurally identical to Rust: `call_expression > field_expression > call_expression` with a `field_identifier` carrying the method name. Earlier judgement that "constructor temporaries can't be chained" was wrong — C++11+ permits rvalue method calls, and `std::filesystem::path(...).replace_extension(...)` is a real C++17 idiom (not just a theoretical chain). ```cpp std::filesystem::path("config.json").replace_extension("toml") // ^ "config.json" ^ "toml" // inner ctor terminal ext-change ``` Without promotion, "config.json" classified as `sink:join|medium` (constructor) and "toml" was dropped entirely (too short for `is_path_shaped`). After promotion both reach `sink:ext-change|high`: - chain walk promotes "config.json" to `replace_extension` - FU-024 `is_ext_change_callee` sink-override accepts the short "toml" because the call is a known ext-change op `classify_sink` ExtChange list + `is_ext_change_callee` whitelist extended: - `replace_extension` (C++17 `std::filesystem::path::replace_extension`) - `replace_filename` (mirror — `replace_filename` is the path-component equivalent of `set_file_name`) Stream `ifstream/ofstream` `.read()` / `.write()` intentionally not promoted — they're MEDIUM-confidence by design (overloaded with non-file stream IO). Tests: `filesystem_path_chain_replace_extension_promotes_to_ext_change_high` asserts both the primary-path-shaped literal AND the short ext literal surface as `sink:ext-change|confidence:high`. 14-lang path_literals fixtures still pass — `replace_extension` was previously absent from the classifier, so no existing test exercised the (now-classified) name. * chore(pr-analyze): allow dead_code on ImpactJson::changed_files PR #390 changed `run()` to use `git_diff_files()` for area classification (comment-only / whitespace-only diff coverage), which left `ImpactJson::changed_files` without an in-crate caller. The struct-level `#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings` in the pre-push hook flags this method. The method itself is intentionally retained as a library-API derived view alongside `impact_set_names` / `changed_symbol_names` — see the doc comment for the rationale. Add a method-local `#[allow(dead_code)]` so the deferred-API status is encoded in the source rather than relying on the struct attribute (which a reader could miss).
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of __pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125
added a commit
that referenced
this pull request
May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of __pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125
added a commit
that referenced
this pull request
May 25, 2026
Replaces previous test commit that conflicted with #390 refactor of __pending_analysis__ region. Same purpose: trigger area-cli routing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Unblocks T14 round-2 routing test (PRs #386/#387/#388). See commit body for both bug root-causes.