Skip to content

fix(pr-analyze): area-via-git-diff + skip-uncached-cross-PR#390

Merged
coseto6125 merged 1 commit into
mainfrom
fix/pr-analyze-area-race
May 23, 2026
Merged

fix(pr-analyze): area-via-git-diff + skip-uncached-cross-PR#390
coseto6125 merged 1 commit into
mainfrom
fix/pr-analyze-area-race

Conversation

@coseto6125
Copy link
Copy Markdown
Owner

Unblocks T14 round-2 routing test (PRs #386/#387/#388). See commit body for both bug root-causes.

…lock)

T14 round-2 routing test (PRs #386 / #387 / #388) surfaced two bugs:

1. Comment-only and docs PRs analyzed with area=null even when the
   only changed path was a clear area match (README.md, parser
   comment, cli WHY-comment).  `ImpactJson::changed_files()` derived
   from `changed_symbols[].filePath`, but `ecp impact` returns 0
   changed_symbols for comment-only diffs (no symbol semantically
   changed).  Empty changed_files → `classify_area(&[]) → None` →
   no `ecp:area-*` label → wrong Mergify queue.

   Pull the diff file list from git directly via a new
   `git_diff_files(baseline, pr_head)` helper that runs
   `git diff --name-only $baseline..$pr_head`.  Now docs PRs route
   to docs-only queue and parser-comment PRs route to parser-changes.

   The unused `ImpactJson::changed_files` is removed; the `file_path`
   field on `ChangedSymbol` stays for future consumers and to keep
   the deserializer happy.

2. Cross-PR conflict deadlock: when 3 PRs pushed in parallel, each
   `analyze` run reads the other two's caches before they've been
   written.  Old code reported each missing cache as a conflict
   (`__pending_analysis__` sentinel), so all 3 PRs ended up with
   `ecp/cross-pr-conflict=pending` and Mergify queued none.

   Spec actually said missing-cache should fall through to "let
   Mergify speculative trial catch real conflicts" — the conservative
   implementation was inconsistent with the design. Switch the
   `None => …` arm to `continue` so uncached siblings are skipped,
   not blocked. Real symbol overlap on cached siblings still reports
   normally.

   Updates `cross_pr_conflict_missing_cache_is_conservative` test
   (renamed `_skipped`) to assert the new behaviour, and tightens
   `cross_pr_conflict_excludes_self` to use cached impact (so it
   actually exercises the overlap path rather than the now-skipped
   missing-cache path).
@coseto6125 coseto6125 added the ci:run fire CI test matrix on this PR label May 23, 2026
@coseto6125 coseto6125 enabled auto-merge (squash) May 23, 2026 11:37
@github-actions
Copy link
Copy Markdown
Contributor

["run","cross_pr_conflict_disjoint","cross_pr_conflict_overlap","cross_pr_conflict_missing_cache_skipped","cross_pr_conflict_excludes_self"]

@github-actions github-actions Bot added ecp:area-cli ecp signal ecp:risk-low ecp signal labels May 23, 2026
@coseto6125 coseto6125 merged commit c75f3e4 into main May 23, 2026
15 of 22 checks passed
@coseto6125 coseto6125 deleted the fix/pr-analyze-area-race branch May 23, 2026 11:55
coseto6125 added a commit that referenced this pull request May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of
__pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125 added a commit that referenced this pull request May 23, 2026
PR #390 changed `run()` to use `git_diff_files()` for area classification
(comment-only / whitespace-only diff coverage), which left
`ImpactJson::changed_files` without an in-crate caller. The struct-level
`#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings`
in the pre-push hook flags this method.

The method itself is intentionally retained as a library-API derived
view alongside `impact_set_names` / `changed_symbol_names` — see the doc
comment for the rationale. Add a method-local `#[allow(dead_code)]` so
the deferred-API status is encoded in the source rather than relying on
the struct attribute (which a reader could miss).
coseto6125 added a commit that referenced this pull request May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of
__pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125 added a commit that referenced this pull request May 23, 2026
PR #390 changed `run()` to use `git_diff_files()` for area classification
(comment-only / whitespace-only diff coverage), which left
`ImpactJson::changed_files` without an in-crate caller. The struct-level
`#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings`
in the pre-push hook flags this method.

The method itself is intentionally retained as a library-API derived
view alongside `impact_set_names` / `changed_symbol_names` — see the doc
comment for the rationale. Add a method-local `#[allow(dead_code)]` so
the deferred-API status is encoded in the source rather than relying on
the struct attribute (which a reader could miss).
coseto6125 added a commit that referenced this pull request May 23, 2026
…de (6 atomic commits) (#385)

* test(cypher): pin executor row-width invariant (FU-2026-05-23-040)

The CLI's `cypher::build_payload` collapses single-column rows to flat
scalars and falls back to `Value::Null` when a row is empty. The
fallback is intentionally tolerant — but if the executor ever violates
`row.len() == columns.len()` (the implicit projection contract), the
silent null would mimic a legitimate OPTIONAL MATCH miss.

Pin the invariant on the executor side so future projection refactors
can't silently regress:

- `execute_inner` carries a `debug_assert!` before returning that every
  emitted row has `row.len() == columns.len()`. Catches violations in
  dev / test builds.
- `row_width_matches_columns_for_all_projection_shapes` covers the three
  representative shapes — single-column, multi-column, and aggregation
  (COUNT). All 46 existing executor tests continue to pass since the
  invariant was already true; the assertion now codifies it.

* refactor(pr-analyze): 3 minor cleanups (FU-2026-05-23-043)

(1) Eliminate `changed_files_pb: Vec<PathBuf>` redundant state — was
    only consumed by `classify_area`. Generalised `classify_area` to
    `<P: AsRef<Path>>(&[P])` so callers can pass `&[String]` directly.
(2) Make the non-JSON output format path an explicit error instead of
    a catch-all stderr fallback. `pr-analyze` output is consumed by
    Mergify CI hooks which require JSON; silently routing `--format
    llm/toon/text` to a one-line summary misleads anyone testing the
    flag and risks hiding bugs in the JSON serialisation path.
(3) `ImpactJson::changed_files` / `impact_set_names` now hold the
    dedup `HashSet<&str>` against the borrowed source field instead of
    cloning into the set on every insert. The final `.map(...).clone()`
    only fires for items that survived the dedup filter, so the total
    clone count drops from `2 × n` to `1 × (unique_count)`.

Tests: 14 pr_analyze unit tests + integration tests pass. classify_area
signature change is callable from `[PathBuf]` / `[String]` alike via
`AsRef<Path>` blanket impl.

* refactor(shape-check): widen render_text to consume typed counters (FU-2026-05-23-038)

`render_text` was reaching back into the just-serialised JSON value to
read `unparseable_fetches` / `unknown_target_shapes_total` /
`unknown_target_shapes_truncated` — a leaky abstraction that round-trips
typed `u64` / `bool` through `serde_json::Value::as_u64()` / `as_bool()`
after `build_payload_with_hints` already held the same values in proper
typed form.

Extend `ShapeCheckHints` with the three counter fields and pass `&hints`
into `render_text` alongside the JSON value. The JSON payload still
carries the same keys for the JSON/Toon output paths (unchanged
contract); text rendering now bypasses the JSON detour for fields it
already knew about.

Tests: shape_check_cmd / _ab / _route_filter / _build_payload — all 10
passes preserved.

* refactor(tests): hoist write + init_and_analyze into common module (FU-2026-05-23-037)

18 integration-test fixtures redefined the same three helpers verbatim
(`ecp_bin`, `write`, `init_and_analyze`) on every file. `tests/common/mod.rs`
already housed `ecp_bin` + `write_graph` + `run_git`; promote the
remaining two helpers there and switch all 18 fixtures to consume from
`common::*`.

Two variants existed for `init_and_analyze`:
- canonical (blank lines between git arg groups) — 13 files
- compact (no blank lines) — 5 of the inspect_*_qualified_calls fixtures

Both produced identical behaviour (single-commit main branch + `ecp admin
index --repo .` with HOME=tempdir). The canonical form is what now
lives in common; per-file copies removed.

Files migrated:
- 6 silent_drop_*.rs (silent-drop audit fixtures from PR #378)
- 5 inspect_*_qualified_calls.rs
- inspect_cmd / inspect_fqn / inspect_enum_variants_and_decorators
- class_membership_inspect / context_cmd / impact_fqn / imports_edge_inspect

All 18 fixtures (64 sub-tests across them) pass. Per-file `setup_repo*`
helpers that build different git state (branches, remotes, multi-commit)
remain unchanged — common's doc comment already explained when to keep
them per-file.

* feat(path-literal): sink-override for ext-change callees (FU-2026-05-23-024)

`File::new(...).with_extension("json")` / `path.with_file_name("session.toml")`
calls produce `string_literal` nodes whose values (`"json"`, `"toml"`)
are too short / too plain to pass `is_path_shaped`. They were silently
dropped from the path-literal index, so `ecp impact --literal config.json`
(or any value-grouped query) saw an incomplete sink set.

The classifier always had `with_extension` / `with_file_name` mapped to
`(ExtChange, High)`, but the gate was upstream of it.

Resolution:
- New helper `is_ext_change_callee(callee: Option<&str>) -> bool`
  recognises HIGH-confidence ext-change names (`with_extension`,
  `with_file_name`, `set_extension`, `set_file_name`) using the same
  `trailing_ident` trim as `classify_sink`.
- Per-lang extractors (14 languages) reordered: pull `callee` BEFORE
  the `is_path_shaped` check, then accept the literal when either the
  shape check passes OR the callee is an ext-change op.
- Conservative: bare extension strings appearing in unknown callees
  (`format_picker("json")`) still get filtered by `is_path_shaped` so
  option/error tokens don't pollute the index.

Tests:
- New `is_ext_change_callee_recognises_canonical_names` +
  `classify_sink_extchange_high_for_ext_change_names` unit tests.
- All 14 per-language `*_path_literals.rs` integration tests pass
  unchanged (46 sub-tests total) — no regression on the existing
  is-path-shaped path.

* chore(tests): drop unused Path import in 6 silent_drop_* fixtures

Leftover from the FU-037 fixture-helpers migration — the fixtures
previously took `Path` as a parameter for the inline `init_and_analyze`
and `write` helpers. Now that both live in `common::*`, the per-file
import is dead. clippy flagged it; remove.

* feat(path-literal): Kotlin chained-call sink promotion (FU-2026-05-23-023, kotlin only)

`File("session_meta.json").readText()` / `.writeText(data)` chains
used to classify as `sink:join|medium` because the immediate parent of
the string literal is the constructor (`File`), not the read/write
terminal. The LLM consumer couldn't tell read-only sinks from write-only
ones — exactly the split-brain detection use case PathLiteral is built
to support.

Resolution (Kotlin only — Swift / Python / Rust deferred to follow-ups):
- `enclosing_callee` now walks outward through `value_arguments →
  call_suffix → call_expression`. When the inner call is wrapped in a
  `navigation_expression` that's the receiver of an outer call_expression,
  the outer call's method name is treated as the terminal sink if it's
  in the high-confidence read/write whitelist (`readText` / `readBytes`
  / `readLines` / `writeText` / `writeBytes` / `appendText` / `appendBytes`).
- `classify_sink` HIGH writes list extended with `writeText`, `writeBytes`,
  `appendText`, `appendBytes` to make the promotion actually classify.
  `readText` / `read_text` were already present.

Fixture updates (`kotlin_path_literals.rs`):
- `free_function_with_read_sink_classified_as_read_high` now asserts
  `sink:read|confidence:high` (was P2-deferred).
- `method_in_class_with_write_sink_resolves_owner` asserts
  `sink:write|confidence:high`.

Per-language Swift / Python / Rust / C++ AST shapes differ; each
deserves a separate commit. Tracked under FU-2026-05-23-023.

* feat(path-literal): Python chained-call sink promotion (FU-2026-05-23-023, python)

Mirrors the Kotlin chain promotion from commit d052333 for Python's
`pathlib.Path` idiom:

```python
Path("config.json").read_text()       # was: sink:join|medium
Path("output.json").write_text(data)  # was: sink:join|medium
```

After promotion both classify as `sink:{read,write}|confidence:high`.

Python AST chain shape (from spike):
```
call (outer — read_text())
  attribute (Path("x.json").read_text)
    call (inner — Path(...))         ← immediate parent of argument_list
      identifier "Path"
      argument_list
        string "x.json"               ← str_node
    "."
    identifier "read_text"             ← last identifier child of attribute
  argument_list ()
```

Differs from Kotlin: `navigation_expression` → `attribute`; no
`navigation_suffix` wrapper; method name is the last identifier child
of the attribute node.

classify_sink HIGH list extended with the snake_case pathlib equivalents
of the Kotlin camelCase entries (added in commit d052333):
- HIGH reads: `read_bytes` (already had `read_text`)
- HIGH writes: `write_text`, `write_bytes`

Tests: `pathlib_chain_read_text_classified_high` +
`pathlib_chain_write_text_classified_high`. Full ecp-analyzer suite
passes (no regression).

* feat(path-literal): Swift labelled-arg sink promotion (FU-2026-05-23-023, swift)

Swift's path-I/O idiom routes through *argument labels*, not chained
method names like Kotlin / Python. Two shapes covered:

```swift
let s = try String(contentsOfFile: "config.json")      // labelled-arg constructor
try "x".write(toFile: "output.json", atomically: ..., encoding: ...)  // labelled-arg method
```

In both, the literal sits inside a `value_argument` whose label
(`contentsOfFile`, `toFile`) carries the file-op semantics, not the
function callee (`String` / `write`). Pre-existing latent bug:
`child_by_field_name("function")` returned None unconditionally because
the Swift grammar has no named `function` field, so every Swift path
literal was emitting `sink:free|confidence:high` regardless of context.

Resolution:
- `enclosing_callee` walks up via `child(0)` (which actually works) and
  tracks the nearest enclosing `value_argument`. When the argument label
  is in the HIGH-confidence whitelist (`contentsOfFile` / `contentsOf` /
  `toFile`), prefer it over the function callee text. Applies whether the
  function is a bare `simple_identifier` (constructor) or a
  `navigation_expression` (method chain).
- `is_high_confidence_label` whitelist matches `classify_sink`'s new
  Swift entries below; spurious match on non-file labels avoided.

classify_sink HIGH list extended:
- HIGH reads: `contentsOfFile`, `contentsOf` (String / NSString /
  Data initialisers)
- HIGH writes: `toFile` (String / Data `write(toFile:...)`)

Tests:
- `function_with_read_sink` — assertion tightened from no-sink-check to
  `sink:read|confidence:high`.
- `labelled_arg_constructor_contentsoffile` (new).
- `navigation_chain_write_tofile` (new).

Full ecp-analyzer suite continues passing (no regression).

* feat(path-literal): Rust chained-call sink promotion (FU-2026-05-23-023, rust)

Closes the 4th and final lang for FU-023. Mirrors the Kotlin / Python
chain promotion pattern for Rust's `File::open + chain` idiom:

```rust
File::open("session_meta.json").unwrap().read_to_string(&mut s).unwrap();
File::create("output.json").unwrap().write_all(data).unwrap();
```

Both used to surface as `sink:open-read|medium` / `sink:open-write|high`
(stuck on the constructor leg); now promote to `sink:read|high` and
`sink:write|high` respectively.

Rust AST chain shape (from spike):
```
string_literal
  ↑ arguments
  ↑ call_expression (inner: File::open(...))
  ↑ field_expression (.unwrap)              ← adapter step
  ↑ call_expression (.unwrap())
  ↑ field_expression (.read_to_string)      ← terminal step
  ↑ call_expression (.read_to_string(...))  ← TERMINAL
  ↑ ... (further .unwrap()/.expect() ignored)
```

Differs from Kotlin / Python — Rust uses `field_expression` (with
`field_identifier` as the method name) instead of Kotlin's
`navigation_expression` / Python's `attribute`. `.unwrap()` /
`.expect()` / `.ok()` etc. are real chain links rather than implicit
in the language; the walk steps THROUGH them via the new
`is_transparent_adapter` helper rather than terminating.

Whitelist: `read_to_string`, `read_to_end`, `read_text`, `write_all`,
`write_text`, plus the four `with_*_extension` / `set_*_file_name`
ext-change names already used by FU-024. No new entries needed in
`classify_sink` — Rust idioms were already covered there.

Tests: `file_open_chain_with_unwrap_promotes_to_read_high` +
`file_create_chain_write_all_promotes_to_write_high`. Full ecp-analyzer
suite passes (12 / 12 in rust_path_literals; no regression).

* feat(path-literal): C++ chained-call sink promotion (FU-2026-05-23-023, c++)

C++ chain idiom turns out structurally identical to Rust:
`call_expression > field_expression > call_expression` with a
`field_identifier` carrying the method name. Earlier judgement that
"constructor temporaries can't be chained" was wrong — C++11+ permits
rvalue method calls, and `std::filesystem::path(...).replace_extension(...)`
is a real C++17 idiom (not just a theoretical chain).

```cpp
std::filesystem::path("config.json").replace_extension("toml")
//                    ^ "config.json"                ^ "toml"
//                    inner ctor                      terminal ext-change
```

Without promotion, "config.json" classified as `sink:join|medium`
(constructor) and "toml" was dropped entirely (too short for
`is_path_shaped`). After promotion both reach `sink:ext-change|high`:
- chain walk promotes "config.json" to `replace_extension`
- FU-024 `is_ext_change_callee` sink-override accepts the short "toml"
  because the call is a known ext-change op

`classify_sink` ExtChange list + `is_ext_change_callee` whitelist
extended:
- `replace_extension` (C++17 `std::filesystem::path::replace_extension`)
- `replace_filename` (mirror — `replace_filename` is the path-component
  equivalent of `set_file_name`)

Stream `ifstream/ofstream` `.read()` / `.write()` intentionally not
promoted — they're MEDIUM-confidence by design (overloaded with
non-file stream IO).

Tests: `filesystem_path_chain_replace_extension_promotes_to_ext_change_high`
asserts both the primary-path-shaped literal AND the short ext literal
surface as `sink:ext-change|confidence:high`. 14-lang path_literals
fixtures still pass — `replace_extension` was previously absent from the
classifier, so no existing test exercised the (now-classified) name.

* chore(pr-analyze): allow dead_code on ImpactJson::changed_files

PR #390 changed `run()` to use `git_diff_files()` for area classification
(comment-only / whitespace-only diff coverage), which left
`ImpactJson::changed_files` without an in-crate caller. The struct-level
`#[allow(dead_code)]` doesn't cascade to methods, so clippy `-D warnings`
in the pre-push hook flags this method.

The method itself is intentionally retained as a library-API derived
view alongside `impact_set_names` / `changed_symbol_names` — see the doc
comment for the rationale. Add a method-local `#[allow(dead_code)]` so
the deferred-API status is encoded in the source rather than relying on
the struct attribute (which a reader could miss).
coseto6125 added a commit that referenced this pull request May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of
__pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125 added a commit that referenced this pull request May 23, 2026
Replaces previous test commit that conflicted with #390 refactor of
__pending_analysis__ region. Same purpose: trigger area-cli routing.
coseto6125 added a commit that referenced this pull request May 25, 2026
Replaces previous test commit that conflicted with #390 refactor of
__pending_analysis__ region. Same purpose: trigger area-cli routing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:run fire CI test matrix on this PR ecp:area-cli ecp signal ecp:risk-low ecp signal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant