Skip to content

Commit 0be1402

Browse files
committed
feat(res-to-affine): Phase 2c — walker covers all 6 anti-patterns + default-engine flip (Refs #57)
Brings the tree-sitter walker to full parity with the Phase-1 regex scanner on the three remaining ported kinds (raw-js, untyped-exception, mutable-global), adds the two anti-patterns that were explicitly deferred from Phase 1 because they need real AST (inline-callback-record, oversized-function), and flips --engine=walker to the default in main.ml. Scope (per PR #322's "What is NOT in this PR" / Phase 2c plan): - raw-js: any extension_expression node (covers %raw / %bs.raw). - untyped-exception: try_expression, call_expression with function text "raise", and member_expression / value_identifier_path whose text starts with "Js.Exn" or ends with "Promise.catch". - mutable-global: top-level let_declaration whose let_binding body is a call to ref(), plus top-level mutation_expression (:=). The "at module top level" predicate now walks the ancestor chain outward, refusing if a function or let_binding body intervenes before source_file / module_declaration. - inline-callback-record: record literal or call_expression with ≥3 inline function values (directly, or wrapped in labeled_argument / record_field). Threshold matches LESSONS.md. - oversized-function: function node whose stop.row - start.row +1 exceeds 50 source rows. Findings deduped by (kind, line) to match the regex scanner's "one match per regex per line" contract. CLI default flips to --engine=walker; the walker auto-falls-back to the scanner if the vendored grammar is missing or tree-sitter parse fails (unchanged fallback path from #322). Tests - test_walker.ml now exercises every kind on the existing fixtures/sample.res (raw-js line 11, mutable-global lines 14+15, untyped-exception lines 19/22/28). All gated on tree-sitter + vendored grammar being present, same skip discipline as Phase 2b. - New fixtures/phase2c.res exercises the two walker-only kinds: a 4-field handlers record + a Widget.make call with 3 labelled- argument lambdas (inline-callback-record), plus a let-bound function spanning 60 source rows (oversized-function). Docs - tools/res-to-affine/README.md updated: walker is the default, coverage matrix shows ✓ for all six kinds, what-gets-flagged table replaces the old Phase-1/Phase-2 split. - walker.mli docstring updated for Phase 2c scope. - scanner.mli kind type extended with Inline_callback_record and Oversized_function; scanner.ml gives them labels + guidance. The scanner.ml scan pipeline is unchanged (these two kinds are walker-only by construction). Test plan - [ ] CI build green (dune build + dune runtest). - [ ] CI migration-assistant job green. - [ ] Local: just install-grammar && dune runtest tools/res-to-affine/ reports all walker tests OK. Local-build caveat: container has no OCaml toolchain (per CLAUDE.md §Agent operations notes); dune build / dune runtest were not run locally. CI is the source of truth. Refs #57
1 parent b89eb1a commit 0be1402

8 files changed

Lines changed: 529 additions & 122 deletions

File tree

tools/res-to-affine/README.md

Lines changed: 55 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -16,14 +16,14 @@ and the broader `idaptik` migration.
1616
## Usage
1717

1818
```sh
19-
# print skeleton to stdout (default: regex scanner)
19+
# print skeleton to stdout (default: tree-sitter AST walker, Phase 2c)
2020
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res
2121

2222
# or write to a file
2323
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res -o Foo.affine
2424

25-
# opt into the Phase-2 tree-sitter AST walker
26-
dune exec tools/res-to-affine/main.exe -- --engine=walker path/to/Foo.res
25+
# opt back into the Phase-1 line-regex scanner (no grammar required)
26+
dune exec tools/res-to-affine/main.exe -- --engine=scanner path/to/Foo.res
2727
```
2828

2929
The output is **not compilable**. It is a starting point for the human:
@@ -36,8 +36,8 @@ needs re-decomposing.
3636

3737
| `--engine` | Implementation | When to use |
3838
|---|---|---|
39-
| `scanner` (default) | Line-anchored regex over the raw source (`scanner.ml`). | Default — no prerequisites, ships with the binary. |
40-
| `walker` | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | When the regex's false-positive surface matters — eliminates the `let _ = chained.call()` class of misfire that the column-0 anchor in #319 had to band-aid. |
39+
| `walker` (default) | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | Default since Phase 2c — covers all six anti-patterns including the two that the scanner cannot see (inline callback records, oversized functions) and eliminates the `let _ = chained.call()` / line-anchored false-positive classes. |
40+
| `scanner` | Line-anchored regex over the raw source (`scanner.ml`). | Fallback when the vendored grammar is unavailable (no `tree-sitter` CLI, missing `tools/vendor/tree-sitter-rescript/`). Detects four of the six anti-patterns only. |
4141

4242
The walker requires the vendored `tree-sitter-rescript` grammar to be
4343
built first:
@@ -50,42 +50,37 @@ just install-grammar
5050
If the grammar isn't built or the `tree-sitter` CLI isn't on PATH, the
5151
walker auto-falls-back to the scanner and prints the reason to stderr.
5252

53-
## What gets flagged (Phase 1)
53+
## What gets flagged
5454

5555
The six anti-patterns surfaced in the
56-
[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md),
57-
of which the line-based scanner reliably detects four:
56+
[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md):
5857

59-
| Tag | Detection | AffineScript answer |
58+
| Tag | Detection (walker, default) | AffineScript answer |
6059
|---|---|---|
61-
| `side-effect-import` | `let _ = Mod.foo` at top level | Explicit registration call |
62-
| `raw-js` | `%raw(...)` or `[%bs.raw ...]` | Typed extern (`ABI-FFI-README.md`) |
63-
| `untyped-exception` | `Promise.catch`, `Js.Exn`, `raise`, `try` | `Result[E, A]` / `Validation[E, A]` |
64-
| `mutable-global` | `:=` operator | Affine record threaded through |
60+
| `side-effect-import` | `let _ = Mod.foo` at module top level (structural — not nested inside a function body) | Explicit registration call |
61+
| `raw-js` | `extension_expression` node — any `%name(...)` or `[%bs.name ...]` | Typed extern (`ABI-FFI-README.md`) |
62+
| `untyped-exception` | `try_expression`, `raise(...)` call, `Js.Exn.*` reference, `Promise.catch` member access | `Result[E, A]` / `Validation[E, A]` |
63+
| `mutable-global` | Top-level `let x = ref(...)` (call-of-`ref` body) OR top-level `mutation_expression` (`x := y`) | Affine record threaded through |
64+
| `inline-callback-record` | ≥ 3 inline `function` values in one `record` literal OR one call's `arguments` list (via `labeled_argument` or direct) | Row-polymorphic handler record (LESSONS.md §callback-record) |
65+
| `oversized-function` | `function` node whose row span exceeds 50 source lines | Re-decompose before porting; do not transliterate |
6566

66-
Deferred to Phase 2 (need real AST):
67-
68-
- **inline lambda callback record** — N ≥ 3 `~handler: (...) =>` lambdas
69-
inside one record literal (collapse to a row-polymorphic record).
70-
- **oversized function** — function body > ~50 LOC (decompose).
71-
72-
### Walker coverage (Phase 2)
67+
### Walker vs scanner coverage
7368

7469
| Anti-pattern | Scanner (regex) | Walker (AST) |
7570
|---|---|---|
76-
| `side-effect-import` || Phase 2b |
77-
| `raw-js` || Phase 2c |
78-
| `untyped-exception` || Phase 2c |
79-
| `mutable-global` || Phase 2c |
80-
| inline lambda callback record || Phase 2c |
81-
| oversized function || Phase 2c |
82-
83-
The walker improves on the regex by being structural: it only reports
84-
`side-effect-import` when `let _ = Mod.value` sits at module top level
85-
(direct child of `source_file` or a `module_declaration` body), not when
86-
the same shape appears inside a function body — where it is normally a
87-
ReScript "discard the return value of a chained call" idiom, not a
88-
module-load side effect.
71+
| `side-effect-import` ||(since Phase 2b, #322) |
72+
| `raw-js` || ✓ (since Phase 2c) |
73+
| `untyped-exception` || ✓ (since Phase 2c) |
74+
| `mutable-global` || ✓ (since Phase 2c) |
75+
| `inline-callback-record` || ✓ (since Phase 2c, walker-only by construction) |
76+
| `oversized-function` || ✓ (since Phase 2c, walker-only by construction) |
77+
78+
The walker improves on the regex by being structural: it reports
79+
`side-effect-import` only when `let _ = Mod.value` sits at module top
80+
level, distinguishes a `try { ... }` expression from the identifier
81+
`try`, only flags `Mutable_global` for module-scoped state (not local
82+
refs inside a function body), and dedupes structurally-overlapping
83+
findings on the same line.
8984

9085
## Why a skeleton and not a transliteration
9186

@@ -119,18 +114,27 @@ to tree-sitter in Phase 2 behind something that already pays its way.
119114

120115
### Phase 2 — tree-sitter AST walker
121116

122-
- Install the pinned grammar from
123-
`editors/tree-sitter-rescript/` (manifest-only vendoring of
124-
`rescript-lang/tree-sitter-rescript@990214a`).
125-
- Replace `Scanner` with a walker over the s-expression output of
126-
`tree-sitter parse --quiet`, parsed by the existing `sexplib0`
127-
dependency.
128-
- Adds the two deferred patterns (callback records, oversized
129-
functions) and unlocks **structural** translation of trivial forms
130-
(e.g. `option<X>``Option[X]`, `result<X, Y>``Result[Y, X]`,
131-
`switch x { | A => ... }``match x { A => ... }`).
132-
- The `Emitter` interface does not change: same skeleton shape, same
133-
marker schema, richer body.
117+
Vendoring of the pinned grammar
118+
(`rescript-lang/tree-sitter-rescript@990214a`) lives in
119+
`editors/tree-sitter-rescript/`; `install.sh` materialises the
120+
parser into `tools/vendor/tree-sitter-rescript/`.
121+
122+
- **Phase 2a (#321)**`just install-grammar`, the
123+
`migration-assistant` CI job that runs it, dual install path
124+
(`cargo install tree-sitter-cli` or `npm install -g
125+
tree-sitter-cli`).
126+
- **Phase 2b (#322)** — the walker itself: subprocess to the
127+
`tree-sitter` CLI, hand-rolled s-expression parser over the
128+
default `[row, col]`-annotated output, AST-based detection of
129+
`side-effect-import` only.
130+
- **Phase 2c (this revision)** — walker covers all six anti-patterns
131+
including the two that the scanner cannot see; `--engine=walker`
132+
becomes the CLI default. The `Emitter` interface does not change;
133+
the marker schema is the same.
134+
135+
Walker output is deduplicated by `(kind, line)` so structurally-
136+
overlapping AST matches don't inflate the bullet count above what
137+
the line-based scanner would produce on the same file.
134138

135139
### Phase 3 — partial translation
136140

@@ -166,9 +170,12 @@ cd tools/res-to-affine/test
166170
```
167171

168172
The fixture under `test/fixtures/sample.res` is synthetic and exercises
169-
every Phase-1 anti-pattern. Real `.res` files from the estate (e.g.
170-
`gitbot-fleet/bots/sustainabot/bot-integration/src/*.res`) can be run
171-
ad hoc through the CLI without changes to the test suite.
173+
every Phase-1 anti-pattern; `test/fixtures/phase2c.res` exercises the
174+
two anti-patterns that are walker-only by construction
175+
(`inline-callback-record`, `oversized-function`). Real `.res` files
176+
from the estate (e.g. `gitbot-fleet/bots/sustainabot/bot-integration/
177+
src/*.res`) can be run ad hoc through the CLI without changes to the
178+
test suite.
172179

173180
## Non-goals
174181

tools/res-to-affine/main.ml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -87,16 +87,16 @@ let output_arg =
8787

8888
let engine_arg =
8989
let doc =
90-
"Detection engine: 'scanner' (default, line-regex, Phase 1) or \
91-
'walker' (tree-sitter AST, Phase 2). The walker requires the \
90+
"Detection engine: 'walker' (default, tree-sitter AST, Phase 2c) \
91+
or 'scanner' (line-regex, Phase 1). The walker requires the \
9292
vendored grammar to be built — see `just install-grammar`. \
9393
Falls back to 'scanner' if the grammar is missing or \
9494
tree-sitter parse fails."
9595
in
9696
Cmdliner.Arg.(
9797
value & opt
9898
(enum ["scanner", Scanner_engine; "walker", Walker_engine])
99-
Scanner_engine &
99+
Walker_engine &
100100
info ["engine"] ~docv:"ENGINE" ~doc)
101101

102102
let grammar_dir_arg =

tools/res-to-affine/scanner.ml

Lines changed: 16 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,12 +6,16 @@ type kind =
66
| Raw_js
77
| Untyped_exception
88
| Mutable_global
9+
| Inline_callback_record
10+
| Oversized_function
911

1012
let kind_to_label = function
11-
| Side_effect_import -> "side-effect-import"
12-
| Raw_js -> "raw-js"
13-
| Untyped_exception -> "untyped-exception"
14-
| Mutable_global -> "mutable-global"
13+
| Side_effect_import -> "side-effect-import"
14+
| Raw_js -> "raw-js"
15+
| Untyped_exception -> "untyped-exception"
16+
| Mutable_global -> "mutable-global"
17+
| Inline_callback_record -> "inline-callback-record"
18+
| Oversized_function -> "oversized-function"
1519

1620
let kind_to_guidance = function
1721
| Side_effect_import ->
@@ -29,6 +33,14 @@ let kind_to_guidance = function
2933
"Top-level mutable global. AffineScript does not encourage \
3034
module-scoped mutation; pass state as an affine record or \
3135
through an effect handler."
36+
| Inline_callback_record ->
37+
"3+ inline callbacks in one record/call. Lift to a row-polymorphic \
38+
handler record (see LESSONS.md §callback-record) so each handler \
39+
is named and individually overridable."
40+
| Oversized_function ->
41+
"Function spans >50 source lines. Re-decompose before porting; a \
42+
direct translation will preserve the size and the implicit \
43+
contract it bakes in."
3244

3345
type finding = {
3446
kind : kind;

tools/res-to-affine/scanner.mli

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -10,21 +10,24 @@
1010
using the vendored [editors/tree-sitter-rescript] grammar. The
1111
[Emitter] interface is stable across the two implementations.
1212
13-
Patterns detected today:
13+
Patterns detected today by the Phase-1 scanner (line/regex):
1414
- {b side-effect import} : [let _ = Mod.foo] (ReScript module-load hack)
1515
- {b raw JS} : any line containing [%raw] (typed FFI required)
1616
- {b untyped exception} : [Promise.catch], [Js.Exn], [raise], [try]
1717
- {b mutable global} : top-level [ref] or [:=] assignment
1818
19-
Deferred to Phase 2 (need real AST):
20-
- {b inline lambda callback record} : N>=3 [~handler: (...) =>] in a record
21-
- {b oversized function} : function body >50 LOC *)
19+
Additional kinds detected by the Phase-2 walker only (need real AST):
20+
- {b inline callback record} : N>=3 inline function values in a record
21+
literal or a single call's labelled-argument list
22+
- {b oversized function} : function spans more than 50 source lines *)
2223

2324
type kind =
2425
| Side_effect_import
2526
| Raw_js
2627
| Untyped_exception
2728
| Mutable_global
29+
| Inline_callback_record
30+
| Oversized_function
2831

2932
val kind_to_label : kind -> string
3033
(** Short tag used in emitted comment markers (e.g. ["side-effect-import"]). *)
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
// SPDX-License-Identifier: MIT
2+
// Synthetic fixture for the two anti-patterns that were explicitly
3+
// deferred from Phase 1 entirely because they need real AST:
4+
// 1. inline-callback-record — 3+ inline function values in a single
5+
// record literal or a single call's labelled-argument list
6+
// 2. oversized-function — function spans more than 50 source rows
7+
// Walker-only by construction; the line-regex scanner does not detect
8+
// either pattern.
9+
10+
open Types
11+
12+
// --- inline-callback-record: a record literal with 4 inline lambdas
13+
let handlers = {
14+
onMount: () => Js.log("mounted"),
15+
onUnmount: () => Js.log("unmounted"),
16+
onClick: e => Js.log(e),
17+
onHover: e => Js.log(e),
18+
}
19+
20+
// --- inline-callback-record at a call site: 3 labelled-argument lambdas
21+
let _ = Widget.make(
22+
~onMount=() => Js.log("mounted"),
23+
~onUnmount=() => Js.log("unmounted"),
24+
~onClick=e => Js.log(e),
25+
)
26+
27+
// --- oversized-function: 60 source-row span. Body intentionally tedious
28+
// to surface the row-span proxy the walker uses.
29+
let huge = id => {
30+
let a01 = id + 1
31+
let a02 = a01 + 1
32+
let a03 = a02 + 1
33+
let a04 = a03 + 1
34+
let a05 = a04 + 1
35+
let a06 = a05 + 1
36+
let a07 = a06 + 1
37+
let a08 = a07 + 1
38+
let a09 = a08 + 1
39+
let a10 = a09 + 1
40+
let a11 = a10 + 1
41+
let a12 = a11 + 1
42+
let a13 = a12 + 1
43+
let a14 = a13 + 1
44+
let a15 = a14 + 1
45+
let a16 = a15 + 1
46+
let a17 = a16 + 1
47+
let a18 = a17 + 1
48+
let a19 = a18 + 1
49+
let a20 = a19 + 1
50+
let a21 = a20 + 1
51+
let a22 = a21 + 1
52+
let a23 = a22 + 1
53+
let a24 = a23 + 1
54+
let a25 = a24 + 1
55+
let a26 = a25 + 1
56+
let a27 = a26 + 1
57+
let a28 = a27 + 1
58+
let a29 = a28 + 1
59+
let a30 = a29 + 1
60+
let a31 = a30 + 1
61+
let a32 = a31 + 1
62+
let a33 = a32 + 1
63+
let a34 = a33 + 1
64+
let a35 = a34 + 1
65+
let a36 = a35 + 1
66+
let a37 = a36 + 1
67+
let a38 = a37 + 1
68+
let a39 = a38 + 1
69+
let a40 = a39 + 1
70+
let a41 = a40 + 1
71+
let a42 = a41 + 1
72+
let a43 = a42 + 1
73+
let a44 = a43 + 1
74+
let a45 = a44 + 1
75+
let a46 = a45 + 1
76+
let a47 = a46 + 1
77+
let a48 = a47 + 1
78+
let a49 = a48 + 1
79+
let a50 = a49 + 1
80+
let a51 = a50 + 1
81+
let a52 = a51 + 1
82+
a52
83+
}

0 commit comments

Comments
 (0)