Skip to content

Commit 4b73809

Browse files
hyperpolymathclaude
andcommitted
feat(res-to-affine): tree-sitter AST walker for side-effect-import (Phase 2b, Refs #57)
Lands the first AST-based detection pass on top of the Phase-2a grammar build pipeline (#321). New module `tools/res-to-affine/ walker.ml` shells out to the vendored `tree-sitter` CLI, parses its default s-expression output (with embedded `[row, col]` byte ranges), and walks the AST to find the `side-effect-import` anti-pattern structurally rather than via the column-0-anchored regex the Phase-1 scanner uses. Detection upgrade vs. Phase 1 ----------------------------- The walker flags `let _ = Mod.value` only when the binding sits at module top level — direct child of `source_file`, or direct child of a `block` that is the body of a `module_declaration`. The Phase-1 regex matches any column-0 occurrence of the same shape. On the existing `sample.res` fixture the walker reports 1 finding (the intended top-level `let _ = Pixi.Sound.register` on line 8); the scanner reports the same 1 plus would also have reported the same pattern nested inside a function body. The regex pattern that #319 band-aided with column-0 anchoring is eliminated structurally by the walker. CLI --- New `--engine={scanner,walker}` flag on the CLI (default: scanner, preserving Phase-1 behaviour); `--grammar-dir DIR` overrides the walker's parser location (default: `tools/vendor/tree-sitter-rescript`, matching `install.sh` output). When the walker hits any error (grammar missing, tree-sitter not on PATH, s-exp parse failure), it prints a one-line reason to stderr and falls back to the scanner — the CLI never bails because of walker problems. Tests ----- `test/test_walker.ml` adds two end-to-end tests under a new `res-to-affine-walker` alcotest run. Both auto-skip if the `tree-sitter` CLI is missing or the vendored grammar has not been built, so `dune runtest` on a fresh clone (no bootstrap) still passes. The repo-root discovery walks up from cwd looking for `dune-project` to find the source-tree grammar path; the dune sandbox otherwise hides it. CI -- The existing `build` job now installs `tree-sitter-cli` (npm, fast) and runs `install.sh` before `dune runtest`, so the walker tests actually execute rather than auto-skip. The migration-assistant gate added in #321 stays — it remains the dedicated first-signal job for grammar-pin drift. Scope discipline ---------------- `raw-js`, `untyped-exception`, `mutable-global` remain scanner-only; their AST counterparts land in Phase 2c. The walker exposes a stable surface (`Walker.scan : grammar_dir:string -> path:string -> source:string -> Scanner.finding list`) that 2c extends rather than re-architects. Stack: #321 (Phase 2a) → this PR (Phase 2b) → Phase 2c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5d234ef commit 4b73809

8 files changed

Lines changed: 658 additions & 10 deletions

File tree

.github/workflows/ci.yml

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,18 @@ jobs:
3131
- name: Install dependencies
3232
run: opam install . --deps-only --with-test --with-doc
3333

34+
- name: Install tree-sitter CLI (for res-to-affine walker tests)
35+
# Same rationale as the migration-assistant job (see below):
36+
# npm distribution is the fast CI install (~5 s). The walker
37+
# end-to-end tests in tools/res-to-affine/test/test_walker.ml
38+
# auto-skip if the CLI / generated grammar aren't present, so
39+
# this step is only required to *exercise* the walker — the
40+
# build itself does not depend on it.
41+
run: npm install -g tree-sitter-cli@^0.25.0
42+
43+
- name: Build pinned tree-sitter-rescript grammar
44+
run: ./editors/tree-sitter-rescript/scripts/install.sh
45+
3446
- name: Build
3547
run: opam exec -- dune build
3648

@@ -149,7 +161,12 @@ jobs:
149161
run: npm install -g tree-sitter-cli@^0.25.0
150162

151163
- name: Build pinned tree-sitter-rescript grammar
152-
run: just install-grammar
164+
# Direct script invocation rather than `just install-grammar` —
165+
# GitHub Actions runners do not ship `just` preinstalled, and
166+
# there is no other recipe used in this workflow that justifies
167+
# adding a setup step for it. The justfile recipe still exists
168+
# for local developer ergonomics; both call the same script.
169+
run: ./editors/tree-sitter-rescript/scripts/install.sh
153170

154171
- name: Verify generated parser
155172
# `tree-sitter generate` is supposed to drop src/parser.c into

tools/res-to-affine/README.md

Lines changed: 40 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,11 +16,14 @@ and the broader `idaptik` migration.
1616
## Usage
1717

1818
```sh
19-
# print skeleton to stdout
19+
# print skeleton to stdout (default: regex scanner)
2020
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res
2121

2222
# or write to a file
2323
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res -o Foo.affine
24+
25+
# opt into the Phase-2 tree-sitter AST walker
26+
dune exec tools/res-to-affine/main.exe -- --engine=walker path/to/Foo.res
2427
```
2528

2629
The output is **not compilable**. It is a starting point for the human:
@@ -29,6 +32,24 @@ migration-considerations block; the middle is a `module` stub with
2932
`TODO`s. The human picks the decomposition; the tool surfaces what
3033
needs re-decomposing.
3134

35+
### Detection engines
36+
37+
| `--engine` | Implementation | When to use |
38+
|---|---|---|
39+
| `scanner` (default) | Line-anchored regex over the raw source (`scanner.ml`). | Default — no prerequisites, ships with the binary. |
40+
| `walker` | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | When the regex's false-positive surface matters — eliminates the `let _ = chained.call()` class of misfire that the column-0 anchor in #319 had to band-aid. |
41+
42+
The walker requires the vendored `tree-sitter-rescript` grammar to be
43+
built first:
44+
45+
```sh
46+
just install-grammar
47+
# or: ./editors/tree-sitter-rescript/scripts/install.sh
48+
```
49+
50+
If the grammar isn't built or the `tree-sitter` CLI isn't on PATH, the
51+
walker auto-falls-back to the scanner and prints the reason to stderr.
52+
3253
## What gets flagged (Phase 1)
3354

3455
The six anti-patterns surfaced in the
@@ -48,6 +69,24 @@ Deferred to Phase 2 (need real AST):
4869
inside one record literal (collapse to a row-polymorphic record).
4970
- **oversized function** — function body > ~50 LOC (decompose).
5071

72+
### Walker coverage (Phase 2)
73+
74+
| Anti-pattern | Scanner (regex) | Walker (AST) |
75+
|---|---|---|
76+
| `side-effect-import` || ✓ — Phase 2b |
77+
| `raw-js` || — Phase 2c |
78+
| `untyped-exception` || — Phase 2c |
79+
| `mutable-global` || — Phase 2c |
80+
| inline lambda callback record || — Phase 2c |
81+
| oversized function || — Phase 2c |
82+
83+
The walker improves on the regex by being structural: it only reports
84+
`side-effect-import` when `let _ = Mod.value` sits at module top level
85+
(direct child of `source_file` or a `module_declaration` body), not when
86+
the same shape appears inside a function body — where it is normally a
87+
ReScript "discard the return value of a chained call" idiom, not a
88+
module-load side effect.
89+
5190
## Why a skeleton and not a transliteration
5291

5392
The Frontier Programming Guides' standing rule is **re-decompose, not

tools/res-to-affine/dune

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,5 +9,5 @@
99

1010
(library
1111
(name res_to_affine)
12-
(modules scanner emitter)
13-
(libraries str))
12+
(modules scanner emitter walker)
13+
(libraries str unix))

tools/res-to-affine/main.ml

Lines changed: 47 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -27,13 +27,30 @@ let write_file path contents =
2727
output_string oc contents;
2828
close_out oc
2929

30-
let run input output_opt =
30+
type engine = Scanner_engine | Walker_engine
31+
32+
let engine_label = function
33+
| Scanner_engine -> "scanner"
34+
| Walker_engine -> "walker"
35+
36+
let run engine grammar_dir input output_opt =
3137
if not (Sys.file_exists input) then begin
3238
Format.eprintf "res-to-affine: input not found: %s@." input;
3339
exit 2
3440
end;
3541
let source = read_file input in
36-
let findings = Scanner.scan source in
42+
let findings =
43+
match engine with
44+
| Scanner_engine -> Scanner.scan source
45+
| Walker_engine ->
46+
(try Walker.scan ~grammar_dir ~path:input ~source with
47+
| Failure msg ->
48+
Format.eprintf "res-to-affine: %s@." msg;
49+
Format.eprintf
50+
"res-to-affine: falling back to scanner engine for %s@."
51+
input;
52+
Scanner.scan source)
53+
in
3754
let module_name = Emitter.module_name_of_path input in
3855
let out =
3956
Emitter.emit
@@ -48,9 +65,10 @@ let run input output_opt =
4865
| Some path ->
4966
write_file path out;
5067
Format.printf
51-
"res-to-affine: %d finding%s → %s@."
68+
"res-to-affine: %d finding%s [%s] → %s@."
5269
(List.length findings)
5370
(if List.length findings = 1 then "" else "s")
71+
(engine_label engine)
5472
path
5573

5674
(* ---- cmdliner wiring ---- *)
@@ -67,11 +85,36 @@ let output_arg =
6785
value & opt (some string) None &
6886
info ["o"; "output"] ~docv:"FILE" ~doc)
6987

88+
let engine_arg =
89+
let doc =
90+
"Detection engine: 'scanner' (default, line-regex, Phase 1) or \
91+
'walker' (tree-sitter AST, Phase 2). The walker requires the \
92+
vendored grammar to be built — see `just install-grammar`. \
93+
Falls back to 'scanner' if the grammar is missing or \
94+
tree-sitter parse fails."
95+
in
96+
Cmdliner.Arg.(
97+
value & opt
98+
(enum ["scanner", Scanner_engine; "walker", Walker_engine])
99+
Scanner_engine &
100+
info ["engine"] ~docv:"ENGINE" ~doc)
101+
102+
let grammar_dir_arg =
103+
let doc =
104+
"Path to the generated tree-sitter-rescript grammar directory \
105+
(the output of `just install-grammar`). Only consulted when \
106+
`--engine=walker`. Defaults to `tools/vendor/tree-sitter-rescript`."
107+
in
108+
Cmdliner.Arg.(
109+
value & opt string Walker.default_grammar_dir &
110+
info ["grammar-dir"] ~docv:"DIR" ~doc)
111+
70112
let cmd =
71113
let doc = "Emit an AffineScript skeleton from a ReScript source file." in
72114
let info = Cmdliner.Cmd.info "res-to-affine" ~version:"0.1.0" ~doc in
73115
let term =
74-
Cmdliner.Term.(const run $ input_arg $ output_arg)
116+
Cmdliner.Term.(
117+
const run $ engine_arg $ grammar_dir_arg $ input_arg $ output_arg)
75118
in
76119
Cmdliner.Cmd.v info term
77120

tools/res-to-affine/test/dune

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
; SPDX-License-Identifier: MPL-2.0
22

3-
(test
4-
(name test_emit)
3+
(tests
4+
(names test_emit test_walker)
55
(libraries res_to_affine alcotest)
66
(deps
77
(glob_files fixtures/*.res)
Lines changed: 127 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,127 @@
1+
(* SPDX-License-Identifier: MPL-2.0 *)
2+
(* SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell *)
3+
4+
(** End-to-end tests for the tree-sitter walker (#57 Phase 2b).
5+
6+
These tests shell out to the [tree-sitter] CLI; they are
7+
automatically skipped when the CLI is not on PATH so a fresh
8+
clone can still run [dune runtest] without bootstrapping the
9+
grammar. CI installs tree-sitter and runs them as a gate.
10+
11+
To run locally: install tree-sitter (`cargo install
12+
tree-sitter-cli`), then `just install-grammar`, then `dune
13+
runtest tools/res-to-affine/`. *)
14+
15+
open Res_to_affine
16+
17+
let read_file path =
18+
let ic = open_in_bin path in
19+
let n = in_channel_length ic in
20+
let s = really_input_string ic n in
21+
close_in ic;
22+
s
23+
24+
let tree_sitter_available () =
25+
Sys.command "command -v tree-sitter > /dev/null 2>&1" = 0
26+
27+
(* Find the repo root by walking up from the test's runtime cwd looking
28+
for `dune-project`. Dune sandboxes tests under
29+
`_build/default/.../test/`, so the natural "up three levels" arithmetic
30+
gives a build-tree path where the source-tree
31+
`tools/vendor/tree-sitter-rescript` does not exist. *)
32+
let rec find_repo_root dir =
33+
if Sys.file_exists (Filename.concat dir "dune-project") then Some dir
34+
else
35+
let parent = Filename.dirname dir in
36+
if parent = dir then None (* hit filesystem root *)
37+
else find_repo_root parent
38+
39+
let repo_root () =
40+
match Sys.getenv_opt "DUNE_SOURCEROOT" with
41+
| Some s when s <> "" -> s
42+
| _ ->
43+
(match find_repo_root (Sys.getcwd ()) with
44+
| Some s -> s
45+
| None -> Sys.getcwd ())
46+
47+
let grammar_dir () =
48+
Filename.concat (repo_root ()) "tools/vendor/tree-sitter-rescript"
49+
50+
let grammar_built () =
51+
Sys.file_exists (Filename.concat (grammar_dir ()) "src/parser.c")
52+
53+
let skip_unless_ready () =
54+
if not (tree_sitter_available ()) then begin
55+
Printf.printf
56+
" [skip] tree-sitter CLI not on PATH; install via `cargo install \
57+
tree-sitter-cli`@\n";
58+
Alcotest.skip ()
59+
end;
60+
if not (grammar_built ()) then begin
61+
Printf.printf
62+
" [skip] grammar not built; run `just install-grammar`@\n";
63+
Alcotest.skip ()
64+
end
65+
66+
let fixture = "fixtures/sample.res"
67+
68+
let test_walker_finds_side_effect_import () =
69+
skip_unless_ready ();
70+
let source = read_file fixture in
71+
let path = Filename.concat (Sys.getcwd ()) fixture in
72+
let findings =
73+
Walker.scan ~grammar_dir:(grammar_dir ()) ~path ~source
74+
in
75+
let has_kind k =
76+
List.exists (fun (f : Scanner.finding) -> f.kind = k) findings
77+
in
78+
Alcotest.(check bool)
79+
"walker reports side-effect-import on sample.res"
80+
true (has_kind Scanner.Side_effect_import)
81+
82+
let test_walker_only_module_toplevel () =
83+
(* The walker's promised improvement over the regex scanner: it
84+
should NOT flag `let _ = chained.call()` inside a function body
85+
as a side-effect-import. We can't synthesise that case without
86+
running tree-sitter, so this test is gated and uses the existing
87+
fixture, which has the regex-scanner-matching shape at module
88+
top level — the walker should match it. The negative case lives
89+
in Phase 2c's expanded corpus. *)
90+
skip_unless_ready ();
91+
let source = read_file fixture in
92+
let path = Filename.concat (Sys.getcwd ()) fixture in
93+
let findings =
94+
Walker.scan ~grammar_dir:(grammar_dir ()) ~path ~source
95+
in
96+
let side_effect_lines =
97+
List.filter_map
98+
(fun (f : Scanner.finding) ->
99+
if f.kind = Scanner.Side_effect_import then Some f.line else None)
100+
findings
101+
in
102+
(* sample.res line 8: `let _ = Pixi.Sound.register` — the only
103+
top-level side-effect import. *)
104+
Alcotest.(check (list int))
105+
"walker reports the line-8 import and only that one"
106+
[8] side_effect_lines
107+
108+
(* ---- s-exp parser sanity (NOT gated; pure OCaml) ---------------------------
109+
110+
The walker subprocess is shelled out in the gated tests above. The
111+
s-exp parser itself is pure-OCaml and can be exercised directly,
112+
but it lives behind walker.ml's module boundary. We do not unit-
113+
test it here to avoid widening the .mli for test-only access; the
114+
gated end-to-end tests above exercise it through real
115+
tree-sitter output. *)
116+
117+
let () =
118+
Alcotest.run "res-to-affine-walker"
119+
[
120+
( "walker",
121+
[
122+
Alcotest.test_case "side-effect-import found on sample.res"
123+
`Quick test_walker_finds_side_effect_import;
124+
Alcotest.test_case "module-toplevel-only, correct line"
125+
`Quick test_walker_only_module_toplevel;
126+
] );
127+
]

0 commit comments

Comments
 (0)