
Commit c392e49

docs: clarify codegen environment rules for top-level const and fn bindings (#91)
* docs: clarify codegen environment rules for top-level const and fn bindings (closes #89)

  Add const_decl, extern_type_decl, extern_fn_decl to the top_level grammar in SPEC.md 2.1. Add sections 2.9 and 2.10 covering their syntax. Add section 8 (Codegen Module Environment) documenting the func_indices dual-use encoding: non-negative keys are Wasm function indices, negative keys are -(global_idx+1) for constants. Update the func_indices field comment in lib/codegen.ml to match.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: tighten SPEC.md codegen-environment section, drop unmerged extern grammar

  Review pass on the #89 spec changes against the current main:

  - 1.2 / 2.1 / 2.10: remove `extern`, `extern_type_decl`, and `extern_fn_decl`. `extern` is not yet a reserved keyword on main and is not parsed by the current lib/parser.mly; the AST has no `TopExternFn` / `TopExternType` variants. Documenting these forms here advertises features that arrive with a separate, unmerged PR (#92, fix/issue-42-extern-parsing). Spec re-introduces these sections when the implementation lands.
  - 2.9: fix the cross-reference (was "see §5.1" which is Primitive Types; correct target is §8). Rephrase to make clear that the initializer must reduce to a Wasm constant expression in the linear-memory backend.
  - 8.1: fix `gen_program` -> `generate_module` (the actual fold-over-decls entry point in lib/codegen.ml). Rewrite the section to:
    * separate the encoding (data layout) from the lookup (instruction selection) so the reader understands what's currently implemented vs. what issue #73 still needs;
    * call out that `ExprApp (ExprVar _, _)` currently emits `call k` unconditionally, so the sign-test decode path for the negative sentinel is the remaining piece tracked under #73 rather than being described as already implemented;
    * spell out per-case `gen_decl` behaviour (TopFn registers before generating its body; TopConst registers after appending the global).
  - 8.2: removed alongside the extern grammar (same rationale as 2.10).

  The lib/codegen.ml `func_indices` field doc-comment from the original PR was accurate and stays.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: restore extern coverage after rebase onto main (#90 merged)

  PR #90 landed `extern fn` / `extern type` parsing and codegen on main while this branch was in review. Restore the SPEC.md coverage to match what actually shipped:

  - 1.2: re-add `extern` to the keyword list.
  - 2.1: re-add `extern_fn_decl` and `extern_type_decl` to `top_level`. Note the parser uses two separate productions that both feed back into `TopFn` / `TopType` AST variants (with `FnExtern` / `TyExtern` as the body kind); the spec describes the surface grammar, not the AST shape.
  - 2.10: re-introduce, but with the actual parsed shape (the productions accept `type_params`, the fn form accepts `effects`, and both accept optional `visibility`) rather than the simplified form from the original PR. Clarify the runtime contract: extern fn lowers to a Wasm import, extern type generates no artifact.
  - 8.1 / 8.2: split the `TopFn` population case into the `FnExtern` / non-`FnExtern` variants so the description matches the guard pattern in lib/codegen.ml. Hard-coded `"env"` Wasm module name is called out explicitly.

  Also update the lib/codegen.ml doc-comment on `func_indices` to mention both `TopFn` paths (defined and extern) and clarify that insertion order is source order.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
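The runtime contract summarised in the last commit (an `extern fn` lowers to a Wasm import from the hard-coded `"env"` module; an `extern type` produces no artifact) can be pictured with a small renderer. This is a minimal sketch, not the backend's code: `valtype`, `extern_decl` and `lower_extern` are hypothetical names, only the four numeric Wasm value types are modelled, and names are assumed to need no escaping.

```ocaml
(* Hypothetical, simplified view of the two extern forms (not the real AST). *)
type valtype = I32 | I64 | F32 | F64

type extern_decl =
  | ExternFn of string * valtype list * valtype option (* name, params, result *)
  | ExternType of string (* opaque host-provided type *)

let string_of_valtype = function
  | I32 -> "i32"
  | I64 -> "i64"
  | F32 -> "f32"
  | F64 -> "f64"

(* An extern fn becomes one import from module "env"; an extern type
   yields no Wasm artifact at all, so it maps to None. *)
let lower_extern (d : extern_decl) : string option =
  match d with
  | ExternType _ -> None
  | ExternFn (name, params, result) ->
      let params =
        match params with
        | [] -> ""
        | ps -> " (param " ^ String.concat " " (List.map string_of_valtype ps) ^ ")"
      in
      let result =
        match result with
        | None -> ""
        | Some t -> " (result " ^ string_of_valtype t ^ ")"
      in
      Some (Printf.sprintf "(import \"env\" \"%s\" (func%s%s))" name params result)
```

Returning an `option` makes the "no artifact" case explicit: a module emitter would simply skip `None`.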
1 parent d126286 commit c392e49

2 files changed

Lines changed: 125 additions & 2 deletions


docs/specs/SPEC.md

Lines changed: 117 additions & 1 deletion
@@ -25,7 +25,7 @@ row_var = ".." lower_ident
 ### 1.2 Keywords
 
 ```
-fn let mut own ref type struct enum trait impl effect handle
+fn let const extern mut own ref type struct enum trait impl effect handle
 resume handler match if else while for return break continue in
 true false where total module use pub as unsafe assume transmute
 forget Nat Int Bool Float String Type Row
@@ -68,6 +68,7 @@ Special: \ (row restriction)
 ```ebnf
 program = [module_decl] {import_decl} {top_level}
 top_level = fn_decl | type_decl | trait_decl | impl_block | effect_decl
+          | const_decl | extern_fn_decl | extern_type_decl
 ```
 
 ### 2.2 Type Declarations
@@ -194,6 +195,44 @@ impl_block = "impl" [type_params] [trait_ref "for"] type_expr
              [where_clause] "{" {impl_item} "}"
 ```
 
+### 2.9 Constant Declarations
+
+```ebnf
+const_decl = [visibility] "const" LOWER_IDENT ":" type_expr "=" expr ";"
+```
+
+A top-level `const` binding compiles to an immutable WebAssembly global. The
+initializer expression must reduce to a Wasm constant expression (a literal
+or a constant arithmetic combination thereof); non-constant initializers are
+not yet supported by the linear-memory backend.
+
+Both function names and const names are registered in the same codegen name
+environment so that later top-level declarations may refer to either kind of
+binding by name. See §8 (*Codegen Module Environment*) for the encoding and
+the current single-pass population order.
+
+### 2.10 Extern Declarations
+
+```ebnf
+extern_fn_decl = [visibility] "extern" "fn" LOWER_IDENT
+                 [type_params] "(" [param_list] ")"
+                 ["->" type_expr] ["/" effects] ";"
+extern_type_decl = [visibility] "extern" "type" UPPER_IDENT
+                   [type_params] ";"
+```
+
+`extern fn` declares a function whose implementation is supplied by the host
+environment at link time. The linear-memory WebAssembly backend lowers each
+`extern fn` to an `(import "env" "<name>" (func …))` entry; the import slot
+is registered in the codegen name environment so call sites resolve through
+`call k` exactly as for locally-defined functions (see §8).
+
+`extern type` declares an opaque, host-provided type. It carries no runtime
+representation and generates no Wasm artifact; the typechecker treats the
+name as a nominal opaque type whose internal structure is unknown.
+
+Both forms are terminated by `;` and carry no body.
+
 ## 3. Type System
 
 ### 3.1 Judgement Forms
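§2.9 requires a `const` initializer to reduce to a Wasm constant expression. Core WebAssembly constant expressions allow `t.const` literals (and `global.get` of certain globals) but no arithmetic; the extended-const proposal later added `i32.add`, `i32.sub` and `i32.mul`. One way to accept "a constant arithmetic combination" is therefore to fold it to a single literal at compile time. The sketch below assumes a hypothetical, heavily simplified initializer type `cexpr` and helpers `fold` / `const_global_init`; it is not the AffineScript AST or the backend's actual strategy.

```ocaml
(* Hypothetical, simplified initializer AST (not the real AffineScript AST). *)
type cexpr =
  | Int of int32
  | Add of cexpr * cexpr
  | Sub of cexpr * cexpr
  | Mul of cexpr * cexpr
  | Var of string (* any identifier: treated as non-constant here *)

(* Fold [e] to a single i32 value, or fail if it is not constant.
   Int32 arithmetic wraps modulo 2^32, matching Wasm i32 semantics. *)
let rec fold (e : cexpr) : (int32, string) result =
  let bin op a b =
    match (fold a, fold b) with
    | Ok x, Ok y -> Ok (op x y)
    | (Error _ as err), _ | _, (Error _ as err) -> err
  in
  match e with
  | Int n -> Ok n
  | Add (a, b) -> bin Int32.add a b
  | Sub (a, b) -> bin Int32.sub a b
  | Mul (a, b) -> bin Int32.mul a b
  | Var x -> Error ("non-constant initializer: " ^ x)

(* Render the folded value as the global's Wasm constant expression. *)
let const_global_init (e : cexpr) : (string, string) result =
  Result.map (fun n -> Printf.sprintf "(i32.const %ld)" n) (fold e)
```

For example, `2 + 3 * 4` folds to `(i32.const 14)`. Rejecting every identifier is stricter than necessary, but it keeps the sketch short.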
@@ -524,6 +563,83 @@ Compiles to (ownership removed):
 (call $close (local.get $file)))
 ```
 
+## 8. Codegen Module Environment
+
+This section describes how the WebAssembly code generator (`lib/codegen.ml`)
+builds its name environment. It is implementation documentation aimed at
+contributors; the language semantics are fully specified in §2–4.
+
+### 8.1 Name Environment (`func_indices`)
+
+The codegen context maintains a single association list
+
+```ocaml
+func_indices : (string * int) list
+```
+
+that maps every top-level name visible at later declaration sites to an
+integer key. Two distinct kinds of binding share this table:
+
+| Source declaration | Key value | Meaning |
+|--------------------|-----------|---------|
+| `fn f(…) { … }` | `k ≥ 0` | WebAssembly function index (imports + defined functions, combined) |
+| `const C: T = e` | `-(g + 1)`, where `g` is the global's index in the Wasm `globals` vector | Negative sentinel reserved for constants |
+
+Sign-based partitioning is deliberate: `k ≥ 0` decodes directly as a Wasm
+`funcidx`, and `k < 0` recovers the global index as `g = -(k + 1)`. A
+single integer per name keeps the lookup uniform across both kinds of binding.
+
+**Population.** Top-level declarations are visited in source order by
+`gen_decl`, which is folded over `prog.prog_decls` from `generate_module`.
+The relevant cases are:
+
+- `TopFn fd` with `fd.fd_body <> FnExtern`: picks the next Wasm function
+  index (`import_func_count ctx + List.length ctx.funcs`), registers
+  `(fd.fd_name.name, func_idx)` in `func_indices` *before* generating the
+  body so the body may recursively refer to its own name, then appends the
+  emitted function to `ctx.funcs`.
+- `TopFn fd` with `fd.fd_body = FnExtern`: emits a Wasm import (module
+  `"env"`, name `fd.fd_name.name`) and registers
+  `(fd.fd_name.name, import_func_idx)` in `func_indices`, where
+  `import_func_idx` is the number of imports before adding this one. No
+  function body is generated. See §8.2.
+- `TopConst tc`: generates the global initializer, appends the global to
+  `ctx.globals`, then registers `(tc.tc_name.name, -(global_idx + 1))` in
+  `func_indices`.
+
+Because population is strictly single-pass and in declaration order,
+forward references (to either functions or constants declared later in the
+file) are not supported by the current backend.
+
+**Call-site lookup.** The `ExprApp (ExprVar id, _)` branch of `gen_expr`
+consults `func_indices` to translate a direct call into a Wasm `call k`
+instruction. Decoding the negative sentinel back to a `global.get`
+(needed to make a bare `const` identifier usable inside another top-level
+declaration's body) is tracked as a known gap in issue #73. The encoding
+documented in this section is the data layout the fix relies on; the
+call-site decode path will land alongside that fix.
+
+### 8.2 Extern Bindings
+
+An `extern fn name(…) -> Ret;` declaration produces a `TopFn` with
+`fd_body = FnExtern`. Codegen lowers it to a Wasm import:
+
+```
+(import "env" "<name>" (func (param …) (result …)))
+```
+
+The resulting import function index is non-negative (it counts among the
+combined "imports + defined functions" view used by every other call
+site), so the name is registered in `func_indices` with `k ≥ 0` and call
+sites resolve through `call k` indistinguishably from a locally-defined
+function. The Wasm module name is currently hard-coded to `"env"`,
+matching the convention adopted by the Node-CJS host shim.
+
+An `extern type Name;` declaration produces a `TopType` with
+`td_body = TyExtern`. It generates no Wasm artifact (opaque types are
+purely a typechecker concern), and the codegen `TopType TyExtern` case
+returns the unchanged context.
+
 ## Appendix: Grammar Reference
 
 See the full specification at `affinescript-spec.md` for:
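The three population cases in §8.1 can be modelled as a fold over a toy declaration list. This is a sketch under stated assumptions: `decl`, `ctx`, `register`, `visit_decl` and `populate` are hypothetical stand-ins for `gen_decl` / `generate_module`, and a defined function's body is reduced to the list of names it calls so that name resolution can be checked. As §8.1 describes, a defined function is registered before its body is examined, a `const` is registered only after its global is appended, and an extern takes the current import count as its index.

```ocaml
(* Hypothetical, simplified declarations (not the real AST). *)
type decl =
  | Fn of string * string list (* defined function: name, names its body calls *)
  | ExternFn of string (* host import from module "env" *)
  | Const of string (* immutable global *)

type ctx = {
  imports : string list; (* import names, in order *)
  funcs : string list; (* defined function names, in order *)
  globals : string list; (* global names, in order *)
  func_indices : (string * int) list;
}

let empty = { imports = []; funcs = []; globals = []; func_indices = [] }

let register name k ctx = { ctx with func_indices = (name, k) :: ctx.func_indices }

(* One step of the single-pass, source-order fold. *)
let visit_decl ctx = function
  | Fn (name, callees) ->
      let func_idx = List.length ctx.imports + List.length ctx.funcs in
      (* Register before the body so the body may call itself. *)
      let ctx = register name func_idx ctx in
      List.iter
        (fun callee ->
          if not (List.mem_assoc callee ctx.func_indices) then
            failwith ("unbound name: " ^ callee))
        callees;
      { ctx with funcs = ctx.funcs @ [ name ] }
  | ExternFn name ->
      (* Import index = number of imports before adding this one. *)
      let import_func_idx = List.length ctx.imports in
      register name import_func_idx { ctx with imports = ctx.imports @ [ name ] }
  | Const name ->
      (* Append the global first, then register the negative sentinel. *)
      let global_idx = List.length ctx.globals in
      let ctx = { ctx with globals = ctx.globals @ [ name ] } in
      register name (-(global_idx + 1)) ctx

let populate decls = List.fold_left visit_decl empty decls
```

With `[ExternFn "print"; Fn ("fact", ["fact"; "print"]); Const "limit"]`, `print` gets key 0, `fact` key 1 (one import plus zero defined functions) and `limit` key -1. A function that calls a name declared later raises `Failure`, which is the "no forward references" rule.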

lib/codegen.ml

Lines changed: 8 additions & 1 deletion
@@ -28,7 +28,14 @@ type context = {
   locals : (string * int) list; (** local variable name to index map *)
   next_local : int; (** next available local index *)
   loop_depth : int; (** current loop nesting depth *)
-  func_indices : (string * int) list; (** function name to index map *)
+  func_indices : (string * int) list;
+  (** Top-level name environment shared by functions and constants.
+      - [k >= 0]: Wasm function index (imports + defined functions).
+        Populated by both [TopFn] (defined function) and
+        [TopFn _ with fd_body = FnExtern] (host-supplied import).
+      - [k < 0]: Constant (global): actual global index is [-(k+1)].
+        Populated by [TopConst].
+      Entries are inserted in source declaration order by [gen_decl]. *)
   lambda_funcs : func list; (** lifted lambda functions *)
   next_lambda_id : int; (** next lambda function ID *)
   heap_ptr : int option; (** global index for heap pointer, if initialized *)
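The `func_indices` doc-comment defines the encoding, and §8.1 of the spec notes that decoding it at call sites is still outstanding under issue #73. A sign-test decode could look like the following sketch. The names `binding`, `encode_global`, `decode` and `select` are hypothetical, and the selected instruction is returned as WAT text without the argument evaluation a real call site would emit first.

```ocaml
(* Hypothetical decoded view of a func_indices key. *)
type binding = Func of int | Global of int

(* Constants are stored as -(g + 1), so global 0 becomes -1 and never
   collides with function index 0. *)
let encode_global (g : int) : int = -(g + 1)

(* Sign test: non-negative keys are function indices, negative keys
   recover the global index as -(k + 1). *)
let decode (k : int) : binding = if k >= 0 then Func k else Global (-(k + 1))

(* Instruction selection for a top-level name, as WAT text. *)
let select (func_indices : (string * int) list) (name : string) : string =
  match List.assoc_opt name func_indices with
  | None -> failwith ("unbound name: " ^ name)
  | Some k -> (
      match decode k with
      | Func f -> Printf.sprintf "call %d" f
      | Global g -> Printf.sprintf "global.get %d" g)
```

Offsetting by one (`-(g + 1)` rather than `-g`) is what keeps global 0 distinct from function index 0. A complete fix would also need to reject a constant used in call position; that check is omitted here.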
