Commit e6f2dbd

docs: renew SPEC §8 as real spec; expand codegen-environment to full realisation (#106)
* docs: thin SPEC.adoc §8 to forward-reference codegen-environment.adoc

  PR #94 landed the full codegen environment reference as
  docs/specs/codegen-environment.adoc. §8 in SPEC.adoc is now redundant;
  replace its body with a one-paragraph forward-reference to avoid
  maintaining two copies.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: promote SPEC §8 to real spec; expand codegen-environment to full realisation

  Reverses direction from the prior commit on this branch (which thinned
  §8 to a one-paragraph forward-reference).

  SPEC.adoc §8: "Top-Level Binding Environment"
  - Renamed from "Codegen Module Environment" (which was the wrong layer:
    it described implementation, not language).
  - Now a proper, target-agnostic specification:
    - 8.1 Top-Level Kinds: table of all eight constructors of `top_level`,
      what each binds, whether it has a runtime artefact.
    - 8.2 Declaration Order and Visibility: source-order processing, no
      forward references, recommended ordering.
    - 8.3 Identifier Resolution: local → variant tag → top-level lookup
      order with positional rules.
    - 8.4 Cross-Module Bindings: fn/extern fn flow, const restriction
      documented as known gap.
    - 8.5 Conformance Criteria: six MUST clauses (C1–C6) that any code
      generator has to satisfy.
  - Voice matches §1–§7 (judgement-form clauses, target-agnostic).

  codegen-environment.adoc: full WebAssembly realisation reference
  - Reframed as "WebAssembly Realisation of SPEC §8".
  - Added `ctx` record reproduction with field-by-field semantics.
  - Promoted `func_indices` encoding to its own §3 with a decode table.
  - §4 walks every `gen_decl` arm (TopFn, FnExtern legacy, TopExternFn,
    TopConst, TopType, TopExternType, TopEffect/Trait/Impl) in
    implementation order, naming concrete steps and side-effects.
  - §5 documents `gen_imports` end-to-end (load, find, intern, import,
    register) plus glob expansion.
  - §6 documents the actual ExprVar/ExprApp resolution paths.
  - §7 cross-walks the SPEC §8.5 criteria C1–C6 against codegen.ml sites.
  - §8 per-target matrix covers js/rust/ocaml/codegen_gc/codegen_node plus
    the loud-fail policy for the remaining backends.
  - §9 worked example traces a const-then-fn program through codegen.
  - §10 records #73 as CLOSED, since the negative-sentinel ExprVar arm at
    lib/codegen.ml:442–445 resolves it.

  Net effect: SPEC.adoc gains a real §8 instead of a placeholder pointer;
  codegen-environment.adoc becomes a usable implementation manual for
  contributors landing new back-ends.

---------

Co-authored-by: Claude <noreply@anthropic.com>
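
Reviewer note: the `func_indices` encoding that this commit moves out of SPEC.adoc (`k ≥ 0` is a Wasm function index, `-(g + 1)` is the sentinel for global `g`) can be illustrated with a small OCaml sketch. The type and function names (`binding`, `encode`, `decode`) are hypothetical and are not taken from `lib/codegen.ml`; only the arithmetic comes from the removed §8.1 text in the diff.

```ocaml
(* Hypothetical names; only the arithmetic mirrors the removed §8.1 text. *)
type binding =
  | Func of int    (* Wasm function index, k >= 0 *)
  | Global of int  (* index g into the Wasm globals vector *)

(* Encode a binding as the single integer stored per name. *)
let encode = function
  | Func k when k >= 0 -> k
  | Global g when g >= 0 -> -(g + 1)
  | Func _ | Global _ -> invalid_arg "encode: negative index"

(* Decode: non-negative keys are functions, negative keys are globals. *)
let decode k = if k >= 0 then Func k else Global (-(k + 1))

let () =
  (* A const registered first (global 0), then a function (index 0). *)
  let func_indices = [ ("use_pi", encode (Func 0)); ("pi", encode (Global 0)) ] in
  assert (List.assoc "pi" func_indices = -1);
  assert (decode (List.assoc "pi" func_indices) = Global 0);
  assert (decode (List.assoc "use_pi" func_indices) = Func 0)
```

The offset by one keeps global 0 distinguishable from function 0, which is presumably why the sentinel is `-(g + 1)` rather than `-g`.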
1 parent 7433881 commit e6f2dbd

2 files changed

Lines changed: 458 additions & 123 deletions

docs/specs/SPEC.adoc

Lines changed: 127 additions & 71 deletions
@@ -650,92 +650,148 @@ Compiles to (ownership removed):
 (call $close (local.get $file)))
 ----
 
-== 8. Codegen Module Environment
+== 8. Top-Level Binding Environment
 
-This section describes how the WebAssembly code generator (`lib/codegen.ml`)
-builds its name environment. It is implementation documentation aimed at
-contributors; the language semantics are fully specified in §2–4.
+This section specifies how top-level declarations populate the binding
+environment that subsequent declarations and their bodies resolve against.
+It complements §3 (which gives the typing judgements) by fixing the
+*operational* rules every conforming code generator must obey, independent
+of any concrete target.
 
-=== 8.1 Name Environment (`func_indices`)
+The WebAssembly realisation of these rules is documented at
+link:codegen-environment.adoc[`docs/specs/codegen-environment.adoc`].
 
-The codegen context maintains a single association list
+=== 8.1 Top-Level Kinds
 
-[source,ocaml]
-----
-func_indices : (string * int) list
-----
-
-that maps every top-level name visible at later declaration sites to an
-integer key. Two distinct kinds of binding share this table:
+A program is an ordered sequence of top-level declarations. Each kind
+contributes to the binding environment as follows:
 
-[cols="2,2,3", options="header"]
+[cols="1,2,2", options="header"]
 |===
-| Source declaration | Key value | Meaning
+| Declaration | Binds | Runtime artefact
+
+| `fn f(…) { … }` (§2.4)
+| `f` as a function value
+| Yes — function in the target module
+
+| `extern fn f(…) -> T;` (§2.10)
+| `f` as a function value
+| Yes — host-supplied import
+
+| `const c: T = e;` (§2.9)
+| `c` as an immutable value of type `T`
+| Yes — same-type immutable cell
+
+| `type T = …;` (§2.2)
+| `T` as a type
+| No — compile-time only
+
+| `extern type T;` (§2.10)
+| `T` as an opaque type
+| No — compile-time only
 
-| `fn f(…) { … }`
-| `k ≥ 0`
-| WebAssembly function index (imports + defined functions, combined)
+| `effect E { … }` (§2.7)
+| `E` and its operations
+| No — handled by effect lowering (§7)
 
-| `const C: T = e`
-| `-(g + 1)`, where `g` is the global's index in the Wasm `globals` vector
-| Negative sentinel reserved for constants
+| `trait Tr { … }` (§2.8)
+| `Tr` as a trait
+| No — compile-time only
+
+| `impl Tr for T { … }` (§2.8)
+| Trait dictionary for `(Tr, T)`
+| No — compile-time only
 |===
 
-Sign-based partitioning is deliberate: `k ≥ 0` decodes directly as a Wasm
-`funcidx`, and `k < 0` recovers the global index as `g = -(k + 1)`. A
-single integer per name keeps the lookup uniform across both kinds of binding.
-
-*Population.* Top-level declarations are visited in source order by
-`gen_decl`, which is folded over `prog.prog_decls` from `generate_module`.
-The relevant cases are:
-
-- `TopFn fd` with `fd.fd_body <> FnExtern` — picks the next Wasm function
-index (`import_func_count ctx + List.length ctx.funcs`), registers
-`(fd.fd_name.name, func_idx)` in `func_indices` _before_ generating the
-body so the body may recursively refer to its own name, then appends the
-emitted function to `ctx.funcs`.
-- `TopFn fd` with `fd.fd_body = FnExtern` — emits a Wasm import (module
-`"env"`, name `fd.fd_name.name`) and registers
-`(fd.fd_name.name, import_func_idx)` in `func_indices`, where
-`import_func_idx` is the number of imports before adding this one. No
-function body is generated. See §8.2.
-- `TopConst tc` — generates the global initializer, appends the global to
-`ctx.globals`, then registers `(tc.tc_name.name, -(global_idx + 1))` in
-`func_indices`.
-
-Because population is strictly single-pass and in declaration order,
-forward references (to either functions or constants declared later in the
-file) are not supported by the current backend.
-
-*Call-site lookup.* The `ExprApp (ExprVar id, _)` branch of `gen_expr`
-consults `func_indices` to translate a direct call into a Wasm `call k`
-instruction. Decoding the negative sentinel back to a `global.get` —
-needed to make a bare `const` identifier usable inside another top-level
-declaration's body — is tracked as a known gap in issue #73. The encoding
-documented in this section is the data layout the fix relies on; the
-call-site decode path will land alongside that fix.
-
-=== 8.2 Extern Bindings
-
-An `extern fn name(…) -> Ret;` declaration produces a `TopFn` with
-`fd_body = FnExtern`. Codegen lowers it to a Wasm import:
+=== 8.2 Declaration Order and Visibility
 
-[source]
+Top-level declarations are processed in source order. The binding
+environment in scope when a declaration is processed contains exactly
+the names of declarations that *precede* it in source order, together
+with the names introduced by the module's `use` clauses (§8.4).
+
+A reference to a name not yet bound at its use site is a static error:
+
+[source,affinescript]
 ----
-(import "env" "<name>" (func (param …) (result …)))
+fn use_pi() -> Float { pi } // ERROR: `pi` not yet declared
+const pi: Float = 3.141592;
+----
+
+Reordering resolves it:
+
+[source,affinescript]
 ----
+const pi: Float = 3.141592;
+fn use_pi() -> Float { pi } // OK: `pi` is in scope here
+----
+
+The recommended source ordering — types, effects, constants, traits,
+impls, functions — is sufficient (though not necessary) to avoid every
+forward-reference error.
+
+=== 8.3 Identifier Resolution
+
+A bare identifier `x` in expression position resolves by the following
+lookup order, in the context where the expression appears:
+
+. Local bindings introduced by enclosing `let`, lambda parameters, or
+function parameters.
+. Variant constructors of any in-scope `enum` type, resolved to their
+tag (§2.2).
+. Top-level bindings: `fn`, `extern fn`, and `const` names registered
+under §8.1.
+
+A call expression `f(args)` resolves `f` against the same environment.
+A name bound to a `const` may appear in expression position only; a
+name bound to a `fn` or `extern fn` may appear in either expression or
+call position. The well-formedness of these positions is established by
+the type system (§3); a backend may rely on the typechecker having
+rejected ill-positioned references.
+
+=== 8.4 Cross-Module Bindings
+
+Imports (`use M::{…}`, `use M::*`) extend the binding environment of
+the importer with the public top-level names of the imported module,
+under the alias chosen by the import form (§2.1). The order in which
+import-introduced names enter the environment is *before* every
+local top-level declaration of the importer.
+
+Status of cross-module flow at v0.1:
+
+* `fn` and `extern fn` items flow across module boundaries.
+* `const` items do not yet flow across module boundaries; a `const`
+declared in module `M` and named by `use M::{c}` is a known
+restriction, not a language-level prohibition.
+
+=== 8.5 Conformance Criteria
+
+A conforming code generator MUST:
+
+C1:: Process top-level declarations in source order.
+
+C2:: Register a top-level name in its environment *before* generating
+the body of any later declaration that may reference it.
+
+C3:: For each runtime-bearing declaration (`fn`, `extern fn`, `const`)
+emit an artefact whose denotation matches §2 and §3 — functions are
+first-class values, constants are immutable cells of their declared
+type.
+
+C4:: For each compile-time declaration (`type`, `extern type`, `effect`,
+`trait`, `impl`) record sufficient information in the environment for
+subsequent declarations to typecheck and lower correctly, without
+necessarily emitting any target artefact.
+
+C5:: Report a static error (rather than emit a malformed module) when
+an identifier escapes its lexical scope or names a binding not yet in
+the environment under C1–C2.
 
-The resulting import function index is positive (it counts among the
-combined "imports + defined functions" view used by every other call
-site), so the name is registered in `func_indices` with `k ≥ 0` and call
-sites resolve through `call k` indistinguishably from a locally-defined
-function. The Wasm module name is currently hard-coded to `"env"`,
-matching the convention adopted by the Node-CJS host shim.
+C6:: If a binding kind is unsupported by the target, raise
+`UnsupportedFeature` at the declaration site, never silently drop it.
 
-An `extern type Name;` declaration produces a `TopType` with
-`td_body = TyExtern`. It generates no Wasm artifact — opaque types are
-purely a typechecker concern — and the codegen `TopType TyExtern` case
-returns the unchanged context.
+Implementations MAY use any internal representation for the binding
+environment; C1–C6 fix what is observable, not how it is stored.
 
 == Appendix: Grammar Reference
 
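
Reviewer note: the source-order rule of the new §8.2 and the lookup order of §8.3 can be modelled in a few lines of OCaml. This is a toy sketch under stated assumptions, not the logic in `lib/codegen.ml`: the `decl` type, `resolve`, and `check_program` are hypothetical, declarations are reduced to the identifiers they reference, and a function is assumed to see its own name (as the removed Wasm-specific text describes), which §8.2 itself does not state.

```ocaml
(* Toy model of SPEC §8.2 and §8.3. Every name below is hypothetical. *)

type decl =
  | Const of string * string list
      (* name, identifiers referenced by the initialiser *)
  | Fn of string * string list * string list
      (* name, parameters, identifiers referenced by the body *)

(* Raised with (declaration name, unresolved identifier). *)
exception Unbound of string * string

type resolved = Local | VariantTag of int | TopLevel

(* §8.3 lookup order: locals, then variant constructors, then top level. *)
let resolve ~locals ~variants ~top x =
  if List.mem x locals then Some Local
  else
    match List.assoc_opt x variants with
    | Some tag -> Some (VariantTag tag)
    | None -> if List.mem x top then Some TopLevel else None

(* Walk declarations in source order, starting from the import-introduced
   names. Returns the final top-level environment, newest name first. *)
let check_program ~imports ~variants decls =
  let check owner ~locals ~top refs =
    List.iter
      (fun x ->
        if resolve ~locals ~variants ~top x = None then
          raise (Unbound (owner, x)))
      refs
  in
  List.fold_left
    (fun top d ->
      match d with
      | Const (name, refs) ->
          (* A constant's initialiser sees only earlier names. *)
          check name ~locals:[] ~top refs;
          name :: top
      | Fn (name, params, refs) ->
          (* Assumption: the function is registered before its body is
             checked, so it may refer to itself. *)
          let top = name :: top in
          check name ~locals:params ~top refs;
          top)
    imports decls

let () =
  let good = [ Const ("pi", []); Fn ("use_pi", [], [ "pi" ]) ] in
  let bad = [ Fn ("use_pi", [], [ "pi" ]); Const ("pi", []) ] in
  assert (check_program ~imports:[] ~variants:[] good = [ "use_pi"; "pi" ]);
  assert (
    match check_program ~imports:[] ~variants:[] bad with
    | _ -> false
    | exception Unbound ("use_pi", "pi") -> true)
```

The two programs in the final `let ()` correspond to the `use_pi` / `pi` examples in §8.2: the forward reference is rejected, and the reordered program is accepted.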
