From a2c4906bcefef3301a3b6ad9e129bb491401d486 Mon Sep 17 00:00:00 2001 From: sstefdev Date: Mon, 25 May 2026 18:35:18 +0200 Subject: [PATCH] spec: migrate RFCs to spec/rfcs/ with template + lifecycle doc (closes #66) --- CHANGELOG.md | 8 ++ spec/README.md | 9 +- spec/rfcs/0000-template.md | 72 ++++++++++++++ spec/rfcs/0001-memory-model.md | 120 ++++++++++++++++++++++++ spec/rfcs/0002-extern-host-abi.md | 57 +++++++++++ spec/rfcs/0003-project-metadata.md | 77 +++++++++++++++ spec/rfcs/0004-cross-engine-state.md | 74 +++++++++++++++ spec/rfcs/0005-third-party-protocols.md | 104 ++++++++++++++++++++ spec/rfcs/README.md | 93 ++++++++++++++++++ 9 files changed, 610 insertions(+), 4 deletions(-) create mode 100644 spec/rfcs/0000-template.md create mode 100644 spec/rfcs/0001-memory-model.md create mode 100644 spec/rfcs/0002-extern-host-abi.md create mode 100644 spec/rfcs/0003-project-metadata.md create mode 100644 spec/rfcs/0004-cross-engine-state.md create mode 100644 spec/rfcs/0005-third-party-protocols.md create mode 100644 spec/rfcs/README.md diff --git a/CHANGELOG.md b/CHANGELOG.md index d90e952..35075e9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,14 @@ Prose references a version as `v0.X.Y`; headings stay bare `[0.X.Y]`. ### Added +- RFC process. `spec/rfcs/` directory with a README documenting the lifecycle (draft → accepted → implemented / superseded / rejected), a `0000-template.md` for new RFCs, and the five existing RFCs migrated into structured files with YAML frontmatter (number, title, status, authors, tracking issue, created date): + - `0001-memory-model.md` (tracking #42) + - `0002-extern-host-abi.md` (tracking #55) + - `0003-project-metadata.md` (tracking #45) + - `0004-cross-engine-state.md` (tracking #46) + - `0005-third-party-protocols.md` (tracking #65) + + GitHub issues stay as the discussion thread; RFC files are the source of truth for proposal text. `spec/README.md` updated with a pointer to the RFC index and the lifecycle. Closes #66. 
- Subsystem keys in the chain manifest accept arbitrary identifiers, not just the five stdlib axes (`consensus`, `gas`, `state`, `exec`, `da`). A chain can declare any axis it cares about as a first-class subsystem: `privacy: GrothProver`, `mev: FlashbotsBundler`, etc. Lifts the parser-layer gatekeeping that was blocking third-party protocol extensions at the chain layer. Closes #64. - Parser test reworked: `test_error_unknown_subsystem_key` (which asserted "unknown:" was rejected) replaced by `test_arbitrary_subsystem_key_parses` (asserts `privacy: ...` works) plus `test_subsystem_key_can_still_be_stdlib_keyword` (regression guard that `consensus:`, `state:`, etc. still parse the same way). - `spec/grammar.ebnf` updated: `SubsystemKey = IDENT` instead of the enumerated five names; comment explains the stdlib axes are still conventions, just not parser-level requirements. diff --git a/spec/README.md b/spec/README.md index dacc8fa..37b4a40 100644 --- a/spec/README.md +++ b/spec/README.md @@ -15,10 +15,11 @@ The spec evolves alongside the compiler. 
Sections marked **stable** match what t | [Protocols: State](protocols/state.md) | draft (issue [#8](https://github.com/cleave-lang/cleave/issues/8)) | First full write-up landed; iterating | | [Protocols: DataAvailability](protocols/da.md) | draft (issue [#16](https://github.com/cleave-lang/cleave/issues/16)) | First full write-up landed; iterating | | [Effect system](effects.md) | draft (issue [#9](https://github.com/cleave-lang/cleave/issues/9)) | First full write-up landed; iterating | -| Module bodies (state, gas, fn, effect) | planned | Parser does not yet handle these | -| Type system | planned | No type checker exists yet | -| ABI (WASM hostcalls) | planned | Codegen lands in v0.3 | -| Standard library reference | planned | First impls land in v0.2 | +| [ABI (WASM hostcalls)](abi/wasm.md) | stable | Hostcall surface that codegen targets and the runtime implements | +| [RFCs (active proposals)](rfcs/README.md) | varies | See `rfcs/` for active design proposals; structure + lifecycle documented in the directory README | +| Module bodies (state, gas, fn, effect) | stable | Parser, type checker, codegen all handle these now | +| Type system | stable (basic) | Primitive types + fn types + opaque generics; sum types gated on RFC #42 | +| Standard library reference | planned | First impls land in v0.5+ (#53 consensus, #54 state) | ## Versioning diff --git a/spec/rfcs/0000-template.md b/spec/rfcs/0000-template.md new file mode 100644 index 0000000..c3ec45a --- /dev/null +++ b/spec/rfcs/0000-template.md @@ -0,0 +1,72 @@ +--- +rfc: 0000 +title: "RFC template (copy this for new RFCs)" +status: draft +authors: ["Your Name"] +tracking: https://github.com/cleave-lang/cleave/issues/NNN +created: YYYY-MM-DD +--- + +# Summary + +One paragraph: what does this RFC propose? Stated as a noun phrase ("A type-level effect system for...") rather than a verb ("We should add..."). Someone reading just this paragraph should be able to decide whether the RFC is relevant to them. 
+ +# Motivation + +Why does this need to happen? What's broken / missing / suboptimal without it? Concrete examples beat abstract claims. Link to issues, prior discussions, real bugs that this would prevent. + +# Design + +The proposed change. Long-form. Sections as needed. + +## Sub-design topics + +Break into subsections when there's structure: API surface, type rules, runtime semantics, error messages, etc. Show code examples in the language being designed. + +## What changes externally + +What does a developer using Cleave see differently? New syntax? New error messages? Different runtime behavior? Different gas costs? + +## What changes internally + +What does the compiler / runtime have to do differently? Touch which files? What's the migration path for existing code? + +# Alternatives + +What other designs were considered? Why is this one preferred? + +- **Alternative A**: brief description, pros, cons +- **Alternative B**: same shape +- **Do nothing**: what happens if we don't do this? (Always worth considering.) + +# Drawbacks + +The cost of saying yes. There always is one. Be honest: + +- Implementation complexity +- Audit surface +- Developer-facing complexity +- Risk of being wrong (and the cost to revert) +- Interaction with other planned work + +# Open questions + +Things the author can't decide alone or doesn't know yet. Each is a bullet, ideally with an open-ended question mark. Resolved questions move to the Design section; rejected directions move to Alternatives. + +# Reversibility + +How hard is this to undo if it turns out wrong? + +- **High**: a flag flip, a backward-compatible deprecation +- **Medium**: a breaking compiler version + migration tool +- **Low**: every contract on every chain has to remigrate; in practice we don't reverse it + +State this honestly. Low-reversibility decisions deserve longer review windows. + +# Related work + +Prior art in other languages / chains / academic papers. Cite specifically; vague references don't help. 
+ +# Implementation roadmap + +If accepted, what happens next? Sub-issues, ordering, dependencies. Keep this light; the real plan emerges from implementation discussion. diff --git a/spec/rfcs/0001-memory-model.md b/spec/rfcs/0001-memory-model.md new file mode 100644 index 0000000..ab2186d --- /dev/null +++ b/spec/rfcs/0001-memory-model.md @@ -0,0 +1,120 @@ +--- +rfc: 0001 +title: "Memory model for Cleave (ownership, GC, escape hatches)" +status: draft +authors: ["Cleave Labs"] +tracking: https://github.com/cleave-lang/cleave/issues/42 +created: 2026-05-23 +--- + +**Status:** draft. Open for discussion before any implementation work. + +Cleave has no memory model today. The current compiler / runtime sidesteps the question entirely because the language surface is so small (only integers, bool, str, char, type params, record literals; no pointers, references, or allocation primitives). That has to change before we can: + +- Implement standard-library collections (`Vec`, `Map`, `HashMap`) which the protocol RFCs (#6, #7, #8, #16) reference +- Compile `Result` / `Option` (codegen #17 deliberately rejected these as out of scope) +- Self-host the compiler (#20, the v0.4 milestone) +- Land any non-trivial application module + +This RFC picks the memory model. Once picked, every contract on every chain built with Cleave depends on the choice; changing it later breaks the world. So it's worth being deliberate. + +## Hard constraints + +Blockchain languages have constraints that rule out most of the conventional design space: + +1. **Deterministic execution.** Two nodes running the same transaction against the same state must produce byte-identical post-state. Rules out non-deterministic GC pause patterns. +2. **Gas metering.** Every allocation must charge gas with bounded worst-case cost. Rules out implementations whose cost is opaque to the gas accountant. +3. **Bounded execution.** A transaction has finite gas. 
Memory growth must be metered against that budget so a malicious contract cannot exhaust node memory. +4. **Persistent state round-trips.** Anything written to chain state must serialize. Rules out in-memory cycles in stored data structures. +5. **No security footguns at the language layer.** Memory corruption in shared chain state is catastrophic. One bug, billions lost. The cost of a memory bug here is not comparable to a bug in a normal program. + +## Design space + +| Approach | Determinism | Gas accounting | Dev ergonomics | Risk | +|---|---|---|---|---| +| **GC (mark + sweep, RC, generational)** | hard to make deterministic across nodes; pause times must be metered | gas charge per allocation is doable; gas charge per collection cycle is awkward | familiar from JS/Python/Go | implementation complexity, audit surface | +| **Linear / affine ownership (Rust, Move)** | trivially deterministic; no GC | every alloc has a known site, gas is straightforward | steeper learning curve; lifetime annotations | borrow-checker frustration; UX cost | +| **Per-transaction arena** (auto-free at tx end) | trivially deterministic | one arena = one gas allocation pool | very simple model | cannot share heap across txs; collections that outlive a tx are awkward | +| **All-by-value with copy-on-write** | trivially deterministic | gas scales with data size on every copy | simple to teach | quadratic cost on large structures | +| **Reference-counted, cycles statically prohibited** | deterministic; RC ops are bounded | gas per inc/dec | middle ground | type system has to forbid cycles, which is a non-trivial restriction | + +## Recommended approach + +**Linear / affine ownership, no opt-out, with a typed `extern host` escape hatch for raw performance.** + +The reasoning: + +### Why ownership, not GC + +The Sui Move + Aptos Move ecosystem has demonstrated that linear / affine type systems work for smart contracts at scale, and the model maps cleanly onto deterministic gas. 
GC is theoretically possible (the JVM family proved that determinism is achievable with care), but it adds an audit surface that we should not pay for at v0. We are not Solidity; we do not need to inherit the EVM's "everything is a 32-byte word" model. We are not the JVM; we do not need to inherit JIT-friendly GC complexity. + +The ergonomic cost of ownership is real, but on a blockchain the cost of memory bugs is far higher than the cost of a stricter type system. Stefan should treat the borrow checker as a feature, not a tax. + +### Why no opt-out + +"Let me manage memory myself" almost always translates to "I want raw pointer arithmetic to be fast." In a gas-metered environment, the upper bound on that speedup is whatever gas allows. The downside is severe: memory corruption in shared chain state means an exploit pattern that no other system in the world has to defend against, because no other system has "a million people send signed transactions to your VM" as its threat model. + +Other languages that exposed memory escape hatches: + +- **C**: every CVE-2024-* shows the cost. +- **Rust `unsafe`**: works because Rust has external auditors and a culture of writing wrapper crates. We do not yet have that culture. +- **Solidity inline assembly**: the source of approximately every published smart-contract exploit since 2019. + +The pattern that does work in a blockchain context: type-safe escape hatches at well-defined boundaries. + +### What the escape hatch should be + +```cleave +extern host fn blake3(input: bytes) -> [u8; 32] +extern host fn bls12_381_aggregate(sigs: Vec) -> Signature +``` + +A declaration that says "this function is provided by the host runtime, runs outside the WASM gas meter, has a known typed signature, and the host is responsible for implementing it safely." Cryptographic primitives, hash functions, signature verification, pairing operations, big-integer arithmetic. 
All the things people would want unsafe memory ops for in conventional languages are better served as native host functions in a blockchain. + +This pattern is established: Substrate calls them "host functions," Solana calls them "syscalls," Cosmos SDK has "keepers." Cleave should adopt it under the `extern host` name (or similar; bikeshed-able). + +## Open questions + +1. **Stack vs heap distinction.** Do small types (u64, bool, fixed-size arrays) live exclusively on the WASM operand stack? Larger types in linear memory? Cleave-managed heap on top of linear memory? +2. **String representation.** UTF-8 byte slice? Length-prefixed? Null-terminated? Borrowed-by-default vs owned? +3. **Collection types in the stdlib.** Are `Vec`, `HashMap`, `Map` part of the language (built-in syntax) or stdlib types (defined in `.cv` once self-hosting lands)? +4. **Move vs copy semantics.** Does `let x = y` move `y` (Rust default) or copy `y` (most languages)? Move-by-default is the rigorous choice; copy-by-default is the gentle one. +5. **References / borrows.** Do we want a borrow checker (Rust-style) or are linear types enough (Move-style)? Linear is simpler and works for most contract patterns. +6. **Drop semantics.** When does memory get freed? End of scope? End of transaction? At an explicit `drop()` call? +7. **Persistent state types.** What's the contract between in-memory values and `state` slots? Do all serializable types go through a single trait? +8. **Cross-engine memory.** When a Cleave WASM module reads a value written by a Solidity contract, how does the type system understand the byte layout? +9. **Failure modes.** What happens on out-of-memory? Trap? Allocate-result-Result? +10. **`extern host` security model.** Who can declare host functions? Are they per-chain (manifest declares allowed extensions) or per-module? How is the host-function ABI versioned? 
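To make open question 4 concrete: under move-by-default, `let x = y` consumes `y`, and any later use of `y` becomes a compile-time diagnostic. The following is a minimal Python sketch of such a use-after-move pass. It assumes a toy list of statement tuples rather than Cleave's real AST, and `check_moves` with its tuple encoding is hypothetical, not part of the compiler.

```python
def check_moves(stmts):
    """Return diagnostics for bindings that are used after being moved.

    Each statement is a tuple (hypothetical toy encoding, not Cleave's AST):
      ("let", name)        introduce a fresh owned binding
      ("move", dst, src)   `let dst = src` under move-by-default
      ("use", name)        read a binding without consuming it
    """
    live = set()    # bindings that currently own a value
    moved = set()   # bindings whose value has been moved away
    errors = []

    def require_live(name, index):
        if name in moved:
            errors.append(f"stmt {index}: use of moved value '{name}'")
            return False
        if name not in live:
            errors.append(f"stmt {index}: unknown binding '{name}'")
            return False
        return True

    for index, stmt in enumerate(stmts):
        kind = stmt[0]
        if kind == "let":
            live.add(stmt[1])
            moved.discard(stmt[1])
        elif kind == "move":
            _, dst, src = stmt
            if require_live(src, index):
                # Ownership transfers from src to dst.
                live.discard(src)
                moved.add(src)
                live.add(dst)
                moved.discard(dst)
        elif kind == "use":
            require_live(stmt[1], index)
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return errors


program = [
    ("let", "y"),
    ("move", "x", "y"),   # y is consumed here
    ("use", "x"),         # fine: x owns the value
    ("use", "y"),         # diagnostic: y was moved
]
diagnostics = check_moves(program)
# diagnostics == ["stmt 3: use of moved value 'y'"]
```

Under copy-by-default the `move` branch would leave `src` live, and the cost would show up in gas per copy instead of in diagnostics.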
+ +## Implementation roadmap + +If we land on "ownership + extern host," the work breaks into: + +1. **Type system: linearity tracking.** Extend the type checker (#34 surface) with use-counting per binding. Diagnostic when a moved value is used again. Pure mechanical pass; no codegen changes. +2. **AST + grammar: `drop` keyword, move/borrow annotations** (if we add them). Maybe `&T` for borrows; maybe Move-style implicit. RFC-level decision. +3. **Codegen: heap allocation.** `bump-allocate` primitive in linear memory; `drop` emits a free or a no-op depending on allocator. Most likely choice: per-call arena that resets between calls, plus per-instance arena for state slots. +4. **Stdlib: `Vec`, `String`, `Bytes`.** Built on the allocator. Maybe defined in `.cv` once self-hosting lands; until then, defined as compiler intrinsics. +5. **`extern host` syntax + ABI.** Grammar addition. Codegen lowers to WASM imports under a new namespace (e.g. `host` instead of `env`). Runtime exposes a registration API. + +Estimate: each of (1) through (5) is its own issue, each comparable in size to v0.3's parser-extension PR. Total: ~3000-5000 LoC, several weeks of work, plus design churn from this RFC. + +## Reversibility + +**Low.** Once contracts ship against a memory model, changing it requires either: + +- A breaking compiler version bump and a chain hard fork (every deployed contract recompiles, every state slot remigrates) +- Multi-version support in the runtime (load v1 and v2 contracts side by side, each with its own runtime semantics) + +This is why the RFC matters more than feature issues. Worth absorbing several weeks of discussion before committing. 
+ +## Related work to read before commenting + +- Move's "Resource" type and Sui's "object-centric" extension +- Rust's RFC #1444 (linear types attempt) and the reasoning for why borrow-checking won +- The "What every systems programmer should know about concurrency" line of thinking applied to multi-validator consensus +- Solidity's inline assembly + the post-mortems on every assembly-related exploit since 2019 +- Substrate's host function ABI design (`pallet-contracts` runtime API) + +## Discussion + +Comment thread on this issue. RFC stays draft for at least two weeks before any "recommended approach" gets promoted to a binding decision. diff --git a/spec/rfcs/0002-extern-host-abi.md b/spec/rfcs/0002-extern-host-abi.md new file mode 100644 index 0000000..88819ca --- /dev/null +++ b/spec/rfcs/0002-extern-host-abi.md @@ -0,0 +1,57 @@ +--- +rfc: 0002 +title: "extern host ABI for native function declarations" +status: draft +authors: ["Cleave Labs"] +tracking: https://github.com/cleave-lang/cleave/issues/55 +created: 2026-05-25 +--- + +Promised in the memory-model RFC (#42) as the escape hatch for raw-performance primitives (crypto, hashing, signature verification) that should not run inside the WASM gas meter. + +## Status + +Draft. Open for discussion. + +## Context + +Smart-contract platforms universally need a host-function mechanism for expensive primitives. Substrate calls them "host functions," Solana calls them "syscalls," EVM has "precompiles," Cosmos SDK has "keepers." Each ecosystem has reinvented this with subtly different semantics. + +Cleave needs one too. The memory-model RFC argues it should be the ONLY escape from the safe-language envelope (no `unsafe` block, no raw pointer ops). That puts more weight on getting the ABI right. 
+ +## Strawman + +```cleave +extern host fn blake3(input: bytes) -> [u8; 32] +extern host fn bls12_381_aggregate(sigs: Vec) -> Signature +extern host fn ec_recover(message: [u8; 32], v: u8, r: [u8; 32], s: [u8; 32]) -> Address +``` + +- `extern host fn` is a declaration, not a definition. The function body lives in the host (Rust / C / etc.). +- Codegen lowers calls to WASM imports under a `host` namespace (vs the `env` namespace used by `state_get` / `state_set`). +- The runtime crate exposes a registration API: `runtime.register_host_fn("blake3", impl)`. +- Per-chain manifest declares which host functions are allowed: `host_functions: [blake3, bls12_381_aggregate]`. + +## Questions + +1. **Who declares the allow-list?** Per-chain (chain manifest) or per-module (module declaration)? +2. **Versioning.** If `blake3`'s implementation changes, how does the runtime know it's the new version? +3. **Determinism.** Host functions must be deterministic across nodes. How do we enforce that the registered implementation actually is? +4. **Gas accounting.** Host functions run outside the WASM gas meter. Do they have their own gas surface? Charged per-call? Per byte of input? +5. **Type marshaling.** WASM linear memory vs Cleave values. Vec, String, bytes need ABI conventions. +6. **Trust model.** Host functions can do anything the host can do. Who audits them? Is there a "stdlib host function set" that's blessed? +7. **Composition.** Can a host function call back into the runtime to read state? (Probably no, but worth being explicit.) + +## Reversibility + +Medium. Once contracts call `host::blake3(...)`, the ABI for `blake3` is frozen. Adding new host functions is easy; changing existing ones requires a chain hard fork. 
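The registration API and per-chain allow-list from the strawman can be sketched as follows. This is a Python illustration only, not the runtime crate's API: the `HostRegistry` class, its method signatures, and the base-plus-per-byte gas schedule are assumptions, and SHA-256 from `hashlib` stands in for `blake3` because Python's standard library has no BLAKE3.

```python
import hashlib


class HostRegistry:
    """Toy host-function table gated by a per-chain allow-list (hypothetical)."""

    def __init__(self, allowed):
        self.allowed = frozenset(allowed)   # names listed in the chain manifest
        self.functions = {}                 # name -> (impl, base_gas, gas_per_byte)

    def register_host_fn(self, name, impl, base_gas, gas_per_byte):
        # Refuse anything the manifest did not explicitly allow.
        if name not in self.allowed:
            raise PermissionError(f"host function '{name}' not in manifest allow-list")
        self.functions[name] = (impl, base_gas, gas_per_byte)

    def call(self, name, data, gas_left):
        """Charge gas up front, then run the host function.

        Returns (result, remaining_gas).
        """
        if name not in self.functions:
            raise LookupError(f"host function '{name}' is not registered")
        impl, base_gas, gas_per_byte = self.functions[name]
        cost = base_gas + gas_per_byte * len(data)
        if cost > gas_left:
            raise RuntimeError("out of gas")
        return impl(data), gas_left - cost


def sha256_digest(data):
    # Stand-in for blake3: deterministic, 32-byte output.
    return hashlib.sha256(data).digest()


registry = HostRegistry(allowed=["blake3"])
registry.register_host_fn("blake3", sha256_digest, base_gas=100, gas_per_byte=2)
digest, remaining = registry.call("blake3", b"hello", gas_left=1_000)
# cost = 100 + 2 * 5 = 110, so remaining == 890
```

Charging before execution keeps the cost a function of input length alone, which is one possible answer to question 4; it says nothing about questions 2 and 3 (versioning and enforced determinism).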
+ +## Related work + +- Substrate `pallet-contracts` runtime API +- Solana SBPF syscalls (`sol_log`, `sol_keccak256`, `sol_invoke_signed`) +- EVM precompiles (the 0x01..0x0a address space) +- WebAssembly Component Model (the longer-term industry direction) + +Discussion: comment thread. RFC stays draft for at least two weeks before any binding decision. + diff --git a/spec/rfcs/0003-project-metadata.md b/spec/rfcs/0003-project-metadata.md new file mode 100644 index 0000000..f165bdd --- /dev/null +++ b/spec/rfcs/0003-project-metadata.md @@ -0,0 +1,77 @@ +--- +rfc: 0003 +title: "Project metadata format and build system for Cleave projects" +status: draft +authors: ["Cleave Labs"] +tracking: https://github.com/cleave-lang/cleave/issues/45 +created: 2026-05-25 +--- + +Open question from the project's earliest design conversations. Every developer-facing language needs an answer for: + +- How do I declare a project? +- How do I declare dependencies? +- How do I build it? +- Where do dependencies come from? + +This RFC lays out the design space. + +## Status + +Draft. Open for discussion. + +## Constraints unique to a chain language + +1. **Reproducibility is non-negotiable.** Two builds of the same source from the same dependencies must produce byte-identical WASM. A chain depends on it. +2. **Dependencies are part of the security model.** A compromised dependency means a compromised contract means user funds at risk. Higher bar than `npm install`. +3. **No global mutable registry of human-readable names.** A name like `cleave-stdlib` is too valuable a target. Either content-addressed (IPFS-style) or namespaced under verified publishers (Maven-style). + +## Strawman + +A `Cleave.toml` (or `cleave.toml`) at the project root: + +```toml +[project] +name = "my-token" +version = "0.1.0" +authors = ["alice@example.com"] +license = "Apache-2.0" + +[chain] +manifest = "chains/mainnet.cv" + +[dependencies] +stdlib = { content-hash = "blake3:abc..." 
} +my-utility = { git = "https://github.com/...", rev = "..." } +``` + +`cleavec build` reads it, resolves deps, compiles every `.cv` in `src/`, emits artifacts to `build/`. + +## Questions + +1. **TOML or something else?** TOML is the default but Cleave isn't dogmatic about it. We discussed this very early in the project; the conversation didn't conclude. +2. **Lockfile.** Committed `Cleave.lock` with content-addressed pins for reproducibility? +3. **Workspace.** Multi-package projects? Sub-package dependencies? +4. **Build artifacts.** Per-module `.wasm` per chain? One bundle? Cross-chain compile? +5. **Build script.** Hook for codegen / asset processing during build (like Rust's `build.rs`)? Probably no, since determinism is paramount. +6. **Package registry.** Necessary? Or do we stay git-only forever? +7. **Vendoring.** Inline-copy dependency source for audit? Default-off, opt-in? +8. **Conditional compilation.** Per-chain features (e.g. EVM-enabled vs Wasm-only)? Or always-compile-everything? + +## Related work + +- Cargo (Rust): the closest reference +- Move's +- Solana's +- Foundry's +- Hardhat's + +## Out of scope for this RFC + +- A package registry implementation (its own RFC; needed only if we go that route) +- IDE integration (LSP, project discovery) +- CI templates + +Discussion: comment thread. RFC stays draft for at least two weeks. diff --git a/spec/rfcs/0004-cross-engine-state.md b/spec/rfcs/0004-cross-engine-state.md new file mode 100644 index 0000000..ed3e5ed --- /dev/null +++ b/spec/rfcs/0004-cross-engine-state.md @@ -0,0 +1,74 @@ +--- +rfc: 0004 +title: "Cross-engine state sharing (one state root across WASM and EVM)" +status: draft +authors: ["Cleave Labs"] +tracking: https://github.com/cleave-lang/cleave/issues/46 +created: 2026-05-25 +--- + +The runtime ships two engines (Wasmtime + REVM, as of v0.3). Each has its own state backend today. 
The original #19 issue specified "State writes from REVM and the Cleave VM coexist in the same state tree (the chain has one state root)" as an acceptance criterion that was deferred. + +This RFC picks the design before the implementation issue starts. + +## Status + +Draft. + +## The shape of the problem + +WASM and EVM have different storage semantics: + +| | WASM (Cleave) | EVM | +|---|---|---| +| Key | u32 slot index | (Address, 32-byte storage key) | +| Value | u64 (with widening) | 32-byte word | +| Scope | per-module | per-contract | +| Identity | source-declared slot | deterministic deploy address | + +A unified state needs an answer for: + +1. How do WASM modules and EVM contracts address each other's state? +2. Can a WASM module read an EVM contract's storage and vice versa? +3. Does the state root commit both engines uniformly? + +## Design candidates + +### (a) Two namespaces, single root + +``` +state_root = merkle(wasm_state || evm_state) +``` + +Both engines have their own KV; the chain commits them under a single root for hash purposes but engines don't see each other's data. Simple; "one state root" claim is true; but cross-engine reads still need a bridge layer. + +### (b) Unified KV addressed by `(engine, address, key)` + +Both engines write through one KV. WASM module identity maps to a synthetic Address. State keys are unified bytes. Engines can read each other's storage with the right `(engine, address, key)` tuple. + +### (c) EVM-shaped unified state, WASM is the second-class citizen + +Make WASM `state_set(slot)` desugar to `evm_set(THIS_MODULE_ADDR, slot_as_word, value_as_word)`. Universal store; WASM and EVM speak the same storage substrate. + +### (d) WASM-shaped unified state, EVM is the second-class citizen + +Reverse of (c). Less likely since the EVM storage shape (32-byte words, address-keyed) is more general. + +## Questions + +1. **Pick a candidate.** Which of (a)-(d), or a fifth, do we want? +2. 
**Synthetic addresses.** If WASM modules need addresses, how are they assigned? Hash of source? Deploy-order index? +3. **Cross-engine call semantics.** If a WASM module CAN read EVM storage, can it also call an EVM contract directly? Or only via host functions? +4. **State proofs.** Do light clients verify each engine's state separately, or unified? +5. **Migration.** v0.3 chains shipped with split state. Upgrading them to unified state is its own headache. +6. **Performance.** Unified KV adds an indirection. Bench impact? + +## Reversibility + +Low. Once contracts depend on a specific cross-engine convention, changing it requires a hard fork. + +## Related + +- Companion implementation issue (separate) +- Memory model RFC (#42) — interacts with state value shapes +- DataAvailabilityProtocol RFC (#16, closed) — state and DA both go through commit, must compose diff --git a/spec/rfcs/0005-third-party-protocols.md b/spec/rfcs/0005-third-party-protocols.md new file mode 100644 index 0000000..84d135a --- /dev/null +++ b/spec/rfcs/0005-third-party-protocols.md @@ -0,0 +1,104 @@ +--- +rfc: 0005 +title: "Third-party protocol extension story (publish, discover, audit, version)" +status: draft +authors: ["Cleave Labs"] +tracking: https://github.com/cleave-lang/cleave/issues/65 +created: 2026-05-25 +--- + +**Status:** draft. Open for discussion. + +Cleave's design promises chain composability: a chain manifest names protocol implementations for each subsystem, and any module satisfying the protocol can be plugged in. The standard library will ship `Tendermint` (consensus), `SparseMerkle` (state), `Multidim` (gas), `NativeDA` (da), and others. **The promise extends to third-party implementations too: anyone should be able to publish a `MyPoUA` consensus implementation that other chains can adopt.** + +Today nothing about that workflow exists. This RFC scopes what we need. 
+ +## What this RFC covers + +The end-to-end story for "I built a custom consensus protocol; another team wants to use it on their chain": + +1. **Author** writes a Cleave module declaring `protocol MyPoUA implements ConsensusProtocol { ... }` +2. Author **publishes** it somewhere +3. Consumer **discovers** it +4. Consumer **adds it as a dependency** of their chain project +5. Consumer's chain manifest **references** it: `consensus: MyPoUA<...>` +6. The tooling **verifies** the implementation satisfies `ConsensusProtocol` at compile time +7. At chain launch, the runtime **loads** the implementation alongside the rest + +Each step has open questions. + +## Open questions + +### 1. Publishing + +- Git URLs (Cargo-style): `MyPoUA = { git = "https://github.com/author/myorg", rev = "abc123" }`. Pros: no registry needed, content-addressed via git SHA. Cons: discovery is hard. +- Content-addressed registry (IPFS / Sigstore / dedicated). Pros: discovery + signing built in. Cons: infrastructure to maintain. +- Dedicated registry with namespacing (npm / crates.io). Pros: human-friendly names. Cons: name squatting, governance, trust. + +### 2. Trust model + +A malicious consensus implementation can rewrite history. A malicious state-protocol implementation can silently lose data. A malicious DA implementation can withhold blocks. The blast radius of a bad third-party protocol module is catastrophic, far worse than a bad npm package. + +Possible models: + +- **Caveat emptor.** Chain operators audit before adopting. Same as today's "trust the developer" model that has produced every published smart-contract exploit. +- **Reproducible builds + signatures.** Authors sign releases; consumers pin to specific signed builds. Doesn't prevent malice but makes it attributable. +- **Trusted set.** A community-curated list of audited protocol implementations; chains can reference either trusted or arbitrary ones with a warning. 
+- **Formal verification gate.** Protocols ship with proofs of key properties (safety, liveness, etc.). High bar; eliminates a class of bugs but raises the cost of publication. +- **Sandboxing at runtime.** Even a malicious protocol implementation is constrained by what the runtime exposes. The host could limit syscall surface so the protocol cannot escape its lane. Works for some surface (no DB writes outside state) but not others (consensus controls block order by definition). + +### 3. Version pinning + upgrade + +A chain in production pins to specific protocol versions. When the author publishes an upgrade, how does the chain adopt it? + +- Manual: chain operators bump version in manifest, rebuild, hard-fork +- Automatic with consent: a governance vote upgrades a soft-coded version +- Never: once deployed, never upgrade + +Same questions for security patches. + +### 4. Interface satisfaction checking + +How does the compiler verify that `MyPoUA implements ConsensusProtocol` actually satisfies the protocol contract? Today the type checker treats `implements` as a structural assertion that the names match; it does not verify behavior. + +- Structural: check method signatures match +- Semantic with annotations: protocol declares postconditions, impl asserts they hold +- Runtime conformance tests: protocol ships a test suite, impl must pass +- Formal: impl ships proofs + +### 5. Multiple competing implementations + +If chain A uses `MyPoUA` and chain B uses `TheirPoUA`, both implementing `ConsensusProtocol`, do they interop? Probably not at the consensus layer (each chain has its own). But what about cross-chain bridges that span both? + +### 6. Deprecation + sunset + +A protocol implementation might be abandoned, get superseded, or be revealed to have a bug. How do consumers know? How do they migrate? + +## Strawman: minimum-viable v0 + +Pick the simplest answer to each question that does not foreclose better answers later: + +1. 
**Publishing**: git URLs only; no registry +2. **Trust**: caveat emptor + reproducible builds via the Cleave build system +3. **Versioning**: pin to git SHA in the chain's project manifest +4. **Interface**: structural check on method signatures (extension of #34 typecheck) +5. **Interop**: out of scope for v0 +6. **Deprecation**: signal via README and abandoned-tag conventions + +This is "what Cargo does, plus content-addressed pinning, minus a registry." Most of the open questions get deferred to "v0.5+ governance" decisions. + +## Reversibility + +High up-front. We can ship the strawman and gather data. Each later change (add a registry, add signing, add proofs) is a strict extension. + +## Related issues + +- #45 (RFC: project metadata + build system) — companion. The build system needs to understand third-party protocol modules. +- #53 (stdlib: ConsensusProtocol impl) — protocol satisfaction can only be checked once a real protocol exists +- #54 (stdlib: StateProtocol impl) — same +- The grammar work to lift subsystem-key hardcoding (separate issue) — prerequisite for new third-party subsystems beyond consensus/gas/state/exec/da + +## Discussion + +Comment thread. RFC stays draft until the strawman either solidifies or gets replaced. No timeline; this is a long-tail design effort. + diff --git a/spec/rfcs/README.md b/spec/rfcs/README.md new file mode 100644 index 0000000..605a539 --- /dev/null +++ b/spec/rfcs/README.md @@ -0,0 +1,93 @@ +# Cleave RFCs + +This directory holds Cleave's RFC (Request for Comments) documents. RFCs are how we propose, discuss, and record substantial changes to the language, runtime, and ecosystem before they land in code. + +If you have an idea that's small and uncontroversial (a one-line grammar fix, a bug, a new test), open a normal issue or PR. If it's load-bearing and reversible only with a hard fork (memory model, ABI, consensus interface), it's an RFC. 
+
+## What gets an RFC
+
+Anything that satisfies one or more of:
+
+- Changes the language surface (grammar, types, semantics)
+- Changes a stable ABI (hostcall surface, WASM module shape, storage layout)
+- Changes the standard-library protocol contracts (consensus, gas, state, da, exec)
+- Imposes a new build-time or runtime requirement on chain operators
+- Locks in a decision that's expensive to revisit (security model, gas accounting, determinism guarantees)
+
+If you're unsure, ask in the tracking issue. Better to over-RFC than to ship a load-bearing decision without one.
+
+## How to open one
+
+1. **Open a tracking issue** in `cleave-lang/cleave` with the `design` label and a title that starts with `RFC: `. Describe the problem in 2–4 paragraphs and link to prior art if any.
+2. **Discuss informally** in the issue thread to confirm the problem is real and the RFC route is the right shape.
+3. **Open a PR** adding a file to this directory: copy `0000-template.md` to `NNNN-short-slug.md` (next available number). The PR body should link back to the tracking issue.
+4. **Review** happens on the PR. The author keeps a `status: draft` frontmatter line until the conversation reaches a decision.
+5. **Acceptance**: a maintainer marks the RFC `status: accepted` and merges the PR. Implementation work then references the RFC by number.
+
+A first-time author can skip step 3 if they're not comfortable with the PR workflow; we'll happily co-author from issue text.
+
+## Lifecycle
+
+| Status | Meaning |
+|---|---|
+| `draft` | Proposed, under discussion. The default for any new RFC. |
+| `accepted` | Approved by maintainers. Implementation can begin. The RFC file is the source of truth; tracking-issue discussion may continue. |
+| `implemented` | Code shipped. RFC stays in the repo as historical record. |
+| `superseded` | Replaced by a later RFC. Frontmatter cites the successor. |
+| `rejected` | Decided against. Stays in the repo as historical record so the same proposal does not re-emerge without context. |
+
+State transitions happen via a PR that updates the frontmatter. No automation.
+
+## Frontmatter format
+
+Every RFC starts with YAML frontmatter:
+
+```yaml
+---
+rfc: 0001
+title: "Memory model for Cleave (ownership, GC, escape hatches)"
+status: draft
+authors: ["Cleave Labs"]
+tracking: https://github.com/cleave-lang/cleave/issues/42
+created: 2026-05-23
+---
+```
+
+Fields:
+
+- `rfc`: zero-padded sequence number, matches the filename
+- `title`: human-readable, can change as the proposal evolves
+- `status`: one of the values in the lifecycle table
+- `authors`: list of names or handles; not enforced
+- `tracking`: link to the GitHub issue that hosts discussion
+- `created`: ISO-8601 date
+- `superseded_by` / `supersedes`: optional, when applicable
+- `implemented_in`: optional, link to merge commit / release tag once shipped
+
+## File layout
+
+```
+spec/rfcs/
+  README.md            # this file
+  0000-template.md     # copy this for new RFCs
+  NNNN-short-slug.md   # one file per RFC
+```
+
+`spec/protocols/` continues to hold the standard-library protocol specs (consensus, gas, state, da, effects). Those are different: they're stable interface documents, not proposals. An RFC that proposes a NEW stdlib protocol goes through this directory first; once accepted and implemented, the stable spec lands in `spec/protocols/`.
+
+## Numbering
+
+Strictly monotonic. Pick the next available number when you open the PR. If two PRs race, the later one rebases.
+
+## What an RFC is not
+
+- **Not a binding spec.** The RFC describes intent. The compiler and runtime are the spec. If they diverge, the RFC documents what we meant and the code documents what we shipped; reconciling them is its own follow-up.
+- **Not a roadmap.** RFCs can sit in `draft` for months. Acceptance does not commit to an implementation timeline; that's tracked on the implementation issues.
+- **Not a vote.** Maintainers make the final call. The RFC process is for surfacing arguments, not counting noses.
+
+## Related
+
+- [`spec/grammar.ebnf`](../grammar.ebnf): reference grammar
+- [`spec/abi/wasm.md`](../abi/wasm.md): hostcall ABI
+- [`spec/protocols/`](../protocols/): stable standard-library protocol specs
+- [`spec/effects.md`](../effects.md): effect system reference