
Opt-in schema evolution: #[msgpack(default)] and #[msgpack(allow_unknown_fields)] #37

Merged: nuskey8 merged 6 commits into nuskey8:main from farhan-syah:feat/opt-in-schema-evolution on May 9, 2026

@farhan-syah (Contributor) commented Apr 29, 2026

Why

I'm using zerompk as a wire/storage format in NodeDB, where schemas need to evolve over time (adding fields in newer writers, removing fields in newer readers) without forcing a synchronized deploy of every producer and consumer. zerompk's strict-by-default decoding is the right default (and the security section makes a good case for it), but it means any schema change is a hard break across the wire.

serde solves this with #[serde(default)] and #[serde(deny_unknown_fields)]. This PR adds the equivalent for zerompk, scoped narrowly so the strict path is preserved bit-for-bit and only opted-in users pay any runtime cost.

I've been running this on a fork (farhan-syah/zerompk) for a bit. Opening this PR in case it's useful upstream. Totally fine if it isn't a fit — happy to keep it on the fork or rework anything.

What's added

Two orthogonal opt-ins on #[msgpack(map)] structs:

Field-level #[msgpack(default)] — fills a missing key with Default::default(). Also accepts #[msgpack(default = "path::to::fn")] to call a named function, mirroring serde(default = "..."). Only affects missing-key handling; unknown keys still error (with the offending key name surfaced in KeyNotFound).

Struct-level #[msgpack(allow_unknown_fields)] — skips unknown keys via a new Read::skip_value() trait method (which respects MAX_DEPTH). Only affects unknown-key handling; missing required keys still error.

The two compose. Three resulting modes:

| has default | allow_unknown_fields | decoder |
|---|---|---|
| no | no | strict check_map_len(N); byte-identical to current codegen |
| yes | no | read_map_len, fill missing, error on unknown |
| any | yes | read_map_len, fill missing, skip unknown |
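To make the table concrete, here is a minimal, stdlib-only sketch of the decision logic, hand-written rather than generated by the derive. It assumes a hypothetical Settings struct with one required field and two defaultable ones, and pre-parsed (key, u64) entries instead of a zerompk reader. decode, fill, DecodeError, MissingField, MapLenMismatch and default_timeout are illustrative names; only KeyNotFound echoes the error named above. The two bool flags stand in for whether the struct carries each attribute.

```rust
// Hypothetical, hand-written model of the three decoder modes in the table.
// Input is pre-parsed (key, value) pairs rather than a zerompk reader.

#[derive(Debug, PartialEq)]
enum DecodeError {
    /// Strict mode: entry count differs from the number of fields.
    MapLenMismatch { expected: usize, actual: usize },
    /// A key the struct does not declare, reported by name.
    KeyNotFound(String),
    /// A field without a default was absent.
    MissingField(&'static str),
}

#[derive(Debug, PartialEq)]
struct Settings {
    version: u64,    // never has a default
    retries: u64,    // stands in for #[msgpack(default)]
    timeout_ms: u64, // stands in for #[msgpack(default = "default_timeout")]
}

fn default_timeout() -> u64 {
    30_000
}

/// Uses the wire value if present, else the fallback function, else errors.
fn fill(slot: Option<u64>, name: &'static str, fallback: Option<fn() -> u64>) -> Result<u64, DecodeError> {
    match (slot, fallback) {
        (Some(v), _) => Ok(v),
        (None, Some(f)) => Ok(f()),
        (None, None) => Err(DecodeError::MissingField(name)),
    }
}

/// `defaults` models whether retries/timeout_ms carry a default attribute;
/// `allow_unknown` models #[msgpack(allow_unknown_fields)] on the struct.
fn decode(entries: &[(&str, u64)], defaults: bool, allow_unknown: bool) -> Result<Settings, DecodeError> {
    // Row 1: no opt-in at all keeps the strict exact-length fast-fail.
    if !defaults && !allow_unknown && entries.len() != 3 {
        return Err(DecodeError::MapLenMismatch { expected: 3, actual: entries.len() });
    }
    let (mut version, mut retries, mut timeout_ms) = (None, None, None);
    for &(key, value) in entries {
        match key {
            "version" => version = Some(value),
            "retries" => retries = Some(value),
            "timeout_ms" => timeout_ms = Some(value),
            // Row 3: a real decoder would consume the value with skip_value() here.
            _ if allow_unknown => {}
            // Rows 1 and 2: an unknown key is an error that names the key.
            other => return Err(DecodeError::KeyNotFound(other.to_string())),
        }
    }
    let retries_fallback = if defaults { Some(<u64 as Default>::default as fn() -> u64) } else { None };
    let timeout_fallback = if defaults { Some(default_timeout as fn() -> u64) } else { None };
    Ok(Settings {
        version: fill(version, "version", None)?,
        retries: fill(retries, "retries", retries_fallback)?,
        timeout_ms: fill(timeout_ms, "timeout_ms", timeout_fallback)?,
    })
}
```

For example, decode(&[("version", 1)], true, false) yields retries = 0 and timeout_ms = 30_000, while adding an ("extra", 9) entry fails with KeyNotFound("extra") unless allow_unknown is also set. With allow_unknown alone, the same input still fails because retries has no default.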

Design choices worth flagging

  • The two opt-ins are orthogonal on purpose. An earlier draft of mine had a single default attribute that implicitly switched the whole decoder into "tolerate unknown keys too" — but that bundles two unrelated decisions into one attribute. Splitting them matches serde's mental model and lets users pick the evolution direction they actually need.
  • Array-mode rejects default at compile time. Arrays have no field names, so silently accepting shorter/longer arrays would hide corruption rather than evolve schema. The error points at the offending field and suggests #[msgpack(map)].
  • Strict-path codegen is unchanged. Verified the emitted tokens are bit-identical for any struct that doesn't opt in. No perf cost for existing users.
  • Enum-variant fields reject default. It was previously a silent no-op (parsed but ignored in codegen) — now an error so users don't write code that looks like it does evolution but doesn't.
  • skip_value() is independently useful. It's exposed on the Read trait so non-derive consumers can use it (log parsers, proxies, partial decoders). Implemented on both SliceReader and IOReader, with the existing MAX_DEPTH guard preserved.
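A minimal, stdlib-only sketch of what a skip_value() routine over a byte slice can look like, mainly to show why it needs a depth guard. It is not zerompk's implementation: the free-function signature, the SkipError type and the MAX_DEPTH value of 64 are assumptions; only the marker layout comes from the MessagePack format. Each container recurses one level deeper, so hostile input such as a long run of nested one-element arrays fails with DepthExceeded instead of exhausting the stack.

```rust
// Hypothetical sketch: skip exactly one MessagePack value in a byte slice.
// Not zerompk's code; MAX_DEPTH's value and SkipError are assumptions.

const MAX_DEPTH: usize = 64; // assumed nesting limit for illustration

#[derive(Debug, PartialEq)]
enum SkipError {
    UnexpectedEof,
    DepthExceeded,
    InvalidMarker(u8),
}

/// Reads a big-endian length of `width` bytes (1, 2 or 4) and advances `pos`.
fn read_len(buf: &[u8], pos: &mut usize, width: usize) -> Result<usize, SkipError> {
    let end = (*pos).checked_add(width).ok_or(SkipError::UnexpectedEof)?;
    let bytes = buf.get(*pos..end).ok_or(SkipError::UnexpectedEof)?;
    *pos = end;
    Ok(bytes.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize))
}

/// Advances `pos` past one complete value; `depth` is the current nesting level.
fn skip_value(buf: &[u8], pos: &mut usize, depth: usize) -> Result<(), SkipError> {
    if depth > MAX_DEPTH {
        return Err(SkipError::DepthExceeded);
    }
    let marker = *buf.get(*pos).ok_or(SkipError::UnexpectedEof)?;
    *pos += 1;
    // (payload bytes to jump over, nested values to skip recursively)
    let (bytes, children): (usize, usize) = match marker {
        0x00..=0x7f | 0xe0..=0xff | 0xc0 | 0xc2 | 0xc3 => (0, 0), // fixint, nil, bool
        0x80..=0x8f => (0, 2 * (marker & 0x0f) as usize),          // fixmap: key + value
        0x90..=0x9f => (0, (marker & 0x0f) as usize),              // fixarray
        0xa0..=0xbf => ((marker & 0x1f) as usize, 0),              // fixstr
        0xcc | 0xd0 => (1, 0),                                     // uint8, int8
        0xcd | 0xd1 => (2, 0),                                     // uint16, int16
        0xca | 0xce | 0xd2 => (4, 0),                              // float32, uint32, int32
        0xcb | 0xcf | 0xd3 => (8, 0),                              // float64, uint64, int64
        0xd4 => (2, 0),                                            // fixext1: type byte + 1
        0xd5 => (3, 0),                                            // fixext2
        0xd6 => (5, 0),                                            // fixext4
        0xd7 => (9, 0),                                            // fixext8
        0xd8 => (17, 0),                                           // fixext16
        0xc4 | 0xd9 => (read_len(buf, pos, 1)?, 0),                // bin8, str8
        0xc5 | 0xda => (read_len(buf, pos, 2)?, 0),                // bin16, str16
        0xc6 | 0xdb => (read_len(buf, pos, 4)?, 0),                // bin32, str32
        0xc7 => (read_len(buf, pos, 1)?.saturating_add(1), 0),     // ext8: data + type byte
        0xc8 => (read_len(buf, pos, 2)?.saturating_add(1), 0),     // ext16
        0xc9 => (read_len(buf, pos, 4)?.saturating_add(1), 0),     // ext32
        0xdc => (0, read_len(buf, pos, 2)?),                       // array16
        0xdd => (0, read_len(buf, pos, 4)?),                       // array32
        0xde => (0, read_len(buf, pos, 2)?.saturating_mul(2)),     // map16
        0xdf => (0, read_len(buf, pos, 4)?.saturating_mul(2)),     // map32
        other => return Err(SkipError::InvalidMarker(other)),      // only 0xc1 remains
    };
    *pos = (*pos)
        .checked_add(bytes)
        .filter(|&end| end <= buf.len())
        .ok_or(SkipError::UnexpectedEof)?;
    for _ in 0..children {
        skip_value(buf, pos, depth + 1)?;
    }
    Ok(())
}
```

Called on the bytes of {"a": [1, 2], "b": nil} from position 0, it leaves pos just past the final nil, at the start of whatever follows; that is exactly what a tolerant map decoder needs after reading an unknown key.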

Tests

zerompk/tests/schema_evolution.rs (new) — 9 tests covering each cell of the matrix: defaults fill missing keys; defaults alone reject unknowns (with the correct key name); allow_unknown_fields alone skips extras; allow_unknown_fields alone still requires non-default keys; both compose; default = "path" form; strict mode rejects missing and extra keys; round-trip preservation.

All 158 existing + new unit/integration tests pass. Clippy clean. Existing fuzz targets (decode_no_panic, roundtrip) ran 30s each post-change without crashes.

Compatibility

  • No changes to existing public API items; the additions are the new opt-in attributes and the Read::skip_value() trait method.
  • No new dependencies.
  • no_std preserved (uses core::default::Default).
  • Existing structs decode and encode identically.

Things I'm uncertain about / open to changing

  • The attribute name allow_unknown_fields is verbose. Alternatives: lenient, evolve, or inverting the polarity (deny_unknown_fields = false). Happy to rename.
  • Should skip_value() be a public trait method? I made it public because it's genuinely useful, but if you'd rather keep the trait surface minimal it can be pub(crate) and the derive can call it via a private path.
  • README updates — I added a small documentation section and softened the Security section's "always strict" wording to reflect that evolution is opt-in. Happy to drop or rephrase if you'd rather keep the README as-is.

Commits

  • 1dd71a0 — initial implementation of the feature
  • 4036bb9 — design split, tests, and README updates

Squash-merge friendly if you'd prefer one commit.

Thanks for zerompk — it's a really nicely scoped library.

Adds three capabilities while preserving strict-by-default and upstream
codegen for untouched structs:

- `Read::skip_value()` — trait method that consumes one MessagePack value
  of any type; implemented on `SliceReader` and `IOReader`. Respects the
  existing `MAX_DEPTH` check. Independently useful for log parsers and
  protocol proxies.

- Field-level `#[msgpack(default)]` and `#[msgpack(default = "path")]` —
  on read, a missing key/slot is filled with `Default::default()` or the
  named function. Mirrors serde's surface, pure codegen.

- Implicit evolution in map mode: structs with any `#[msgpack(default)]`
  field switch to a tolerant decoder (read_map_len + iterate + skip
  unknown keys via skip_value). No new struct-level attribute.

- Tolerant array decode: structs with trailing `default` fields accept
  shorter arrays and skip trailing extras.

- Compile-error safety net: array-mode structs with a mid-struct default
  are rejected at derive time, pointing to the exact field.

Strict structs (no `default` fields) get byte-for-byte identical codegen
to 0.4.1 — check_map_len / check_array_len fast-fail preserved.

All 153 existing tests pass.
# Conflicts:
#	zerompk_derive/src/lib.rs
…orthogonal attributes

Previously, any field annotated with `#[msgpack(default)]` silently flipped
the entire map-mode decoder into tolerating unknown keys — two unrelated
concerns bundled into one attribute.

Split them:

- `#[msgpack(default)]` fills missing keys only; unknown keys still error,
  with the offending key surfaced in `KeyNotFound`.
- `#[msgpack(allow_unknown_fields)]` (struct-level) skips unknown keys only;
  fields without `default` are still required.
- The two attributes compose independently for full schema evolution.

Tightened additional edges:

- `#[msgpack(default)]` in array mode is now a compile error (arrays have
  no field names; silently accepting shorter arrays hides corruption).
- `#[msgpack(default)]` on enum-variant fields is now rejected (was a
  silent no-op).
- `allow_unknown_fields` on array-mode structs and on enums is rejected
  with a clear diagnostic.
- Strict-mode codegen is unchanged byte-for-byte.

Add nine integration tests in `zerompk/tests/schema_evolution.rs` covering
each combination of the two attributes (the feature previously had no tests).

Update README to document both attributes and correct the Security section's
"always strict" claim.
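The derive-time rules listed in this commit (default rejected in array mode and on enum-variant fields; allow_unknown_fields rejected on array-mode structs and enums) can be modeled as a plain validation pass. This is a hypothetical sketch: Shape, FieldAttrs, TypeAttrs, validate and the diagnostic wording are illustrative, and enum fields from all variants are flattened into one list for brevity. A real proc macro would inspect `syn` types and emit spanned compile errors rather than return a String.

```rust
// Hypothetical model of the derive-time attribute checks; not zerompk_derive's code.

#[derive(Clone, Copy, PartialEq)]
enum Shape {
    MapStruct,   // #[msgpack(map)] struct
    ArrayStruct, // default array-mode struct
    Enum,
}

struct FieldAttrs {
    name: &'static str,
    has_default: bool, // #[msgpack(default)] or #[msgpack(default = "...")]
}

struct TypeAttrs {
    shape: Shape,
    allow_unknown_fields: bool,
    fields: Vec<FieldAttrs>, // for enums: fields of every variant, flattened
}

/// Returns the first diagnostic that the derive would report, if any.
fn validate(ty: &TypeAttrs) -> Result<(), String> {
    // Unknown keys only make sense where keys exist: map-mode structs.
    if ty.allow_unknown_fields && ty.shape != Shape::MapStruct {
        return Err("allow_unknown_fields requires a #[msgpack(map)] struct".to_string());
    }
    for f in &ty.fields {
        if !f.has_default {
            continue;
        }
        match ty.shape {
            Shape::MapStruct => {}
            // Arrays have no field names, so a short array can't be told apart from corruption.
            Shape::ArrayStruct => {
                return Err(format!(
                    "field `{}`: default is not allowed in array mode; consider #[msgpack(map)]",
                    f.name
                ))
            }
            // Previously a silent no-op; rejecting it avoids code that only looks evolvable.
            Shape::Enum => {
                return Err(format!("field `{}`: default is not supported on enum-variant fields", f.name))
            }
        }
    }
    Ok(())
}
```

Checking the struct-level attribute first and then each field mirrors the commit's point that the error should name the offending field wherever one is involved.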
@nuskey8 (Owner) commented May 1, 2026

Thank you! This is a very powerful feature!

From a quick look, it seems fine, but I'll review the details soon and merge it if there are no issues.

@farhan-syah
Contributor Author

Okay, I'll fix the CI failure in the meantime.

farhan-syah and others added 3 commits May 1, 2026 14:16
The derive macros are already re-exported via the zerompk crate,
making the explicit zerompk_derive import redundant and causing
a CI failure on PR nuskey8#37.
@nuskey8 nuskey8 merged commit 2aa291d into nuskey8:main May 9, 2026
@nuskey8 (Owner) commented May 9, 2026

Merged the PR, thanks!
