Opt-in schema evolution: #[msgpack(default)] and #[msgpack(allow_unknown_fields)]#37
Merged
Merged
Conversation
Adds three capabilities while preserving strict-by-default and upstream codegen for untouched structs: - `Read::skip_value()` — trait method that consumes one MessagePack value of any type; implemented on `SliceReader` and `IOReader`. Respects the existing `MAX_DEPTH` check. Independently useful for log parsers and protocol proxies. - Field-level `#[msgpack(default)]` and `#[msgpack(default = "path")]` — on read, a missing key/slot is filled with `Default::default()` or the named function. Mirrors serde's surface, pure codegen. - Implicit evolution in map mode: structs with any `#[msgpack(default)]` field switch to a tolerant decoder (read_map_len + iterate + skip unknown keys via skip_value). No new struct-level attribute. - Tolerant array decode: structs with trailing `default` fields accept shorter arrays and skip trailing extras. - Compile-error safety net: array-mode structs with a mid-struct default are rejected at derive time, pointing to the exact field. Strict structs (no `default` fields) get byte-for-byte identical codegen to 0.4.1 — check_map_len / check_array_len fast-fail preserved. All 153 existing tests pass.
# Conflicts: # zerompk_derive/src/lib.rs
…orthogonal attributes Previously, any field annotated with `#[msgpack(default)]` silently flipped the entire map-mode decoder into tolerating unknown keys — two unrelated concerns bundled into one attribute. Split them: - `#[msgpack(default)]` fills missing keys only; unknown keys still error, with the offending key surfaced in `KeyNotFound`. - `#[msgpack(allow_unknown_fields)]` (struct-level) skips unknown keys only; fields without `default` are still required. - The two attributes compose independently for full schema evolution. Tightened additional edges: - `#[msgpack(default)]` in array mode is now a compile error (arrays have no field names; silently accepting shorter arrays hides corruption). - `#[msgpack(default)]` on enum-variant fields is now rejected (was a silent no-op). - `allow_unknown_fields` on array-mode structs and on enums is rejected with a clear diagnostic. - Strict-mode codegen is unchanged byte-for-byte. Add nine integration tests in `zerompk/tests/schema_evolution.rs` covering each combination of the two attributes (the feature previously had no tests). Update README to document both attributes and correct the Security section's "always strict" claim.
Owner
|
Thank you! This is a very powerful feature! From a quick look, it seems fine, but I'll review the details soon and merge it if there are no issues. |
Contributor
Author
|
Okay, I'll fix the CI failure in the meantime |
The derive macros are already re-exported via the zerompk crate, making the explicit zerompk_derive import redundant and causing a CI failure on PR nuskey8#37.
Owner
|
Merged the PR, thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
I'm using zerompk as a wire/storage format in NodeDB - where I can set that schema needs to evolve over time — adding fields in newer writers, removing fields in newer readers — without forcing a synchronized deploy of every producer and consumer. zerompk's strict-by-default decoding is the right default (and the security section makes a good case for it), but it means any schema change is a hard break across the wire.
serdesolves this with#[serde(default)]and#[serde(deny_unknown_fields)]. This PR adds the equivalent for zerompk, scoped narrowly so the strict path is preserved bit-for-bit and only opted-in users pay any runtime cost.I've been running this on a fork (
farhan-syah/zerompk) for a bit. Opening this PR in case it's useful upstream. Totally fine if it isn't a fit — happy to keep it on the fork or rework anything.What's added
Two orthogonal opt-ins on
#[msgpack(map)]structs:Field-level
#[msgpack(default)]— fills a missing key withDefault::default(). Also accepts#[msgpack(default = "path::to::fn")]to call a named function, mirroringserde(default = "..."). Only affects missing-key handling; unknown keys still error (with the offending key name surfaced inKeyNotFound).Struct-level
#[msgpack(allow_unknown_fields)]— skips unknown keys via a newRead::skip_value()trait method (which respectsMAX_DEPTH). Only affects unknown-key handling; missing required keys still error.The two compose. Three resulting modes:
defaultallow_unknown_fieldscheck_map_len(N)— byte-identical to current codegenread_map_len, fill missing, error on unknownread_map_len, fill missing, skip unknownDesign choices worth flagging
defaultattribute that implicitly switched the whole decoder into "tolerate unknown keys too" — but that bundles two unrelated decisions into one attribute. Splitting them matchesserde's mental model and lets users pick the evolution direction they actually need.defaultat compile time. Arrays have no field names, so silently accepting shorter/longer arrays would hide corruption rather than evolve schema. The error points at the offending field and suggests#[msgpack(map)].default. It was previously a silent no-op (parsed but ignored in codegen) — now an error so users don't write code that looks like it does evolution but doesn't.skip_value()is independently useful. It's exposed on theReadtrait so non-derive consumers can use it (log parsers, proxies, partial decoders). Implemented on bothSliceReaderandIOReader, with the existingMAX_DEPTHguard preserved.Tests
zerompk/tests/schema_evolution.rs(new) — 9 tests covering each cell of the matrix: defaults fill missing keys; defaults alone reject unknowns (with the correct key name);allow_unknown_fieldsalone skips extras;allow_unknown_fieldsalone still requires non-default keys; both compose;default = "path"form; strict mode rejects missing and extra keys; round-trip preservation.All 158 existing + new unit/integration tests pass. Clippy clean. Existing fuzz targets (
decode_no_panic,roundtrip) ran 30s each post-change without crashes.Compatibility
no_stdpreserved (usescore::default::Default).Things I'm uncertain about / open to changing
allow_unknown_fields— verbose. Could belenient,evolve, or invert the polarity (deny_unknown_fields = false). Happy to rename.skip_value()be a public trait method? I made it public because it's genuinely useful, but if you'd rather keep the trait surface minimal it can bepub(crate)and the derive can call it via a private path.Commits
Squash-merge friendly if you'd prefer one commit.
Thanks for zerompk — it's a really nicely scoped library.