Internals
This article covers the parts of mruby-cbor most users never need to touch: the encoding algorithms, the lazy decoder’s architecture, the sharedref bookkeeping, the fast wire format, the determinism guarantees, and how performance is tuned.
If you only want to encode and decode CBOR, the README has everything you need. Come here when you want to know how it works — for security review, performance work, or porting a feature to another runtime.
- Float Encoding (Preferred Serialization)
- BigInt Encoding (Tag 2/3)
- Shared References (Tag 28/29)
- Lazy Decoding Architecture
- Fast Wire Format
- Symbol Encoding Strategies
- String Encoding (UTF-8 vs Binary)
- Depth Tracking & Stack Overflow Prevention
- Memory Layout: SBO + Heap
- File Streaming: Adaptive Readahead
- Determinism Guarantees
- RFC 8949 Compliance
- Performance & Benchmarks
- Recursion Depth Tuning
Floats use the smallest CBOR float width that represents them losslessly:
CBOR.encode(0.0).bytesize # => 3 (f16)
CBOR.encode(1.0).bytesize # => 3 (f16)
CBOR.encode(1.5).bytesize # => 3 (f16)
CBOR.encode(1.0e10).bytesize # => 5 (f32)
CBOR.encode(3.14).bytesize # => 9 (f64)
The selection is pure bit-pattern arithmetic with zero floating-point operations, which matters for determinism and for MRB_NO_FLOAT builds.
1. Extract sign, exponent, and mantissa from the f64 bit pattern.
2. If NaN: emit canonical f16 0xF97E00, done.
3. If the value fits in f32 (low 29 mantissa bits = 0, exponent in range): try f16 next.
4. If it also fits in f16 (subnormal or normal range checks): emit f16.
5. Otherwise, if it fits in f32: emit f32.
6. Else: emit f64.
| Value | Encoding | Rationale |
|---|---|---|
| NaN (any payload) | f16 0x7E00 | Single canonical NaN, per RFC 8949 §4.2.2 |
| ±Inf, ±0 | f16 | Always representable |
| f16 normal | f16 if exp32 ∈ [113..142] and low 13 mant bits = 0 | Lossless in f16 |
| f16 subnormal | f16 if exp32 ∈ [103..112] and shift is exact | Lossless in f16 |
| Fits in f32 | f32 if low 29 f64 mant bits = 0 and exp in range | Lossless in f32 |
| Everything else | f64 | Need full precision |
MRB_USE_FLOAT32 builds: start at f32 and try f16, skipping f64 entirely (mruby’s mrb_float is f32 so f64 wouldn’t round-trip).
Fast path floats (encode_fast): always emit at full native width with no bit-pattern analysis — faster but larger on the wire.
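The width selection can be sketched in Ruby using only integer operations on the raw f64 bits. This is a minimal illustration, not the gem's C code: `fits_format?`, `float_width`, and `float_wire_bytes` are hypothetical names, and edge cases such as f32 subnormals may be handled differently in the real encoder.

```ruby
# Sketch: choose the smallest lossless CBOR float width using only integer
# operations on the f64 bit pattern. Names are illustrative, not mruby-cbor's.

# Does a finite, normal f64 (unbiased exponent e, 53-bit significand sig)
# fit a format with mant_bits stored mantissa bits and exponent range
# emin..emax? Values below emin are tried as subnormals of that format.
def fits_format?(e, sig, mant_bits, emin, emax)
  return false if e > emax
  drop = 52 - mant_bits            # low bits the narrower mantissa cannot hold
  drop += emin - e if e < emin     # a subnormal loses one more bit per step
  (sig & ((1 << drop) - 1)).zero?  # lossless only if every dropped bit is 0
end

def float_width(x)
  bits = [x].pack('G').unpack1('Q>')    # raw big-endian IEEE 754 binary64
  exp  = (bits >> 52) & 0x7FF
  mant = bits & ((1 << 52) - 1)
  return 16 if exp == 0x7FF             # NaN (canonical 0x7E00) and ±Inf
  return 16 if exp.zero? && mant.zero?  # ±0
  return 64 if exp.zero?                # f64 subnormal: below f32's range
  e   = exp - 1023                      # unbiased exponent
  sig = (1 << 52) | mant                # restore the implicit leading 1
  return 16 if fits_format?(e, sig, 10, -14, 15)
  return 32 if fits_format?(e, sig, 23, -126, 127)
  64
end

def float_wire_bytes(x)
  1 + float_width(x) / 8                # one initial byte plus the payload
end
```

Under these assumptions, `float_wire_bytes(1.5)` gives 3 and `float_wire_bytes(3.14)` gives 9, matching the sizes listed at the top of this section.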
Per RFC 8949 §3.4.3, bignums outside int64 range encode as:
- Tag 2: Positive bigint as byte string (big-endian)
- Tag 3: Negative bigint as byte string (|magnitude| - 1, big-endian)
Example: (1 << 200) + 1 → Tag 2 wrapping a 26-byte byte string.
Zero-length payloads are handled per the RFC edge case: tag(2, h'') = 0, tag(3, h'') = -1.
The negation of large bignums is computed in pure integer arithmetic — no floating-point operations, no overflow surprises. This matters: it’s one of the things that makes encoding deterministic across builds.
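A small Ruby sketch of the Tag 2/3 rules, using only Integer arithmetic. The helper names are hypothetical, and `bytes_head` covers only payloads shorter than 65536 bytes. For illustration, the sketch also accepts small integers that the real encoder would emit as plain major type 0 or 1.

```ruby
# Sketch of RFC 8949 bignum tags using only Integer arithmetic.
# Helper names are hypothetical, not the gem's API.

# Byte string head (major type 2) for payloads shorter than 65536 bytes.
def bytes_head(len)
  if len < 24 then [0x40 | len].pack('C')
  elsif len < 256 then [0x58, len].pack('CC')
  else [0x59, len].pack('Cn')
  end
end

def encode_bignum(n)
  tag = n.negative? ? 0xC3 : 0xC2   # tag 3 or tag 2
  mag = n.negative? ? -1 - n : n    # tag 3 carries -1 - n
  bytes = []
  while mag.positive?
    bytes.unshift(mag & 0xFF)       # build big-endian, no leading zero bytes
    mag >>= 8
  end
  [tag].pack('C') + bytes_head(bytes.size) + bytes.pack('C*')
end

def decode_bignum(tag, payload)
  mag = payload.bytes.inject(0) { |acc, b| (acc << 8) | b }  # h'' yields 0
  tag == 3 ? -1 - mag : mag
end
```

With this sketch, `encode_bignum((1 << 200) + 1)` is 29 bytes long: one tag byte, a two-byte string head, and the 26-byte payload.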
The shared-reference algorithm preserves object identity across encode/decode round-trips and lets you encode cyclic structures. Enable it with sharedrefs: true.
You cannot decide whether an object is shared until you’ve seen the whole graph — an object referenced once should encode inline with no tag, an object referenced two or more times needs a Tag 28 marker and subsequent Tag 29 back-references. That refcount is only known after a full traversal, so encoding runs in two passes:
Pass 1 (count):
Walk the object graph. For each non-immediate value, increment
counts[mrb_obj_id]. Recurse into a value's children only on its first
encounter. Cache any proc-tag / _before_encode transform results so
Pass 2 doesn't re-invoke them.
Pass 2 (emit):
counts[id] < 2 → encode inline, no tag (value is unique).
counts[id] >= 2, first → assign the next slot, emit Tag 28, then
encode the value bytes inline.
counts[id] >= 2, again → emit Tag 29 + slot index, skip the value.
Slot numbers are assigned lazily in Pass 2, in Tag 28 wire-appearance order. Single-occurrence objects never consume a slot.
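The two passes can be illustrated with a Ruby sketch that emits a symbolic tree of `Tag` structs rather than bytes, so the tag placement is easy to inspect. Everything here (`Tag`, `count_refs`, `emit`, `encode_shared`) is a hypothetical stand-in. It treats Integer, Float, and Symbol as immediates and omits the proc-tag / _before_encode caching.

```ruby
# Sketch of the two-pass shared-reference encoder. It emits a symbolic tree
# (Tag structs) instead of bytes so tag placement is easy to inspect.
Tag = Struct.new(:number, :payload)

def immediate?(v)
  case v
  when nil, true, false, Integer, Float, Symbol then true
  else false
  end
end

# Pass 1: count references by identity, descending only on first encounter.
def count_refs(v, counts)
  return if immediate?(v)
  counts[v] += 1
  return if counts[v] > 1
  case v
  when Array then v.each { |e| count_refs(e, counts) }
  when Hash  then v.each_value { |e| count_refs(e, counts) } # keys never share
  end
end

# Pass 2: emit, assigning slots lazily in Tag 28 appearance order.
def emit(v, counts, slots)
  return v if immediate?(v)
  return emit_body(v, counts, slots) if counts[v] < 2
  return Tag.new(29, slots[v]) if slots.key?(v)
  slots[v] = slots.size                 # reserve the slot before descending
  Tag.new(28, emit_body(v, counts, slots))
end

def emit_body(v, counts, slots)
  case v
  when Array then v.map { |e| emit(e, counts, slots) }
  when Hash  then v.to_h { |k, e| [k, emit(e, counts, slots)] }
  else v
  end
end

def encode_shared(root)
  counts = Hash.new(0).compare_by_identity
  count_refs(root, counts)
  emit(root, counts, {}.compare_by_identity)
end
```

Because the slot is reserved before the body is emitted, a self-referencing array comes out as a Tag 28 whose payload contains a Tag 29 pointing back at slot 0.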
Tag 28 (Shareable):
Compute the payload's byte offset. If that offset is already in the
offset_map, reuse its slot (makes registration idempotent across
re-traversals — e.g. lazy.value then lazy[i].value). Otherwise append a
new slot and register offset -> slot. For container payloads, pre-fill the
slot with the empty container *before* decoding its children, so Tag 29
references inside the payload resolve to the partially constructed object
(this is what makes cycles work).
Tag 29 (Shared ref):
Read the index, add CBOR_SHAREDREFS_INDEX_BIAS, fetch from the sharedrefs
array, return that exact object. Raw Lazy placeholders living in a slot are
upgraded to full Lazies on the way out.
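The decoder side, in particular the pre-fill step that makes cycles resolve, can be sketched the same way over a symbolic tree. This is an assumption-laden illustration: `Tag`, `decode_shared`, and `decode_tag` are hypothetical names, a plain Ruby Array stands in for the slot table, and the offset_map and Lazy placeholder handling is left out.

```ruby
# Sketch of Tag 28/29 resolution over a symbolic tree of Tag structs.
Tag = Struct.new(:number, :payload)

def decode_shared(node, slots = [])
  case node
  when Tag   then decode_tag(node, slots)
  when Array then node.map { |e| decode_shared(e, slots) }
  when Hash  then node.transform_values { |e| decode_shared(e, slots) }
  else node
  end
end

def decode_tag(tag, slots)
  case tag.number
  when 29 then slots.fetch(tag.payload)  # the exact object, not a copy
  when 28
    case (inner = tag.payload)
    when Array
      out = []
      slots << out                       # pre-fill before decoding children
      inner.each { |e| out << decode_shared(e, slots) }
      out
    when Hash
      out = {}
      slots << out                       # same pre-fill for maps
      inner.each { |k, e| out[k] = decode_shared(e, slots) }
      out
    else
      slots << inner                     # scalar payload: register as-is
      inner
    end
  else raise ArgumentError, "unsupported tag #{tag.number}"
  end
end
```

Decoding `Tag.new(28, [Tag.new(29, 0)])` with this sketch returns an array whose first element is the array itself, because slot 0 already holds the empty container when the inner Tag 29 is reached.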
sharedrefs is a plain Ruby Array carrying both bookkeeping and the shareable values:
| Index | Contents |
|---|---|
| [0] | HWM: highest payload offset registered so far. A cheap “already registered?” check during monotonic forward traversal (skip_cbor / lazy tag resolution), avoiding a hash lookup per Tag 28. |
| [1] | offset_map (Hash): payload_offset → slot. Used to find the slot for a known offset (re-decode detection, placeholder routing). Created lazily on first registration, so documents with no Tag 28s pay nothing. |
| [N + 2 ..] | The shareable values themselves; encoder slot N lives at sharedrefs[N + CBOR_SHAREDREFS_INDEX_BIAS], where the bias is 2. |
Key invariant: encoder slot order and decoder slot order stay aligned because each Tag 28 reserves exactly one slot at its wire-order position, and the offset_map keeps that assignment idempotent across repeated traversals of the same bytes. This is what makes decoded["x"].equal?(decoded["y"]) and full[0].equal?(lazy[0].value) both hold.
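One way to picture the layout and the idempotent registration is the following Ruby sketch. It is a guess at the shape of the bookkeeping, not the gem's code. `register_shareable` and `fetch_shared` are hypothetical helpers, `INDEX_BIAS` stands in for CBOR_SHAREDREFS_INDEX_BIAS, and first-time registrations are assumed to arrive in forward wire order.

```ruby
# Sketch of the sharedrefs layout: [0] = HWM, [1] = offset_map, [2..] = values.
# register_shareable is a hypothetical helper, not the gem's API.
INDEX_BIAS = 2

def new_sharedrefs
  [-1, nil]                              # no offset seen yet, map not created
end

# Returns the slot for the Tag 28 payload at payload_offset, reserving a new
# slot only on the first visit.
def register_shareable(refs, payload_offset, value)
  if payload_offset <= refs[0]           # at or below the HWM: may be known
    known = refs[1] && refs[1][payload_offset]
    return known if known
  end
  refs[1] ||= {}                         # created on first registration
  slot = refs.size - INDEX_BIAS
  refs[1][payload_offset] = slot
  refs[0] = payload_offset if payload_offset > refs[0]
  refs << value
  slot
end

def fetch_shared(refs, index)
  refs.fetch(index + INDEX_BIAS)         # Tag 29 index -> array position
end
```

Registering the same payload offset twice returns the same slot and leaves the array unchanged, which is the property that keeps repeated traversals of the same bytes aligned with the encoder's slot numbers.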
Hash keys do not participate in sharing. mruby creates distinct copies for hash keys regardless, so trying to share them would just clutter the slot space.
Lazy + sharedrefs: the same sharedrefs array (with its offset_map) is threaded through every Lazy spawned from the same root buffer. Tag 28 sites encountered while skipping install a raw Lazy placeholder into their slot, initialized on first real access — so navigating across shared boundaries (lazy["a"]["x"] vs lazy["b"]["x"]) can resolve to the same wrapper.
A Lazy object is a (buffer, offset) handle plus a key-cache and value-cache, carrying two state flags:
- evaluating: set while value is in progress, for cycle detection.
- initialized: FALSE for a raw Lazy (only the buffer ivar is set), TRUE for a full Lazy (vcache / kcache / sharedrefs ivars set). Raw Lazies exist only as sharedref slot placeholders and are upgraded to full on first real access.
1. cbor_lazy_new: wrap buffer + offset, caches empty.
2. lazy["key"]: seek to offset, decode just enough to find the key,
spawn a new Lazy at the value's offset, cache it under the key.
3. lazy.value: decode from the current offset, return a fully-realized
value, cache it in vcache.
4. Repeated access: return the cached object — O(1).
The buffer is never copied. Each Lazy is just a different offset into the same underlying string. Strings extracted via mrb_str_byte_subseq are read-only views over the buffer too — perfect for partial extraction from large files.
CBOR::Path shares the cache: path queries call into the same cbor_lazy_aref that lazy[key] uses, so the second walk over a document is essentially free.
Cycle detection during materialization: the evaluating flag is set while value is in-progress. A re-entry while still evaluating raises RuntimeError: CBOR shared reference cycle detected instead of stack-overflowing.
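The (buffer, offset) handle idea can be sketched in plain Ruby over a small CBOR subset (unsigned integers, text strings, arrays, maps). `MiniLazy` is an illustrative class, not the gem's CBOR::Lazy: it has no sharedrefs, no cycle detection, and Ruby's byteslice may copy where mruby returns a view.

```ruby
# Sketch of a lazy reader over a CBOR subset (uints, text, arrays, maps).
# MiniLazy and its helpers are illustrative; they are not the gem's classes.
class MiniLazy
  def initialize(buf, offset = 0)
    @buf = buf
    @offset = offset
    @kcache = {}
  end

  # Decode one item head: [major type, argument, offset after the head].
  def self.head(buf, off)
    ib = buf.getbyte(off)
    major = ib >> 5
    info = ib & 0x1F
    return [major, info, off + 1] if info < 24
    raise NotImplementedError, 'indefinite or reserved head' if info > 27
    width = 1 << (info - 24)             # 24..27 -> 1, 2, 4, 8 bytes
    arg = buf.byteslice(off + 1, width).bytes.inject(0) { |a, b| (a << 8) | b }
    [major, arg, off + 1 + width]
  end

  # Offset just past the item at off, found without building any values.
  def self.skip(buf, off)
    major, arg, pos = head(buf, off)
    case major
    when 0 then pos
    when 3 then pos + arg
    when 4 then arg.times { pos = skip(buf, pos) }; pos
    when 5 then (arg * 2).times { pos = skip(buf, pos) }; pos
    else raise NotImplementedError, "major type #{major} not in this sketch"
    end
  end

  # Navigate one level: spawn (and cache) a handle at the child's offset.
  def [](key)
    @kcache[key] ||= MiniLazy.new(@buf, find(key))
  end

  # Materialize the item at this offset once, then serve it from the cache.
  def value
    return @vcache if defined?(@vcache)
    @vcache = decode(@offset).first
  end

  private

  def find(key)
    major, arg, pos = MiniLazy.head(@buf, @offset)
    if major == 4
      raise IndexError, "bad index #{key.inspect}" unless key.is_a?(Integer) && (0...arg).cover?(key)
      key.times { pos = MiniLazy.skip(@buf, pos) }
      return pos
    end
    raise TypeError, 'not a container' unless major == 5
    arg.times do
      k, pos = decode(pos)
      return pos if k == key
      pos = MiniLazy.skip(@buf, pos)     # skip the value we do not need
    end
    raise KeyError, "key not found: #{key.inspect}"
  end

  def decode(off)
    major, arg, pos = MiniLazy.head(@buf, off)
    case major
    when 0 then [arg, pos]
    when 3 then [@buf.byteslice(pos, arg).force_encoding('UTF-8'), pos + arg]
    when 4
      items = Array.new(arg) { v, pos = decode(pos); v }
      [items, pos]
    when 5
      pairs = Array.new(arg) { k, pos = decode(pos); v, pos = decode(pos); [k, v] }
      [pairs.to_h, pos]
    else raise NotImplementedError, "major type #{major} not in this sketch"
    end
  end
end
```

Given the twelve bytes of `{"a": [1, 2, 3], "b": "hi"}`, `MiniLazy.new(buf)["a"][1].value` skips straight to the second array element, and asking for `lazy["a"]` twice returns the same cached handle.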
The fast path (encode_fast / decode_fast) trades wire portability for speed by making integer and float widths fixed at compile time:
For each value:
integer → fixed-width (MRB_INT_BIT / 8 bytes), major 0 or 1
float → fixed-width (sizeof(mrb_float) bytes), 0xFA or 0xFB
string → canonical length prefix + bytes, no UTF-8 check, always major 2
array → canonical length prefix + fast-encoded elements
map → canonical length prefix + fast-encoded pairs
symbol → tag 39 hybrid: presym → uint payload, runtime sym → string payload
class → tag 49999 + canonical length + name bytes
other → fall back to canonical encode_value
Only scalars are fixed-width. Structural lengths remain shortest-form so container overhead is identical to canonical.
Wire width is determined entirely at compile time:
| Build setting | Integer wire | Float wire |
|---|---|---|
| MRB_INT_BIT == 16 | uint16 (info=25, 3 bytes) | n/a |
| MRB_INT_BIT == 32 | uint32 (info=26, 5 bytes) | n/a |
| MRB_INT_BIT == 64 | uint64 (info=27, 9 bytes) | n/a |
| MRB_USE_FLOAT32 | n/a | f32 (5 bytes) |
| default | n/a | f64 (9 bytes) |
Symbols on the fast path use the same hybrid as canonical strategy 2 (encode_sym_hybrid_fast): presyms emit as a uint payload, runtime symbols as a string payload. The presym leg is therefore not portable across builds — see Symbol Encoding Strategies.
Fallback for unsupported types: registered tags, bigints, UnhandledTag, and proc-tag types fall through to the canonical encoder transparently. The fast path never raises on an unsupported type — it just gets the canonical bytes for those values.
Trade-off: fixed-width integers produce larger wire output for small values (e.g. 1 encodes as 9 bytes on a 64-bit build instead of 1). For integer-heavy payloads with lots of small numbers, the canonical encoder is actually faster due to lower memcpy volume. The fast path wins on rich structured messages with string keys and mixed scalar values — the typical actor-message shape.
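The size difference is easy to see in a Ruby sketch that builds both integer encodings side by side. `canonical_int` and `fast_int` are hypothetical names, and the `int_bit` parameter stands in for the compile-time MRB_INT_BIT setting.

```ruby
# Sketch contrasting shortest-form and fixed-width integer heads.
# int_bit stands in for MRB_INT_BIT; the function names are hypothetical.

def canonical_int(n)
  major = n.negative? ? 1 : 0
  arg   = n.negative? ? -1 - n : n      # major type 1 carries -1 - n
  ib    = major << 5
  if    arg < 24          then [ib | arg].pack('C')
  elsif arg < (1 << 8)    then [ib | 24, arg].pack('CC')
  elsif arg < (1 << 16)   then [ib | 25, arg].pack('Cn')
  elsif arg < (1 << 32)   then [ib | 26, arg].pack('CN')
  else                         [ib | 27, arg].pack('CQ>')
  end
end

def fast_int(n, int_bit = 64)
  major = n.negative? ? 1 : 0
  arg   = n.negative? ? -1 - n : n
  # The width comes from the build setting alone, never from the value.
  info, fmt = { 16 => [25, 'n'], 32 => [26, 'N'], 64 => [27, 'Q>'] }.fetch(int_bit)
  [(major << 5) | info, arg].pack("C#{fmt}")
end
```

Here `canonical_int(1)` is a single byte while `fast_int(1)` is nine, which is the overhead that makes integer-heavy payloads larger on the fast path.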
| Mode | Wire format | Interop |
|---|---|---|
| no_symbols | plain string | universal, loses the symbol type |
| symbols_as_string | Tag 39 + string | RFC 8949 compatible, fully portable |
| symbols_as_uint32 (hybrid) | Tag 39 + uint (presym) / string (runtime sym) | runtime leg portable; presym leg same-build mruby only |
| encode_fast (any mode) | same hybrid as strategy 2 | runtime leg portable; presym leg same-build mruby only |
Despite the historical name, symbols_as_uint32 no longer always emits a uint — it picks per symbol:
- Presym (compile-time symbol, sym <= MRB_PRESYM_MAX; valid IDs are 1 .. MRB_PRESYM_MAX inclusive) → uint payload carrying the sym ID directly. Three bytes total for IDs < 24 (two for the Tag 39 head, one for the ID). Cheapest path.
- Runtime sym ("foo_#{i}".to_sym, anything > MRB_PRESYM_MAX) → string payload. Always portable, regardless of build.
The decoder accepts either payload under Tag 39 and dispatches by type (sym_from_tag39_payload, shared by canonical strategy 2 and the fast decoder). An out-of-range presym ID raises RangeError; a payload whose type doesn’t match the active strategy raises TypeError.
Why the presym leg is non-portable: symbol ID 42 on your mruby might be ID 100 on another mruby built with different presym settings. IDs are assigned at compile time based on which symbols the build actually references — two binaries from different source trees get different assignments even if they share most symbols. This is why the presym leg is restricted to internal mruby-to-mruby IPC where you control both ends. Runtime symbols travel as strings and so always survive a build mismatch; if you don’t control both ends, use symbols_as_string.
Requires presym; on MRB_NO_PRESYM builds, symbols_as_uint32 raises NotImplementedError.
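The two legs of the hybrid can be sketched in Ruby as below. CRuby has no presym table, so `PRESYM_IDS` is a made-up stand-in with invented IDs. The sketch also assumes the string leg is a text string (major type 3) and only handles IDs and names short enough for a one-byte head.

```ruby
# Sketch of the Tag 39 hybrid. PRESYM_IDS is a made-up table standing in for a
# build's presym assignments; real IDs are fixed when mruby is compiled.
PRESYM_IDS = { name: 5, size: 17 }.freeze
TAG_39 = "\xD8\x27".b                    # tag number 39 needs a 2-byte head

def encode_symbol_hybrid(sym)
  id = PRESYM_IDS[sym]
  if id
    raise NotImplementedError, 'sketch covers IDs below 24 only' unless id < 24
    TAG_39 + [id].pack('C')              # uint payload: the presym ID
  else
    name = sym.to_s.b
    raise NotImplementedError, 'sketch covers names under 24 bytes' unless name.bytesize < 24
    TAG_39 + [0x60 | name.bytesize].pack('C') + name   # text string payload
  end
end
```

A symbol found in the table encodes to the tag head plus one ID byte, while any other symbol carries its name and so decodes correctly on a build with a different table.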
When encoding, strings are auto-classified:
CBOR.encode("hello") # => major 3 (text string)
CBOR.encode("\x00\xFF\xFE".b) # => major 2 (byte string)
The encoder calls mrb_str_is_utf8 on each string and chooses major type 3 (text) for valid UTF-8, major type 2 (byte string) otherwise.
When decoding, text strings (major type 3) are validated as UTF-8 if mruby was compiled with MRB_UTF8_STRING. Without MRB_UTF8_STRING, validation is skipped — the strings still decode, they’re just not validated.
Fast path always emits major type 2 regardless of content. The fast wire format is tested only for mruby-to-mruby and the text/binary distinction doesn’t apply at that layer.
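The canonical encoder's classification rule can be approximated in plain Ruby, with `valid_encoding?` playing the role that mrb_str_is_utf8 plays in C. `string_major_type` and `encode_string` are hypothetical helpers, and the encoder half only handles strings shorter than 24 bytes.

```ruby
# Sketch of the text-versus-bytes decision. valid_encoding? plays the role
# that mrb_str_is_utf8 plays in the C encoder.
def string_major_type(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding? ? 3 : 2
end

def encode_string(str)
  raise NotImplementedError, 'sketch covers strings under 24 bytes' unless str.bytesize < 24
  # Initial byte: major type in the top 3 bits, length in the low 5 bits.
  [(string_major_type(str) << 5) | str.bytesize].pack('C') + str.b
end
```

The decision depends only on the bytes, so a binary-tagged string that happens to hold valid UTF-8 is still classified as text in this sketch.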
Each decode call increments a per-Reader depth counter. At CBOR_MAX_DEPTH, the decoder raises RuntimeError: "CBOR nesting depth exceeded". This prevents:
- Deeply nested arrays: [[[[[...]]]]]
- Deeply nested maps: {"a": {"a": {"a": ...}}}
- Circular references without sharedrefs (would loop forever)
- Tag chain bombs: tag(N)(tag(N)(tag(N)(...)))
The encoder applies the same limit, so a cyclic structure passed without sharedrefs: true raises before consuming arbitrary memory.
CBOR::Lazy#value and CBOR::Lazy#[] both carry their own depth counters for the same reason — recursive sharedref resolution is bounded.
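A depth-bounded traversal looks roughly like the Ruby sketch below. `MAX_DEPTH` stands in for CBOR_MAX_DEPTH and `walk` is a hypothetical function that only measures nesting. The exact level at which the real encoder and decoder trip may differ by one from this sketch.

```ruby
# Sketch of depth-bounded traversal. MAX_DEPTH stands in for CBOR_MAX_DEPTH;
# the walker only measures nesting, it does not produce CBOR bytes.
MAX_DEPTH = 32

def walk(value, depth = 0)
  raise RuntimeError, 'CBOR nesting depth exceeded' if depth >= MAX_DEPTH
  case value
  when Array then value.each { |e| walk(e, depth + 1) }
  when Hash  then value.each { |k, v| walk(k, depth + 1); walk(v, depth + 1) }
  end
  nil
end
```

A self-referencing array never terminates on its own, but each re-entry adds one level, so the walk raises after MAX_DEPTH steps instead of exhausting the stack.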
Encoding uses a small-buffer optimization (SBO) with a 16 KB stack buffer:
#define CBOR_SBO_STACK_CAP (16 * 1024)
- Documents under 16 KB → stack only, no allocation.
- Larger documents → spill to a heap-allocated mruby string, growing geometrically (next power of 2).
- The final cbor_writer_finish returns either a fresh mrb_str_new of the stack contents, or the heap string with its length set in place, with no unnecessary copies.
The heap string is GC-protected via mrb_gc_register while encoding is in progress and unregistered on finish.
CBOR.stream(file) uses an adaptive readahead with doubling strategy:
- First read: 9 bytes (minimum needed to parse any CBOR header).
- Use doc_end to find where the document ends by skipping its contents.
- If the buffer contains the full document: yield it, move to the next.
- If not complete: double the read size, re-read from the same offset, retry.
- Continue doubling until the full document is buffered.
- Then read exactly the remaining bytes needed (if any) to avoid over-reading.
This means streaming a file with mostly small messages and the occasional huge one stays cheap — readahead grows for the big ones and stays small for the rest.
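The doubling loop can be sketched in Ruby against any seekable IO. This is an illustration under assumptions: `doc_end` here is a simplified stand-in covering integers, strings, arrays, and maps, `each_document` is a hypothetical name, and the final "read exactly the remaining bytes" refinement is left out.

```ruby
require 'stringio'

# Sketch of adaptive readahead over a CBOR sequence. doc_end and each_document
# are illustrative; the subset covers integers, strings, arrays and maps.

# Returns the offset just past the item at off, or nil if buf ends too early.
def doc_end(buf, off = 0)
  ib = buf.getbyte(off) or return nil
  major = ib >> 5
  info = ib & 0x1F
  raise NotImplementedError, 'outside this sketch' if info > 27 || major > 5
  pos = off + 1
  arg = info
  if info >= 24
    width = 1 << (info - 24)                 # 1, 2, 4 or 8 argument bytes
    return nil if buf.bytesize < pos + width
    arg = buf.byteslice(pos, width).bytes.inject(0) { |a, b| (a << 8) | b }
    pos += width
  end
  case major
  when 0, 1 then pos
  when 2, 3 then pos + arg <= buf.bytesize ? pos + arg : nil
  else
    (major == 5 ? arg * 2 : arg).times { pos = doc_end(buf, pos) or return nil }
    pos
  end
end

def each_document(io, first_read: 9)
  offset = 0
  loop do
    size = first_read                        # every document starts small
    buf = nil
    stop = nil
    loop do
      io.seek(offset)
      buf = io.read(size)
      return if buf.nil? || buf.empty?       # clean end of input
      stop = doc_end(buf) and break
      raise EOFError, 'truncated document' if buf.bytesize < size
      size *= 2                              # not enough buffered: double
    end
    yield buf.byteslice(0, stop)
    offset += stop
  end
end
```

Usage is `each_document(StringIO.new(bytes)) { |doc| ... }`. The read size resets to `first_read` for every document, so one large message does not inflate the reads for the small ones that follow.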
This section answers: will the same input always produce the same output?
- Integer encoding (any base). Fixnum encodes in shortest varint form; bignum converts to big-endian byte string with no lossy steps. Negative integers always use the -1 - n rule. Same value → same bytes, regardless of how the integer was constructed.
- Float width selection. A float’s wire encoding depends solely on its bit-pattern, never on FP rounding. NaN always canonicalizes to 0xF97E00.
- Hash field order. Encoding follows insertion order, which mruby guarantees. Reproducible across encodes and across machines.
- Bignum negation. Computed in pure integer arithmetic: no rounding, no overflow.
- Integer overflow. Explicit bounds checks raise RangeError consistently rather than wrapping around.
| Factor | Impact | Details |
|---|---|---|
| mruby build config | Symbols, float width range | MRB_USE_FLOAT32, presym settings affect encoding choices |
| Symbol IDs | Presym leg non-portable across mruby binaries | Use symbols_as_string for portable wire |
| encode_fast integer width | Non-portable across builds with different MRB_INT_BIT | Fast buffers must never cross build boundaries |
Practical: same mruby binary + same input = same output, forever. For cross-machine reproducibility, use symbols_as_string and encode (not encode_fast).
This implementation strictly follows RFC 8949:
- §3.1, major type 0 (Unsigned integers): shortest-form encoding
- §3.1, major type 1 (Negative integers): consistent -1 - n rule
- §3.1, major type 2 (Byte strings): uninterpreted binary, no UTF-8 check
- §3.1, major type 3 (Text strings): validated UTF-8 (with MRB_UTF8_STRING)
- §3.1, major types 4 and 5 (Arrays & maps): definite length only (use CBOR.stream for indefinite-style framing)
- §4.1 (Float preferred serialization): smallest lossless width (f16→f32→f64)
- §3.3 (Simple values): false, true, null only (no undefined)
- §3.4.3 (Bignums/Tags 2&3): zero-length payload rule respected
- Tags 28&29 (IANA-registered value-sharing tags, defined outside RFC 8949): identity-preserving shared references
- §4.2.2 (Deterministic encoding considerations; compare CTAP2 canonical CBOR): NaN always 0xF97E00
Indefinite-length items are rejected — they raise NotImplementedError on decode. CBOR sequences via CBOR.stream provide the equivalent capability with bounded memory usage.
The official RFC 8949 Appendix A test vectors are included in test-vectors/ and run as part of rake test.
Relative numbers, 100k iterations, -O3 -march=native:
| Operation | Canonical | Fast | Notes |
|---|---|---|---|
| Encode small map | 1× | ~1.4× faster | Typical actor message |
| Encode nested structure | 1× | ~1.3× faster | Maps + arrays |
| Encode int array [100] | 1× | ~0.9× (slower) | Fixed-width integers = more bytes |
| Decode small map | 1× | ~1.3× faster | |
| Decode nested structure | 1× | ~1.2× faster | |
| Decode int array [100] | 1× | ~1.1× faster | Fixed-width reads |
vs. simdjson on-demand (lazy decoding selective access):
- ~1.77× faster on i3
- ~3.4× faster on Ryzen 9
The structural advantage comes from CBOR’s length-prefixed format enabling genuine byte-skipping vs simdjson’s required byte scanning.
vs. msgpack (encoding):
- ~30% faster on typical structured payloads
Default limits depend on mruby profile:
MRB_HIGH_PROFILE → CBOR_MAX_DEPTH = 128
MRB_MAIN_PROFILE → CBOR_MAX_DEPTH = 64
MRB_BASELINE_PROFILE → CBOR_MAX_DEPTH = 32
Other / constrained → CBOR_MAX_DEPTH = 16
Override at build time by defining CBOR_MAX_DEPTH in your build config:
conf.cc.defines << 'CBOR_MAX_DEPTH=256'
Exceeding the limit raises RuntimeError: "CBOR nesting depth exceeded". The same limit applies to encode, decode, lazy value, lazy [], and tag chains.