Internals

This article covers the parts of mruby-cbor most users never need to touch: the encoding algorithms, the lazy decoder’s architecture, the sharedref bookkeeping, the fast wire format, the determinism guarantees, and how performance is tuned.

If you only want to encode and decode CBOR, the README has everything you need. Come here when you want to know how it works — for security review, performance work, or porting a feature to another runtime.

Float Encoding (Preferred Serialization)
BigInt Encoding (Tag 2/3)
Shared References (Tag 28/29)
Lazy Decoding Architecture
Fast Wire Format
Symbol Encoding Strategies
String Encoding (UTF-8 vs Binary)
Depth Tracking & Stack Overflow Prevention
Memory Layout: SBO + Heap
File Streaming: Adaptive Readahead
Determinism Guarantees
RFC 8949 Compliance
Performance & Benchmarks
Recursion Depth Tuning

Float Encoding (Preferred Serialization)

Floats use the smallest CBOR float width that represents them losslessly:

CBOR.encode(0.0).bytesize      # => 3 (f16)
CBOR.encode(1.0).bytesize      # => 3 (f16)
CBOR.encode(1.5).bytesize      # => 3 (f16)
CBOR.encode(1.0e10).bytesize   # => 5 (f32)
CBOR.encode(3.14).bytesize     # => 9 (f64)

The selection is pure bit-pattern arithmetic with zero floating-point operations — important for determinism and for MRB_NO_FLOAT builds.

1. Extract sign, exponent, and mantissa from the f64 bit pattern.
2. If NaN: emit the canonical f16 NaN 0xF97E00 and stop.
3. If the value fits in f32 (low 29 mantissa bits = 0, exponent in range): go on to the f16 check. Otherwise skip to step 6.
4. If it also fits in f16 (normal or subnormal range checks pass): emit f16.
5. Otherwise, if it fits in f32: emit f32.
6. Otherwise: emit f64.

Value	Encoding	Rationale
NaN (any payload)	f16 `0x7E00`	Canonical NaN per RFC 8949 §4.2.2
±Inf, ±0	f16	Always representable
f16 normal	f16 if `exp32 ∈ [113..142]` and low 13 mant bits = 0	Lossless in f16
f16 subnormal	f16 if `exp32 ∈ [103..112]` and shift is exact	Lossless in f16
Fits in f32	f32 if low 29 f64 mant bits = 0 and exp in range	Lossless in f32
Everything else	f64	Need full precision
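
The selection steps and range checks above can be sketched in plain Ruby. This is an illustration under assumptions, not the gem's C code: `preferred_float_width` is a hypothetical helper, it targets CRuby (it uses `Array#pack`), and it simply reports f64 for values that f32 could only hold as a subnormal.

```ruby
# Hypothetical helper: picks :f16, :f32 or :f64 for a Float using only
# integer operations on the IEEE 754 binary64 bit pattern.
def preferred_float_width(x)
  bits = [x].pack('G').unpack1('Q>')   # 64-bit big-endian bit pattern
  exp  = (bits >> 52) & 0x7FF          # biased f64 exponent
  mant = bits & ((1 << 52) - 1)        # 52-bit mantissa

  return :f16 if exp == 0x7FF          # NaN (canonicalized) or +/-Inf
  return :f16 if exp.zero? && mant.zero? # +/-0
  return :f64 if exp.zero?             # f64 subnormal: far below f32 range
  return :f64 unless (mant & ((1 << 29) - 1)).zero? # would lose bits in f32

  exp32 = exp - 1023 + 127             # rebias to the f32 exponent
  return :f64 unless (1..254).cover?(exp32) # assumption: skip f32 subnormals

  mant32 = mant >> 29                  # 23-bit f32 mantissa
  if (113..142).cover?(exp32)
    # f16 normal range: the low 13 mantissa bits must be zero.
    return :f16 if (mant32 & 0x1FFF).zero?
  elsif (103..112).cover?(exp32)
    # f16 subnormal range: the 24-bit significand must shift right exactly.
    shift = 126 - exp32
    significand = (1 << 23) | mant32
    return :f16 if (significand & ((1 << shift) - 1)).zero?
  end
  :f32
end
```

With this sketch, 1.5 lands on f16, 1.0e10 on f32, and 3.14 on f64, matching the byte sizes listed at the top of this section.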

MRB_USE_FLOAT32 builds: start at f32 and try f16, skipping f64 entirely (mruby’s mrb_float is f32 so f64 wouldn’t round-trip).

Fast path floats (encode_fast): always emit at full native width with no bit-pattern analysis — faster but larger on the wire.

BigInt Encoding (Tag 2/3)

Per RFC 8949 §3.4.3, bignums outside int64 range encode as:

Tag 2: Positive bigint as byte string (big-endian)
Tag 3: Negative bigint as byte string (|magnitude| - 1, big-endian)

Example: (1 << 200) + 1 → Tag 2 wrapping a 26-byte byte string.

Zero-length payloads are handled per the RFC edge case: tag(2, h'') = 0, tag(3, h'') = -1.

The negation of large bignums is computed in pure integer arithmetic — no floating-point operations, no overflow surprises. This matters: it’s one of the things that makes encoding deterministic across builds.
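
A minimal Ruby sketch of the Tag 2/3 layout described above. `cbor_head` and `encode_bignum` are hypothetical names, and unlike the real encoder (which only uses tags for values outside the native integer range) this sketch tags every integer so the zero-length edge case is visible.

```ruby
# Hypothetical helper: shortest-form CBOR head for a major type and argument.
def cbor_head(major, n)
  ib = major << 5
  if n < 24 then [ib | n].pack('C')
  elsif n < 0x100 then [ib | 24, n].pack('CC')
  elsif n < 0x1_0000 then [ib | 25, n].pack('Cn')
  elsif n < 0x1_0000_0000 then [ib | 26, n].pack('CN')
  else [ib | 27, n].pack('CQ>')
  end
end

# Encodes any Integer as a tagged bignum, using integer arithmetic only.
def encode_bignum(value)
  # Tag 3 carries n = -1 - value, which equals |value| - 1.
  tag, n = value >= 0 ? [2, value] : [3, -1 - value]
  bytes = []
  while n > 0
    bytes.unshift(n & 0xFF)   # build the big-endian magnitude
    n >>= 8
  end
  payload = bytes.pack('C*')  # empty for 0 (tag 2) and -1 (tag 3)
  cbor_head(6, tag) + cbor_head(2, payload.bytesize) + payload
end
```

For `(1 << 200) + 1` the payload is 26 bytes: a leading 0x01, 24 zero bytes, and a trailing 0x01.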

Shared References (Tag 28/29)

The shared-reference algorithm preserves object identity across encode/decode round-trips and lets you encode cyclic structures. Enable it with sharedrefs: true.

Encoding is two-pass

You cannot decide whether an object is shared until you’ve seen the whole graph. An object referenced once should encode inline with no tag, while an object referenced two or more times needs a Tag 28 marker on its first occurrence and a Tag 29 back-reference on every later one. That reference count is only known after a full traversal, so encoding runs in two passes:

Pass 1 (count):
  Walk the object graph. For each non-immediate value, increment
  counts[mrb_obj_id]. Recurse into a value's children only on its first
  encounter. Cache any proc-tag / _before_encode transform results so
  Pass 2 doesn't re-invoke them.

Pass 2 (emit):
  counts[id] < 2            → encode inline, no tag (value is unique).
  counts[id] >= 2, first    → assign the next slot, emit Tag 28, then
                              encode the value bytes inline.
  counts[id] >= 2, again    → emit Tag 29 + slot index, skip the value.

Slot numbers are assigned lazily in Pass 2, in Tag 28 wire-appearance order. Single-occurrence objects never consume a slot.
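
The two passes can be sketched in Ruby over a symbolic output instead of real bytes. Everything here is illustrative: `SharedTag`, `count_refs`, `emit`, and `encode_shared` are hypothetical names, `object_id` stands in for `mrb_obj_id`, only arrays, hashes, and strings are treated as shareable, and transform caching is omitted.

```ruby
SharedTag = Struct.new(:number, :content) # symbolic stand-in for a CBOR tag

def shareable?(obj)
  obj.is_a?(Array) || obj.is_a?(Hash) || obj.is_a?(String)
end

# Pass 1: count occurrences by identity, recursing only on first encounter.
def count_refs(obj, counts)
  return unless shareable?(obj)
  counts[obj.object_id] += 1
  return if counts[obj.object_id] > 1
  case obj
  when Array then obj.each { |e| count_refs(e, counts) }
  when Hash  then obj.each_value { |v| count_refs(v, counts) } # keys not shared
  end
end

# Pass 2: tag only values seen twice or more; slots are assigned lazily.
def emit(obj, counts, slots)
  if shareable?(obj) && counts[obj.object_id] >= 2
    id = obj.object_id
    return SharedTag.new(29, slots[id]) if slots.key?(id)
    slots[id] = slots.size # reserve the slot before encoding children
    return SharedTag.new(28, emit_inline(obj, counts, slots))
  end
  emit_inline(obj, counts, slots)
end

def emit_inline(obj, counts, slots)
  case obj
  when Array then obj.map { |e| emit(e, counts, slots) }
  when Hash  then obj.transform_values { |v| emit(v, counts, slots) }
  else obj
  end
end

def encode_shared(root)
  counts = Hash.new(0)
  count_refs(root, counts)
  emit(root, counts, {})
end
```

Because the slot is reserved before the children are emitted, a self-referencing array comes out as a Tag 28 whose only element is a Tag 29 pointing at slot 0.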

Decoding

Tag 28 (Shareable):
  Compute the payload's byte offset. If that offset is already in the
  offset_map, reuse its slot (makes registration idempotent across
  re-traversals — e.g. lazy.value then lazy[i].value). Otherwise append a
  new slot and register offset -> slot. For container payloads, pre-fill the
  slot with the empty container *before* decoding its children, so Tag 29
  references inside the payload resolve to the partially constructed object
  (this is what makes cycles work).

Tag 29 (Shared ref):
  Read the index, add CBOR_SHAREDREFS_INDEX_BIAS, fetch from the sharedrefs
  array, return that exact object. Raw Lazy placeholders living in a slot are
  upgraded to full Lazies on the way out.
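
A rough Ruby sketch of the decode side, again over a symbolic tree rather than bytes. `WireTag`, `decode_node`, and `INDEX_BIAS` are hypothetical stand-ins; the offset_map, the HWM check, and Lazy placeholders are left out, so this only shows slot reservation, the index bias, and the pre-fill that makes cycles resolve.

```ruby
WireTag = Struct.new(:number, :content) # symbolic stand-in for a CBOR tag
INDEX_BIAS = 2 # assumption: mirrors CBOR_SHAREDREFS_INDEX_BIAS

def decode_node(node, refs)
  case node
  when WireTag then decode_tag(node, refs)
  when Array, Hash then fill(node.class.new, node, refs)
  else node
  end
end

def decode_tag(tag, refs)
  case tag.number
  when 28
    payload = tag.content
    if payload.is_a?(Array) || payload.is_a?(Hash)
      target = payload.class.new
      refs << target              # pre-fill the slot with the empty container
      fill(target, payload, refs) # children may refer back to it via Tag 29
    else
      slot = refs.size
      refs << nil                 # reserve the slot at wire-order position
      refs[slot] = decode_node(payload, refs)
    end
  when 29
    refs.fetch(tag.content + INDEX_BIAS) # return that exact object
  else
    raise ArgumentError, 'sketch handles tags 28 and 29 only'
  end
end

def fill(target, source, refs)
  if source.is_a?(Array)
    source.each { |e| target << decode_node(e, refs) }
  else
    source.each { |k, v| target[k] = decode_node(v, refs) }
  end
  target
end
```

The `refs` array is expected to start with two bookkeeping entries (for example `[0, nil]`), so the first shareable value lands at index 2.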

The sharedrefs array layout

sharedrefs is a plain Ruby Array carrying both bookkeeping and the shareable values:

Index	Contents
`[0]`	HWM — highest payload offset registered so far. A cheap “already registered?” check during monotonic forward traversal (`skip_cbor` / lazy tag resolution), avoiding a hash lookup per Tag 28.
`[1]`	offset_map (Hash) — `payload_offset → slot`. Used to find the slot for a known offset (re-decode detection, placeholder routing). Created lazily on first registration, so documents with no Tag 28s pay nothing.
`[N + 2 ..]`	the shareable values themselves; encoder slot `N` lives at `sharedrefs[N + CBOR_SHAREDREFS_INDEX_BIAS]`, where the bias is `2`.

Key invariant: encoder slot order and decoder slot order stay aligned because each Tag 28 reserves exactly one slot at its wire-order position, and the offset_map keeps that assignment idempotent across repeated traversals of the same bytes. This is what makes decoded["x"].equal?(decoded["y"]) and full[0].equal?(lazy[0].value) both hold.

Hash keys do not participate in sharing. mruby creates distinct copies for hash keys regardless, so trying to share them would just clutter the slot space.

Lazy + sharedrefs: the same sharedrefs array (with its offset_map) is threaded through every Lazy spawned from the same root buffer. Tag 28 sites encountered while skipping install a raw Lazy placeholder into their slot, initialized on first real access — so navigating across shared boundaries (lazy["a"]["x"] vs lazy["b"]["x"]) can resolve to the same wrapper.

Lazy Decoding Architecture

A Lazy object is a (buffer, offset) handle plus a key-cache and value-cache, carrying two state flags:

evaluating — set while value is in progress, for cycle detection.
initialized — FALSE for a raw Lazy (only the buffer ivar is set), TRUE for a full Lazy (vcache / kcache / sharedrefs ivars set). Raw Lazies exist only as sharedref slot placeholders and are upgraded to full on first real access.

1. cbor_lazy_new: wrap buffer + offset, caches empty.
2. lazy["key"]: seek to offset, decode just enough to find the key,
   spawn a new Lazy at the value's offset, cache it under the key.
3. lazy.value: decode from the current offset, return a fully-realized
   value, cache it in vcache.
4. Repeated access: return the cached object — O(1).

The buffer is never copied. Each Lazy is just a different offset into the same underlying string. Strings extracted via mrb_str_byte_subseq are read-only views over the buffer too — perfect for partial extraction from large files.
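
To make the (buffer, offset) idea concrete, here is a small Ruby sketch. `LazySketch`, `read_head`, and `skip_item` are hypothetical names, not the gem's API; the sketch handles definite-length items only, looks up text keys only, and materializes scalars only.

```ruby
# Reads the CBOR head at pos; returns [major, argument, offset_after_head].
def read_head(buf, pos)
  initial = buf.getbyte(pos)
  major = initial >> 5
  info = initial & 0x1F
  case info
  when 0..23 then [major, info, pos + 1]
  when 24 then [major, buf.getbyte(pos + 1), pos + 2]
  when 25 then [major, buf.byteslice(pos + 1, 2).unpack1('n'), pos + 3]
  when 26 then [major, buf.byteslice(pos + 1, 4).unpack1('N'), pos + 5]
  when 27 then [major, buf.byteslice(pos + 1, 8).unpack1('Q>'), pos + 9]
  else raise ArgumentError, "unsupported additional info #{info}"
  end
end

# Returns the offset just past the item at pos without building any value.
def skip_item(buf, pos)
  major, arg, pos = read_head(buf, pos)
  case major
  when 2, 3 then pos += arg
  when 4 then arg.times { pos = skip_item(buf, pos) }
  when 5 then (arg * 2).times { pos = skip_item(buf, pos) }
  when 6 then pos = skip_item(buf, pos)
  end
  pos
end

class LazySketch
  def initialize(buf, offset = 0)
    @buf = buf
    @offset = offset
    @kcache = {}
  end

  # Scans the map for a text key and returns a child handle at the value's offset.
  def [](key)
    return @kcache[key] if @kcache.key?(key)
    major, count, pos = read_head(@buf, @offset)
    raise TypeError, 'not a map' unless major == 5
    count.times do
      kmajor, klen, kpos = read_head(@buf, pos)
      raise TypeError, 'sketch supports text keys only' unless kmajor == 3
      vpos = kpos + klen
      if @buf.byteslice(kpos, klen).b == key.b
        return @kcache[key] = LazySketch.new(@buf, vpos)
      end
      pos = skip_item(@buf, vpos) # skip the value without decoding it
    end
    raise KeyError, "key not found: #{key}"
  end

  # Materializes the item at this offset (scalars only) and caches it.
  def value
    return @vcache if defined?(@vcache)
    major, arg, pos = read_head(@buf, @offset)
    @vcache = case major
              when 0 then arg
              when 1 then -1 - arg
              when 2 then @buf.byteslice(pos, arg)
              when 3 then @buf.byteslice(pos, arg).force_encoding(Encoding::UTF_8)
              else raise NotImplementedError, 'sketch materializes scalars only'
              end
  end
end
```

Every child handle shares the parent's buffer and differs only in its offset, and a repeated `lazy["b"]` returns the cached handle rather than rescanning the map.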

CBOR::Path shares the cache: path queries call into the same cbor_lazy_aref that lazy[key] uses, so the second walk over a document is essentially free.

Cycle detection during materialization: the evaluating flag is set while value is in-progress. A re-entry while still evaluating raises RuntimeError: CBOR shared reference cycle detected instead of stack-overflowing.

Fast Wire Format

The fast path (encode_fast / decode_fast) trades wire portability for speed by making integer and float widths fixed at compile time:

For each value:
  integer  → fixed-width (MRB_INT_BIT / 8 bytes), major 0 or 1
  float    → fixed-width (sizeof(mrb_float) bytes), 0xFA or 0xFB
  string   → canonical length prefix + bytes, no UTF-8 check, always major 2
  array    → canonical length prefix + fast-encoded elements
  map      → canonical length prefix + fast-encoded pairs
  symbol   → tag 39 hybrid: presym → uint payload, runtime sym → string payload
  class    → tag 49999 + canonical length + name bytes
  other    → fall back to canonical encode_value

Only scalars are fixed-width. Structural lengths remain shortest-form so container overhead is identical to canonical.

Wire width is determined entirely at compile time:

Build setting	Integer wire	Float wire
`MRB_INT_BIT == 16`	uint16 (info=25, 3 bytes)	—
`MRB_INT_BIT == 32`	uint32 (info=26, 5 bytes)	—
`MRB_INT_BIT == 64`	uint64 (info=27, 9 bytes)	—
`MRB_USE_FLOAT32`	—	f32 (5 bytes)
default	—	f64 (9 bytes)
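
The size difference between the two integer encodings is easy to see in a short Ruby sketch. `fast_int` and `canonical_int` are hypothetical helpers written for this page; `int_bit` plays the role of `MRB_INT_BIT`, and no range checking is done on the argument.

```ruby
# int_bit => [additional info, pack directive] for the fixed-width argument.
FAST_INT_LAYOUT = { 16 => [25, 'n'], 32 => [26, 'N'], 64 => [27, 'Q>'] }.freeze

# Fixed-width integer: the width depends only on int_bit, never on the value.
def fast_int(n, int_bit = 64)
  info, directive = FAST_INT_LAYOUT.fetch(int_bit)
  major, arg = n >= 0 ? [0, n] : [1, -1 - n]
  [(major << 5) | info, arg].pack("C#{directive}")
end

# Canonical integer: the shortest head that can carry the argument.
def canonical_int(n)
  major, arg = n >= 0 ? [0, n] : [1, -1 - n]
  ib = major << 5
  if arg < 24 then [ib | arg].pack('C')
  elsif arg < 0x100 then [ib | 24, arg].pack('CC')
  elsif arg < 0x1_0000 then [ib | 25, arg].pack('Cn')
  elsif arg < 0x1_0000_0000 then [ib | 26, arg].pack('CN')
  else [ib | 27, arg].pack('CQ>')
  end
end
```

Under these assumptions `fast_int(1)` is 9 bytes while `canonical_int(1)` is a single byte, and the two only agree when the value happens to need the full native width.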

Symbols on the fast path use the same hybrid as canonical strategy 2 (encode_sym_hybrid_fast): presyms emit as a uint payload, runtime symbols as a string payload. The presym leg is therefore not portable across builds — see Symbol Encoding Strategies.

Fallback for unsupported types: registered tags, bigints, UnhandledTag, and proc-tag types fall through to the canonical encoder transparently. The fast path never raises on an unsupported type — it just gets the canonical bytes for those values.

Trade-off: fixed-width integers produce larger wire output for small values (e.g. 1 encodes as 9 bytes on a 64-bit build instead of 1). For integer-heavy payloads with lots of small numbers, the canonical encoder is actually faster due to lower memcpy volume. The fast path wins on rich structured messages with string keys and mixed scalar values — the typical actor-message shape.

Symbol Encoding Strategies

Mode	Wire format	Interop
`no_symbols`	plain string	universal, loses the symbol type
`symbols_as_string`	Tag 39 + string	RFC 8949 compatible, fully portable
`symbols_as_uint32` (hybrid)	Tag 39 + uint (presym) / string (runtime sym)	runtime leg portable; presym leg same-build mruby only
`encode_fast` (any mode)	same hybrid as strategy 2	runtime leg portable; presym leg same-build mruby only

Despite the historical name, symbols_as_uint32 no longer always emits a uint — it picks per symbol:

Presym (compile-time symbol, sym <= MRB_PRESYM_MAX; valid IDs are 1 .. MRB_PRESYM_MAX inclusive) → uint payload carrying the sym ID directly. Three bytes total for IDs < 24: the two-byte Tag 39 head (0xD8 0x27) plus a one-byte uint. Cheapest path.
Runtime sym ("foo_#{i}".to_sym, anything > MRB_PRESYM_MAX) → string payload. Always portable, regardless of build.

The decoder accepts either payload under Tag 39 and dispatches by type (sym_from_tag39_payload, shared by canonical strategy 2 and the fast decoder). An out-of-range presym ID raises RangeError; a payload whose type doesn’t match the active strategy raises TypeError.

Why the presym leg is non-portable: symbol ID 42 on your mruby might be ID 100 on another mruby built with different presym settings. IDs are assigned at compile time based on which symbols the build actually references — two binaries from different source trees get different assignments even if they share most symbols. This is why the presym leg is restricted to internal mruby-to-mruby IPC where you control both ends. Runtime symbols travel as strings and so always survive a build mismatch; if you don’t control both ends, use symbols_as_string.

Requires presym; on MRB_NO_PRESYM builds, symbols_as_uint32 raises NotImplementedError.
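
The hybrid wire shape can be illustrated in Ruby with a made-up presym table. `PRESYM_IDS_SKETCH` and its IDs are invented for illustration (real IDs come from the mruby build), `encode_sym_hybrid` is a hypothetical name, the runtime leg is assumed to use a text string payload, and only one-byte heads are handled.

```ruby
# Hypothetical presym table: these IDs are invented for this example.
PRESYM_IDS_SKETCH = { each: 5, map: 17 }.freeze

def encode_sym_hybrid(sym)
  tag39 = [0xD8, 39].pack('CC') # Tag 39 needs a two-byte head (39 >= 24)
  if (id = PRESYM_IDS_SKETCH[sym])
    raise RangeError, 'sketch handles IDs below 24 only' unless id < 24
    tag39 + [id].pack('C')                      # presym leg: uint payload
  else
    name = sym.to_s.b
    raise ArgumentError, 'sketch handles short names only' unless name.bytesize < 24
    tag39 + [0x60 | name.bytesize].pack('C') + name # runtime leg: string payload
  end
end
```

A decoder built against a different presym table would read the uint leg as some other symbol, which is exactly the portability hazard described above; the string leg carries the name itself and has no such problem.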

String Encoding (UTF-8 vs Binary)

When encoding, strings are auto-classified:

CBOR.encode("hello")          # => major 3 (text string)
CBOR.encode("\x00\xFF\xFE".b) # => major 2 (byte string)

The encoder calls mrb_str_is_utf8 on each string and chooses major type 3 (text) for valid UTF-8, major type 2 (byte string) otherwise.
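
In CRuby terms, the classification amounts to a byte-level UTF-8 validity check. The helper below is a hypothetical sketch (`cbor_string_major` is not part of the gem) and assumes the check looks only at the bytes, not at the string's encoding label.

```ruby
# Returns 3 (text string) for valid UTF-8 bytes, 2 (byte string) otherwise.
def cbor_string_major(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding? ? 3 : 2
end
```

Under that assumption, a binary-labeled string whose bytes happen to be valid UTF-8 would still be classified as text.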

When decoding, text strings (major type 3) are validated as UTF-8 if mruby was compiled with MRB_UTF8_STRING. Without MRB_UTF8_STRING, validation is skipped — the strings still decode, they’re just not validated.

The fast path always emits major type 2 regardless of content. The fast wire format is tested only for mruby-to-mruby exchange, where the text/binary distinction doesn’t apply.

Depth Tracking & Stack Overflow Prevention

Each decode call increments a per-Reader depth counter. At CBOR_MAX_DEPTH, the decoder raises RuntimeError: "CBOR nesting depth exceeded". This prevents:

- Deeply nested arrays:  [[[[[...]]]]]
- Deeply nested maps:    {"a": {"a": {"a": ...}}}
- Circular references without sharedrefs (would loop forever)
- Tag chain bombs:       tag(N)(tag(N)(tag(N)(...)))

The encoder applies the same limit, so a cyclic structure passed without sharedrefs: true raises before consuming arbitrary memory.

CBOR::Lazy#value and CBOR::Lazy#[] both carry their own depth counters for the same reason — recursive sharedref resolution is bounded.
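
The guard itself is simple; a Ruby sketch of the idea follows. `MAX_DEPTH_SKETCH` and `checked_walk` are hypothetical, and whether the real limit triggers at or just past `CBOR_MAX_DEPTH` is not specified here, so treat the comparison as an assumption.

```ruby
MAX_DEPTH_SKETCH = 64 # stand-in for CBOR_MAX_DEPTH

# Walks a structure, raising once nesting goes past the limit.
def checked_walk(obj, depth = 0)
  raise RuntimeError, 'CBOR nesting depth exceeded' if depth > MAX_DEPTH_SKETCH
  case obj
  when Array then obj.each { |e| checked_walk(e, depth + 1) }
  when Hash  then obj.each_value { |v| checked_walk(v, depth + 1) }
  end
  true
end
```

A self-referencing array hits the same check as a genuinely deep document, which is how an encoder without sharedrefs can fail fast on a cycle instead of recursing until the stack is exhausted.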

Memory Layout: SBO + Heap

Encoding uses a small-buffer optimization (SBO) with a 16 KB stack buffer:

#define CBOR_SBO_STACK_CAP (16 * 1024)

Documents under 16 KB → stack only, no allocation.
Larger documents → spill to a heap-allocated mruby string, growing geometrically (next power of 2).
The final cbor_writer_finish returns either a fresh mrb_str_new of the stack contents, or the heap string with its length set in place — no unnecessary copies.

The heap string is GC-protected via mrb_gc_register while encoding is in progress and unregistered on finish.
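
As a rough illustration of the sizing rule only (the real writer grows incrementally as bytes arrive), the following Ruby sketch computes where a document of a given size would end up. `buffer_plan` is hypothetical, and treating a document of exactly 16 KB as stack-resident is an assumption.

```ruby
SBO_STACK_CAP_SKETCH = 16 * 1024 # mirrors CBOR_SBO_STACK_CAP

# Returns [:stack, capacity] or [:heap, capacity] for a final document size.
def buffer_plan(total_bytes)
  return [:stack, SBO_STACK_CAP_SKETCH] if total_bytes <= SBO_STACK_CAP_SKETCH
  [:heap, 1 << (total_bytes - 1).bit_length] # smallest power of 2 that fits
end
```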

File Streaming: Adaptive Readahead

CBOR.stream(file) uses adaptive readahead with a doubling strategy:

First read: 9 bytes, the largest possible CBOR head (1 initial byte plus an 8-byte argument), so the header of the next document always parses.
Use doc_end to find where the document ends by skipping its contents.
If the buffer contains the full document: yield it, move to the next.
If not complete: double the read size, re-read from the same offset, retry.
Continue doubling until the full document is buffered.
Then read exactly the remaining bytes needed (if any) to avoid over-reading.

This means streaming a file with mostly small messages and the occasional huge one stays cheap — readahead grows for the big ones and stays small for the rest.
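
The doubling loop can be sketched in Ruby against any seekable IO. All names here (`each_document`, `item_end`, `TruncatedDocument`) are hypothetical, `item_end` is a stand-in for doc_end, and the sketch omits the final "read exactly the remaining bytes" refinement: it just keeps doubling until the document fits.

```ruby
require 'stringio'

class TruncatedDocument < StandardError; end

# Returns the offset just past the item at pos, or raises TruncatedDocument
# when the buffer ends before the item does.
def item_end(buf, pos)
  initial = buf.getbyte(pos) or raise TruncatedDocument
  major = initial >> 5
  info = initial & 0x1F
  raise ArgumentError, 'indefinite lengths unsupported' if info > 27
  extra = { 24 => 1, 25 => 2, 26 => 4, 27 => 8 }.fetch(info, 0)
  raise TruncatedDocument if pos + 1 + extra > buf.bytesize
  arg = info < 24 ? info : buf.byteslice(pos + 1, extra).bytes.inject(0) { |a, b| (a << 8) | b }
  pos += 1 + extra
  case major
  when 2, 3 then pos += arg
  when 4 then arg.times { pos = item_end(buf, pos) }
  when 5 then (arg * 2).times { pos = item_end(buf, pos) }
  when 6 then pos = item_end(buf, pos)
  end
  raise TruncatedDocument if pos > buf.bytesize
  pos
end

# Yields each top-level document as a binary string.
def each_document(io)
  offset = 0
  loop do
    want = 9 # enough for any CBOR head
    loop do
      io.seek(offset)
      buf = io.read(want)
      return if buf.nil? || buf.empty? # clean end of stream
      begin
        len = item_end(buf, 0)
        yield buf.byteslice(0, len)
        offset += len
        break
      rescue TruncatedDocument
        raise EOFError, 'stream ends mid-document' if buf.bytesize < want
        want *= 2 # re-read from the same offset with a larger window
      end
    end
  end
end
```

Each new document starts again from the 9-byte window, so one large message does not inflate the reads for the small ones that follow it.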

Determinism Guarantees

This section answers: will the same input always produce the same output?

What IS deterministic

Integer encoding (any base). Fixnum encodes in CBOR’s shortest form (the argument sits in the initial byte or in the following 1, 2, 4, or 8 bytes); bignum converts to big-endian byte string with no lossy steps. Negative integers always use the -1 - n rule. Same value → same bytes, regardless of how the integer was constructed.
Float width selection. A float’s wire encoding depends solely on its bit-pattern, never on FP rounding. NaN always canonicalizes to 0xF97E00.
Hash field order. Encoding follows insertion order, which mruby guarantees. Reproducible across encodes and across machines.
Bignum negation. Computed in pure integer arithmetic — no rounding, no overflow.
Integer overflow. Explicit bounds checks raise RangeError consistently rather than wrapping around.

What is NOT deterministic across builds

Factor	Impact	Details
mruby build config	Symbols, float width range	`MRB_USE_FLOAT32`, presym settings affect encoding choices
Symbol IDs	Presym leg non-portable across mruby binaries	Use `symbols_as_string` for portable wire
`encode_fast` integer width	Non-portable across builds with different `MRB_INT_BIT`	Fast buffers must never cross build boundaries

Practical: same mruby binary + same input = same output, forever. For cross-machine reproducibility, use symbols_as_string and encode (not encode_fast).

RFC 8949 Compliance

This implementation strictly follows RFC 8949:

§3.1 (Major type 0, unsigned integers): shortest-form argument
§3.1 (Major type 1, negative integers): consistent -1 - n rule
§3.1 (Major type 2, byte strings): uninterpreted binary, no UTF-8 check
§3.1 (Major type 3, text strings): validated UTF-8 (with MRB_UTF8_STRING)
§3.1 (Major types 4 and 5, arrays & maps): definite length only (use CBOR.stream for indefinite-style framing)
§3.3 (Simple values): false, true, null only (no undefined)
§3.4.3 (Bignums, Tags 2 & 3): zero-length payload rule respected
§4.1 (Preferred serialization of floats): smallest lossless width (f16→f32→f64)
§4.2.2 (Additional deterministic encoding considerations): NaN always 0xF97E00
Tags 28 & 29 (value sharing; registered with IANA, not defined in RFC 8949 itself): identity-preserving shared references

Indefinite-length items are rejected — they raise NotImplementedError on decode. CBOR sequences via CBOR.stream provide the equivalent capability with bounded memory usage.

The official RFC 8949 Appendix A test vectors are included in test-vectors/ and run as part of rake test.

Performance & Benchmarks

Relative numbers, 100k iterations, -O3 -march=native:

Operation	Canonical	Fast	Notes
Encode small map	1×	~1.4× faster	Typical actor message
Encode nested structure	1×	~1.3× faster	Maps + arrays
Encode int array `[100]`	1×	~0.9× (slower)	Fixed-width integers = more bytes
Decode small map	1×	~1.3× faster
Decode nested structure	1×	~1.2× faster
Decode int array `[100]`	1×	~1.1× faster	Fixed-width reads

vs. simdjson on-demand (lazy decoding selective access):

~1.77× faster on i3
~3.4× faster on Ryzen 9

The structural advantage comes from CBOR’s length-prefixed format enabling genuine byte-skipping vs simdjson’s required byte scanning.

vs. msgpack (encoding):

~30% faster on typical structured payloads

Recursion Depth Tuning

Default limits depend on mruby profile:

MRB_HIGH_PROFILE       → CBOR_MAX_DEPTH = 128
MRB_MAIN_PROFILE       → CBOR_MAX_DEPTH = 64
MRB_BASELINE_PROFILE   → CBOR_MAX_DEPTH = 32
Other / constrained    → CBOR_MAX_DEPTH = 16

Override at build time by defining CBOR_MAX_DEPTH in your build config:

conf.cc.defines << 'CBOR_MAX_DEPTH=256'

Exceeding the limit raises RuntimeError: "CBOR nesting depth exceeded". The same limit applies to encode, decode, lazy value, lazy [], and tag chains.
