Make the typed-model codec pluggable behind the Serde seam, and dogfood it across the wire-touching subsystems #63

@OmarAlJarrah

Current design

serde/codec.py is a ~1000-line reflective interpreter that maps frozen dataclasses to and from plain documents. It walks the typing surface itself: _decode_value dispatches over Annotated, unions (Union / types.UnionType), parametrised generic dataclasses (with type-parameter substitution in _type_arg_map), containers, mappings, tuples, Tristate, enums, datetimes, and UUID; it carries a discriminated-union registry (@discriminated / @variant), a module-level _MODEL_CACHE, and a _MAX_DEPTH = 200 guard that converts a would-be RecursionError into a CodecError (codec.py:300-304).
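The depth guard mentioned above can be pictured with a minimal sketch. This is not the code at codec.py:300-304; `decode_nested`, `nest`, and the local `CodecError` class are hypothetical stand-ins, and only the limit of 200 is taken from the description.

```python
from typing import Any

MAX_DEPTH = 200  # same value as the _MAX_DEPTH quoted above


class CodecError(Exception):
    """Stand-in for the codec's error type (hypothetical definition)."""


def decode_nested(doc: Any, depth: int = 0) -> Any:
    """Walk a nested document, failing with CodecError once it is too deep."""
    if depth > MAX_DEPTH:
        # Fail early with a domain error instead of letting Python
        # eventually raise RecursionError somewhere inside the walk.
        raise CodecError(f"document nesting exceeds {MAX_DEPTH} levels")
    if isinstance(doc, list):
        return [decode_nested(item, depth + 1) for item in doc]
    if isinstance(doc, dict):
        return {key: decode_nested(value, depth + 1) for key, value in doc.items()}
    return doc


def nest(levels: int) -> Any:
    """Build a list nested `levels` deep, used only to exercise the guard."""
    doc: Any = 0
    for _ in range(levels):
        doc = [doc]
    return doc
```

With this shape, `decode_nested(nest(10))` round-trips the document, while `decode_nested(nest(300))` raises `CodecError` well before the interpreter's own recursion limit is reached.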

Two structural facts about how this sits in the codebase:

  • Codec is a concrete class, not a seam. The Serde / Serializer / Deserializer Protocols in serde/serde.py stop at document↔bytes. The document↔typed-model layer — the large, reflective part — is the concrete Codec class (codec.py:226), with no Protocol equivalent. So the part most likely to need swapping (the type engine) is the one part that is not abstracted.

  • Core never exercises the seam. A repo-wide search finds no non-test consumer of Serde / JSON_SERDE / Codec outside the serde package. The subsystems that actually touch the wire decode JSON inline instead: pagination/paginator.py:71, http/common/streaming.py:79 (JSONL), and http/webhooks/verification.py:216 all call json.loads directly.

The codec is also dataclass-only. _decode_atomic (codec.py:360-371) returns the raw document unchanged for any class target that is not a dataclass / enum / datetime / UUID, so pointing it at a TypedDict, NamedTuple, or attrs type yields silent passthrough rather than a decoded value or a clear error. The only decode knob is tolerate_unknown; there is no scalar coercion or validation (a JSON string under an int-typed field is returned unchanged).
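To make the passthrough concrete, here is a hypothetical reduction of the behaviour described above. It is not the body of `_decode_atomic`; `decode_atomic_sketch` and `Point` are invented for illustration, and the only thing carried over from the description is the final fallthrough that returns the document unchanged.

```python
import dataclasses
import enum
from datetime import datetime
from typing import Any, TypedDict
from uuid import UUID


class Point(TypedDict):
    """A consumer-side domain type that is not a dataclass."""

    x: int
    y: int


def decode_atomic_sketch(doc: Any, target: type) -> Any:
    """Hypothetical reduction: handle the known kinds, pass everything else through."""
    if dataclasses.is_dataclass(target):
        return target(**doc)
    if isinstance(target, type) and issubclass(target, enum.Enum):
        return target(doc)
    if target is datetime:
        return datetime.fromisoformat(doc)
    if target is UUID:
        return UUID(doc)
    # Any other class target: the raw document comes back untouched,
    # with no decoding, no coercion, and no error.
    return doc


raw = {"x": "1", "y": 2}
result = decode_atomic_sketch(raw, Point)
```

Here `result` is the very same `dict` object as `raw`, and `"1"` is still a string despite the `int` annotation. Likewise `decode_atomic_sketch("7", int)` hands back the string `"7"`.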

There is no docs/serde.md, and docs/architecture.md does not mention serde or the codec at all.

Trade-off / concern

A reflective codec signs up to track the entire evolving typing surface — PEP 695 generics, Annotated, the two union spellings, forward refs — indefinitely, and the file already carries that weight (the manual recursion guard, the generic-parameter substitution, the __type_params__ / __parameters__ fallback in _type_arg_map). That is a real maintenance and correctness surface to own forever in a no-runtime-deps core.

What makes it sharper is the internal inconsistency: the stated value of a format-agnostic Serde is "a single injection point ... easy to swap formats at the edge of the SDK" (serde.py:43-51), but the SDK's own edges don't go through it — they hardcode json.loads. (The paginator's hardcoded JSON decode is already on file as a defect; I cite it here only as evidence that the inline-decode pattern has bitten us, not as the concern itself.) So we pay the full cost of owning a reflective engine and a Serde abstraction, while the engine is non-pluggable, the abstraction is unused internally, and the whole thing is undocumented in the architecture. Exported-but-unused reflective machinery with full backward-compatibility obligations is an awkward position for long-term API stability.

The dataclass-only substrate compounds this: a consumer who points the codec at their own TypedDict / attrs domain types gets passthrough rather than a decoded result or a clear error.

Proposed direction

Treat the typed-model layer the way the SDK already treats transports — as a pluggable seam with a zero-dependency default and optional adapter distributions:

  • Define a narrow TypedCodec Protocol (encode(value) -> doc, decode(doc, target) -> T) and make the stdlib Codec one implementation of it. This keeps the no-deps guarantee for core while letting validation- or performance-sensitive consumers opt into a mature engine via optional packages (dexpace-sdk-serde-msgspec / -pydantic), exactly as dexpace-sdk-http-* adapt HttpClient.
  • Dogfood the seam: thread Serde / TypedCodec through pagination, JSONL streaming, and webhook decoding so the wire format is genuinely swappable rather than nominally swappable.
  • For the substrate gap, at minimum raise CodecError instead of silently passing through a non-dataclass class target; a registry of (predicate -> encoder/decoder) hooks would let consumers teach the codec about TypedDict / attrs / custom scalars without forking.
  • Add docs/serde.md, place serde/codec in the architecture diagram, and state a stability level for it.
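A rough sketch of the first three bullets, under stated assumptions: the `TypedCodec` method names follow the signature given above, but `WireFormat`, `JsonFormat`, `StdlibCodec`, `decode_lines`, and `Event` are hypothetical names, and the toy codec handles flat dataclasses only. It is meant to show the shape of the seam, not the real `Codec` or `Serde` API.

```python
import dataclasses
import json
from typing import Any, Protocol, TypeVar

T = TypeVar("T")


class CodecError(Exception):
    """Stand-in for the codec's error type (hypothetical definition)."""


class WireFormat(Protocol):
    """Hypothetical stand-in for the bytes -> document half of Serde."""

    def loads(self, data: bytes) -> Any: ...


class TypedCodec(Protocol):
    """Proposed seam for the document <-> typed-model layer."""

    def encode(self, value: Any) -> Any: ...

    def decode(self, doc: Any, target: type[T]) -> T: ...


class JsonFormat:
    def loads(self, data: bytes) -> Any:
        return json.loads(data)


class StdlibCodec:
    """Toy zero-dependency default: flat dataclasses only."""

    def encode(self, value: Any) -> Any:
        if dataclasses.is_dataclass(value) and not isinstance(value, type):
            return dataclasses.asdict(value)
        return value

    def decode(self, doc: Any, target: type[T]) -> T:
        if not dataclasses.is_dataclass(target):
            # Raise instead of silently passing the document through.
            raise CodecError(f"unsupported decode target: {target!r}")
        return target(**doc)


@dataclasses.dataclass(frozen=True)
class Event:
    id: str
    kind: str


def decode_lines(
    lines: list[bytes], fmt: WireFormat, codec: TypedCodec, target: type[T]
) -> list[T]:
    """A wire-touching helper that receives both seams instead of calling json.loads."""
    return [codec.decode(fmt.loads(line), target) for line in lines if line.strip()]


events = decode_lines(
    [b'{"id": "1", "kind": "created"}', b"", b'{"id": "2", "kind": "deleted"}'],
    JsonFormat(),
    StdlibCodec(),
    Event,
)
```

Because `decode_lines` depends only on the two Protocols, an adapter package could supply a different `TypedCodec` or wire format without the helper changing.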

Trade-off of the proposal: a TypedCodec Protocol plus a hook registry adds a layer of indirection and SPI surface, and adapter packages are more distributions to maintain. That cost is weighed against the alternative: owning a reflective type interpreter in core indefinitely, as the only substrate, while core itself never runs it.

Acknowledging the current rationale

This is a deliberate, documented choice, not an oversight. The codec's module docstring states it is "deliberately validation-free ... performs no schema checks or scalar coercion," and the CHANGELOG flags it as "the largest new surface and ... worth a careful read before depending on it," consistent with the project's stdlib-only, no-extra-runtime-deps stance (furl is the sole sanctioned dependency). The no-deps rule genuinely forecloses simply delegating to a mature library inside core.

Revisiting is still worthwhile because the proposal honours that rule — the stdlib Codec stays the default and core keeps zero new deps — while resolving the two tensions the current shape leaves open: the engine that most needs to be swappable is the one piece with no Protocol, and the Serde abstraction the project paid for is bypassed by the SDK's own wire-decoding paths. Even if optional adapters are deferred, extracting a TypedCodec Protocol and routing the internal decoders through it would align the codec with the project's established "Protocol for SPIs" convention and its own transport-pluggability pattern.

Metadata

Labels: enhancement, question
