Skip to content

SdkError records the ambient sys.exc_info() over an explicitly passed error= cause #66

@OmarAlJarrah

Description

@OmarAlJarrah

Current design

SdkError.__init__ (packages/dexpace-sdk-core/src/dexpace/sdk/core/errors/base.py:51-58) records the cause two different ways and lets them disagree:

self.inner_exception = error
exc_info = sys.exc_info()
self.exc_type = exc_info[0] or (type(error) if error is not None else None)
self.exc_value = exc_info[1] or error
self.exc_traceback = exc_info[2] or (error.__traceback__ if error is not None else None)

inner_exception always holds the explicitly passed error=. But the exc_type / exc_value / exc_traceback triple prefers whatever sys.exc_info() returns at construction time and only falls back to error when the ambient tuple is empty. So inside any active except block, the in-flight exception wins over the argument the caller deliberately supplied.

A quick reproduction:

class A(Exception): ...
class B(Exception): ...

try:
    raise A("unrelated in-flight error")
except A:
    e = SdkError("wrap", error=B("the real cause"))

e.inner_exception   # B('the real cause')   <- the explicit cause
e.exc_value         # A('unrelated ...')     <- the ambient one wins
e.exc_type          # <class 'A'>
e.__cause__         # None                    <- SdkError sets no chaining itself

The result: a single SdkError instance carries two parallel, potentially-contradicting views of "what caused this" (inner_exception vs the exc_* triple), and the triple can be attributed to an exception that has nothing to do with the failure the caller is wrapping.

Trade-off / concern

Two things bother me about this shape:

  1. The precedence is backwards from least-surprise. When a caller goes out of their way to pass error=specific_cause, that is the strongest possible signal of intent. Having it lose to whatever happens to be on the stack two frames up makes the recorded exc_* triple non-deterministic with respect to the call site — the same SdkError(..., error=x) records different causes depending on whether some unrelated except block is active up-stack. Post-mortem tooling that reads exc_traceback can then attribute a failure to an unrelated in-flight exception.

  2. Two sources of truth that we never reconcile. inner_exception and exc_value are both "the cause," and we don't set self.__cause__ at all, so Python's own chaining (__cause__ / __traceback__) is a third view. Today the 60-odd internal raise sites all use the raise SomeError(str(err), error=err) from err pattern, where the caught exception, the error= argument, and the from target all coincide, so the divergence never shows up in our own code. That is exactly what makes it a quiet trap: it is invisible until a consumer (or future SDK code) wraps a deliberately-selected cause that differs from the ambient one.

Proposed direction

Invert the precedence so the explicit argument wins, keeping the auto-capture only as a fallback for the bare raise SdkError(...) convenience:

self.inner_exception = error
exc_info = sys.exc_info()
self.exc_value = error or exc_info[1]
self.exc_type = type(error) if error is not None else exc_info[0]
self.exc_traceback = (error.__traceback__ if error is not None else None) or exc_info[2]

Going a step further, it is worth asking whether the parallel exc_* triple should exist at all. We could set self.__cause__ = error in __init__ (or rely on raise ... from error at the raise sites, which we already do everywhere) and let Python's standard chaining and traceback machinery be the single source of truth, deriving any exc_*-style accessors from __cause__ / __traceback__ rather than snapshotting a separate triple that can drift. That keeps tracebacks, __cause__, and our attributes in agreement by construction.

Trade-offs: flipping the precedence is a behavioral change for any caller relying on ambient-wins — but that reliance is itself fragile and is not documented as intentional. Collapsing onto __cause__ is a larger API change (the public exc_type / exc_value / exc_traceback attributes are part of the documented surface in docs/errors.md), so it would want a deprecation path rather than an outright removal.

Acknowledging the current rationale

docs/errors.md:30 states the intent: "SdkError captures sys.exc_info() at construction time, preserving the original cause even when the SDK re-wraps a stdlib exception," and the exc_* triple deliberately mirrors azure-core's AzureError, which captures the ambient exception so that raise SomeError(message) inside an except block still records the cause without the caller threading it through. That auto-capture convenience is genuinely useful and I do not want to lose it. The point of this issue is narrower: the docs describe capturing the cause but say nothing about the precedence when both an explicit error= and an ambient exception are present, and the current ordering silently overrides the explicit one. Even keeping the auto-capture, making the explicit argument win (and ideally reconciling with __cause__) seems strictly closer to least-surprise. Does anyone recall a reason the ambient tuple should outrank an explicitly supplied cause?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestquestionFurther information is requested

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions