fix(flink): route dead-letter via yield (PyFlink has no ctx.output) — R4 follow-up#124
Merged
Merged
Conversation
…tput ValidateAndEnrich routed invalid events with `ctx.output(DEAD_LETTER_TAG, …)` — the Java side-output API. PyFlink's ProcessFunction context (InternalProcessFunctionContext) has no `.output()`, so every invalid event raised `AttributeError` and the dead-letter path was broken (documented in docs/perf/freshness-realpath-2026-06-30.md from the #122 real-path run; valid events were unaffected because they never hit ctx.output). PyFlink emits side outputs by *yielding* `(OutputTag, value)`; the framework distinguishes main from side output by the first tuple element's type, so the main `(event_id, payload)` yield stays unambiguous. All four dead-letter emits (parse / cdc-normalization / schema / semantic failures) now yield the tag. Verified end-to-end on the live Flink cluster (pyflink 2.2.1, real Kafka→Flink→Kafka path on the Mac, job RUNNING): a malformed-JSON event lands on events.deadletter with stage=parse, a schema-invalid event lands with stage=schema_validation, and a schema-valid event still flows to events.validated (main output intact). Before the fix the dead-letter topic received nothing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DORA Metrics
|
…ut fake The stream_processor unit tests routed invalid events through a `_FakeProcessContext.output()` stub and asserted on `ctx.outputs` — but real pyflink has no `ctx.output()`, so that fake masked the very AttributeError the yield fix addresses (the dead-letter path looked tested while it was broken on the cluster). The fake now omits `output` entirely (a regression to ctx.output fails loudly), and the three DLQ tests assert the dead-letter is *yielded* as `(DEAD_LETTER_TAG, payload)`; the two valid-path tests assert no DLQ tuple is emitted. 20 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
brownjuly2003-code
added a commit
that referenced
this pull request
Jun 30, 2026
…eak + 1 LOW DDL race) (#125) * fix(security): mask PII renamed above an inner SELECT * (D2 #123 residual) The #123 D2 fix resolves projection lineage so a PII column renamed through a subquery/CTE is masked by what it is built from. But when an inner `SELECT *` sits *below* the rename — e.g. `SELECT c FROM (SELECT email AS c FROM (SELECT * FROM users_enriched) z) t` — sqlglot.lineage walks past the renamed `email` node to the bare `*` leaf and returns a plain `frozenset({'*'})`. That is NOT the `_UnresolvedSources` sentinel (which only fires on a lineage *exception*), so `email` is absent from the source set and the column fails **open** as cleartext with no X-PII-Masked signal; the shallow scan sees only the outer alias `c`. The #123 deep|shallow union only closes a star one level *above* the rename. A `*` lineage leaf means the column could carry any source column of that table, including PII, so treat it as unresolved and fail closed (mask) — the same policy the module already applies on a lineage exception. Regression test (subquery and CTE forms) fails on old code (cleartext, was masked=False) and passes on new; the existing masking/property/mutation suites stay green. Independently reproduced against live DuckDB before and after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(api): lock the webhook delivery-queue lazy DDL (#123 residual) The #123 lock fix serialized the three offloaded read-handler `ensure_*_table` helpers behind `catalog_ddl_lock`, but missed `ensure_webhook_delivery_queue_ table`: the dispatcher runs its lazy `CREATE TABLE IF NOT EXISTS webhook_ delivery_queue` on the shared serving connection from the event loop, while an offloaded read handler runs its own (now-locked) `ensure_*` on a worker thread. Concurrent catalog DDL on one cold DuckDB raises a "Catalog write-write conflict" across *different* tables too, so the cross-table 500-on-cold-restart the #123 fix set out to remove was still reachable through this unlocked site. Wrap its CREATE in the same shared `catalog_ddl_lock` as its three siblings (internal-wrap pattern; the two callers do not hold the lock, so no nesting). Regression: add `ensure_webhook_delivery_queue_table` to the concurrency harness's ensurer set so both the 32-thread same-table and the cross-table Barrier hammers cover it — both fail on old code (verified: 23+ and 11+ conflicts on the queue table) and pass with the lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: JuliaEdom <uedomskikh@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ValidateAndEnrich.process_elementrouted invalid events withctx.output(DEAD_LETTER_TAG, …)— the Java side-output API. PyFlink'sProcessFunctioncontext (InternalProcessFunctionContext) has no.output(),so every invalid event raised
AttributeErrorand the dead-letter path wassilently broken. Valid events were unaffected (they never hit
ctx.output),which is why it went unnoticed until the #122 real-path run surfaced it
(
docs/perf/freshness-realpath-2026-06-30.md, "Bugs found", item 3).Fix
PyFlink emits side outputs by yielding
(OutputTag, value). The frameworktells main from side output by the first tuple element's type, so the main
(event_id, payload)yield stays unambiguous. All four dead-letter emits(parse / cdc-normalization / schema / semantic failures) now
yieldthe tag.Verification — end-to-end on the live Flink cluster
Built and ran the real stack (
docker-compose.yml+docker-compose.flink.yml,PyFlink 2.2.1, Kafka→Flink→Kafka) on the Mac; job reached RUNNING, then
produced three events to
orders.raw:order.createdevents.validatedevents.deadletter(stage=parse)events.deadletter(stage=schema_validation)Before the fix the dead-letter topic received nothing. (
flink-smokeCI alsoexercises a live job submission.)
This closes the documented R4 follow-up;
road-to-9.8.mdR4 is otherwise ✅.🤖 Generated with Claude Code