
No shared contract between source output and sink input format #3174

@atharvalade

Description


The Postgres source wraps all row data in a DatabaseRecord envelope:

{
  "table_name": "tpch.lineitem",
  "operation_type": "SELECT",
  "timestamp": "2026-04-24T19:57:00Z",
  "data": {"id": 1, "l_orderkey": 123, ...},
  "old_data": null
}

The Iceberg sink's Arrow JSON reader expects flat JSON matching the target table schema:

{"id": 1, "l_orderkey": 123, ...}

These two formats are fundamentally incompatible, and there is no shared message schema spec or envelope definition between sources and sinks. Connecting the Postgres source to the Iceberg sink without the (newly added) flat_json_output = true flag therefore fails silently, in the sense that the envelope mismatch itself is never reported. Arrow finds no top-level id field (the values are nested inside the data object), so it reads id as null, and the write then fails on the non-nullable constraint.
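To make the failure mode concrete, here is a small Python simulation of a reader that maps top-level JSON keys onto target columns by name. It is not Arrow's implementation; TARGET_SCHEMA, map_to_columns and null_violations are hypothetical names, and only two lineitem columns are modeled.

```python
import json
from typing import Any

# Hypothetical target schema: column name -> nullable?
# Only two lineitem columns are modeled here.
TARGET_SCHEMA = {"id": False, "l_orderkey": False}


def map_to_columns(obj: dict[str, Any], schema: dict[str, bool]) -> dict[str, Any]:
    """Look up each column among the top-level keys only; missing keys become None."""
    return {col: obj.get(col) for col in schema}


def null_violations(row: dict[str, Any], schema: dict[str, bool]) -> list[str]:
    """Return the non-nullable columns that ended up null."""
    return [col for col, nullable in schema.items() if not nullable and row[col] is None]


flat = json.loads('{"id": 1, "l_orderkey": 123}')
envelope = json.loads(
    '{"table_name": "tpch.lineitem", "operation_type": "SELECT",'
    ' "timestamp": "2026-04-24T19:57:00Z",'
    ' "data": {"id": 1, "l_orderkey": 123}, "old_data": null}'
)

print(null_violations(map_to_columns(flat, TARGET_SCHEMA), TARGET_SCHEMA))      # []
print(null_violations(map_to_columns(envelope, TARGET_SCHEMA), TARGET_SCHEMA))  # ['id', 'l_orderkey']
```

With the envelope, none of the columns exist at the top level, so the row comes out all nulls and the non-nullable check trips, even though the values are present one level down under data.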

Fix: Either define a standard connector message envelope that all sinks know how to unwrap, or have the sink explicitly handle the DatabaseRecord format (extract the data field before Arrow conversion). The current workaround (flat_json_output) is a band-aid that puts the burden on the user to know about the incompatibility.
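The second option, unwrapping in the sink, could look roughly like the Python sketch below. unwrap_record and ENVELOPE_KEYS are hypothetical names; the field names come from the DatabaseRecord example above. Detecting the envelope by its key set is a heuristic, which is part of the argument for the first option: a declared envelope spec would let the sink know the format instead of guessing it.

```python
import json
from typing import Any

# Field names taken from the DatabaseRecord example in this issue.
ENVELOPE_KEYS = {"table_name", "operation_type", "timestamp", "data", "old_data"}


def unwrap_record(payload: bytes) -> dict[str, Any]:
    """Return the flat row to hand to the Arrow JSON reader.

    Hypothetical sink-side helper: if the message has all DatabaseRecord
    envelope keys, return its `data` object; otherwise assume it is already flat.
    """
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    if ENVELOPE_KEYS.issubset(obj):
        data = obj["data"]
        # Refuse to forward the envelope itself when there is no row object to extract.
        if not isinstance(data, dict):
            raise ValueError(f"{obj['operation_type']} record has no object in 'data'")
        return data
    return obj


print(unwrap_record(b'{"id": 1, "l_orderkey": 123}'))  # {'id': 1, 'l_orderkey': 123}
```

Flat messages pass through unchanged, so a sink using this approach would keep working with sources that already emit flat JSON (or with flat_json_output = true).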
