The Postgres source wraps all row data in a DatabaseRecord envelope:
{
"table_name": "tpch.lineitem",
"operation_type": "SELECT",
"timestamp": "2026-04-24T19:57:00Z",
"data": {"id": 1, "l_orderkey": 123, ...},
"old_data": null
}
The Iceberg sink's Arrow JSON reader expects flat JSON matching the target table schema:
{"id": 1, "l_orderkey": 123, ...}
These two formats are fundamentally incompatible. There is no shared message schema spec or envelope definition between sources and sinks. Connecting the Postgres source to the Iceberg sink without the (newly added) flat_json_output = true flag produces silent failures because Arrow maps the nested data object to the top-level id field as null, then fails on non-nullable constraints.
Fix: Either define a standard connector message envelope that all sinks know how to unwrap, or have the sink explicitly handle the DatabaseRecord format (extract the data field before Arrow conversion). The current workaround (flat_json_output) is a band-aid that puts the burden on the user to know about the incompatibility.
The Postgres source wraps all row data in a
DatabaseRecordenvelope:{ "table_name": "tpch.lineitem", "operation_type": "SELECT", "timestamp": "2026-04-24T19:57:00Z", "data": {"id": 1, "l_orderkey": 123, ...}, "old_data": null }The Iceberg sink's Arrow JSON reader expects flat JSON matching the target table schema:
{"id": 1, "l_orderkey": 123, ...}These two formats are fundamentally incompatible. There is no shared message schema spec or envelope definition between sources and sinks. Connecting the Postgres source to the Iceberg sink without the (newly added)
flat_json_output = trueflag produces silent failures because Arrow maps the nesteddataobject to the top-levelidfield as null, then fails on non-nullable constraints.Fix: Either define a standard connector message envelope that all sinks know how to unwrap, or have the sink explicitly handle the
DatabaseRecordformat (extract thedatafield before Arrow conversion). The current workaround (flat_json_output) is a band-aid that puts the burden on the user to know about the incompatibility.