Handle inherited tables with different schemas#3936

Open
lukejudd-lux wants to merge 4 commits into PeerDB-io:main from lukejudd-lux:cdc_inherited_tables_fix

Conversation

@lukejudd-lux
Contributor

@lukejudd-lux lukejudd-lux commented Feb 16, 2026

Summary

Fixes Postgres CDC tuple decoding failures and cross-child data corruption when syncing old-style INHERITS tables where children have additional columns beyond the parent's schema.

Problem

When a child table's RelationMessage arrived, processMessage mutated msg.RelationID to the parent's ID, causing the child's RelationMessage to be stored under the parent's key in relationMessageMapping. This had two consequences:

  1. Cross-child contamination: multiple children overwrote each other's RelationMessage under the same parent key — whichever child sent last determined the column layout used for decoding all children's tuples.
  2. Positional column mismatch: a child with extra columns (e.g. stripe_payment with 20 columns including fk_stripe_payment) had its tuple decoded using the parent's or another child's RelationMessage (e.g. 18 columns), causing columns to be decoded with wrong types.
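The overwrite can be reproduced in a minimal sketch. Everything here is a simplified stand-in, not the PeerDB code: relationMessage, storeUnderParent, the numeric relation IDs, and the paypal_email column are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// relationMessage is a simplified, hypothetical stand-in for a pgoutput
// RELATION message: a relation ID plus its ordered column names.
type relationMessage struct {
	RelationID uint32
	Columns    []string
}

// storeUnderParent mimics the behaviour described above: the message's
// RelationID is rewritten to the parent's ID before it is stored, so every
// child of the same parent lands on the same map key.
func storeUnderParent(cache map[uint32]relationMessage, childToParent map[uint32]uint32, msg relationMessage) {
	if parentID, ok := childToParent[msg.RelationID]; ok {
		msg.RelationID = parentID
	}
	cache[msg.RelationID] = msg
}

func main() {
	// Hypothetical relation IDs: one parent and two children.
	const parentID, stripeID, paypalID = 100, 101, 102
	childToParent := map[uint32]uint32{stripeID: parentID, paypalID: parentID}
	cache := map[uint32]relationMessage{}

	storeUnderParent(cache, childToParent, relationMessage{stripeID, []string{"id", "amount", "fk_stripe_payment"}})
	storeUnderParent(cache, childToParent, relationMessage{paypalID, []string{"id", "amount", "paypal_email"}})

	// Only one entry survives, and it carries the layout of the last child.
	fmt.Println(len(cache), cache[parentID].Columns) // 1 [id amount paypal_email]
}
```

A stripe_payment tuple decoded after this point would be read with the paypal_payment layout, which is the contamination described in item 1.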

Solution

  • RelationMessage handling: stop mutating msg.RelationID — store each child's RelationMessage under its own relation ID. Use the parent's ID only for the "do we care?" check and table/schema lookups.
  • Tuple decoding: in processInsertMessage / processUpdateMessage / processDeleteMessage, use the child's actual relation ID (actualRelID) for RelationMessage lookup so column types match the WAL tuple. The parent's ID is used only for table name and destination routing.
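As a rough illustration of the two bullets above, the sketch below keeps RelationMessages keyed by each relation's own ID and uses the parent's ID only for the tracking check and the table name. The tracker type, routingID, onRelation, and onInsert are hypothetical simplifications; the real processMessage and processInsertMessage functions carry more state than this.

```go
package main

import "fmt"

// relationMessage is a simplified, hypothetical stand-in for a RELATION message.
type relationMessage struct {
	RelationID uint32
	Columns    []string
}

// tracker is a hypothetical, reduced model of the CDC state.
type tracker struct {
	childToParent map[uint32]uint32          // child relation ID -> parent relation ID
	tableNames    map[uint32]string          // tracked (parent) relation ID -> table name
	relations     map[uint32]relationMessage // keyed by the relation's OWN ID
}

// routingID returns the ID used for the "do we care?" check and table lookup.
func (t *tracker) routingID(relID uint32) uint32 {
	if parentID, ok := t.childToParent[relID]; ok {
		return parentID
	}
	return relID
}

// onRelation stores the message under its own relation ID, without mutating it.
func (t *tracker) onRelation(msg relationMessage) {
	if _, tracked := t.tableNames[t.routingID(msg.RelationID)]; !tracked {
		return
	}
	t.relations[msg.RelationID] = msg
}

// onInsert routes by the parent's ID but decodes with the child's own layout.
func (t *tracker) onInsert(actualRelID uint32, tuple []string) (string, map[string]string, error) {
	table, tracked := t.tableNames[t.routingID(actualRelID)]
	if !tracked {
		return "", nil, fmt.Errorf("relation %d is not tracked", actualRelID)
	}
	rel, ok := t.relations[actualRelID]
	if !ok || len(rel.Columns) != len(tuple) {
		return "", nil, fmt.Errorf("no matching RelationMessage for relation %d", actualRelID)
	}
	row := make(map[string]string, len(tuple))
	for i, value := range tuple {
		row[rel.Columns[i]] = value
	}
	return table, row, nil
}

func main() {
	t := &tracker{
		childToParent: map[uint32]uint32{101: 100, 102: 100},
		tableNames:    map[uint32]string{100: "public.payment"},
		relations:     map[uint32]relationMessage{},
	}
	t.onRelation(relationMessage{101, []string{"id", "amount", "fk_stripe_payment"}})
	t.onRelation(relationMessage{102, []string{"id", "amount", "paypal_email"}})

	table, row, err := t.onInsert(101, []string{"1", "9.99", "ch_example"})
	fmt.Println(table, row["fk_stripe_payment"], err)
}
```

Because each child keeps its own entry, a later RELATION message from paypal_payment no longer changes how stripe_payment tuples are decoded.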

Testing

  • TestInheritedTableWithExtraColumns: parent payment (4 columns), child stripe_payment (6 columns including fk_stripe_payment TEXT and stripe_account_id TEXT). Inserts from child with Stripe charge ID ch_3T1KxCFUtwYrZPVC0M6OeTpy — verifies extra columns are decoded as text, not misinterpreted due to positional mismatch.
  • TestMultipleInheritedChildrenNoContamination: two children (stripe_payment, paypal_payment) with different extra columns. Verifies inserts from both are decoded using their own RelationMessages without cross-child contamination.

Fixes #3935

@CLAassistant

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@heavycrystal heavycrystal left a comment

The code itself seems fine, but I'm confused as to what problem this solves. As you noted, Postgres doesn't allow child tables to have a different type for the same column. And since the destination table itself is still using the type of the parent for this column, if the child table is BIGINT instead of TEXT, things can break in a different way. So it's just pushing the error to a different place.

@lukejudd-lux
Contributor Author

@heavycrystal after looking a little closer, it's not that the types differ at the source.

It's that the inherited table has more columns than the parent (as can happen under relkind 'r'), so the columns are out of alignment at parsing time.

I've updated the test to address the real issue; the code changes still address the actual problem.

@lukejudd-lux lukejudd-lux force-pushed the cdc_inherited_tables_fix branch from ef4ed4e to 902c0b9 Compare February 16, 2026 22:55
@lukejudd-lux lukejudd-lux changed the title Handle inherited tables with different types Handle inherited tables with different schemas Feb 16, 2026
@lukejudd-lux
Contributor Author

@heavycrystal after more testing I have added another commit to this change.

When replicating tables that use PostgreSQL table inheritance (rare, I know, sorry), child-specific columns were incorrectly detected as schema changes on the parent table. This triggered spurious ALTER TABLE ... ADD COLUMN statements on the destination for columns that don't belong on the parent, which could cause the mirror to hang (as was the case with our BQ dest).

Root cause: processRelationMessage in cdc.go compares incoming RELATION messages against the parent's known schema. For inherited child tables, columns unique to the child (not present on the parent) were misidentified as newly-added parent columns.

Fix: When processing a RELATION message from a child table, query pg_attribute for the parent's actual column set. Columns that exist only on the child are filtered out of schema delta detection. Genuine parent DDL changes (e.g., ALTER TABLE parent ADD COLUMN) are still correctly detected and applied.
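A minimal sketch of the filtering step, assuming the parent's column set has already been loaded from pg_attribute into a Go set. addedColumns and its parameters are hypothetical names, not the identifiers used in cdc.go.

```go
package main

import "fmt"

// addedColumns returns the columns from a RELATION message that should be
// treated as new columns on the parent. A column counts only if it is not
// already known AND it exists on the parent; columns that exist only on the
// child are skipped.
//
// parentCols is assumed to come from a pg_attribute lookup on the parent.
func addedColumns(relationCols, knownCols []string, parentCols map[string]struct{}) []string {
	known := make(map[string]struct{}, len(knownCols))
	for _, col := range knownCols {
		known[col] = struct{}{}
	}

	var added []string
	for _, col := range relationCols {
		if _, alreadyKnown := known[col]; alreadyKnown {
			continue
		}
		if _, onParent := parentCols[col]; !onParent {
			continue // child-only column: not a schema change on the parent
		}
		added = append(added, col)
	}
	return added
}

func main() {
	// Parent after "ALTER TABLE parent ADD COLUMN foo".
	parentCols := map[string]struct{}{"id": {}, "amount": {}, "foo": {}}
	// RELATION message from the child: its own column plus the inherited foo.
	relationCols := []string{"id", "amount", "stripe_account_id", "foo"}
	knownCols := []string{"id", "amount"}

	fmt.Println(addedColumns(relationCols, knownCols, parentCols)) // [foo]
}
```

Here stripe_account_id is dropped from the delta because the parent does not have it, while foo is still reported as a genuine parent change.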

IMPACT

For anybody with inherited table setups...
To replicate both parent and child tables with full fidelity, you will have to run two separate mirrors. Child tables must not be in the same mirror as their parent, otherwise their records are routed to the parent and child-specific columns are dropped.

I've tested this locally and it seems to keep everything in sync.

Ultimately, these queries should return the same results on source and destination (with the same schemas if you mirror the whole thing):

select * from parent
select * from child

Sorry Sai...

saisrirampur on Aug 2, 2024 Great question! I'm expecting it to support parent/child tables too as the way we implemented partitioned table support is querying the pg_inherits metadata table - https://github.com/PeerDB-io/peerdb/blob/2d30e5fae887552f93c... However, inheritance (old way of partitioning) isn't a common thing with Postgres. Out of 100s of workloads I've seen in the past decade, it came up a couple of times...

@heavycrystal
Contributor

@lukejudd-lux child tables having mismatched schemas from the parent table from inheritance is something we've hit before. It is a niche feature that is tricky to support reliably during CDC.

As an example, it's possible for a column to be added and then dropped almost immediately during CDC. This should still create the column on the destination table and populate a few rows with the data. Your change makes it such that if pg_attribute doesn't contain the column when querying, the schema change won't be propagated.

@lukejudd-lux
Contributor Author

@heavycrystal yeah, I can see what you are saying.

The attributes table can never be WAL consistent.

In the example you provide, the current approach would not even include this column (or any of its data) at the destination.
We could always add "AND attinhcount = 0" to the attribute query.

| Scenario | Current fix | attinhcount |
| --- | --- | --- |
| stripe_account_id in RELATION (stable, always existed on child) | Filtered (correct) | Filtered (correct) |
| Parent adds foo (stable) | Propagated (correct) | Propagated (correct) |
| Parent adds foo, drops it rapidly | foo not in parent's pg_attribute → filtered (data loss) | foo mangled in child's pg_attribute, not in childOnlyColumns → propagated (correct) |
| Child adds bar, drops it rapidly | bar was never on parent → filtered (correct regardless of race) | bar mangled, not in childOnlyColumns → propagated (false positive on parent destination) |
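The attinhcount column of the table can be sketched as follows. This is one assumed reading of the idea, not code from the PR: the query text, the attribute struct, childOnlyColumns, and shouldPropagate are all hypothetical, and the dropped-column placeholder name is only illustrative of how Postgres mangles such names.

```go
package main

import "fmt"

// childOnlyColumnsQuery is an assumed shape for the catalog lookup: columns
// defined locally on the child (not inherited) that have not been dropped.
const childOnlyColumnsQuery = `
SELECT attname
FROM pg_catalog.pg_attribute
WHERE attrelid = $1
  AND attnum > 0
  AND NOT attisdropped
  AND attinhcount = 0`

// attribute mirrors the pg_attribute fields the query relies on.
type attribute struct {
	Name     string
	InhCount int  // attinhcount: 0 means the column is not inherited
	Dropped  bool // attisdropped
}

// childOnlyColumns applies the same predicate as the query, in memory.
func childOnlyColumns(attrs []attribute) map[string]struct{} {
	out := make(map[string]struct{})
	for _, a := range attrs {
		if !a.Dropped && a.InhCount == 0 {
			out[a.Name] = struct{}{}
		}
	}
	return out
}

// shouldPropagate reports whether a column from a RELATION message is treated
// as a schema change on the parent: everything except known child-only columns.
func shouldPropagate(col string, childOnly map[string]struct{}) bool {
	_, isChildOnly := childOnly[col]
	return !isChildOnly
}

func main() {
	// Child's catalog state after "bar" was added to the child and dropped again.
	attrs := []attribute{
		{"id", 1, false},
		{"amount", 1, false},
		{"stripe_account_id", 0, false},
		{"........pg.dropped.4........", 0, true}, // formerly "bar"
	}
	childOnly := childOnlyColumns(attrs)
	for _, col := range []string{"stripe_account_id", "foo", "bar"} {
		fmt.Println(col, shouldPropagate(col, childOnly))
	}
}
```

Under these assumptions the stable child column is filtered, a rapidly dropped parent column foo is still propagated, and a rapidly dropped child column bar is propagated as well, which is the false positive in the last row of the table.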

I can see tradeoffs any way I look at it. In our setup the likelihood of this happening is close to zero, since DDL is always intentional in production.

Previously we set up streaming query replication on this specific parent table and a CDC mirror on the inherited tables, which seems to avoid this issue entirely.

Development

Successfully merging this pull request may close these issues.

Postgres CDC: error when syncing inherited tables where child has different schema than parent

3 participants