
Fix record corruption when fields are reordered in table config #150

Merged
larsewi merged 1 commit into master from canonical-tuple-order
May 7, 2026

larsewi (Owner) commented May 7, 2026

Summary

  • Tuple identity in Table records (both primary key and subsidiary halves) was tied to the declaration order of fields in tables.toml. Swapping two fields in the config silently changed every row's hash key, breaking delta merging against pre-swap state and historical blocks.
  • Make the tuple layout canonical: sort fields lexicographically by name within the primary-key and subsidiary groups. The same config produces the same tuple identity regardless of how the user wrote the field list.
  • compute_canonical_columns returns each tuple slot as a (column_index, &FieldConfig) pair, so the index and parser can't drift; parse_columns consumes the bundled slice directly.
  • SQL emission still follows wire order. Now that the wire is canonical, generated INSERT column lists and composite-PK WHERE clauses appear in lexicographic order. A follow-up PR will route SQL through the hub's declared field order so users keep their preferred column ordering in generated SQL.
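
As a rough illustration of the canonical layout described above, here is a minimal sketch. It is not the code from this PR: the `FieldConfig` struct shown here (with only `name` and `primary_key`), the `Slot` alias, the `slot_names` helper, and the `grade` field used in the usage note are all assumptions made for the example. Only the function name `compute_canonical_columns` and the `(column_index, &FieldConfig)` pairing come from the description.

```rust
/// Hypothetical stand-in for the crate's `FieldConfig`; only the parts
/// needed to show the ordering are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldConfig {
    pub name: String,
    pub primary_key: bool,
}

/// One tuple slot: the column's position in the declared field list,
/// bundled with the config used to parse that column.
pub type Slot<'a> = (usize, &'a FieldConfig);

/// Splits the declared fields into primary-key and subsidiary groups and
/// sorts each group lexicographically by field name.
/// Returns `(primary_key_slots, subsidiary_slots)`.
pub fn compute_canonical_columns(
    fields: &[FieldConfig],
) -> (Vec<Slot<'_>>, Vec<Slot<'_>>) {
    let (mut primary, mut subsidiary): (Vec<Slot<'_>>, Vec<Slot<'_>>) = fields
        .iter()
        .enumerate()
        .partition(|(_, field)| field.primary_key);

    // Sorting the bundled pairs keeps each index attached to its config,
    // so there is no second sort that has to agree with this one.
    primary.sort_by(|a, b| a.1.name.cmp(&b.1.name));
    subsidiary.sort_by(|a, b| a.1.name.cmp(&b.1.name));

    (primary, subsidiary)
}

/// Helper for inspecting a group: the field names in tuple order.
pub fn slot_names(slots: &[Slot<'_>]) -> Vec<String> {
    slots.iter().map(|(_, field)| field.name.clone()).collect()
}
```

With fields declared as `student_id`, `course_id`, `grade` (the first two being primary keys), the primary-key group comes out as `course_id`, `student_id` with column indices 1 and 0. Declaring the same fields in any other order changes the indices but not the order of names, which is what keeps tuple identity stable.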

Notes for reviewers

  • accept_composite_pk.rs had its expected SQL updated to reflect the canonical column order (course_id before student_id). The follow-up that reroutes SQL emission will flip these assertions back.
  • Existing on-disk STATE / blocks written before this change carry tuples in declaration order. The first run after upgrade will hit the existing layout-mismatch path in Delta::compute / merge_block_deltas and fall back to a full state payload, so no migration code is needed.
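
For readers unfamiliar with the fallback mentioned in the last note, the idea can be sketched roughly as follows. This is an illustration only: how `Delta::compute` and `merge_block_deltas` actually record and compare layouts is not shown in this PR, so the `Payload` enum, the `choose_payload` function, and the representation of a layout as a list of column names are all assumptions.

```rust
/// Hypothetical payload kinds; the real types are not part of this PR.
#[derive(Debug, PartialEq)]
pub enum Payload {
    /// Only the rows that changed since the stored state.
    Delta { changed_rows: usize },
    /// The complete current state, sent when a delta cannot be trusted.
    FullState { rows: usize },
}

/// Chooses a delta only when the stored tuple layout matches the current
/// one. A layout is modelled here as the ordered list of column names
/// (an assumption made for this sketch).
pub fn choose_payload(
    stored_layout: &[&str],
    current_layout: &[&str],
    rows: usize,
    changed_rows: usize,
) -> Payload {
    if stored_layout == current_layout {
        Payload::Delta { changed_rows }
    } else {
        // State written in declaration order no longer lines up with the
        // canonical order, so comparing tuples would yield bogus deltas.
        Payload::FullState { rows }
    }
}
```

Under these assumptions, state stored as `student_id`, `course_id` compared against the canonical `course_id`, `student_id` takes the full-state branch once, and later runs with matching layouts go back to deltas.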

Make tuple identity independent of the order fields are declared in the
config. Previously, the position of each value within a row's primary-key
or subsidiary tuple followed the declaration order in `tables.toml`, so
swapping two fields in the config silently changed every record's hash
key and produced bogus deltas against pre-swap state and blocks.

Replace `compute_key_indices` with `compute_canonical_columns`, which
sorts columns lexicographically by field name within each group and
returns each entry paired with its `FieldConfig`. Bundling index and
parser eliminates the implicit "two sorts must agree" coupling between
indices and configs that the old layout had. SQL emission still follows
the wire's column order, which is now canonical, so generated INSERT
column lists and composite-PK WHERE clauses are emitted in lexicographic
order — a follow-up will reroute SQL emission through the hub's declared
order so users keep their preferred column ordering in generated SQL.

Signed-off-by: Lars Erik Wik <lars.erik.wik@northern.tech>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
larsewi added the bug (Bug fix) label May 7, 2026
larsewi merged commit cfbac8c into master May 7, 2026
6 checks passed
larsewi deleted the canonical-tuple-order branch May 7, 2026 09:27
larsewi changed the title from "Fix record corruption when fields are reordered in tables.toml" to "Fix record corruption when fields are reordered in table config" May 7, 2026