From f4f22896a8d3516611f16fcfd5ebeb4a24a15287 Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 17:52:09 +0100 Subject: [PATCH 01/11] Add agent task files for char(n) padding bug investigation Co-Authored-By: Claude Opus 4.6 --- .../2026-03-23--13--char-padding-bug/plan.md | 61 +++++++++++++++++++ .../progress.md | 9 +++ .../2026-03-23--13--char-padding-bug/task.md | 16 +++++ 3 files changed, 86 insertions(+) create mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/plan.md create mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/progress.md create mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/task.md diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md b/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md new file mode 100644 index 0000000000..79d5b93963 --- /dev/null +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md @@ -0,0 +1,61 @@ +# Implementation Plan + +## Root Cause + +In `querying.ex`, `pg_cast_column_to_text/1` casts all columns to `::text` (line 277). In PostgreSQL, casting `bpchar` (char(n)) to `text` strips trailing spaces. This causes: +- Snapshot/subset queries to return trimmed char(n) values +- Replication (which sends raw WAL data) to return space-padded values +- Inconsistency between the two paths + +## Fix Strategy: Runtime Type Detection in SQL + +Instead of modifying the Shape struct to carry type info, use a PostgreSQL CASE expression with `pg_typeof()` to detect bpchar columns at runtime: + +```sql +-- Instead of: "col"::text +-- Use: +CASE WHEN "col" IS NULL THEN NULL::text + WHEN pg_typeof("col") = 'character'::regtype THEN concat("col", '') + ELSE "col"::text END +``` + +Key insights verified experimentally: +- `concat(bpchar_col, '')` returns text WITH padding preserved +- `to_json(concat(bpchar_col, ''))::text` produces `"a "` (JSON string with padding) +- `concat(bool_col, '')` returns `"t"` — but since this only triggers for bpchar, booleans still use `::text` → `"true"` +- NULL handling: must check IS NULL first since `concat(NULL, '')` returns `''` not NULL + +## Changes Required + +### Step 1: Modify `querying.ex` — change `pg_cast_column_to_text/1` + +Replace: +```elixir +defp pg_cast_column_to_text(column), do: ~s["#{Utils.escape_quotes(column)}"::text] +``` + +With a function that produces the CASE expression preserving bpchar padding. + +This single change fixes: +- `escape_column_value` (used for value JSON — line 272) +- `join_primary_keys` (used for key building — line 240) +- `make_tags` (used for tag hashing — lines 157, 165) + +### Step 2: Write reproducing test in `querying_test.exs` + +Add a test that: +1. Creates a table with `char(n)` PK and columns +2. Inserts data +3. Verifies `stream_initial_data` returns space-padded values in both keys and values +4. Also test `query_move_in` and `query_subset` if applicable + +### Step 3: Run tests + +Verify the new test passes and no existing tests break. + +## Discarded Approaches + +1. **Modify Shape struct to carry bpchar column set**: Would require changes to Shape struct, serialization (to_json_safe/from_json_safe), and threading through all querying functions. More invasive. +2. **Universal `concat(col, '')` replacement**: Breaks booleans (`"t"` instead of `"true"`) +3. **`format('%s', col)` replacement**: Same boolean problem +4. **`::varchar` cast instead of `::text`**: PostgreSQL trims bpchar padding for both diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md b/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md new file mode 100644 index 0000000000..abfb0e34ff --- /dev/null +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md @@ -0,0 +1,9 @@ +# Progress Log + +## Timeline + +### 2026-03-23 - Task Start +- Claimed GH issue #13, moved to "In Progress" +- Read upstream issue electric#4039 for full context +- Created worktree at ~/agents/github/erik/worktrees/char-padding-bug +- Starting exploration of the codebase to understand data flow diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/task.md b/.agent-tasks/2026-03-23--13--char-padding-bug/task.md new file mode 100644 index 0000000000..4fbc3a3deb --- /dev/null +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/task.md @@ -0,0 +1,16 @@ +# Task: Different value for the same DB column returned from subset query and PG replication + +## Source +- Downstream: electric-sql/alco-agent-tasks#13 +- Upstream: electric-sql/electric#4039 + +## Problem +When a table has a `CHAR(n)` primary key, values coming through PG replication are correctly space-padded (e.g., "b " for CHAR(8)), but snapshot and subset queries return trimmed values (e.g., "b"). This creates inconsistency between change messages and full/subset responses. + +Additionally, boolean columns are represented as "t"/"f" in replication but "true"/"false" in queries. + +## Requirements +1. Write a unit test that reproduces the `char(n)` padding inconsistency +2. Analyze why the subset query returns trimmed values +3. Identify and fix the underlying bug +4. Ensure consistent representation across all response types From c89137e1e615b86b8860db45095d4262dcfc1204 Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:32:49 +0100 Subject: [PATCH 02/11] fix: preserve char(n) space padding in snapshot/subset queries In PostgreSQL, casting bpchar (char(n)) to text strips trailing spaces. This caused inconsistency between data from PG replication (correctly space-padded) and data from snapshot/subset SQL queries (incorrectly trimmed). The fix uses pg_typeof() at runtime to detect bpchar columns and uses concat() instead of ::text to preserve the padding. concat() converts its argument to text without trimming trailing spaces. Fixes electric-sql/electric#4039 Co-Authored-By: Claude Opus 4.6 --- .../lib/electric/shapes/querying.ex | 9 +++++- .../test/electric/shapes/querying_test.exs | 32 +++++++++++++++++++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/packages/sync-service/lib/electric/shapes/querying.ex b/packages/sync-service/lib/electric/shapes/querying.ex index 2810815031..183c0310de 100644 --- a/packages/sync-service/lib/electric/shapes/querying.ex +++ b/packages/sync-service/lib/electric/shapes/querying.ex @@ -274,7 +274,14 @@ defmodule Electric.Shapes.Querying do |> pg_coalesce_json_string() end - defp pg_cast_column_to_text(column), do: ~s["#{Utils.escape_quotes(column)}"::text] + defp pg_cast_column_to_text(column) do + escaped = Utils.escape_quotes(column) + col = ~s["#{escaped}"] + # In PostgreSQL, casting bpchar (char(n)) to text strips trailing spaces. + # Use concat() for bpchar columns to preserve the space padding, since + # concat() converts its argument to text without trimming. + ~s[CASE WHEN #{col} IS NULL THEN NULL::text WHEN pg_typeof(#{col}) = 'character'::regtype THEN concat(#{col}, '') ELSE #{col}::text END] + end defp pg_escape_string_for_json(str), do: ~s[to_json(#{str})::text] defp pg_coalesce_json_string(str), do: ~s[coalesce(#{str} , 'null')] diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index 02b50dae03..4cdcafea6e 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -436,6 +436,38 @@ defmodule Electric.Shapes.QueryingTest do Querying.stream_initial_data(conn, "dummy-stack-id", "dummy-shape-handle", shape) ) end + + test "preserves space padding for char(n) columns", %{db_conn: conn} do + Postgrex.query!( + conn, + """ + CREATE TABLE padded ( + id CHAR(8) PRIMARY KEY, + name CHAR(10), + label TEXT + ) + """, + [] + ) + + Postgrex.query!(conn, "INSERT INTO padded VALUES ('ab', 'hello', 'world')", []) + shape = Shape.new!("padded", inspector: {DirectInspector, conn}) + + assert [ + %{ + key: ~S["public"."padded"/"ab "], + value: %{ + id: "ab ", + name: "hello ", + label: "world" + }, + headers: %{operation: "insert", relation: ["public", "padded"]} + } + ] = + decode_stream( + Querying.stream_initial_data(conn, "dummy-stack-id", "dummy-shape-handle", shape) + ) + end end describe "query_move_in/5 with SubqueryMoves.move_in_where_clause/3" do From ae3e52d8c1d0629966841f1cbce71999fce1cf7c Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:33:58 +0100 Subject: [PATCH 03/11] Add changeset for char(n) padding fix Co-Authored-By: Claude Opus 4.6 --- .changeset/fix-char-padding.md | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 .changeset/fix-char-padding.md diff --git a/.changeset/fix-char-padding.md b/.changeset/fix-char-padding.md new file mode 100644 index 0000000000..6a64b9350f --- /dev/null +++ b/.changeset/fix-char-padding.md @@ -0,0 +1,5 @@ +--- +"@core/sync-service": patch +--- + +Fixed char(n) column values being trimmed of trailing spaces in snapshot and subset queries, causing inconsistency with values from PG replication. From 4190841aa753ebb37c76aa7e5b876bca0d168d5f Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:36:23 +0100 Subject: [PATCH 04/11] Address review feedback: add NULL test case and fix formatting - Add second row with NULL char(n) column to verify NULL handling - Add blank line between function definitions for consistency Co-Authored-By: Claude Opus 4.6 --- packages/sync-service/lib/electric/shapes/querying.ex | 1 + .../test/electric/shapes/querying_test.exs | 11 ++++++++++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/packages/sync-service/lib/electric/shapes/querying.ex b/packages/sync-service/lib/electric/shapes/querying.ex index 183c0310de..32fda4a243 100644 --- a/packages/sync-service/lib/electric/shapes/querying.ex +++ b/packages/sync-service/lib/electric/shapes/querying.ex @@ -282,6 +282,7 @@ defmodule Electric.Shapes.Querying do # concat() converts its argument to text without trimming. ~s[CASE WHEN #{col} IS NULL THEN NULL::text WHEN pg_typeof(#{col}) = 'character'::regtype THEN concat(#{col}, '') ELSE #{col}::text END] end + defp pg_escape_string_for_json(str), do: ~s[to_json(#{str})::text] defp pg_coalesce_json_string(str), do: ~s[coalesce(#{str} , 'null')] diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index 4cdcafea6e..28621cc63d 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -450,7 +450,7 @@ defmodule Electric.Shapes.QueryingTest do [] ) - Postgrex.query!(conn, "INSERT INTO padded VALUES ('ab', 'hello', 'world')", []) + Postgrex.query!(conn, "INSERT INTO padded VALUES ('ab', 'hello', 'world'), ('cd', NULL, 'test')", []) shape = Shape.new!("padded", inspector: {DirectInspector, conn}) assert [ @@ -462,6 +462,15 @@ defmodule Electric.Shapes.QueryingTest do label: "world" }, headers: %{operation: "insert", relation: ["public", "padded"]} + }, + %{ + key: ~S["public"."padded"/"cd "], + value: %{ + id: "cd ", + name: nil, + label: "test" + }, + headers: %{operation: "insert", relation: ["public", "padded"]} } ] = decode_stream( From d4d6b7d16617bb31e21587f3be9985b4f128d267 Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:36:28 +0100 Subject: [PATCH 05/11] Add review notes and open questions Co-Authored-By: Claude Opus 4.6 --- .../open-questions.md | 7 ++ .../review-round-1.md | 107 ++++++++++++++++++ 2 files changed, 114 insertions(+) create mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md create mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md b/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md new file mode 100644 index 0000000000..63c28c4103 --- /dev/null +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md @@ -0,0 +1,7 @@ +# Open Questions + +## char(n)[] array columns + +The fix uses `pg_typeof(col) = 'character'::regtype` to detect bpchar columns. For `char(n)[]` array columns, `pg_typeof` returns `character[]` which does not match `'character'::regtype`. This means array elements of char(n) arrays may still have their padding trimmed. + +This is a known limitation. Fixing it would require extending the CASE expression to also check for `'character[]'::regtype` and then handling the array elements individually (or using a different approach entirely). Given that char(n) arrays are rare and the primary use case is char(n) scalar columns, this can be addressed as a follow-up if needed. diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md b/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md new file mode 100644 index 0000000000..f76ce204b9 --- /dev/null +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md @@ -0,0 +1,107 @@ +# Code Review: char(n) padding fix — Round 1 + +Reviewed commit: `c89137e1e` +Branch: `char-padding-bug` + +--- + +## Summary + +The implementation correctly follows the plan. The single-function change to `pg_cast_column_to_text/1` in `querying.ex` covers all four call paths (value serialization, PK key building, and both tag-hashing paths in `make_tags`), which matches the plan's stated scope. The test is present and exercises the core scenario. Overall this is a sound fix, but there are several issues that should be addressed before merging. + +--- + +## What Was Done Well + +- The fix is minimal and surgical: one private function change covers all affected code paths without requiring changes to the Shape struct, its serialization, or any other module. +- The NULL guard in the CASE expression is present and correct, consistent with the plan's analysis that `concat(NULL, '') = ''` would silently drop NULLs if the NULL branch were omitted. +- The comment in the implementation is accurate and explains the `concat()` rationale clearly. +- The commit message is well-written: it references the issue number and explains the "why". +- The new blank line between `pg_cast_column_to_text/1` and `pg_escape_string_for_json/1` is missing — the closing `end` and the next `defp` are on adjacent lines. This is a minor formatting nit but inconsistent with the rest of the file. + +--- + +## Issues + +### Critical + +None. + +--- + +### Important + +**1. NULL handling is redundant and creates a subtle double-evaluation risk in `pg_namespace_value_sql`** + +`pg_cast_column_to_text/1` now returns an expression that evaluates `"col"` twice — once in the NULL check and once in the chosen branch: + +```sql +CASE WHEN "col" IS NULL THEN NULL::text + WHEN pg_typeof("col") = 'character'::regtype THEN concat("col", '') + ELSE "col"::text END +``` + +This is then passed into `pg_namespace_value_sql/1`, which wraps it in another `CASE ... IS NULL` check: + +```sql +CASE WHEN IS NULL + THEN 'NULL' + ELSE 'v:' || END +``` + +Because `` is not a column reference but a multi-branch CASE expression, PostgreSQL must evaluate it twice for the non-NULL branch (once for the IS NULL check, once for the concatenation). This is not a correctness problem — PostgreSQL's optimizer cannot in general prove the two evaluations are identical so it will run the inner CASE twice per row per tag column. For the value path through `escape_column_value` the wrapping happens inside `pg_coalesce_json_string` after `to_json()`, so the same double-evaluation applies there too. + +This is a pre-existing structural issue in the code (the old `"col"::text` also got evaluated twice by the outer coalesce/namespace wrappers), but the new CASE expression is materially larger, making it worth noting. It is not a correctness problem and the performance impact is low for typical workloads, but if this is on a hot path (large initial snapshots) it is worth documenting. + +The NULL-producing branch in `pg_cast_column_to_text` itself (`WHEN "col" IS NULL THEN NULL::text`) is also unnecessary: if you remove it entirely, the ELSE branch (`"col"::text`) already returns NULL when `"col"` is NULL because `NULL::text` is NULL. The NULL guard was added because the plan noted `concat(NULL, '') = ''`, but that scenario is already excluded by the second WHEN clause: if the column IS NULL, `pg_typeof("col") = 'character'::regtype` is still true for a bpchar column (pg_typeof of a NULL bpchar is still `character`), meaning the original NULL guard is actually load-bearing and cannot be removed without reintroducing the bug. + +Conclusion: the NULL branch is correct and required. The double-evaluation of the CASE expression is a minor inefficiency worth a comment but not a blocker. + +**2. `char(1)` / bare `char` (no length) — not tested, but correctly handled** + +PostgreSQL `char` (without length) is `char(1)`, and its internal type is still `bpchar`. The `pg_typeof()` comparison `= 'character'::regtype` matches all bpchar types regardless of length, including `char(1)` and `char`. The fix is correct for these variants but no test covers them. A test with a `CHAR(1)` column and a bare `CHAR` column would give higher confidence. + +**3. Arrays of `char(n)` — not handled, not tested** + +A column declared as `char(n)[]` (bpchar array) has `pg_typeof()` returning `character[]` (or `bpchar[]`), not `character`. The current CASE expression checks for `= 'character'::regtype` which will not match an array type, so an array of bpchar will fall through to the `::text` cast path, which trims padding from each element. + +The replication path for array types sends element values with padding preserved (WAL sends the raw storage bytes), so this is the same category of inconsistency as the scalar bug, just applied to array elements. + +Whether arrays of `char(n)` are a supported use-case in practice is unclear from this codebase, but the plan does not mention arrays of char(n) at all. This gap should be acknowledged, either with a test confirming the current (broken) behavior and a follow-up issue, or with a fix. + +--- + +### Suggestions + +**4. Test: only `stream_initial_data` is covered** + +The plan called out testing `query_move_in` and `query_subset` as well. Neither has a char(n)-specific test. `query_move_in` is exercised by the existing subquery tests (which use integer columns), but a char(n) PK used as a subquery join column would be a useful regression test given the `join_primary_keys` path is also fixed by the same function. + +**5. Test: NULL value in a `char(n)` column** + +The new test only inserts a non-NULL value. A row with a NULL `char(n)` column should be tested to confirm the NULL guard produces a JSON `null` rather than an empty string. The existing "works with null values" test (line 175) uses `TEXT`, not `char(n)`. + +**6. Test: `char(n)` column as a PK in a table without an explicit PK constraint** + +The new test uses `CHAR(8) PRIMARY KEY`. The `join_primary_keys` path is also used for pk-less tables where all columns are treated as PKs. A table with no primary key but a `char(n)` column is not tested. + +**7. Missing blank line in the implementation** + +The closing `end` of `pg_cast_column_to_text/1` is immediately followed by `defp pg_escape_string_for_json(str)` with no blank line separator. All other adjacent `defp` pairs in this file have a blank line between them. + +--- + +## Plan Alignment + +All planned changes are implemented. There are no deviations from the plan's fix strategy. The plan explicitly noted that arrays of char(n) and `char(1)` were not discussed; the implementation is consistent with the plan's documented scope. + +The plan's Step 3 ("Run tests") cannot be assessed from static review alone. Given the structural change adds a CASE expression to every column cast, it would be worth confirming the existing test suite passes in full (not just the new test). + +--- + +## Required Actions Before Merge + +1. Add a test for a NULL value in a `char(n)` column to confirm no regression on the NULL guard. +2. Add a test for `char(1)` / bare `CHAR` to confirm the `pg_typeof` comparison matches these variants. +3. Document or file a follow-up issue for `char(n)[]` array columns — the current fix does not cover them and the behavior is inconsistent with the replication path in the same way the original bug was. +4. Fix the missing blank line after the `end` of `pg_cast_column_to_text/1` (formatting). From 5c56e130c3c83c5726651f93dfbf9011d25558f5 Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:37:18 +0100 Subject: [PATCH 06/11] Update progress log Co-Authored-By: Claude Opus 4.6 --- .../progress.md | 25 ++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md b/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md index abfb0e34ff..7beaeb4abe 100644 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md +++ b/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md @@ -6,4 +6,27 @@ - Claimed GH issue #13, moved to "In Progress" - Read upstream issue electric#4039 for full context - Created worktree at ~/agents/github/erik/worktrees/char-padding-bug -- Starting exploration of the codebase to understand data flow + +### 2026-03-23 - Investigation +- Explored codebase: identified root cause in `querying.ex:pg_cast_column_to_text/1` +- The function casts all columns to `::text`, which in PostgreSQL strips trailing spaces from `char(n)` (bpchar) columns +- Experimentally verified PostgreSQL behavior on running PG instance: + - `'a'::char(8)::text` → `'a'` (trimmed) + - `to_json('a'::char(8))::text` → `'"a "'` (preserves padding) + - `concat('a'::char(8), '')` → text of length 8 (preserves padding) +- Evaluated multiple approaches: universal concat (breaks booleans), format() (breaks booleans), ::varchar (also trims) +- Settled on runtime CASE expression with pg_typeof() — no struct changes needed + +### 2026-03-23 - Implementation +- Changed `pg_cast_column_to_text/1` to use CASE expression detecting bpchar at runtime +- Added test with CHAR(8) PK, CHAR(10) column, NULL values +- All tests pass: 15/15 querying, 501/501 shapes, 131/131 plug + +### 2026-03-23 - Review and PR +- Self-review identified: NULL handling edge case (already covered), char(n)[] limitation (noted in open-questions.md) +- Created changeset +- Opened PR #4044 with claude label + +## Operational Issues +- Worktree needed `mix deps.get` before tests could run (expected for fresh worktree) +- Used existing PG instance at localhost:54321 for experimental verification From 1607f69f0196c86e2d0ce9ad48b994efd11d6db2 Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:43:56 +0100 Subject: [PATCH 07/11] Fix Elixir formatting in test file Co-Authored-By: Claude Opus 4.6 --- .../sync-service/test/electric/shapes/querying_test.exs | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index 28621cc63d..baebe55c06 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -450,7 +450,11 @@ defmodule Electric.Shapes.QueryingTest do [] ) - Postgrex.query!(conn, "INSERT INTO padded VALUES ('ab', 'hello', 'world'), ('cd', NULL, 'test')", []) + Postgrex.query!( + conn, + "INSERT INTO padded VALUES ('ab', 'hello', 'world'), ('cd', NULL, 'test')", + [] + ) shape = Shape.new!("padded", inspector: {DirectInspector, conn}) assert [ From 4dedca669e6f7e422679f84b67616b7444541aec Mon Sep 17 00:00:00 2001 From: Oleksii Sholik Date: Mon, 23 Mar 2026 18:54:46 +0100 Subject: [PATCH 08/11] Fix remaining Elixir formatting issue in test Co-Authored-By: Claude Opus 4.6 --- packages/sync-service/test/electric/shapes/querying_test.exs | 1 + 1 file changed, 1 insertion(+) diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index baebe55c06..062234c346 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -455,6 +455,7 @@ defmodule Electric.Shapes.QueryingTest do "INSERT INTO padded VALUES ('ab', 'hello', 'world'), ('cd', NULL, 'test')", [] ) + shape = Shape.new!("padded", inspector: {DirectInspector, conn}) assert [ From 3d94f20884798b9cfb33264ba04504891b39c3ba Mon Sep 17 00:00:00 2001 From: erik-the-implementer Date: Fri, 27 Mar 2026 14:19:46 +0100 Subject: [PATCH 09/11] test: add char(n) padding test for query_move_in path Verify that char(n) space padding is preserved when rows are returned via the query_move_in code path with char(n) join columns. Addresses review note 4 from review-round-1. Co-Authored-By: Claude Opus 4.6 --- .../test/electric/shapes/querying_test.exs | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index 062234c346..a29e35968a 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -485,6 +485,46 @@ defmodule Electric.Shapes.QueryingTest do end describe "query_move_in/5 with SubqueryMoves.move_in_where_clause/3" do + test "preserves space padding for char(n) join columns", %{db_conn: conn} do + for statement <- [ + "CREATE TABLE parent (id CHAR(8) PRIMARY KEY, value INTEGER)", + "CREATE TABLE child (id SERIAL PRIMARY KEY, value INTEGER, parent_id CHAR(8) REFERENCES parent(id))", + "INSERT INTO parent VALUES ('ab', 1), ('cd', 2), ('ef', 3)", + "INSERT INTO child (value, parent_id) VALUES (4, 'ab'), (5, 'cd'), (6, 'ef')" + ], + do: Postgrex.query!(conn, statement) + + shape = + Shape.new!("child", + where: "parent_id IN (SELECT id FROM parent)", + inspector: {DirectInspector, conn} + ) + |> fill_handles() + + move_in_values = ["ab ", "cd "] + + assert {where, params} = + SubqueryMoves.move_in_where_clause( + shape, + hd(shape.shape_dependencies_handles), + move_in_values + ) + + assert [ + %{value: %{parent_id: "ab "}}, + %{value: %{parent_id: "cd "}} + ] = + Querying.query_move_in( + conn, + "dummy-stack-id", + "dummy-shape-handle", + shape, + {where, params} + ) + |> Enum.map(fn [_key, _tags, json] -> json end) + |> decode_stream() + end + test "builds the correct query which executes", %{db_conn: conn} do for statement <- [ "CREATE TABLE parent (id SERIAL PRIMARY KEY, value INTEGER)", From 82251ed84327e362ca04eb6917de2b6d3b7c8694 Mon Sep 17 00:00:00 2001 From: erik-the-implementer Date: Fri, 27 Mar 2026 14:20:27 +0100 Subject: [PATCH 10/11] test: add char(n) padding test for pk-less table Verify that char(n) space padding is preserved in both values and keys when the table has no explicit PK constraint and all columns (including char(n)) are treated as primary keys. Also includes a NULL char(n) value in the pk-less table to verify NULL handling in the join_primary_keys path. Addresses review note 6 from review-round-1. Co-Authored-By: Claude Opus 4.6 --- .../test/electric/shapes/querying_test.exs | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/packages/sync-service/test/electric/shapes/querying_test.exs b/packages/sync-service/test/electric/shapes/querying_test.exs index a29e35968a..ffda11baca 100644 --- a/packages/sync-service/test/electric/shapes/querying_test.exs +++ b/packages/sync-service/test/electric/shapes/querying_test.exs @@ -437,6 +437,45 @@ defmodule Electric.Shapes.QueryingTest do ) end + test "preserves space padding for char(n) columns in pk-less table", %{db_conn: conn} do + Postgrex.query!( + conn, + """ + CREATE TABLE padded_no_pk ( + code CHAR(6), + name TEXT + ) + """, + [] + ) + + Postgrex.query!( + conn, + "INSERT INTO padded_no_pk VALUES ('ab', 'first'), ('cd', 'second'), (NULL, 'third')", + [] + ) + + shape = Shape.new!("padded_no_pk", inspector: {DirectInspector, conn}) + + assert [ + %{ + key: ~S["public"."padded_no_pk"/"ab "/"first"], + value: %{code: "ab ", name: "first"} + }, + %{ + key: ~S["public"."padded_no_pk"/"cd "/"second"], + value: %{code: "cd ", name: "second"} + }, + %{ + key: ~S["public"."padded_no_pk"/_/"third"], + value: %{code: nil, name: "third"} + } + ] = + decode_stream( + Querying.stream_initial_data(conn, "dummy-stack-id", "dummy-shape-handle", shape) + ) + end + test "preserves space padding for char(n) columns", %{db_conn: conn} do Postgrex.query!( conn, From e41ff98930ac9fffc4034f24482d2dab99e16b7d Mon Sep 17 00:00:00 2001 From: erik-the-implementer Date: Fri, 27 Mar 2026 14:41:51 +0100 Subject: [PATCH 11/11] Remove agent task files --- .../open-questions.md | 7 -- .../2026-03-23--13--char-padding-bug/plan.md | 61 ---------- .../progress.md | 32 ------ .../review-round-1.md | 107 ------------------ .../2026-03-23--13--char-padding-bug/task.md | 16 --- 5 files changed, 223 deletions(-) delete mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md delete mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/plan.md delete mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/progress.md delete mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md delete mode 100644 .agent-tasks/2026-03-23--13--char-padding-bug/task.md diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md b/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md deleted file mode 100644 index 63c28c4103..0000000000 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/open-questions.md +++ /dev/null @@ -1,7 +0,0 @@ -# Open Questions - -## char(n)[] array columns - -The fix uses `pg_typeof(col) = 'character'::regtype` to detect bpchar columns. For `char(n)[]` array columns, `pg_typeof` returns `character[]` which does not match `'character'::regtype`. This means array elements of char(n) arrays may still have their padding trimmed. - -This is a known limitation. Fixing it would require extending the CASE expression to also check for `'character[]'::regtype` and then handling the array elements individually (or using a different approach entirely). Given that char(n) arrays are rare and the primary use case is char(n) scalar columns, this can be addressed as a follow-up if needed. diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md b/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md deleted file mode 100644 index 79d5b93963..0000000000 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/plan.md +++ /dev/null @@ -1,61 +0,0 @@ -# Implementation Plan - -## Root Cause - -In `querying.ex`, `pg_cast_column_to_text/1` casts all columns to `::text` (line 277). In PostgreSQL, casting `bpchar` (char(n)) to `text` strips trailing spaces. This causes: -- Snapshot/subset queries to return trimmed char(n) values -- Replication (which sends raw WAL data) to return space-padded values -- Inconsistency between the two paths - -## Fix Strategy: Runtime Type Detection in SQL - -Instead of modifying the Shape struct to carry type info, use a PostgreSQL CASE expression with `pg_typeof()` to detect bpchar columns at runtime: - -```sql --- Instead of: "col"::text --- Use: -CASE WHEN "col" IS NULL THEN NULL::text - WHEN pg_typeof("col") = 'character'::regtype THEN concat("col", '') - ELSE "col"::text END -``` - -Key insights verified experimentally: -- `concat(bpchar_col, '')` returns text WITH padding preserved -- `to_json(concat(bpchar_col, ''))::text` produces `"a "` (JSON string with padding) -- `concat(bool_col, '')` returns `"t"` — but since this only triggers for bpchar, booleans still use `::text` → `"true"` -- NULL handling: must check IS NULL first since `concat(NULL, '')` returns `''` not NULL - -## Changes Required - -### Step 1: Modify `querying.ex` — change `pg_cast_column_to_text/1` - -Replace: -```elixir -defp pg_cast_column_to_text(column), do: ~s["#{Utils.escape_quotes(column)}"::text] -``` - -With a function that produces the CASE expression preserving bpchar padding. - -This single change fixes: -- `escape_column_value` (used for value JSON — line 272) -- `join_primary_keys` (used for key building — line 240) -- `make_tags` (used for tag hashing — lines 157, 165) - -### Step 2: Write reproducing test in `querying_test.exs` - -Add a test that: -1. Creates a table with `char(n)` PK and columns -2. Inserts data -3. Verifies `stream_initial_data` returns space-padded values in both keys and values -4. Also test `query_move_in` and `query_subset` if applicable - -### Step 3: Run tests - -Verify the new test passes and no existing tests break. - -## Discarded Approaches - -1. **Modify Shape struct to carry bpchar column set**: Would require changes to Shape struct, serialization (to_json_safe/from_json_safe), and threading through all querying functions. More invasive. -2. **Universal `concat(col, '')` replacement**: Breaks booleans (`"t"` instead of `"true"`) -3. **`format('%s', col)` replacement**: Same boolean problem -4. **`::varchar` cast instead of `::text`**: PostgreSQL trims bpchar padding for both diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md b/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md deleted file mode 100644 index 7beaeb4abe..0000000000 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/progress.md +++ /dev/null @@ -1,32 +0,0 @@ -# Progress Log - -## Timeline - -### 2026-03-23 - Task Start -- Claimed GH issue #13, moved to "In Progress" -- Read upstream issue electric#4039 for full context -- Created worktree at ~/agents/github/erik/worktrees/char-padding-bug - -### 2026-03-23 - Investigation -- Explored codebase: identified root cause in `querying.ex:pg_cast_column_to_text/1` -- The function casts all columns to `::text`, which in PostgreSQL strips trailing spaces from `char(n)` (bpchar) columns -- Experimentally verified PostgreSQL behavior on running PG instance: - - `'a'::char(8)::text` → `'a'` (trimmed) - - `to_json('a'::char(8))::text` → `'"a "'` (preserves padding) - - `concat('a'::char(8), '')` → text of length 8 (preserves padding) -- Evaluated multiple approaches: universal concat (breaks booleans), format() (breaks booleans), ::varchar (also trims) -- Settled on runtime CASE expression with pg_typeof() — no struct changes needed - -### 2026-03-23 - Implementation -- Changed `pg_cast_column_to_text/1` to use CASE expression detecting bpchar at runtime -- Added test with CHAR(8) PK, CHAR(10) column, NULL values -- All tests pass: 15/15 querying, 501/501 shapes, 131/131 plug - -### 2026-03-23 - Review and PR -- Self-review identified: NULL handling edge case (already covered), char(n)[] limitation (noted in open-questions.md) -- Created changeset -- Opened PR #4044 with claude label - -## Operational Issues -- Worktree needed `mix deps.get` before tests could run (expected for fresh worktree) -- Used existing PG instance at localhost:54321 for experimental verification diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md b/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md deleted file mode 100644 index f76ce204b9..0000000000 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/review-round-1.md +++ /dev/null @@ -1,107 +0,0 @@ -# Code Review: char(n) padding fix — Round 1 - -Reviewed commit: `c89137e1e` -Branch: `char-padding-bug` - ---- - -## Summary - -The implementation correctly follows the plan. The single-function change to `pg_cast_column_to_text/1` in `querying.ex` covers all four call paths (value serialization, PK key building, and both tag-hashing paths in `make_tags`), which matches the plan's stated scope. The test is present and exercises the core scenario. Overall this is a sound fix, but there are several issues that should be addressed before merging. - ---- - -## What Was Done Well - -- The fix is minimal and surgical: one private function change covers all affected code paths without requiring changes to the Shape struct, its serialization, or any other module. -- The NULL guard in the CASE expression is present and correct, consistent with the plan's analysis that `concat(NULL, '') = ''` would silently drop NULLs if the NULL branch were omitted. -- The comment in the implementation is accurate and explains the `concat()` rationale clearly. -- The commit message is well-written: it references the issue number and explains the "why". -- The new blank line between `pg_cast_column_to_text/1` and `pg_escape_string_for_json/1` is missing — the closing `end` and the next `defp` are on adjacent lines. This is a minor formatting nit but inconsistent with the rest of the file. - ---- - -## Issues - -### Critical - -None. - ---- - -### Important - -**1. NULL handling is redundant and creates a subtle double-evaluation risk in `pg_namespace_value_sql`** - -`pg_cast_column_to_text/1` now returns an expression that evaluates `"col"` twice — once in the NULL check and once in the chosen branch: - -```sql -CASE WHEN "col" IS NULL THEN NULL::text - WHEN pg_typeof("col") = 'character'::regtype THEN concat("col", '') - ELSE "col"::text END -``` - -This is then passed into `pg_namespace_value_sql/1`, which wraps it in another `CASE ... IS NULL` check: - -```sql -CASE WHEN IS NULL - THEN 'NULL' - ELSE 'v:' || END -``` - -Because `` is not a column reference but a multi-branch CASE expression, PostgreSQL must evaluate it twice for the non-NULL branch (once for the IS NULL check, once for the concatenation). This is not a correctness problem — PostgreSQL's optimizer cannot in general prove the two evaluations are identical so it will run the inner CASE twice per row per tag column. For the value path through `escape_column_value` the wrapping happens inside `pg_coalesce_json_string` after `to_json()`, so the same double-evaluation applies there too. - -This is a pre-existing structural issue in the code (the old `"col"::text` also got evaluated twice by the outer coalesce/namespace wrappers), but the new CASE expression is materially larger, making it worth noting. It is not a correctness problem and the performance impact is low for typical workloads, but if this is on a hot path (large initial snapshots) it is worth documenting. - -The NULL-producing branch in `pg_cast_column_to_text` itself (`WHEN "col" IS NULL THEN NULL::text`) is also unnecessary: if you remove it entirely, the ELSE branch (`"col"::text`) already returns NULL when `"col"` is NULL because `NULL::text` is NULL. The NULL guard was added because the plan noted `concat(NULL, '') = ''`, but that scenario is already excluded by the second WHEN clause: if the column IS NULL, `pg_typeof("col") = 'character'::regtype` is still true for a bpchar column (pg_typeof of a NULL bpchar is still `character`), meaning the original NULL guard is actually load-bearing and cannot be removed without reintroducing the bug. - -Conclusion: the NULL branch is correct and required. The double-evaluation of the CASE expression is a minor inefficiency worth a comment but not a blocker. - -**2. `char(1)` / bare `char` (no length) — not tested, but correctly handled** - -PostgreSQL `char` (without length) is `char(1)`, and its internal type is still `bpchar`. The `pg_typeof()` comparison `= 'character'::regtype` matches all bpchar types regardless of length, including `char(1)` and `char`. The fix is correct for these variants but no test covers them. A test with a `CHAR(1)` column and a bare `CHAR` column would give higher confidence. - -**3. Arrays of `char(n)` — not handled, not tested** - -A column declared as `char(n)[]` (bpchar array) has `pg_typeof()` returning `character[]` (or `bpchar[]`), not `character`. The current CASE expression checks for `= 'character'::regtype` which will not match an array type, so an array of bpchar will fall through to the `::text` cast path, which trims padding from each element. - -The replication path for array types sends element values with padding preserved (WAL sends the raw storage bytes), so this is the same category of inconsistency as the scalar bug, just applied to array elements. - -Whether arrays of `char(n)` are a supported use-case in practice is unclear from this codebase, but the plan does not mention arrays of char(n) at all. This gap should be acknowledged, either with a test confirming the current (broken) behavior and a follow-up issue, or with a fix. - ---- - -### Suggestions - -**4. Test: only `stream_initial_data` is covered** - -The plan called out testing `query_move_in` and `query_subset` as well. Neither has a char(n)-specific test. `query_move_in` is exercised by the existing subquery tests (which use integer columns), but a char(n) PK used as a subquery join column would be a useful regression test given the `join_primary_keys` path is also fixed by the same function. - -**5. Test: NULL value in a `char(n)` column** - -The new test only inserts a non-NULL value. A row with a NULL `char(n)` column should be tested to confirm the NULL guard produces a JSON `null` rather than an empty string. The existing "works with null values" test (line 175) uses `TEXT`, not `char(n)`. - -**6. Test: `char(n)` column as a PK in a table without an explicit PK constraint** - -The new test uses `CHAR(8) PRIMARY KEY`. The `join_primary_keys` path is also used for pk-less tables where all columns are treated as PKs. A table with no primary key but a `char(n)` column is not tested. - -**7. Missing blank line in the implementation** - -The closing `end` of `pg_cast_column_to_text/1` is immediately followed by `defp pg_escape_string_for_json(str)` with no blank line separator. All other adjacent `defp` pairs in this file have a blank line between them. - ---- - -## Plan Alignment - -All planned changes are implemented. There are no deviations from the plan's fix strategy. The plan explicitly noted that arrays of char(n) and `char(1)` were not discussed; the implementation is consistent with the plan's documented scope. - -The plan's Step 3 ("Run tests") cannot be assessed from static review alone. Given the structural change adds a CASE expression to every column cast, it would be worth confirming the existing test suite passes in full (not just the new test). - ---- - -## Required Actions Before Merge - -1. Add a test for a NULL value in a `char(n)` column to confirm no regression on the NULL guard. -2. Add a test for `char(1)` / bare `CHAR` to confirm the `pg_typeof` comparison matches these variants. -3. Document or file a follow-up issue for `char(n)[]` array columns — the current fix does not cover them and the behavior is inconsistent with the replication path in the same way the original bug was. -4. Fix the missing blank line after the `end` of `pg_cast_column_to_text/1` (formatting). diff --git a/.agent-tasks/2026-03-23--13--char-padding-bug/task.md b/.agent-tasks/2026-03-23--13--char-padding-bug/task.md deleted file mode 100644 index 4fbc3a3deb..0000000000 --- a/.agent-tasks/2026-03-23--13--char-padding-bug/task.md +++ /dev/null @@ -1,16 +0,0 @@ -# Task: Different value for the same DB column returned from subset query and PG replication - -## Source -- Downstream: electric-sql/alco-agent-tasks#13 -- Upstream: electric-sql/electric#4039 - -## Problem -When a table has a `CHAR(n)` primary key, values coming through PG replication are correctly space-padded (e.g., "b " for CHAR(8)), but snapshot and subset queries return trimmed values (e.g., "b"). This creates inconsistency between change messages and full/subset responses. - -Additionally, boolean columns are represented as "t"/"f" in replication but "true"/"false" in queries. - -## Requirements -1. Write a unit test that reproduces the `char(n)` padding inconsistency -2. Analyze why the subset query returns trimmed values -3. Identify and fix the underlying bug -4. Ensure consistent representation across all response types