Skip to content

fix(c/driver/postgresql): use array_view->offset when writing list rows#4320

Open
mediprtl wants to merge 3 commits into
apache:mainfrom
mediprtl:fix/postgres-copy-sliced-list
Open

fix(c/driver/postgresql): use array_view->offset when writing list rows#4320
mediprtl wants to merge 3 commits into
apache:mainfrom
mediprtl:fix/postgres-copy-sliced-list

Conversation

@mediprtl
Copy link
Copy Markdown

@mediprtl mediprtl commented May 18, 2026

Summary

PostgresCopyListFieldWriter::Write computed each row's child range from the logical row index without adding array_view_->offset. When the parent List / LargeList / FixedSizeList array has offset > 0, the variable-length branch read the wrong slot of the offsets buffer (ArrowArrayViewListChildOffset does not honor offset, unlike ArrowArrayViewGetIntUnsafe), and the fixed-size branch multiplied the wrong base index by the element size. The child ranges still indexed into the still-full child values buffer, so list elements ended up attached to the wrong rows.

Fixes #4319.

`PostgresCopyListFieldWriter::Write` computed each row's child range from
the *logical* row index without adding `array_view_->offset`. When the
parent List / LargeList / FixedSizeList array has `offset > 0` (a sliced
parent), the variable-length branch read the wrong slot of the offsets
buffer (`ArrowArrayViewListChildOffset` does not honor `offset`, unlike
`ArrowArrayViewGetIntUnsafe`), and the fixed-size branch multiplied the
wrong base index by the element size. The child ranges still indexed
into the still-full child values buffer, so list elements ended up
attached to the wrong rows.

Practical impact: silent, per-row drift of list-column values when an
Arrow table is sliced into multiple batches and ingested via
`adbc_ingest` with `mode="create"` then `mode="append"`. The first chunk
(`offset=0`) is always correct; every subsequent chunk's list column is
shifted by the chunk's `parent.offset`. Scalar columns are unaffected
because their writers route through `ArrowArrayViewGetInt*`, which
honors `offset`.

Add a regression test for List, LargeList, and FixedSizeList that
writes rows 3..5 of a 6-row source via offset/length and asserts the
output body equals what the writer produces for those same rows passed
as a fresh 3-row array. Both branches fail this check on `main` and
pass with the one-edit fix.

Fixes apache#4319.
…rams

The parameterized prepared-statement path (`BindAndExecuteCurrentRow`)
builds its field writers via the same `MakeCopyFieldWriter` factory as
the COPY ingest path, so the same offset bug drifts list-typed bound
parameters on INSERT/UPDATE/DELETE/upsert when the bound array has
`offset > 0`.

Add a `PostgresStatementTest.BindUpsertWithSlicedListParameter` that
binds a sliced 6→3-row Arrow array as the `text[]` parameter of an
`INSERT ... ON CONFLICT DO UPDATE` upsert and asserts rows 3..5 receive
their own `array_length(tags, 1)` rather than rows 0..2's. The test
fails on the unpatched driver at the first `len` assertion (id=3 sees
length 2 — drift to row 0 — instead of 1) and passes with the writer.h
fix from this branch.
@mediprtl
Copy link
Copy Markdown
Author

Added a bind-path regression test as a follow-up commit (b5a8d88). Context (cross-posted from #4319): PostgresCopyListFieldWriter is reused by BindStream via the same MakeCopyFieldWriter factory (c/driver/postgresql/bind_stream.h:166), so the offset bug also drifts list-typed parameters on INSERT / UPDATE / DELETE / upsert when the bound Arrow array is sliced.

The new PostgresStatementTest.BindUpsertWithSlicedListParameter exercises the bind path end-to-end: binds rows 3..5 of a 6-row Arrow batch (with parent.offset=3 on the struct, id and tags children) as the text[] parameter of an INSERT ... ON CONFLICT DO UPDATE upsert, then asserts array_length(tags, 1) per id matches the slice (1/2/1), not the underlying-rows-0..2 lengths (2/1/2). Fails on the unpatched driver at the first len assertion (id=3 reports length 2 — drift to row 0); passes with the writer.h fix from this branch.

CI builds the C driver with -DADBC_BUILD_WARNING_LEVEL=CHECKIN (which
turns on -Werror). My two regression tests in postgres_copy_writer_test.cc
compared `size_t i` against `ArrowBuffer::size_bytes` (int64_t), tripping
-Werror=sign-compare on the C/C++ Conda, Documentation, vcpkg/windows,
and PostgreSQL Integration jobs. Switch the loop counters to int64_t to
match the field type, mirroring how the surrounding list-writer tests
avoid the warning.

Separately, the bind-path test in postgresql_test.cc and the new tests
in postgres_copy_writer_test.cc were hand-formatted and did not match
the project's clang-format (v18.1.7) config, failing the pre-commit job.
Re-formatted in place; no logic changes.

Verified locally:
  cmake ../c -DADBC_DRIVER_POSTGRESQL=ON -DADBC_BUILD_TESTS=ON \
             -DADBC_BUILD_WARNING_LEVEL=CHECKIN -DCMAKE_BUILD_TYPE=Release
  -- builds adbc-driver-postgresql-copy-test and adbc-driver-postgresql-test
     clean with no warnings.
  -- all three SlicedMatchesDirect tests pass; BindUpsertWithSlicedListParameter
     passes.
  -- pre-commit run --files <changed> green after one format pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

c/driver/postgresql: adbc_ingest silently misaligns list/large_list/fixed_size_list rows when the source Arrow array is sliced (parent.offset > 0)

1 participant