fix(c/driver/postgresql): use array_view->offset when writing list rows#4320
fix(c/driver/postgresql): use array_view->offset when writing list rows#4320mediprtl wants to merge 3 commits into
Conversation
`PostgresCopyListFieldWriter::Write` computed each row's child range from the *logical* row index without adding `array_view_->offset`. When the parent List / LargeList / FixedSizeList array has `offset > 0` (a sliced parent), the variable-length branch read the wrong slot of the offsets buffer (`ArrowArrayViewListChildOffset` does not honor `offset`, unlike `ArrowArrayViewGetIntUnsafe`), and the fixed-size branch multiplied the wrong base index by the element size. The child ranges still indexed into the still-full child values buffer, so list elements ended up attached to the wrong rows. Practical impact: silent, per-row drift of list-column values when an Arrow table is sliced into multiple batches and ingested via `adbc_ingest` with `mode="create"` then `mode="append"`. The first chunk (`offset=0`) is always correct; every subsequent chunk's list column is shifted by the chunk's `parent.offset`. Scalar columns are unaffected because their writers route through `ArrowArrayViewGetInt*`, which honors `offset`. Add a regression test for List, LargeList, and FixedSizeList that writes rows 3..5 of a 6-row source via offset/length and asserts the output body equals what the writer produces for those same rows passed as a fresh 3-row array. Both branches fail this check on `main` and pass with the one-edit fix. Fixes apache#4319.
…rams The parameterized prepared-statement path (`BindAndExecuteCurrentRow`) builds its field writers via the same `MakeCopyFieldWriter` factory as the COPY ingest path, so the same offset bug drifts list-typed bound parameters on INSERT/UPDATE/DELETE/upsert when the bound array has `offset > 0`. Add a `PostgresStatementTest.BindUpsertWithSlicedListParameter` that binds a sliced 6→3-row Arrow array as the `text[]` parameter of an `INSERT ... ON CONFLICT DO UPDATE` upsert and asserts rows 3..5 receive their own `array_length(tags, 1)` rather than rows 0..2's. The test fails on the unpatched driver at the first `len` assertion (id=3 sees length 2 — drift to row 0 — instead of 1) and passes with the writer.h fix from this branch.
|
Added a bind-path regression test as a follow-up commit (b5a8d88). Context (cross-posted from #4319): The new |
CI builds the C driver with -DADBC_BUILD_WARNING_LEVEL=CHECKIN (which
turns on -Werror). My two regression tests in postgres_copy_writer_test.cc
compared `size_t i` against `ArrowBuffer::size_bytes` (int64_t), tripping
-Werror=sign-compare on the C/C++ Conda, Documentation, vcpkg/windows,
and PostgreSQL Integration jobs. Switch the loop counters to int64_t to
match the field type, mirroring how the surrounding list-writer tests
avoid the warning.
Separately, the bind-path test in postgresql_test.cc and the new tests
in postgres_copy_writer_test.cc were hand-formatted and did not match
the project's clang-format (v18.1.7) config, failing the pre-commit job.
Re-formatted in place; no logic changes.
Verified locally:
cmake ../c -DADBC_DRIVER_POSTGRESQL=ON -DADBC_BUILD_TESTS=ON \
-DADBC_BUILD_WARNING_LEVEL=CHECKIN -DCMAKE_BUILD_TYPE=Release
-- builds adbc-driver-postgresql-copy-test and adbc-driver-postgresql-test
clean with no warnings.
-- all three SlicedMatchesDirect tests pass; BindUpsertWithSlicedListParameter
passes.
-- pre-commit run --files <changed> green after one format pass.
Summary
PostgresCopyListFieldWriter::Writecomputed each row's child range from the logical row index without addingarray_view_->offset. When the parentList/LargeList/FixedSizeListarray hasoffset > 0, the variable-length branch read the wrong slot of the offsets buffer (ArrowArrayViewListChildOffsetdoes not honoroffset, unlikeArrowArrayViewGetIntUnsafe), and the fixed-size branch multiplied the wrong base index by the element size. The child ranges still indexed into the still-full child values buffer, so list elements ended up attached to the wrong rows.Fixes #4319.