Skip to content

Zero trimmed ranges under verifier flags to make Trim() observable#3

Closed
llersch wants to merge 1 commit into
mainfrom
llersch_trim_coverage
Closed

Zero trimmed ranges under verifier flags to make Trim() observable#3
llersch wants to merge 1 commit into
mainfrom
llersch_trim_coverage

Conversation

@llersch
Copy link
Copy Markdown
Owner

@llersch llersch commented Mar 4, 2026

On Linux with ext4/xfs, fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) causes reads from the punched range to return zeros, so Trim() is already observable there. On macOS, Windows, tmpfs, NFS, and filesystems that do not support PUNCH_HOLE, Trim() returns false and does nothing — meaning the trim path is effectively untested in most CI environments.

Under DUCKDB_RUN_SLOW_VERIFIERS or DUCKDB_ALTERNATIVE_VERIFY, explicitly zero the trimmed range with Write() before attempting fallocate. This makes Trim() observable everywhere: any code that reads a trimmed block will see zeros regardless of the underlying filesystem or OS.

The two flags are triggered independently and serve different purposes:

  • DUCKDB_RUN_SLOW_VERIFIERS enables additive extra verification on top of the production code path. Zeroing fits here because it turns a silent no-op (on non-Linux or unsupporting filesystems) into a detectable operation without changing what the production path does.

  • DUCKDB_ALTERNATIVE_VERIFY selects alternative implementations of the same operation. Zeroing is precisely an alternative implementation of trim: it achieves the same observable effect as PUNCH_HOLE (reads from trimmed blocks return zeros) through explicit writes rather than a kernel hole-punching syscall.

The two blocks are independent: on Linux with glibc >= 2.18, both the zeroing and the fallocate execute, exercising the production code path while still making the result observable.

On Linux with ext4/xfs, fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
causes reads from the punched range to return zeros, so Trim() is already
observable there. On macOS, Windows, tmpfs, NFS, and filesystems that do not
support PUNCH_HOLE, Trim() returns false and does nothing — meaning the
trim path is effectively untested in most CI environments.

Under DUCKDB_RUN_SLOW_VERIFIERS or DUCKDB_ALTERNATIVE_VERIFY, explicitly
zero the trimmed range with Write() before attempting fallocate. This makes
Trim() observable everywhere: any code that reads a trimmed block will see
zeros regardless of the underlying filesystem or OS.

The two flags are triggered independently and serve different purposes:

- DUCKDB_RUN_SLOW_VERIFIERS enables additive extra verification on top of
  the production code path. Zeroing fits here because it turns a silent
  no-op (on non-Linux or unsupporting filesystems) into a detectable
  operation without changing what the production path does.

- DUCKDB_ALTERNATIVE_VERIFY selects alternative implementations of the
  same operation. Zeroing is precisely an alternative implementation of
  trim: it achieves the same observable effect as PUNCH_HOLE (reads from
  trimmed blocks return zeros) through explicit writes rather than a
  kernel hole-punching syscall.

The two blocks are independent: on Linux with glibc >= 2.18, both the
zeroing and the fallocate execute, exercising the production code path
while still making the result observable.
@llersch llersch force-pushed the llersch_trim_coverage branch from fa147d5 to 69ac9d9 Compare March 5, 2026 01:49
@llersch llersch closed this Mar 10, 2026
@llersch llersch deleted the llersch_trim_coverage branch March 10, 2026 15:51
llersch pushed a commit that referenced this pull request Apr 17, 2026
Closes duckdb/ducklake#977

When I was using DuckLake I met certain issue
```sh
memory D ATTACH ':memory:' AS dl (TYPE ducklake, DATA_INLINING_ROW_LIMIT 1000);
memory D CREATE TABLE dl.test(id INTEGER, val TEXT);
memory D INSERT INTO dl.test VALUES (1, 'a'), (2, 'b'), (3, 'c');
memory D BEGIN;
memory D ALTER TABLE dl.test ALTER COLUMN id TYPE BIGINT;
memory D UPDATE dl.test SET val = 'X' WHERE id = 1;
=================================================================
==83726==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160001cffb8 at pc 0x00010bf6c580 bp 0x00016f1a3cf0 sp 0x00016f1a3ce8
READ of size 1 at 0x6160001cffb8 thread T5
    #0 0x00010bf6c57c in duckdb::LogicalType::operator==(duckdb::LogicalType const&) const types.cpp:2071
    #1 0x000106c89ab8 in duckdb::StatisticsPropagator::CanPropagateCast(duckdb::LogicalType const&, duckdb::LogicalType const&) propagate_cast.cpp:31
    #2 0x00010a635dcc in duckdb::TryCastTableFilter(duckdb::TableFilter const&, duckdb::MultiFileIndexMapping&, duckdb::LogicalType const&) multi_file_column_mapper.cpp:1014
    #3 0x00010a632458 in duckdb::MultiFileColumnMapper::CreateFilters(std::__1::map<unsigned long long, std::__1::reference_wrapper<duckdb::TableFilter>, std::__1::less<unsigned long long>, std::__1::allocator<std::__1::pair<unsigned long long const, std::__1::reference_wrapper<duckdb::TableFilter>>>>&, duckdb::ResultColumnMapping&) multi_file_column_mapper.cpp:1104
```

I don't have too much knowledge on `MultiFileColumnMapper`, but
considering (1) the invalid memory access is 1 byte; (2) stacktrace code
location more or less indicates its logical type reference stored within
`MultiFileColumnMapper` gets invalid.
Turning it into value fixes the ASAN complaint.
llersch pushed a commit that referenced this pull request Apr 30, 2026
### Remove copied extensions from release/relassert artifact

The tarball for `relassert` contains duplicate extensions:
```
relassert artifact includes:
  duckdb
  extension/core_functions/core_functions.duckdb_extension
  extension/parquet/parquet.duckdb_extension
  repository/36a968b1bf/linux_amd64/core_functions.duckdb_extension
  repository/36a968b1bf/linux_amd64/parquet.duckdb_extension
  src/libduckdb.so
  test/extension/loadable_extension_demo.duckdb_extension
  test/extension/loadable_extension_optimizer_demo.duckdb_extension
  test/unittest
relassert artifact size:
470M build/relassert-artifact/relassert/extension/parquet
530M build/relassert-artifact/relassert/extension/core_functions
592M build/relassert-artifact/relassert/src
870M build/relassert-artifact/relassert/test/extension
1000M build/relassert-artifact/relassert/extension
1000M build/relassert-artifact/relassert/repository
1000M build/relassert-artifact/relassert/repository/36a968b1bf
1000M build/relassert-artifact/relassert/repository/36a968b1bf/linux_amd64
1.5G build/relassert-artifact/relassert/test
4.6G build/relassert-artifact/relassert
```
because these two paths:
```
extension/*/*.duckdb_extension
repository/36a968b1bf/linux_amd64/*.duckdb_extension
```
appears to contain binary file copies.

### Show symbols for stacktrace in `linux-debug`

Before, on `linux-debug`, we would
[see](https://github.com/duckdb/duckdb/actions/runs/24786615493/job/72534047134)
a traceback with no symbols:
```
build/relassert/test/unittest(+0x4447240) [0x560387001240]
build/relassert/test/unittest(+0x7b63367) [0x56038a71d367]
build/relassert/test/unittest(+0x7b66aec) [0x56038a720aec]
build/relassert/test/unittest(+0x7b7447c) [0x56038a72e47c]
build/relassert/test/unittest(+0x7b79e61) [0x56038a733e61]
build/relassert/test/unittest(+0x7b722c3) [0x56038a72c2c3]
build/relassert/test/unittest(+0x7b75a79) [0x56038a72fa79]
build/relassert/test/unittest(+0x7b7c248) [0x56038a736248]
build/relassert/test/unittest(+0x7b9b018) [0x56038a755018]
build/relassert/test/unittest(+0x301e5d4) [0x560385bd85d4]
build/relassert/test/unittest(+0x3ba5b9f) [0x56038675fb9f]
build/relassert/test/unittest(+0x3b9e56e) [0x56038675856e]
build/relassert/test/unittest(+0x3b9b716) [0x560386755716]
build/relassert/test/unittest(+0x3bafee7) [0x560386769ee7]
build/relassert/test/unittest(+0x3bad196) [0x560386767196]
build/relassert/test/unittest(+0x3bf2c57) [0x5603867acc57]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x7fc4a01b61ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x7fc4a01b628b]
build/relassert/test/unittest(+0x2ee2065) [0x560385a9c065]
```

After, we get a stacktrace with symbols.

Use `-rdynamic` as linker flag to fix unused CLI flag warning.

### `-rdynamic` caused ODR violations

With `EXPORT_DYNAMIC_SYMBOLS: 1` for `linux-debug`, address sanitizer
[finds](https://github.com/duckdb/duckdb/actions/runs/24837199395/job/72703372246?pr=22246#step:9:22)
One Definition Rule violations:

```
==437==ERROR: AddressSanitizer: odr-violation (0x7f85b6a36b60):
  [1] size=256 'LOOKUP_TABLE' /home/runner/work/duckdb/duckdb/src/function/cast/nested_to_varchar_cast.cpp:5:12 in /home/runner/work/duckdb/duckdb/build/relassert/src/libduckdb.so
  [2] size=256 'LOOKUP_TABLE' /home/runner/work/duckdb/duckdb/src/function/cast/nested_to_varchar_cast.cpp:5:12 in /home/runner/work/duckdb/duckdb/build/relassert/test/unittest
These globals were registered at these points:
  [1]:
    #0 0x7f85d158906f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
    #1 0x7f85d1c2071e in call_init elf/dl-init.c:74
    #2 0x7f85d1c20823 in call_init elf/dl-init.c:120
    #3 0x7f85d1c20823 in _dl_init elf/dl-init.c:121
    duckdb#4 0x7f85d1c1c5b1 in __GI__dl_catch_exception elf/dl-catch.c:211
    duckdb#5 0x7f85d1c27d7b in dl_open_worker elf/dl-open.c:829
    duckdb#6 0x7f85d1c27d7b in dl_open_worker elf/dl-open.c:792
    duckdb#7 0x7f85d1c1c51b in __GI__dl_catch_exception elf/dl-catch.c:237
    duckdb#8 0x7f85d1c28163 in _dl_open elf/dl-open.c:905
    duckdb#9 0x7f85d0a0f1a3 in dlopen_doit dlfcn/dlopen.c:56
    duckdb#10 0x7f85d1c1c51b in __GI__dl_catch_exception elf/dl-catch.c:237
    duckdb#11 0x7f85d1c1c668 in _dl_catch_error elf/dl-catch.c:256
    duckdb#12 0x7f85d0a0ec82 in _dlerror_run dlfcn/dlerror.c:138
    duckdb#13 0x7f85d0a0f25e in dlopen_implementation dlfcn/dlopen.c:71
    duckdb#14 0x7f85d0a0f25e in ___dlopen dlfcn/dlopen.c:81
    duckdb#15 0x7f85d15b2f08 in dlopen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:6341
    duckdb#16 0x55ad3185a30f in AdbcLoadDriver (/home/runner/work/duckdb/duckdb/build/relassert/test/unittest+0xcf3630f) (BuildId: fbeec3ab975633c9b9b6c6233c1545043737a86a)

  [2]:
    #0 0x7f85d158906f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
    #1 0x7f85d09a1303 in call_init ../csu/libc-start.c:145
    #2 0x7f85d09a1303 in __libc_start_main_impl ../csu/libc-start.c:347
    #3 0x55ad31808ea4 in _start (/home/runner/work/duckdb/duckdb/build/relassert/test/unittest+0xcee4ea4) (BuildId: fbeec3ab975633c9b9b6c6233c1545043737a86a)

```

The errors are caused by duplicated symbols. The symbols are present in
both the unittest binary and the loaded shared object `libduckdb.so`.

We can solve the errors by separating unittest object files and linking
the `unittest` binary to `libduckdb.so`.

### Differences in compressed (gzip -4) artifacts

build type | before | after | reduction
-- | -- | -- | --
linux-relassert-build | 1250 MB | 873 MB | 377 MB (30.2%)
linux-release-build | 258 MB | 170 MB | 88 MB (34.1%)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant