I have some CI changes I'd like to propose. They go a bit deeper than the recent caching work, so I wanted to open an issue first to align on direction before pursuing any of them. The pieces below are independent; taking forward only some of them works fine.
Resolving Arrow from conda-forge
Eleven jobs build the vendored Arrow and Parquet from source, but the toolchain already prefers a prebuilt one (the FetchContent declaration uses FIND_PACKAGE_ARGS); CI just never installs one. Installing libarrow and libparquet from conda-forge with setup-miniconda (ASF allowlisted) takes that compile out of each leg:
| Test leg | Build targets | Cold build step |
| --- | --- | --- |
| Ubuntu | 951 to 606 | 21 to 16 min |
| macOS | 946 to 602 | 11 to 8 min |
| Windows | 900 to 578 | 47 to 35 min |
(fork measurements; times vary with runner load, target counts don't; the AWS, SQL catalog, sanitizer, and linter legs shrink similarly)
The vendored Arrow build is also the source of most of our cache pressure: one push to main saves around 10 GB of sccache entries, more than the repository's 10 GB limit by itself, so eviction ends up deleting entries that have no newer replacement. Over the past week, about every third push to main rebuilt at least one leg from scratch. With prebuilt Arrow, and without debug info that nothing in CI reads, saves drop under 3 GB.
Coverage holds: the AWS leg keeps building the bundled AWS SDK from source, the Meson legs don't use Arrow, and the sanitizer leg passes against the non-instrumented Arrow. The conda pin would track the version in the toolchain file and bump in the same PR.
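For illustration, the conda step could look roughly like the fragment below. This is a sketch under assumptions, not the change from the fork: the environment name, the `ARROW_VERSION` variable, and passing the conda prefix through `CMAKE_PREFIX_PATH` are all hypothetical, and the version is left as a placeholder because it would come from the toolchain file.

```yaml
# Sketch only: names and the version placeholder are assumptions.
steps:
  - uses: conda-incubator/setup-miniconda@v3
    with:
      miniforge-version: latest
      channels: conda-forge
      activate-environment: arrow-ci   # hypothetical environment name
  - name: Install prebuilt Arrow and Parquet
    shell: bash -el {0}   # login shell, so the conda environment is active
    env:
      ARROW_VERSION: "<version pinned in the toolchain file>"   # placeholder
    run: |
      conda install --yes "libarrow=${ARROW_VERSION}" "libparquet=${ARROW_VERSION}"
      # Expose the conda prefix so find_package() can locate Arrow
      # before FetchContent falls back to the source build.
      echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX}" >> "$GITHUB_ENV"
```

On Windows, conda-forge packages install under the environment's `Library` subdirectory, so the prefix path would likely need adjusting on that leg.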
Building one library flavor per leg
CI builds with ICEBERG_BUILD_STATIC and ICEBERG_BUILD_SHARED both ON, and the two targets compile the same sources twice (the shared build adds the export define and hidden visibility, so objects can't be reused). The tests link one flavor, so building one roughly halves what a leg compiles of our own code. A static-only fork run passes the full build and test matrix; one leg could keep both ON to keep both exercised.
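One way to express this in a workflow is to drive the two options from the matrix. The fragment below is a sketch: the job name, the matrix keys `static` and `shared`, and the choice of which leg keeps both flavors are assumptions made for illustration; only the two CMake option names come from the build.

```yaml
# Sketch only: job name, matrix keys, and the leg assignment are hypothetical.
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            static: "ON"
            shared: "ON"    # one leg keeps both flavors exercised
          - os: macos-latest
            static: "ON"
            shared: "OFF"   # single flavor, compiles our sources once
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        run: >-
          cmake -S . -B build
          -DICEBERG_BUILD_STATIC=${{ matrix.static }}
          -DICEBERG_BUILD_SHARED=${{ matrix.shared }}
```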
Windows: build Debug like the other legs
The Unix test legs build Debug; Windows builds Release and has been the slowest leg fairly consistently. MSVC Debug needs embedded debug info (/Z7 via CMP0141) for sccache to cache the objects, and that combination is green on a fork. Windows is currently the only Release build in CI, though, so this is partly a question of what the matrix should cover.
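As a sketch of the configuration involved, a Windows configure step could request embedded debug info through CMake's standard variables. The step name, the Ninja generator, and the use of compiler launcher variables are assumptions here; the fork may set these differently (for example in the project's CMake files rather than on the command line).

```yaml
# Sketch only: assumes a single-config generator (Ninja) with the MSVC
# environment already set up, and sccache available on PATH.
- name: Configure (Windows, Debug)
  shell: pwsh
  run: >-
    cmake -S . -B build -G Ninja
    -DCMAKE_BUILD_TYPE=Debug
    -DCMAKE_POLICY_DEFAULT_CMP0141=NEW
    -DCMAKE_MSVC_DEBUG_INFORMATION_FORMAT=Embedded
    -DCMAKE_C_COMPILER_LAUNCHER=sccache
    -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
```

With CMP0141 set to NEW, `CMAKE_MSVC_DEBUG_INFORMATION_FORMAT=Embedded` selects /Z7, which keeps debug info in the object files instead of a shared PDB.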
Two smaller cleanups
The test workflows trigger on both push (all branches) and pull_request, and the concurrency groups key on different refs per event, so a branch pushed to this repository with an open PR runs everything twice. Scoping push to main, as cpp-linter.yml already does, drops the duplicates and keeps the post-merge runs that seed the caches. Separately, the sccache steps are copy-pasted across nine jobs in five workflow files; a composite action under .github/actions/ would hold them (and the conda setup) in one place.
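The push scoping amounts to a trigger block along these lines in each test workflow (a minimal sketch; any existing path filters or other trigger options would stay as they are):

```yaml
on:
  push:
    branches:
      - main        # post-merge runs still seed the caches
  pull_request:     # PR branches run once, via this event only
```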
Questions
- Is conda-forge acceptable as a source of prebuilt Arrow in CI, and how would you want the version pin maintained?
- Single flavor on the test legs: which one, and is it enough to keep one leg building both static and shared?
- Windows on Debug: fine, or should the matrix keep a Release leg?
- Any concerns with the composite action or the push trigger scoping?