Conversation
58f942a to
94665c4
Compare
d87abd1 to
86d1bd8
Compare
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
starr-openai
approved these changes
Apr 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The larger Rust test targets are expensive as single Bazel test actions. Native Bazel sharding helps split execution, but it still reports through the same test target label. That is not enough for the Codex workflow: we want BuildBuddy to show timing and flakiness per shard label, and we want a failed or flaky shard to be rerunnable as a concrete Bazel target instead of treating the whole aggregate test as one opaque target.
This PR therefore moves the selected large Rust tests to explicit generated shard targets. It also incorporates the rules_rust behavior from hermeticbuild/rules_rust#14 so each test name maps to a stable shard bucket by hash, rather than by list position.
What
codex_rust_cratewithtest_shard_counts.test_suiteaggregate and generates one concrete test rule per shard.workspace_root_testwrapper shape around one*-unit-tests-binrust_test.*-all-test-binrust_testand makes each shard label a lightweighttest_binary_testwrapper around that binary. This preserves distinct labels without compiling the Rust test crate once per shard.RULES_RUST_TEST_TOTAL_SHARDS/RULES_RUST_TEST_SHARD_INDEXon each generated shard target so the rules_rust wrapper can run the correct subset without using Bazel-reservedTEST_*env vars.patches/rules_rust_stable_explicit_test_shards.patch, which mirrors rust_test: shard by stable name hash hermeticbuild/rules_rust#14 until Codex can bump to a mergedrules_rustcommit containing that support.UInt64constants in the Windows PowerShell FNV hash expression so the 32-bit mask cannot be interpreted as-1.TEST_TMPDIRplus a per-wrapper temp directory in the Windows sharding wrapper so parallel shards do not collide on shared%TEMP%\rust_test_list_*.txtfiles.//codex-rs/core:core-all-test//codex-rs/core:core-unit-tests//codex-rs/app-server:app-server-all-test//codex-rs/app-server:app-server-unit-tests//codex-rs/tui:tui-unit-testsExample Labels
The aggregate label remains available:
but it now expands to explicit shard labels:
For integration tests, those shard labels point at one compiled test binary:
That means BuildBuddy sees labels such as
//codex-rs/core:core-all-test-shard-8-of-8, but Bazel still has only one underlyingrust_testrule for the integration test binary://codex-rs/core:core-all-test-bin.Unit tests use the same explicit shard label pattern while still running through the workspace-root launcher:
The label is one-indexed for readability (
shard-1-of-8), while the env value is the zero-indexed shard index consumed by the wrapper.Verification
just bazel-lock-checkbazel query 'kind("rust_test rule", //codex-rs/core:*)'bazel query 'kind(".* rule", //codex-rs/core:core-all-test-shard-8-of-8 + //codex-rs/core:core-all-test-bin + //codex-rs/app-server:app-server-all-test-shard-8-of-8 + //codex-rs/tui:tui-unit-tests-bin + //codex-rs/tui:tui-unit-tests-shard-1-of-8)' --output=buildbazel test --test_output=errors //codex-rs/core:core-all-test-shard-1-of-8 //codex-rs/core:core-all-test-shard-8-of-8 //codex-rs/tui:tui-unit-tests-shard-1-of-8bazel test --test_output=errors //codex-rs/core:core-all-test-shard-7-of-8bazel test --test_output=errors //codex-rs/app-server:app-server-unit-tests-shard-3-of-8 //codex-rs/core:core-all-test-shard-7-of-8