Add support for multiple module interfaces per `cc_library` #27492

fmeum · 2025-11-01T12:58:17Z

When multiple module_interfaces are specified on a single cc_library, the individual compilation actions form a DAG based on imports between these modules. Consider the following situation:

* a.cppm imports b.cppm, both of which are in the module_interfaces of a single cc_library.
* Building the target populates the action cache with an entry for a.pcm that stores b.pcm as a discovered input.
* Now edit a.cppm and b.cppm so that b.cppm imports a.cppm and a.cppm no longer imports b.cppm.
* Build again (optionally after a shutdown).

Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from a.pcm to b.pcm. Together with the newly discovered edge from b.pcm to a.pcm, this resulted in a cycle.
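
To make the cycle concrete, here is a minimal Python sketch that models the two builds as dependency graphs. It is a toy model, not Bazel code: `has_cycle`, `merge`, and the dictionaries are hypothetical names chosen for illustration, and only the file names come from the description above.

```python
def has_cycle(graph):
    """Return True if the directed graph {node: set_of_dependencies} has a cycle."""
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for dep in graph.get(node, ()):
            if state.get(dep) == IN_PROGRESS:
                return True  # back edge: dep is still on the DFS stack
            if dep not in state and visit(dep):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in list(graph))


def merge(*graphs):
    """Union the edges of several dependency graphs."""
    merged = {}
    for graph in graphs:
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
    return merged


# First build: a.cppm imports b.cppm, so a.pcm records b.pcm as a discovered input.
stale_from_cache = {"a.pcm": {"b.pcm"}}
# Second build, after the edit: b.cppm imports a.cppm.
newly_discovered = {"b.pcm": {"a.pcm"}}

print(has_cycle(newly_discovered))                           # False
print(has_cycle(merge(stale_from_cache, newly_discovered)))  # True
```

The new import graph alone is acyclic; the cycle only appears once the stale edge from the previous build is added back.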

This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive .pcm files required for compilation, is a mandatory input.
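
The rule in the paragraph above can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the actual Java implementation: `CacheEntry`, `inputs_to_request`, and the digest values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical stand-in for what a previous build remembered about an action."""
    mandatory_inputs_digest: bytes
    discovered_inputs: tuple


def inputs_to_request(mandatory_inputs, current_mandatory_digest, entry):
    """Return the inputs that are safe to request before executing the action."""
    if entry is not None and entry.mandatory_inputs_digest == current_mandatory_digest:
        # Mandatory inputs are unchanged, so the old discovered inputs are still trusted.
        return list(mandatory_inputs) + list(entry.discovered_inputs)
    # Mandatory inputs changed (or there is no entry): ignore stale discovered inputs.
    return list(mandatory_inputs)


mandatory = ["a.cppm", "a.modmap"]
entry = CacheEntry(mandatory_inputs_digest=b"old", discovered_inputs=("b.pcm",))

print(inputs_to_request(mandatory, b"old", entry))  # ['a.cppm', 'a.modmap', 'b.pcm']
print(inputs_to_request(mandatory, b"new", entry))  # ['a.cppm', 'a.modmap']
```

In the second call, the stale dependency on b.pcm is never requested, so it cannot contribute an edge to a cycle.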

As part of this change, MetadataDigestUtils.fromMetadata had to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache.
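
The digest-length invariant can be illustrated with a small Python sketch. The XOR-based combination and SHA-256 are assumptions made for the example (chosen so the result does not depend on iteration order); the function name `digest_of_metadata` is hypothetical and this is not the code of `MetadataDigestUtils.fromMetadata`.

```python
import hashlib

DIGEST_LENGTH = hashlib.sha256().digest_size  # 32 bytes


def digest_of_metadata(metadata):
    """Combine a {path: file_digest} map into one digest of fixed length.

    An empty map yields DIGEST_LENGTH zero bytes rather than an empty byte
    string, so callers can rely on the length unconditionally.
    """
    combined = bytearray(DIGEST_LENGTH)
    for path, file_digest in metadata.items():
        entry = hashlib.sha256(path.encode("utf-8") + b"\0" + file_digest).digest()
        for i, byte in enumerate(entry):
            combined[i] ^= byte
    return bytes(combined)


print(len(digest_of_metadata({})))                  # 32
print(len(digest_of_metadata({"a.cppm": b"abc"})))  # 32
```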

fmeum · 2025-11-01T13:02:39Z

Currently pulls in bazelbuild/rules_cc#511 and bazelbuild/rules_cc#512 via an override.

fmeum · 2025-11-01T13:14:05Z

FYI @PikachuHyA

lberki · 2025-11-02T19:13:46Z

Disclaimer: so far, I have not participated in the implementation of C++20 modules at all, so this all is quite alien to me. You'd probably be better off getting an informed opinion from @pzembrod wrt. the C++ parts. I do feel competent enough to comment on the general mechanism, though.

I assume the "other related change" you mentioned over private chat is #26859 ? Where is the associated to

Ow, this is really thorny. If I understand correctly, it's not enough to persist the module map file (at least a single one) because there is no guarantee that the change that creates the cycle happens at the top level. E.g. if a.cppm imports b.cppm, which imports c.cppm, you'd presumably include a.cppm in the action cache entry. But then what if the change is that c.cppm imports b.cppm? Then your file remains unchanged, so you are still susceptible to the same dependency cycle.

In addition to this, special-casing that one file is weird because then Bazel relies on explicit action by the rule author to stay correct. This isn't unheard of (e.g. persistent workers), but it's bad if even Skyframe required this kind of outside help.

The only possible approach I can see is to request the inputs from the action cache entry in the same order as they were requested by the original action and verify their up-to-dateness also gradually. That way, you avoid cycles without extra hints at the cost of some inefficiency (Skyframe restarts)

But the simplestest approach would be to disallow multiple .cppm files in a single cc_library, and this is where my ignorance about C++20 modules comes in. Would this be feasible?

fmeum · 2025-11-02T19:34:36Z

> I assume the "other related change" you mentioned over private chat is #26859 ? Where is the associated to

It's related in the sense that without that PR, the situation described in the current PR would always result in a cycle, even if there is no action cache entry: Both a.pcm and b.pcm are in the allowed derived inputs for the action producing the other.

> if I understand correctly, it's not enough to persist the module map file (at least a single one) because there is no guarantee that the change that creates the cycle happens at the top level.

That's correct for the module interface file, but the file we digest here is the module map file, which contains the file names of transitive .pcm file dependencies, not just the direct ones. I think that this should be good enough for invalidation since it describes the complete set of discovered inputs.

> The only possible approach I can see is to request the inputs from the action cache entry in the same order as they were requested by the original action and verify their up-to-dateness also gradually. That way, you avoid cycles without extra hints at the cost of some inefficiency (Skyframe restarts)

Yes, this should work. We could first request all mandatory inputs in a batch and then the discovered deps one by one, discarding the cache entry when one of them mismatches. This would require persisting the individual hashes of all discovered files, I think, where today we only persist their paths.
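
A rough Python sketch of this alternative, assuming the cache entry persists a digest per discovered file. All names here (`check_entry`, the digests, the `current` map) are hypothetical; calling `current_digest` stands in for requesting an input from Skyframe.

```python
def check_entry(mandatory, discovered, current_digest):
    """Validate a cache entry without requesting inputs that may be stale.

    mandatory and discovered are lists of (path, stored_digest) pairs, with
    discovered kept in the order the original action requested them.
    Returns (entry_is_valid, paths_requested).
    """
    requested = [path for path, _ in mandatory]  # mandatory inputs go in one batch
    if any(current_digest(path) != stored for path, stored in mandatory):
        return False, requested
    for path, stored in discovered:  # discovered inputs are requested one by one
        requested.append(path)
        if current_digest(path) != stored:
            return False, requested  # discard the entry at the first mismatch
    return True, requested


current = {"a.cppm": b"v2", "a.modmap": b"m1", "b.pcm": b"p1", "c.pcm": b"p2"}.get

# a.cppm changed: the entry is discarded before b.pcm is ever requested.
print(check_entry([("a.cppm", b"v1"), ("a.modmap", b"m1")], [("b.pcm", b"p1")], current))
# (False, ['a.cppm', 'a.modmap'])

# b.pcm changed: c.pcm, which was requested after it, is never requested.
print(check_entry([("a.cppm", b"v2")], [("b.pcm", b"p0"), ("c.pcm", b"p2")], current))
# (False, ['a.cppm', 'b.pcm'])
```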

> But the simplestest approach would be to disallow multiple .cppm files in a single cc_library, and this is where my ignorance about C++20 modules comes in. Would this be feasible?

This is the current approach: cc_library currently fails when module_interfaces has more than one entry. This does make usage rather verbose compared to other build systems. But as the original PR demonstrated, those other build systems (in this case, ninja) also struggle with this kind of invalidation. @PikachuHyA Can you already assess how problematic this restriction is in practice?

PikachuHyA · 2025-11-03T09:55:53Z

> But the simplestest approach would be to disallow multiple .cppm files in a single cc_library, and this is where my ignorance about C++20 modules comes in. Would this be feasible?

I don't think it is feasible for real-world C++20 Modules projects.

PR #22553 banned multiple module interface files in a single cc_binary/cc_library, but that is too restrictive for modules-native projects. Real C++20 Modules projects commonly have multiple module interfaces or partitions in a single library target. For example, the modules-native version of async_simple cannot be built with Bazel since PR #22553 was merged. Forcing one module interface file (e.g. foo.cppm) per target would force awkward splits, duplicated BUILD files, or unnatural organization.

So a blanket ban on multiple module-interface files is not a practical default.
The restriction that each cc_binary/cc_library's module_interfaces attribute may contain only a single module interface file should be removed.

lberki · 2025-11-04T08:54:06Z

Allowing my ignorance to shine through, how are C++ modules different enough from cc_library rules that a 1:1 mapping is not feasible? Naively, one would think that "one module / one library" is a good philosophy.

IOW: Why do real C++20 Modules projects commonly have multiple module interfaces or partitions in a single library target?

ChuanqiXu9 · 2025-11-04T09:18:04Z

> Allowing my ignorance to shine through, how are C++ modules different enough from cc_library rules that a 1:1 mapping is not feasible? Naively, one would think that "one module / one library" is a good philosophy.
>
> IOW: Why do real C++20 Modules projects commonly have multiple module interfaces or partitions in a single library target?

Because in C++, a module can be (and generally should be) composed of multiple module interfaces. E.g., the async_simple module consists of the module interfaces in https://github.com/alibaba/async_simple/tree/CXX20Modules/async_simple_module, such as https://github.com/alibaba/async_simple/blob/CXX20Modules/async_simple_module/Common.cppm , https://github.com/alibaba/async_simple/blob/CXX20Modules/async_simple_module/Executor.cppm and https://github.com/alibaba/async_simple/blob/CXX20Modules/async_simple_module/Future.cppm

They implement different parts of the module.

fmeum · 2025-11-04T18:54:44Z

@lberki and I discussed this offline and I will switch to a new approach that doesn't require changes to individual actions: action cache checks will be split into two parts, first checking the mandatory inputs only, then all inputs.

trybka · 2025-11-05T15:12:09Z

Might be worth clarifying (here, in docs, wherever) that Modules have a primary interface, and then support other module units.

In the example above, https://github.com/alibaba/async_simple/tree/CXX20Modules/async_simple_module, async_simple_module/async_simple.cppm would be the main "module interface" and the other .cppm files are module partition interface units (denoted by export module async_simple:$PARTITION_NAME)

Terminology from here: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#background-and-terminology

I think it makes sense to say that a cc_library has a single "Primary module interface unit" but also consists of multiple other modular units (whether they be partitions, implementation units, or internal interface units).

@PikachuHyA and others, does that make sense (i.e. we expect a cc_library to only have one file that has export module $NAME while still consisting of other modular units)?

fmeum · 2025-11-05T16:03:36Z

@bazel-io fork 9.0.0

ChuanqiXu9 · 2025-11-06T05:17:03Z

> Might be worth clarifying (here, in docs, wherever) that Modules have a primary interface, and then support other module units.
>
> In the example above, https://github.com/alibaba/async_simple/tree/CXX20Modules/async_simple_module, async_simple_module/async_simple.cppm would be the main "module interface" and the other .cppm files are module partition interface units (denoted by export module async_simple:$PARTITION_NAME)
>
> Terminology from here: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#background-and-terminology
>
> I think it makes sense to say that a cc_library has a single "Primary module interface unit" but also consists of multiple other modular units (whether they be partitions, implementation units, or internal interface units).
>
> @PikachuHyA and others, does that make sense (i.e. we expect a cc_library to only have one file that has export module $NAME while still consisting of other modular units)?

On the one hand, I think what you said makes sense as a specific practice. On the other hand, this limitation is not imposed by other build systems, and I feel it would make it harder for users to use Bazel with modules. For example, there are users who use a primary module interface for every module interface:

https://github.com/davidstone/technical-machine/blob/main/source/tm/binary_file_reader.cpp
https://github.com/davidstone/technical-machine/blob/main/source/tm/bit_view.cpp
https://github.com/davidstone/technical-machine/blob/main/source/tm/blocks_selection_and_execution.cpp
https://github.com/davidstone/bounded-integer/blob/main/source/bounded/builtin_min_max_value.cpp
https://github.com/davidstone/bounded-integer/blob/main/source/bounded/builtin_integer.cpp

I think, as a build system, it is better to not introduce the limitation.

fmeum · 2025-11-29T10:08:41Z

@lberki I botched the conflict resolution, but it should be good now.

lberki · 2025-12-09T12:37:58Z

Your fix didn't seem to fix the issue according to my testing, but I could fix it pretty easily on top of your work. Dunno if this is a behavior difference between Blaze and Bazel or an oversight on your part.

Either way, I'll import this change myself to make the process go a bit faster; it's been quite a long time already.

lberki · 2025-12-09T12:38:14Z

(cc @tjgq and @pzembrod for awareness)

When multiple `module_interfaces` are specified on a single `cc_library`, the individual compilation actions form a DAG based on `import`s between these modules. Consider the following situation:

* `a.cppm` imports `b.cppm`, both of which are in the `module_interfaces` of a single `cc_library`.
* Building the target populates the action cache with an entry for `a.pcm` that stores `b.pcm` as a discovered input.
* Now edit `a.cppm` and `b.cppm` so that `b.cppm` imports `a.cppm` and `a.cppm` no longer imports `b.cppm`.
* Build again (optionally after a shutdown).

Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from `a.pcm` to `b.pcm`. Together with the newly discovered edge from `b.pcm` to `a.pcm`, this resulted in a cycle.

This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive `.pcm` files required for compilation, is a mandatory input.

As part of this change, `MetadataDigestUtils.fromMetadata` had to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache.

This change is pretty much Fabian's PR bazelbuild#27492 with a tiny fix added on top (not returning from computeMandatoryInputsDigest() early on valuesMissing() if inErrorBubbling() is true)

Closes bazelbuild#27492.

PiperOrigin-RevId: 842733471
Change-Id: I48fa2c0bceb888dcb58db29d50c30719b2122c5d

(cherry picked from commit cb9bd86)

…27927)

When multiple `module_interfaces` are specified on a single `cc_library`, the individual compilation actions form a DAG based on `import`s between these modules. Consider the following situation:

* `a.cppm` imports `b.cppm`, both of which are in the `module_interfaces` of a single `cc_library`.
* Building the target populates the action cache with an entry for `a.pcm` that stores `b.pcm` as a discovered input.
* Now edit `a.cppm` and `b.cppm` so that `b.cppm` imports `a.cppm` and `a.cppm` no longer imports `b.cppm`.
* Build again (optionally after a shutdown).

Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from `a.pcm` to `b.pcm`. Together with the newly discovered edge from `b.pcm` to `a.pcm`, this resulted in a cycle.

This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive `.pcm` files required for compilation, is a mandatory input.

As part of this change, `MetadataDigestUtils.fromMetadata` had to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache.

This change is pretty much Fabian's PR #27492 with a tiny fix added on top (not returning from computeMandatoryInputsDigest() early on valuesMissing() if inErrorBubbling() is true)

Closes #27492.

PiperOrigin-RevId: 842733471
Change-Id: I48fa2c0bceb888dcb58db29d50c30719b2122c5d

(cherry picked from commit cb9bd86)

Closes #27544

lberki · 2025-12-18T13:51:47Z

I have some bad news: despite very careful benchmarking before merging this, it looks like this caused a significant regression in one of our internal benchmarks. The proximate cause is that mandatory inputs are now iterated over twice: in computeMandatoryInputHash() and where the key of the whole action is computed.

This is far from trivial to fix. My best idea would be to change action key computation such that the mandatory inputs are not hashed the second time when the full action cache key is computed and the already-computed input is used as their proxy. This would definitely work, but would require carefully distinguishing between mandatory and discovered inputs and is not something I could possibly casually do in a free half an hour: we'd need to create a counterpart for Action.getMandatoryInputs() called getDiscoveredInputs() and see to it that it's correct everywhere.
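
The proxy idea could look roughly like the Python sketch below: the mandatory inputs are digested once, and that digest is fed into the full key alongside the discovered inputs. This is only a model under assumptions; `digest_inputs`, `full_cache_key`, SHA-256, and the sample values are hypothetical and do not reflect how Bazel actually computes its keys.

```python
import hashlib


def digest_inputs(inputs):
    """Digest a {path: file_digest} map in a deterministic (sorted) order."""
    h = hashlib.sha256()
    for path in sorted(inputs):
        h.update(path.encode("utf-8") + b"\0" + inputs[path] + b"\0")
    return h.digest()


def full_cache_key(action_key, mandatory_digest, discovered_inputs):
    """Build the full key from the precomputed mandatory digest.

    The mandatory inputs themselves are not iterated or hashed again here;
    their digest acts as their proxy.
    """
    h = hashlib.sha256()
    h.update(action_key)
    h.update(mandatory_digest)
    h.update(digest_inputs(discovered_inputs))
    return h.digest()


mandatory = {"a.cppm": b"src1", "a.modmap": b"map1"}
discovered = {"b.pcm": b"pcm1"}

mandatory_digest = digest_inputs(mandatory)  # computed once, in the first phase
key = full_cache_key(b"action-key", mandatory_digest, discovered)
print(len(key))  # 32
```

A change to either a mandatory or a discovered input still changes the key, which is the property the full action cache check relies on.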

Which leaves us with two options: rolling back this commit and dealing with the fallout later, or eating the regression for now and fixing it after the fact. Given that Bazel 9 is around the corner and that after all these years, I still have a streak of cowboy coding in me, I am inclined to opt for fixing it with a follow-up change.

@tjgq @fmeum @meisterT WDYT?

fmeum · 2025-12-18T14:14:21Z

> The proximate cause is that mandatory inputs are now iterated over twice: in computeMandatoryInputHash() and where the key of the whole action is computed.

Can you conclude from the benchmarks whether the problem is 1) iterating the inputs or 2) digesting the inputs' digests? If it's 2), then we could possibly get away with turning the mandatory inputs into a set and skipping over them. If it's 1), then yes, this would probably require quite some restructuring of Action methods.

I'm up for both 🙂

lberki · 2025-12-18T14:26:39Z

From a quick look it looks like both. I read 425 sec of extra CPU time and out of that:

* 211 sec comes from MetadataDigestUtils.getDigest() (digesting)
* 91 sec from Fingerprint.addString() (digesting)
* 82 sec from ActionExecutionFunction.getAndCheckInputSkyValue() (iterating, although this might be fixable with some clever rearrangement of the code)
* 27 sec from ActionInputMap.addToMap() (iterating, although I don't understand why this takes longer)
* 14 sec from HashMap$EntryIterator.next() (iterating)
* 11 sec from FileArtifactValue.addTo() (iterating)

Minus some compensating speedups in other places for about 10 seconds which I didn't bother to decode.
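
For reference, the breakdown above can be tallied per category with a few lines of Python. The numbers are the ones quoted in this comment; the grouping into "digesting" and "iterating" follows the parenthetical labels.

```python
# (method, kind, extra CPU seconds) as quoted in the comment above.
profile = [
    ("MetadataDigestUtils.getDigest()", "digesting", 211),
    ("Fingerprint.addString()", "digesting", 91),
    ("ActionExecutionFunction.getAndCheckInputSkyValue()", "iterating", 82),
    ("ActionInputMap.addToMap()", "iterating", 27),
    ("HashMap$EntryIterator.next()", "iterating", 14),
    ("FileArtifactValue.addTo()", "iterating", 11),
]

by_kind = {}
for _, kind, seconds in profile:
    by_kind[kind] = by_kind.get(kind, 0) + seconds

total = sum(by_kind.values())
print(by_kind)     # {'digesting': 302, 'iterating': 134}
print(total)       # 436
print(total - 10)  # 426, after roughly 10 sec of compensating speedups
```

Digesting accounts for a bit more than two thirds of the listed cost, and the net figure lands within a second of the 425 sec read from the benchmark.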

I think turning them into a set is not obviously the right thing to do because that requires CPU to deduplicate. I have no idea how it'd play out in reality, though.

If that's not too much of a bother for you, I think then it's best to roll it back because I'd much rather not risk this masking other regressions during the holiday season.

fmeum · 2025-12-18T14:42:25Z

I'm fine with the rollback and can look into fixing this. Which likely means going down the getDiscoveredInputs route since everything else would require benchmark access to validate.

lberki · 2025-12-18T16:01:15Z

What alternatives do you have under "everything else"? (I can't think of any other than maybe your set idea, but that would probably require tapping into some other deduplication mechanism that's already there so as not to waste CPU time.)

fmeum · 2025-12-18T16:19:30Z

> What alternatives do you have under "everything else"? (I can't think of any other than maybe your set idea, but that would probably require tapping into some other deduplication mechanism that's already there so as not to waste CPU time.)

Just the set idea, nothing else. I hope the refactoring turns out to be manageable.

lberki · 2025-12-19T08:03:39Z

That makes two of us; FWIW, it's not trivial, but then again, the fact that I couldn't fit it into my time before the Christmas break doesn't mean that it's complicated, just that it's not trivial.

fmeum force-pushed the fix-c++20-modules-no-cycle branch from 2b6dfab to bdaffc1 Compare November 1, 2025 13:01

fmeum changed the title ~~Fix c++20 modules no cycle~~ Add support for multiple module interfaces per cc_library Nov 1, 2025

fmeum marked this pull request as ready for review November 1, 2025 13:14

fmeum requested a review from lberki as a code owner November 1, 2025 13:14

github-actions bot added the team-Performance, team-Rules-CPP, and awaiting-review labels Nov 1, 2025

lberki requested a review from pzembrod November 2, 2025 19:06

PikachuHyA mentioned this pull request Nov 3, 2025

[4/5] support C++20 Modules, support one phase compilation #22553

Closed

fmeum force-pushed the fix-c++20-modules-no-cycle branch from 5dc3825 to 1f64f4e Compare November 4, 2025 13:42

pzembrod requested a review from trybka November 4, 2025 15:25

fmeum marked this pull request as draft November 4, 2025 18:54

fmeum force-pushed the fix-c++20-modules-no-cycle branch 2 times, most recently from d608f48 to 8494a0e Compare November 5, 2025 11:42

fmeum force-pushed the fix-c++20-modules-no-cycle branch from 5b877e6 to d6b52ec Compare November 5, 2025 16:03

bazel-io mentioned this pull request Nov 5, 2025

[9.0.0] Add support for multiple module interfaces per cc_library #27544

Closed

fmeum force-pushed the fix-c++20-modules-no-cycle branch 2 times, most recently from ce409e0 to 6cb869e Compare November 6, 2025 21:44

fmeum commented Nov 6, 2025

src/main/java/com/google/devtools/build/lib/analysis/actions/StarlarkAction.java

fmeum added 12 commits November 29, 2025 10:48

Update cc_integration_test.sh

41960ac

Push new test

dfe2ed8

Switch to two-phase lookup

dc2a97b

# Conflicts: # src/main/java/com/google/devtools/build/lib/actions/ActionCacheChecker.java

Fix tests and improve comments

3bdefc0

Handle MissingArtifactValue in input lookup

e759a94

Add Lukacs' patch

7bf14e4

Ensure that empty digests are of the right size

61733a6

Use patches

8f71d0e

Update comment

5eba266

Fix ActionCacheTest

8d5b965

Address comments

460b95f

Wrap SourceArtifactException

ec23c88

fmeum force-pushed the fix-c++20-modules-no-cycle branch from 3e12f4e to ec23c88 Compare November 29, 2025 09:50

copybara-service bot closed this in cb9bd86 Dec 10, 2025

github-actions bot removed the awaiting-PR-merge label Dec 10, 2025

fmeum deleted the fix-c++20-modules-no-cycle branch December 10, 2025 16:18

fmeum mentioned this pull request Dec 10, 2025

[9.0.0] Add support for multiple module interfaces per cc_library #27927

Merged
