-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Add support for multiple module interfaces per cc_library
#27492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2b6dfab to
bdaffc1
Compare
|
Currently pulls in bazelbuild/rules_cc#511 and bazelbuild/rules_cc#512 via an override. |
cc_library
|
FYI @PikachuHyA |
|
Disclaimer: so far, I have not participated in the implementation of C++20 modules at all, so this all is quite alien to me. You'd probably be better off getting an informed opinion from @pzembrod wrt. the C++ parts. I do feel competent enough to comment on the general mechanism, though. I assume the "other related change" you mentioned over private chat is #26859 ? Where is the associated to Ow, this is really thorny. if I understand correctly, it's not enough to persist the module map file (at least a single one) because there is no guarantee that the change that creates the cycle happens at the top level. E.g. if In addition to this, special-casing that one file is weird because then Bazel relies on explicit action by the rule author to stay correct. This isn't unheard of (e.g. persistent workers), but it's bad if even Skyframe required this kind of outside help. The only possible approach I can see is to request the inputs from the action cache entry in the same order as they were requested by the original action and verify their up-to-dateness also gradually. That way, you avoid cycles without extra hints at the cost of some inefficiency (Skyframe restarts) But the simplestest approach would be to disallow multiple |
It's related in the sense that without that PR, the situation described in the current PR would always result in a cycle, even if there is no action cache entry: Both
That's correct for the module interface file, but the file we digest here is the module map file, which contains the file names of transitive
Yes, this should work. We could first request all mandatory inputs in a batch and then the discovered deps one by one, discarding the cache entry when one of them mismatches. This would require persisting the individual hashes of all discovered files, I think, where today we only persist their paths.
This is the current approach, |
I don't think it is feasible for real-world C++20 Modules projects. The PR #22553 banned multiple module interface files in a single So a blanket ban on multiple module-interface files is not a practical default. |
|
Allowing my ignorance to shine through, how are C++ modules different enough from IOW: Why do real C++20 Modules projects commonly have multiple module interfaces or partitions in a single library target? |
Since in C++, a module can be (and generally should be) composed by multiple module interfaces. e.g, an they implement different parts of the module. |
5dc3825 to
1f64f4e
Compare
|
@lberki and I discussed this offline and I will switch to a new approach that doesn't require changes to individual actions: action cache checks will be split into two parts, first checking the mandatory inputs only, then all inputs. |
d608f48 to
8494a0e
Compare
|
Might be worth clarifying (here, in docs, wherever) that Modules have a primary interface, and then support other module units. In the example above, https://github.com/alibaba/async_simple/tree/CXX20Modules/async_simple_module, async_simple_module/async_simple.cppm would be the main "module interface" and the other .cppm files are module partition interface units (denoted by Terminology from here: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#background-and-terminology I think it makes sense to say that a cc_library as a single "Primary module interface unit" but also consists of multiple other modular units (whether they be partitions, implementation units, or internal interface units). @PikachuHyA and others, does that make sense (i.e. we expect a |
5b877e6 to
d6b52ec
Compare
|
@bazel-io fork 9.0.0 |
On the one hand, I think what you said makes sense for a specific practice. But, on the other side, the limitation is not forced by other build systems. I feel it'll makes user harder to use bazel with modules. For example, there are users using primary module interfaces for every module interfaces: https://github.com/davidstone/technical-machine/blob/main/source/tm/binary_file_reader.cpp I think, as a build system, it is better to not introduce the limitation. |
ce409e0 to
6cb869e
Compare
src/main/java/com/google/devtools/build/lib/analysis/actions/StarlarkAction.java
Show resolved
Hide resolved
# Conflicts: # src/main/java/com/google/devtools/build/lib/actions/ActionCacheChecker.java
3e12f4e to
ec23c88
Compare
|
@lberki I botched the conflict resolution, but it should be good now. |
|
Your fix didn't seem to fix the issue according to my testing, but I could fix it pretty easily on top of your work. Dunno if this is a behavior difference between Blaze and Bazel or an oversight on your part. Either way, I'll import this change myself to make the process go a bit faster; it's been quite a long time already. |
When multiple `module_interfaces` are specified on a single `cc_library`, the individual compilation actions form a DAG based on `import`s between these modules. Consider the following situation: * `a.cppm` imports `b.cppm`, both of which are in the `module_interfaces` of a single `cc_library`. * Building the target populates the action cache with an entry for `a.pcm` that stores `b.pcm` as a discovered input. * Now edit `a.cppm` and `b.cppm` so that `b.cppm` imports `a.cppm` and `a.cppm` no longer imports `b.cppm`. * Build again (optionally after a shutdown). Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from `a.pcm` to `b.pcm`. Together with the newly discovered edge from `b.pcm` to `a.pcm`, this resulted in a cycle. This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive `.pcm` files required for compilation, is a mandatory input. As part of this change, `MetadataDigestUtils.fromMetadata` had to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache. This change is pretty much Fabian's PR bazelbuild#27492 with a tiny fix added on top (not returning from computeMandatoryInputsDigest() early on valuesMissing() if inErrorBubbling() is true) Closes bazelbuild#27492. PiperOrigin-RevId: 842733471 Change-Id: I48fa2c0bceb888dcb58db29d50c30719b2122c5d (cherry picked from commit cb9bd86)
…27927) When multiple `module_interfaces` are specified on a single `cc_library`, the individual compilation actions form a DAG based on `import`s between these modules. Consider the following situation: * `a.cppm` imports `b.cppm`, both of which are in the `module_interfaces` of a single `cc_library`. * Building the target populates the action cache with an entry for `a.pcm` that stores `b.pcm` as a discovered input. * Now edit `a.cppm` and `b.cppm` so that `b.cppm` imports `a.cppm` and `a.cppm` no longer imports `b.cppm`. * Build again (optionally after a shutdown). Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from `a.pcm` to `b.pcm`. Together with the newly discovered edge from `b.pcm` to `a.pcm`, this resulted in a cycle. This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive `.pcm` files required for compilation, is a mandatory input. As part of this change, `MetadataDigestUtils.fromMetadata` had to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache. This change is pretty much Fabian's PR #27492 with a tiny fix added on top (not returning from computeMandatoryInputsDigest() early on valuesMissing() if inErrorBubbling() is true) Closes #27492. PiperOrigin-RevId: 842733471 Change-Id: I48fa2c0bceb888dcb58db29d50c30719b2122c5d (cherry picked from commit cb9bd86) Closes #27544
|
I have some bad news: despite very careful benchmarking before merging this, it looks like this caused a significant regression in one of our internal benchmarks. The proximate cause is that mandatory inputs are now iterated over twice: in This is far from trivial to fix. My best idea would be to change action key computation such that the mandatory inputs are not hashed the second time when the full action cache key is computed and the already-computed input is used as their proxy. This would definitely work, but would require carefully distinguishing between mandatory and discovered inputs and is not something I could possibly casually do in a free half an hour: we'd need to create a counterpart for Which leaves us with two options: rolling back this commit and dealing with the fallout later, or eating the regression for now and fixing it after the fact. Given that Bazel 9 is around the corner and that after all this years, I still have a streak of cowboy coding in me, I am inclined to opt for fixing it with a follow-up change. |
Can you conclude from the benchmarks whether the problem is 1) iterating the inputs or 2) digesting the input's digests? If it's 2), then we could possibly get away with turning the mandatory inputs into a set and skipping over them. If it's 1) then yes, this would probably require quite some restructuring of I'm up for both 🙂 |
|
From a quick look it looks like both. I read 425 sec of extra CPU time and out of that:
Minus some compensating speedups in other places for about 10 seconds which I didn't bother to decode. I think turning them into a set is not obviously the right thing to do because that requires CPU to deduplicate. I have no idea how it'd play out in reality, though. If that's not too much of a bother for you, I think then it's best to roll it back because I'd much rather not risk this masking other regressions during the holiday season. |
|
I'm fine with the rollback and can look into fixing this. Which likely means going down the |
|
What alternatives do you have under "everything else"? (I can't think of any other than maybe you set idea but that would probably require tapping into some other deduplication mechanism that's already there so as not to waste CPU time) |
Just the set idea, nothing else. I hope the refactoring turns out to be manageable. |
|
Makes us two; FWIW, it's not trivial, but then again, the fact that I couldn't fit into my time before the Christmas break doesn't mean that it's complicated, just that it's not trivial. |
When multiple
module_interfacesare specified on a singlecc_library, the individual compilation actions form a DAG based onimports between these modules. Consider the following situation:a.cppmimportsb.cppm, both of which are in themodule_interfacesof a singlecc_library.a.pcmthat storesb.pcmas a discovered input.a.cppmandb.cppmso thatb.cppmimportsa.cppmanda.cppmno longer importsb.cppm.Before this commit, this resulted in an action cycle since during action cache checking, Bazel would reuse or look up the inputs discovered in the previous build, thus introducing an edge from
a.pcmtob.pcm. Together with the newly discovered edge fromb.pcmtoa.pcm, this resulted in a cycle.This is fixed by not requesting the previously discovered inputs (either retained in memory or in the action cache) if the mandatory inputs changed. In the case of C++20 modules, this is sufficient since the modmap file, which lists all transitive
.pcmfiles required for compilation, is a mandatory input.As part of this change,
MetadataDigestUtils.fromMetadatahad to be modified to always return a byte array of proper digest length, even if called with an empty map, to match the assumptions of the action cache.