8379396: "assert(offset + partition_size <= size()) failed: partition failed" when combining NonProfiledCodeHeapSize and large value for CICompilerCount#30121
Conversation
- COMPILER1_PRESENT(compiler_buffer_size += CompilationPolicy::c1_count() * Compiler::code_buffer_size());
- COMPILER2_PRESENT(compiler_buffer_size += CompilationPolicy::c2_count() * C2Compiler::initial_code_buffer_size());
+ COMPILER1_PRESENT(compiler_buffer_size += (size_t)CompilationPolicy::c1_count() * Compiler::code_buffer_size());
+ COMPILER2_PRESENT(compiler_buffer_size += (size_t)CompilationPolicy::c2_count() * C2Compiler::initial_code_buffer_size());
On 32-bit platforms like arm32, size_t is still 32 bits. Having compiler buffer space larger than UINT_MAX seems extreme, as does having very large values of CICompilerCount. We should probably trim extreme values earlier. Let's start by reducing the max CICompilerCount from its current max_jint down to something reasonable: "max ram / thread stack size" should give a number in the 1000s, and reducing further based on the number of cores would give a number in the 10s or 100s.
For a local quick fix here, we could compute the intermediate result as uint64_t, then exit with out-of-memory if the result does not fit in size_t.
@dean-long
Thanks for the review!
You are right that on 32-bit platforms we should use uint64_t for the intermediate computation. I changed the code to use a temporary 64-bit variable for the compiler buffer size calculation.
Then I compare it against CODE_CACHE_SIZE_LIMIT rather than SIZE_MAX, because this multiplication is not yet the final value (we add min_cache_size later when computing non_nmethod_min_size), so I would prefer to keep the intermediate result within the architectural CodeCache budget already at this point, rather than only checking representability in size_t.
- size_t compiler_buffer_size = 0;
- COMPILER1_PRESENT(compiler_buffer_size += (size_t)CompilationPolicy::c1_count() * Compiler::code_buffer_size());
- COMPILER2_PRESENT(compiler_buffer_size += (size_t)CompilationPolicy::c2_count() * C2Compiler::initial_code_buffer_size());
+ uint64_t compiler_buffer_size_uint64 = 0;
+ COMPILER1_PRESENT(compiler_buffer_size_uint64 += (uint64_t)CompilationPolicy::c1_count() * Compiler::code_buffer_size());
+ COMPILER2_PRESENT(compiler_buffer_size_uint64 += (uint64_t)CompilationPolicy::c2_count() * C2Compiler::initial_code_buffer_size());
+ if (compiler_buffer_size_uint64 > (uint64_t)CODE_CACHE_SIZE_LIMIT) {
+ vm_exit_during_initialization("Compiler buffer size exceeds the architectural CodeCache limit");
+ }
+ size_t compiler_buffer_size = (size_t)compiler_buffer_size_uint64;

As for the upper bound on CICompilerCount: there is already CICompilerCountConstraintFunc, and in principle we could add the compiler-buffer-size math there as well. However, that would duplicate logic, and at that stage the final number of compiler threads is not known yet. I would prefer to keep the exact sizing logic in CodeCache::initialize_heaps(), where the actual buffer requirements are computed.
Would it make sense to instead reduce the flag range in globals.hpp to some conservative sanity cap, for example 1024? That is already far beyond the processor count of any existing system I can think of, and I cannot imagine a realistic scenario where such a number of concurrent compiler threads would be needed. Strictly speaking, with such a cap in place, the "Compiler buffer size exceeds the architectural CodeCache limit" error should become unreachable; the explicit check in CodeCache::initialize_heaps() would then remain as a safety net and as a guard against inconsistent future changes.
- /* notice: the max range value here is max_jint, not max_intx */ \
- /* because of overflow issue */ \
product(intx, CICompilerCount, CI_COMPILER_COUNT, \
"Number of compiler threads to run") \
- range(0, max_jint) \
+ range(0, 1024) \
       constraint(CICompilerCountConstraintFunc, AfterErgo) \
Yes, we could set a max value for CICompilerCount to something around 1024. That sounds reasonable to me, but maybe there is a legitimate reason to allow larger values for stress testing?
The alternative would be to check for overflow in expressions like
CompilationPolicy::c1_count() * Compiler::code_buffer_size()
with code like
if (Compiler::code_buffer_size() > std::numeric_limits<uint64_t>::max() / CompilationPolicy::c1_count()) {
This brings up the question of how to do overflow checking in general. Instead of every location in the code that needs overflow checks inventing its own, it would be nice if we had a standard way of doing it, like user-defined class wrappers and operator overloading, with a boolean overflow flag that can be queried.
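Purely as an illustration of that idea (not an existing HotSpot utility; the class name and behavior below are invented for the sketch), such a wrapper could look roughly like this:

#include <cstdint>
#include <limits>

// Hypothetical overflow-tracking size type: arithmetic sets a "sticky"
// overflow flag that callers can query once at the end of a computation.
class CheckedSize {
  uint64_t _value;
  bool     _overflow;
 public:
  explicit CheckedSize(uint64_t v = 0) : _value(v), _overflow(false) {}

  CheckedSize& operator+=(uint64_t rhs) {
    if (_value > std::numeric_limits<uint64_t>::max() - rhs) {
      _overflow = true;            // addition would wrap
    }
    _value += rhs;
    return *this;
  }

  friend CheckedSize operator*(CheckedSize lhs, uint64_t rhs) {
    if (rhs != 0 && lhs._value > std::numeric_limits<uint64_t>::max() / rhs) {
      lhs._overflow = true;        // multiplication would wrap
    }
    lhs._value *= rhs;
    return lhs;
  }

  uint64_t value()    const { return _value; }
  bool     overflow() const { return _overflow; }
};

A call site would then accumulate the per-compiler products through the wrapper and check overflow() once, instead of hand-writing a division-based guard at every multiplication.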
Dean, thanks for the feedback.
The idea of implementing a standardized safe-multiplication API is interesting, but it seems like significant over-engineering at the moment. We should keep the fix focused. But, yes, let us keep it in mind.
Regarding the 1024 limit for CICompilerCount: I have strong doubts about this magic number. There is no technical justification for it other than it being "larger than the core count of any system I'm aware of." That is a weak rationale for a hard-coded limit in the JVM. I am inclined to remove it; the uint64_t approach with the subsequent check is enough to handle the overflow safely without imposing arbitrary constraints.
I'm OK with not restricting CICompilerCount. And for some reason I thought the code was multiplying 64-bit size_t values, but now I see that the values are 32-bit, so a 64-bit result should be OK.
There is logic in CompilationPolicy::initialize() to calculate the number of compiler threads based on the core count and the default value of NonNMethodCodeHeapSize. It is called before CodeCache initialization in init_globals():
compilationPolicy_init();
codeCache_init();
We can check for an unreasonably large CICompilerCount there and exit the VM. "Unreasonable" could be > os::active_processor_count(). Or check it in CICompilerCountConstraintFunc().
The current fix also works, but it is not clear to the user why "compiler buffer size exceeds limit" is reported or what that message means.
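A rough sketch of the suggested early check (illustrative only, and not the approach the PR ultimately takes), placed where the compiler thread count is decided:

// Hypothetical early sanity check, e.g. in CompilationPolicy::initialize():
// reject flag values far beyond the available hardware parallelism.
if (CICompilerCount > (intx)os::active_processor_count()) {
  vm_exit_during_initialization("CICompilerCount exceeds the number of available processors");
}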
Thanks for the review!
I'm open to limiting CICompilerCount, but I do not think more threads than active_processor_count() is unreasonable. Even a single-core machine must be able to run two threads, C1 and C2. Someone might legitimately want to run ten compiler threads on a regular four-core desktop for testing purposes. I'm against imposing hard limits based on reasonableness rather than strict technical constraints.
Fair point on the error message. How about: CICompilerCount is too large: compiler buffer size exceeds the CodeCache size limit.
Please add a test. Otherwise, looks good.

@dean-long, I have added the test.
if (compiler_buffer_size_uint64 > (uint64_t)CODE_CACHE_SIZE_LIMIT) {
  vm_exit_during_initialization("Compiler buffer size exceeds the architectural CodeCache limit");
}
size_t compiler_buffer_size = (size_t)compiler_buffer_size_uint64;
This would truncate the value on 32-bit. I think you need to use the new integer_cast_permit_tautology<>() to make this correct for both 32-bit and 64-bit. But maybe it's safe to assume that values <= CODE_CACHE_SIZE_LIMIT will fit in size_t, otherwise something is wrong. But I personally don't like "raw" casts.
Good! I prefer a direct check over raw casts. Let it be a check against MIN2((uint64_t)CODE_CACHE_SIZE_LIMIT, (uint64_t)SIZE_MAX); this explicitly covers the 32-bit case and remains correct even if CODE_CACHE_SIZE_LIMIT changes in the future.
--- a/src/hotspot/share/code/codeCache.cpp
+++ b/src/hotspot/share/code/codeCache.cpp
uint64_t compiler_buffer_size_uint64 = 0;
COMPILER1_PRESENT(compiler_buffer_size_uint64 += (uint64_t)CompilationPolicy::c1_count() * Compiler::code_buffer_size());
COMPILER2_PRESENT(compiler_buffer_size_uint64 += (uint64_t)CompilationPolicy::c2_count() * C2Compiler::initial_code_buffer_size());
- if (compiler_buffer_size_uint64 > (uint64_t)CODE_CACHE_SIZE_LIMIT) {
- vm_exit_during_initialization("Compiler buffer size exceeds the architectural CodeCache limit");
+ if (compiler_buffer_size_uint64 > MIN2((uint64_t)CODE_CACHE_SIZE_LIMIT, (uint64_t)SIZE_MAX)) {
+ vm_exit_during_initialization("CICompilerCount is too large: compiler buffer size exceeds the CodeCache size limit");
}
  size_t compiler_buffer_size = (size_t)compiler_buffer_size_uint64;
OK, but the raw cast is still there, and nothing is directly checking that (uint64_t)compiler_buffer_size == compiler_buffer_size_uint64 afterwards. Instead we are depending on the check above being correct (and staying correct). The "staying correct" part is what I worry about, which is why I like an overflow check built into the assignment.
Good point. I switched the assignment to checked_cast<size_t>() so the narrowing is checked at the conversion site rather than relying only on the preceding condition.
- if (compiler_buffer_size_uint64 > MIN2((uint64_t)CODE_CACHE_SIZE_LIMIT, (uint64_t)SIZE_MAX)) {
+ if (compiler_buffer_size_uint64 > (uint64_t)CODE_CACHE_SIZE_LIMIT) {
vm_exit_during_initialization("CICompilerCount is too large: compiler buffer size exceeds the CodeCache size limit");
}
- size_t compiler_buffer_size = (size_t)compiler_buffer_size_uint64;
+ size_t compiler_buffer_size = checked_cast<size_t>(compiler_buffer_size_uint64);
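For context, the idea behind checked_cast<> is a narrowing conversion that verifies itself with a round-trip comparison; a simplified standalone sketch (not HotSpot's actual implementation) looks like this:

#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified sketch of a self-checking narrowing cast: convert, then assert
// that converting back reproduces the original value (i.e. nothing was truncated).
template <typename To, typename From>
To checked_cast_sketch(From value) {
  To result = static_cast<To>(value);
  assert(static_cast<From>(result) == value && "value was truncated");
  return result;
}

int main() {
  uint64_t small = 1024;
  size_t ok = checked_cast_sketch<size_t>(small);  // fits on any platform
  (void)ok;
  // On a 32-bit platform, a value above SIZE_MAX would trip the assert here.
  return 0;
}

With this pattern, truncation is caught at the conversion site itself rather than relying only on the preceding range check staying in sync.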
if (compiler_buffer_size_uint64 > (uint64_t)CODE_CACHE_SIZE_LIMIT) {
  vm_exit_during_initialization("CICompilerCount is too large: compiler buffer size exceeds the CodeCache size limit");
}
size_t compiler_buffer_size = checked_cast<size_t>(compiler_buffer_size_uint64);
What do you think about using the new integer_cast_permit_tautology that I mentioned earlier (see also https://bugs.openjdk.org/browse/JDK-8314258)? The situation with size_t vs uint64_t on 32-bit vs 64-bit platforms is exactly where integer_cast_permit_tautology becomes useful.
Right. This is a case where the new integer_cast_permit_tautology can be used.
- size_t compiler_buffer_size = checked_cast<size_t>(compiler_buffer_size_uint64);
+ size_t compiler_buffer_size = integer_cast_permit_tautology<size_t>(compiler_buffer_size_uint64);

I checked it, and it works well. In fastdebug, it reports an error if the value is too large:
# Internal Error (src/hotspot/share/utilities/integerCast.hpp:122), pid=16376, tid=16377
# fatal error: integer_cast failed: 6000424478
… failed" when combining NonProfiledCodeHeapSize and large value for CICompilerCount
message updated - CICompilerCount is too large: compiler buffer size exceeds the CodeCache size limit
@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
…ailable in fastdebug only + message correction
 * trigger the old bug.
 * @library /test/lib
 * @requires vm.flagless
 * @requires vm.debug
Can't we test this in product builds as well?
In product builds, CICompilerCount is now capped to the processor-based limit. As a result, -XX:CICompilerCount=64 will, on most machines, trigger the "CICompilerCount is too large" error instead of the "compiler buffer size exceeds the CodeCache size limit" error that this test is meant to verify. For that reason, I disabled the test in product builds. Any suggestions?
CodeCache::initialize_heaps() computes additional non-nmethod space for compiler buffers from compiler thread counts and per-compiler buffer sizes. With abnormally large CICompilerCount values, the intermediate multiplication may overflow in 32-bit arithmetic before the result is accumulated into size_t.
As a result, the computed heap sizes may become inconsistent. This can bypass the expected

if (aligned_total > CODE_CACHE_SIZE_LIMIT)

failure path with its normal "Code cache size exceeds platform limit" message, and instead lead to a later VM failure with an assertion.

The fix is to promote the operands to size_t before multiplication, so the intermediate arithmetic is performed in the intended width and the computed heap sizes remain consistent.
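As a standalone illustration of the failure mode (the buffer size and thread count below are assumed round numbers, not HotSpot's actual values):

#include <cstdint>
#include <cstdio>

int main() {
  // Assumed illustrative values: ~2 MB per compiler buffer and an
  // abnormally large CICompilerCount-style thread count.
  uint32_t buffer_size    = 2u * 1024 * 1024;
  uint32_t compiler_count = 3000;

  // The product is computed in 32-bit arithmetic and wraps around before it
  // is widened, which is the pattern that corrupted the heap sizing.
  uint64_t wrapped = (uint64_t)(buffer_size * compiler_count);

  // Promoting an operand first keeps the whole multiplication in 64 bits.
  uint64_t correct = (uint64_t)buffer_size * compiler_count;

  printf("wrapped: %llu, correct: %llu\n",
         (unsigned long long)wrapped, (unsigned long long)correct);
  return 0;
}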