8379260: C2: Separate volatile barrier and full barrier#30106

Closed
merykitty wants to merge 4 commits into openjdk:master from merykitty:membarfull

@merykitty
Member

@merykitty merykitty commented Mar 6, 2026

Hi,

MemBarVolatileNode is described as:

// Ordering between a volatile store and a following volatile load.
// Requires multi-CPU visibility?
class MemBarVolatileNode: public MemBarNode

This is incorrect, as MemBarVolatileNode is used in VarHandle::fullFence intrinsics, which means it must act as a full fence. In addition, since MemBarVolatileNode must act as a full fence, it prevents most optimizations with volatile accesses.

This PR extracts MemBarFull out of MemBarVolatile as a proper full fence. This removes the confusion in the description of MemBarVolatileNode and gives us a better chance of optimizing memory accesses around volatile accesses.

Testing:

  • tier1,tier2,tier3,tier4,hs-comp-stress

Please kindly review, thanks a lot.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8379260: C2: Separate volatile barrier and full barrier (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30106/head:pull/30106
$ git checkout pull/30106

Update a local copy of the PR:
$ git checkout pull/30106
$ git pull https://git.openjdk.org/jdk.git pull/30106/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30106

View PR using the GUI difftool:
$ git pr show -t 30106

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30106.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Mar 6, 2026

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 6, 2026

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8379260: C2: Separate volatile barrier and full barrier

Reviewed-by: fyang, mdoerr, amitkumar, aph, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 112 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 6, 2026
@openjdk

openjdk bot commented Mar 6, 2026

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 6, 2026
@mlbridge

mlbridge bot commented Mar 6, 2026

Webrevs

Comment thread src/hotspot/share/opto/library_call.cpp Outdated
access_store_at(nullptr, jt_addr, _gvn.type(jt_addr)->is_ptr(), ideal.ConI(1), TypeInt::BOOL, T_BOOLEAN, IN_NATIVE | MO_UNORDERED);
access_store_at(nullptr, vt_addr, _gvn.type(vt_addr)->is_ptr(), ideal.ConI(1), TypeInt::BOOL, T_BOOLEAN, IN_NATIVE | MO_UNORDERED);
-insert_mem_bar(Op_MemBarVolatile);
+insert_mem_bar(Op_MemBarFull);
Member

The comment says this code needs OrderAccess::storeload(), which can be weaker than a full fence on PPC and RISC-V. Should we also introduce a new Op_MemBarStoreLoad to support this?

Member Author

That's a valid point, I have updated the PR.

@merykitty
Member Author

I see that on RISC-V, MemBarVolatile is implemented as __ membar(MacroAssembler::StoreLoad), but MacroAssembler::StoreLoad is different from MacroAssembler::AnyAny. Can anyone help me understand whether I should raise this to a bug?

@dean-long
Member

MemBarVolatile may be obsolete now if we have MemBarStoreLoad. My understanding is that MemBarVolatile is needed between a volatile store and a volatile load, and the old JSR-133 cookbook says that requires only MacroAssembler::StoreLoad, not the stronger MacroAssembler::AnyAny.

@merykitty
Member Author

@dean-long A MemBarVolatile is weaker than a MemBarStoreLoad. A MemBarVolatile only prevents volatile accesses from moving past it, while a MemBarStoreLoad prevents all accesses from moving past it.

@merykitty
Member Author

> My understanding is that MemBarVolatile is needed between a volatile store and a volatile load, and the old JSR-133 cookbook says that requires only MacroAssembler::StoreLoad, not the stronger MacroAssembler::AnyAny.

But MemBarVolatile is used to implement VarHandle::fullFence, so it must act as a full fence; a store-load barrier is simply inadequate.
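
One way to picture why a store-load barrier is not a full fence is the predecessor/successor form of the RISC-V FENCE instruction. The sketch below is a hypothetical model, not HotSpot code: the `Fence` struct, the `orders` and `at_least_as_strong` helpers, and the mapping of StoreLoad to `fence w, r` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not HotSpot code. A fence is described by the access
// kinds it orders before it (pred) and after it (succ), in the style of the
// RISC-V "fence pred, succ" instruction.
enum Access : std::uint8_t { R = 1, W = 2, RW = R | W };

struct Fence {
  std::uint8_t pred;  // earlier accesses the fence orders
  std::uint8_t succ;  // later accesses the fence orders
};

// Assumed mapping for illustration: StoreLoad ~ "fence w, r",
// full fence ~ "fence rw, rw".
constexpr Fence kStoreLoad{W, R};
constexpr Fence kFull{RW, RW};

// True if fence f keeps an earlier `before` access ahead of a later `after`
// access.
constexpr bool orders(Fence f, Access before, Access after) {
  return (f.pred & before) != 0 && (f.succ & after) != 0;
}

// True if fence a orders every pair of accesses that fence b orders.
constexpr bool at_least_as_strong(Fence a, Fence b) {
  return (a.pred & b.pred) == b.pred && (a.succ & b.succ) == b.succ;
}

// Small self-check of the model.
inline void check_fence_model() {
  assert(orders(kStoreLoad, W, R));    // store then load: ordered
  assert(!orders(kStoreLoad, R, W));   // load then store: not ordered
  assert(orders(kFull, R, W));         // full fence orders every pair
  assert(at_least_as_strong(kFull, kStoreLoad));
  assert(!at_least_as_strong(kStoreLoad, kFull));
}
```

In this model a full fence subsumes StoreLoad but not the other way round, which is the property a fullFence implementation would rely on.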

@theRealAph
Contributor

> @dean-long A MemBarVolatile is weaker than a MemBarStoreLoad. A MemBarVolatile only prevents volatile accesses from moving past it, while a MemBarStoreLoad prevents all accesses from moving past it.

That's not right.

In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other. So, a MemBarVolatile is release; acquire; plus (in effect) a storeLoad sequenced before the next volatile load.

@theRealAph
Contributor

> I see that on RISC-V, MemBarVolatile is implemented as __ membar(MacroAssembler::StoreLoad), but MacroAssembler::StoreLoad is different from MacroAssembler::AnyAny. Can anyone help me understand whether I should raise this to a bug?

I suppose it is a bug, yes.

@merykitty
Member Author

> In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other. So, a MemBarVolatile is release; acquire; plus (in effect) a storeLoad sequenced before the next volatile load.

A volatile load is a load followed by a MemBarAcquire. A volatile store is a MemBarRelease followed by a store. So, MemBarVolatile only has the responsibility of ensuring a volatile store is not reordered with subsequent volatile loads. When support_IRIW_for_not_multiple_copy_atomic_cpu is true, the MemBarVolatile is inserted before the volatile loads, while if it is false, the barrier is inserted after the volatile stores.
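
The placement rule described above can be sketched as a small model. This is a simplification for illustration, not C2 code: the function names and the `Seq` alias are hypothetical, while the node names and the flag `support_IRIW_for_not_multiple_copy_atomic_cpu` (passed here as `support_iriw`) come from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Seq = std::vector<std::string>;

// Hypothetical model: a volatile store is MemBarRelease + Store, with a
// trailing MemBarVolatile only when the IRIW flag is false.
Seq volatile_store(bool support_iriw) {
  Seq seq{"MemBarRelease", "Store"};
  if (!support_iriw) seq.push_back("MemBarVolatile");
  return seq;
}

// Hypothetical model: a volatile load is Load + MemBarAcquire, with a
// leading MemBarVolatile only when the IRIW flag is true.
Seq volatile_load(bool support_iriw) {
  Seq seq;
  if (support_iriw) seq.push_back("MemBarVolatile");
  seq.push_back("Load");
  seq.push_back("MemBarAcquire");
  return seq;
}

// A volatile store followed by a volatile load: under either setting exactly
// one MemBarVolatile ends up between the Store and the Load.
Seq store_then_load(bool support_iriw) {
  Seq seq = volatile_store(support_iriw);
  Seq load = volatile_load(support_iriw);
  seq.insert(seq.end(), load.begin(), load.end());
  assert(std::count(seq.begin(), seq.end(), "MemBarVolatile") == 1);
  return seq;
}
```

In this model the flag only changes which access carries the barrier; the combined store-then-load sequence is the same either way.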

> I suppose it is a bug, yes.

Thanks, I changed it to a bug.

@theRealAph
Contributor

> > In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other. So, a MemBarVolatile is release; acquire; plus (in effect) a storeLoad sequenced before the next volatile load.
>
> A volatile load is a load followed by a MemBarAcquire. A volatile store is a MemBarRelease followed by a store.
>
> So, MemBarVolatile only has the responsibility of ensuring a volatile store is not reordered with subsequent volatile loads.

Oh, ISWYM. MemBarVolatile, despite its name, is not a volatile fence. It is C2's name for a StoreLoad needed between volatile stores and loads. In practice that usually means a full fence, but not always.

format %{ "membar_full" %}
ins_encode %{
__ block_comment("membar_full");
__ membar(Assembler::AnyAny);
Contributor

This code is making a distinction without a real difference between StoreLoad and AnyAny: it's dmb ish in both cases.

Member Author

Yes, that's right, so there is no functional difference on AArch64. If you are asking why there are separate match blocks for MemBarFull and MemBarStoreLoad, I think the difference in the comment is useful.

Comment thread src/hotspot/cpu/aarch64/aarch64.ad Outdated
@theRealAph
Contributor

> Oh, ISWYM. MemBarVolatile, despite its name, is not a volatile fence. It is C2's name for a StoreLoad needed between volatile stores and loads. In practice that usually means a full fence, but not always.

So, what is MemBarVolatile for, when it's just a StoreLoad?

Co-authored-by: Andrew Haley <aph-open@littlepinkcloud.com>
@merykitty
Member Author

A MemBarVolatile is weaker than a MemBarStoreLoad, so we can take advantage of that to optimize non-volatile accesses around it in the future.

@dean-long
Member

> But MemBarVolatile is used to implement VarHandle::fullFence, so it must act as a full fence; a store-load barrier is simply inadequate.

Yes, VarHandle::fullFence was using MemBarVolatile, but it should be using MemBarFull now.

> A MemBarVolatile is weaker than a MemBarStoreLoad, so we can take advantage of that to optimize non-volatile accesses around it in the future.

Both are really StoreLoad now, if we consistently use MemBarFull when a full fence is needed, and MemBarStoreLoad only when a StoreLoad is needed. StoreLoad != full fence even though they both might get translated to the same thing depending on the hardware.

@merykitty
Member Author

MemBarStoreLoad and MemBarVolatile are the same at code generation, but the compiler can move non-volatile accesses past MemBarVolatile when it is not normally possible with MemBarStoreLoad.

For example, if we have a volatile store in a loop and no other store or memory barrier in that loop, a MemBarStoreLoad will prevent all loads from being hoisted out. This is even a little inconsistent, because we only emit that barrier after the volatile store if support_IRIW_for_not_multiple_copy_atomic_cpu is false. Otherwise, the barrier is emitted before the volatile loads, so the loads can be hoisted fine.
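
The loop shape in question might look like the following sketch. It is written in C++ with `std::atomic` purely as a stand-in for a Java volatile field; the names `flag`, `limit` and `run` are hypothetical, and nothing here reflects how C2 or any C++ compiler actually schedules the load.

```cpp
#include <atomic>
#include <cassert>

// Stand-ins for Java fields: `flag` plays the role of a volatile field,
// `limit` of a plain field whose value does not change inside the loop.
std::atomic<int> flag{0};
int limit = 10;

int run(int n) {
  assert(n >= 0);
  int sum = 0;
  for (int i = 0; i < n; i++) {
    // The only store in the loop, analogous to a volatile store.
    flag.store(i, std::memory_order_seq_cst);
    // Plain load of a loop-invariant value: the hoisting candidate that a
    // barrier blocking all accesses would pin inside the loop.
    sum += limit;
  }
  return sum;
}
```

Calling `run(3)` returns 30 and leaves `flag` at 2 regardless of whether the load of `limit` is hoisted, which is why hoisting it is a legal optimization as long as no barrier in the loop forbids moving plain loads.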

@dean-long
Member

Thanks, I hope I got it now :-). MemBarVolatile semantics are more at the Java level, while MemBarStoreLoad is closer to the memory level, I guess you could say?

@theRealAph
Contributor

> MemBarStoreLoad and MemBarVolatile are the same at code generation, but the compiler can move non-volatile accesses past MemBarVolatile when it is not normally possible with MemBarStoreLoad.
>
> For example, if we have a volatile store in a loop and no other store or memory barrier in that loop, a MemBarStoreLoad will prevent all loads from being hoisted out. This is even a little inconsistent, because we only emit that barrier after the volatile store if support_IRIW_for_not_multiple_copy_atomic_cpu is false. Otherwise, the barrier is emitted before the volatile loads, so the loads can be hoisted fine.

That makes sense, so a MemBarVolatile can float around between seqCst stores and seqCst loads.

Contributor

@theRealAph theRealAph left a comment

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 9, 2026
Member

@RealFYang RealFYang left a comment

Hi, I have two minor comments about the instruct format for RISC-V.

Comment thread src/hotspot/cpu/riscv/riscv.ad Outdated
ins_cost(VOLATILE_REF_COST);

format %{ "#@membar_full_rvtso\n\t"
"fence a, a"%}
Member

Suggestion: "fence rw, rw"%}

Member Author

Thanks a lot for your review, I have made the change.

Comment thread src/hotspot/cpu/riscv/riscv.ad Outdated
ins_cost(VOLATILE_REF_COST);

format %{ "#@membar_full_rvwmo\n\t"
"fence a, a"%}
Member

Suggestion: "fence rw, rw"%}

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 10, 2026
Member

@RealFYang RealFYang left a comment

Thanks for the update. The RISC-V part looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 10, 2026
@merykitty
Member Author

@theRealAph @dean-long @RealFYang Thanks a lot for your reviews and suggestions. Can I integrate now or should I wait for approval for the ppc and s390 parts?

@dean-long
Member

Yes, wait for ppc and s390 and maybe ping the port maintainers.

@merykitty
Member Author

@TheRealMDoerr Could you take a look? Thanks in advance.

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

LGTM. Also tested jtreg:test/hotspot/jtreg/compiler/c2 on linux ppc64le. Thanks!

@merykitty
Member Author

@offamitkumar Could you review the s390 part, please?

Member

@offamitkumar offamitkumar left a comment

s390 looks fine, tier1 tests are also clean.

@merykitty
Member Author

Thanks very much for your reviews!

/integrate

@openjdk

openjdk bot commented Mar 12, 2026

Going to push as commit fd80329.
Since your change was applied there have been 114 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 12, 2026
@openjdk openjdk bot closed this Mar 12, 2026
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 12, 2026
@openjdk

openjdk bot commented Mar 12, 2026

@merykitty Pushed as commit fd80329.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

6 participants