Skip to content

[herd] Align MTE async fault reporting with the architecture#1786

Open
diaolo01 wants to merge 1 commit into
herd:masterfrom
diaolo01:mte-async-flt
Open

[herd] Align MTE async fault reporting with the architecture#1786
diaolo01 wants to merge 1 commit into
herd:masterfrom
diaolo01:mte-async-flt

Conversation

@diaolo01
Copy link
Copy Markdown
Contributor

@diaolo01 diaolo01 commented Apr 10, 2026

This PR updates how herd reports asynchronous MTE tag-check faults so they match the architecture: async faults no longer contain a location or label, and only indicate that the thread saw a failing tag-check (mirroring TFSR_EL1 being set).

For the following test

AArch64 F005
Variant=mte,async

{
 0:X1=x:red; 0:X2=y:red;
}
 P0          ;
 MOV W0,#1   ;
 STR W0,[X1] ;
 STR W0,[X2] ;

forall (fault(P0))

The behaviour is as follows:

Before
Two async tag-checks on different locations appear as distinct faults:

Fault(P0,x:red,TagCheck);
Fault(P0,y:red,TagCheck);

After
Async tag-checks collapse into a single per thread marker:

Fault(P0,TagCheck);

This ensures async faults aren’t tied to a specific address or label.

@ShaleXIONG
Copy link
Copy Markdown
Collaborator

ShaleXIONG commented Apr 10, 2026

Following an offline message:

AArch64 LB+dmb.sy+amo.ldaddal-polt
Variant=memtag
Generator=diycross7 (version 7.58+1)
Prefetch=0:x=F,0:y=W,1:y=F,1:x=W
Com=Rf Rf
Orig=DMB.SYdRW RfePA Amo.LdAddAL PodWWLT RfeTP
Cycle=DMB.SYdRW RfePA Amo.LdAddAL PodWWLT RfeTP
"DMB.SYdRW RfePA Amo.LdAddAL PodWWLT RfeTP"
{
 0:X0=x:red; 0:X3=y:green;
 1:X0=x:red; 1:X3=y:green; 1:X4=x:green;
}
 P0               | P1                 ;
 L00: LDR W1,[X0] | MOV W2,#2          ;
 DMB SY           | LDADDAL W2,W1,[X3] ;
 MOV W2,#1        | STG X0,[X4]        ;
 STR W2,[X3]      |                    ;

exists ([y]=3 /\ 0:X1=0 /\ 1:X1=1 /\ not (fault(P0:L00,x)))

The CI runs tests such as above in extra -variant async, which leads to CI fails. I will make some change in diy

  • I will change the CI scripting to generate async  tests
  • I will change the fault in cycle.ml with a check on async vs sync. When it is async I only generate a fault with procedure number, e.g. fault(P0)

I made a patch in #1789 please feel free to cherry-pick.
ref: @fsestini @relokin for reviewing, if necessary.

@diaolo01 diaolo01 marked this pull request as draft April 10, 2026 12:42
@diaolo01 diaolo01 changed the title [herd] Align MTE async fault reporting with the architecture [herd+diy] Align MTE async fault reporting with the architecture Apr 10, 2026
@diaolo01 diaolo01 marked this pull request as ready for review April 10, 2026 13:36
@artkhyzha
Copy link
Copy Markdown
Collaborator

I agree that using the "Fault" predicate in ways that are not checkable with litmus7 should not be the way. As such, the syntax Fault(P0,x:red,TagCheck) looks suspicious in asynchronous / asymmetric mode. To me, it seems like there are two approaches:

  • To use Fault(P0,TagCheck:Async) or Fault(P0) in postconditions.
  • To use 0:TFSR_ELx=1 in postconditions, which is what we used in some litmus tests in catalogue.

I am not sure existence of the second option excludes the first option.

One way or another, the constraint to the syntax makes sense. I have three questions:

  • Are "TagCheck:Async" and "TagCheck" disjoint? In the sense that, is it right that Fault(P0,TagCheck) would not include async faults as per your proposal?
  • If one adds a fault handler for P0, does the presence of the async fault trigger fault handling?
  • Doesn't this need changes in litmus7 too?

@diaolo01
Copy link
Copy Markdown
Contributor Author

diaolo01 commented May 13, 2026

Are "TagCheck:Async" and "TagCheck" disjoint? In the sense that, is it right that Fault(P0,TagCheck) would not include async faults as per your proposal?

Indeed that the original proposal excluded async faults from Fault(P0,TagCheck). I have removed the Async qualifier in the fault type since it did not bring any value.
Also, with the removal of TagCheck:Async, I was able to avoid changes to diy.

If one adds a fault handler for P0, does the presence of the async fault trigger fault handling?

No. This can be observed by running herd ./herd/tests/instructions/AArch64.MTE/F001.litmus -show all -showevents all. The output of that test has not been changed as part of this PR.

Doesn't this need changes in litmus7 too?

No. This change is only about how herd represents async faults in its final states. I don't think litmus supports async faults yet.

@diaolo01 diaolo01 changed the title [herd+diy] Align MTE async fault reporting with the architecture [herd] Align MTE async fault reporting with the architecture May 13, 2026
@relokin
Copy link
Copy Markdown
Member

relokin commented May 13, 2026

One comment, otherwise, this looks good to me.

|L1: ;
| LDR W2,[X3] ;
exists ~fault (P1:L0,x) /\ ~fault(P1:L1,x)
exists ~fault (P1:L0,x) /\ ~fault(P1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we cleanup both faults?

Suggested change
exists ~fault (P1:L0,x) /\ ~fault(P1)
exists ~fault(P1)

@@ -0,0 +1,12 @@
AArch64 F005
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry actually, I don't understand why we are adding this test.

@@ -0,0 +1,12 @@
AArch64 F006
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants