-
Notifications
You must be signed in to change notification settings - Fork 212
Add streaming-compatible SVE variant to VFABI mangling #292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -942,6 +942,13 @@ undefined. | |
| Zn.b [msb] ... 0x??????03 0x??????02 0x??????01 0x??????00 [lsb] | ||
| Zn.s [msb] ... 0x00000003 0x00000002 0x00000001 0x00000000 [lsb] | ||
|
|
||
| Streaming compatibility | ||
| ^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| If targeting SVE from a streaming or streaming-compatible region, | ||
| calls should be emitted to the streaming-compatible SVE rather than | ||
| the plain SVE variant (differentiated by mangling, as below). | ||
|
|
||
| Vector function name mangling | ||
| ----------------------------- | ||
|
|
||
|
|
@@ -983,6 +990,7 @@ Name mangling grammar for vector functions. | |
|
|
||
| <isa> := "n" (Advanced SIMD) | ||
| | "s" (SVE) | ||
| | "c" (Streaming-compatible SVE) | ||
|
|
||
| <mask> := "N" (No Mask) | ||
| | "M" (Mask) | ||
|
|
@@ -1195,6 +1203,19 @@ Note that the ``svbool_t`` parameter is described in `SVE masking`_. | |
| svfloat32_t _ZGVsM8vv_bar(svfloat64_t vx, svfloat64_t vy, | ||
| svbool_t vmask); | ||
|
|
||
| Streaming-compatible SVE Examples | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The use of ``#pragma omp declare simd`` with ``f``, ``g`` and ``foo`` | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How would the compiler know that we are in a streaming or streaming-compatible region? From what I can find in the ACLE the user adds I'm not at all familiar with OMP though so there could be a way of declaring a streaming compatible region, I've checked https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-simd Would be good to give an example if you can? If this part is somewhat speculative it could be worth separating it from the name mangling part, as I assume someone could manually write the functions rather than have the compiler auto-generate them. |
||
| in a streaming or streaming-compatible region will also generate: | ||
|
|
||
| * ``svfloat32_t _ZGVcMxv_f(svfloat64_t, svbool_t) __arm_streaming_compatible`` | ||
| streaming-compatible VLA signature for the vector version of ``f``; | ||
| * ``svfloat64_t _ZGVcMxv_g(svfloat32_t, svbool_t) __arm_streaming_compatible`` | ||
| streaming-compatible VLA signature for the vector version of ``g``; | ||
| * ``svint16_t _ZGVcMxvvv_foo(svint64_t, svint32_t, svint8_t, svbool_t) __arm_streaming_compatible`` | ||
| streaming-compatible VLA signature for the vector version of ``foo``. | ||
|
|
||
| Linear parameters examples | ||
| -------------------------- | ||
|
|
||
|
|
@@ -1364,17 +1385,19 @@ AArch64 Variant Traits | |
|
|
||
| .. table:: AArch64 traits for OpenMP contexts. | ||
|
|
||
| +------------------+-----------------------+-------------------------+ | ||
| |Trait set |Trait value |Notes | | ||
| +==================+=======================+=========================+ | ||
| |``device`` |``isa("simd")`` |Advanced SIMD call. | | ||
| +------------------+-----------------------+-------------------------+ | ||
| |``device`` |``isa("sve")`` |SVE call. | | ||
| +------------------+-----------------------+-------------------------+ | ||
| |``device`` |``arch("march-list")`` |Used to match | | ||
| | | |``-march=march-list`` | | ||
| | | |from the compiler. | | ||
| +------------------+-----------------------+-------------------------+ | ||
| +------------------+-----------------------+-------------------------------+ | ||
| |Trait set |Trait value |Notes | | ||
| +==================+=======================+===============================+ | ||
| |``device`` |``isa("simd")`` |Advanced SIMD call. | | ||
| +------------------+-----------------------+-------------------------------+ | ||
| |``device`` |``isa("sve")`` |SVE call. | | ||
| +------------------+-----------------------+-------------------------------+ | ||
| |``device`` |``isa("sc_sve")`` |Streaming-compatible SVE call. | | ||
| +------------------+-----------------------+-------------------------------+ | ||
| |``device`` |``arch("march-list")`` |Used to match | | ||
| | | |``-march=march-list`` | | ||
| | | |from the compiler. | | ||
| +------------------+-----------------------+-------------------------------+ | ||
|
|
||
| The scalar function ``f`` that is decorated with a ``declare | ||
| variant`` directive with a ``simd`` trait in the ``construct`` set is | ||
|
|
@@ -1391,8 +1414,9 @@ mapped to the vector function ``F`` according to the following rules: | |
|
|
||
| 1. ``isa("simd")`` targets Advanced SIMD function signatures. | ||
| 2. ``isa("sve")`` targets SVE function signatures. | ||
| 3. Either ``isa("simd")`` or ``isa("sve")`` must be specified. | ||
| 4. The ``arch`` traits of the ``device`` set is optional, and it | ||
| 3. ``isa("sc_sve")`` targets streaming-compatible SVE function signatures. | ||
| 4. One of ``isa("simd")``, ``isa("sve")`` or ``isa("sc_sve")`` must be specified. | ||
| 5. The ``arch`` traits of the ``device`` set is optional, and it | ||
| accepts any value that can be passed to the compiler via the | ||
| command line option ``-march``. | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative would be that streaming/streaming-compatible regions call the scalar functions? Or make all vector functions streaming-compatible. Which would avoid the name mangling, but lose performance.
Ideally a streaming-compatible target would be universally compatible so that we wouldn't need a non-streaming and streaming-compatible implementation, although I don't think we can represent that with name-mangling. I guess a synonym or wrapper could be used in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling scalar routines in a streaming-section would be highly detrimental to performance (we have evidence to back that). Yes, making all routines streaming compatible would indeed be a performance loss in rather important routines.
Do you mean that no calls to streaming compatible vector math routines would be emitted unless the target supports all instructions (
FEAT_SME_FA64)? In this case yes a single optimized versions can most likely be provided.And also we cannot suddenly add a new attribute to the existing symbol without breaching ABI, right?
Could you elaborate on "synonym" and "wrapper" a little? Is this something the VFABI should specify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to forgive me, that comment was 4 Months ago so I've forgotton a lot of the context.
What I think I was getting at was that if you had a streaming compatible implementation, then that would be sufficient (if not optimal) for SVE, streaming and streaming-compatible callers. If an implementation wanted to provide only one "streaming compatible" implementation (for whatever reason), then if functions were name mangled then they'd need to define symbols for each of the name manglings. These could be wrappers that just call the streaming compatible" implementation, or they could be aliases (symbols defined at the same address) as the streaming compatible implementation.
Whether this is at all useful or not I don't know. It maybe that the people using the vfabi are only using it because of maximum performance so there would always be an optimal implementation for each case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got you! Well remembered and good point.
Agreed that wrappers/aliases could be used to provide both manglings when only a single implementation is available (SVE or SSVE), which is a cheap way to get decent (not optimal) SVE and SSVE performance.
Yes, this is the case for a significant amount of implementations (e.g. SLEEF, AOR, libmvec).
Also agreed that VFABI may often be associated with best performance (e.g. AAVPCS), so we think the new mangling makes sense in that respect. Not to mention it offers a way to avoid performance drops due to mode switch (in math.h-heavy workloads).