[WIP] Use asm developed externally in hipblaslt #3960
Closed
newling wants to merge 3 commits into ROCm:develop
perfci run on commit a0f93bc
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (49.21%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
@@ Coverage Diff @@
## develop #3960 +/- ##
===========================================
- Coverage 61.41% 61.33% -0.08%
===========================================
Files 627 627
Lines 113687 113854 +167
Branches 20034 20053 +19
===========================================
+ Hits 69813 69825 +12
- Misses 36016 36168 +152
- Partials 7858 7861 +3
Force-pushed from a0f93bc to fa9ef0b.
perfci run on commit 91fdf22
NOTE: Experimental, do not merge
Force-pushed from fa9ef0b to 4ff21b2.
PR #4384 is, in part, based off of the experimental work done here. We can probably close this PR.
Goal of this PR
We have been provided with an AITER .co file that we want to use instead of RocRoller's generated kernel, but only when Origami predicts that the best macro-tile size is 256x256 (see task description here).
Summary of PR
Currently (before this PR), the approach hipblaslt takes is, at a high level:
This PR changes this to:
Another PR being worked on by @awhittle3 and others uses a cleaner approach for packaging external kernels, and it will use the approach:
The difference is that the Origami logic can be used more cleanly in deciding whether to use the external kernel ('assembly direct').
This current PR has 2 environment variables, set as:
The first of these is hacky -- these .co files should be packaged correctly within the install of hipblaslt in production.
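The gating described in this PR (use the external kernel only when it is enabled and Origami predicts a 256x256 macro-tile) can be sketched as follows. This is an illustrative Python sketch, not hipblaslt code: `choose_kernel`, `direct_assembly_enabled`, and the return strings are hypothetical names, and treating an unset `HIPBLASLT_ENABLE_DIRECT_ASSEMBLY` as disabled is an assumption.

```python
import os

# Hypothetical names throughout: this is not hipblaslt's actual API.
EXTERNAL_TILE = (256, 256)  # macro-tile size for which the external .co is used


def direct_assembly_enabled(env=None):
    """Return True when HIPBLASLT_ENABLE_DIRECT_ASSEMBLY is set to "1".

    Assumption: an unset variable means disabled. The PR only states that
    setting the variable to 0 disables the custom kernel.
    """
    env = os.environ if env is None else env
    return env.get("HIPBLASLT_ENABLE_DIRECT_ASSEMBLY", "0") == "1"


def choose_kernel(predicted_tile, env=None):
    """Pick the external kernel only for the 256x256 prediction, else the generated one."""
    if direct_assembly_enabled(env) and tuple(predicted_tile) == EXTERNAL_TILE:
        return "external"
    return "generated"
```

Passing the environment as a plain dictionary keeps the decision easy to exercise without touching the real process environment.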
I have generated a toy .co file (see poc_co.cpp in this PR) for single-precision GEMM. To use it:
Running hipblaslt-bench, I see the following logging which shows that the custom kernel ran:
If I disable the environment variable with export HIPBLASLT_ENABLE_DIRECT_ASSEMBLY=0, the custom kernel is not used.
Integrating the AITER kernel
For the production .co that we want to use, we need information obtained by running the kernel through AITER. Below I describe how I did this (the exact steps will depend on the source of the kernel).
Building AITER: My approach was
Run the test that exercises the kernel:
Kernels are cached after compilation. To remove the cache (important if the printing / kernel changes) I did:
To figure out the kernel arguments, I used the branch https://github.com/newling/aiter/tree/printing_for_kernel_arg_debug. I dumped the kernel arguments as bytes to help verify that the integration glue in hipblaslt is correct.
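A byte-level comparison of this kind can be sketched as below. The argument block here (three int32 sizes and one float32) is made up for illustration and is not the AITER kernel's real argument layout; `pack_args`, `hex_dump`, and `first_mismatch` are hypothetical helper names.

```python
import struct


def pack_args(m, n, k, alpha):
    """Pack a made-up argument block: three little-endian int32 sizes and a float32."""
    return struct.pack("<iiif", m, n, k, alpha)


def hex_dump(buf, width=8):
    """Format bytes as offset-prefixed hex rows for side-by-side reading."""
    rows = []
    for off in range(0, len(buf), width):
        chunk = buf[off:off + width]
        rows.append(f"{off:04x}: " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


def first_mismatch(expected, actual):
    """Return the offset of the first differing byte, or None if identical."""
    for off, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return off
    if len(expected) != len(actual):
        # One buffer is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None
```

Dumping the reference bytes from one side and the bytes built by the glue code on the other, then reporting the first differing offset, narrows a mismatch down to a single field.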
Important to note: getting the kernel arguments to match was not enough for this specific kernel. The scale values also needed to have a special layout (tiling) in HBM. That was the trickiest part of getting the kernel to give numerically correct results in hipblaslt (thanks to @bethune-bryant and @bnemanich for decoding the AITER logic).
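To illustrate what a tiled layout means in general, the sketch below reorders a row-major matrix so that each tile is stored contiguously. It is a generic example only: the actual AITER scale layout is not described in this PR, and the function name and tile sizes are hypothetical.

```python
def tile_layout(values, rows, cols, tile_rows, tile_cols):
    """Reorder a row-major rows x cols list so each tile is stored contiguously.

    Tiles are visited in row-major order, and elements inside a tile are
    also row-major. Tile sizes are hypothetical, not AITER's.
    """
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    out = []
    for tr in range(0, rows, tile_rows):          # top row of each tile
        for tc in range(0, cols, tile_cols):      # left column of each tile
            for r in range(tr, tr + tile_rows):
                for c in range(tc, tc + tile_cols):
                    out.append(values[r * cols + c])
    return out
```

A kernel that expects such a layout reads wrong values from a plain row-major buffer even when every kernel argument matches, which is consistent with the symptom described above.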