Hugobros3 reviewed Aug 5, 2024
aql.kernarg_address = kernel_info.kernarg_segment;
aql.private_segment_size = kernel_info.private_segment_size;
aql.group_segment_size = kernel_info.group_segment_size;
aql.group_segment_size = (kernel_info.group_segment_size + 15) / 16 * kernel_info.group_segment_size + launch_params.lmem;
Contributor
What is the relevance of this change (the rounding part)?
Contributor
Author
It's to account for padding to align the start of the dynamic allocation.
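A minimal sketch of the rounding discussed here, assuming the intent is to round the static group segment size up to a 16-byte boundary before appending the dynamic allocation. The helpers `align_up` and `total_group_segment_size` are hypothetical names for illustration, not part of the runtime. Note that the usual round-up idiom multiplies by the alignment (16) after the division, whereas the quoted line multiplies by `kernel_info.group_segment_size`, which may be worth double-checking.

```cpp
#include <cstdint>

// Hypothetical helper: round `size` up to the next multiple of `alignment`.
// Assumes alignment > 0.
constexpr uint32_t align_up(uint32_t size, uint32_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: static group memory padded to 16 bytes,
// followed by the dynamically requested part.
constexpr uint32_t total_group_segment_size(uint32_t static_size,
                                            uint32_t dynamic_size) {
    return align_up(static_size, 16) + dynamic_size;
}
```

For example, a static size of 20 bytes is padded to 32, so a 100-byte dynamic request gives 132 bytes in total, and the dynamic region starts at a 16-byte-aligned offset.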
stlemme reviewed Aug 5, 2024
#[import(cc = "thorin", name = "opencl")] fn opencl_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin", name = "amdgpu_hsa")] fn amdgpu_hsa_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin", name = "amdgpu_pal")] fn amdgpu_pal_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin")] fn local_memory() -> &mut addrspace(3)[u8];
Member
Rename that to shared_memory_base to be more specific
Contributor
Author
We decided to call it "local memory" since that's a more adequate/common moniker outside of CUDA. But we can call it local_memory_base() I guess.
Comment on lines 48 to 52
fn @@cuda(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { cuda_with_lmem(dev, grid, block, 0, body) }
fn @@nvvm(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { nvvm_with_lmem(dev, grid, block, 0, body) }
fn @@opencl(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { opencl_with_lmem(dev, grid, block, 0, body) }
fn @@amdgpu_hsa(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_hsa_with_lmem(dev, grid, block, 0, body) }
fn @@amdgpu_pal(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_pal_with_lmem(dev, grid, block, 0, body) }
Member
@@ should be @ at the function
This adds a parameter to allocate a given amount of dynamic shared memory upon kernel launch. Wrapper functions that just pass 0 are provided for backwards compatibility with existing code. Currently implemented for CUDA only; other platforms will error.
Corresponding thorin changes: AnyDSL/thorin#144
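The backwards-compatibility pattern described above can be sketched in C++ as follows. This is an illustration only: `Dim3`, `launch_with_lmem`, `launch`, and `last_lmem_bytes` are hypothetical names, and the stub merely records its arguments and runs the body on the host instead of launching a kernel.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical 3D extent, standing in for the (i32, i32, i32) tuples.
struct Dim3 { int32_t x, y, z; };

// Last dynamic local memory size seen by the stub (illustration only).
static int32_t last_lmem_bytes = -1;

// Hypothetical extended entry point: takes the dynamic local memory size
// in bytes. A real backend would forward `lmem_bytes` to the platform's
// launch call; this stub only records it.
void launch_with_lmem(int32_t dev, Dim3 grid, Dim3 block,
                      int32_t lmem_bytes, const std::function<void()>& body) {
    (void)dev; (void)grid; (void)block;
    last_lmem_bytes = lmem_bytes;
    body();  // stand-in for executing the kernel body
}

// Legacy signature preserved: existing callers request no dynamic memory.
void launch(int32_t dev, Dim3 grid, Dim3 block,
            const std::function<void()>& body) {
    launch_with_lmem(dev, grid, block, 0, body);
}
```

Existing call sites keep compiling against `launch`, while new code that needs dynamic local memory calls `launch_with_lmem` with an explicit byte count.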