[Feature] Base/Row alignment support for Tensors #136
Open
zacharyvincze wants to merge 45 commits into ROCm:develop from zacharyvincze:zv/feature/add-memalign-class
45 commits
742920c Add MemAlignment to Tensor constructors (zacharyvincze)
f345ea0 Document all constructors for roccv::Tensor (zacharyvincze)
cf99a9c Add MemAlign parameter to Tensor::CalcStrides() (zacharyvincze)
21f0cb8 Added Tensor copy constructor and Tensor::dataSize() implementation (zacharyvincze)
0a761ce Remove copy constructor for roccv::Tensor (zacharyvincze)
fd84334 Implement memory padding and alignment for tensors (zacharyvincze)
c232d1f Fix alignment calculations (zacharyvincze)
c90475b Take padding into account during copies in test helpers (zacharyvincze)
18d0890 Add isContiguous member function for roccv::Tensor (zacharyvincze)
e97207a Remove non-contiguous reshaping from tests (zacharyvincze)
6102764 Update utils.hpp to use memcpy2D to handle tensor padding (zacharyvincze)
12ff5e9 Fix incorrect imageBytes calculation (zacharyvincze)
e562ebc Use HIP streams where possible (zacharyvincze)
cae2e8e Fix CopyMakeBorder and GammaContrast samples (zacharyvincze)
6c42212 Fix WarpPerspective sample (zacharyvincze)
f7f3839 Fix CustomCrop sample (zacharyvincze)
1063baa Fix BilateralFilter sample (zacharyvincze)
612cfb6 Fix CenterCrop sample (zacharyvincze)
b1ae692 Fix/cleanup BndBox sample (zacharyvincze)
e85d3a4 Fix/clean Composite sample (zacharyvincze)
d544ae6 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
1d1879b Fix crop and resize example (zacharyvincze)
b7cff20 Add more information to the help message (zacharyvincze)
ebabcfb Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
370c337 Fix tensor copies for benchmarking suite (zacharyvincze)
3ee9636 Fix tensor padding calculations (zacharyvincze)
8c0b2fb Add documentation to helper function (zacharyvincze)
8073969 Reuse reshape logic (zacharyvincze)
a0fc480 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
97efadb Temp commit (zacharyvincze)
53c02b0 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
1ffdce0 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
6ded6bf Implement tensor reshape/reinterpret methods (zacharyvincze)
fb1cb44 Remove unused MemcpyParams from benchmark helpers (zacharyvincze)
a14435e Add copyFromHost/copyToHost method definitions for Tensor (zacharyvincze)
dd11f3c Use copyTo/copyFromHost implementations for test helpers (zacharyvincze)
8ffcc95 Add tensor copy correctness tests (zacharyvincze)
1ac14a9 Properly convert element-wise strides to byte-wise strides on DLPack (zacharyvincze)
add8a66 Add byte_offset from DLPack to rocCV tensor base pointer (zacharyvincze)
2affddb Add sensible default alignment for CPU allocated tensors (zacharyvincze)
14932ec Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
ff67d89 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
ea91e85 Address PR comments (zacharyvincze)
ae83206 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
30ba649 Merge branch 'develop' into zv/feature/add-memalign-class (zacharyvincze)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| /* | ||
| * Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. | ||
|
Contributor review comment: Change year to 2026. |
||
| * Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| * of this software and associated documentation files (the "Software"), to deal | ||
| * in the Software without restriction, including without limitation the rights | ||
| * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| * copies of the Software, and to permit persons to whom the Software is | ||
| * furnished to do so, subject to the following conditions: | ||
| * | ||
| * The above copyright notice and this permission notice shall be included in | ||
| * all copies or substantial portions of the Software. | ||
| * | ||
| * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
| * THE SOFTWARE. | ||
| */ | ||
|
|
||
| #pragma once | ||
|
|
||
| #include <stdint.h> | ||
|
|
||
| namespace roccv { | ||
|
|
||
| constexpr int32_t ROCCV_CPU_DEFAULT_ALIGNMENT = 64; // Default alignment for CPU memory | ||
|
|
||
| /** | ||
| * @class MemAlignment | ||
| * @brief Class for specifying memory alignment constraints for buffer allocations. | ||
| * | ||
| * The MemAlignment class allows you to specify the memory alignment in bytes that should | ||
| * be used when allocating or working with device or host memory buffers. Proper alignment | ||
| * is important for performance reasons and hardware compatibility, especially for GPU operations. | ||
| * | ||
| * There are two types of alignment constraints: | ||
| * - Base address alignment: Alignment requirements for the starting address of the buffer. | ||
| * - Row address alignment: Alignment requirements for the starting address of each row, which is | ||
| * important for multi-dimensional data (e.g., images or tensors). | ||
| * | ||
| * Both alignment values default to 0, which implies that the system or device default alignment will be used. | ||
| * | ||
| * Example usage: | ||
| * @code | ||
| * roccv::MemAlignment align; | ||
| * align.baseAddr(256).rowAddr(128); | ||
| * @endcode | ||
| * | ||
| * @see roccv::Tensor | ||
| */ | ||
| class MemAlignment { | ||
| public: | ||
| MemAlignment() = default; | ||
|
|
||
| /** | ||
| * @brief Returns the base address alignment. | ||
| * | ||
| * @return The base address alignment. | ||
| */ | ||
| int32_t baseAddr() const; | ||
|
|
||
| /** | ||
| * @brief Returns the row address alignment. | ||
| * | ||
| * @return The row address alignment. | ||
| */ | ||
| int32_t rowAddr() const; | ||
|
|
||
| /** | ||
| * @brief Sets the base address alignment. | ||
| * | ||
| * @param[in] alignment Alignment in bytes. | ||
| * @return A reference to this object, with the base address set. | ||
| */ | ||
| MemAlignment& baseAddr(int32_t alignment); | ||
|
|
||
| /** | ||
| * @brief Sets the row address alignment. | ||
| * | ||
| * @param[in] alignment Alignment in bytes. | ||
| * @return A reference to this object, with the row address set. | ||
| */ | ||
| MemAlignment& rowAddr(int32_t alignment); | ||
|
|
||
| private: | ||
| int32_t m_baseAddrAlignment = 0; | ||
| int32_t m_rowAddrAlignment = 0; | ||
| }; | ||
| } // namespace roccv | ||
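
The header above only declares `MemAlignment`. As a rough illustration of what a row alignment means in practice, the sketch below rounds a packed row size up to the requested alignment to get a padded row stride. The class here is a stand-in with assumed inline bodies, and `alignUp` and `rowStrideBytes` are hypothetical helpers, not functions from this PR. Treating an alignment of 0 as "no padding" is also an assumption made for the sketch; the header documents 0 as "use the system or device default".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for roccv::MemAlignment. The interface mirrors the declaration
// in the diff; the inline bodies are assumptions, not the PR's code.
class MemAlignment {
   public:
    MemAlignment() = default;

    int32_t baseAddr() const { return m_baseAddrAlignment; }
    int32_t rowAddr() const { return m_rowAddrAlignment; }

    // Setters return *this so calls can be chained.
    MemAlignment& baseAddr(int32_t alignment) {
        m_baseAddrAlignment = alignment;
        return *this;
    }
    MemAlignment& rowAddr(int32_t alignment) {
        m_rowAddrAlignment = alignment;
        return *this;
    }

   private:
    int32_t m_baseAddrAlignment = 0;
    int32_t m_rowAddrAlignment = 0;
};

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
// An alignment of 0 is treated as "no padding" (assumption for this sketch).
inline size_t alignUp(size_t value, int32_t alignment) {
    assert(alignment >= 0);
    if (alignment == 0) return value;
    const size_t a = static_cast<size_t>(alignment);
    return (value + a - 1) / a * a;
}

// Hypothetical helper: byte distance between the starts of two consecutive
// rows of an interleaved image with the given width, channels, and element size.
inline size_t rowStrideBytes(size_t width, size_t channels, size_t elemSize,
                             const MemAlignment& align) {
    return alignUp(width * channels * elemSize, align.rowAddr());
}
```

With `align.baseAddr(256).rowAddr(128)`, a 100-pixel, 3-channel U8 row occupies 300 bytes packed and 384 bytes once padded, which is why copies to and from such a buffer have to account for the stride instead of assuming contiguous rows.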
This appears to work only for U8/S8. For other integer types, for example U16, numElements is the number of U16 values, but rocrand_generate_char() generates only numElements U8 values, so only part of the buffer is filled. For integer types, we should set numElements = tensor.dataSize() if only rocrand_generate_char() is used. Otherwise, we need to call rocrand_generate_XYZ() for each integer type.
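
The size mismatch described in this comment can be sketched on the host without rocRAND. In the sketch below, `generateBytes` is a hypothetical stand-in for a byte generator such as `rocrand_generate_char()`, and `fillSizeBytes` and `randomIntegers` are illustrative names, not code from this PR. It assumes a contiguous buffer with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side stand-in for a byte generator: writes exactly
// `n` pseudo-random bytes to `out`.
inline void generateBytes(unsigned char* out, size_t n, uint32_t seed = 12345) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    for (size_t i = 0; i < n; ++i) {
        out[i] = static_cast<unsigned char>(dist(rng));
    }
}

// Number of bytes that must be generated to cover `numElements` values of T.
// For U8/S8 this equals numElements; for U16 it is twice that, and so on.
template <typename T>
constexpr size_t fillSizeBytes(size_t numElements) {
    return numElements * sizeof(T);
}

// Fill a contiguous buffer of integer type T by requesting bytes for the
// whole buffer instead of one byte per element.
template <typename T>
std::vector<T> randomIntegers(size_t numElements) {
    std::vector<T> data(numElements);
    generateBytes(reinterpret_cast<unsigned char*>(data.data()),
                  fillSizeBytes<T>(numElements));
    return data;
}
```

Passing the element count to a byte generator would cover only the first `numElements` bytes of a U16 buffer, which is half of it. Passing the byte size covers all of it, at the cost of relying on every bit pattern being a valid value for the integer type.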