[Feature] Base/Row alignment support for Tensors#136
[Feature] Base/Row alignment support for Tensors#136zacharyvincze wants to merge 45 commits intoROCm:developfrom
Conversation
Review: [Feature] Base/Row alignment support for TensorsMajor feature addition for Tensor memory layout: Key additions:
Assessment: Needs Review - Large architectural change. This is a substantial PR (+2010/-1165 lines across 24 files) that changes core tensor semantics. The approach looks solid - explicit handling of padding makes the API clearer. Questions for maintainers:
Well documented and includes tests. Ready for deep review. |
There was a problem hiding this comment.
Pull request overview
This PR adds base-pointer and per-row alignment (padding) support to roccv::Tensor, updating stride/reshape semantics and introducing host<->device copy helpers that are padding-aware. It also updates samples, Python bindings, benchmarks, and tests to operate correctly with non-contiguous (row-padded) tensors.
Changes:
- Introduce
MemAlignmentand updateTensorallocation/stride/reshape logic to support row/base alignment and non-contiguous layouts. - Add
Tensor::copyFromHost/Tensor::copyToHostand refactor tests/helpers/samples to use padding-aware copies. - Update Python DLPack import to honor provided strides (and convert element-strides to byte-strides), with a contiguous fallback.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/core/tensor.cpp |
Core implementation updates: alignment-aware stride calc, reshape for non-contiguous tensors, host copy helpers. |
include/core/tensor.hpp |
Public API updates for alignment-aware construction + new copy/contiguity helpers. |
include/core/mem_alignment.hpp / src/core/mem_alignment.cpp |
New alignment configuration type used during tensor allocation. |
include/core/utils.hpp |
New low-level helpers (power-of-two + align-up) used for alignment/stride logic. |
src/core/tensor_storage.cpp / include/core/tensor_storage.hpp |
Expose allocator used by TensorStorage. |
src/op_non_max_suppression.cpp |
Update reshape call sites to new reshape(dtype, shape) signature. |
python/src/py_tensor.cpp / python/include/py_tensor.hpp |
DLPack import correctness fixes + documentation/typing cleanup. |
tests/roccv/cpp/src/tests/core/tensor/test_tensor.cpp |
Add/adjust tensor tests for padding/reshape/copy helpers. |
tests/roccv/cpp/include/test_helpers.hpp |
Refactor host<->device copy helpers to use new tensor copy APIs. |
samples/common/utils.hpp |
Add batch image load/write utilities that respect tensor padding. |
samples/*.cpp (multiple) |
Update CLI parsing + use new image IO helpers (padding-aware). |
benchmarks/src/roccv/roccv_bench_helpers.cpp |
Update benchmark filling logic to account for padded allocation sizes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #136 +/- ##
===========================================
+ Coverage 74.26% 74.32% +0.06%
===========================================
Files 79 82 +3
Lines 3357 3548 +191
Branches 738 776 +38
===========================================
+ Hits 2493 2637 +144
- Misses 382 416 +34
- Partials 482 495 +13
🚀 New features to boost your workflow:
|
| if constexpr (std::is_integral_v<T>) { | ||
| rocrand_generate_char(m_gen, static_cast<unsigned char*>(tensor_data.basePtr()), | ||
| tensor.shape().size() * tensor.dtype().size()); | ||
| rocrand_generate_char(m_gen, static_cast<unsigned char*>(tensor_data.basePtr()), numElements); |
There was a problem hiding this comment.
This appears to only work for U8/S8. For other integer types, for example U16, numElements is the number of U16 values. Then rocrand_generate_char() only generates numElements of U8 values. For integer types, we should set numElements = tensor.dataSize() if only rocrand_generate_char() is used. Otherwise, we need to call rocrand_generate_XYZ() for each integer type.
| @@ -0,0 +1,91 @@ | |||
| /* | |||
| * Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved. | |||
PR Description
General
roccv::Tensor.MemAlignclass which is used as a parameter when allocating memory for Tensors. This specifies what the row alignment and base alignment should be for newly created tensors.Samples
Tensors
copyToHostandcopyFromHostmethods to the tensor, which accepts and produces contiguous host buffers. Since padding makes copying memory from the host to the device non-trivial, this allows the user to perform a copy which is agnostic to the underlying row padding that a tensor may have.Python bindings (rocpycv)
Tests
Tensor::copyFromHost,Tensor::copyToHostdirectly instead of using duplicate code.roccv::Tensorare covered. Special attention is paid to the reshape method, since this has undergone the most changes.