Full documentation for hipTensor is available at rocm.docs.amd.com/projects/hiptensor.
- Added element-wise binary operation support.
- Added element-wise trinary operation support.
- Added support for new GPU target gfx950.
- Added dynamic unary and binary operator support for element-wise operations and permutation.
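The element-wise operations compose per-operand unary operators with pairwise binary operators. The exact hipTensor signatures differ, but the data flow can be sketched in plain Python (all names and the exact composition order here are illustrative assumptions, not the library API):

```python
def elementwise_trinary(alpha, A, op_a, beta, B, op_b, gamma, C, op_c,
                        op_ab, op_abc):
    """Illustrative element-wise trinary semantics (names are assumptions):
    D[i] = op_abc(op_ab(op_a(alpha * A[i]), op_b(beta * B[i])),
                  op_c(gamma * C[i]))
    """
    return [op_abc(op_ab(op_a(alpha * a), op_b(beta * b)), op_c(gamma * c))
            for a, b, c in zip(A, B, C)]
```

With identity unary operators and addition for both binary operators, this reduces to the familiar `D = alpha*A + beta*B + gamma*C`.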
- Added a CMake check for f8 datatype availability.
- Added `hiptensorDestroyOperationDescriptor` to free all resources related to the provided descriptor.
- Added `hiptensorOperationDescriptorSetAttribute` to set an attribute of a `hiptensorOperationDescriptor_t` object.
- Added `hiptensorOperationDescriptorGetAttribute` to retrieve an attribute of the provided `hiptensorOperationDescriptor_t` object.
- Added `hiptensorCreatePlanPreference` to allocate a `hiptensorPlanPreference_t` object and enable users to limit the applicable kernels for a given plan or operation.
- Added `hiptensorDestroyPlanPreference` to free all resources related to the provided preference.
- Added `hiptensorPlanPreferenceSetAttribute` to set an attribute of a `hiptensorPlanPreference_t` object.
- Added `hiptensorPlanGetAttribute` to retrieve information about an already-created plan.
- Added `hiptensorEstimateWorkspaceSize` to determine the required workspace size for the given operation.
- Added `hiptensorCreatePlan` to allocate a `hiptensorPlan_t` object, select an appropriate kernel for the given operation, and prepare a plan that encodes the execution.
- Added `hiptensorDestroyPlan` to free all resources related to the provided plan.
- Removed architecture support for gfx940 and gfx941.
- Generalized the opaque buffer to work with any descriptor.
- Replaced `hipDataType` with `hiptensorDataType_t` for all supported types; for example, `HIP_R_32F` is now `HIPTENSOR_R_32F`.
- Replaced `hiptensorComputeType_t` with `hiptensorComputeDescriptor_t` for all supported types.
- Replaced `hiptensorInitTensorDescriptor` with `hiptensorCreateTensorDescriptor`.
- Changed the handle type and API usage from `*handle` to `handle`.
- Replaced `hiptensorContractionDescriptor_t` with `hiptensorOperationDescriptor_t`.
- Replaced `hiptensorInitContractionDescriptor` with `hiptensorCreateContraction`.
- Replaced `hiptensorContractionFind_t` with `hiptensorPlanPreference_t`.
- Replaced `hiptensorInitContractionFind` with `hiptensorCreatePlanPreference`.
- Replaced `hiptensorContractionGetWorkspaceSize` with `hiptensorEstimateWorkspaceSize`.
- Replaced `HIPTENSOR_WORKSPACE_RECOMMENDED` with `HIPTENSOR_WORKSPACE_DEFAULT`.
- Replaced `hiptensorContractionPlan_t` with `hiptensorPlan_t`.
- Replaced `hiptensorInitContractionPlan` with `hiptensorCreatePlan`.
- Replaced `hiptensorContraction` with `hiptensorContract`.
- Replaced `hiptensorPermutation` with `hiptensorPermute`.
- Replaced `hiptensorReduction` with `hiptensorReduce`.
- Replaced `hiptensorElementwiseBinary` with `hiptensorElementwiseBinaryExecute`.
- Replaced `hiptensorElementwiseTrinary` with `hiptensorElementwiseTrinaryExecute`.
- Removed the `hiptensorReductionGetWorkspaceSize` function.
- Added benchmarking suites for contraction, permutation, and reduction. YAML files are categorized into bench and validation folders for organization.
- Added emulation test suites for contraction, permutation, and reduction
- Added support for changing the default data layout using the `HIPTENSOR_DEFAULT_STRIDES_COL_MAJOR` environment variable
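When strides are not supplied, the library fills them in from the extents using one packing convention or the other. The effect of the two conventions can be sketched with a small helper (illustrative only, not a library API):

```python
def packed_strides(extents, col_major=True):
    """Packed strides for a dense tensor with the given extents.

    Column-major: mode 0 varies fastest (stride 1);
    row-major: the last mode varies fastest.
    Illustrative helper, not part of the hipTensor API.
    """
    strides = [0] * len(extents)
    s = 1
    order = range(len(extents)) if col_major else reversed(range(len(extents)))
    for i in order:
        strides[i] = s
        s *= extents[i]
    return strides

print(packed_strides([2, 3, 4], col_major=True))   # column-major packing
print(packed_strides([2, 3, 4], col_major=False))  # row-major packing
```

For extents `[2, 3, 4]`, column-major packing yields strides `[1, 2, 6]`, while row-major packing yields `[12, 4, 1]`.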
- Used `GPU_TARGETS` instead of `AMDGPU_TARGETS` in `CMakeLists.txt`
- Binary sizes can be reduced on supported compilers by using the `--offload-compress` compiler flag
- Optimized the hyper-parameter selection algorithm for permutation
- As a workaround for a CMake bug, set `CMAKE_NO_BUILTIN_CHRPATH` when `BUILD_OFFLOAD_COMPRESS` is unset
- Added support for tensor reduction, including APIs, CPU reference, unit tests, and documentation
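A tensor reduction collapses some modes of the input while keeping the others. The hipTensor API operates on device buffers with descriptors, but the underlying semantics can be sketched in plain Python (function name, signature, and the row-major flat layout are illustrative assumptions):

```python
from itertools import product
from math import prod

def reduce_modes(data, extents, keep, op, init):
    """Reduce a flat row-major tensor over all modes not listed in `keep`.

    Illustrative semantics only, not the hipTensor signature:
    out[kept indices] = fold of `op` over the reduced modes.
    """
    out_extents = [extents[m] for m in keep]

    def offset(idx, ext):
        # Row-major linear offset of a multi-index within extents `ext`.
        off = 0
        for i, e in zip(idx, ext):
            off = off * e + i
        return off

    out = [init] * prod(out_extents)  # prod([]) == 1: full reduction to a scalar
    for full_idx in product(*(range(e) for e in extents)):
        keep_idx = tuple(full_idx[m] for m in keep)
        o = offset(keep_idx, out_extents)
        out[o] = op(out[o], data[offset(full_idx, extents)])
    return out
```

For a 2x3 tensor, keeping mode 0 with an additive reduction produces the two row sums; keeping no modes reduces to a single scalar.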
- ASAN builds only support xnack+ targets.
- ASAN builds use `-mcmodel=large` to accommodate library sizes greater than 2 GB.
- Updated the permute backend to accommodate changes to element-wise operations.
- Updated the actor-critic implementation.
- Split kernel instances to improve build times
- Fixed a bug in randomized tensor input data generation.
- Fixed the default strides calculation to be in column major order.
- Fixed a small memory leak by properly destroying HIP event objects in tests.
- Default strides calculations now follow column-major convention.
- Various documentation formatting updates and fixes.
- Added support for tensor permutation of ranks 2, 3, 4, 5, and 6
- Added tests for tensor permutation of ranks 2, 3, 4, 5, and 6
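A permutation reorders a tensor's modes, so each output element is read from the input at a remapped multi-index. A reference implementation for flat, row-major storage can be sketched as follows (names and layout are illustrative, not the hipTensor API):

```python
from itertools import product

def permute(data, extents, perm):
    """Reference permutation of a dense row-major tensor stored flat.

    perm[i] names the input mode that supplies output mode i, so in
    index terms: out[j0, j1, ...] = in[...] with in-index[perm[i]] = j_i.
    Illustrative only, not the hipTensor signature.
    """
    out_extents = [extents[p] for p in perm]

    def offset(idx, ext):
        # Row-major linear offset of a multi-index within extents `ext`.
        off = 0
        for i, e in zip(idx, ext):
            off = off * e + i
        return off

    out = [0] * len(data)
    for out_idx in product(*(range(e) for e in out_extents)):
        in_idx = [0] * len(extents)
        for i, p in enumerate(perm):
            in_idx[p] = out_idx[i]
        out[offset(out_idx, out_extents)] = data[offset(in_idx, extents)]
    return out, out_extents
```

With `perm = [1, 0]` on a rank-2 tensor this is an ordinary transpose; higher ranks follow the same index remapping.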
- Added support for M6N6K6 tensor contraction: M, N, and K each up to rank 6
- Added tests for M6N6K6 tensor contraction: M, N, and K each up to rank 6
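The bilinear contraction computed here generalizes a scaled matrix multiply accumulated into a third tensor. The rank-2 special case makes the semantics concrete (function name and nested-list layout are illustrative, not the library API); M6N6K6 extends the single m, n, and k indices to groups of up to six modes each:

```python
def bilinear_contraction(alpha, A, B, beta, C):
    """Rank-2 sketch of a bilinear contraction:
    D[m][n] = alpha * sum_k A[m][k] * B[k][n] + beta * C[m][n]
    Illustrative only; hipTensor generalizes m, n, k to mode groups.
    """
    M, K, N = len(A), len(B), len(B[0])
    return [[alpha * sum(A[m][k] * B[k][n] for k in range(K)) + beta * C[m][n]
             for n in range(N)]
            for m in range(M)]
```

Setting `beta = 0` gives the scale contraction `D = alpha * A * B`.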
- Added new test YAML parsing to support sequential parameter ordering
- Documentation updates for installation, the programmer's guide, and the API reference
- Prefer the amd-llvm-devel package over the system LLVM library
- Changed the preferred compilers to `CC=amdclang` and `CXX=amdclang++`
- Updated actor-critic selection for new contraction kernel additions
- Fixed LLVM parsing crash
- Fixed memory consumption issue in complex kernels
- Implemented a workaround for a compiler crash during debug builds
- Allow arbitrary mode ordering for tensor contractions
- API support for permutation of rank 4 tensors: f16 and f32
- New datatype support in contractions of rank 4: f16, bf16, complex f32, complex f64
- Added scale and bilinear contraction samples and tests for new supported data types
- Added permutation samples and tests for f16, f32 types
- Fixed bug in contraction calculation with data type f32
- Architecture support for gfx940, gfx941, and gfx942
- Client tests configuration parameters now support YAML file input format
- Doxygen now treats warnings as errors
- Client test output redirection now behaves as expected
- Removed deployment of dependency static libraries
- Fixed security issues in the documentation
- Fixed compile issues in debug mode
- Corrected soft link for ROCm deployment
- Initial prototype enablement of hipTensor library that supports tensor operations
- Kernel selection support for Default and Actor-Critic algorithms
- API support for:
- Definition and contraction of rank 4 tensors
- Contextual logging and output redirection
- Kernel selection caching
- Data type support for f32 and f64
- Architecture support for gfx908 and gfx90a