Skip to content

Latest commit

 

History

History
161 lines (115 loc) · 6.84 KB

File metadata and controls

161 lines (115 loc) · 6.84 KB

Changelog for hipTensor

Full documentation for hipTensor is available at rocm.docs.amd.com/projects/hiptensor.

(Unreleased) hipTensor 2.0.0 for ROCm 7.0.0

Added

  • Added element-wise binary operation support.
  • Added element-wise trinary operation support.
  • Added support for new GPU target gfx950.
  • Added dynamic unary and binary operator support for element-wise operations and permutation.
  • Added a CMake check for f8 datatype availability.
  • Added hiptensorDestroyOperationDescriptor to free all resources related to the provided descriptor.
  • Added hiptensorOperationDescriptorSetAttribute to set attribute of a hiptensorOperationDescriptor_t object.
  • Added hiptensorOperationDescriptorGetAttribute to retrieve an attribute of the provided hiptensorOperationDescriptor_t object.
  • Added hiptensorCreatePlanPreference to allocate the hiptensorPlanPreference_t and enabled users to limit the applicable kernels for a given plan or operation.
  • Added hiptensorDestroyPlanPreference to free all resources related to the provided preference.
  • Added hiptensorPlanPreferenceSetAttribute to set attribute of a hiptensorPlanPreference_t object.
  • Added hiptensorPlanGetAttribute to retrieve information about an already-created plan.
  • Added hiptensorEstimateWorkspaceSize to determine the required workspaceSize for the given operation.
  • Added hiptensorCreatePlan to allocate a hiptensorPlan_t object, select an appropriate kernel for a given operation and prepare a plan that encodes the execution.
  • Added hiptensorDestroyPlan to free all resources related to the provided plan.

Changed

  • Removed architecture support for gfx940 and gfx941.
  • Generalized opaque buffer now for any descriptor.
  • Replaced hipDataType with hiptensorDataType_t for all supported types, for example, HIP_R_32F to HIPTENSOR_R_32F.
  • Replaced hiptensorComputeType_t with hiptensorComputeDescriptor_t for all supported types.
  • Replaced hiptensorInitTensorDescriptor with hiptensorCreateTensorDescriptor.
  • Changed handle type and API usage from *handle to handle.
  • Replaced hiptensorContractionDescriptor_t with hipTensorOperationDescriptor_t.
  • Replaced hiptensorInitContractionDescriptor with hiptensorCreateContraction.
  • Replaced hiptensorContractionFind_t with hiptensorPlanPreference_t.
  • Replaced hiptensorInitContractionFind with hiptensorCreatePlanPreference.
  • Replaced hiptensorContractionGetWorkspaceSize with hiptensorEstimateWorkspaceSize.
  • Replaced HIPTENSOR_WORKSPACE_RECOMMENDED with HIPTENSOR_WORKSPACE_DEFAULT.
  • Replaced hiptensorContractionPlan_t with hiptensorPlan_t.
  • Replaced hiptensorInitContractionPlan with hiptensorCreatePlan.
  • Replaced hiptensorContraction with hiptensorContract.
  • Replaced hiptensorPermutation with hiptensorPermute.
  • Replaced hiptensorReduction with hiptensorReduce.
  • Replaced hiptensorElementwiseBinary with hiptensorElementwiseBinaryExecute.
  • Replaced hiptensorElementwiseTrinary with hiptensorElementwiseTrinaryExecute.
  • Removed function hiptensorReductionGetWorkspaceSize.

hipTensor 1.5.0 for ROCm 6.4.0

Added

  • Added benchmarking suites for contraction, permutation, and reduction. YAML files are categorized into bench and validation folders for organization
  • Added emulation test suites for contraction, permutation, and reduction
  • Support has been added for changing the default data layout using the HIPTENSOR_DEFAULT_STRIDES_COL_MAJOR environment variable

Changed

  • Used GPU_TARGETS instead of AMDGPU_TARGETS in cmakelists.txt
  • Binary sizes can be reduced on supported compilers by using the --offload-compress compiler flag

Optimized

  • Optimized the hyper-parameter selection algorithm for permutation

Resolved issues

  • For CMake bug workaround, set CMAKE_NO_BUILTIN_CHRPATH when BUILD_OFFLOAD_COMPRESS is unset

hipTensor 1.4.0 for ROCm 6.3.0

Added

  • Added support for tensor reduction, including APIs, CPU reference, unit tests, and documentation

Changed

  • ASAN builds only support xnack+ targets.
  • ASAN builds use -mcmodel=large to accommodate library sizes greater than 2GB.
  • Updated the permute backend to accommodate changes to element-wise operations.
  • Updated the actor-critic implementation.

Optimized

  • Split kernel instances to improve build times

Resolved issues

  • Fixed a bug in randomized tensor input data generation.
  • Fixed the default strides calculation to be in column major order.
  • Fixed a small memory leak by properly destroying HIP event objects in tests.
  • Default strides calculations now follow column-major convention.
  • Various documentation formatting updates and fixes.

hipTensor 1.3.0 for ROCm 6.2.0

Additions

  • Added support for tensor permutation of ranks of 2, 3, 4, 5 and 6
  • Added tests for tensor permutation of ranks of 2, 3, 4, 5 and 6
  • Added support for tensor contraction of M6N6K6: M, N, K up to rank 6
  • Added tests for tensor contraction of M6N6K6: M, N, K up to rank 6
  • Added new test YAML parsing to support sequential parameters ordering

Changes

  • Documentation updates for installation, programmer's guide and API reference
  • Prefer amd-llvm-devel package before system LLVM library
  • Preferred compilers changed to CC=amdclang CXX=amdclang++
  • Updated actor-critic selection for new contraction kernel additions

Fixes

  • Fixed LLVM parsing crash
  • Fixed memory consumption issue in complex kernels
  • Work-around implemented for compiler crash during debug build
  • Allow random modes ordering for tensor contractions

hipTensor 1.2.0 for ROCm 6.1.0

Additions

  • API support for permutation of rank 4 tensors: f16 and f32
  • New datatype support in contractions of rank 4: f16, bf16, complex f32, complex f64
  • Added scale and bilinear contraction samples and tests for new supported data types
  • Added permutation samples and tests for f16, f32 types

Fixes

  • Fixed bug in contraction calculation with data type f32

hipTensor 1.1.0 for ROCm 6.0.0

Additions

  • Architecture support for gfx940, gfx941, and gfx942
  • Client tests configuration parameters now support YAML file input format

Changes

  • Doxygen now treats warnings as errors

Fixes

  • Client tests output redirections now behave accordingly
  • Removed dependency static library deployment
  • Security issues for documentation
  • Compile issues in debug mode
  • Corrected soft link for ROCm deployment

hipTensor 1.0.0 for ROCm 5.7.0

Additions

  • Initial prototype enablement of hipTensor library that supports tensor operations
  • Kernel selection support for Default and Actor-Critic algorithms
  • API support for:
    • Definition and contraction of rank 4 tensors
    • Contextual logging and output redirection
    • Kernel selection caching
  • Data type support for f32 and f64
  • Architecture support for gfx908 and gfx90a