Temporary "const char *" objects can disappear before the parser internals consume them. Moving the strings to parse into a persistent container fixes the problem.
src1 inputs with a different data type cannot be fused.
Also refactored the SDPA primitive integration for better compilation performance. The new kernel currently supports floating-point SDPA only.
This change allows oneDNN to build successfully with GCC 7.x.
Signed-off-by: Zhang fei <zhangfei@iscas.ac.cn>
The Level Zero query currently returns an incorrect result w.r.t. device support for atomics. This commit reverts to using the OpenCL query until the issue is fixed in Level Zero.
3.5 squash list: [FORK][FIX] Corrected brgemm rd_step for bf16 compressed weights
3.5 squash list: [Fork][Fix] Fix avx2 bf16 reorder
[FORK][FEATURE] Enable avx2 jit reorder for bf16 data type
* fix matmul decompress test case
* save tmp
* [FORK][FIX] IP weights compression: scalar scale
* [FORK][FEATURE] InnerProduct primitive: squashed weight decompression
* [FORK][FIX] IP weights compression: max bcast blocking computation
* fix compile issue
* fix crash issue
* try to fix compare issue
* continue fixing some accuracy issues
* fix f4_e2m1
* continue to fix f4_e2m1
* fix conflict on smoke_FC_(2|3)D_I8_sparse
* clean up debug and unused code
* revert this change, should affect test case

Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
Co-authored-by: dmitrygo <dmitry.gorokhov@intel.com>
Co-authored-by: Xuxin, Zeng <xuxin.zeng@intel.com>
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
[FORK][FEATURE] InnerProduct primitive: squashed weight decompression
- allocate aux accumulator regs on the stack
- precompute grouped src sums
- optimize pointer arithmetic
- reduce the number of aux vecs required for the microkernel
After the migration to 3.8, the default value of runtime_scale_t is undef instead of f32.
dzarukin reviewed on Jul 10, 2025:
 }
-inline float load_float_value(data_type_t dt, const void *ptr, dim_t idx) {
+FORCE_INLINE float load_float_value(data_type_t dt, const void *ptr, dim_t idx) {
I would assume this is done for a performance improvement. A better alternative is probably to change the loading method in the kernel/implementation of interest: the switch below kills most of the benefit, since the function was never designed to be performant in any way.
Feel free to resolve, it's just a general observation.
Description
Please include a summary of the change. Please also include relevant motivation and context. See contribution guidelines for more details. If the change fixes an issue not documented in the project's Github issue tracker, please document all steps necessary to reproduce it.
Fixes # (github issue)
Checklist
General
- Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?

Performance improvements
New features
Bug fixes
RFC PR