20260115 applyWithType SVE implement #134

Open
sugar-little-bit wants to merge 4 commits into bytedance:main from sugar-little-bit:applyWithType

@sugar-little-bit sugar-little-bit commented Jan 15, 2026

What problem does this PR solve?

Issue Number: close #127

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Added an SVE-vectorized implementation of bytedance::bolt::functions::sparksql::hash::applyWithType.

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Benchmark results: compared with the pre-optimization baseline, there is an overall improvement of 2.5% on the TPC-DS 1 TB dataset.
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Release Note:
- Added an SVE-vectorized implementation of bytedance::bolt::functions::sparksql::hash::applyWithType.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

CLAassistant commented Jan 15, 2026

CLA assistant check
All committers have signed the CLA.

#include <type/Type.h>
#include <vector/ComplexVector.h>
#include <cstdint>
#include <arm_sve.h>
Collaborator

Does this file exist on x86 machines?

Author

No, the arm_sve.h header file does not exist on standard x86 machines, as it is specific to the ARM architecture. I will modify the code to be compatible with both arm64 and x86 architectures.

On x86, the code will fall back to the original logic.
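
One common way to keep a file like this building on both architectures is to guard the include and the SVE path behind the compiler-defined __ARM_FEATURE_SVE macro. The following is a minimal sketch under that assumption, not the PR's actual code: fillSeed is a hypothetical helper that writes a seed value into every slot of a buffer, using SVE when it is available and a plain loop otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Only pull in the SVE header when the compiler targets SVE;
// on x86 (or ARM without SVE) this include is skipped entirely.
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not part of bolt): set out[0..n) to seed.
inline void fillSeed(uint32_t* out, size_t n, uint32_t seed) {
#if defined(__ARM_FEATURE_SVE)
  // SVE path: the predicate from svwhilelt masks off lanes past n,
  // so no scalar tail loop is needed.
  const svuint32_t value = svdup_n_u32(seed);
  for (uint64_t i = 0; i < n; i += svcntw()) {
    const svbool_t pg = svwhilelt_b32_u64(i, static_cast<uint64_t>(n));
    svst1_u32(pg, out + i, value);
  }
#else
  // Fallback path: the original scalar logic, used on x86.
  for (size_t i = 0; i < n; ++i) {
    out[i] = seed;
  }
#endif
}
```

Calling fillSeed(buffer.data(), buffer.size(), 42) on a std::vector<uint32_t> gives the same result on either path; only the instructions used differ.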

bits::forEachSetBit(rows.getBitData(), begin, end, func);
}
#else
rows.applyToSelected([&](int row) { result.set(row, hashSeed); });
Contributor

On x86, does this line compile to vectorized code?

Author

Sorry, it only works on ARM architecture.

Collaborator

Sorry, it only works on ARM architecture.

Let me put the question more clearly. The code you added is a SIMD optimization for the ARM architecture. Did you make this optimization because you observed that rows.applyToSelected([&](int row) { result.set(row, hashSeed); }); is compiled into SIMD instructions on x86 but not on ARM?
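
For readers following the question, here is a rough sketch of the scalar pattern being discussed: visiting each selected row of a bitmap and storing the seed. forEachSelectedRow is a hypothetical stand-in written for illustration, not bolt's applyToSelected or bits::forEachSetBit, and it assumes a GCC or Clang compiler for __builtin_ctzll. Whether a compiler turns a loop like this into SIMD instructions depends on the target and flags, which is what inspecting the generated assembly on each architecture would show.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in (not bolt's API): call fn(row) for every set bit
// in the bitmap `bits`, considering only rows in [0, numRows).
template <typename Fn>
void forEachSelectedRow(const uint64_t* bits, size_t numRows, Fn fn) {
  for (size_t word = 0; word * 64 < numRows; ++word) {
    uint64_t w = bits[word];
    const size_t remaining = numRows - word * 64;
    if (remaining < 64) {
      // Mask off bits that lie beyond the last row.
      w &= (uint64_t{1} << remaining) - 1;
    }
    while (w != 0) {
      // Index of the lowest set bit, then clear that bit.
      fn(word * 64 + static_cast<size_t>(__builtin_ctzll(w)));
      w &= w - 1;
    }
  }
}

// Scalar seeding of the selected rows, mirroring the shape of
// rows.applyToSelected([&](int row) { result.set(row, hashSeed); }).
inline void seedSelected(const uint64_t* bits, size_t numRows,
                         uint32_t* result, uint32_t hashSeed) {
  forEachSelectedRow(bits, numRows,
                     [&](size_t row) { result[row] = hashSeed; });
}
```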

It seems that a function with similar functionality already exists in bolt/vector/SelectivityVector.h:

const uint64_t* allBits() const {
  return bits_.data();
}

Author

Ok, I have removed it.

@frankobe frankobe added the ARM ARM CPU specific support label Mar 9, 2026
Labels

ARM ARM CPU specific support

Development

Successfully merging this pull request may close these issues.

[Feature] Add SVE optimization for applyWithType function in Hash.cpp