part: optimize node16 findIndex with SIMD instructions #145
Draft
dylandreimerink wants to merge 1 commit into main
Go v1.26 introduced the experimental simd/archsimd package. This package
provides a convenient API for using SIMD instructions in Go on
supported architectures.
The academic paper on which our part implementation is based suggested
using SIMD instructions to optimize the findIndex method of node16. This
commit implements this optimization using the simd/archsimd package.
The algorithm loads the node's 16 key bytes into a SIMD register and
broadcasts the search key across all 16 lanes of a second register.
A SIMD equality comparison between the two registers yields a mask in
which each bit indicates whether the corresponding lane matched. Bits
for unused lanes are masked off, in case the node16 holds fewer than 16
keys. If the resulting mask is zero, the same steps are repeated with a
greater-than comparison to determine the index at which the search key
would have to be inserted to maintain sorted order. In both cases,
counting the trailing zeros of the mask gives the array index.
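The mask logic described above can be sketched in portable Go. This is a scalar emulation that builds the two masks one lane at a time, not the simd/archsimd code from this commit. The function name, its signature, and the choice to return `size` when no stored key is greater than the search key are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// findIndex16 is a hypothetical scalar stand-in for the SIMD version.
// The first size entries of keys must be sorted in ascending order.
// It returns the index of key, or the index at which key would have to
// be inserted, plus a flag telling whether key was found.
func findIndex16(keys *[16]byte, size int, key byte) (int, bool) {
	// One bit per used lane; bits for lanes >= size stay cleared.
	valid := (uint16(1) << uint(size)) - 1

	// Emulate the two lane-wise comparisons. A SIMD implementation
	// would produce each mask with a single compare instruction.
	var eq, gt uint16
	for i, k := range keys {
		if k == key {
			eq |= 1 << i
		}
		if k > key {
			gt |= 1 << i
		}
	}

	// Equality case: the lowest set bit is the matching index.
	if m := eq & valid; m != 0 {
		return bits.TrailingZeros16(m), true
	}
	// Greater-than case: the lowest set bit is the insertion index.
	if m := gt & valid; m != 0 {
		return bits.TrailingZeros16(m), false
	}
	// Assumption: with no greater key, insert after the last used lane.
	return size, false
}

func main() {
	keys := [16]byte{10, 20, 30, 40} // remaining 12 lanes are unused zeros
	fmt.Println(findIndex16(&keys, 4, 30)) // 2 true
	fmt.Println(findIndex16(&keys, 4, 25)) // 2 false
	fmt.Println(findIndex16(&keys, 4, 0))  // 0 false
}
```

The last call shows why the masking step matters: the unused lanes hold zero and would otherwise report a match for key 0.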
Since the simd/archsimd package is experimental, the implementation
lives in a separate file behind a build tag. This should no longer be
necessary once the simd/archsimd package is stabilized and can be used
without a build tag. Until then, this setup lets anyone opt in to the
optimization by building with `GOEXPERIMENT=simd`.
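A minimal sketch of the fallback half of such a split, assuming the toolchain exposes the experiment as the `goexperiment.simd` build constraint. The file names, the function name, and the linear-scan body are hypothetical and not taken from this commit. The SIMD file would carry the opposite constraint, `//go:build goexperiment.simd`, and define the same function.

```go
// Hypothetical file node16_generic.go, compiled when GOEXPERIMENT=simd
// is not set. A sibling file (say node16_simd.go) would start with
// "//go:build goexperiment.simd" and provide the SIMD findIndex16.

//go:build !goexperiment.simd

package main

import "fmt"

// findIndex16 is the portable fallback: a linear scan over the used
// part of the sorted key array. It returns the index of key, or the
// index at which key would be inserted, and whether key was found.
func findIndex16(keys *[16]byte, size int, key byte) (int, bool) {
	for i := 0; i < size; i++ {
		if keys[i] >= key {
			return i, keys[i] == key
		}
	}
	return size, false
}

func main() {
	keys := [16]byte{10, 20, 30, 40}
	idx, found := findIndex16(&keys, 4, 30)
	fmt.Println(idx, found) // 2 true
}
```

Because both files define the same function under mutually exclusive constraints, callers do not need to know which implementation was compiled in.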
Benchmarks on my machine show findIndex on node16 taking roughly 53%
less time per operation:
```
benchstat before.txt after.txt
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                │ before.txt  │              after.txt              │
                │   sec/op    │   sec/op     vs base                │
_findIndex16-20   7.346n ± 1%   3.454n ± 2%  -52.98% (p=0.000 n=10)
```
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>