## Description

I noticed a TODO comment at `hll/coupon_list.go:193`:

```go
// TODO there must be a more efficient to reinterpret the byte array as an int array
for it := 0; it < couponCount; it++ {
	list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[listIntArrStart+it*4 : listIntArrStart+it*4+4]))
}
```

I'd like to discuss potential optimizations for this code path.
## Research

I investigated how other DataSketches implementations handle this:

| Language | Approach |
| --- | --- |
| Java | `Memory.getIntArray()` - bulk copy using `Unsafe.copyMemory()` |
| C++ | `std::memcpy()` - direct memory copy |
| Go (current) | Loop with `binary.LittleEndian.Uint32()` |
## Benchmark Results

I ran some benchmarks comparing different Go approaches:

| Method | 8 items | 32 items | 128 items | 1024 items |
| --- | --- | --- | --- | --- |
| Current (loop) | 27.6 ns | 75.4 ns | 270 ns | 2153 ns |
| `binary.Read` | 101 ns | 237 ns | 692 ns | 5112 ns |
| Precompute offset | 26.8 ns | 69.9 ns | 253 ns | 2000 ns |
| Loop unrolling | 27.6 ns | 69.7 ns | 248 ns | 1998 ns |
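For anyone who wants to repeat the measurement, a minimal harness along these lines reproduces the loop case without needing a full `_test.go` file (the function name and sizes here are illustrative, not the exact benchmark code I ran):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// benchDecode times the current per-element decoding loop for `count`
// little-endian uint32 values, using testing.Benchmark so it can run
// from an ordinary main().
func benchDecode(count int) testing.BenchmarkResult {
	buf := make([]byte, count*4)
	out := make([]int, count)
	return testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			for it := 0; it < count; it++ {
				out[it] = int(binary.LittleEndian.Uint32(buf[it*4 : it*4+4]))
			}
		}
	})
}

func main() {
	// Prints the iteration count and ns/op for the 1024-item case.
	fmt.Println("1024 items:", benchDecode(1024))
}
```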
Key findings:

- `binary.Read` is actually slower due to reflection and extra allocations
- Pre-computing the offset provides a ~7% improvement
- The current implementation is already quite efficient
## Proposed Changes

### Option A: Minor optimization (pre-compute offset)

```go
base := listIntArrStart
for it := 0; it < couponCount; it++ {
	list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
	base += 4
}
```

- ~7% performance improvement
- Cleaner code
- Remove or update TODO comment
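As a sanity check that the precomputed-offset variant decodes exactly the same values as the current loop, here is a self-contained comparison (the function names are illustrative stand-ins for the code in `coupon_list.go`):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeCurrent mirrors the existing loop: one sub-slice per element,
// recomputing start+it*4 on every iteration.
func decodeCurrent(byteArray []byte, start, count int) []int {
	out := make([]int, count)
	for it := 0; it < count; it++ {
		out[it] = int(binary.LittleEndian.Uint32(byteArray[start+it*4 : start+it*4+4]))
	}
	return out
}

// decodeOffset is the Option A variant: a running offset instead of a
// per-iteration multiply and explicit upper slice bound.
func decodeOffset(byteArray []byte, start, count int) []int {
	out := make([]int, count)
	base := start
	for it := 0; it < count; it++ {
		out[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
		base += 4
	}
	return out
}

func main() {
	buf := []byte{0xEF, 0xBE, 0xAD, 0xDE, 1, 0, 0, 0}
	fmt.Println(decodeCurrent(buf, 0, 2)) // [3735928559 1]
	fmt.Println(decodeOffset(buf, 0, 2))  // [3735928559 1]
}
```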
### Option B: Use the `unsafe` package

```go
src := byteArray[listIntArrStart : listIntArrStart+couponCount*4]
int32Slice := unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), couponCount)
```

- Near-zero overhead (similar to C++ `memcpy`)
- Introduces an `unsafe` dependency
- May have portability concerns (byte order and pointer alignment are host-dependent)
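To make the trade-offs concrete, here is a runnable sketch of the Option B reinterpretation. It assumes a little-endian host, and it relies on the platform tolerating a potentially misaligned `*int32` (true on amd64 and arm64, but not guaranteed by the Go spec); note also that since `couponIntArr` is `[]int`, not `[]int32`, a widening copy would still be needed afterwards:

```go
package main

import (
	"fmt"
	"unsafe"
)

// reinterpretLE views a byte slice as []int32 without copying.
// Correct only on little-endian hosts; the returned slice aliases
// byteArray, so it is invalidated if the backing array changes.
func reinterpretLE(byteArray []byte, start, count int) []int32 {
	src := byteArray[start : start+count*4]
	return unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), count)
}

func main() {
	buf := []byte{1, 0, 0, 0, 2, 0, 0, 0}
	fmt.Println(reinterpretLE(buf, 0, 2)) // [1 2] on little-endian hosts
}
```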
### Option C: Keep current implementation

The current implementation is already quite efficient. Perhaps just update the TODO to:

```go
// Note: Alternative approaches (binary.Read, unsafe) were benchmarked but did not
// provide significant improvements. The current loop-based approach is optimal for Go.
```
## Questions for Maintainers

- Would you accept Option A (minor optimization) as a PR?
- Is the project open to using `unsafe` for performance-critical paths?
- Should the TODO be updated/removed to reflect that the current approach is adequate?
## Environment

- Go version: 1.25.5
- OS: Windows
- CPU: Intel Core i7-10700K
I'm happy to submit a PR for any of these options based on your feedback!