|
Turns out there are a bunch of new |
On the MacBook Air M4Main @ 6e7b616 (ISPC 1.20...)This
|
|
Looks like performance is not restored in 1.28, or we're still doing something wrong. Barely any change compared against 1.27 (which was 24% slower than
This PR @ 00d0256
❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
time: [48.412 ms 48.475 ms 48.539 ms]
change: [+29.098% +29.471% +29.821%] (p = 0.00 < 0.05)
Performance has regressed. |
|
Re-running this test on my host, recompiled on this ISPC version:
❯ ispc --version
Intel(r) Implicit SPMD Program Compiler (Intel(r) ISPC), 1.28.2 (build commit @ 20250924, LLVM 20.1.8)
On latest
❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
time: [46.776 ms 46.875 ms 46.969 ms]
Then, following the suggestion from @Jasper-Bekkers in Traverse-Research/intel-tex-rs-2#42 to only use i32x4 because NEON is 128 bits wide, performance regresses slightly:
❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
time: [48.003 ms 48.101 ms 48.196 ms]
change: [+2.3395% +2.6161% +2.9034%] (p = 0.00 < 0.05)
Performance has regressed.
Also, this M4 chip is supposed to have SME (Scalable Matrix Extension) but not SVE (Scalable Vector Extension) and confirmed with
Perhaps this needs to be reported upstream, as I'm slightly out of ideas on how to best bisect this compiler performance regression. |
|
Just went back in history to generate the blobs for all missing versions: ISPC
|
|
Yeah I closed that PR because later I realized why there was a big delta: I was profiling on battery. |
|
@Jasper-Bekkers Oh I'm also exclusively developing on battery (the perks of Apple putting RTGs in these MacBooks 🤤) but the ±37ms vs ±45ms regression remains consistent. |
|
Hey all, apologies for the delay. I'm going to tag @aneshlya but I would create an issue on the ispc GitHub and link this issue. That's the best way to report these kinds of things right now. |
|
Hi there, I don't have Apple HW to test performance on, but it would be helpful if you can share the ispc code corresponding to the benchmark. I have a theory what ispc commit is guilty. I can create a custom build of ispc for you to test in your environment and check if the regression disappears (if you're OK with it). As Pete mentioned, don't hesitate to submit such bugs to ispc/ispc repo. |
|
Thanks for the reply @pbrubaker. Best to track this at the official ISPC repo as I was merely using this pull request to first bisect and track where the performance regression was happening and/or if our invocation arguments are at fault. Issue is opened at ispc/ispc#3688. @aneshlya I'd be more than happy to try a custom-built |
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of WeightDimensions"][::std::mem::size_of::<WeightDimensions>() - 12usize];
    ["Alignment of WeightDimensions"][::std::mem::align_of::<WeightDimensions>() - 4usize];
bindgen targets the latest Rust version by default. If this is breaking, we should perhaps see whether it can be configured through ispc_compile, and set our rust-version in Cargo.toml accordingly too.
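For context on the generated lines under review, here is a minimal sketch of how this style of layout assertion works. The `WeightDimensions` fields below are hypothetical stand-ins (three `u32` values chosen only to match the 12-byte size and 4-byte alignment in the generated code); the real struct comes from the ISPC header and may differ.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the generated struct. Field names and types are
// assumptions; only the 12-byte size and 4-byte alignment come from the diff.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct WeightDimensions {
    pub field_a: u32,
    pub field_b: u32,
    pub field_c: u32,
}

// Each line indexes a one-element array with `actual - expected`.
// The index is 0 only when the layout matches. A smaller value underflows
// and a larger one is out of bounds, so either mismatch fails const
// evaluation and therefore the build.
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of WeightDimensions"][size_of::<WeightDimensions>() - 12usize];
    ["Alignment of WeightDimensions"][align_of::<WeightDimensions>() - 4usize];
};

// Runtime view of the same two numbers, handy when debugging a mismatch.
pub fn weight_dimensions_layout() -> (usize, usize) {
    (size_of::<WeightDimensions>(), align_of::<WeightDimensions>())
}
```

Because the check lives in an anonymous `const` item, it costs nothing at runtime. Changing a field in the sketch to `u64` should make the build fail, which is the point of the assertion.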
Performance is indeed mostly restored in 1.30, thanks @aneshlya! |
Glad to hear it! |
Thanks for checking! I've also added |
|
That little 2% regression mostly seems to be noise, it fluctuates a bit on retesting, and doesn't seem to be affected by locally compiling the ISPC kernel (as opposed to the one from CI, i.e. with slightly different host detection). The suggested |
https://github.com/ispc/ispc/releases/tag/v1.30.0
https://github.com/Traverse-Research/ispc-downsampler/actions/runs/21717824119