Regenerate binaries on ISPC 1.30.0 by MarijnS95 · Pull Request #60 · Traverse-Research/ispc-downsampler

MarijnS95 · 2025-05-21T08:36:44Z

https://github.com/ispc/ispc/releases/tag/v1.30.0
https://github.com/Traverse-Research/ispc-downsampler/actions/runs/21717824119

MarijnS95 · 2025-05-21T09:12:26Z

Turns out there are a bunch of new generic target ISAs to streamline which vector sizes/widths to select, as well as Apple-specific CPU targets :)

MarijnS95 · 2025-05-26T10:30:49Z

On the MacBook Air M4

Main @ `6e7b616` (ISPC 1.20...)

Downsample `square_test.png` using ispc_downsampler
                        time:   [38.827 ms 38.848 ms 38.884 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

This `ispc-1.27` PR @ `3556673`

Downsample `square_test.png` using ispc_downsampler
                        time:   [48.220 ms 48.253 ms 48.287 ms]
                        change: [+24.103% +24.190% +24.278%] (p = 0.00 < 0.05)
                        Performance has regressed.

Recompiling locally on the M4 Air (ispc 1.27.0 from brew, using `cargo b -rF ispc`):

Downsample `square_test.png` using ispc_downsampler
                        time:   [46.576 ms 46.586 ms 46.596 ms]
                        change: [-3.5237% -3.4550% -3.3855%] (p = 0.00 < 0.05)
                        Performance has improved.

That's a significant performance deficit, which we should investigate before merging. Playing around with the new CPU flags from Twinklebear/ispc-rs#42, switching to the generic ISAs, or removing `.target_isas()` altogether to compile natively for the host all yield no improvement.
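As a sanity check on the reported change, the relative slowdown can be recomputed from the two median times above. This is a minimal sketch; `percent_change` is a hypothetical helper, and Criterion's own figure comes from its statistical analysis, so it need not match to the last digit.

```rust
/// Relative change of `new_ms` versus `baseline_ms`, in percent.
/// Positive values mean the new measurement is slower.
fn percent_change(baseline_ms: f64, new_ms: f64) -> f64 {
    (new_ms - baseline_ms) / baseline_ms * 100.0
}
```

With the medians from this comment, `percent_change(38.848, 48.253)` comes out at roughly 24.2, in line with the `+24.190%` Criterion printed.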

Funny thing is, with the ISPC test this M4 Air whines a little, but it doesn't during resize 😓

MarijnS95 · 2025-08-13T20:45:16Z

Looks like performance is not restored in 1.28, or we're still doing something wrong. Barely any change compared against 1.27 (which was 24% slower than main per the above):

This PR @ `00d0256`

❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
                        time:   [48.412 ms 48.475 ms 48.539 ms]
                        change: [+29.098% +29.471% +29.821%] (p = 0.00 < 0.05)
                        Performance has regressed.

MarijnS95 · 2025-12-24T09:13:44Z

Re-running this test on my host, recompiled on this ISPC version:

❯ ispc --version
Intel(r) Implicit SPMD Program Compiler (Intel(r) ISPC), 1.28.2 (build commit  @ 20250924, LLVM 20.1.8)

On latest main @ f2ddfab (but not using those prebuilts)

❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
                        time:   [46.776 ms 46.875 ms 46.969 ms]

Then following the suggestion from @Jasper-Bekkers in Traverse-Research/intel-tex-rs-2#42 to only use `i32x4`, because NEON is 128 bits wide, slightly regresses performance:

❯ cargo bench
Downsample `square_test.png` using ispc_downsampler
                        time:   [48.003 ms 48.101 ms 48.196 ms]
                        change: [+2.3395% +2.6161% +2.9034%] (p = 0.00 < 0.05)
                        Performance has regressed.
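For context on the `i32x4` suggestion: the idea is that a 128-bit NEON register holds exactly four 32-bit lanes, so a 4-wide target maps onto one register. A minimal sketch of that arithmetic follows; the `lanes` helper is hypothetical and purely illustrative.

```rust
/// Number of lanes of `element_bits` each that fit in one SIMD register
/// of `register_bits`. Illustrative only; not part of any ISPC API.
const fn lanes(register_bits: u32, element_bits: u32) -> u32 {
    register_bits / element_bits
}
```

Here `lanes(128, 32)` is 4, which is where the `x4` in `i32x4` comes from. As the measurement above shows, matching the register width did not translate into a speedup for this kernel.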

Also, this M4 chip is supposed to have SME (Scalable Matrix Extension) but not SVE (Scalable Vector Extension), which I confirmed with `sysctl -a hw.optional` (NEON is confirmed as well).

Perhaps this needs to be reported upstream as I'm slightly out of ideas how to best bisect this compiler performance regression.

MarijnS95 · 2025-12-24T10:38:36Z

Just went back in history to generate the blobs for all missing versions:

ISPC `1.23` @ `754d4bf`

Downsample `square_test.png` using ispc_downsampler
                        time:   [37.430 ms 37.550 ms 37.666 ms]

ISPC `1.24` @

Downsample `square_test.png` using ispc_downsampler
                        time:   [37.180 ms 37.317 ms 37.454 ms]
                        change: [-1.1514% -0.6188% -0.1473%] (p = 0.01 < 0.05)
                        Change within noise threshold.

ISPC `1.25.3`

Downsample `square_test.png` using ispc_downsampler
                        time:   [38.024 ms 38.151 ms 38.315 ms]
                        change: [+1.6876% +2.2352% +2.8037%] (p = 0.00 < 0.05)
                        Performance has regressed.

ISPC `1.26`

Downsample `square_test.png` using ispc_downsampler
                        time:   [49.251 ms 49.422 ms 49.588 ms]
                        change: [+29.523% +30.093% +30.690%] (p = 0.00 < 0.05)
                        Performance has regressed.

1.26 is where this regression happened.
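The bisection above amounts to walking the per-version medians and flagging the first jump that is clearly outside noise. A minimal sketch follows, using the medians reported in this comment; the `first_regression` helper and the 10% threshold are assumptions for illustration, not something used in the PR.

```rust
/// Median times (ms) per ISPC version, copied from the benchmark output above.
const MEDIANS_MS: [(&str, f64); 4] = [
    ("1.23", 37.550),
    ("1.24", 37.317),
    ("1.25.3", 38.151),
    ("1.26", 49.422),
];

/// Returns the first version whose median is more than `threshold`
/// (as a fraction, e.g. 0.10 for 10%) slower than its predecessor.
fn first_regression<'a>(runs: &[(&'a str, f64)], threshold: f64) -> Option<&'a str> {
    runs.windows(2).find_map(|pair| {
        let (_, prev_ms) = pair[0];
        let (version, ms) = pair[1];
        // Compare each run against the one directly before it.
        ((ms - prev_ms) / prev_ms > threshold).then_some(version)
    })
}
```

Calling `first_regression(&MEDIANS_MS, 0.10)` yields `Some("1.26")`: the 1.24 to 1.25.3 step is only about 2% and stays below the threshold.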

Turns out the 1.26 release is exactly where a bunch of Apple improvements were announced. Unfortunately, playing with the new `--darwin-version-min` flag, or the new CPU targets mentioned above (which are only available up to A17, the "predecessor" to M4 in the iPhone space), doesn't make a difference. I couldn't immediately find whether those iPhone SKUs support vector extensions at all..?

Jasper-Bekkers · 2025-12-24T10:57:38Z

Yeah, I closed that PR because I later realized why there was a big delta: I was profiling on battery.

MarijnS95 · 2025-12-24T11:17:02Z

@Jasper-Bekkers Oh I'm also exclusively developing on battery (the perks of Apple putting RTGs in these MacBooks 🤤) but the ±37ms vs ±45ms regression remains consistent.

pbrubaker · 2026-01-06T19:35:49Z

Hey all, apologies for the delay. I'm going to tag @aneshlya but I would create an issue on the ispc GitHub and link this issue. That's the best way to report these kinds of things right now.

aneshlya · 2026-01-07T00:17:46Z

Hi there, I don't have Apple HW to test performance on, but it would be helpful if you could share the ispc code corresponding to the benchmark. I have a theory about which ispc commit is responsible. I can create a custom build of ispc for you to test in your environment and check whether the regression disappears (if you're OK with it).

As Pete mentioned, don't hesitate to submit such bugs to ispc/ispc repo.

MarijnS95 · 2026-01-07T11:54:14Z

Thanks for the reply @pbrubaker. Best to track this at the official ISPC repo as I was merely using this pull request to first bisect and track where the performance regression was happening and/or if our invocation arguments are at fault. Issue is opened at ispc/ispc#3688.

@aneshlya I'd be more than happy to try a custom-built ispc on these kernels, thanks!

MarijnS95 · 2026-02-05T15:40:01Z

src/ispc/downsample_ispc.rs

+#[allow(clippy::unnecessary_operation, clippy::identity_op)]
+const _: () = {
+    ["Size of WeightDimensions"][::std::mem::size_of::<WeightDimensions>() - 12usize];
+    ["Alignment of WeightDimensions"][::std::mem::align_of::<WeightDimensions>() - 4usize];


bindgen targets the latest Rust version by default. We should perhaps see whether that can be configured through `ispc_compile` if this is breaking, and set our `rust-version` in `Cargo.toml` accordingly too.
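For readers unfamiliar with this form of generated layout check, here is a minimal sketch of how it works. The `WeightDimensions` definition below is a hypothetical stand-in (the real fields are not shown in this thread), chosen only so that it is 12 bytes with 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the generated struct: three `u32`s give
// 12 bytes with 4-byte alignment. The real field layout is an assumption.
#[repr(C)]
struct WeightDimensions {
    values: [u32; 3],
}

// Compile-time layout assertion: indexing a one-element array with
// `actual - expected` only compiles when the difference is exactly 0.
// Any mismatch is an out-of-bounds index or an underflow during constant
// evaluation, and the string literal shows up in the compiler error.
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of WeightDimensions"][size_of::<WeightDimensions>() - 12usize];
    ["Alignment of WeightDimensions"][align_of::<WeightDimensions>() - 4usize];
};
```

Because the check runs during constant evaluation rather than as a `#[test]` function, it relies on const features of the Rust version being targeted, which is presumably why the choice of target version matters here.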

MarijnS95 · 2026-02-05T15:49:13Z

Downsample `square_test.png` using ispc_downsampler
                        time:   [39.530 ms 39.542 ms 39.554 ms]
                        change: [+2.1826% +2.2370% +2.2959%] (p = 0.00 < 0.05)
                        Performance has regressed.

Performance is indeed mostly restored in 1.30, thanks @aneshlya!

pbrubaker · 2026-02-05T17:00:24Z

> Downsample `square_test.png` using ispc_downsampler
>                         time:   [39.530 ms 39.542 ms 39.554 ms]
>                         change: [+2.1826% +2.2370% +2.2959%] (p = 0.00 < 0.05)
>                         Performance has regressed.
>
> Performance is indeed mostly restored in 1.30, thanks @aneshlya!

Glad to hear it!

aneshlya · 2026-02-05T18:08:39Z

> Performance is indeed mostly restored in 1.30.

Thanks for checking! I've also added ispc-downsampler to our CI but we can only check stability on open source runners.

MarijnS95 · 2026-02-10T12:25:46Z

That little 2% regression mostly seems to be noise: it fluctuates a bit on retesting and doesn't seem to be affected by locally compiling the ISPC kernel (as opposed to using the one from CI, i.e. with slightly different host detection).

The suggested i32x4 change from Traverse-Research/intel-tex-rs-2#42 seems to consistently make things about 0.5-1ms slower, though?

Downsample `square_test.png` using ispc_downsampler
                        time:   [38.503 ms 38.570 ms 38.636 ms]
                        change: [+2.0616% +3.1908% +4.0920%] (p = 0.00 < 0.05)
                        Performance has regressed.

MarijnS95 requested a review from Jasper-Bekkers May 21, 2025 08:36

Jasper-Bekkers approved these changes May 21, 2025

Jasper-Bekkers approved these changes May 23, 2025

MarijnS95 force-pushed the ispc-1.27 branch from 2dd4da5 to 3556673 Compare May 26, 2025 10:29

MarijnS95 marked this pull request as draft May 26, 2025 10:31

MarijnS95 changed the title ~~Regenerate binaries on ISPC 1.27~~ Regenerate binaries on ISPC 1.28 Aug 13, 2025

MarijnS95 force-pushed the ispc-1.27 branch from 6dbaf34 to 00d0256 Compare August 13, 2025 20:44

MarijnS95 changed the title ~~Regenerate binaries on ISPC 1.28~~ Regenerate binaries on ISPC 1.29.1 Dec 24, 2025

MarijnS95 force-pushed the ispc-1.27 branch from 00d0256 to 59f82a0 Compare December 24, 2025 08:44

MarijnS95 added 10 commits January 7, 2026 12:37

CI/generate-binaries: Merge artifacts for simplified download

6e2e6e3

Fix Rust 1.88 clippy lints

f63260a

Regenerate binaries on ISPC 1.23

6d7f84f

https://github.com/ispc/ispc/releases/v1.23.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/20482968113

Regenerate binaries on ISPC 1.24

322b467

https://github.com/ispc/ispc/releases/tag/v1.24.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/20483103780

Regenerate binaries on ISPC 1.25.3

79ad01e

https://github.com/ispc/ispc/releases/tag/v1.25.3 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/20483121741

Regenerate binaries on ISPC 1.26.0

3d12fb8

https://github.com/ispc/ispc/releases/tag/v1.26.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/20483271384

Regenerate binaries on ISPC 1.27

348736d

https://github.com/ispc/ispc/releases/tag/v1.27.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/15157304944

FIXUP: Cleanup CI names

2ac5179

Regenerate binaries on ISPC 1.28

d7e2d9e

https://github.com/ispc/ispc/releases/tag/v1.28.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/16948395642

Regenerate binaries on ISPC 1.29.1

408afc3

https://github.com/ispc/ispc/releases/tag/v1.29.1

MarijnS95 force-pushed the ispc-1.27 branch from 59f82a0 to 408afc3 Compare January 7, 2026 11:37

MarijnS95 mentioned this pull request Jan 7, 2026

Apple Silicon (at least M4) performance regression since ISPC 1.26 ispc/ispc#3688

Closed

MarijnS95 changed the title ~~Regenerate binaries on ISPC 1.29.1~~ Regenerate binaries on ISPC 1.30.0 Feb 5, 2026

MarijnS95 commented Feb 5, 2026

Regenerate binaries on ISPC 1.30.0

165460a

https://github.com/ispc/ispc/releases/tag/v1.30.0 https://github.com/Traverse-Research/ispc-downsampler/actions/runs/21717824119

MarijnS95 force-pushed the ispc-1.27 branch from 48c4321 to 165460a Compare February 5, 2026 15:49

MarijnS95 mentioned this pull request Feb 5, 2026

Regenerate binaries on ISPC 1.30 Traverse-Research/intel-tex-rs-2#43

Open

MarijnS95 marked this pull request as ready for review February 10, 2026 12:27
