Vectorize GaussianBlur effect. #2086
Conversation
Hey Jonathan! I'll need to find some time to do a closer review of the changes, but I think the main considerations are maintainability (e.g. perhaps there are some vectorization utilities that could be factored out for other effects) and that extra memory usage requirement.
pedropaulosuzuki left a comment
Having the Gaussian Blur be separated into two passes (horizontal/vertical) is a huge improvement, even more than the SIMD stuff (which is also very good to have). Even though we probably want to use the GPU to do those calculations in the future, having a fast CPU fallback implementation is a good idea.
About using LLMs, I think it's not a good idea, as all the commercial models are unethically trained on copyrighted data, without respect for Free and Open Source licenses. So even if the code is reviewed and cleaned up, I'd still be careful about using LLM-generated code, even if it is improved afterwards by human hands, as it still taints the project and people will view it badly (myself included). Though, as this PR is from Pinta's original creator, maybe people won't care. Just my two cents.
// Scalar tail
for (; i < length; ++i)
Maybe it would be more elegant to use a while loop here, since we're not using the for loop's declaration slot, which makes it look a bit odd.

while (i < length) {
    // code
    ++i;
}

Same for the previous loop, though having the increment at the end might create a small disconnect, since that loop does more work.
Happy to hear more input from others in the community on this, since it shouldn't just be my decision :) The other thing to think about is that if we do add some sort of policy, how would it be enforced? It's easy to detect slop, and possibly even heavily generative cases like this one, but it's quite possible for the changes to be indistinguishable from something a contributor could have written themselves.
|
My contributions to this specific project are mostly impact-driven (having powerful and permissively-licensed software tools), not an academic or artistic exercise, so in principle I support any tool that helps get things done correctly and with good quality. I am not someone to adopt fads, but AI is here to stay, and in the jurisdictions I operate in, there are basically no problems with copyright as long as there is a human in the loop. I sometimes use it for speed, to avoid spending lots of time searching through the documentation and to do some of the boring work, and in any case I curate and refine the output, because I want to know exactly what my changes do. I am not a fan of policies, but if I had to draw a line, it would be at those pull requests where there is zero or insufficient human oversight. "Insufficient" can be subjective or case-by-case, but if it works I am okay with that.
I don't mind LLMs, in the sense that I don't mind automating (a) menial tasks (like batch conversion) that (b) I'm fully capable of understanding and doing myself, via LLMs that only refer to the source material itself and don't infringe on other people's intellectual property. However, I'm very anti-vibe-coding for moral (and environmental) reasons, knowing that all popular LLMs have been illicitly trained on other people's code without any permission, like @pedropaulosuzuki wrote, and because of the slippery slope it creates: start throwing some AI sprinkles here and there, humans take a backseat to AI in the name of convenience, and we end up in a future where discussing 'reclaiming intellectual ownership' of a creative project seems quaint, with AI now referring to its own wasteland of a database, since humans won't be coding anymore (dead internet theory and all that). I don't care as much about whether we could be using AI on open-source projects because it's not illegal, as I care about whether we should be using AI to begin with. Obviously people can go about it as they wish; it's just my own personal opinion on the matter.
Adding two references of what I believe are good LLM policies: GNOME Calendar and LibAdwaita. They have a good compromise of allowing the usage of LLMs for translating issue text and comments, which helps people who don't speak English contribute and communicate with everyone else. They also allow using LLMs for studying how things work when the documentation is lacking, which is also acceptable, although they advise against it when possible, as LLMs are unpredictable by nature and can (and probably will) give false information. For anything else, using LLMs is not allowed.
About this, it's the same as saying "if people commit a crime and don't report it, then we might not know they did it, so it's best to allow them to kill and torture people, because at least we will know when it happens". If we can't detect something, then we can't detect it; there's not much we can do. But allowing those tools to be used freely is not a good idea, and I'd probably leave the project if "LLM-assisted contributions" (apart from the exceptions listed above) were accepted.
As this is blocked at the moment due to the discussion on LLM usage, I made a small refactor of the previous Gaussian Blur implementation (25fee0b) as an exercise, since the original implementation was very messy and hard to read. While doing so, I got a small gain in performance: at radius 100, a 6000×4000 image went from 3 minutes to 2 minutes 40 seconds, and a 3000×3000 image went from 1 minute 10 seconds to 1 minute. Nowhere near as impressive as the gains from this PR, since I only did the horizontal/vertical separation and none of the SIMD/multithreading. My goal was not to open a PR, as I'm sure @jpobst would be quick to implement the SIMD/multithreaded stuff even without LLMs; I just wanted to throw this here, as the LLM-generated code is also a bit messy and hard to read (though more readable than the original implementation), even though it is way, way faster. My rewrite also suffers from not updating live (it could if we did the two passes as two separate filters, which we could do in the future), so it's probably not suitable to merge anyway. My goal was to try to find ways of improving the readability of this code, even more so for the future GPU acceleration of this and other functions. (#2097)
I'm not sure what Pinta's policy on AI-assisted code is, but one thing it's pretty good at is vectorizing code. I let it rewrite the algorithm in GaussianBlurEffect to use modern .NET's hardware-accelerated intrinsics that weren't available 15+ years ago. The results are pretty impressive on my AMD Ryzen 5900X (PintaBenchmarks).

There are 2 tradeoffs:
GaussianBlurEffect were also marked as not Tileable.

AI Summary: Separable Gaussian Blur with SIMD Accumulation
What changed
File: Pinta.Effects/Effects/GaussianBlurEffect.cs — complete rewrite of the Render method
Algorithm change: 2D → Separable 1D + 1D
The old implementation applied the Gaussian kernel as a combined 2D operation using a sliding window. The new implementation decomposes it into two sequential 1D passes:
This reduces per-pixel work from O(kernel²) to O(2·kernel), giving a ~100× speedup at radius=100.
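The separability claim is easy to sanity-check outside of Pinta. Below is a hedged pure-Python sketch (not the PR's C# code) that compares a direct 2D Gaussian convolution against a horizontal pass followed by a vertical pass on a small grayscale grid, assuming clamp-to-edge sampling; because the 2D Gaussian weight factors into a product of two 1D weights, the two approaches agree up to floating-point rounding:

```python
import math

def gaussian_kernel_1d(radius: int, sigma: float) -> list[float]:
    # Sampled Gaussian, normalized so the weights sum to 1.
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def convolve_h(img, k, r):
    # Horizontal 1D pass with clamp-to-edge sampling.
    h, w = len(img), len(img[0])
    return [[sum(k[d + r] * img[y][min(max(x + d, 0), w - 1)]
                 for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def convolve_v(img, k, r):
    # Vertical 1D pass with clamp-to-edge sampling.
    h, w = len(img), len(img[0])
    return [[sum(k[d + r] * img[min(max(y + d, 0), h - 1)][x]
                 for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def convolve_2d(img, k, r):
    # Direct 2D convolution: the 2D weight is k[dy] * k[dx].
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    acc += (k[dy + r] * k[dx + r]
                            * img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)])
            out[y][x] = acc
    return out

radius, sigma = 2, 1.0
kernel = gaussian_kernel_1d(radius, sigma)
image = [[float((x * 7 + y * 13) % 32) for x in range(8)] for y in range(8)]

separable = convolve_v(convolve_h(image, kernel, radius), kernel, radius)
direct = convolve_2d(image, kernel, radius)
max_err = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # → True
```

The per-pixel cost is visible in the loop structure: the 2D version does (2r+1)² multiply-adds per pixel, while the two 1D passes do 2·(2r+1), which is where the ~100× figure at radius 100 comes from.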
SIMD optimization
The vertical pass inner loop uses Vector256 intrinsics to process 8 intermediate values per SIMD iteration (widening from int to long via Vector256.Widen, then multiply-accumulate).
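For readers unfamiliar with the widen-then-accumulate pattern, here is a scalar Python model of the idea (the function name `weighted_sum` and the data are made up for illustration; the real code is C# using `Vector256`). Each "vector" iteration handles 8 lanes at once, and accumulating in 64-bit after widening is what prevents large weight×value products from overflowing 32 bits; the leftover elements fall through to a scalar tail loop like the one discussed above. Python integers don't actually overflow, so the widening here is purely illustrative:

```python
def weighted_sum(values: list[int], weights: list[int]) -> int:
    length = len(values)
    acc = [0] * 8          # models Vector256<long> lane accumulators
    i = 0
    # Vectorized body: 8 elements per iteration.
    while i + 8 <= length:
        for lane in range(8):
            # Widen the int32 lane to 64-bit before multiply-accumulate,
            # so weight * value cannot overflow 32 bits.
            acc[lane] += values[i + lane] * weights[i + lane]
        i += 8
    total = sum(acc)       # horizontal reduction of the 8 lanes
    # Scalar tail for the remaining (length % 8) elements.
    while i < length:
        total += values[i] * weights[i]
        i += 1
    return total

vals = list(range(1, 21))      # 20 elements: two vector iterations + a 4-element tail
wts = [3] * 20
print(weighted_sum(vals, wts))  # → 630, i.e. 3 * (1 + 2 + ... + 20)
```

The same body/tail split is why the PR's inner loop ends with a scalar `for (; i < length; ++i)` cleanup.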
Memory efficiency
Row accumulators are rented from ArrayPool&lt;T&gt;.Shared to avoid per-row heap allocations.