Vectorize GaussianBlur effect. #2086
Conversation
Hey Jonathan! I'll need to find some time to do a closer review of the changes, but I think the main considerations are maintainability (e.g. perhaps there are some vectorization utilities that could be factored out for other effects) and that extra memory usage requirement.
pedropaulosuzuki left a comment
Having the Gaussian Blur be separated into two passes (horizontal/vertical) is a huge improvement, even more than the SIMD stuff (which is also very good to have). Even though we probably want to use the GPU to do those calculations in the future, having a fast CPU fallback implementation is a good idea.
About using LLMs, I think it's not a good idea, as all the commercial models are unethically trained on copyrighted data, without respect for Free and Open Source licenses. So even if the code is reviewed and cleaned up, I'd still be careful about using LLM-generated code, even if it is improved afterwards by human hands, as it still taints the project and people will view it badly (myself included). Though, as this PR is from Pinta's original creator, maybe people won't care. Just my two cents.
// Scalar tail
for (; i < length; ++i)
Maybe it would be more elegant to use a while loop here, since we're not using the for loop's declaration slot, which makes it look a bit odd.

while (i < length) {
    // code
    ++i;
}

Same for the previous loop, though having the increment at the end might create a small disconnect, since that loop does more work.
Happy to hear more input from others in the community on this, since it shouldn't just be my decision :) The other thing to think about is that if we do add some sort of policy, how would it be enforced? It's easy to detect slop, and possibly even heavily generative cases like this one, but it's quite possible for the changes to be indistinguishable from something a contributor could have written themselves.
|
My contributions to this specific project are mostly impact-driven (having powerful and permissively-licensed software tools), not an academic or artistic exercise, so in principle I support any tool that helps get things done correctly and with good quality. I am not someone to adopt fads, but AI is here to stay, and in the jurisdictions I operate in, there are basically no problems with copyright as long as there is a human in the loop. I sometimes use it for speed, to avoid spending lots of time searching through the documentation and to do some of the boring work, and in any case I curate and refine the output, because I want to know exactly what my changes do. I am not a fan of policies, but if I had to draw a line, it would be at those pull requests where there is zero or insufficient human oversight. "Insufficient" can be subjective or case-by-case, but if it works I am okay with that.
I don't mind LLMs, in the sense that I don't mind automating (a) menial tasks (like batch conversion) that (b) I'm fully capable of understanding and doing myself, via LLMs that only refer to the source material itself and don't infringe on other people's intellectual property. However, I'm very anti-vibe-coding for moral (and environmental) reasons, knowing that all popular LLMs have been illicitly trained on other people's code without any permission, like @pedropaulosuzuki wrote, and because of the slippery slope it creates: start throwing some AI sprinkles here and there, humans take a backseat to AI in the name of convenience, and we end up in a future where discussing 'reclaiming intellectual ownership' of a creative project seems quaint, with AI now referring to its own wasteland of a database, since humans won't be coding anymore (dead internet theory and all that). I don't care as much about whether we could be using AI on open-source projects because it's not illegal, as I care about whether we should be using AI to begin with. Obviously people can go about it as they wish; it's just my own personal opinion on the matter.
Adding two references of what I believe are good LLM policies: GNOME Calendar and LibAdwaita. They have a good compromise of allowing the usage of LLMs for translating issue text and comments, which helps people who don't speak English contribute and communicate with everyone else. They also allow using LLMs for studying how things work when the documentation is lacking, which is also acceptable, although they advise against it when possible, as LLMs are unpredictable by nature and can (and probably will) give false information. For anything else, using LLMs is not allowed.
About this, it's the same as saying "if people commit a crime and don't report it, then we might not know they did it, so it's best to allow them to kill and torture people, because at least we will know when it happens". If we can't detect something, then we can't detect it; there's not much we can do. But allowing those tools to be used freely is not a good idea, and I'd probably leave the project if "LLM-assisted contributions" (apart from the exceptions listed above) were accepted.
As this is blocked at the moment due to the discussion on LLM usage, I made a small refactor of the previous Gaussian Blur implementation (25fee0b) as an exercise, since the original implementation was very messy and hard to read. While doing so, I got a small gain in performance: at radius 100, a 6000×4000 image went from 3 minutes to 2 minutes 40 seconds, and a 3000×3000 image went from 1 minute 10 seconds to 1 minute. Nowhere near as impressive as the gains from this PR, since I only did the horizontal/vertical separation and none of the SIMD/multithreading. My goal was not to open a PR, as I'm sure @jpobst would be quick to implement the SIMD/multithreaded stuff even without LLMs; I just wanted to throw this here, as the LLM-generated code is also a bit messy and hard to read (though more readable than the original implementation), even though it is way, way faster. My rewrite also suffers from not updating live (it could if we did the two passes as two separate filters, which we could do in the future), so it's probably not suitable to merge anyway. My goal was to try to find ways of improving the readability of this code, even more so for the future GPU acceleration of this and other functions. (#2097)
I'm not sure what Pinta's policy on AI-assisted code is, but one thing it's pretty good at is vectorizing code. I let it rewrite the algorithm in GaussianBlurEffect to use modern .NET's hardware-accelerated intrinsics that weren't available 15+ years ago. The results are pretty impressive on my AMD Ryzen 5900X (PintaBenchmarks).

There are 2 tradeoffs:
GaussianBlurEffect were also marked as not Tileable.

AI Summary: Separable Gaussian Blur with SIMD Accumulation
What changed
File: Pinta.Effects/Effects/GaussianBlurEffect.cs — complete rewrite of the Render method
Algorithm change: 2D → Separable 1D + 1D
The old implementation applied the Gaussian kernel as a combined 2D operation using a sliding window. The new implementation decomposes it into two sequential 1D passes:
This reduces per-pixel work from O(kernel²) to O(2·kernel), giving a ~100× speedup at radius=100.
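The separability claim is easy to sanity-check outside of Pinta. Below is a hedged pure-Python sketch (not the PR's C# code) that compares a direct 2D Gaussian convolution against a horizontal pass followed by a vertical pass on a small grayscale grid, assuming clamp-to-edge sampling; because the 2D Gaussian weight factors into a product of two 1D weights, the two approaches agree up to floating-point rounding:

```python
import math

def gaussian_kernel_1d(radius: int, sigma: float) -> list[float]:
    # Sampled Gaussian, normalized so the weights sum to 1.
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def convolve_h(img, k, r):
    # Horizontal 1D pass with clamp-to-edge sampling.
    h, w = len(img), len(img[0])
    return [[sum(k[d + r] * img[y][min(max(x + d, 0), w - 1)]
                 for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def convolve_v(img, k, r):
    # Vertical 1D pass with clamp-to-edge sampling.
    h, w = len(img), len(img[0])
    return [[sum(k[d + r] * img[min(max(y + d, 0), h - 1)][x]
                 for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def convolve_2d(img, k, r):
    # Direct 2D convolution: the 2D weight is k[dy] * k[dx].
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    acc += (k[dy + r] * k[dx + r]
                            * img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)])
            out[y][x] = acc
    return out

radius, sigma = 2, 1.0
kernel = gaussian_kernel_1d(radius, sigma)
image = [[float((x * 7 + y * 13) % 32) for x in range(8)] for y in range(8)]

separable = convolve_v(convolve_h(image, kernel, radius), kernel, radius)
direct = convolve_2d(image, kernel, radius)
max_err = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # → True
```

The per-pixel cost is visible in the loop structure: the 2D version does (2r+1)² multiply-adds per pixel, while the two 1D passes do 2·(2r+1), which is where the ~100× figure at radius 100 comes from.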
SIMD optimization
The vertical pass inner loop uses Vector256 intrinsics to process 8 intermediate values per SIMD iteration (widening from int to long via Vector256.Widen, then multiply-accumulate).
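For readers unfamiliar with the widen-then-accumulate pattern, here is a scalar Python model of the idea (the function name `weighted_sum` and the data are made up for illustration; the real code is C# using `Vector256`). Each "vector" iteration handles 8 lanes at once, and accumulating in 64-bit after widening is what prevents large weight×value products from overflowing 32 bits; the leftover elements fall through to a scalar tail loop like the one discussed above. Python integers don't actually overflow, so the widening here is purely illustrative:

```python
def weighted_sum(values: list[int], weights: list[int]) -> int:
    length = len(values)
    acc = [0] * 8          # models Vector256<long> lane accumulators
    i = 0
    # Vectorized body: 8 elements per iteration.
    while i + 8 <= length:
        for lane in range(8):
            # Widen the int32 lane to 64-bit before multiply-accumulate,
            # so weight * value cannot overflow 32 bits.
            acc[lane] += values[i + lane] * weights[i + lane]
        i += 8
    total = sum(acc)       # horizontal reduction of the 8 lanes
    # Scalar tail for the remaining (length % 8) elements.
    while i < length:
        total += values[i] * weights[i]
        i += 1
    return total

vals = list(range(1, 21))      # 20 elements: two vector iterations + a 4-element tail
wts = [3] * 20
print(weighted_sum(vals, wts))  # → 630, i.e. 3 * (1 + 2 + ... + 20)
```

The same body/tail split is why the PR's inner loop ends with a scalar `for (; i < length; ++i)` cleanup.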
Memory efficiency
Row accumulators are rented from ArrayPool&lt;T&gt;.Shared to avoid per-row heap allocations.