Stack shuffler: use mark-on-enqueue for bringUpTargetSlot #16499
Conversation
Force-pushed: 93e3c88 to 165be80
```cpp
{
    auto offset = *toVisit.begin();
    toVisit.erase(toVisit.begin());
    visited.emplace(offset);
```
Am I correct that this bug would never result in an infinite loop, but just make things inefficient?
If I understand things correctly, the visited check meant that once a graph node was drawn from toVisit for the first time, we would stop adding new copies of it. So from that point on we had a potentially large but finite number of copies to go through, and we were guaranteed to draw a different node at the latest once we were through all of those copies. So in the end we were guaranteed to finish going through all the nodes.
Yes, the original code had a faulty BFS implementation. It also never leads to an infinite loop, but it can lead to an extremely large toVisit list.
This needs a changelog entry (as a bugfix). I guess that, like in #16486, the only observable effect is a performance improvement?
Can we at least narrow down the cases it would affect from the user's perspective? Did it affect compiler performance in common cases (just to a lesser degree), or did it show up only in pathological corner cases?
I benchmarked it on my desktop and laptop against eigenlayer and saw a ~1% performance increase. I didn't think that significant enough, but we can add a changelog entry, sure. Fine with either. :)
The outcome of the shuffler (i.e. of this method) should be exactly the same as in the previous version, just more efficient.
> or did it show up only in pathological corner cases?
Its severity depends on the density of the dependency graph, in particular on two or more paths reaching nextOffset. So cases where the same Yul variable has to go into many target positions are bad. Not sure how representative eigenlayer is, but I reckon it's big enough that we'd see something more significant if this were a frequent issue. Or it's an issue bad enough that people work around it, much like with stack-too-deep errors, which makes it hard to find good examples. I am not sure.
> But we can add a changelog entry, sure. Fine with either. :)
In this case I think it is warranted regardless of how much it improves performance, just on the basis of it being a bug with effects clearly observable to the user.
But I'd mention the general performance improvement too. If it's a reproducible 1% on a real-life contract and not just a random fluke from a single run, it's still not negligible.
It's reproducible on my machine at least.
Force-pushed: 165be80 to 97aab59
With input b3.yul.gz we get:

devel branch: […]
this branch: […]
Perf report for the slow one: […]
Perf report for the fast one: […]
```
Language Features:


Compiler Features:
 * ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.
```
This is not in the codegen. Also, I don't think implementation details are relevant here (this is meant for users), just how it affects the operation of the compiler.
```diff
-* ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.
+* Yul EVM Code Transform: Improve stack shuffler performance by fixing a BFS deduplication issue.
```
How is it not in the codegen? :) It's the stack shuffler that is directly responsible for codegen; we just hit it sooner by proxy of the optimizer trying to compress the stack, which in itself invokes codegen.
I've used your suggestion now, but yeah, I guess I am a bit fuzzy on what precisely constitutes codegen.
> i guess i am a bit fuzzy on what precisely constitutes codegen
I never noticed it until now, but you have a point here. We do seem to have two overlapping meanings for it, which makes things ambiguous. I guess you can refer to everything in the pipeline past analysis as "Yul codegen" or "IR-based code generator", but in the project it's most often used to describe specifically the Solidity AST->Yul transformation. We have more specific names for other parts (Yul->EVM transform, optimizer, bytecode generation) and we refer to the whole as a "code generation pipeline".
Perhaps we should introduce a new, more precise term, but I'm afraid this usage of "Yul codegen" may be too entrenched by now. The way it's used e.g. in Solidity IR-based Codegen Changes seems to adhere to the narrower scope. And the term is also how we refer to optimizations that happen in, well, Yul codegen :) We introduced that term in #14650 and it was my idea, but many people saw the PR, including Daniel, and no one even said a word about it being wrong or ambiguous, so I assume the whole team shares that understanding of "codegen".
Thanks for bringing it into context, it makes much more sense to me now!
Force-pushed: 208fa7a to afeb6a4
Force-pushed: afeb6a4 to d153bfe
Fix a BFS deduplication bug in `bringUpTargetSlot` where the same offset could be enqueued multiple times before being visited, by marking offsets as seen on enqueue rather than on dequeue.