Stack shuffler: use mark-on-enqueue for bringUpTargetSlot #16499

Merged
clonker merged 1 commit into develop from mark-on-enqueue-for-bring-up-target-slot-stack-traversal on Mar 9, 2026

Member

@clonker clonker commented Mar 4, 2026

Fix a BFS deduplication bug in bringUpTargetSlot where the same offset could be enqueued multiple times before being visited, by marking offsets as seen on enqueue rather than on dequeue.
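
As a rough illustration of the technique named in the description, here is a minimal mark-on-enqueue BFS over a plain adjacency list. This is a sketch under stated assumptions, not the code from the PR: `Graph`, `bfsMarkOnEnqueue` and the use of `std::deque` are hypothetical stand-ins, and the real `bringUpTargetSlot` derives its edges from the shuffle operations rather than from a stored graph.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical adjacency-list graph: graph[n] lists the successors of node n.
using Graph = std::vector<std::vector<std::size_t>>;

// Breadth-first traversal that marks a node as seen at the moment it is enqueued.
// Returns the nodes in the order in which they were dequeued.
std::vector<std::size_t> bfsMarkOnEnqueue(Graph const& graph, std::size_t start)
{
    assert(start < graph.size());
    std::deque<std::size_t> toVisit{start};
    std::set<std::size_t> seen{start}; // the start node counts as already enqueued
    std::vector<std::size_t> order;
    while (!toVisit.empty())
    {
        std::size_t node = toVisit.front();
        toVisit.pop_front();
        order.push_back(node);
        for (std::size_t next: graph[node])
            // insert() reports whether `next` was new, so each node enters the queue at most once.
            if (seen.insert(next).second)
                toVisit.push_back(next);
    }
    return order;
}
```

On a diamond-shaped graph (`0 -> 1, 2`, `1 -> 3`, `2 -> 3`), node 3 is reachable along two paths but is enqueued only once.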

@clonker clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from 93e3c88 to 165be80 Compare March 4, 2026 15:18
@clonker clonker marked this pull request as ready for review March 4, 2026 15:42
{
auto offset = *toVisit.begin();
toVisit.erase(toVisit.begin());
visited.emplace(offset);
Collaborator

Am I correct that this bug would never result in an infinite loop, but just make things inefficient?

If I understand things correctly, this visit() meant that once a graph node was drawn from toVisit for the first time, we would stop adding new copies of it. So from that point on we had a potentially large but finite number of copies to go through. We were guaranteed to draw a different node at the latest when we were through all those copies. So in the end we would be guaranteed to finish going through all the nodes.

Member Author

Yes, the original BFS implementation was faulty. It never leads to an infinite loop, but it can lead to an extremely large toVisit list.
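
The termination argument above can be checked on a toy graph. The following sketch implements the mark-on-dequeue variant and records how many entries are dequeued and how large the worklist gets. All names (`Graph`, `TraversalStats`, `bfsMarkOnDequeue`) are hypothetical, and `std::deque` is only an assumed stand-in for the worklist container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical adjacency-list graph: graph[n] lists the successors of node n.
using Graph = std::vector<std::vector<std::size_t>>;

struct TraversalStats
{
    std::size_t dequeued = 0;      // entries taken off the worklist, duplicates included
    std::size_t peakQueueSize = 0; // largest worklist size observed before a dequeue
};

// BFS that marks a node as visited only when it is dequeued.
// A node can be enqueued once per predecessor that is processed before its first dequeue.
TraversalStats bfsMarkOnDequeue(Graph const& graph, std::size_t start)
{
    assert(start < graph.size());
    std::deque<std::size_t> toVisit{start};
    std::set<std::size_t> visited;
    TraversalStats stats;
    while (!toVisit.empty())
    {
        stats.peakQueueSize = std::max(stats.peakQueueSize, toVisit.size());
        std::size_t node = toVisit.front();
        toVisit.pop_front();
        visited.insert(node); // marked late: copies of `node` may already be queued
        ++stats.dequeued;
        for (std::size_t next: graph[node])
            if (!visited.count(next))
                toVisit.push_back(next);
    }
    return stats;
}
```

On the diamond graph `0 -> 1, 2`, `1 -> 3`, `2 -> 3`, this dequeues five entries for four nodes, because node 3 is queued twice. Adding a back edge `3 -> 0` does not change the count: once a node has been dequeued, no further copies of it are added, so the loop still terminates.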

Comment thread libyul/backends/evm/StackHelpers.h
Collaborator

@cameel cameel Mar 5, 2026

This needs a changelog entry (as a bugfix). I guess that, like in #16486, the only observable effect is a performance improvement?

Can we at least narrow down the cases it would affect from the user's perspective? Did it affect compiler performance in common cases (just to a lesser degree) or did it show up only in pathological corner cases?

Member Author

I benchmarked it on my desktop and laptop against eigenlayer and saw a ~1% performance increase. Didn't think it significant enough. But we can add a changelog entry, sure. Fine with either. :)

Member Author

The outcome of the shuffler (i.e. of this method) should be exactly the same as in the previous version, just computed more efficiently.

Member Author

> or did it show up only in pathological corner cases?

Its severity depends on the density of the dependency graph, in particular on two or more paths reaching nextOffset. So cases where the same Yul variable has to go into many target positions are bad. Not sure how representative eigenlayer is, but I reckon it's big enough that we'd see something more significant if it were a frequent issue. Or it's an issue bad enough that people work around it, much like with stack too deep, and that makes it hard to find good examples. I am not sure.
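
To make the dependence on graph density concrete, the sketch below counts dequeues for both marking strategies on a synthetic layered graph in which every node points to all nodes of the next layer. The graph shape and the names `makeLayeredGraph` and `countDequeues` are assumptions chosen for illustration; they are not taken from the shuffler and say nothing about how often such shapes occur in real contracts.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical adjacency-list graph: graph[n] lists the successors of node n.
using Graph = std::vector<std::vector<std::size_t>>;

// Builds a root (node 0) followed by `depth` layers of `width` nodes each.
// Every node points to all nodes of the next layer, so many paths reach each node.
Graph makeLayeredGraph(std::size_t depth, std::size_t width)
{
    assert(width > 0);
    Graph graph(1 + depth * width);
    for (std::size_t layer = 0; layer < depth; ++layer)
    {
        // Layer 0 is the root alone; layer k >= 1 starts at node 1 + (k - 1) * width.
        std::size_t firstInLayer = layer == 0 ? 0 : 1 + (layer - 1) * width;
        std::size_t layerSize = layer == 0 ? 1 : width;
        std::size_t firstInNext = 1 + layer * width;
        for (std::size_t i = 0; i < layerSize; ++i)
            for (std::size_t j = 0; j < width; ++j)
                graph[firstInLayer + i].push_back(firstInNext + j);
    }
    return graph;
}

// Runs a FIFO traversal and returns the number of dequeued entries.
// markOnEnqueue == true marks nodes when queued; false marks them when dequeued.
std::size_t countDequeues(Graph const& graph, std::size_t start, bool markOnEnqueue)
{
    assert(start < graph.size());
    std::deque<std::size_t> toVisit{start};
    std::set<std::size_t> seen;
    if (markOnEnqueue)
        seen.insert(start);
    std::size_t dequeued = 0;
    while (!toVisit.empty())
    {
        std::size_t node = toVisit.front();
        toVisit.pop_front();
        ++dequeued;
        if (!markOnEnqueue)
            seen.insert(node);
        for (std::size_t next: graph[node])
        {
            if (seen.count(next))
                continue;
            if (markOnEnqueue)
                seen.insert(next);
            toVisit.push_back(next);
        }
    }
    return dequeued;
}
```

For depth 10 and width 2, marking on dequeue processes 1 + 2 + 4 + ... + 2^10 = 2047 entries, while marking on enqueue processes each of the 21 nodes exactly once. In this particular graph the gap grows geometrically with the depth.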

Collaborator

@cameel cameel Mar 5, 2026

> But we can add a changelog entry, sure. Fine with either. :)

In this case I think it is warranted regardless of how much it improves performance, just on the basis of it being a bug with effects clearly observable to the user.

But I'd mention the general performance improvement too. If it's a reproducible 1% on a real-life contract and not just a random fluke from a single run, it's still not negligible.

Member Author

It's reproducible on my machine at least.

@clonker clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from 165be80 to 97aab59 Compare March 5, 2026 11:14
Contributor

msooseth commented Mar 5, 2026

With input b3.yul.gz we get:

develop branch:

        Command being timed: "./solc/solc --strict-assembly --optimize-yul b3.yul"
        User time (seconds): 125.72
        System time (seconds): 1.13
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:07.34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2705356
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 674667
        Voluntary context switches: 1
        Involuntary context switches: 2994
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

this branch:

        Command being timed: "./solc/solc --strict-assembly --optimize-yul b3.yul"
        User time (seconds): 0.15
        System time (seconds): 0.00
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.15
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 17384
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 817
        Voluntary context switches: 1
        Involuntary context switches: 3
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

perf report for the slow one:

+  100.00%     0.00%  solc     solc                  [.] _start
+  100.00%     0.00%  solc     libc.so.6             [.] __libc_start_main
+  100.00%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d186c1
+  100.00%     0.00%  solc     solc                  [.] main
+  100.00%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::run(int, char const* const*)
+  100.00%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::assembleYul(solidity::yul::Language, solidity::yul::YulStack::Machine)
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::YulStack::optimize()
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::ObjectOptimizer::optimize(solidity::yul::Object&, solidity::yul::ObjectOptimizer::Settings const&, bool)
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::run(solidity::yul::GasMeter const*, solidity::yul::Object&, bool, std::basic_string_view<char, std::
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::EVMDialect const&)
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::YulString, solidity::yul::EVMDial
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackCompressor::run(solidity::yul::Object const&, bool, unsigned long)
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::processEntryPoint(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo cons
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)::{l
+   99.98%     0.00%  solc     solc                  [.] void solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidi
+   99.92%     0.01%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB
+   95.84%    45.68%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB
+   49.19%    49.15%  solc     solc                  [.] decltype(auto) std::__do_visit<std::__detail::__variant::__variant_idx_cookie, std::__detail::__variant::__compare<bool, std::varia
+    2.15%     1.54%  solc     solc                  [.] std::pair<std::_Rb_tree_iterator<unsigned long>, bool> std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, s
+    0.85%     0.17%  solc     libstdc++.so.6.0.34   [.] operator new(unsigned long)
+    0.75%     0.75%  solc     libc.so.6             [.] malloc
+    0.62%     0.62%  solc     libc.so.6             [.] cfree
+    0.59%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d970bf
+    0.51%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d95789

Perf report for the fast one:

+   99.60%     0.00%  solc     solc                  [.] _start
+   99.60%     0.00%  solc     libc.so.6             [.] __libc_start_main
+   99.19%     0.00%  solc     libc.so.6             [.] 0x00007ffa767186c1
+   99.19%     0.00%  solc     solc                  [.] main
+   99.19%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::run(int, char const* const*)
+   99.07%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::assembleYul(solidity::yul::Language, solidity::yul::YulStack::Machine)
+   97.62%     0.00%  solc     solc                  [.] solidity::yul::YulStack::optimize()
+   97.45%     0.00%  solc     solc                  [.] solidity::yul::ObjectOptimizer::optimize(solidity::yul::Object&, solidity::yul::ObjectOptimizer::Settings const&, bool)
+   97.45%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::run(solidity::yul::GasMeter const*, solidity::yul::Object&, bool, std::basic_string_view<char, std::
+   92.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::EVMDialect const&)
+   92.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::YulString, solidity::yul::EVMDial
+   91.94%     0.00%  solc     solc                  [.] solidity::yul::StackCompressor::run(solidity::yul::Object const&, bool, unsigned long)
+   91.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::processEntryPoint(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo cons
+   88.92%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)
+   88.92%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)::{l
+   86.25%     5.73%  solc     solc                  [.] void solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidi
+   24.89%     6.70%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB
+   19.86%    17.76%  solc     solc                  [.] solidity::yul::Multiplicity::operator[](std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabel
+   18.59%    18.59%  solc     solc                  [.] decltype(auto) std::__do_visit<std::__detail::__variant::__variant_idx_cookie, std::__detail::__variant::__compare<bool, std::varia
+   17.23%    17.23%  solc     solc                  [.] bool boost::multiprecision::default_ops::eval_lt<boost::multiprecision::backends::cpp_int_backend<256ul, 256ul, (boost::multiprecis
+   14.45%    10.47%  solc     solc                  [.] solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::y
+    7.85%     7.85%  solc     solc                  [.] solidity::yul::Multiplicity::at(std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabelSlot, so
+    4.43%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::runSequence(std::basic_string_view<char, std::char_traits<char> >, solidity::yul::Block&, bool)
+    4.31%     0.11%  solc     solc                  [.] solidity::yul::OptimiserSuite::runSequence(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char
+    3.23%     1.29%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB
+    3.04%     0.00%  solc     solc                  [.] solidity::yul::ASTModifier::operator()(solidity::yul::Block&)
+    2.85%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::Block&)
+    2.26%     0.00%  solc     solc                  [.] (anonymous namespace)::findStackTooDeep(std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::Functio
+    2.26%     0.00%  solc     solc                  [.] void solidity::yul::createStackLayout<(anonymous namespace)::findStackTooDeep(std::vector<std::variant<solidity::yul::FunctionCallR
+    2.01%     0.19%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::propagateStackThroughOperation(std::vector<std::variant<solidity::yul::FunctionCallReturnLabel
+    1.84%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::VariableDeclaration&)
+    1.82%     0.00%  solc     solc                  [.] std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabelSlot, solidity::yul::Variabl
+    1.64%     1.64%  solc     libc.so.6             [.] malloc
+    1.54%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::FunctionDefinition&)
+    1.52%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG::BasicBlock const&) const
+    1.27%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::propagateStackThroughBlock(std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot
+    1.11%     1.11%  solc     libstdc++.so.6.0.34   [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assemble(solidity::yul::YulStack::Machine)
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assembleWithDeployed(std::optional<std::basic_string_view<char, std::char_traits<char> > >)
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assembleEVMWithDeployed(std::optional<std::basic_string_view<char, std::char_traits<char> > >)
+    1.04%     1.04%  solc     libc.so.6             [.] cfree
+    1.01%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::handleAssignment(std::set<solidity::yul::YulString, std::less<solidity::yul::YulString>, std::allo
+    0.99%     0.39%  solc     libstdc++.so.6.0.34   [.] operator new(unsigned long)

Comment thread Changelog.md Outdated
Language Features:

Compiler Features:
* ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.
Collaborator

This is not in the codegen. Also, I don't think implementation details are relevant here (this is meant for users), just how it affects the operation of the compiler.

Suggested change
- * ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.
+ * Yul EVM Code Transform: Improve stack shuffler performance by fixing a BFS deduplication issue.

Member Author

How is it not in the codegen? :) It's the stack shuffler that is directly responsible for codegen. We just hit it sooner by proxy of the optimizer trying to compress the stack, which in itself invokes codegen.

Member Author

I've used your suggestion now, but yeah, I guess I am a bit fuzzy on what precisely constitutes codegen.

Collaborator

@cameel cameel Mar 9, 2026

> i guess i am a bit fuzzy on what precisely constitutes codegen

I never noticed it until now, but you have a point here. We do seem to have two overlapping meanings for it, which makes things ambiguous. I guess you can refer to everything in the pipeline past analysis as "Yul codegen" or "IR-based code generator", but in the project it's most often used to describe specifically the Solidity AST->Yul transformation. We have more specific names for other parts (Yul->EVM transform, optimizer, bytecode generation) and we refer to the whole as a "code generation pipeline".

Perhaps we should introduce a new, more precise term, but I'm afraid this usage of "Yul codegen" may be too entrenched by now. The way it's used, e.g. in "Solidity IR-based Codegen Changes", seems to adhere to the narrower scope. And the term is also how we refer to optimizations that happen in, well, Yul codegen :) We introduced that term in #14650 and it was my idea, but many people saw the PR, including Daniel, and no one said a word about it being wrong or ambiguous, so I assume the whole team shares that understanding of "codegen".

Member Author

Thanks for bringing it into context, it makes much more sense to me now!

@clonker clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch 2 times, most recently from 208fa7a to afeb6a4 Compare March 5, 2026 14:09
@clonker clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from afeb6a4 to d153bfe Compare March 5, 2026 22:29
@clonker clonker merged commit 6313134 into develop Mar 9, 2026
83 checks passed
@clonker clonker deleted the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch March 9, 2026 10:50