Stack shuffler: use mark-on-enqueue for bringUpTargetSlot by clonker · Pull Request #16499 · argotorg/solidity

clonker · 2026-03-04T15:08:31Z

Fix a BFS deduplication bug in bringUpTargetSlot where the same offset could be enqueued multiple times before being visited, by marking offsets as seen on enqueue rather than on dequeue.

cameel · 2026-03-05T10:25:41Z

 		{
 			auto offset = *toVisit.begin();
 			toVisit.erase(toVisit.begin());
-			visited.emplace(offset);


Am I correct that this bug would never result in an infinite loop, but just make things inefficient?

If I understand things correctly, this visit() meant that once a graph node was drawn from toVisit for the first time, we would stop adding new copies of it. So from that point on we had a potentially large but finite number of copies to go through. We were guaranteed to draw a different node at the latest when we were through all those copies. So in the end we would be guaranteed to finish going through all the nodes.

Yes, the original code was a faulty implementation of BFS. It never leads to an infinite loop, but it can lead to an extremely large toVisit list.
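
The difference between the two marking strategies can be sketched in isolation. This is a minimal illustration and not the actual bringUpTargetSlot code: the Graph type, the function names and the diamond-shaped example graph are assumptions made up for this sketch, and only the toVisit/visited naming follows the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Hypothetical graph type for illustration: graph[i] lists the successors of node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Mark-on-dequeue: a node only counts as visited once it has been popped,
// so it can be appended to toVisit again while an earlier copy is still waiting.
std::size_t countEnqueuesMarkOnDequeue(Graph const& graph, std::size_t start)
{
	std::list<std::size_t> toVisit{start};
	std::set<std::size_t> visited;
	std::size_t enqueues = 1;
	while (!toVisit.empty())
	{
		std::size_t node = toVisit.front();
		toVisit.pop_front();
		visited.emplace(node);
		for (std::size_t next: graph[node])
			if (!visited.count(next))
			{
				toVisit.emplace_back(next);
				++enqueues;
			}
	}
	return enqueues;
}

// Mark-on-enqueue: a node is recorded the moment it is appended,
// so every node enters toVisit at most once.
std::size_t countEnqueuesMarkOnEnqueue(Graph const& graph, std::size_t start)
{
	std::list<std::size_t> toVisit{start};
	std::set<std::size_t> visited{start};
	std::size_t enqueues = 1;
	while (!toVisit.empty())
	{
		std::size_t node = toVisit.front();
		toVisit.pop_front();
		for (std::size_t next: graph[node])
			if (visited.emplace(next).second) // true only on first insertion
			{
				toVisit.emplace_back(next);
				++enqueues;
			}
	}
	return enqueues;
}

// Diamond 0 -> {1, 2} -> 3: node 3 is reachable via two paths.
void checkDiamond()
{
	Graph diamond{{1, 2}, {3}, {3}, {}};
	assert(countEnqueuesMarkOnDequeue(diamond, 0) == 5); // node 3 is enqueued twice
	assert(countEnqueuesMarkOnEnqueue(diamond, 0) == 4); // each node exactly once
}
```

Both variants terminate and reach the same set of nodes; in this sketch they differ only in how many entries pass through toVisit.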

cameel · 2026-03-05T10:32:02Z

This needs a changelog entry (as a bugfix). I guess that like in #16486 the only observable effect is performance improvement?

Can we at least narrow down cases it would affect from user's perspective? Did it affect compiler performance in common cases (just to a lesser degree) or did it show up only in pathological corner cases?

I benchmarked it on my desktop and laptop against eigenlayer and saw a ~1% performance increase. Didn't think it significant enough. But we can add a changelog entry, sure. Fine with either. :)

The outcome of the shuffler (i.e. of this method) should be exactly the same as in the previous version, just more efficient.

or did it show up only in pathological corner cases?

Its severity depends on the density of the dependency graph, in particular on whether two or more paths reach nextOffset. So cases where the same Yul variable has to go into many target positions are bad. Not sure how representative eigenlayer is, but I reckon it's big enough that we'd see something more significant if it were a frequent issue. Or it's an issue bad enough that people work around it, much like with stack too deep, and that makes it hard to find good examples. I am not sure.
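
To get a feeling for how the density of the graph drives the cost, here is a toy model. It is purely an assumption for illustration and says nothing about the shape of real shuffler inputs: a single start node followed by `depth` layers of `width` nodes, where every node points at all nodes of the next layer. The names layeredGraph and countEnqueues are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy graph: node 0 is the start, layer k (1-based) occupies the
// ids [1 + (k-1)*width, 1 + k*width), and each node points at the whole next layer.
std::vector<std::vector<std::size_t>> layeredGraph(std::size_t width, std::size_t depth)
{
	std::vector<std::vector<std::size_t>> graph(1 + width * depth);
	for (std::size_t layer = 0; layer < depth; ++layer)
	{
		std::size_t nextBegin = 1 + layer * width;
		std::size_t begin = layer == 0 ? 0 : nextBegin - width;
		std::size_t end = layer == 0 ? 1 : nextBegin;
		for (std::size_t node = begin; node < end; ++node)
			for (std::size_t i = 0; i < width; ++i)
				graph[node].push_back(nextBegin + i);
	}
	return graph;
}

// Counts how many entries are ever appended to the queue, starting from node 0.
// markOnEnqueue == false models the old behaviour (mark when popped),
// markOnEnqueue == true models the fixed behaviour (mark when appended).
std::size_t countEnqueues(std::vector<std::vector<std::size_t>> const& graph, bool markOnEnqueue)
{
	std::deque<std::size_t> toVisit;
	toVisit.push_back(0);
	std::vector<bool> seen(graph.size(), false);
	seen[0] = markOnEnqueue;
	std::size_t enqueues = 1;
	while (!toVisit.empty())
	{
		std::size_t node = toVisit.front();
		toVisit.pop_front();
		seen[node] = true; // already true in the mark-on-enqueue case
		for (std::size_t next: graph[node])
		{
			if (seen[next])
				continue;
			if (markOnEnqueue)
				seen[next] = true;
			toVisit.push_back(next);
			++enqueues;
		}
	}
	return enqueues;
}

void checkLayeredExample()
{
	auto graph = layeredGraph(3, 4);
	assert(countEnqueues(graph, false) == 121); // 1 + 3 + 9 + 27 + 81
	assert(countEnqueues(graph, true) == 13);   // 1 + 3 * 4
}
```

In this model every queued copy of a node re-enqueues the whole next layer, so mark-on-dequeue produces 1 + width + width^2 + ... + width^depth entries, while mark-on-enqueue stays at 1 + width * depth. With width 1 (no node reachable via more than one path) the two counts coincide.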

But we can add a changelog entry, sure. Fine with either. :)

In this case I think it is warranted regardless of how much it improves performance, just on the basis of it being a bug with effects clearly observable to the user.

But I'd mention the general performance improvement too. If it's a reproducible 1% on a real-life contract and not just a random fluke from a single run, it's still not negligible.

It's reproducible on my machine at least.

msooseth · 2026-03-05T13:11:27Z

With input b3.yul.gz we get:

devel branch:

        Command being timed: "./solc/solc --strict-assembly --optimize-yul b3.yul"
        User time (seconds): 125.72
        System time (seconds): 1.13
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:07.34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2705356
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 674667
        Voluntary context switches: 1
        Involuntary context switches: 2994
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

this branch:

        Command being timed: "./solc/solc --strict-assembly --optimize-yul b3.yul"
        User time (seconds): 0.15
        System time (seconds): 0.00
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.15
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 17384
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 817
        Voluntary context switches: 1
        Involuntary context switches: 3
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Perf report for the slow one:

+  100.00%     0.00%  solc     solc                  [.] _start                                                                                                                             ▒
+  100.00%     0.00%  solc     libc.so.6             [.] __libc_start_main                                                                                                                  ◆
+  100.00%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d186c1                                                                                                                 ▒
+  100.00%     0.00%  solc     solc                  [.] main                                                                                                                               ▒
+  100.00%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::run(int, char const* const*)                                                             ▒
+  100.00%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::assembleYul(solidity::yul::Language, solidity::yul::YulStack::Machine)                   ▒
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::YulStack::optimize()                                                                                                ▒
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::ObjectOptimizer::optimize(solidity::yul::Object&, solidity::yul::ObjectOptimizer::Settings const&, bool)            ▒
+  100.00%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::run(solidity::yul::GasMeter const*, solidity::yul::Object&, bool, std::basic_string_view<char, std::▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::EVMDialect const&)               ▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::YulString, solidity::yul::EVMDial▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackCompressor::run(solidity::yul::Object const&, bool, unsigned long)                                             ▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::processEntryPoint(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo cons▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)    ▒
+   99.99%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)::{l▒
+   99.98%     0.00%  solc     solc                  [.] void solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidi▒
+   99.92%     0.01%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB▒
+   95.84%    45.68%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB▒
+   49.19%    49.15%  solc     solc                  [.] decltype(auto) std::__do_visit<std::__detail::__variant::__variant_idx_cookie, std::__detail::__variant::__compare<bool, std::varia▒
+    2.15%     1.54%  solc     solc                  [.] std::pair<std::_Rb_tree_iterator<unsigned long>, bool> std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, s▒
+    0.85%     0.17%  solc     libstdc++.so.6.0.34   [.] operator new(unsigned long)                                                                                                        ▒
+    0.75%     0.75%  solc     libc.so.6             [.] malloc                                                                                                                             ▒
+    0.62%     0.62%  solc     libc.so.6             [.] cfree                                                                                                                              ▒
+    0.59%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d970bf                                                                                                                 ▒
+    0.51%     0.00%  solc     libc.so.6             [.] 0x00007f5fe4d95789                                                                                                                 ▒

Perf report for the fast one:

+   99.60%     0.00%  solc     solc                  [.] _start                                                                                                                             ◆
+   99.60%     0.00%  solc     libc.so.6             [.] __libc_start_main                                                                                                                  ▒
+   99.19%     0.00%  solc     libc.so.6             [.] 0x00007ffa767186c1                                                                                                                 ▒
+   99.19%     0.00%  solc     solc                  [.] main                                                                                                                               ▒
+   99.19%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::run(int, char const* const*)                                                             ▒
+   99.07%     0.00%  solc     solc                  [.] solidity::frontend::CommandLineInterface::assembleYul(solidity::yul::Language, solidity::yul::YulStack::Machine)                   ▒
+   97.62%     0.00%  solc     solc                  [.] solidity::yul::YulStack::optimize()                                                                                                ▒
+   97.45%     0.00%  solc     solc                  [.] solidity::yul::ObjectOptimizer::optimize(solidity::yul::Object&, solidity::yul::ObjectOptimizer::Settings const&, bool)            ▒
+   97.45%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::run(solidity::yul::GasMeter const*, solidity::yul::Object&, bool, std::basic_string_view<char, std::▒
+   92.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::EVMDialect const&)               ▒
+   92.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG const&, solidity::yul::YulString, solidity::yul::EVMDial▒
+   91.94%     0.00%  solc     solc                  [.] solidity::yul::StackCompressor::run(solidity::yul::Object const&, bool, unsigned long)                                             ▒
+   91.67%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::processEntryPoint(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo cons▒
+   88.92%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)    ▒
+   88.92%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::yul::CFG::FunctionInfo const*)::{l▒
+   86.25%     5.73%  solc     solc                  [.] void solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidi▒
+   24.89%     6.70%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB▒
+   19.86%    17.76%  solc     solc                  [.] solidity::yul::Multiplicity::operator[](std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabel▒
+   18.59%    18.59%  solc     solc                  [.] decltype(auto) std::__do_visit<std::__detail::__variant::__variant_idx_cookie, std::__detail::__variant::__compare<bool, std::varia▒
+   17.23%    17.23%  solc     solc                  [.] bool boost::multiprecision::default_ops::eval_lt<boost::multiprecision::backends::cpp_int_backend<256ul, 256ul, (boost::multiprecis▒
+   14.45%    10.47%  solc     solc                  [.] solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicBlock const&, solidity::y▒
+    7.85%     7.85%  solc     solc                  [.] solidity::yul::Multiplicity::at(std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabelSlot, so▒
+    4.43%     0.00%  solc     solc                  [.] solidity::yul::OptimiserSuite::runSequence(std::basic_string_view<char, std::char_traits<char> >, solidity::yul::Block&, bool)     ▒
+    4.31%     0.11%  solc     solc                  [.] solidity::yul::OptimiserSuite::runSequence(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char▒
+    3.23%     1.29%  solc     solc                  [.] solidity::yul::Shuffler<solidity::yul::createStackLayout<solidity::yul::StackLayoutGenerator::fillInJunk(solidity::yul::CFG::BasicB▒
+    3.04%     0.00%  solc     solc                  [.] solidity::yul::ASTModifier::operator()(solidity::yul::Block&)                                                                      ▒
+    2.85%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::Block&)                                                                 ▒
+    2.26%     0.00%  solc     solc                  [.] (anonymous namespace)::findStackTooDeep(std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::Functio▒
+    2.26%     0.00%  solc     solc                  [.] void solidity::yul::createStackLayout<(anonymous namespace)::findStackTooDeep(std::vector<std::variant<solidity::yul::FunctionCallR▒
+    2.01%     0.19%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::propagateStackThroughOperation(std::vector<std::variant<solidity::yul::FunctionCallReturnLabel▒
+    1.84%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::VariableDeclaration&)                                                   ▒
+    1.82%     0.00%  solc     solc                  [.] std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot, solidity::yul::FunctionReturnLabelSlot, solidity::yul::Variabl▒
+    1.64%     1.64%  solc     libc.so.6             [.] malloc                                                                                                                             ▒
+    1.54%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::operator()(solidity::yul::FunctionDefinition&)                                                    ▒
+    1.52%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::reportStackTooDeep(solidity::yul::CFG::BasicBlock const&) const                               ▒
+    1.27%     0.00%  solc     solc                  [.] solidity::yul::StackLayoutGenerator::propagateStackThroughBlock(std::vector<std::variant<solidity::yul::FunctionCallReturnLabelSlot▒
+    1.11%     1.11%  solc     libstdc++.so.6.0.34   [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)             ▒
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assemble(solidity::yul::YulStack::Machine)                                                                ▒
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assembleWithDeployed(std::optional<std::basic_string_view<char, std::char_traits<char> > >)               ▒
+    1.05%     0.00%  solc     solc                  [.] solidity::yul::YulStack::assembleEVMWithDeployed(std::optional<std::basic_string_view<char, std::char_traits<char> > >)            ▒
+    1.04%     1.04%  solc     libc.so.6             [.] cfree                                                                                                                              ▒
+    1.01%     0.00%  solc     solc                  [.] solidity::yul::DataFlowAnalyzer::handleAssignment(std::set<solidity::yul::YulString, std::less<solidity::yul::YulString>, std::allo▒
+    0.99%     0.39%  solc     libstdc++.so.6.0.34   [.] operator new(unsigned long)                                                                                                        ▒

cameel · 2026-03-05T13:48:21Z

 Language Features:

 Compiler Features:
+* ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.


This is not in the codegen. Also, I don't think implementation details are relevant here (this is meant for users), just how it affects the operation of the compiler.

Suggested change

-* ViaIR Code Generator: Improve stack shuffler performance by fixing a BFS deduplication issue in ``bringUpTargetSlot``.
+* Yul EVM Code Transform: Improve stack shuffler performance by fixing a BFS deduplication issue.

how is it not in the codegen? :) it's the stack shuffler that is directly responsible for codegen. we just hit it sooner by proxy of the optimizer trying to compress the stack which in itself invokes codegen

i've used your suggestion now but yeah, i guess i am a bit fuzzy on what precisely constitutes codegen

i guess i am a bit fuzzy on what precisely constitutes codegen

I never noticed it until now, but you have a point here. We do seem to have two overlapping meanings for it, which makes things ambiguous. I guess you can refer to everything in the pipeline past analysis as "Yul codegen" or "IR-based code generator", but in the project it's most often used to describe specifically the Solidity AST->Yul transformation. We have more specific names for other parts (Yul->EVM transform, optimizer, bytecode generation) and we refer to the whole as a "code generation pipeline".

Perhaps we should introduce a new, more precise term, but I'm afraid this usage "Yul codegen" may be too entrenched by now. The way it's used e.g. in Solidity IR-based Codegen Changes seems to adhere to the narrower scope. And the term is also how we refer to optimizations that happen in, well, Yul codegen :) We introduced that term in #14650 and it was my idea, but many people saw the PR, including Daniel and no one even said a word that it was wrong or ambiguous, so I assume the whole team shares that understanding of "codegen".

Thanks for bringing it into context, it makes much more sense to me now!

clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from 93e3c88 to 165be80 Compare March 4, 2026 15:18

clonker marked this pull request as ready for review March 4, 2026 15:42

cameel reviewed Mar 5, 2026

cameel added the performance 🐎 label Mar 5, 2026

clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from 165be80 to 97aab59 Compare March 5, 2026 11:14

cameel reviewed Mar 5, 2026

Comment thread test/libyul/yulOptimizerTests/fullSuite/stack_shuffler_bring_up_target_slot_bfs_dedup.yul Outdated

clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch 2 times, most recently from 208fa7a to afeb6a4 Compare March 5, 2026 14:09

Stack shuffler: use mark-on-enqueue for bring up target slot stack traversal

d153bfe

clonker force-pushed the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch from afeb6a4 to d153bfe Compare March 5, 2026 22:29

cameel approved these changes Mar 9, 2026

clonker merged commit 6313134 into develop Mar 9, 2026
83 checks passed

clonker deleted the mark-on-enqueue-for-bring-up-target-slot-stack-traversal branch March 9, 2026 10:50

This was referenced Mar 10, 2026

Add via SSA CFG in CLI and Standard Json Interface #16503

Merged

Optimize cfg side effects collector and function reference resolver #16486

Merged

clonker mentioned this pull request Mar 16, 2026

repair changelog: move performance optimization from 0.8.34 to 0.8.35 and reclassify BFS deduplication issue as bugfix #16525

Merged
