Optimize rule-matching by reordering the OutSection-Rule-Sections loop #820

Open
parth-07 wants to merge 1 commit into qualcomm:main from parth-07:OptimizeRuleMatching

@parth-07 (Contributor) commented Feb 12, 2026

This commit improves the performance of linker script rule-matching by reordering the nested OutputSection-LinkerScriptRule-InputSections loop.

Currently, the nested loop is as follows:

for (auto O : OutSections) {
  for (auto R : O.rules()) {
    for (auto S : InputSections) {
      // ...
    }
  }
}

This patch changes the nested loop to:

for (auto S : InputSections) {
  for (auto O : OutSections) {
    for (auto R : O.rules()) {
      // ...
    }
  }
}

The new loop structure has the following benefits:

  • Once an input section is matched to a rule, no further iterations are spent on it. In the old design, by contrast, every input section is visited for EACH rule.
  • A lot of computation is done per section. In the old design, that computation is repeated for EACH rule, whereas in the new design it happens once per section, in the outermost loop.

On a contrived test case that makes good use of linker script rule-matching, this patch results in a ~15% performance improvement for the rule-matching phase.
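
To make the two benefits concrete, here is a minimal, self-contained sketch of both loop orders. It is not the linker's actual code: `Rule`, `OutSection`, `MatchResult`, `matches`, `matchRuleMajor`, and `matchSectionMajor` are hypothetical names, rules are modeled as simple name prefixes, and the "per-section computation" is reduced to a counter. Under those assumptions both orders assign each input section to the first matching rule in script order, and the counters show where the work goes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the linker's data structures.
struct Rule {
  std::string Prefix; // a rule matches sections whose name starts with Prefix
};

struct OutSection {
  std::string Name;
  std::vector<Rule> Rules;
};

struct MatchResult {
  std::vector<int> Owner;       // Owner[i]: output section index for input i, or -1
  std::size_t Iterations = 0;   // innermost-loop iterations
  std::size_t Preparations = 0; // how often the per-section work was done
};

static bool matches(const Rule &R, const std::string &Section) {
  return Section.compare(0, R.Prefix.size(), R.Prefix) == 0;
}

// Old order: output sections, then rules, then input sections.
MatchResult matchRuleMajor(const std::vector<OutSection> &Outs,
                           const std::vector<std::string> &Inputs) {
  MatchResult Res;
  Res.Owner.assign(Inputs.size(), -1);
  for (std::size_t O = 0; O < Outs.size(); ++O)
    for (const Rule &R : Outs[O].Rules)
      for (std::size_t S = 0; S < Inputs.size(); ++S) {
        ++Res.Iterations;
        ++Res.Preparations; // per-section work repeated for every rule
        if (Res.Owner[S] == -1 && matches(R, Inputs[S]))
          Res.Owner[S] = static_cast<int>(O);
      }
  return Res;
}

// New order: input sections outermost, stopping at the first matching rule.
MatchResult matchSectionMajor(const std::vector<OutSection> &Outs,
                              const std::vector<std::string> &Inputs) {
  MatchResult Res;
  Res.Owner.assign(Inputs.size(), -1);
  for (std::size_t S = 0; S < Inputs.size(); ++S) {
    ++Res.Preparations; // per-section work done once
    bool Matched = false;
    for (std::size_t O = 0; O < Outs.size() && !Matched; ++O)
      for (const Rule &R : Outs[O].Rules) {
        ++Res.Iterations;
        if (matches(R, Inputs[S])) {
          Res.Owner[S] = static_cast<int>(O);
          Matched = true;
          break; // no further rules are tried for this section
        }
      }
  }
  return Res;
}
```

In this toy model the rule-major version always performs (number of rules) × (number of input sections) inner iterations, while the section-major version stops as soon as a section is claimed. The real saving depends on how early sections match and on how expensive the per-section work is.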

@parth-07 force-pushed the OptimizeRuleMatching branch 2 times, most recently from 81a7bc3 to 60b7158, on February 12, 2026 18:32
@quic-seaswara (Contributor)

I like this approach and the potential savings.

break;
}
}
bool shouldBreak = false;
Contributor

Suggested change
bool shouldBreak = false;

This shouldBreak is currently a no-op

Contributor Author

Why do you think it is a no-op? Its value is used at the end of the for (auto *Out : SectionMap) loop body to break out of that loop.

Contributor Author

Previously, the inner loop was not setting it to true correctly, so it really was a no-op. I have fixed the inner loop to set it to true when we should also break out of the outer loop.
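
The flag pattern discussed in this thread can be sketched in isolation. This is a simplified illustration, not the patch itself: `OutputSectionModel`, `Lookup`, and `findOwner` are hypothetical names, and rules are modeled as exact-name strings. The point is only that a `break` inside the rule loop leaves that loop alone, so the flag has to be set there for the check after the inner loop to have any effect.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: an output section is just a list of exact-match rules.
struct OutputSectionModel {
  std::string Name;
  std::vector<std::string> Rules;
};

struct Lookup {
  int Owner = -1;                 // index of the matching output section, or -1
  std::size_t OutputsVisited = 0; // output sections examined before stopping
};

Lookup findOwner(const std::vector<OutputSectionModel> &SectionMap,
                 const std::string &Section) {
  Lookup L;
  for (std::size_t I = 0; I < SectionMap.size(); ++I) {
    ++L.OutputsVisited;
    bool shouldBreak = false;
    for (const std::string &R : SectionMap[I].Rules) {
      if (R == Section) {
        L.Owner = static_cast<int>(I);
        shouldBreak = true; // if this is never set, the flag is a no-op
        break;              // exits the rule loop only
      }
    }
    if (shouldBreak)
      break; // exits the output-section loop as well
  }
  return L;
}
```

If the assignment to `shouldBreak` is dropped, the function still returns the first matching owner for names that appear only once, but it keeps visiting every remaining output section, which is the wasted work the flag is meant to avoid.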

}
// For all output sections.
for (auto *Out : SectionMap) {
if (ELFSect) {
Contributor

Suggested change
- if (ELFSect) {
+ if (Section->getOutputSection() && !IsRetrySect)
+   break;
+ if (ELFSect) {

Contributor Author

The if (Section->getOutputSection() && !IsRetrySect) check needs to stay inside the input-section-description loop (for (auto *In : *Out)). Otherwise, we would keep traversing input section descriptions even after the section has already matched one.
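
The difference between the two guard placements can be shown with a small counting model. Everything here is an assumption made for illustration: `SectionMapModel`, `Traversal`, and `traverse` are hypothetical names, input section descriptions are reduced to plain names, and the `IsRetrySect` condition is left out because its semantics are not described in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: SectionMap[o] lists the input section descriptions
// (reduced to plain names) that belong to output section o.
using SectionMapModel = std::vector<std::vector<std::string>>;

struct Traversal {
  int Owner = -1;                      // output section that claimed the section
  std::size_t DescriptionsVisited = 0; // descriptions examined in total
};

Traversal traverse(const SectionMapModel &SectionMap,
                   const std::string &Section, bool GuardInsideInnerLoop) {
  Traversal T;
  for (std::size_t O = 0; O < SectionMap.size(); ++O) {
    // Suggested variant: the "already matched" guard sits in the outer loop.
    if (!GuardInsideInnerLoop && T.Owner != -1)
      break;
    for (const std::string &In : SectionMap[O]) {
      // Variant kept by the author: the guard sits in the description loop.
      if (GuardInsideInnerLoop && T.Owner != -1)
        break;
      ++T.DescriptionsVisited;
      if (T.Owner == -1 && In == Section)
        T.Owner = static_cast<int>(O);
    }
  }
  return T;
}
```

Both variants report the same owner, but with the guard only in the outer loop the remaining descriptions of the matching output section are still visited before the traversal stops.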

// RuleMatchTimes[In] += std::chrono::system_clock::now() - Start;
} // end each rule
if (shouldBreak)
break;
Contributor

Suggested change
break;

Contributor Author

We need shouldBreak here to signal that more output sections do not need to be traversed for this input section.

This commit improves the performance of linker script rule-matching by
reordering the nested OutputSection-LinkerScriptRule-InputSections loop.

Currently, the nested loop is as follows:

```
for (auto O : OutSections) {
  for (auto R : O.rules()) {
    for (auto S : InputSections) {
      // ...
    }
  }
}
```

This patch changes the nested loop to:

```
for (auto S : InputSections) {
  for (auto O : OutSections) {
    for (auto R : O.rules()) {
      // ...
    }
  }
}
```

The new loop structure has the following benefits:

- Once an input section is matched to a rule, no further iterations
  are spent on it. In the old design, by contrast, every input section
  is visited for EACH rule.
- A lot of computation is done per section. In the old design, that
  computation is repeated for EACH rule, whereas in the new design it
  happens once per section, in the outermost loop.

On a contrived test case that makes good use of linker script
rule-matching, this patch results in a ~15% performance improvement
for the rule-matching phase.

Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
@parth-07 force-pushed the OptimizeRuleMatching branch from 60b7158 to a49fa5c on March 17, 2026 10:35
@parth-07 marked this pull request as ready for review on March 17, 2026 10:38
3 participants