Optimize rule-matching by reordering the OutSection-Rule-Sections loop #820
parth-07 wants to merge 1 commit into qualcomm:main
81a7bc3 to 60b7158
I like this approach and the potential savings.
lib/Object/ObjectBuilder.cpp
Outdated
      break;
    }
  }
  bool shouldBreak = false;
This shouldBreak is currently a no-op
Why do you think it is a no-op? Its value is used at the end of the for (auto *Out : SectionMap) loop body to break out of this loop.
Previously, the inner loop was not correctly setting it to true, so it really was a no-op before. I have fixed the inner loop to set it to true when we should break out of the outer loop as well.
}
// For all output sections.
for (auto *Out : SectionMap) {
  if (ELFSect) {
- if (ELFSect) {
+ if (Section->getOutputSection() && !IsRetrySect)
+   break;
+ if (ELFSect) {
The if (Section->getOutputSection() && !IsRetrySect) check needs to stay within the input-section-description loop (for (auto *In : *Out)). Otherwise, we will keep traversing input section descriptions even though the section has already matched one.
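The effect of where this check sits can be sketched with a small model. This is an illustration only, not the code in lib/Object/ObjectBuilder.cpp: CheckPlacement, SectionMapModel, and countDescriptionVisits are hypothetical names, exact string comparison stands in for real rule matching, and the IsRetrySect condition is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Where the "section already has an output section" check is placed.
// (Hypothetical names; not taken from the patch.)
enum class CheckPlacement { OutputSectionLoop, InputDescLoop };

// Each inner vector models the input section descriptions of one output
// section. A description "matches" on exact name equality (an assumption).
using SectionMapModel = std::vector<std::vector<std::string>>;

// Counts how many input section descriptions are examined for one section.
int countDescriptionVisits(const SectionMapModel &SectionMap,
                           const std::string &Section,
                           CheckPlacement Placement) {
  bool HasOutputSection = false; // models Section->getOutputSection()
  int Visits = 0;
  for (const std::vector<std::string> &Out : SectionMap) {
    // Check only at the output-section level.
    if (Placement == CheckPlacement::OutputSectionLoop && HasOutputSection)
      break;
    for (const std::string &In : Out) {
      // Check inside the input-section-description loop.
      if (Placement == CheckPlacement::InputDescLoop && HasOutputSection)
        break;
      ++Visits;
      if (In == Section)
        HasOutputSection = true;
    }
  }
  return Visits;
}
```

With the check only in the outer loop, a section that matches the first description of an output section still walks the remaining descriptions of that output section; with the check in the inner loop, traversal stops right after the match.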
    // RuleMatchTimes[In] += std::chrono::system_clock::now() - Start;
  } // end each rule
  if (shouldBreak)
    break;
We need shouldBreak here to signal that more output sections do not need to be traversed for this input section.
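The flag pattern discussed in this thread can be sketched as below. This is a minimal model under stated assumptions, not the actual ObjectBuilder code: OutputSectionModel, MatchResult, and matchSection are hypothetical, and rules are matched by exact name rather than by real linker script patterns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an output section: its name plus a list of
// rules (exact section names here, not real glob patterns).
struct OutputSectionModel {
  std::string Name;
  std::vector<std::string> Rules;
};

struct MatchResult {
  std::string OutputName; // empty if no rule matched
  int RulesVisited = 0;   // number of rules examined
};

// Finds the first rule matching SectionName. The inner loop sets ShouldBreak
// so that the outer loop stops visiting further output sections as well.
MatchResult matchSection(const std::string &SectionName,
                         const std::vector<OutputSectionModel> &SectionMap) {
  MatchResult Result;
  bool ShouldBreak = false;
  for (const OutputSectionModel &Out : SectionMap) {
    for (const std::string &Rule : Out.Rules) {
      ++Result.RulesVisited;
      if (Rule == SectionName) {
        Result.OutputName = Out.Name;
        ShouldBreak = true; // if this is never set, the flag is a no-op
        break;              // leaves only the inner loop
      }
    }
    if (ShouldBreak)
      break; // leaves the outer loop too
  }
  return Result;
}
```

If the inner loop never sets the flag, the outer loop keeps visiting every remaining output section after a match, which is the no-op behavior raised in the first review comment.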
This commit improves the performance of linker script rule-matching by
reordering the nested OutputSection-LinkerScriptRule-InputSections loop.
Currently, the nested loop is as follows:
```
for (auto O : OutSections) {
  for (auto R : O.rules()) {
    for (auto S : InputSections) {
      // ...
    }
  }
}
```
This patch changes the nested loop to:
```
for (auto S : InputSections) {
  for (auto O : OutSections) {
    for (auto R : O.rules()) {
      // ...
    }
  }
}
```
The new loop structure has the following benefits:
- Once an input section is matched to a rule, we waste no further
  iterations on it. In contrast, the old design iterates over all the
  input sections for EACH rule.
- We do a lot of computation per section. In the old design, these
  computations are repeated for EACH rule, whereas in the new design
  they happen once per section, in the outermost loop.
On a contrived test case that makes heavy use of linker script
rule-matching, this patch yields a ~15% performance improvement
for the rule-matching phase.
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
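The difference between the two loop orders described in the commit message can be illustrated with a toy model that counts how often per-section work would run. Everything here is an assumption for illustration: LoopStats, matchOldOrder, and matchNewOrder are hypothetical names, rules match by exact string equality, and the counts say nothing about the real linker's timings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LoopStats {
  std::size_t SectionVisits = 0; // times per-section work would run
  std::size_t Matched = 0;       // sections assigned to some rule
};

using Rules = std::vector<std::string>; // rules of one output section
using OutSections = std::vector<Rules>;

// Old order: output section -> rule -> input section.
LoopStats matchOldOrder(const OutSections &Outs,
                        const std::vector<std::string> &Inputs) {
  LoopStats Stats;
  std::vector<bool> Done(Inputs.size(), false);
  for (const Rules &O : Outs)
    for (const std::string &R : O)
      for (std::size_t I = 0; I < Inputs.size(); ++I) {
        ++Stats.SectionVisits; // every section is revisited for every rule
        if (!Done[I] && Inputs[I] == R) {
          Done[I] = true;
          ++Stats.Matched;
        }
      }
  return Stats;
}

// New order: input section -> output section -> rule.
LoopStats matchNewOrder(const OutSections &Outs,
                        const std::vector<std::string> &Inputs) {
  LoopStats Stats;
  for (const std::string &S : Inputs) {
    ++Stats.SectionVisits; // per-section work happens once
    bool Found = false;
    for (const Rules &O : Outs) {
      for (const std::string &R : O)
        if (S == R) {
          Found = true;
          break;
        }
      if (Found)
        break; // no further output sections for a matched section
    }
    if (Found)
      ++Stats.Matched;
  }
  return Stats;
}
```

Both functions assign each section to the first rule that matches it in script order, so the match results agree; only the number of section visits differs (rules times sections in the old order, sections in the new one).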
60b7158 to a49fa5c