Conversation

Contributor

@heikkitoivonen heikkitoivonen commented Jan 8, 2026

Avoid doing `list.pop(0)` in a loop, and use indexes instead.

pyperf reports:
```
Mean +- std dev: [decode_header_baseline] 6.21 us +- 0.05 us ->
[decode_header_optimized] 6.11 us +- 0.06 us: 1.02x faster
```
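The pattern under discussion can be sketched in isolation. This is a hedged, simplified stand-in for the loop in `email/header.py` (the real `decode_header` does more): `consume_pop` mimics the original `pop(0)` style, `consume_index` the proposed index style, and both produce the same (charset, encoding, encoded) triples.

```
def consume_pop(parts):
    """Original style: list.pop(0) shifts all remaining elements, O(n) per call."""
    out = []
    while parts:
        charset = parts.pop(0).lower()
        encoding = parts.pop(0).lower()
        encoded = parts.pop(0)
        out.append((charset, encoding, encoded))
    return out

def consume_index(parts):
    """Proposed style: walk the list with an index; no element shifting."""
    out = []
    i = 0
    n = len(parts)
    while i < n:
        charset = parts[i].lower()
        i += 1
        encoding = parts[i].lower()
        i += 1
        encoded = parts[i]
        i += 1
        out.append((charset, encoding, encoded))
    return out

# Both styles agree on the same illustrative input.
parts = ["UTF-8", "B", "aGVsbG8="]
assert consume_pop(list(parts)) == consume_index(list(parts))
```

Note that `consume_pop` empties the list it is given, so the comparison passes copies.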
Contributor

@hauntsaninja hauntsaninja left a comment


Could you benchmark a version where we do parts.reverse() up front and then parts.pop()?

That would preserve some of the readability of the original — introducing more state with an index makes the code meaningfully more complicated for me

Comment on lines +104 to +109
```
charset = parts[i].lower()
i += 1
encoding = parts[i].lower()
i += 1
encoded = parts[i]
i += 1
```
Member


This may get out of range; you need a check after each `i += 1`. Because of that I'm not really interested in a 2% speed-up. Usually we are interested in a 10% speed-up when it comes to a microbenchmark. In addition, your benchmark includes the import time, since the import is not part of the setup.
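The setup point can be illustrated with the stdlib alone. The reviewers use pyperf; this timeit sketch only shows the same principle, and the header string is an illustrative sample from the `email.header` docs, not the PR's benchmark data.

```
import timeit

# Sketch only: putting the import into `setup` excludes it from the
# measured time, so only the decode_header() call itself is timed.
stmt = 'decode_header("=?iso-8859-1?q?p=F6stal?=")'
setup = "from email.header import decode_header"  # not counted in timings

timer = timeit.Timer(stmt, setup=setup)
loops, elapsed = timer.autorange()  # picks a sensible loop count
print(f"{elapsed / loops * 1e6:.2f} us per call")
```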

Contributor Author

@heikkitoivonen heikkitoivonen Jan 9, 2026


The original code has the same issue: three `pop(0)` calls in succession without checking whether any elements remain. So my code retains the same behavior.

I'll create a proper benchmark.

Member


Actually I was wrong. The split would always have 3 parts because of the regex.
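The regex guarantee can be illustrated with a toy pattern. The real `ecre` pattern in `email/header.py` is more elaborate; this simplified one is only an assumption for demonstration. `re.split()` with capture groups interleaves the captured groups with the surrounding text, so after the leading unencoded chunk there are always three group elements per match and the three successive pops never run out.

```
import re

# Toy stand-in for email.header's ecre: three capture groups for
# (charset, encoding, encoded) inside an RFC 2047 encoded word.
ecre_like = re.compile(r"=\?([^?]*)\?([bq])\?([^?]*)\?=", re.I)

parts = ecre_like.split("hello =?utf-8?b?aGVsbG8=?= world")
# -> ['hello ', 'utf-8', 'b', 'aGVsbG8=', ' world']

# Each match contributes exactly 3 group elements plus the following
# unencoded chunk, so the list length is always 1 + 4 * (match count).
assert len(parts) % 4 == 1
```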

Member

@picnixz picnixz left a comment


I'm not convinced by the change. This is only a 2% improvement on a micro-benchmark, which also doesn't isolate the import statement (so import time is counted in the overall time).

@bedevere-app

bedevere-app bot commented Jan 9, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@heikkitoivonen
Contributor Author

> Could you benchmark a version where we do parts.reverse() up front and then parts.pop()?
>
> That would preserve some of the readability of the original — introducing more state with an index makes the code meaningfully more complicated for me

I initially considered processing the list from end to start, but decided against it because I would then need to reorder everything inside the loop, which in my opinion would be even harder to reason about. reverse() has the same issue. But I will think some more about this.

@heikkitoivonen
Contributor Author

With the attached bm_email_header.py, which uses 3 strings from unit tests, I see a 1.03x improvement:

```
Mean +- std dev: [decode_header_baseline] 30.4 us +- 0.4 us ->
[decode_header_optimized] 29.6 us +- 0.2 us: 1.03x faster
```

bm_email_header.py

@picnixz
Member

picnixz commented Jan 9, 2026

I'm still not convinced, even with a 3% improvement. Usually:

  • 2-3% improvements on macro benchmarks are fine.
  • ≥10% improvements on micro benchmarks are fine.

In addition, microbenchmarks for this specific function aren't really useful IMO. We should check whether end-to-end examples would benefit from it. Also, your benchmark times 3 function calls rather than just one (you're evaluating the function 3 times), so there is still noise.

I am still -1 on this one also because adding indices reduces readability.

> but decided against that because then I would need to reorder everything inside the loop

I'm not sure I follow. With reverse(), all .pop(0) calls become .pop() calls, so I don't know why you'd need to reorder anything.
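A minimal sketch of the reverse()-then-pop() variant, using illustrative (charset, encoding, encoded) triples rather than the actual decode_header code:

```
def consume_reversed(parts):
    """Reverse once up front, then pop from the end (O(1) per pop)."""
    parts = list(parts)   # work on a copy
    parts.reverse()       # one O(n) pass
    out = []
    while parts:
        # Same order of reads as the original loop; only pop(0) -> pop().
        charset = parts.pop().lower()
        encoding = parts.pop().lower()
        encoded = parts.pop()
        out.append((charset, encoding, encoded))
    return out

assert consume_reversed(["UTF-8", "B", "aGVsbG8="]) == [("utf-8", "b", "aGVsbG8=")]
```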

@heikkitoivonen
Contributor Author

Modifying the benchmark file to keep only the test for the middle string:

```
Mean +- std dev: [decode_header_baseline] 6.07 us +- 0.06 us ->
[decode_header_optimized] 5.94 us +- 0.04 us: 1.02x faster
```

You are right, reverse() and pop() seem to work; I don't know why it seemed wrong to me. I ran the benchmark on that as well, and it is a little worse in performance than the index approach:

```
Mean +- std dev: [decode_header_baseline] 6.07 us +- 0.06 us ->
[decode_header_optimized_pops] 6.03 us +- 0.03 us: 1.01x faster
```

If you feel this is not significant enough, feel free to close. Or let me know if you want me to modify the PR for reverse() and pop().

@picnixz
Member

picnixz commented Jan 9, 2026

> I ran the benchmark on that as well, and it is a little worse in performance than the index approach

We don't always select the fastest approach, because sometimes readability counts as much as performance. If we lose readability for a 1-3% speed-up in microbenchmarks, we usually don't bother with the change. Thanks for the suggestion, but I'm closing the PR.

@picnixz picnixz closed this Jan 9, 2026