⚡️ Speed up method _DocxPartitioner._header_footer_text by 11%
#60
📄 11% (0.11x) speedup for _DocxPartitioner._header_footer_text in unstructured/partition/docx.py
⏱️ Runtime: 251 microseconds → 226 microseconds (best of 250 runs)
📝 Explanation and details
The optimization replaces a generator-based approach with a direct list accumulation, resulting in an 11% performance improvement.
Key Changes:
The original used an iter_hdrftr_texts() generator function that yielded text items, then filtered and joined them in a generator expression.
Why This is Faster:
The original code had multiple layers of abstraction: a generator function that yielded items, then a generator expression that filtered empty strings, and finally a join operation. Each generator adds overhead through Python's iterator protocol. The optimized version removes these layers by accumulating the text directly in a list and joining it once.
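The difference between the two shapes can be sketched as follows. This is a simplified illustration, not the code from unstructured/partition/docx.py: the function names are hypothetical, and plain strings stand in for the paragraphs and tables of a header or footer.

```python
from typing import Iterable, Iterator


def join_texts_with_generators(items: Iterable[str]) -> str:
    """Generator-based shape: inner generator, filtering generator expression, join."""

    def iter_texts() -> Iterator[str]:
        # Hypothetical stand-in for a nested generator such as iter_hdrftr_texts().
        for item in items:
            yield item.strip()

    # A second generator layer drops the empty strings before joining.
    return "\n".join(text for text in iter_texts() if text)


def join_texts_with_list(items: Iterable[str]) -> str:
    """List-accumulation shape: one loop, inline filter, a single join."""
    texts = []
    for item in items:
        text = item.strip()
        if text:  # skip empty strings inline instead of in a separate generator
            texts.append(text)
    return "\n".join(texts)


sample = ["  Page header  ", "", "Confidential", "   "]
# Both shapes produce the same string; only the iteration overhead differs.
assert join_texts_with_generators(sample) == join_texts_with_list(sample)
```

Both functions do the same amount of work per item, so any gain comes from fewer generator frames being resumed, not from a change in complexity.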
Performance Characteristics:
This optimization is particularly effective for document parsing workloads where headers/footers are processed frequently, as it reduces the per-call overhead without changing the algorithmic complexity.
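Per-call overhead of this kind is usually compared with a "best of N runs" timing. The sketch below shows one way to take such a measurement with the standard-library timeit module; it is an assumption about the general approach, not the harness that produced the figures above, and join_non_empty and best_of are hypothetical names.

```python
import timeit


def join_non_empty(items):
    """Hypothetical stand-in for the function under test."""
    texts = []
    for item in items:
        text = item.strip()
        if text:
            texts.append(text)
    return "\n".join(texts)


sample = ["  Page header  ", "", "Confidential", "   "]


def best_of(func, runs=250, calls_per_run=1000):
    """Return the fastest observed time per call, in seconds."""
    # timeit.repeat returns one total time per run; the minimum is the
    # run least disturbed by other activity on the machine.
    timings = timeit.repeat(lambda: func(sample), repeat=runs, number=calls_per_run)
    return min(timings) / calls_per_run


# Small run counts keep this example quick; raise them for a real comparison.
best_seconds = best_of(join_non_empty, runs=5, calls_per_run=100)
print(f"best per-call time: {best_seconds * 1e6:.3f} microseconds")
```

Taking the minimum rather than the mean reduces the influence of scheduling noise, which matters when the expected difference is only a few percent.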
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, git checkout codeflash/optimize-_DocxPartitioner._header_footer_text-mjdvlnc9 and push.